

RayCluster Integration Issues

Summary

The user is setting up the Ray integration and can create a new RayCluster without problems, but cannot connect to an existing cluster. worker_node_config is a required argument, and passing an empty list makes Flyte create a new cluster instead of using the one at the given address. They consider submitting jobs through the Ray Python SDK directly as a workaround. Another participant confirms that there is no logic to handle an empty worker_node_config list and avoid creating a new cluster, and asks for an issue to be filed. A third participant offers to look into it once the issue exists. The user files it as https://github.com/flyteorg/flyte/issues/5877.

Status
resolved
Tags
    Source
    #ask-the-community

      geert.pingen

      10/21/2024

      Added it to an issue here: https://github.com/flyteorg/flyte/issues/5877. Thanks for the fast response <@U072ZEKG7V0> <@U04H6UUE78B>!


      geert.pingen

      10/21/2024

      Morning :wave: sure thing


      david.espejo

      10/18/2024

      This is odd. I can't find any logic that handles an empty worker_node_config list and avoids creating a new cluster. <@U05E3N35EEL> please report this in an issue so we can track it [flyte-bug]
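      For illustration, here is a minimal sketch of the kind of check that appears to be missing. It is plain Python and does not touch flytekit: `ClusterSpecSketch` and `should_create_cluster` are hypothetical names, not part of the Flyte Ray plugin. The sketch assumes the rule being discussed: if an `address` is given and `worker_node_config` is empty, the task should reuse the existing cluster rather than provision a new one.

      ```
      from dataclasses import dataclass, field
      from typing import List, Optional


      @dataclass
      class ClusterSpecSketch:
          # Hypothetical stand-in for the task config; NOT flytekit's RayJobConfig.
          worker_node_config: List[dict] = field(default_factory=list)
          address: Optional[str] = None


      def should_create_cluster(spec: ClusterSpecSketch) -> bool:
          """Return True only when the task should provision its own RayCluster."""
          if spec.address and not spec.worker_node_config:
              # Address given and no worker groups: submit to the existing cluster.
              return False
          if spec.address and spec.worker_node_config:
              # Ambiguous combination: fail loudly instead of silently creating a cluster.
              raise ValueError("set either address or worker_node_config, not both")
          if not spec.worker_node_config:
              # Neither an address nor worker groups: nothing to run on.
              raise ValueError("worker_node_config is required when no address is given")
          return True


      existing = ClusterSpecSketch(address="kuberay-cluster-head-svc.kuberay.svc.cluster.local:8265")
      print(should_create_cluster(existing))  # False
      ```

      Rejecting the "both set" case is a design choice of this sketch only; the plugin could instead let `address` take precedence.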


      sovietaced

      10/18/2024

      Interesting. I’m using Ray myself and will likely need to fix some issues with the plugin, so I might be able to take a look at this if you file an issue.


      geert.pingen

      10/18/2024

      I guess we can just submit through the Ray Python SDK directly
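      As a rough sketch of that workaround, a task could bypass the plugin's cluster handling and submit to the existing cluster with Ray's job submission API (`ray.job_submission.JobSubmissionClient`). This assumes the KubeRay head service exposes the Ray dashboard on port 8265, as the address in the original question suggests, and that `ray` is installed in the task image. `dashboard_url` and `submit_to_existing_cluster` are hypothetical helper names.

      ```
      from typing import Optional


      def dashboard_url(address: str) -> str:
          """Prefix a bare host:port with http://; the job client talks HTTP to the dashboard."""
          return address if "://" in address else f"http://{address}"


      def submit_to_existing_cluster(
          address: str, entrypoint: str, runtime_env: Optional[dict] = None
      ) -> str:
          # Imported lazily so dashboard_url can be used without Ray installed.
          from ray.job_submission import JobSubmissionClient

          client = JobSubmissionClient(dashboard_url(address))
          # submit_job returns the submission ID, which can later be passed to
          # client.get_job_status() to poll for completion.
          return client.submit_job(entrypoint=entrypoint, runtime_env=runtime_env)
      ```

      For example, `submit_to_existing_cluster("kuberay-cluster-head-svc.kuberay.svc.cluster.local:8265", "python my_script.py")` would submit a hypothetical `my_script.py`. The trade-off is that Flyte no longer tracks the Ray job itself, so the task has to poll its status if it should wait for completion.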


      geert.pingen

      10/18/2024

      It looks like worker_node_config has been required since the initial commit (https://github.com/flyteorg/flytekit/blame/v1.9.1/plugins/flytekit-ray/flytekitplugins/ray/task.py#L33)? Not sure how the docs example ever worked.
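      To see why `RayJobConfig(address=...)` on its own fails, here is a minimal stand-in. It assumes only what the linked line suggests: `worker_node_config` is declared as a dataclass field with no default. `RayJobConfigStandIn` is a hypothetical class, not flytekit's.

      ```
      from dataclasses import dataclass
      from typing import List, Optional


      @dataclass
      class RayJobConfigStandIn:
          # No default value, so every constructor call must pass worker_node_config.
          worker_node_config: List[dict]
          address: Optional[str] = None


      def try_address_only(address: str) -> str:
          """Build the config the way the docs example does and report the outcome."""
          try:
              RayJobConfigStandIn(address=address)
          except TypeError as err:
              # The generated dataclass __init__ rejects the missing required argument.
              return f"TypeError: {err}"
          return "ok"


      # Prints a TypeError message that names worker_node_config.
      print(try_address_only("kuberay-cluster-head-svc.kuberay.svc.cluster.local:8265"))
      ```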


      geert.pingen

      10/18/2024

      Hi :wave: I’m setting up the integration with Ray, and it works nicely when creating a fresh RayCluster (using `@task(task_config=RayJobConfig(worker_node_config=[WorkerNodeConfig(…)]))`). I can see the cluster starting, the job getting scheduled and distributed, and completing successfully. I’m having trouble using an existing RayCluster (in the same cluster), though. What is the correct approach for that?

      From the docs (https://docs.flyte.org/en/latest/flytesnacks/examples/ray_plugin/index.html#submit-a-ray-job-to-existing-cluster) I read that I should be able to use `@task(task_config=RayJobConfig(address="<RAY_CLUSTER_ADDRESS>"))`. However, when I try that, worker_node_config turns out to be a required argument. I tried using an empty list instead:

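      ```
      from flytekit import task
      from flytekitplugins.ray import RayJobConfig

      @task(
          container_image=...,
          task_config=RayJobConfig(
              worker_node_config=[],  # No Ray cluster should be created, but the argument is required; maybe an empty list helps?
              address="kuberay-cluster-head-svc.kuberay.svc.cluster.local:8265",
              runtime_env=...
          ),
      )
      def ray_task() -> None:  # hypothetical task name; the body was not shared
          ...
      ```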
      But then it tries to start a new RayCluster instead of using the existing one found at `address`:
      ```
      ❯ k get rayclusters.ray.io -A
      NAMESPACE                        NAME                                         DESIRED WORKERS   AVAILABLE WORKERS   CPUS   MEMORY   GPUS   STATUS   AGE
      <flyte-project>-<flyte-domain>   ahvfr924w8k2vgvf97wp-n0-0-raycluster-crb9z                                         100m   500Mi    0      ready    2m25s
      kuberay                          kuberay-cluster                              1                 1                   2      3G       0      ready    3h37m
      ...
      ```