Multi-cluster Flyte setup inquiry

Summary

The user is moving to a multi-cluster Flyte setup that uses GPUs and asks about two capabilities: 1) whether Flyte can automatically select a cluster for workflow execution based on GPU availability, without explicit configuration, and 2) whether a preferred cluster can be set for a given resource, with fallback to another cluster if the preferred one fails or is unavailable. The reply points to Flyte's cluster configuration and weight-based routing, notes that this must be configured manually, and mentions that Union has a smarter routing system.

Status: open
Source: #ask-the-community

      dpapatheodorou

      10/18/2024

      Thanks Ketan, I'll take a look.

      kumare

      10/17/2024

      Sorry, I was on the phone earlier, but Flyte today has a routing system for workflows:

      1. <https://docs.flyte.org/en/latest/deployment/configuration/generated/flyteadmin_config.html#clusterconfigs-interfaces-clusterconfig|Cluster config>
      2. You can route based on <https://github.com/flyteorg/flyte/blob/6c4f8dbfc6d23a0cd7bf81480856e9ae1dfa1b27/flyteadmin/pkg/runtime/interfaces/cluster_configuration.go#L12-L52|weights>. It's mostly geared towards homogeneity in resources.
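
To make "route based on weights" concrete, here is a minimal Python sketch of weighted random selection among clusters. It is an illustration only, not flyteadmin's implementation: `ClusterEntity` and `pick_cluster` are hypothetical names, loosely modeled on the idea of pairing a cluster id with a weight.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterEntity:
    """Hypothetical stand-in for one routing target: a cluster id plus a weight."""
    id: str
    weight: float


def pick_cluster(entities, rng):
    """Pick a cluster id with probability proportional to its weight."""
    total = sum(e.weight for e in entities)
    if total <= 0:
        raise ValueError("at least one cluster needs a positive weight")
    # Draw a point in [0, total) and walk the cumulative weights until it is covered.
    threshold = rng.random() * total
    cumulative = 0.0
    for e in entities:
        cumulative += e.weight
        if threshold < cumulative:
            return e.id
    return entities[-1].id  # guard against floating-point rounding


if __name__ == "__main__":
    rng = random.Random(42)
    targets = [ClusterEntity("gpu-cluster", 3.0), ClusterEntity("cpu-cluster", 1.0)]
    counts = {"gpu-cluster": 0, "cpu-cluster": 0}
    for _ in range(10_000):
        counts[pick_cluster(targets, rng)] += 1
    print(counts)
```

Note that a selection like this looks only at the configured weights. It knows nothing about which cluster actually has GPUs free, which is consistent with the remark that this style of routing suits clusters with homogeneous resources.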
      kumare

      10/17/2024

      You will have to manually configure this. Union, on the other hand, has a special routing system that is smarter. Cc <@UNW4VP36V> <@U049GPARFSQ>

      dpapatheodorou

      10/17/2024

      Hi Flyte community, we are moving our installation to a <https://docs.flyte.org/en/latest/deployment/deployment/multicluster.html|Multi cluster setup> and utilising <https://docs.flyte.org/en/latest/user_guide/productionizing/configuring_access_to_gpus.html|GPUs>. In this case, not all clusters may be built equally, for a variety of reasons. How would Flyte work in these instances:

      1. Is Flyte able to discern which cluster a workflow should be executed on without explicit configuration? E.g., only 1 of n clusters has GPUs enabled or available.
      2. Is it possible to configure a preferred cluster for a specific resource? E.g., GPUs are available on multiple clusters and we would prefer workloads to be scheduled on a specific one, but if that cluster fails or is unavailable, they can be scheduled on another cluster.