Flyte Cloud-Based ML Platform Development

Summary

A user is evaluating Flyte as a new cloud-based ML production platform for their company and is learning Terraform for the deployment. They have no experience with non-managed clusters and hit repeated 'terraform apply' failures in a fresh GCP project, each asking for another API to be enabled (Cloud Resource Manager, Cloud SQL Admin, Service Usage). They are also seeking suggestions for improving their code and documentation. They plan to benchmark Flyte against the systems they already run in the cloud (Slurm, Ray, and Kubeflow) and, if it works, to compare it with Union. They ask how many maintenance hours to expect for a Flyte cluster on GCP. Community members reply that Flyte is largely maintenance-free for everyday use, although one operator running at large scale needed some time to get there and hit surprises in a couple of upgrades. Another reports that once the platform was set up it ran smoothly, with occasional work for upgrades or user-requested features, and that maintaining it yourself takes 1 or 2 people skilled in K8s, cloud providers, and infrastructure as code. The user is waiting for a fix for the GCP deployment from a community member before trying again.

Status
resolved
Tags
    Source
    #ask-the-community

      fabio.gratz

      10/1/2024

      > We use Slurm, Ray, Kubeflow, and their deployment in cloud is easy. Can’t speak for Slurm but Ray and Kubeflow can be installed in an existing K8s cluster with a single kubectl apply or helm install because all resources that are required are internal to the cluster. Flyte is a bit more complicated than that because it’s a more elaborate but also potent system that makes use of resources outside of the k8s cluster itself like blob storage, managed database, cloud provider IAM permissions, and you’ll need to configure a load balancer and authentication. The terraform module helps with that though.

      fabio.gratz

      10/1/2024

      But if you would like to maintain it yourself you’ll need 1 or 2 people who are good with K8s, cloud providers, infra as code etc.

      fabio.gratz

      10/1/2024

      Once we had the platform set up, it was smooth sailing. Occasional work on upgrades or when platform users would like a feature. Also happy to help on GCP <@U07MS09EZ47> :slightly_smiling_face:

      roman.kazinnik

      9/30/2024

      Thank you for getting back. Right now I am waiting for a fix for deploying Flyte on GCP from <@U04H6UUE78B>. I will give it another try once I hear back from him.

      rafaelraposo

      9/30/2024

      Let me know if you have any questions <@U07MS09EZ47>. Happy to help :slightly_smiling_face:

      rafaelraposo

      9/30/2024

      It's pretty much maintenance-free for your everyday case, but due to our scale it did take us some time to get there. We also have some special cases when it comes to the platform.

      Make sure you size things correctly (like the database), but there's no one-size-fits-all.

      We did have some surprises in a couple of upgrades, but other than that it runs just fine.

      kumare

      9/30/2024

      It’s open source, and there are many folks who run Flyte. In our opinion Flyte is very resilient, but use cases, scale, and integrations matter.

      Cc <@U03CLARPEJ0> (Spotify), <@U04664Z7H37> (recogni), <@U05R4A6N2DN> (Mercedes) may have better answers

      roman.kazinnik

      9/30/2024

      I would appreciate your advice. How many hours should we expect to spend maintaining a Flyte cluster installed on GCP? Perhaps you have statistics on how many hours your clients spend maintaining Flyte clusters in the cloud?

      kumare

      9/27/2024

      Got it

      roman.kazinnik

      9/27/2024

      Our plan was to evaluate Flyte and, if it works, compare it to Union. Eventually the goal is to see if Union's extra features are worth it.

      roman.kazinnik

      9/27/2024

      I need to try it and eventually recommend whether our company can use Flyte in the cloud. We use Slurm, Ray, Kubeflow, and their deployment in cloud is easy.

      kumare

      9/27/2024

      What does that mean

      roman.kazinnik

      9/27/2024

      I can deploy anything for my benchmarking tests of Flyte.

      kumare

      9/27/2024

      Hi Roman, would it potentially be better to deploy Union if you have little experience with Terraform? That way you could run a test pretty swiftly.

      roman.kazinnik

      9/26/2024

      <@U04H6UUE78B>

      'terraform apply' failed several times, asking me to enable the Cloud Resource Manager API, the Cloud SQL Admin API, and the Service Usage API. Now it is failing with the following (screenshot attached):

      What is the problem? I created a fresh new GCP project to try Flyte.
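
Editor's sketch, not part of the original thread: a fresh GCP project has most APIs disabled, which matches the repeated 'terraform apply' failures described above. The snippet below builds a single `gcloud services enable` command for the three APIs named in the message. The project ID `my-flyte-project` and the helper `build_enable_command` are hypothetical, and the Flyte Terraform module may well require further APIs beyond these three.

```python
import shlex

# Service names for the three APIs named in the error messages.
REQUIRED_APIS = {
    "Cloud Resource Manager API": "cloudresourcemanager.googleapis.com",
    "Cloud SQL Admin API": "sqladmin.googleapis.com",
    "Service Usage API": "serviceusage.googleapis.com",
}


def build_enable_command(project_id, services):
    """Return the argv list for one `gcloud services enable` call (hypothetical helper)."""
    if not project_id:
        raise ValueError("project_id must not be empty")
    return ["gcloud", "services", "enable", *services, f"--project={project_id}"]


# "my-flyte-project" is a placeholder project ID (assumption, not from the thread).
command = build_enable_command("my-flyte-project", REQUIRED_APIS.values())

# Dry run: only print the command. To execute it, pass `command` to
# subprocess.run(command, check=True) on a machine with gcloud installed.
print(shlex.join(command))
```

Enabling everything up front, before running 'terraform apply' again, avoids discovering the missing APIs one failed run at a time.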

      roman.kazinnik

      9/26/2024

      :+1:

      roman.kazinnik

      9/26/2024

      Hi <@U04H6UUE78B> - I am going to deploy Flyte in our cloud as an MVP of our new ML production platform.

      1. I just started learning Terraform, and I don't have experience with non-managed clusters. What are the most up-to-date resources that help to