Flyte enables you to build & deploy data & ML pipelines, hassle-free. The infinitely scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks. Explore and Join the Flyte Community!

Terraform apply issues on GCP

Summary

The user is facing issues with 'terraform apply' failing due to missing APIs and is unsure about the need for a DNS server on GCP. They have set up a GKE cluster but cannot access it, using "comanyname.com" as the DNS domain. After deploying services, 9 out of 61 resources are pending installation. They executed a Flyte deployment in a new Google Cloud project and bucket, which only contains the Flyte deployment. The user is confused about the apply operation starting with cert-manager CRDs and questions the creation of the GKE cluster and NGINX controller. The terraform plan output shows 9 resources to add, but without the -out option, Terraform cannot guarantee actions for 'terraform apply'. They suggest improvements for the module handling resource creation, especially for the ingress module, which tries to create resources on a non-existent GKE cluster. They consider creating an issue to address this and mention that removing state for certain resources allowed plan and apply to work. The user is contemplating whether to remove the resources and re-run 'terraform apply' and prefers to deploy Flyte from the updated repository. Additionally, they are benchmarking various ML platforms, emphasizing the importance of deployment ease and planning to reproduce the exact deployment instructions.

Status
resolved
Tags
    Source
    #flyte-on-gcp

      roman.kazinnik

      10/1/2024

      Thank you for letting me know about this option. tbh I am stretched really thin right now. We are benchmarking many ML platforms.

      Since [the ease of] deployment is our most important criterion, I'd better reproduce the exact deploy instructions.

      roman.kazinnik

      9/30/2024

      When are you planning to work on this issue? tbh I'd prefer to deploy Flyte from the updated repo.

      roman.kazinnik

      9/30/2024

      > so I removed state for the resources that were complaining, like terraform state rm helm_release.flyte-core

      So, would you recommend that I remove the resources and re-run 'terraform apply'?
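
For reference, the state-removal step quoted above can be sketched as a small shell helper. This is only an illustration, not the maintainers' recommended procedure: the function names `run` and `drop_from_state` and the `DRY_RUN` switch are hypothetical, and only the address `helm_release.flyte-core` comes from the thread. Note that `terraform state rm` only makes Terraform forget the object; it does not delete the real resource, so a later apply may try to create it again.

```shell
# With DRY_RUN=1 the commands are printed instead of executed (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# List what Terraform tracks, drop one address from state, then re-plan.
# The default address is the one mentioned in the thread.
drop_from_state() {
  local addr="${1:-helm_release.flyte-core}"
  run terraform state list
  run terraform state rm "$addr"
  run terraform plan
}
```

Running `DRY_RUN=1 drop_from_state` first prints the three commands without touching the state, which is a cheap way to double-check the address.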

      roman.kazinnik

      9/30/2024

      > So if you could create an Issue to capture this problem it'd be great

      Sure, can you please share the most relevant link/reference?

      roman.kazinnik

      9/27/2024

      Here is the output - it is very long:

      # kubectl_manifest.cert-manager-issuer will be created

      + resource "kubectl_manifest" "cert-manager-issuer" {

      Plan: 9 to add, 0 to change, 0 to destroy.

      ─────────────────────────────────────────────────────────────────────────────

      Note: You didn't use the -out option to save this plan, so Terraform can't guarantee to take exactly these actions if you run "terraform apply" now.
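
The note above refers to Terraform's saved-plan workflow. A minimal sketch, assuming the plan file is called `tfplan` (an arbitrary name picked for this example) and using hypothetical helper names `run` and `plan_and_apply`:

```shell
# With DRY_RUN=1 the commands are printed instead of executed (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Save the plan to a file, then apply exactly that saved plan.
plan_and_apply() {
  local plan_file="${1:-tfplan}"
  run terraform plan -out="$plan_file"
  run terraform apply "$plan_file"
}
```

Applying a saved plan file makes Terraform perform the actions recorded in that plan rather than computing a fresh one at apply time.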

      roman.kazinnik

      9/27/2024

      > I guess other resources were already created?

      I have created a new GCP project and only installed what is required by the Flyte deploy. I have run terraform apply several times; it failed on missing service APIs, which I enabled before re-running terraform apply.
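
Enabling the missing service APIs up front might look like the sketch below. The thread does not say which APIs were missing, so the API names in the usage line are assumptions, as are the helper names `run` and `enable_apis` and the project ID.

```shell
# With DRY_RUN=1 the commands are printed instead of executed (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Enable one or more service APIs in a project.
# Usage: enable_apis PROJECT_ID API [API...]
enable_apis() {
  local project="$1"
  shift
  run gcloud services enable "$@" --project "$project"
}
```

For example, `enable_apis my-flyte-project container.googleapis.com compute.googleapis.com` (the project ID and API list are illustrative only). `gcloud services list --enabled --project my-flyte-project` shows what is already enabled.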

      roman.kazinnik

      9/27/2024

      terraform version

      Terraform v1.5.7

      There are other GKE k8s clusters in our domain. My GCP project is brand new; no resources besides the Flyte deploy were allocated.

      roman.kazinnik

      9/27/2024

      Could my problem be solved?

      roman.kazinnik

      9/27/2024

      I ran the Flyte deploy on a newly created Google Cloud project and a new bucket.

      roman.kazinnik

      9/27/2024

      > so it fails reaching out the GKE cluster, but it finished creating the cluster?

      I ran 'terraform apply' after installing the services and waited ten minutes; it showed 9 (out of the initial 61) resources left to install.
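
One way to check whether the cluster exists and is reachable before the remaining resources are applied is sketched below. The cluster name, region and project in the usage line are placeholders that do not appear in the thread, and the helper names `run` and `check_cluster_access` are hypothetical.

```shell
# With DRY_RUN=1 the commands are printed instead of executed (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Fetch kubeconfig credentials for a GKE cluster, then list its nodes.
# Usage: check_cluster_access CLUSTER REGION PROJECT_ID
check_cluster_access() {
  local cluster="$1"
  local region="$2"
  local project="$3"
  run gcloud container clusters get-credentials "$cluster" --region "$region" --project "$project"
  run kubectl get nodes
}
```

For example, `check_cluster_access my-flyte-cluster us-east1 my-flyte-project`. If the first command fails, the cluster was probably not created or is in another region or project; if `kubectl get nodes` succeeds, the Kubernetes and Helm resources should at least be able to reach the API server.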

      roman.kazinnik

      9/26/2024

      This seems like a better channel for my problem.

      For the dns-domain, I use comanyname.com. Not sure if it is needed, but I can also create a DNS server on GCP.