Developer Flow for Data Scientists

Summary

The user is looking for a developer flow for data scientists that avoids listing packages twice: once in an ImageSpec and again in existing files such as requirements.txt or a Poetry project. They want to configure a default image in the client or server settings so they do not have to pass container_image=... to every task. Passing the config with --config did not help: the stock Flytekit image was used instead of the one they specified. A respondent suspects a bug in the code and plans to investigate later. For now, the user may fall back to ImageSpec to get to production sooner.

Status
open
Tags
  • Workflow Configuration
  • requirements.txt
  • Workflow
  • flyte
  • poetry
  • Developer
  • Developer Help
  • Configuration
  • Feature Request
  • Bug Report
Source
#ask-the-community

    thomas571

    10/25/2024

    Thanks, and just to clarify, this is on Flytekit Version: 1.13.9. Also no rush, I need to head home soon anyway. I just wanted to ask since I couldn't make any sense of this. Most likely we'll just use ImageSpec for now so we can get something in prod sooner, and later see if this concern was even worth following up on.

    kumare

    10/25/2024

    I think the code may have a bug - AFK, will try later in the day once I get a chance

    thomas571

    10/25/2024

    `poetry run pyflyte --config=path/to/config-sandbox.yaml run --remote workflows/example.py wf --input=...` with
    ```
    admin:
      # For GRPC endpoints you might want to use dns:///flyte.myexample.com
      endpoint: localhost:30080
      insecure: true

    # # This is not a needed configuration, only useful if you want to explore the data in sandbox. For non sandbox, please
    # # do not use this configuration, instead prefer to use aws, gcs, azure sessions. Flytekit, should use fsspec to
    # # auto select the right backend to pull data as long as the sessions are configured. For Sandbox, this is special, as
    # # minio is s3 compatible and we ship with minio in sandbox.
    storage:
      connection:
        endpoint: http://localhost:30002
        access-key: minio
        secret-key: miniostorage

    images:
      default: localhost:30000/myimage:latest
    ```
    as the config results in the default image `cr.flyte.org/flyteorg/flytekit:py3.11-1.13.9` being used
    
    thomas571

    10/25/2024

    I tried passing --config but it didn't seem to work, I'll try again in case I screwed something up while testing

    kumare

    10/25/2024

    It will get complicated for your users

    kumare

    10/25/2024

    Hmm you can pass that config to the client and it will work

    thomas571

    10/25/2024

    I might of course just be holding this wrong and fighting how flyte expects you to do stuff, but I'm kinda hoping there is some way to avoid duplicating the tracking of dependencies

    thomas571

    10/25/2024

    Hi there, I'm trying to figure out a good developer flow for our data scientists where we don't need to specify packages both in ImageSpec and in other pre-existing places such as requirements.txt/Poetry. What is the recommended way of solving this? Ideally I don't want to have to pass container_image=... to every task. Calling pyflyte run/register with --image and an image we built from an appropriate Dockerfile works just fine. But what I was hoping to be able to do was to add something like:
    ```
    images:
      default: localhost:30000/myimage:latest
    ```
    to the config (on the client or server) and then just have things work?
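
Since the question notes that `pyflyte run` with `--image` already works, one stopgap is a small wrapper that always fills in the image, so data scientists never type it. The sketch below only builds the command line. The environment variable `TEAM_FLYTE_IMAGE` and the function name `build_pyflyte_run` are hypothetical, and the fallback image is the one from the question.

```python
import os
import shlex

# Hypothetical environment variable holding a team-wide default image.
DEFAULT_IMAGE_ENV = "TEAM_FLYTE_IMAGE"
# Fallback taken from the question; adjust to your registry.
FALLBACK_IMAGE = "localhost:30000/myimage:latest"


def build_pyflyte_run(workflow_file, workflow_name, inputs=None, config=None, image=None):
    """Return the argv list for `pyflyte run --remote` with --image always set."""
    image = image or os.environ.get(DEFAULT_IMAGE_ENV, FALLBACK_IMAGE)
    argv = ["pyflyte"]
    if config:
        # --config is a top-level pyflyte option, so it precedes `run`.
        argv.append(f"--config={config}")
    argv += ["run", "--remote", "--image", image, workflow_file, workflow_name]
    # Workflow inputs follow the workflow name as --<name> <value> pairs.
    for key, value in (inputs or {}).items():
        argv += [f"--{key}", str(value)]
    return argv


if __name__ == "__main__":
    cmd = build_pyflyte_run(
        "workflows/example.py", "wf", config="path/to/config-sandbox.yaml"
    )
    # Print the command instead of running it; pass `cmd` to subprocess.run to execute.
    print(shlex.join(cmd))
```

This keeps the image choice in one place outside the task code, at the cost of routing every run through the wrapper rather than through the Flyte config itself.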