S3 Upload Path Configuration for ContainerTasks

Summary

The user wants to set a specific S3 upload path for the FlyteDirectory returned by a ContainerTask, instead of the default randomly generated path. An early reply showed that a regular Python task can do this by passing the remote location when constructing the returned FlyteDirectory, provided the Pod running the task has access to that S3 path. Once it became clear that the user was working with raw container tasks, a maintainer pointed out that, per the documentation, these do not currently support FlyteDirectory, and linked a PR under review that adds FlyteDirectory support for inputs. Participants disagreed on what copilot can do with directory I/O: one expected no directory support at all, while another reported that directory output works and input does not. The user was then advised to pass the --raw-data-prefix option to pyflyte run --remote, with the caveat that it applies to every task in the execution. The user asked whether this also works with a Kubernetes deployment and was asked to clarify whether they meant PodTemplates. They explained that they installed the Helm chart, registered the workflow, and triggered it from the web UI, and that the plugin machinery appears to always generate a random output prefix with no visible way to change it for raw container tasks.

Status: resolved
Source: #ask-the-community

      blaircampbell

      9/18/2024

      I am using ContainerTasks to process some files and return a FlyteDirectory. Has anyone figured out a way to set the upload path in S3 instead of the randomly generated path?

      blaircampbell

      9/20/2024

      I have installed the Helm chart, and I register the workflow and trigger it from the web UI. I started looking into the pluginmachinery Go code, and it looks like it always generates a random output prefix when adding the init and sidecar containers. I don't see a way to change it when using raw container tasks.

      habuelfutuh

      9/20/2024

      Mind clarifying? How do you create k8s deployments? Or did you mean PodTemplates?

      blaircampbell

      9/20/2024

      I will give this a shot, thanks! Would this work when using a Kubernetes deployment as well?

      habuelfutuh

      9/20/2024

      Note that this will apply to all tasks within a given execution, not just that one raw container task.

      habuelfutuh

      9/20/2024

      <@U04F6FE2F27> Have you attempted to configure the raw data prefix? In your `pyflyte run --remote` command, you can append `--raw-data-prefix s3://my-custom-bucket/my-custom-prefix/`, which should instruct the system to store data in that location instead.
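
As a rough sketch of that suggestion, the helper below assembles the command line as an argument list so the prefix can be set from a script. The function name, the script path `workflows/example.py`, and the workflow name `wf` are hypothetical placeholders; only `pyflyte run --remote` and `--raw-data-prefix` come from the thread.

```python
import shlex
from typing import List


def build_pyflyte_run_command(script: str, workflow: str, raw_data_prefix: str) -> List[str]:
    """Assemble a `pyflyte run --remote` invocation with a custom raw data prefix.

    The run-level options are placed before the script and workflow names.
    """
    return [
        "pyflyte",
        "run",
        "--remote",
        "--raw-data-prefix",
        raw_data_prefix,
        script,    # hypothetical path to the file that defines the workflow
        workflow,  # hypothetical workflow name inside that file
    ]


cmd = build_pyflyte_run_command(
    "workflows/example.py",
    "wf",
    "s3://my-custom-bucket/my-custom-prefix/",
)
# Print a copy-pasteable shell command; nothing is executed here.
print(shlex.join(cmd))
```

The list form can be handed to `subprocess.run(cmd, check=True)` on a machine where `pyflyte` is installed and configured. The prefix applies to the whole execution rather than to a single task.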

      eric901201

      9/20/2024

      > directories work for containertasks? i thought copilot was not able to i/o directories yet.

      Output is supported, input is not.

      blaircampbell

      9/19/2024

      My ContainerTask is similar to this:

      calculate_ellipse_area_shell = ContainerTask(
          name="ellipse-area-metadata-shell",
          input_data_dir="/var/inputs",
          output_data_dir="/var/outputs",
          inputs=kwtypes(files=List[FlyteFile]),
          outputs=kwtypes(output=FlyteDirectory),
          image="ghcr.io/flyteorg/rawcontainers-shell:v1",
          command=[
              "./calculate-ellipse-area.sh",
              "/var/inputs",
              "/var/outputs",
          ],
      )

      I have multiple files as input and a FlyteDirectory as output. I want to control where the FlyteDirectory is uploaded in Flyte. Right now it just goes to a randomly generated path.

      habuelfutuh

      9/19/2024

      :face_palm: My bad, I totally glossed over that... https://docs.flyte.org/en/latest/user_guide/customizing_dependencies/raw_containers.html Unfortunately, raw containers do not support FlyteDirectories at the moment. There is a PR that <@UNR3C6Y4T> is reviewing to add support for FlyteDirectories as inputs: https://github.com/flyteorg/flyte/pull/5715 Would you be interested in contributing a similar PR to support returning a directory as an output?

      ytong

      9/19/2024

      directories work for containertasks? i thought copilot was not able to i/o directories yet.

      habuelfutuh

      9/18/2024

      Needless to say, the Pod running your task needs to have access to that S3 path in order to upload to it.

      habuelfutuh

      9/18/2024

      It got it right! :slightly_smiling_face:

      
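      ```python
      from flytekit import task
      from flytekit.types.directory import FlyteDirectory

      @task
      def process_files() -> FlyteDirectory:
          local_dir = "/path/to/local/dir"
          remote_path = "s3://your-bucket/specific/path/"
          # FlyteDirectory takes the upload target as `remote_directory` (`remote_path` is the FlyteFile keyword).
          return FlyteDirectory(local_dir, remote_directory=remote_path)
      ```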
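
If the target location needs to be assembled from several pieces rather than hard-coded, a small helper along these lines may help. This is only a sketch: `s3_directory_uri` and the segment values are hypothetical, and it does nothing beyond string handling.

```python
import posixpath


def s3_directory_uri(bucket: str, *parts: str) -> str:
    """Join a bucket name and path segments into an s3:// URI ending in a slash."""
    # Drop empty segments and stray slashes so the pieces join cleanly.
    cleaned = [p.strip("/") for p in parts if p.strip("/")]
    base = f"s3://{bucket.strip('/')}/"
    if not cleaned:
        return base
    return base + posixpath.join(*cleaned) + "/"


remote_path = s3_directory_uri("your-bucket", "specific", "path")
print(remote_path)  # s3://your-bucket/specific/path/
```

The resulting string can then be supplied as the remote location when constructing the returned FlyteDirectory inside a regular Python task. The Pod still needs write access to that bucket and prefix.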
      
      habuelfutuh

      9/18/2024

      Actually let me ask on <#C06H1SFA19R|> :slightly_smiling_face:

      habuelfutuh

      9/18/2024

      Yes, you can do that when you construct the returned FlyteDirectory. Let me pull out an example.