Summary
The user is looking for a way to set a specific S3 upload path for a FlyteDirectory returned by a ContainerTask, instead of the default randomly generated path. A responder confirmed that this is possible when constructing the returned FlyteDirectory in a regular Python task and provided an example snippet, noting that the Pod must have access to the specified S3 path. Other participants questioned whether directories work with ContainerTasks at all, since copilot's support for directory I/O was in doubt. The user was also advised to pass the `--raw-data-prefix` option to the `pyflyte run --remote` command, which applies to all tasks in an execution. A responder additionally asked whether the user meant Kubernetes deployments or PodTemplates. The user installed the helm chart, registered the workflow, and triggered it from the web UI, but found that the plugin machinery generates a random output prefix and saw no way to change it when using raw container tasks.
blaircampbell
I am using ContainerTasks to process some files and return a FlyteDirectory. Has anyone figured out a way to set the upload path in S3 instead of the randomly generated path?
blaircampbell
I have installed the helm chart, then I register the workflow and trigger it from the web UI. I started looking into the pluginmachinery Go code, and it looks like it will always generate a random output prefix when adding the init and sidecar containers. I don't see a way to change it when using raw container tasks.
habuelfutuh
Mind clarifying? How do you create k8s deployments? Or did you mean PodTemplates?
blaircampbell
I will give this a shot, thanks! Would this work when using a Kubernetes deployment as well?
habuelfutuh
Note that this will apply to all tasks within a given execution, not just that one raw container task.
habuelfutuh
<@U04F6FE2F27> Have you attempted to configure the raw data prefix?
In your `pyflyte run --remote` command, you can append `--raw-data-prefix s3://my-custom-bucket/my-custom-prefix/`; that should instruct the system to store outputs in that location instead.
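A minimal sketch of how the full command line could be assembled, for anyone scripting this. The helper name `build_pyflyte_run`, the script name `workflow.py`, and the workflow name `wf` are hypothetical placeholders; only `pyflyte run --remote` and `--raw-data-prefix` come from the message above.

```python
import shlex
from typing import List


def build_pyflyte_run(script: str, workflow: str, raw_data_prefix: str) -> List[str]:
    """Assemble a `pyflyte run --remote` argument list with a custom raw data prefix.

    Hypothetical helper: it only builds the argument list and does not run anything.
    """
    if not raw_data_prefix.startswith("s3://"):
        raise ValueError(f"expected an s3:// prefix, got {raw_data_prefix!r}")
    # Options for `run` go before the script file and the workflow name.
    return [
        "pyflyte", "run", "--remote",
        "--raw-data-prefix", raw_data_prefix,
        script, workflow,
    ]


# Assumed script and workflow names, for illustration only.
cmd = build_pyflyte_run("workflow.py", "wf", "s3://my-custom-bucket/my-custom-prefix/")
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` on a machine where `pyflyte` is installed and configured for the cluster.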
eric901201
> directories work for containertasks? i thought copilot was not able to i/o directories yet.
Output is supported; input is not.
blaircampbell
My ContainerTask is similar to this:
calculate_ellipse_area_shell = ContainerTask(
    name="ellipse-area-metadata-shell",
    input_data_dir="/var/inputs",
    output_data_dir="/var/outputs",
    inputs=kwtypes(files=List[FlyteFile]),
    outputs=kwtypes(output=FlyteDirectory),
    image="ghcr.io/flyteorg/rawcontainers-shell:v1",
    command=[
        "./calculate-ellipse-area.sh",
        "/var/inputs",
        "/var/outputs",
    ],
)
I have multiple files as input and a FlyteDirectory as output, and I want to control where the FlyteDirectory is uploaded. Right now it just goes to a randomly generated path.
habuelfutuh
:face_palm: My bad, I totally glossed over that... https://docs.flyte.org/en/latest/user_guide/customizing_dependencies/raw_containers.html They do not support flyte directories unfortunately at the moment. There is a PR <@UNR3C6Y4T> is reviewing to add support for Flyte Directories as inputs: https://github.com/flyteorg/flyte/pull/5715 Would you be interested in contributing a similar PR to support returning a directory as an output?
ytong
directories work for containertasks? i thought copilot was not able to i/o directories yet.
habuelfutuh
Needless to say, the Pod running your task needs to have access to that S3 path in order to upload to it.
habuelfutuh
It got it right! :slightly_smiling_face:
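```python
from flytekit import task
from flytekit.types.directory import FlyteDirectory

@task
def process_files() -> FlyteDirectory:
    local_dir = "/path/to/local/dir"
    remote_path = "s3://your-bucket/specific/path/"
    # FlyteDirectory's keyword is remote_directory; remote_path is FlyteFile's keyword.
    return FlyteDirectory(local_dir, remote_directory=remote_path)
```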
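To replace the random path with a predictable one, the remote location can be computed rather than hard-coded. This is a minimal sketch using only the standard library. The helper name `build_remote_dir` and the path layout are assumptions for illustration, not part of flytekit.

```python
import posixpath


def build_remote_dir(bucket: str, *parts: str) -> str:
    """Join a bucket and path parts into an s3:// directory URI ending in a slash.

    Hypothetical helper: it only formats the string and does not check that
    the bucket exists or that the Pod can write to it.
    """
    if not bucket or "/" in bucket:
        raise ValueError(f"invalid bucket name: {bucket!r}")
    # Drop empty parts and stray slashes so the result has no doubled separators.
    cleaned = [p.strip("/") for p in parts if p.strip("/")]
    return "s3://" + posixpath.join(bucket, *cleaned) + "/"


remote_path = build_remote_dir("your-bucket", "specific", "path")
```

The resulting string can then be used as the remote location when constructing the returned FlyteDirectory.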
habuelfutuh
Actually let me ask on <#C06H1SFA19R|> :slightly_smiling_face:
habuelfutuh
You can do that, yes, when you construct the returned FlyteDirectory. Let me pull out an example.