
Advice on sharing large dictionary in ETL

Summary

The user is seeking advice on sharing a roughly 4 MB dictionary object across tasks in an ETL pipeline, having hit a RuntimeExecutionError because the task output exceeds the 2 MB output size limit. They are working in a local sandbox environment and are considering FlyteFile or JSONL for passing the data. Currently they return a JSONLFile from each task and read it back in the subsequent task, but they would like to simplify this and avoid repeatedly reading from and writing to blob storage, since they are running Flyte on a local cluster with limited compute resources. The user also emphasizes that the data must be recorded for reproducibility. They note that if the whole process runs on one node, they could mount the disk into every task and pass it as the raw output path, eliminating the need to read from and write to blob storage.
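
As a concrete illustration of the pattern being discussed, below is a minimal sketch (task names, the JSON serialization, and the sample dictionary are hypothetical, not taken from the thread) of writing the dictionary to a file and returning a FlyteFile, so that only a reference to the file crosses the task boundary instead of an inline output:

    import json
    import os
    import tempfile

    from flytekit import task, workflow
    from flytekit.types.file import FlyteFile


    @task
    def build_lookup() -> FlyteFile:
        # Hypothetical stand-in for the ~4 MB dictionary from the question.
        lookup = {f"key_{i}": i for i in range(100_000)}
        path = os.path.join(tempfile.mkdtemp(), "lookup.json")
        with open(path, "w") as f:
            json.dump(lookup, f)
        # Returning a FlyteFile offloads the file to the configured raw output
        # location and passes only a reference downstream, which avoids the
        # inline output size limit that triggered the error.
        return FlyteFile(path)


    @task
    def transform(lookup_file: FlyteFile) -> int:
        # Opening the FlyteFile fetches it (a local read in the sandbox)
        # before the dictionary is reloaded.
        with open(lookup_file, "r") as f:
            lookup = json.load(f)
        return len(lookup)


    @workflow
    def etl() -> int:
        return transform(lookup_file=build_lookup())

When everything runs on a single node, pointing the raw output path at a locally mounted disk would make the "upload" and "download" plain local file writes and reads, which is the simplification the user describes.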

Status
resolved
Tags

Source
#ask-the-community