Summary
The user is asking how to run parallel processing with Flyte over a large dataset (roughly 150GB) that is read through a generator yielding rows. They want to process the data in batches of about one million rows, handing each batch to a Flyte task that starts its own container. Their concern is that Flyte's dynamic workflows and map_task appear to require all inputs to be materialized in memory up front, which defeats the purpose of batch processing. They would prefer not to rewrite their Python code in PySpark and want incremental batch processing in Flyte without restructuring their existing generator logic, and they are asking for suggestions or solutions to accomplish this.
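A minimal sketch of the pattern in question, assuming flytekit's @task and @dynamic APIs; the names rows, BATCH_SIZE, process_batch, and process_all are hypothetical placeholders, not from the original thread. The comment inside the dynamic workflow marks where the user's concern arises:

```python
from typing import Iterator, List

from flytekit import dynamic, task

BATCH_SIZE = 1_000_000  # one million rows per batch, as described in the question


def rows() -> Iterator[str]:
    # Hypothetical stand-in for the user's generator over the ~150GB source;
    # yields one row at a time.
    yield from ("row-%d" % i for i in range(10))


@task
def process_batch(batch: List[str]) -> int:
    # Each invocation runs in its own container; the return value here is a
    # stand-in for whatever per-batch result the user needs.
    return len(batch)


@dynamic
def process_all() -> List[int]:
    # The crux of the concern: this loop must drain the entire generator
    # inside the dynamic workflow before the batch tasks are scheduled, so
    # all batches are materialized as task inputs rather than streamed
    # incrementally to containers.
    results: List[int] = []
    batch: List[str] = []
    for row in rows():
        batch.append(row)
        if len(batch) == BATCH_SIZE:
            results.append(process_batch(batch=batch))
            batch = []
    if batch:  # trailing partial batch
        results.append(process_batch(batch=batch))
    return results
```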