Flyte enables you to build & deploy data & ML pipelines, hassle-free. The infinitely scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks. Explore and Join the Flyte Community!

Error Handling in Map Tasks

Summary

The user hits a "TooLarge" error ("Event message exceeds maximum gRPC size limit", 6192640 vs. 4194304 bytes) when a dynamic task triggers approximately 8000 map tasks, each taking a dataclass of 4 S3 URLs. Serializing the dataclasses into a single file and reading it back in the dynamic task did not initially help. The map tasks are triggered in two chunks (5000 and 3000), and the user asks whether smaller chunks would help. The suggested short-term fixes are to raise the gRPC message size limit or to split the map tasks further; offloading is the longer-term solution, and Union already offloads automatically. The user eventually worked around the problem without raising the limit, by writing the serialized dataclasses to a file and giving each map task a partition number to read. Another user notes that long lists of paths are common in dynamic workflows and proposes raising the limit to a higher safe value.

Status
resolved
Tags
  • Error Management
  • Error
  • Workflow Management
  • flyte
  • S3
  • Error Handling
  • Map Tasks
  • Data Processing
  • Union
  • gRPC Error
  • gRPC Limit
  • union
  • gRPC
  • User
  • Task Error
  • Support Need
  • Performance Tuning
  • Question
  • S3 URLs
  • Developer Help
  • TooLarge
  • Performance
  • Support Request
  • Bug Report
Source
#ask-the-community

    david.espejo

    11/25/2024

    this is going to change -for the better- in flytekit 1.14. Coming up next week...

    miha.garafolj249

    11/25/2024

    ah, that's not a bad solution either, thanks!

    rupsha

    11/25/2024

    I didn't bump the limit up.. just got around the problem by serializing and writing the data classes into a file and giving the map task a partition number to read.
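The workaround described above could look roughly like the sketch below. It is not the poster's actual code: the `Item` dataclass, the JSON-lines layout, and the helper names are all hypothetical, and it uses only the Python standard library. The idea is that each map task receives a small integer (the partition number) instead of the dataclasses themselves, so the inputs passed through the workflow stay tiny.

```python
import itertools
import json
import math
from dataclasses import asdict, dataclass


@dataclass
class Item:
    """Hypothetical stand-in for the thread's dataclass of four S3 URLs."""
    urls: list


def write_items(items, path):
    """Serialize every item as one JSON line in a single file."""
    with open(path, "w") as f:
        for item in items:
            f.write(json.dumps(asdict(item)) + "\n")


def partition_count(n_items, partition_size):
    """Number of partitions needed to cover n_items."""
    return math.ceil(n_items / partition_size)


def read_partition(path, partition, partition_size):
    """Read only the items belonging to the given partition number."""
    start = partition * partition_size
    with open(path) as f:
        lines = itertools.islice(f, start, start + partition_size)
        return [Item(**json.loads(line)) for line in lines]
```

In a Flyte workflow, the file would presumably live in blob storage and the map task would be mapped over `range(partition_count(...))`; that wiring is an assumption and is left out here.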

    miha.garafolj249

    11/25/2024

    Also <@U03HQE6THNV> did this work for you?

    miha.garafolj249

    11/25/2024

    Running into the same issue, thanks for the solution. I think it's quite common to have long lists of paths in dynamic workflows; would it make sense to bump the limit to a higher safe value? 🙂

    david.espejo

    10/24/2024

    I think it is

      server:
        grpc:
          maxMessageSizeBytes: <limit-in-bytes>
    
    rupsha

    10/24/2024

    <@UNR3C6Y4T> where do you propose bumping this limit up?

    kumare

    10/24/2024

    <@U03HQE6THNV> in Union we have automatic offloading - so that this problem will never happen

    ytong

    10/24/2024

    cc <@U0265RTUJ5B>

    ytong

    10/24/2024

    longer term, offloading should help

    ytong

    10/24/2024

    or split up the map task more as you say.

    ytong

    10/24/2024

    the short-term fix is to bump the limit, it’s not that much over the limit.

    rupsha

    10/24/2024

    Guessing it’s at the point of triggering the map tasks in which case the workaround is useless

    rupsha

    10/24/2024

    I currently trigger the map tasks in 2 chunks.. 5k and 3k.. should I make smaller chunks?
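Splitting the inputs into smaller chunks, as asked above, can be sketched with a small helper. The name `chunked` and the chunk sizes are illustrative only; the thread does not say which chunk size would be small enough.

```python
def chunked(items, max_size):
    """Split a list into consecutive chunks of at most max_size items."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    return [items[i:i + max_size] for i in range(0, len(items), max_size)]


# Placeholder inputs standing in for the ~8000 map task inputs.
inputs = list(range(8000))

# A chunk size of 5000 reproduces the 5k/3k split described in the thread.
current_split = [len(c) for c in chunked(inputs, 5000)]

# A smaller, arbitrarily chosen chunk size yields four equal chunks.
smaller_split = [len(c) for c in chunked(inputs, 2000)]
```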

    rupsha

    10/24/2024

    Hi team.. I’m running into the same issue as what’s discussed here (https://discuss.flyte.org/t/16667891/i-m-running-into-this-error-in-an-dynamic-task-toolarge-even). It’s a dynamic task that is used to chunk up and trigger around 8000 map tasks… the input to the dynamic task is a list of inputs (dataclasses) for these map tasks… and each one is just 4 s3 urls..

    To get around this I serialized the data classes into a single file and read that in the dynamic task to recreate the inputs to the map tasks.. but still running into this error TooLarge: Event message exceeds maximum gRPC size limit, caused by [rpc error: code = ResourceExhausted desc = grpc: received message larger than max (6192640 vs. 4194304)]
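The numbers in the error message allow a rough back-of-the-envelope check. The sketch below makes two assumptions that the thread does not confirm: that the oversized event came from the 5000-item chunk, and that the event size grows linearly with the number of items. Under those assumptions it estimates how many items would fit under the current limit.

```python
GRPC_LIMIT_BYTES = 4194304   # limit reported in the error (4 MiB)
RECEIVED_BYTES = 6192640     # message size reported in the error


def overage(received, limit):
    """Bytes by which the message exceeds the limit."""
    return received - limit


def max_items_under_limit(received, items_in_message, limit):
    """Largest item count that fits, ASSUMING size scales linearly with items."""
    return (limit * items_in_message) // received


extra_bytes = overage(RECEIVED_BYTES, GRPC_LIMIT_BYTES)
ratio = RECEIVED_BYTES / GRPC_LIMIT_BYTES

# Assumption: the failing event corresponds to the 5000-item chunk.
estimated_max_chunk = max_items_under_limit(RECEIVED_BYTES, 5000, GRPC_LIMIT_BYTES)
```

Since the message is roughly 1.5 times the limit, either a modest limit increase or a noticeably smaller chunk would plausibly be enough, but the estimate should be treated as a starting point to test rather than a guarantee.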