Improving Task Failure Communication in Flyte

Summary

The user asks how to communicate task failures during model training while still surfacing useful information such as checkpoint locations, so that users can decide whether to retry from a checkpoint or start over. They compare Flyte with an in-house orchestrator that allowed side artifacts to be streamed during task execution, a capability they feel Flyte lacks, and they are looking for a clean, type-safe way to produce output even when a task fails. They also mention that if the parent workflow fails, the subworkflow fails as well, and they plan to bring in someone with more expertise for further insight.
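To make the "type-safe output on failure" idea concrete, here is a minimal sketch of one possible pattern: catch the training error inside the task body and return it as a typed result instead of raising. This is an illustration only, not the approach recommended in the thread. It is plain Python with no Flyte imports; `TrainingResult`, `run_training`, `flaky_step`, and the bucket path are all hypothetical names, and the assumption is that a function like `run_training` would be wrapped as a task whose output type is the dataclass.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TrainingResult:
    """Typed output that is produced whether or not training succeeds (hypothetical)."""
    succeeded: bool
    last_checkpoint: Optional[str] = None  # newest usable checkpoint location, if any
    error: Optional[str] = None            # short failure description for the user


def run_training(train_step: Callable[[int], str], num_steps: int) -> TrainingResult:
    """Call `train_step` once per step; each call returns a checkpoint location."""
    last_checkpoint: Optional[str] = None
    try:
        for step in range(num_steps):
            last_checkpoint = train_step(step)
    except Exception as exc:
        # Report the failure as data so the checkpoint location is not lost.
        return TrainingResult(False, last_checkpoint, f"{type(exc).__name__}: {exc}")
    return TrainingResult(True, last_checkpoint)


def flaky_step(step: int) -> str:
    """Stand-in for a real training step; fails on the fourth step."""
    if step == 3:
        raise RuntimeError("loss diverged")
    return f"s3://example-bucket/ckpt-{step}"  # hypothetical checkpoint path


result = run_training(flaky_step, num_steps=5)
print(result.succeeded, result.last_checkpoint, result.error)
```

The trade-off is that the orchestrator would see such a task as having succeeded, so a downstream step has to inspect `succeeded` and decide whether to fail the run, retry from `last_checkpoint`, or start over.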

Status
resolved
Source
#ask-the-community