Improving Task Failure Communication in Flyte

The user is seeking advice on how to communicate task failures during model training while still surfacing useful information, such as checkpoint locations. They want failures to carry enough detail for users to decide whether to retry from a checkpoint or start over. They compare Flyte with an in-house orchestrator they used previously, which allowed tasks to stream side artifacts during execution, and they feel Flyte lacks this capability. They are looking for a clean, type-safe way to produce output even when a task fails. Additionally, they mention that if the parent workflow fails, the subworkflow will also fail, and they plan to involve someone with more expertise for further insights.
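
One way to picture the "typed output even on failure" request is to have the training step catch its own error and return a result record instead of raising. The sketch below is plain Python and is not taken from the thread: `TrainingResult`, `TrainingError`, and `run_training` are hypothetical names, and wiring the function into Flyte (for example as a task body) is assumed rather than shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TrainingResult:
    """Hypothetical typed record returned whether or not training succeeded."""
    succeeded: bool
    model_uri: Optional[str] = None       # set only on success
    checkpoint_uri: Optional[str] = None  # last checkpoint written before a failure
    error_message: Optional[str] = None   # human-readable failure reason


class TrainingError(Exception):
    """Hypothetical error that carries the last checkpoint location, if any."""

    def __init__(self, message: str, checkpoint_uri: Optional[str] = None):
        super().__init__(message)
        self.checkpoint_uri = checkpoint_uri


def run_training(
    train_fn: Callable[[Optional[str]], str],
    resume_from: Optional[str] = None,
) -> TrainingResult:
    """Run train_fn and convert a TrainingError into a typed failure result."""
    try:
        model_uri = train_fn(resume_from)
    except TrainingError as exc:
        # Return the failure details instead of re-raising, so the caller
        # still receives the checkpoint location as a normal output.
        return TrainingResult(
            succeeded=False,
            checkpoint_uri=exc.checkpoint_uri,
            error_message=str(exc),
        )
    return TrainingResult(succeeded=True, model_uri=model_uri)
```

The trade-off in this pattern is that the orchestrator sees the step as successful, so downstream logic has to inspect `succeeded` itself and decide whether to retry with `resume_from` set to the returned `checkpoint_uri` or to start over.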

