Flyte workflow execution issue

Summary

A user reports Flyte workflow executions whose overall status is 'SUCCESS' while some task instances still show 'Running', even though all Kubernetes pods have completed. The maximum task parallelism is lower than the number of task instances, and the user asks whether others have seen a similar tracking issue. They reference a related GitHub issue and a potential fix, and note that min_successes or min_success_ratio for the ArrayNode map task is set to 1.0, with no failures in the run.

The user suspects that ArrayNode may be dropping events and is seeking access to the propeller logs to investigate further. The issue shows up mainly with complex jobs, which makes it difficult to reproduce. The propeller logs contain warnings that task event recordings failed because the events already existed. The user confirms that the problem persists on the latest version of Flyte and has added the issue to their current sprint.

After almost 10 days, the job still shows the same state, and the full propeller logs are available. When the user asks for an update, a responder says that a fix was merged the previous week and that they will check on the timeline for a beta release.
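The symptom described above (a finished workflow with task instances still recorded as running after their pods have completed) can be expressed as a simple consistency check. The following is a minimal sketch, not Flyte code: the `SubTaskRecord` structure, the phase names, and the `find_stale_subtasks` helper are all hypothetical and assume the recorded phases and pod states have already been collected by some other means.

```python
from dataclasses import dataclass

# Phases treated as terminal in this sketch.
# These names are illustrative assumptions, not Flyte's actual phase enum.
TERMINAL_PHASES = {"SUCCEEDED", "FAILED", "ABORTED"}


@dataclass
class SubTaskRecord:
    """Hypothetical record for one map task instance."""
    index: int
    recorded_phase: str   # phase as stored/displayed for the instance
    pod_completed: bool   # whether the Kubernetes pod has finished


def find_stale_subtasks(workflow_phase, subtasks):
    """Return indices of instances that look stuck.

    An instance is flagged when the workflow is already terminal,
    its pod has completed, and its recorded phase is still non-terminal.
    """
    if workflow_phase not in TERMINAL_PHASES:
        # A workflow that is still running can legitimately have running instances.
        return []
    return [
        s.index
        for s in subtasks
        if s.pod_completed and s.recorded_phase not in TERMINAL_PHASES
    ]


# Made-up example data, for illustration only.
records = [
    SubTaskRecord(0, "SUCCEEDED", True),
    SubTaskRecord(1, "RUNNING", True),
    SubTaskRecord(2, "SUCCEEDED", True),
    SubTaskRecord(3, "RUNNING", True),
]

stale = find_stale_subtasks("SUCCEEDED", records)
print(stale)  # [1, 3]
```

A check like this only identifies which instances are inconsistent. It does not explain why their final events were not recorded, which is what the propeller logs are needed for.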

Status
resolved
Tags
    Source
    #ask-the-community