Summary
The user is working on a project using a FlyteRemote connection to analyze failed executions and relaunch them in bulk. They need to programmatically access detailed error logs for these failures, as the FlyteWorkflowExecution objects from remote.recent_executions() only provide truncated error messages. The user is looking for a way to access the full error logs, or alternative methods to obtain or set failure reasons, while noting that the complete logs are available in the Flyte UI. They mention the possibility of using the Python kube API to pull data directly from the worker pod.
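For reference, a minimal sketch of the access pattern described above, assuming a recent flytekit install; the project/domain names and the limit are placeholders:

```python
from flytekit.configuration import Config
from flytekit.remote import FlyteRemote

# Connect to the Flyte cluster; Config.auto() picks up the local flytectl/env config.
remote = FlyteRemote(
    config=Config.auto(),
    default_project="my-project",   # placeholder
    default_domain="development",   # placeholder
)

# List recent executions and read the error attached to the workflow closure.
for execution in remote.recent_executions(project="my-project", domain="development", limit=50):
    error = execution.closure.error
    if error is not None:
        # This is the message the thread reports as truncated to ~100 characters.
        print(execution.id.name, error.message)
```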
charlie
Thanks both! <@U07655DJTDM> this might be a bit tricky as we'd need to do it for each workflow (so it has the potential for triggering a lot of additional API calls) - but it's definitely a good idea if we can't get the info from the flyte details directly. <@U04H6UUE78B> this could be worth a shot - I'll have a look into what the dict looks like! Thanks!
david.espejo
<@U06RTQ8FEP4> what about iterating through the node_executions dict?
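A rough sketch of that idea, reusing the `remote` object from the sketch above; the execution name is a placeholder, and it is an assumption (not confirmed in the thread) that the node-level closures carry more, or less truncated, error detail than the workflow-level one:

```python
# Fetch a specific failed execution and sync node-level data so that
# node_executions is populated.
execution = remote.fetch_execution(
    project="my-project", domain="development", name="<execution-name>"  # placeholder
)
execution = remote.sync_execution(execution, sync_nodes=True)

# node_executions is a dict keyed by node ID.
for node_id, node_execution in execution.node_executions.items():
    error = node_execution.closure.error
    if error is not None:
        print(node_id, error.message)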
josh210
You can pull from the worker pod directly with the python kube API
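A sketch of that route with the official kubernetes Python client; the namespace convention (`<project>-<domain>`) and the `execution-id` label selector are assumptions about how flytepropeller names and labels task pods, and may differ per deployment:

```python
from kubernetes import client, config

# Load credentials from the local kubeconfig (in-cluster config is also possible).
config.load_kube_config()
core_v1 = client.CoreV1Api()

# Flyte task pods usually live in a <project>-<domain> namespace (assumption).
namespace = "my-project-development"

# Find pods belonging to the failed execution via label selector (label name assumed).
pods = core_v1.list_namespaced_pod(
    namespace=namespace,
    label_selector="execution-id=<execution-name>",  # placeholder
)

for pod in pods.items:
    # Pull the full container log, which should include the untruncated traceback.
    logs = core_v1.read_namespaced_pod_log(name=pod.metadata.name, namespace=namespace)
    print(pod.metadata.name)
    print(logs)
```

Note that this only works while the worker pods still exist; pods for long-finished executions may already have been garbage-collected.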
charlie
Hi :wave: I'm currently working with a FlyteRemote connection to retrieve information about failed executions, so that we can assess failure reasons and relaunch executions in bulk. For this project, we would like to be able to programmatically access some information (e.g. logs) on why the execution failed.
The FlyteWorkflowExecution objects returned by remote.recent_executions() have a closure.error.message property that allows us to get some information on the error logs - but this appears to be truncated to 100 characters:
> Traceback (most recent call last):
>
> File "/opt/conda/envs/orchestrator/lib/python3.11/site-pac
Is there any way to access the full error log from a failed execution using FlyteRemote? Or any alternative ways to get or set a failure reason? We can see the full error log in the Flyte UI, so I'm assuming this means it's accessible somewhere...