Summary
The user inquires about the possibility of setting an execution timeout while keeping the queuing timeout unlimited, as they are experiencing long queue times due to compute unavailability, resulting in timeouts. They note that both active deadline and execution deadline are set together when a timeout is specified. The response clarifies that this feature is not currently available but offers a workaround by using Python's signal module to implement timeout logic in their code. The user also seeks clarification on how execution times should reconcile with tasks and nodes across retries.
kumare
So across multiple retries you may want to limit the time
kumare
Maybe we want a task timeout and a deadline
kumare
How should executions time reconcile with task and node across retries
habuelfutuh
Unfortunately not. This sounds like a good feature to add though :slightly_smiling_face: if you are willing to make the change, I'll be happy to expedite it.
One thing you can do to work around the limitation is adding that logic in the user code itself, something like this:
import time
def handler(signum, frame):
raise TimeoutError("Timeout reached")
@task
def my_task():
# Set the signal handler
signal.signal(signal.SIGALRM, handler)
signal.alarm(10) # Set the timeout to 10 seconds
# Your long-running code here```
I've not tried this but looks plausible