Summary
The user is integrating a sampling Python call stack profiler with a Flyte task by creating a new PodSpec in flytekit that enables a shared process namespace, so the profiler container can see the task's processes. They have configured the profiler to access the task's memory maps but are having difficulty identifying the process ID (PID) of the user task: their method, pgrep -f pyflyte-execute, is ineffective because it matches multiple processes. The user asks which process name to target to obtain the PID, and is also considering a custom PythonFunctionTask that records the task's PID to a shared volume. Their goal is to identify hot paths in individual tasks with a sampling profiler and eventually render them as flamegraphs. They are currently using Austin to sample the task process's frame stack and are exploring attaching the profiler programmatically at the task entry point, although they see limited advantages to that approach.
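The PID-file idea from the summary can be sketched in plain Python. This is a minimal sketch under stated assumptions: record_pid and wait_for_pid are hypothetical helpers, and /shared/task.pid stands in for a path on a volume (for example an emptyDir) mounted into both the task container and the profiler sidecar. Because the pod shares one process namespace, the PID the task writes with os.getpid() should be the same PID the sidecar sees.

```python
import functools
import os
import time
from typing import Any, Callable, Optional

# Hypothetical path on a volume shared by the task container and the profiler sidecar.
DEFAULT_PID_FILE = "/shared/task.pid"


def record_pid(pid_file: str = DEFAULT_PID_FILE) -> Callable:
    """Hypothetical decorator: write the current PID, then run the task body."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            # Write to a temporary name and rename it, so the sidecar never
            # reads a half-written file.
            tmp = f"{pid_file}.tmp"
            with open(tmp, "w") as f:
                f.write(str(os.getpid()))
            os.replace(tmp, pid_file)
            return fn(*args, **kwargs)
        return wrapper
    return decorator


def wait_for_pid(
    pid_file: str = DEFAULT_PID_FILE, timeout_s: float = 60.0, poll_s: float = 0.5
) -> Optional[int]:
    """Sidecar side: poll until the PID file appears; return None on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with open(pid_file) as f:
                return int(f.read().strip())
        except (FileNotFoundError, ValueError):
            time.sleep(poll_s)
    return None
```

Stacking record_pid directly under flytekit's @task should work in principle, since functools.wraps preserves the wrapped function's name and annotations, but that combination is untested here. The sidecar would call wait_for_pid() and hand the result to the sampler's attach-to-PID mode.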
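If the goal is instead to keep selecting the task by process name, one plausible reason pgrep -f pyflyte-execute returns several PIDs is that -f matches the pattern anywhere in the full command line, so a shell wrapper or a launcher that merely passes pyflyte-execute as an argument matches too. The heuristic below is an assumption, not documented Flyte behaviour: run from the sidecar, it scans /proc and keeps only processes whose program, or the script their interpreter was launched with, is named pyflyte-execute. find_entrypoint_pids and is_entrypoint are hypothetical names.

```python
import os
from typing import List


def is_entrypoint(argv: List[str], name: str = "pyflyte-execute") -> bool:
    """True if `name` is the program itself (argv[0]) or the script an
    interpreter was started with (argv[1]), not just a later argument."""
    return any(os.path.basename(arg) == name for arg in argv[:2])


def find_entrypoint_pids(name: str = "pyflyte-execute", proc: str = "/proc") -> List[int]:
    """Scan a /proc-style directory for processes launched as `name`."""
    pids: List[int] = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            with open(os.path.join(proc, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            continue  # the process exited or is not readable from this container
        # cmdline is NUL-separated; dropping empty fields removes the trailing one.
        argv = [part.decode(errors="replace") for part in raw.split(b"\0") if part]
        if is_entrypoint(argv, name):
            pids.append(int(entry))
    return sorted(pids)
```

This narrows the match but cannot tell the task process apart from workers it forks, because forked children inherit the parent's command line; the PID-file approach avoids that ambiguity.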
mhagel
Awesome! We already added memray to our task wrappers internally, in an almost equivalent fashion. The only difference is that we expose profiling as a task arg/flag, and we materialize the flame graph by reaching into the Memray internals.
The sampling profiler running in a sidecar “worked,” but for now we've decided not to worry about finding PIDs and the like, and to stick with memray.
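Exposing profiling as a task flag, as mhagel describes, might look roughly like the following. This is a sketch, not their implementation: with_optional_memray is a hypothetical name, the flag is an extra keyword argument on a plain Python function rather than a typed Flyte task input, and the only memray API used is memray.Tracker, its context manager for recording allocations to a capture file.

```python
import functools
import os
import tempfile
import uuid
from typing import Any, Callable, Optional


def with_optional_memray(output_dir: Optional[str] = None) -> Callable:
    """Hypothetical wrapper: run the function under memray only when profile=True."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any, profile: bool = False, **kwargs: Any) -> Any:
            if not profile:
                return fn(*args, **kwargs)
            import memray  # imported lazily so memray is only required when profiling

            out_dir = output_dir or tempfile.gettempdir()
            # Use a unique name per call so repeated runs never collide with
            # an existing capture file.
            path = os.path.join(out_dir, f"{fn.__name__}-{uuid.uuid4().hex}.bin")
            with memray.Tracker(path):
                return fn(*args, **kwargs)
        return wrapper
    return decorator
```

Because the profiling happens inside the task's own process, no PID lookup is needed. The capture file can then be rendered with memray's CLI (memray flamegraph <capture>.bin writes an HTML flame graph); the route through Memray internals mentioned above is not reproduced here.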
Pylon
Michael, FYI we're about to land https://github.com/flyteorg/flytekit/pull/2875, which brings in memray as an option for profiling. This works at the task level (so we don't need to worry about figuring out PIDs). Enabling this will be as easy as adding the @memray_profiling decorator.
Let me know if this fits your use case.