Summary
The user's pods were being killed with Out of Memory (OOM) errors. They tried to raise the resource limits, but the settings reverted to defaults. They shared a Flyte example in which the task definition and a workflow-level override both specified resource requests and limits, yet the pod limits still showed default values. Their task resource configuration set defaults of 500m CPU and 10Gi memory. One suggestion was to upgrade the propeller image, since a fix had been made previously. The user resolved the issue by specifying both defaults and limits in the resource configuration. They noted that limits might not be necessary, since Flyte should default to requests=limits, which is better for the K8s scheduler. However, Flyte was not respecting this setting and reverted to defaults, indicating that the flyteadmin settings needed adjusting. They also hit an error stating that the requested CPU limit exceeded the limit set in the platform configuration. The user believes better documentation is needed, as the differences between defaults, requests, and limits are not well explained.
eric901201
This is how it works, as far as I remember.
eric901201
You have to make the CPU limit in the admin's config at least as large as the limit your task requests.
rmalla
It simply throws this error: Details: Requested CPU limit [2] is greater than current limit set in the platform configuration [500m]. Please contact Flyte Admins to change these limits or consult the configuration
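The rule behind this error can be sketched in a few lines of Python. This is a simplified model and not Flyte's actual implementation: `parse_quantity` and `check_limit` are hypothetical helpers, and only the quantity suffixes that appear in this thread (plus `Ki`, `Mi` and `Ti`) are handled.

```python
from decimal import Decimal

# Multipliers for a small subset of Kubernetes-style quantity suffixes.
_SUFFIXES = {
    "m": Decimal("0.001"),
    "Ki": Decimal(2) ** 10,
    "Mi": Decimal(2) ** 20,
    "Gi": Decimal(2) ** 30,
    "Ti": Decimal(2) ** 40,
}


def parse_quantity(text: str) -> Decimal:
    """Convert a quantity such as '500m', '2' or '0.5Gi' to a plain number."""
    for suffix, factor in _SUFFIXES.items():
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * factor
    return Decimal(text)


def check_limit(name: str, requested: str, platform_limit: str) -> None:
    """Hypothetical stand-in for the platform-side check: reject requested > limit."""
    if parse_quantity(requested) > parse_quantity(platform_limit):
        raise ValueError(
            f"Requested {name} limit [{requested}] is greater than current limit "
            f"set in the platform configuration [{platform_limit}]."
        )


check_limit("CPU", "500m", "500m")       # equal to the platform limit: accepted
check_limit("memory", "0.5Gi", "100Gi")  # below the platform limit: accepted
try:
    check_limit("CPU", "2", "500m")      # the case from this thread: rejected
except ValueError as err:
    print(err)
```

Under this model, a task asking for `cpu="2"` passes only once the platform CPU limit is raised to 2 or more.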
rmalla
David, I tried that, but Flyte is not respecting requests=limits and is reverting to the defaults. Without the override, it reports that flyteadmin needs to adjust the limits.
rmalla
Thanks for your help.
rmalla
Hi Han-Ru, I believe I solved it. The issue was that the resource config needs to specify both the defaults and the limits; otherwise it ignores them. Like so: `task_resources: {defaults: {cpu: 500m, memory: 10Gi}, limits: {cpu: 500m, memory: 100Gi}}`
eric901201
after you set your config, did you restart your propeller?
eric901201
I think you have to update your propeller's deployment to the latest
eric901201
cc <@U04H6UUE78B>, can you help him use the latest flytepropeller image? I haven't had experience with the "Hard Way".
rmalla
Han-Ru, I am using flyte-binary, and I have installed the latest version using the Helm chart, as specified in the “Hard Way”.
rmalla
Oh. let me check. Thanks
eric901201
upgrade your propeller's image to the latest version.
eric901201
It was fixed 4 months ago, I think.
eric901201
did you use the latest propeller?
rmalla
Hi there, my pods are getting killed with OOM. I tried increasing the limits, but it still defaults to the presets. I am trying this toy example:
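```
from flytekit import Resources, task, workflow


# Task-level requests and limits: 2 CPU and 0.5Gi of memory.
@task(
    requests=Resources(
        cpu="2",
        mem="0.5Gi",
    ),
    limits=Resources(
        cpu="2",
        mem="0.5Gi",
    ),
)
def foo():
    print('task')


@workflow
def my_wf():
    foo()
    # Second node: override the task's resources for this invocation only.
    foo().with_overrides(
        requests=Resources(
            cpu="1",
            mem="2Gi",
        ),
        limits=Resources(
            cpu="1",
            mem="4Gi",
        ),
    )
```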
Here is the output of limits on the pod:
```
NAMESPACE                POD                                          CONTAINER                  MEM_REQ  MEM_LIM  CPU_REQ  CPU_LIM
flyte                    flyte-backend-flyte-binary-548f5d59fc-ln6q8  flyte                      <none>   <none>   <none>   <none>
flytesnacks-development  azpd24c4qpc2w2jlqvhz-n0-0                    azpd24c4qpc2w2jlqvhz-n0-0  512Mi    512Mi    2        2
flytesnacks-development  azpd24c4qpc2w2jlqvhz-n1-0                    azpd24c4qpc2w2jlqvhz-n1-0  1Gi      1Gi      1        1
```
Here is the task resource config:
```
task_resources:
  defaults:
    cpu: 500m
    memory: 10Gi
```
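One possible reading of the pod output above is that each request/limit pair is capped at a platform-wide limit before the pod is created. The sketch below is a guess that happens to fit the numbers, not a description of Flyte's actual code: `effective_resources` is a hypothetical helper, and the 1Gi platform memory limit is an assumption, since the config shown sets only `defaults`.

```python
def effective_resources(
    request_mi: int, limit_mi: int, platform_limit_mi: int
) -> tuple[int, int]:
    """Cap a memory request/limit pair (both in Mi) at a platform-wide limit."""
    limit = min(limit_mi, platform_limit_mi)
    request = min(request_mi, limit)  # a request can never exceed its limit
    return request, limit


ASSUMED_PLATFORM_LIMIT_MI = 1024  # assumption: a 1Gi cap, not stated in the thread

# n0: the task-level 0.5Gi request and limit fit under the cap.
print(effective_resources(512, 512, ASSUMED_PLATFORM_LIMIT_MI))    # (512, 512)
# n1: with_overrides asked for a 2Gi request and a 4Gi limit.
print(effective_resources(2048, 4096, ASSUMED_PLATFORM_LIMIT_MI))  # (1024, 1024)
```

If that reading is right, the override was not ignored but capped, which is consistent with the fix of adding an explicit `limits` section with a larger memory value.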