EKS Cluster Shared Memory Configuration Issue

Summary

The user is facing issues with configuring shared memory in an EKS cluster (version 1.30). They have created a "Memory" volume with a 20Gi limit, but only 4.3Gi is being allocated in the container despite the node having 64GB of RAM. They are looking for advice to resolve this issue and note that SizeMemoryBackedVolumes is likely enabled by default. The user is using the flyte-core helm chart with high limits in task_resources. Ultimately, they decided to abandon the memory approach and are now using file backing instead.

Status
resolved
Tags
    Source
    #ask-the-community
      rmalla

      10/3/2024

      Hi Eduardo, yes, I do have resource limits on the pod. Eventually I abandoned the idea; instead of medium: Memory, I am using file backing.
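For context, the workaround described here amounts to dropping `medium: Memory` from the volume definition. A minimal, illustrative sketch (volume name is hypothetical, not taken from the user's manifest): without the Memory medium, the emptyDir is backed by node disk, so its size is no longer tied to the pod's memory limit.

```yaml
# Illustrative sketch of a disk-backed (file-backed) emptyDir for /dev/shm.
# Omitting "medium: Memory" means the volume lives on node storage instead
# of tmpfs, so SizeMemoryBackedVolumes sizing no longer applies.
volumes:
  - name: dshm          # hypothetical volume name
    emptyDir:
      sizeLimit: 20Gi
```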

      rmalla

      9/25/2024

      Hi Eduardo, I am running version 1.30 in EKS, and I believe SizeMemoryBackedVolumes is turned on by default. I have set up high limits in task_resources, and yes, I am using the flyte-core helm chart.
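For readers unfamiliar with where task_resources lives, a sketch of the relevant flyte-core Helm values is below. The exact values are illustrative, not the user's configuration; consult the chart's values.yaml for the authoritative structure.

```yaml
# Illustrative flyte-core values fragment: default and maximum resources
# applied to Flyte task pods. High memory limits matter here because a
# memory-backed /dev/shm is capped by the pod's memory limit.
configmap:
  task_resource_defaults:
    task_resources:
      defaults:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 8
        memory: 32Gi   # example value; set above the desired shm size
```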

      rmalla

      9/20/2024

      And here is the pod description, showing 20Gi for /dev/shm:

      rmalla

      9/20/2024

      It shows

      rmalla

      9/20/2024

      Yes,

      kumare

      9/20/2024

      Did you try 'df /dev/shm'?
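For reference, this is the quick check being suggested: df reports the size, usage, and mount point of the tmpfs backing /dev/shm, which is how the 4.3Gi figure in this thread was observed.

```shell
# Inside the container, show the actual size of the shared-memory mount.
# -h prints human-readable units (e.g. 20G, 4.3G).
df -h /dev/shm
```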

      rmalla

      9/20/2024

      Hi Ketan, when I SSH into the container and check the size, the shm mount shows only 4 GB. In the pod description I see the correct settings.

      kumare

      9/20/2024

      What do you mean, you only see 4.3Gi allocated to the container?

      kumare

      9/20/2024

      This is the right way

      rmalla

      9/20/2024

      I am having trouble configuring shm (shared memory) for the cluster. I have created a "Memory" volume and assigned a sizeLimit of "20Gi". The pod description accurately reflects the new tmpfs file system and its size. However, when I SSH into the container running on the pod, I see only 4.3Gi allocated. The node has 64GB of RAM. Any pointers on how I can solve this? Thank you!
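For reference, a minimal sketch of the kind of pod spec being described (names and images are illustrative, not the user's manifest). Note that when the SizeMemoryBackedVolumes feature gate is enabled, Kubernetes sizes a memory-backed emptyDir to the minimum of its sizeLimit and the pod's memory limit, so a pod memory limit below 20Gi can cap /dev/shm at a smaller value, which is consistent with the 4.3Gi observed here.

```yaml
# Illustrative sketch: memory-backed emptyDir mounted at /dev/shm.
apiVersion: v1
kind: Pod
metadata:
  name: shm-demo            # hypothetical name
spec:
  containers:
    - name: main
      image: busybox        # placeholder image
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: dshm
          mountPath: /dev/shm
      resources:
        limits:
          # With SizeMemoryBackedVolumes enabled, the tmpfs is sized to
          # min(sizeLimit, pod memory limit); keep this at or above 20Gi
          # for the sizeLimit below to take full effect.
          memory: 24Gi
  volumes:
    - name: dshm
      emptyDir:
        medium: Memory
        sizeLimit: 20Gi
```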