Managing Pod Memory and Preventing those OOMKills
Tim Nichols
CEO/Founder
tl;dr - Engineers spend too much time managing the CPU and memory of their Kubernetes pods. Here's a guide to make this common task easier, with some suggestions for shortcuts and alternatives.
Platform engineering pushes engineers to manage and monitor their Pod Memory
It’s 2024 and the platform engineering community has aligned on a few premises:
- Kubernetes is the standard orchestrator for an enterprise’s diverse workloads
- Central platform teams make Kubernetes accessible by building golden paths and standardizing cloud resources (node pools, load balancers, etc)
- Code owners should be able to manage their SLAs, availability and cloud costs
Does your organization agree?
If so, you and your team need to manage the resources your workloads are using on Kubernetes. Specifically, you need to actively manage the resource limits and requests of the pods used by your workloads.
- If your workload is CPU starved, you’ll hit performance and scheduling issues.
- If your pod doesn’t have enough Memory, Kubernetes will terminate the container and restart the pod (the dreaded OOMKilled).
- Spare CPU and/or spare Memory? You’re inflating your cloud bill and there’s a chance that your greedy pods are causing problems for someone else on your team.
In short, code owners need to manage pod resource consumption in order to hit their SLAs and avoid wasteful spending.
At Flightcrew we want to delegate this sort of task to AI, but if you insist on doing it manually here’s a guide for how to find and optimize how much Memory is used by your pods.
We’ll use sock-shop as an example microservice app on Kubernetes.
Step 0: Brush up on Kubectl
Kubectl is the native CLI for communicating with Kubernetes. It provides basic commands for accessing metadata, metrics, and logs so that you can check on your pods without bugging an SRE (or having an SRE bug you).
To be self-sufficient, read the kubectl docs and make sure kubectl config current-context points at the correct Kubernetes cluster.
- Your main tools are kubectl get, kubectl describe and kubectl top commands.
- You can use '-A' to search across all namespaces when you don't know where something lives. Use this to your advantage to get familiar with your cluster!
- There are several ways to reference the resource you want to get ('resource-type/resource-name', e.g. 'deployment/front-end'), and many resource types have a shorthand ('deploy/front-end'); 'kubectl api-resources' lists the available types and their short names.
- You can find resources by the labels you put on your workloads and pods: 'kubectl get pods -l name=front-end -n sock-shop' (a few worked examples follow this list)
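To make those commands concrete, here's a quick warm-up against the sock-shop example (substitute your own namespaces and labels):
❯ kubectl config current-context                    # confirm you're pointed at the right cluster
❯ kubectl api-resources | head                      # resource types and their short names
❯ kubectl get pods -A                               # every pod, across all namespaces
❯ kubectl get deploy -n sock-shop                   # 'deploy' is shorthand for 'deployment'
❯ kubectl get pods -l name=front-end -n sock-shop   # filter by label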
Step 1: Find the Pods & Nodes where your code is running
Kubernetes is designed so that large enterprises can run diverse workloads with a high degree of redundancy and customization. This flexibility means it can be difficult to know where your code is running and how to maintain it.
As a refresher, you’ll need to work with the following objects
- Pods are groups of containers with shared resources. They're the smallest deployable unit in Kubernetes.
- Nodes are the physical or virtual machines where pods run.
- Workloads are higher-level abstractions that manage pods at scale. You'll probably manage your pods with a Deployment.
- Namespaces allow you to isolate objects within your cluster based on team, environment, etc
Your first step is to find the pods where your container is running, and identify any workload objects that are controlling said pods.
If you used recommended labels, then it's trivial to find your pods with kubectl get:
❯ kubectl get pods -l name=front-end -A
NAMESPACE NAME READY STATUS RESTARTS AGE
sock-shop front-end-57f45b79fc-h5xzj 1/1 Running 0 12d
Or if you know what your pod is generally going to be named …
❯ kubectl get pod --all-namespaces | grep front-end
sock-shop front-end-57f45b79fc-h5xzj 1/1 Running 0 12d
However, if you’ve neglected labels then you’ll have to do some brute-force checking:
❯ kubectl get pod --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
argocd argocd-application-controller-0 1/1 Running 0 12d
argocd argocd-applicationset-controller-79d54f5d64-rgtf9 1/1 Running 0 12d
argocd argocd-dex-server-7f4b99d696-9ln8s 1/1 Running 0 12d
argocd argocd-notifications-controller-cdd87f4f5-tc65x 1/1 Running 0 12d
argocd argocd-redis-74d77964b-l7w2f 1/1 Running 0 12d
…
At Flightcrew we love the kubectl tree plugin (installed via krew), which draws a tree from the .metadata.ownerReferences that trace a pod back to the object managing it.
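If you don't have the plugin yet, it's distributed through the krew plugin manager:
❯ kubectl krew install tree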
❯ kubectl tree deployment/front-end -n sock-shop
NAMESPACE NAME READY REASON AGE
sock-shop Deployment/front-end - 476d
sock-shop ├─ReplicaSet/front-end-57f45b79fc - 333d
sock-shop │ └─Pod/front-end-57f45b79fc-h5xzj True 5d16h
sock-shop ├─ReplicaSet/front-end-74c6cb7766 - 390d
sock-shop ├─ReplicaSet/front-end-7b66ff8446 - 452d
sock-shop ├─ReplicaSet/front-end-7d89d49d6b - 476d
sock-shop └─ReplicaSet/front-end-9b64b6d49 - 452d
And then to find the node
❯ kubectl get pod -n sock-shop front-end-57f45b79fc-h5xzj -ojson | jq '.spec.nodeName'
"gke-sandbox-dev-default-pool-fed3013f-awph"
So kubectl can tell you the pods and nodes where your workload is running, and whether those pods are being controlled by a deployment or other intermediate object.
Step 2: Check Pod health
Now it’s time to use kubectl to check how our pods are doing. Once more this is easy with kubectl get
❯ kubectl get pods -l name=front-end -A
NAMESPACE NAME READY STATUS RESTARTS AGE
sock-shop front-end-57f45b79fc-h5xzj 1/1 Running 0 12d
or
❯ kubectl get pod -n sock-shop carts-645b945d94-qlvfp
NAME READY STATUS RESTARTS AGE
carts-645b945d94-qlvfp 1/1 Running 3044 (5m48s ago) 12d
What are you looking for?
'Running' in the status column is good. 100% Ready is also good. Restarts? Bad!
If you're having issues, the status and restarts will give you clues. In this case, the carts pod has 3044 restarts, and the most recent one was 5m48s ago.
To dig in, you can use kubectl describe:
❯ kubectl describe pod -n sock-shop carts-645b945d94-qlvfp
Name: carts-645b945d94-qlvfp
Namespace: sock-shop
Priority: 0
Service Account: default
Node: gke-sandbox-dev-default-pool-fed3013f-hk56/10.128.0.92
Start Time: Thu, 05 Sep 2024 01:40:23 -0700
Labels: name=carts
pod-template-hash=645b945d94
Annotations: <none>
Status: Running
....
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Tue, 17 Sep 2024 12:21:37 -0700
Finished: Tue, 17 Sep 2024 12:22:26 -0700
Ready: True
Restart Count: 3044
....
If you see OOMKilled, then you've probably solved your mystery. It means the OOM killer is terminating your memory-starved container (hence exit code 137). Very bad!
If you see any other error, the Events section at the bottom of kubectl describe can give hints (ImagePullBackOff, etc.). You can use kubectl logs to check whether an application error caused the container to crash.
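Because the crashed container has already been restarted, add '--previous' to read the logs from the last terminated instance:
❯ kubectl logs -n sock-shop carts-645b945d94-qlvfp --previous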
Step 3: Find How Much Memory each Pod is Using
You now know whether your pods are healthy - next, let’s check usage patterns to confirm whether resource utilization is causing you to crash or overpay.
The quick way to find current memory usage for pods and nodes is to run 'kubectl top pod name -n namespace' or 'kubectl top node name' (both rely on the metrics-server add-on). Note that kubectl top only works for pods in State: Running. Keep reading for a few tips on managing crashing pods.
❯ kubectl top pod carts-645b945d94-qlvfp -n sock-shop
NAME CPU(cores) MEMORY(bytes)
carts-645b945d94-qlvfp 15m 15Mi
or
❯ kubectl top node gke-sandbox-dev-default-pool-fed3013f-12j8
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
gke-sandbox-dev-default-pool-fed3013f-12j8 149m 15% 1529Mi 54%
However, just getting the current Memory and CPU usage isn't very helpful. What if there are spikes? Or a sudden increase in usage? You want to set resource limits and requests for longer, representative usage patterns.
Your observability tool (Datadog, Prometheus) should have resource utilization metrics, but you can check CPU and memory utilization yourself by using 'watch' in the CLI to refresh the usage periodically.
For example, to check every 5 seconds:
watch -n5 "kubectl top pod carts-645b945d94-qlvfp -n sock-shop"
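If the pod runs more than one container, '--containers' breaks the usage down per container:
❯ kubectl top pod carts-645b945d94-qlvfp -n sock-shop --containers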
What are you looking for?
Well after making sure nothing is obviously wrong (ex: a memory leak), try to find the median and peak utilization for your workload. You want to build a mental model of your past resource consumption so that you can correctly estimate future allocation.
Step 4: Kubernetes Requests, Limits and Resource Contention
You now understand the usage patterns for your workload and have some sense of ‘correct’ resource allocation.
But before you change the size of your pod, it’s a courtesy (if not a requirement) to see if there’s space on the Node. If Pods are equivalent to containers, then you can think of Nodes as Hosts.
Kubernetes has many tools to help manage resource contention. For now you’ll want to make sure there’s enough space for your desired larger pod. If not, you could OOM again, and cause problems for other workloads on the node.
❯ kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
gke-sandbox-dev-default-pool-fed3013f-12j8 204m 21% 1502Mi 53%
gke-sandbox-dev-default-pool-fed3013f-171i 502m 53% 1394Mi 49%
gke-sandbox-dev-default-pool-fed3013f-awph 251m 26% 1246Mi 44%
gke-sandbox-dev-default-pool-fed3013f-hk56 205m 21% 1550Mi 55%
gke-sandbox-dev-default-pool-fed3013f-o7dh 148m 15% 1208Mi 43%
gke-sandbox-dev-default-pool-fed3013f-qh11 621m 66% 928Mi 33%
gke-sandbox-dev-default-pool-fed3013f-we02 605m 64% 1197Mi 42%
gke-sandbox-dev-default-pool-fed3013f-xbuv 130m 13% 1075Mi 38%
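kubectl top shows live usage; to see how much of a node's capacity is already reserved by other pods' requests and limits, describe the node and look for the 'Allocated resources' section near the bottom:
❯ kubectl describe node gke-sandbox-dev-default-pool-fed3013f-hk56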
If there's not enough space, find out whether your cluster runs the cluster autoscaler. If not, use your cloud provider to provision an additional node.
Expressing your needs
Kubernetes has a flexible system for managing pod resource allocation.
- Memory and CPU are your Kubernetes Resources
- Resource Requests are your Pod’s steady-state allocation of CPU and Memory
- Resource Limits are the ‘hard cap’ enforced by the kubelet & container runtime
There are various debates online about the true best practice for how (and whether) you should assign these values. Right now your workload is getting killed, so let’s give it more Memory - specifically the limit it keeps hitting.
How much should you ask for?
In this case, our ‘carts’ container was using about 15Mi before it crashed, so we’ll raise the Memory limit above that peak and keep increasing it if the pod still gets OOMKilled. Once the pod is healthy, you can always go back to Step 3 to find true resource consumption and trim any spare CPU and Memory.
Ask and you shall receive
Using GitOps and IaC to manage Kubernetes resources is the best practice, but if you want to manually change your configs through kubectl, you can run kubectl edit. Keep in mind from Step 1 that if your pod is managed by another object, you have to edit that top-level object.
In this case, our carts pod is owned by a deployment
❯ kubectl tree deployment/carts -n sock-shop
NAMESPACE NAME READY REASON AGE
sock-shop Deployment/carts - 483d
sock-shop └─ReplicaSet/carts-645b945d94 - 483d
sock-shop └─Pod/carts-645b945d94-qlvfp False ContainersNotReady 12d
Let's first check how much CPU and Memory our deployment allocates to our crashing pod.
❯ kubectl get deployment/carts -n sock-shop -oyaml | yq '.spec.template.spec.containers[0].resources' -o yaml
limits:
  cpu: 3m
  memory: 5Mi # Memory Limit is HERE!
requests:
  cpu: 1m
  memory: 2Mi
So we currently have 5Mi as our limit, but earlier we saw a maximum usage of 15Mi. Let's increase the memory limit!
So, we’ll run
❯ kubectl edit deployment carts -n sock-shop
to open the config in your editor of choice (set via the 'KUBE_EDITOR' or 'EDITOR' environment variables).
…
spec:
  replicas: 1
  selector:
    matchLabels:
      name: carts
  template:
    metadata:
      labels:
        name: carts
    spec:
      containers:
      - image: weaveworksdemos/carts:0.4.8
        name: carts
        resources: # Edit the numbers HERE!!
          limits:
            cpu: 3m
            memory: 20Mi # Make this higher than 15Mi, keep increasing if we need to.
          requests:
            cpu: 1m
            memory: 2Mi
…
There you have it: saving the edit triggers a rolling update, and the replacement pod should have the Memory it needs to stay online.
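To confirm the change, watch the rollout finish and double-check that the new pod picked up the larger limit:
❯ kubectl rollout status deployment/carts -n sock-shop
❯ kubectl get pods -l name=carts -n sock-shop
❯ kubectl get deployment/carts -n sock-shop -oyaml | yq '.spec.template.spec.containers[0].resources'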
How long does it take to update Pod Memory & CPU?
To recap, we ….
- Connected to your cluster and brushed up on kubectl docs
- Found the pods where our code is running, and checked to see if another object (ex: a deployment) was managing these pods
- Checked if our pods were starved on Memory or CPU
- Checked trends in resource utilization to estimate ‘correct’ resource capacity for our workload
- Made sure our Node has enough ‘room’ - and then updated our pod to the correct size
This ‘Pod sizing’ workflow is a routine task that should only take 1-2 hours. That said, it could take more time if:
- You’re unfamiliar with the workload, its labels and traffic patterns
- You’re unfamiliar with Pods, Nodes and Kubernetes resource usage
- You’re not used to thinking like an SRE: capacity, buffers, etc
- You’re reducing resources, so there’s a risk of OOMKills, CPU throttling, etc.
- You’re working on a critical workload and need to be careful
When a team moves onto Kubernetes, each engineer loses 2.2 hours a week of ‘coding time’ to manual toil. Kubernetes resource management is one of these random tasks which can interrupt your day.
Shortcuts & Alternatives for Managing Pod CPU & Memory
Resource management is a ubiquitous problem on Kubernetes, so there are many tools that can help you monitor and manage resource usage:
- The AWS, GCP and Azure Kubernetes dashboards offer basic insights into resource health.
- Install Kube-State-Metrics (KSM) and Prometheus for basic metrics and visibility
- Vertical Pod Autoscaling (VPA) will automatically update Kubernetes memory requests and limits based on usage patterns … but this can be disruptive for many types of workloads.
- You can run VPA in ‘Recommender’ mode (see the sketch after this list), or use Goldilocks for basic resource request estimates.
- Have your platform team build and deploy autoscaling policies for each workload.
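For reference, here's a minimal sketch of a VPA object in recommendation-only mode. It assumes the VPA components (and their CRDs) are installed in your cluster, and the object name is illustrative:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: carts-vpa
  namespace: sock-shop
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: carts
  updatePolicy:
    updateMode: "Off" # recommend only; never evict or resize pods
With updateMode set to "Off", the recommender publishes suggested requests (visible via kubectl describe vpa carts-vpa -n sock-shop) but never touches your running pods.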
Let Flightcrew worry about Pod CPU and Memory
Flightcrew is an AI tool that helps engineers automate infrastructure and IaC.
One of our first applications is using Flightcrew to rightsize pods and nodes, like Dependabot.
Flightcrew continuously analyzes resource utilization of your infrastructure, generates Pull Requests to fix OOMs or waste, and warns you when a configuration change could break something important.
Send us a note to try it out :)
Tim Nichols
CEO/Founder
Tim was a Product Manager on Google Kubernetes Engine and led Machine Learning teams at Spotify before starting Flightcrew. He graduated from Stanford University and lives in New York City. Follow on Bluesky