An Honest Look at Vertical Pod Autoscaling
Tim Nichols
CEO/Founder
Flightcrew is an AI tool that handles configuration tasks like managing Kubernetes resources. Vertical Pod Autoscaling (VPA) is an open source tool that engineers can use to optimize Kubernetes pod resources. 5 years after its release, we give our honest take on VPA, what it's missing for enterprise workloads, and how it interacts with Flightcrew.
Vertical Pod Autoscaling 101
Vertical Pod Autoscaling is an official Kubernetes add-on that dynamically adjusts the compute and memory assigned to a pod based on its usage patterns. In plain English, the VPA updates your workloads so that they have enough resources to stay online and efficient. Sounds useful!
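For reference, here is a minimal sketch of a VPA object. The names (checkout, checkout-vpa) are illustrative; the targetRef points at whatever Deployment you want the VPA to manage:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: checkout-vpa        # illustrative name
spec:
  targetRef:                # the workload whose requests the VPA should manage
    apiVersion: apps/v1
    kind: Deployment
    name: checkout          # illustrative Deployment name
  updatePolicy:
    updateMode: "Auto"      # VPA evicts and recreates pods to apply new requests
```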
VPA was introduced in Beta in 2018 and went GA in 2023, but it has never gotten much respect because of three weaknesses:
- The VPA needs to restart pods to apply recommendations: Because restarts are slow and disruptive, this can cause downtime or instability when responding to changes.
- Recommendations are simple and the metric logic is poorly understood: Even with tight polling windows, the VPA can seem slow and unstable when responding to sharp changes in load.
- Early Kubernetes node workflows were less mature: Adjusting pod capacity up and down felt risky, especially before teams gained confidence in Kubernetes autoscaling and node scaling capabilities.
In short, VPA was imperfect and seemed scary while the community was still getting a handle on Kubernetes autoscaling.
Vertical Pod Autoscaling is going Mainstream
In 2024 we talked to hundreds of engineering teams, and we were surprised by how many were evaluating or using vertical pod autoscaling.
What’s changed?
First, Kubernetes is everywhere and enterprises are comfortable with autoscaling.
Second, managed Kubernetes offerings have made subtle but important improvements to node workflows and how VPA interacts with them. For example, the GKE VPA:
- Uses windowing and trend smoothing for more representative recommendations
- Cross-references node sizes and quotas when generating recommendations
- Asks GKE’s cluster autoscaler to adjust capacity based on changes
These enhancements help create a more stable experience with VPA and make it appropriate for a small, but growing class of workloads.
‘Recommender Mode’ is the hip way to VPA
Another driver of VPA’s growing popularity is the concept of “Recommender Mode.” This means setting the VPA’s updateMode to “Off” so that the VPA still generates recommendations but never applies them to Pods. No restarts. No disruption.
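As a sketch, Recommender Mode is just one field on the VPA object. Everything here is illustrative except the updateMode value itself:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: checkout-vpa        # illustrative name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout          # illustrative Deployment name
  updatePolicy:
    updateMode: "Off"       # recommend only: never evict or mutate pods
```

With updateMode set to “Off”, the recommendations appear in the object’s status (visible via kubectl describe vpa checkout-vpa), which is what you would scrape into a dashboard or alert.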
At Flightcrew, we still don’t think VPA’s recommendations are fully ready for production-grade automation, but we do believe they are accurate enough to serve as a kind of accountability mechanism for service owners.
By plugging VPA recommendations into an observability dashboard, Slack alerts, or even a spreadsheet, engineers can understand at a high level whether their workloads are using the resources allocated to them. The goal here isn’t to shame teams or apply recommendations blindly. Instead, it is to encourage informed discussions about resource buffers, efficiency, and load patterns.
But VPA still isn’t ready to scale Enterprise Workloads
So here's our honest PoV on the limitations of VPA:
- Not suitable for all workloads: VPA struggles with stateful applications (those restarts!) and workloads that use garbage-collected runtimes (e.g., JVM) where memory usage can be spiky and less predictable.
- Limited context: VPA only looks at historical metrics; it has no awareness of your application’s framework, architecture, or dependencies. It can’t predict future needs based on upcoming release cycles or known traffic surges.
- Won’t prevent OOM scenarios during sudden spikes: Even with frequent metric polling and updateMode: Auto, VPA reacts too slowly for sudden traffic spikes. Recreating pods takes time, making it ineffective as a rapid response tool.
- HPA or KEDA is still needed: You’ll likely need horizontal scaling solutions to handle redundancy, burst traffic, and cold starts.
- Confusing metric logic: During periods of volatility, it can be hard to understand how the VPA is interpreting metrics and recommendations can appear unstable.
- Limited customization: While you can apply or exclude recommendations for certain containers, the policy and tuning options are still basic.
- Not well-integrated with IaC and policy workflows: The VPA doesn’t know if the same deployment runs in multiple clusters, nor can it factor in cost constraints or global policies.
- Human oversight is unavoidable: You’ll still need someone to define, monitor, and update utilization targets. Instead of tuning pod resources directly, you'll tune the VPA (which will tune pod resources).
- Divergence from GitOps and change management principles: Most engineering teams want all infrastructure configuration defined in code, reviewed, and tracked. The VPA’s dynamic changes may create configuration drift from what’s declared in IaC. While this drift isn’t a serious risk, at scale it will become a workflow and compliance headache.
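To make the “limited customization” point concrete, the per-container resourcePolicy is roughly the whole tuning surface today: bounds, controlled resources, and opt-outs. Container names below are illustrative:

```yaml
spec:
  resourcePolicy:
    containerPolicies:
    - containerName: app           # illustrative container name
      minAllowed:                  # floor for recommendations
        cpu: 100m
        memory: 256Mi
      maxAllowed:                  # ceiling for recommendations
        cpu: "2"
        memory: 4Gi
      controlledResources: ["cpu", "memory"]
    - containerName: istio-proxy   # illustrative sidecar
      mode: "Off"                  # exclude this container entirely
```

There is no field for cost constraints, cross-cluster policy, or framework-specific behavior.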
In short, the VPA doesn't work with all workloads, won’t save you during a traffic spike, and isn’t well connected with modern IaC workflows. It’s a cool but limited tool.
So what is VPA good for?
- We recommend running VPA in Recommender Mode as an accountability mechanism. Consider VPA’s suggestions as conversation starters rather than unquestionable truths. This approach helps service owners understand their workloads’ actual resource consumption and encourages more informed discussions on resource management.
- VPA can also be useful for controlling costs in stateless, non-critical workloads where rapid scaling and perfect precision aren’t as critical. Earlier this year we wrote a guide on how to match your workloads to the correct autoscaler.
Our Wishlist for VPA in 2025
This blog might sound like we’re negative on VPA, but the reality is that we're fans of this quirky, specialized tool:
- We believe that engineers should automate tasks like Kubernetes resource management - that’s why we’re building Flightcrew!
- The Kubernetes community and Autoscaling SIG are transparent about the limitations of the VPA and it’s easy to find the correct use cases.
We've discussed broad themes, but the VPA backlog shows that the project is moving in the right direction:
- VPA in-place resize is in Beta; once it goes GA, it will open up VPA to more workloads and could even speed up the VPA enough to help with OOMing.
- Include VPA in the Kubernetes distro … or package it as a Helm chart. This might be silly, but we think VPA adoption is slowed by its second-class status in the repo. Some managed K8s services have moved VPA to the control plane, and we're curious whether OSS K8s follows suit.
- VPA support for pod-level resource specification would be a big win for customization.
- Clarifying recommender windowing behavior would build trust in the VPA and help teams understand how it responds to traffic spikes.
Flightcrew + VPA in 2025
We wrote this piece to give our friends and customers our honest view of VPA. With some teams we replace VPA functionality and rightsize pods Dependabot-style. Other teams use Flightcrew to manage their VPAs and make sure they work in concert with horizontal and node autoscaling systems.
Regardless of your approach, we'd love to chat :)
Tim Nichols
CEO/Founder
Tim was a Product Manager on Google Kubernetes Engine and led Machine Learning teams at Spotify before starting Flightcrew. He graduated from Stanford University and lives in New York City. Follow on Bluesky