Why Kubernetes Cost Optimization Keeps Failing

Cloud native apps and Kubernetes are dynamic, making it hard to contain resource costs. Yodar Shafrir of ScaleOps offers a solution in this episode of Makers.

Apr 29th, 2025 6:00am by Heather Joslyn

Businesses always care about how much money they’re spending. But these days, with an uncertain global economy and new demands to invest in AI, they’re reining in costs more aggressively than ever.

Running Kubernetes, with clusters that scale up or down in a flash, can be a major cost center for organizations. So why is it so hard to optimize K8s costs?

“It’s a huge challenge,” said Yodar Shafrir, co-founder and CEO of ScaleOps, in this On the Road episode of The New Stack Makers, recorded at KubeCon + CloudNativeCon Europe, in London.

With dynamic, cloud native applications, “the load always changes,” he told me. “And for every application you need to have, you have different business contexts; every application has different needs. In addition to that, the load and resource consumption of every application is really dynamic and can always change.”

Engineers who “own” the applications, he added, “need to set a static size of how many resources they should allocate to every single application, and it’s really difficult, because you need to make sure you always have enough capacity to handle every spike. While on the other end, you don’t want to spend and waste too much money on resources that are under the provision now.”

In a production cluster, potentially thousands of applications are running, Shafrir said, adding to the complexity.

At scale, optimization “just doesn’t work, and it leads to a lot of application-causing performance issues, and also cost, a lot of costs and a lot of waste,” he said. “Usually we see 80 or 70% waste on over-provisioned resources, and actual waste, and companies just fail to manage this.”
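To make the waste figure concrete, here is a minimal Python sketch of how an over-provisioning percentage could be computed from requested versus actually used CPU. The function name, the millicore values and the four-workload cluster are all hypothetical illustrations, not ScaleOps data or its method.

```python
def waste_fraction(requested_millicores, used_millicores):
    """Return the share of requested CPU that goes unused (0.0 to 1.0).

    Both arguments are per-workload lists in millicores. Usage above the
    request is capped at the request, so a bursting workload cannot hide
    waste elsewhere in the cluster.
    """
    total_requested = sum(requested_millicores)
    if total_requested == 0:
        return 0.0
    total_used = sum(
        min(used, requested)
        for used, requested in zip(used_millicores, requested_millicores)
    )
    return (total_requested - total_used) / total_requested


# Hypothetical cluster: four workloads, each sized statically for its peak.
requests = [2000, 4000, 1000, 3000]
usage = [500, 800, 300, 400]

print(f"Over-provisioned CPU: {waste_fraction(requests, usage):.0%}")
```

With these made-up numbers, the workloads request 10,000 millicores and use 2,000, so 80% of the requested CPU sits idle.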

Automation to Optimize K8s Resources

A number of other factors also make Kubernetes cost optimization harder. Developers are incentivized to care about application performance, not operational costs. AI workloads gobble up computing power.

The tools organizations use to optimize Kubernetes resource usage, which offer visibility into workloads and recommendations for optimization, are often not suited to the real-time needs of K8s environments, Shafrir said.

“The problem with those solutions is that, again, the resource consumption is dynamic,” he said. “It always changes, and recommendations that were relevant in the past would not necessarily be relevant for the future. You might have some unpredictable spikes that do not match the recommendation, and then it would lead to performance issues. It can even lead to downtime.”
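The failure mode Shafrir describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the percentile-plus-headroom rule is a generic sizing heuristic, not how any particular tool works, and the function names and usage samples are hypothetical.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank - 1, 0)]


def recommend_request(history, pct=95, headroom_pct=15):
    """Static recommendation: a percentile of past usage plus headroom."""
    return percentile(history, pct) * (100 + headroom_pct) // 100


def samples_over_request(request, usage):
    """Usage samples that exceed the recommended request."""
    return [sample for sample in usage if sample > request]


# Hypothetical CPU usage in millicores.
last_week = [300, 320, 310, 350, 340, 330, 360, 345, 325, 335]
this_week = [330, 340, 900, 870, 350]  # an unpredicted spike

request = recommend_request(last_week)
over = samples_over_request(request, this_week)

print(f"Recommended request: {request}m")
print(f"Samples above the request: {over}")
```

The recommendation fits last week's data well, but it has no way to anticipate this week's spike, which is the gap that real-time adjustment is meant to close.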

What does work? Full automation that can respond to that dynamic nature in real time, according to Shafrir.

ScaleOps, for instance, manages “the resource allocation at the container level in real time, in a fully automated way,” he said. “We make sure that every application gets exactly what it needs. We know to monitor the resource allocation and many other different business metrics to always make sure that the performance is at its best.”

This approach, he said, can “improve the reliability and the performance of every single application, while we eliminate all the [waste] that exist[s], all the over-provisioning. We actually shift Kubernetes from a world where all the resource allocation is static, we change it to be dynamic, so [at] the pod level, every container would get exactly what it needs at any given time.”

Check out the full episode to learn more about the new capabilities that ScaleOps has recently put into general availability, and to get a glimpse of the roadmap for what’s ahead.

Heather Joslyn is the former editor-in-chief of The New Stack. She previously worked as editor-in-chief of Container Solutions, a Cloud Native consulting company, and as an editor/reporter at The Chronicle of Philanthropy and the Baltimore City Paper.