Kubernetes PDB: Why we swapped to using maxUnavailable

Mon Sep 30 2024

At Statsig, we prioritize the stability and performance of our services, which handle live traffic at scale.

In the early days, we configured a simple Pod Disruption Budget (PDB) across a majority of our service deployments. However, as we’ve scaled and refined our infrastructure, we encountered a few inefficiencies that required a more nuanced solution.

Our original PDB setup was straightforward:


kind: PodDisruptionBudget
apiVersion: policy/v1
metadata:
    name: my-live-service-pdb
    labels:
        app: my-live-service
spec:
    minAvailable: 80%  # rounded up: ceil(0.8 * pods) must stay available
    selector:
        matchLabels:
            app: my-live-service

This worked well for many scenarios, but we quickly ran into a problem with how a percentage-based minAvailable is calculated, particularly for services with fewer than five pods.

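According to the Kubernetes documentation, a percentage-based minAvailable is rounded up to the nearest whole pod. With minAvailable: 80%, a deployment running four pods must keep ceil(3.2) = 4 of them available, so service deployments running fewer than five pods were effectively locked against any voluntary disruption (minAvailable == total number of pods).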
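The rounding behavior is easy to check by hand. The sketch below is only an illustration of the arithmetic, not the controller's actual code: the function name `min_available_disruptions` is made up for this example, and it assumes every pod in the deployment is currently healthy.

```python
def min_available_disruptions(replicas: int, min_available_pct: int) -> int:
    """Voluntary disruptions allowed under a percentage-based minAvailable.

    Assumes all `replicas` pods are healthy (an illustrative simplification).
    """
    # Integer ceiling of replicas * pct / 100, mirroring the round-up rule.
    required = (replicas * min_available_pct + 99) // 100
    return max(replicas - required, 0)


for replicas in range(1, 7):
    allowed = min_available_disruptions(replicas, 80)
    print(f"{replicas} pod(s): {allowed} disruption(s) allowed")
```

With minAvailable: 80%, every count from one to four pods yields zero allowed disruptions; the first pod count that permits an eviction is five.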

While this ensures high availability, it also introduced several long-term challenges:

Resource waste: During traffic slowdowns, hosts would end up stuck running a single pod that couldn’t be removed. This resulted in wasted resources during periods of low traffic, leading to higher costs and less efficient scaling.
Blocked updates: Automated rolling updates to node pools were often blocked, slowing down our ability to deploy improvements or patches. This became especially frustrating during time-sensitive updates, such as security patches, when certain nodes were left running outdated configurations.
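One way to spot this situation is to look for PDBs whose status reports zero allowed disruptions. The following is a rough sketch, assuming the input is the parsed JSON from `kubectl get pdb -A -o json`; the helper name `blocking_pdbs`, the `prod` namespace, and the sample data are hypothetical, while `disruptionsAllowed` and `expectedPods` are fields of the PDB status.

```python
import json


def blocking_pdbs(pdb_list: dict) -> list[str]:
    """Return "namespace/name" for PDBs that currently allow no disruptions."""
    blocked = []
    for item in pdb_list.get("items", []):
        status = item.get("status", {})
        # Skip PDBs that match no pods; they are not holding anything in place.
        if status.get("expectedPods", 0) > 0 and status.get("disruptionsAllowed", 0) == 0:
            meta = item["metadata"]
            blocked.append(f'{meta.get("namespace", "default")}/{meta["name"]}')
    return blocked


# Hypothetical sample shaped like the output of `kubectl get pdb -A -o json`.
sample = json.loads("""
{
  "items": [
    {"metadata": {"namespace": "prod", "name": "my-live-service-pdb"},
     "status": {"expectedPods": 3, "desiredHealthy": 3,
                "currentHealthy": 3, "disruptionsAllowed": 0}},
    {"metadata": {"namespace": "prod", "name": "my-large-service-pdb"},
     "status": {"expectedPods": 10, "desiredHealthy": 8,
                "currentHealthy": 10, "disruptionsAllowed": 2}}
  ]
}
""")

print(blocking_pdbs(sample))
```

Any PDB this reports will cause a node drain to stall on the pods it covers until more replicas become available.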

Finding a better solution

To resolve these issues, we revisited our approach to Pod Disruption Budgets. Instead of using minAvailable, we switched to maxUnavailable. This change addressed our problem in a simple yet effective way.

Just as minAvailable rounds up its calculations, so does maxUnavailable. With maxUnavailable: 20%, a deployment running four pods may have ceil(0.8) = 1 pod unavailable. By defining maxUnavailable, we were able to guarantee that at least one pod could always be disrupted, even in deployments with fewer than five pods.
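The same arithmetic can be sketched for maxUnavailable. As before, this only illustrates the rounding rule: the function name `max_unavailable_disruptions` is invented for this example, and it assumes all pods are healthy.

```python
def max_unavailable_disruptions(replicas: int, max_unavailable_pct: int) -> int:
    """Voluntary disruptions allowed under a percentage-based maxUnavailable.

    Assumes all `replicas` pods are healthy (an illustrative simplification).
    """
    # Integer ceiling of replicas * pct / 100: the disruption budget rounds up.
    budget = (replicas * max_unavailable_pct + 99) // 100
    # The budget can never exceed the number of pods that exist.
    return min(budget, replicas)


for replicas in range(1, 7):
    allowed = max_unavailable_disruptions(replicas, 20)
    print(f"{replicas} pod(s): {allowed} disruption(s) allowed")
```

With maxUnavailable: 20%, every pod count from one to five allows exactly one disruption, and six pods allow two. For a single-replica deployment this means the only pod can be evicted, so the budget trades a brief gap in availability for the ability to drain the node.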

This small change had a big impact, allowing us to handle disruptions in a more controlled and flexible manner while preventing resource waste and blocking issues.

Here’s what our updated PDB configuration looks like:


kind: PodDisruptionBudget
apiVersion: policy/v1
metadata:
    name: my-live-service-pdb
    labels:
        app: my-live-service
spec:
    maxUnavailable: 20%  # rounded up: ceil(0.2 * pods) may be disrupted, so always at least 1
    selector:
        matchLabels:
            app: my-live-service

Now, with this updated configuration, our service deployments can gracefully handle disruptions, even during low-traffic periods. By ensuring that at least one pod can be interrupted, we’ve mitigated the long-term challenges and improved both our resource management and deployment velocity.

For teams managing Kubernetes service deployments, especially those with variable pod counts, swapping to maxUnavailable can be a small but impactful change.

Because at least one pod can always be disrupted, underused nodes can be drained during low-traffic periods and rolling node pool updates can proceed without being blocked by the PDB.


Permalink: https://www.statsig.com/blog/kubernetes-pdb-maxunavailable


Brent Echols
