This post describes configurable tolerance for horizontal Pod autoscaling,
a new alpha feature first available in Kubernetes 1.33.
## What is it?
Horizontal Pod Autoscaling
is a well-known Kubernetes feature that allows your workload to
automatically resize by adding or removing replicas based on resource
utilization.
Let’s say you have a web application running in a Kubernetes cluster with 50
replicas. You configure the Horizontal Pod Autoscaler (HPA) to scale based on
CPU utilization, with a target of 75% utilization. Now, imagine that the current
CPU utilization across all replicas is 90%, which is higher than the desired
75%. The HPA will calculate the required number of replicas using the formula:

    desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]

In this example:

    desiredReplicas = ceil[50 * (90 / 75)] = ceil[60] = 60
So, the HPA will increase the number of replicas from 50 to 60 to reduce the
load on each pod. Similarly, if the CPU utilization were to drop below 75%, the
HPA would scale down the number of replicas accordingly. The Kubernetes
documentation provides a
detailed description of the scaling algorithm.
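To make the arithmetic concrete, here is a minimal Go sketch of that formula (an illustration only, with hypothetical function names, not the actual kube-controller-manager code):

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas applies the HPA scaling formula:
// ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]
func desiredReplicas(currentReplicas int32, currentValue, desiredValue float64) int32 {
	ratio := currentValue / desiredValue
	return int32(math.Ceil(float64(currentReplicas) * ratio))
}

func main() {
	// The example above: 50 replicas at 90% CPU with a 75% target.
	fmt.Println(desiredReplicas(50, 90, 75)) // ceil(50 * 1.2) = 60
}
```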
In order to avoid replicas being created or deleted whenever a small metric
fluctuation occurs, Kubernetes applies a form of hysteresis: it only changes the
number of replicas when the current and desired metric values differ by more
than 10%. In the example above, the ratio between the current and desired
metric values is 90/75 = 1.2, that is, 20% above target. Since this exceeds
the 10% tolerance, the scale-up action proceeds.
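Extending the sketch above, the tolerance check can be pictured as a small guard in front of the formula (again a simplified illustration; the real controller also handles stabilization windows, missing metrics, and other details):

```go
// desiredReplicasWithTolerance is a hypothetical variant of the sketch
// above: if the usage ratio is within `tolerance` of 1.0, the current
// replica count is kept unchanged.
func desiredReplicasWithTolerance(currentReplicas int32, currentValue, desiredValue, tolerance float64) int32 {
	ratio := currentValue / desiredValue
	if math.Abs(1.0-ratio) <= tolerance {
		return currentReplicas // within tolerance: no scaling
	}
	return int32(math.Ceil(float64(currentReplicas) * ratio))
}
```

With the example numbers, |1.0 - 90/75| = 0.2, well outside the default 0.1 tolerance, so scaling proceeds; at 80% utilization the ratio would be about 1.07, inside the band, and the HPA would leave the replica count alone.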
This default tolerance of 10% is cluster-wide; in older Kubernetes releases, it
could not be fine-tuned. It’s a suitable value for most usage, but too coarse
for large deployments, where a 10% tolerance represents tens of pods. As a
result, the community has long
asked to be able to
tune this value.
In Kubernetes v1.33, this is now possible.
## How do I use it?
After enabling the `HPAConfigurableTolerance` feature gate in
your Kubernetes v1.33 cluster, you can add your desired tolerance to your
HorizontalPodAutoscaler object.
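How you enable the gate depends on how your cluster is managed. As one example (assuming a local kind test cluster; adapt this to your own setup), the gate can be turned on in the cluster configuration:

```yaml
# kind-config.yaml: a local test cluster with the alpha gate enabled.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
featureGates:
  HPAConfigurableTolerance: true
```

and the cluster created with `kind create cluster --config kind-config.yaml`. On clusters you run yourself, the equivalent is passing `--feature-gates=HPAConfigurableTolerance=true` to the kube-apiserver and kube-controller-manager.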
Tolerances appear under the `spec.behavior.scaleDown` and
`spec.behavior.scaleUp` fields, and can thus be different for scale up and scale
down. A typical usage would be to specify a small tolerance on scale up (to
react quickly to spikes), but a higher one on scale down (to avoid adding and
removing replicas too quickly in response to small metric fluctuations).
For example, an HPA with a tolerance of 5% on scale-down, and no tolerance on
scale-up, would look like the following:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  ...
  behavior:
    scaleDown:
      tolerance: 0.05
    scaleUp:
      tolerance: 0
```
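Assuming that manifest is saved as `my-app-hpa.yaml` (a hypothetical filename), you can apply it and inspect the resulting object as usual:

```shell
kubectl apply -f my-app-hpa.yaml
kubectl get hpa my-app -o yaml
```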
## I want all the details!
Get all the technical details by reading
KEP-4951
and follow issue 4951
to be notified of the feature's graduation.