Kubernetes helps ship software faster to users and rapidly respond to their requests. Typically, developers design a Kubernetes cluster’s capacity according to the load users are estimated to generate on it. However, if the number of user requests grows faster than you estimated, the cluster might run out of resources, leading to the service slowing down and users getting frustrated.
Manually allocating resources does not enable you to respond quickly to an application's changing needs. Kubernetes provides various autoscaling tools you can use to ensure your clusters can automatically handle the load. You can use pod-based options like the vertical pod autoscaler and the horizontal pod autoscaler or cluster-level options like the Kubernetes cluster autoscaler. Kubernetes autoscaling is an important part of cloud optimization strategies.
Autoscalers enable Kubernetes to automate the scaling process, scaling up a cluster as soon as demand increases and scaling it down to the regular size when the load decreases. Kubernetes autoscaling ensures each pod and cluster can achieve the optimal performance to serve the application’s current needs.
Kubernetes Autoscaling Methods
Kubernetes is inherently scalable. It provides a range of tools that allow applications and the infrastructure that hosts them to grow and shrink based on demand, efficiency, and other metrics.
Kubernetes has three main scalability tools: Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA), which operate at the application abstraction layer, and the cluster autoscaler, which operates at the infrastructure layer.
Horizontal Pod Autoscaler (HPA)
When application load changes over time, an application may need to add or remove pod replicas to support current loads. Horizontal Pod Autoscaler (HPA) can automatically manage this process.
For workloads configured with HPA, the HPA controller monitors the workload's pods to determine whether the number of replicas needs to change. In most cases, the controller uses CPU utilization as the metric: it averages the metric value across the pods and calculates whether adding or removing replicas would bring the current value closer to the target value.
HPA adjustment calculations can also use custom or external metrics. Custom metrics are designed to show pod utilization other than CPU utilization, such as network traffic, memory, or values related to pod applications. External metrics can measure values that are not related to pods.
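The replica calculation described above can be sketched in a few lines of Python. This follows the ratio formula given in the Kubernetes HPA documentation (desired replicas = ceil(current replicas x current value / target value)) with its default tolerance of 0.1. The function and parameter names are illustrative, and the real controller also handles missing metrics, pod readiness, and stabilization windows, which this sketch ignores.

```python
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, tolerance: float = 0.1) -> int:
    """Replica count suggested by the documented HPA ratio formula."""
    ratio = current_value / target_value
    # Leave the workload alone when the ratio is close enough to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Four pods averaging 90% CPU against a 60% target.
print(desired_replicas(4, 90, 60))  # prints 6
```

The same function scales in: with an average of 30% against a 60% target, four replicas would be reduced to two.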
Vertical Pod Autoscaler (VPA)
VPA automatically sets container resource requests and limits based on usage. VPA aims to reduce the maintenance overhead of configuring container resource requests and limits, and to increase cluster resource utilization.
Vertical Pod Autoscaler can:
- Decrease the request value for containers whose resource usage is consistently lower than requested.
- Increase the request value for containers that consistently use a high percentage of their requested resources.
- Automatically set resource limits so that they preserve the limit-to-request ratio specified in the container template.
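To make the three behaviors above concrete, here is a deliberately simplified sketch. It is not the VPA recommender's actual algorithm (which works on weighted usage histograms). The percentile, the safety margin, and the function name are all assumptions chosen for illustration. The one behavior taken from the text is that the new limit keeps the limit-to-request ratio of the original container spec.

```python
def recommend_resources(usage_samples, current_request, current_limit,
                        percentile=90, margin=0.15):
    """Return (new_request, new_limit) from observed usage.

    Hypothetical heuristic: take a high percentile of usage, add a safety
    margin, and scale the limit to keep the original limit/request ratio.
    """
    ordered = sorted(usage_samples)
    # Nearest-rank style index, clamped to the last sample.
    index = min(len(ordered) - 1, percentile * len(ordered) // 100)
    new_request = ordered[index] * (1 + margin)
    ratio = current_limit / current_request
    return new_request, new_request * ratio


# CPU usage in millicores for a container that requested 500m (limit 1000m).
samples = [100, 120, 110, 130, 90, 105, 115, 125, 95, 140]
request, limit = recommend_resources(samples, 500, 1000)
print(round(request), round(limit))
```

Because usage stays well below the 500m request, the sketch recommends a lower request, and the limit shrinks with it so that it remains twice the request.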
Cluster Autoscaler
The cluster autoscaler increases or decreases the size of a Kubernetes cluster (by adding or removing nodes) based on the presence of pending pods and various node utilization metrics.
The cluster autoscaler cycles through two main tasks: it watches for pods that cannot be scheduled, and it calculates whether all currently deployed pods could be consolidated onto a smaller number of nodes.
The autoscaler checks the cluster for pods that cannot be scheduled on existing nodes, either because of insufficient CPU or memory or because the pod's node affinity rules or taint tolerations do not match any existing node. If such pods exist, the autoscaler checks the managed node pools to determine whether adding nodes would unblock them. If it would, and the node pool has not reached its maximum size, the autoscaler adds nodes.
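The scale-up decision can be approximated with a small bin-packing sketch. This is an illustration under simplifying assumptions, not the cluster autoscaler's implementation: the real autoscaler simulates the scheduler, including affinity and taints, while this version only looks at CPU and memory requests. The Pod class and the nodes_to_add function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    cpu: int     # requested CPU in millicores
    memory: int  # requested memory in MiB


def nodes_to_add(pending, node_cpu, node_memory, current_size, max_size):
    """Estimate how many new nodes the pending pods need (first-fit packing)."""
    free = []  # remaining (cpu, memory) on each simulated new node
    for pod in sorted(pending, key=lambda p: p.cpu, reverse=True):
        if pod.cpu > node_cpu or pod.memory > node_memory:
            continue  # a node of this shape can never host the pod
        for i, (cpu, mem) in enumerate(free):
            if pod.cpu <= cpu and pod.memory <= mem:
                free[i] = (cpu - pod.cpu, mem - pod.memory)
                break
        else:
            # No simulated node has room, so open a new one.
            free.append((node_cpu - pod.cpu, node_memory - pod.memory))
    # Never grow past the node pool's configured maximum size.
    return min(len(free), max(0, max_size - current_size))


pending = [Pod(1500, 2048), Pod(1000, 1024), Pod(1000, 1024), Pod(500, 512)]
print(nodes_to_add(pending, node_cpu=2000, node_memory=4096,
                   current_size=3, max_size=5))
```

Pods that cannot fit on any node of the given shape are skipped, which mirrors the case where adding nodes would not unblock them.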
Troubleshooting Common Kubernetes Autoscaling Errors
Insufficient Time to Scale
A common HPA issue is the time required to add another pod to scale the workload. Loads can change quickly, and existing pods can reach 100% utilization within the time it takes to scale up, causing service degradation or failure.
For example, suppose you have a pod that can serve 100 requests with less than 70% CPU usage, and HPA is configured to scale out when this CPU threshold is reached. Assume it takes 5 seconds to start a new pod. Now, if the load rapidly increases from 80 to 120 requests within 2 or 3 seconds, a scale-up event will be triggered, but it won't happen fast enough to handle the existing loads.
There are two common ways to mitigate this:
- Lower the scaling threshold to leave a margin of safety, so that each pod has spare capacity to absorb sudden traffic spikes. The trade-off is cost, because the spare capacity is multiplied by the number of pods running your application.
- Always have a spare pod ready for sudden traffic spikes.
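The effect of lowering the threshold can be estimated with some quick arithmetic. The sketch below reuses the numbers from the example (100 requests at 70% CPU, 5 seconds to start a pod) and adds assumptions that are not in the original: CPU usage grows linearly with requests, scale-out is triggered the moment the threshold is crossed, and load keeps growing at a constant rate while the new pod starts. The function names are illustrative.

```python
def headroom(threshold, requests_at_reference=100, reference_cpu=0.70):
    """Requests a pod can absorb between the scaling threshold and 100% CPU."""
    saturation = requests_at_reference / reference_cpu  # load at 100% CPU
    trigger = saturation * threshold                    # load that triggers scale-out
    return saturation - trigger


def survives_spike(threshold, growth_per_second, startup_seconds=5):
    """True if the pod stays below 100% CPU until the new pod is ready."""
    return growth_per_second * startup_seconds <= headroom(threshold)


# Load rising by about 16 requests per second (80 to 120 in 2.5 seconds).
print(survives_spike(0.70, 16))
print(survives_spike(0.40, 16))
```

Under these assumptions, a 70% threshold leaves roughly 43 requests of headroom, which a spike of this size exhausts before the new pod is ready, while a 40% threshold leaves roughly 86.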
ImagePullBackOff
When scaling workloads in a cluster, Kubernetes can fail to pull container images from a container registry. When this happens, the pod goes into the ImagePullBackOff state.
When a Kubernetes cluster creates a new deployment or updates an existing one and needs to pull an image, the kubelet process on each worker node performs the pull. For the kubelet to pull images successfully, the container registry must be reachable from every node that the scheduler might select for the pod.
An ImagePullBackOff error can occur if the image path is incorrect, the network is down, or the kubelet cannot authenticate with the container registry.
Common reasons and solutions:
- The pod spec uses an incorrect image or repository name -> edit the pod spec and provide the correct image path.
- Unable to access container registry -> restore network connectivity and allow pods to retry pulling images.
- The pod does not have the correct credentials to access the image -> add a secret with the correct credentials and reference it from the pod spec.
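To find which containers are affected, you can inspect the pod's status. The sketch below works on a pod object as a plain dictionary, for example one parsed from the JSON output of kubectl get pod. It relies on the status.containerStatuses[].state.waiting.reason field of the Pod API. The function name and the sample pod (including its registry URL) are made up for illustration.

```python
PULL_FAILURES = {"ImagePullBackOff", "ErrImagePull"}


def failing_image_pulls(pod: dict) -> list:
    """Return (container, image, message) for containers stuck pulling an image."""
    failures = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        waiting = status.get("state", {}).get("waiting") or {}
        if waiting.get("reason") in PULL_FAILURES:
            failures.append(
                (status["name"], status["image"], waiting.get("message", ""))
            )
    return failures


# Hypothetical pod status with one healthy and one failing container.
sample_pod = {
    "status": {
        "containerStatuses": [
            {"name": "proxy", "image": "registry.example.com/shop/proxy:2.0",
             "state": {"running": {}}},
            {"name": "web", "image": "registry.example.com/shop/web:1.4",
             "state": {"waiting": {"reason": "ImagePullBackOff",
                                   "message": "Back-off pulling image"}}},
        ]
    }
}

for name, image, message in failing_image_pulls(sample_pod):
    print(f"{name}: cannot pull {image} ({message})")
```

The message field usually indicates which of the three causes applies, such as a missing image, a network failure, or an authentication error.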
Underutilized Nodes Exist, But Cluster Does Not Scale Down
Here are a few reasons the cluster autoscaler might not be able to scale down your cluster, and what you can do about them:
- Pod specifications prevent certain pods from being evicted from a node: change the pod specs, or ensure that other nodes meeting the pods' scheduling criteria are available.
- The node group is already at its minimum size: reduce the minimum size in the cluster autoscaler configuration.
- The node has the "cluster-autoscaler.kubernetes.io/scale-down-disabled": "true" annotation: remove the annotation from the node.
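The checks in the list above can be sketched as a small function over node and pod objects represented as dictionaries. The two annotation keys are the ones the cluster autoscaler recognizes. Everything else is an assumption for illustration: the function name is invented, and the safe-to-evict annotation is only one of several ways a pod spec can block eviction.

```python
SCALE_DOWN_DISABLED = "cluster-autoscaler.kubernetes.io/scale-down-disabled"
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def annotations(obj: dict) -> dict:
    """Annotations of a Kubernetes object, or an empty dict if there are none."""
    return obj.get("metadata", {}).get("annotations") or {}


def scale_down_blockers(node, pods, group_size, group_min):
    """List the reasons from the checklist that apply to this node."""
    reasons = []
    if annotations(node).get(SCALE_DOWN_DISABLED) == "true":
        reasons.append("node is annotated scale-down-disabled")
    if group_size <= group_min:
        reasons.append("node group is at its minimum size")
    for pod in pods:
        if annotations(pod).get(SAFE_TO_EVICT) == "false":
            reasons.append(f"pod {pod['metadata']['name']} is marked not safe to evict")
    return reasons


node = {"metadata": {"name": "node-a",
                     "annotations": {SCALE_DOWN_DISABLED: "true"}}}
pods = [{"metadata": {"name": "cache-0",
                      "annotations": {SAFE_TO_EVICT: "false"}}}]
for reason in scale_down_blockers(node, pods, group_size=3, group_min=3):
    print(reason)
```

An empty result does not guarantee the node will be removed. It only rules out the three causes listed here.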
If the cluster autoscaler seems to have stopped working completely, follow these steps:
- Make sure the cluster autoscaler is running: check that the kube-system/cluster-autoscaler-status ConfigMap was updated recently, and review the latest events recorded on it.
- Check that the cluster and node groups are healthy: this is reported in the same ConfigMap.
- Check for unready nodes: in cluster autoscaler version 1.24 or later, if nodes do not seem ready, check the resourceUnready count. This could mean a device driver is failing to mount a required hardware resource.
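One way to automate the first check is to compare the status ConfigMap's last update time with the current time. The sketch below assumes you have already read that timestamp and parsed it into a timezone-aware datetime, since the exact storage format varies between autoscaler versions. The three-minute window and the function name are assumptions to adjust for your environment.

```python
from datetime import datetime, timedelta, timezone


def status_is_fresh(last_updated: datetime, now: datetime,
                    max_age: timedelta = timedelta(minutes=3)) -> bool:
    """True if the status ConfigMap was updated within the allowed window."""
    return now - last_updated <= max_age


now = datetime.now(timezone.utc)
print(status_is_fresh(now - timedelta(seconds=10), now))
print(status_is_fresh(now - timedelta(minutes=10), now))
```

A stale timestamp suggests the autoscaler process itself is down or stuck, so its pod logs are the next place to look.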
Conclusion
In this article, I explained the basics of Kubernetes autoscaling and three autoscaling tools you can use:
- HPA: increases or decreases the number of pod replicas based on CPU utilization or other metrics.
- VPA: automatically sets container resource requests and limits based on usage.
- Cluster autoscaler: increases or decreases the size of a Kubernetes cluster based on the presence of pending pods and various node utilization metrics.
I hope this will be useful as you implement autoscaling in your Kubernetes clusters.