K8s Resource Management: An Autoscaling Cheat Sheet
Virtually everyone running Kubernetes infrastructure tries to use only the resources they need, nothing more and nothing less. The resources in use must therefore be able to scale in either direction to meet demand. Fortunately, Kubernetes lets us automate scaling out of the box, sparing us from manually scaling and provisioning every time the need arises. This saves time and human effort, which also lowers costs. However, the numerous controls that configure autoscaling can be overwhelming, even for seasoned experts. This cheat sheet aims to demystify Kubernetes autoscaling on a resource-by-resource basis. Let’s explore the types of resources that we can autoscale and work through several practice exercises.
Types of Kubernetes Resources That Accommodate Autoscaling
In the Kubernetes ecosystem, resources are things we create. All of the following resources can be autoscaled:
- Pod — The smallest deployable unit of a Kubernetes workload
- Node — A worker machine (virtual or physical) that runs pods
- ReplicaSet — A controller that keeps a stable number of identical pod replicas running
- Deployment — Manages ReplicaSets and declaratively updates pod instances. All replicas share a single volume and PersistentVolumeClaim (PVC).
- StatefulSet — Used for stateful applications (whereas Deployments are used for stateless applications). A StatefulSet ensures that each pod has its own volume and PVC.
- RAM and CPU cores — Memory and computation resources for a Kubernetes cluster
Requests and Limits
Kubernetes allows us to specify controls and checks on resource requests and limits.
- Requests — Specify the resources that must be reserved for a container in a pod. The scheduler uses them to pick a node with enough free capacity.
- Limits — Specify the maximum amount of resources that a container can use.
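As a minimal sketch of how these two fields appear in a manifest, the pod below sets both for one container. The pod name, container name, image, and values are illustrative assumptions, not taken from this guide.

```yaml
# Hypothetical pod for illustration; names, image, and values are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: requests-limits-demo
spec:
  containers:
    - name: app
      image: nginx:1.25        # any container image works here
      resources:
        requests:              # reserved for the container at scheduling time
          cpu: 250m            # 250 millicores = 0.25 CPU core
          memory: 128Mi
        limits:                # ceiling the container may not exceed
          cpu: 500m
          memory: 256Mi
```

Because the requests are lower than the limits, the container can use more than its reservation, up to the limits, when the node has spare capacity.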
Pod Quality of Service
Kubernetes also assigns each pod a quality of service (QoS) class, which reflects its priority. The class is derived from the requests and limits that we set on the pod’s containers:
- Guaranteed QoS class — Highest priority pods. A pod is Guaranteed when every container in it has CPU and memory requests that equal its limits.
```
status:
  qosClass: Guaranteed
```
- Best-Effort QoS class — Lowest priority pods. A pod is Best-Effort when none of its containers has any CPU or memory request or limit.
```
status:
  qosClass: BestEffort
```
- Burstable QoS class — Midrange priority pods, falling between the high and low priority classes. A pod is Burstable when at least one of its containers has a CPU or memory request or limit, but the pod does not meet the Guaranteed criteria.
```
status:
  qosClass: Burstable
```
| Guaranteed | Best-Effort | Burstable |
| --- | --- | --- |
| Highest priority pods | Lowest priority pods | Mid-priority pods |
| Every container must have CPU and memory requests that equal its limits. | No container has any CPU or memory limit or request. | At least one container must have a CPU or memory limit or request. |
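To make the table concrete, here is a sketch of a pod that should land in the Guaranteed class, because its only container sets requests equal to limits for both CPU and memory. The names, image, and values are hypothetical.

```yaml
# Hypothetical pod; requests equal limits for CPU and memory in every container.
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed-demo
spec:
  containers:
    - name: app
      image: nginx:1.25
      resources:
        requests:
          cpu: 500m
          memory: 256Mi
        limits:
          cpu: 500m            # same as the request
          memory: 256Mi        # same as the request
```

Removing the resources block entirely would make the pod Best-Effort, and lowering only the requests would make it Burstable.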
Scaling Objects and Definitions
- Custom Resource Definition (CRD) — Enables introducing unique objects to clusters
- HorizontalPodAutoscaler (HPA) — A built-in API object (not a CRD) for increasing or decreasing the number of a pod’s replicas based on metrics such as CPU utilization
- VerticalPodAutoscaler (VPA) — A CRD object for setting the requests and limits for containers in pods
- Horizontal scaling — Increases the number of pods to handle an increase in workload, and decreases it again when demand drops. The HPA manages this.
- Vertical scaling — Adds more memory or CPU capacity to existing pods rather than adding more pods. The VPA manages this.
Prerequisites for Resource Autoscaling
- Create your Kubernetes cluster.
- Optional but recommended: Configure role-based access control (RBAC).
- Install the kubectl command-line tool on your workstation and connect it to the cluster.
- Ensure that you have a deployment running in your cluster. This guide uses a deployment named python-hpa.
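- Deploy a metrics server to your cluster so that the HPA can read CPU and memory usage. Optional: To scale based on custom metrics, also deploy an adapter that serves the custom metrics API.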
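If you do not yet have a deployment to experiment with, the following sketch defines one named python-hpa. Only the name comes from this guide; the labels, image, command, port, and resource values are placeholder assumptions, so substitute your own application. The CPU request matters because the HPA calculates CPU utilization as a percentage of the requested CPU.

```yaml
# Hypothetical Deployment; only the name python-hpa comes from this guide.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: python-hpa
spec:
  replicas: 1
  selector:
    matchLabels:
      app: python-hpa
  template:
    metadata:
      labels:
        app: python-hpa
    spec:
      containers:
        - name: python-hpa
          image: python:3.12-slim                          # placeholder image
          command: ["python", "-m", "http.server", "8000"] # placeholder workload
          ports:
            - containerPort: 8000
          resources:
            requests:
              cpu: 200m        # baseline for the HPA's utilization percentage
            limits:
              cpu: 500m
```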
Horizontal Scaling
1. Create an HPA. Use the kubectl autoscale subcommand.
```bash
kubectl autoscale deployment python-hpa --cpu-percent=50 --min=1 --max=10
```
- The --cpu-percent=50 flag sets the target average CPU utilization across all pods to 50% of their requested CPU. The HPA will increase or decrease the number of pods to meet the target.
- The --min=1 and --max=10 flags mean that the number of replicas will remain between 1 and 10. The replicas will be controlled by the python-hpa deployment in this demo. Substitute any other deployment that you created before running the kubectl autoscale subcommand.
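To see how the HPA arrives at a replica count, the Kubernetes documentation describes the core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below works one example in plain shell arithmetic. The current replica count and observed utilization are made-up numbers, and the real controller also applies a tolerance and stabilization rules that are omitted here.

```bash
# Worked example of the HPA replica calculation (illustrative numbers only).
current_replicas=4      # assumed: pods currently running
current_cpu=80          # assumed: observed average CPU utilization, in percent
target_cpu=50           # matches --cpu-percent=50
min_replicas=1          # matches --min=1
max_replicas=10         # matches --max=10

# Integer ceiling of current_replicas * current_cpu / target_cpu
desired_replicas=$(( (current_replicas * current_cpu + target_cpu - 1) / target_cpu ))

# Clamp the result to the configured bounds
if [ "$desired_replicas" -lt "$min_replicas" ]; then desired_replicas=$min_replicas; fi
if [ "$desired_replicas" -gt "$max_replicas" ]; then desired_replicas=$max_replicas; fi

echo "desired replicas: $desired_replicas"
```

With these numbers, 4 × 80 / 50 = 6.4, which rounds up to 7 replicas and stays within the 1 to 10 range.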
```bash
# Export the HPA created above so that its metrics can be edited
kubectl get hpa python-hpa -o yaml > /tmp/hpa-v2.yaml
```
```yaml
type: Object
object:
  metric:
    name: metric_name
```
```yaml
# autoscaling/v2 field layout, consistent with the exported hpa-v2.yaml
spec:
  ...
  metrics:
    - type: Pods
      pods:
        metric:
          name: node_network_receive_bytes
        target:
          type: AverageValue
          averageValue: 100000m
```
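For reference, the kubectl autoscale command from step 1 can also be written declaratively. The manifest below is a sketch of an equivalent autoscaling/v2 HorizontalPodAutoscaler, assuming the python-hpa deployment from this guide. The metrics list is where Object or Pods entries like the fragments above would be added.

```yaml
# Sketch of a declarative equivalent to:
#   kubectl autoscale deployment python-hpa --cpu-percent=50 --min=1 --max=10
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: python-hpa
spec:
  scaleTargetRef:              # the workload whose replica count is managed
    apiVersion: apps/v1
    kind: Deployment
    name: python-hpa
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # percent of each pod's requested CPU
```

Save it to a file and apply it with kubectl apply -f followed by the file name.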
To check the current status of the HPA, use the kubectl describe hpa subcommand:
```bash
kubectl describe hpa python-hpa
```
The conditions come from the HPA’s status.conditions field and appear under Conditions in the command-line interface (CLI) output.
Vertical Scaling
1. VPA Components
- Admission Plugin — Sets the correct resource requests for new pods
- Recommender — Compares past and present rates of resource consumption to predict and recommend values for memory and CPU requests
- Updater — Checks which pods have correct resources, and then evicts those with incorrect resources so that they can be recreated with updated requests
The VPA supports the following update modes:
- Recreate — The VPA assigns resource requests when creating pods and updates them by evicting and recreating them. Updates occur when the resources are substantially different from the recommendations.
- Auto — Recreates pods based on recommendation
- Off — The VPA only gives recommendations without automatically updating requests.
- Initial — The VPA creates requests while creating pods but never updates them later.
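An update mode is selected in the updatePolicy section of a VerticalPodAutoscaler object. The sketch below assumes the VPA CRDs are installed and targets the python-hpa deployment used earlier in this guide. The object name and the resource bounds are illustrative assumptions.

```yaml
# Hypothetical VPA object; the name and bounds are illustrative.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: python-vpa
spec:
  targetRef:                   # the workload whose pods receive recommendations
    apiVersion: apps/v1
    kind: Deployment
    name: python-hpa
  updatePolicy:
    updateMode: "Off"          # one of "Off", "Initial", "Recreate", "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: "*"     # apply to every container in the pod
        minAllowed:
          cpu: 100m
          memory: 64Mi
        maxAllowed:
          cpu: "1"
          memory: 512Mi
```

Starting in Off mode lets you review the recommendations with kubectl describe vpa python-vpa before allowing any evictions. Avoid pairing the VPA with an HPA that scales the same workload on CPU or memory, since the two would work against each other.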
Before installing the VPA, make sure that the following prerequisites are met:
- Git installed and configured on your workstation
- A metrics server deployed to your cluster
- Any previous version of the VPA torn down (if one exists) with the following command:
```bash
# Run from the vertical-pod-autoscaler directory of an existing autoscaler checkout
./hack/vpa-down.sh
```
```bash
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
```
```bash
# Install the VPA components into the cluster
./hack/vpa-up.sh
```
```bash
# Remove the VPA components from the cluster
./hack/vpa-down.sh
```
Kubernetes Event-Driven Autoscaling (KEDA)
- KEDA activates and deactivates Kubernetes deployments, scaling them to and from zero depending on whether there are events to process.
- KEDA also serves as a metrics server for Kubernetes.
- ScaledObject — Describes the desired mapping between an event source (such as RabbitMQ) and your deployment
- ScaledJob — Describes the mapping between an event source and your Kubernetes job
- TriggerAuthentication — Contains the authentication configuration or secrets needed to monitor the event source. It is namespaced and can be referenced by a ScaledObject or ScaledJob in the same namespace.
- ClusterTriggerAuthentication — The cluster-scoped equivalent, which can be referenced by a ScaledObject or ScaledJob in any namespace
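As a sketch of how these resources fit together, the manifests below pair a TriggerAuthentication with a ScaledObject that scales the python-hpa deployment on the length of a RabbitMQ queue. The object names, secret name, queue name, and thresholds are hypothetical. The trigger metadata follows the RabbitMQ scaler as documented by KEDA, so check the field names against the KEDA version you run.

```yaml
# Hypothetical example; names, secret, queue, and thresholds are assumptions.
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-auth
spec:
  secretTargetRef:
    - parameter: host          # AMQP connection string read from the Secret
      name: rabbitmq-secret    # an existing Kubernetes Secret (assumed)
      key: host
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: python-hpa-scaler
spec:
  scaleTargetRef:
    name: python-hpa           # Deployment to scale
  minReplicaCount: 0           # allow scale to zero when the queue is empty
  maxReplicaCount: 10
  triggers:
    - type: rabbitmq
      metadata:
        queueName: tasks
        mode: QueueLength      # scale on the number of queued messages
        value: "20"            # target messages per replica
      authenticationRef:
        name: rabbitmq-auth
```

KEDA creates and manages an HPA for the target behind the scenes, so do not attach a separate HPA to the same deployment.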