K8s Resource Management: An Autoscaling Cheat Sheet

A concise but comprehensive guide to using and managing horizontal and vertical autoscaling in the Kubernetes environment.

Nov 30th, 2022 6:34am by Brian Likosar


Virtually everyone running Kubernetes infrastructure tries to use only the resources they need, nothing more and nothing less. Therefore, the resources in use must be able to scale in either direction to meet demand. Fortunately, Kubernetes lets us automate scaling instead of manually scaling and provisioning every time the need arises: horizontal pod autoscaling is built in, and add-ons such as the Vertical Pod Autoscaler extend it. This saves time and human effort, which also lowers costs. However, the numerous controls for configuring autoscaling can be overwhelming, even for seasoned experts. This cheat sheet aims to demystify Kubernetes autoscaling on a resource-by-resource basis. Let's explore the types of resources that we can autoscale and work through several practice exercises.

Types of Kubernetes Resources That Accommodate Autoscaling

In Kubernetes, resources are the API objects we create and manage. The following can be autoscaled, either directly or through the controllers that manage them:

Pod: The smallest deployable unit of Kubernetes workload
Node: A worker machine, virtual or physical, that runs pods. Nodes are added and removed by a cluster autoscaler rather than by the HPA or VPA.
ReplicaSet: A controller object that keeps a specified number of identical pod replicas running
Deployment: Manages ReplicaSets and declaratively manages pod instances. If the pod template references a PersistentVolumeClaim (PVC), all replicas share that same volume and claim.
StatefulSet: Used for stateful applications, whereas deployments are typically used for stateless ones. A StatefulSet gives each pod its own volume and PVC, created from its volumeClaimTemplates.
RAM and CPU cores: The memory and compute resources that containers request and consume in a Kubernetes cluster

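Notably, some resources, like DaemonSets, cannot be scaled horizontally: a DaemonSet runs one pod on each eligible node instead of a configurable number of replicas. Read through this StormForge article for in-depth insight into Kubernetes resource types.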

Requests and Limits

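Kubernetes lets us set resource requests and limits for each container in a pod.

Requests: The amount of CPU and memory reserved for a container. The scheduler only places a pod on a node with enough unreserved capacity for the sum of its containers' requests.
Limits: The maximum amount of a resource that a container can use. A container that exceeds its CPU limit is throttled, and one that exceeds its memory limit can be terminated (OOM-killed).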

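Ideally, we should specify CPU and memory requests and limits for every container. If we set only a limit, Kubernetes uses that limit as the request, and a namespace LimitRange may apply its own defaults. Either way, the resulting requests can be more generous than the workload needs, which inflates costs.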
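As a minimal sketch, requests and limits are set per container under `resources`. The pod name, image and values below are hypothetical placeholders, not sizing recommendations:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo        # hypothetical pod name
spec:
  containers:
  - name: app
    image: nginx:1.25        # placeholder image
    resources:
      requests:
        cpu: 250m            # reserve a quarter of one CPU core
        memory: 128Mi        # reserve 128 MiB of memory
      limits:
        cpu: 500m            # CPU use above half a core is throttled
        memory: 256Mi        # the container can be OOM-killed above 256 MiB
```

Here `250m` means 250 millicores and `Mi` denotes mebibytes. Because the requests are lower than the limits, Kubernetes places this pod in the Burstable QoS class.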

Pod Quality of Service

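Kubernetes also assigns each pod a quality of service (QoS) class, which it derives from the requests and limits of the pod's containers. When a node runs short of resources, the QoS class influences which pods are evicted first. There are three classes: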

Guaranteed QoS class: Highest priority pods. A pod is Guaranteed when every container has both CPU and memory requests and limits, and each request equals its limit.

```
status:
  qosClass: Guaranteed
```

BestEffort QoS class: Lowest priority pods. A pod is BestEffort when none of its containers has any CPU or memory request or limit.

```
status:
  qosClass: BestEffort
```

Burstable QoS class: Mid-priority pods, between the other two classes. A pod is Burstable when it does not meet the Guaranteed criteria but at least one of its containers has a CPU or memory request or limit.

```
status:
  qosClass: Burstable
```

| | Guaranteed | BestEffort | Burstable |
|---|---|---|---|
| Priority | Highest | Lowest | Middle |
| Requirements | Every container has CPU and memory requests and limits, with requests equal to limits. | No container has a CPU or memory request or limit. | At least one container has a CPU or memory request or limit, and the pod is not Guaranteed. |
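To illustrate the Guaranteed criteria, the following sketch (hypothetical name, image and values) sets identical CPU and memory requests and limits on its only container. Once it is running, `kubectl get pod qos-guaranteed -o yaml` should report `qosClass: Guaranteed` under `status`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: qos-guaranteed       # hypothetical pod name
spec:
  containers:
  - name: app
    image: nginx:1.25        # placeholder image
    resources:
      # Requests equal limits for both CPU and memory, so the pod is Guaranteed
      requests:
        cpu: 500m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 256Mi
```

If you set only limits, Kubernetes copies them into the requests, so a pod whose containers all declare CPU and memory limits alone is also Guaranteed.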

Scaling Objects and Definitions

Custom Resource Definition (CRD): Lets you add your own object types to a cluster's API
HorizontalPodAutoscaler (HPA): A built-in Kubernetes API object (autoscaling/v2) that increases or decreases the number of replicas of a workload, such as a deployment or StatefulSet, based on metrics such as CPU utilization
VerticalPodAutoscaler (VPA): An object defined by a CRD and installed as an add-on, which sets the CPU and memory requests (and, proportionally, the limits) for containers in pods
Horizontal scaling: Changes the number of pods to match the workload, adding pods as demand rises and removing them as it falls. The HPA manages this.
Vertical scaling: Adjusts the memory or CPU capacity of existing pods rather than changing the number of pods. The VPA manages this.

Prerequisites for Resource Autoscaling

Create your Kubernetes cluster.
Optional but recommended: Configure role-based access control (RBAC).
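Install the kubectl command-line tool on your workstation and connect to the cluster.
Ensure that a deployment is running in your cluster. This guide uses a deployment named python-hpa. Its containers must set CPU requests, because the HPA calculates CPU utilization as a percentage of the requested CPU.
Deploy Metrics Server to your cluster; the HPA needs it to scale on CPU or memory. To scale on custom metrics, also deploy a custom metrics adapter, such as the Prometheus Adapter.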
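A minimal sketch of the python-hpa deployment follows, including the CPU request that CPU-based HPA targets are measured against. The image, command, port and resource values are placeholders chosen for illustration:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: python-hpa
spec:
  replicas: 1
  selector:
    matchLabels:
      app: python-hpa
  template:
    metadata:
      labels:
        app: python-hpa
    spec:
      containers:
      - name: web
        image: python:3.12-slim                           # placeholder image
        command: ["python", "-m", "http.server", "8080"]  # trivial demo web server
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 200m      # CPU utilization percentages are relative to this request
          limits:
            cpu: 500m
```

Apply it with `kubectl apply -f` and wait for the pod to become ready before creating the HPA.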

Horizontal Scaling

1. Create an HPA

Use the kubectl autoscale subcommand:

```bash
kubectl autoscale deployment python-hpa --cpu-percent=50 --min=1 --max=10
```

The kubectl autoscale command creates the HPA for a deployment named python-hpa.

--cpu-percent=50 sets the target average CPU utilization across all pods to 50% of their CPU requests. The HPA adds or removes pods to keep the average near that target.
--min=1 and --max=10 keep the number of replicas between 1 and 10. This demo targets the python-hpa deployment; substitute the name of your own deployment, which must exist before you run the kubectl autoscale subcommand.
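The Kubernetes documentation describes how the HPA turns that target into a replica count. For a utilization target, the metric values are average CPU utilizations across the pods:

$$
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil
$$

As a hypothetical example, if python-hpa runs 4 replicas averaging 75% CPU against the 50% target:

$$
\left\lceil 4 \times \frac{75}{50} \right\rceil = \lceil 6 \rceil = 6
$$

The result is then kept within the min and max bounds (1 to 10 here). The controller also skips scaling when the ratio of current to desired metric value is within a tolerance of 1.0 (0.1 by default), so small fluctuations don't trigger changes.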

2. Stabilization Window

Scaling is not instant. To keep the replica count from flapping, the HPA applies a stabilization window: by default, it scales up as soon as the metrics call for it, but it scales down only to the highest replica count recommended over the previous 5 minutes. You can tune this with spec.behavior.scaleDown.stabilizationWindowSeconds (and its scaleUp counterpart).

3. Autoscaling on Specific Metrics

You can list specific monitored metrics in the HPA's YAML, which lets you scale on metrics defined by your monitoring or observability platform. First, export your HPA in the autoscaling/v2 format with the following command:

```bash
kubectl get hpa.v2.autoscaling python-hpa -o yaml > /tmp/hpa-v2.yaml
```
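For reference, the spec of the HPA that the kubectl autoscale command created corresponds roughly to the autoscaling/v2 manifest below; the exported file also contains metadata, status and defaulted fields that are omitted here. The behavior block is an optional addition in this sketch that sets the default scale-down stabilization window explicitly:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: python-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: python-hpa
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50       # equivalent of --cpu-percent=50
  behavior:
    scaleDown:
      # Use the highest recommendation from the last 5 minutes before removing pods
      stabilizationWindowSeconds: 300
```

Applying a manifest like this with `kubectl apply -f` is a declarative alternative to kubectl autoscale, and it keeps the HPA definition in version control.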

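You can then add a custom metric as an entry in the HPA's spec.metrics list. An Object metric names the metric, the Kubernetes object it describes and a target value; the Service name and target value here are placeholders:

```yaml
type: Object
object:
  metric:
    name: metric_name
  describedObject: {apiVersion: v1, kind: Service, name: my-service}  # placeholder object
  target:
    type: Value
    value: 10k                                                         # placeholder target
```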

An example for the Prometheus adapter might look like this:

```yaml
spec:
  # ...scaleTargetRef, minReplicas and maxReplicas as before...
  metrics:
  - type: Pods
    pods:
      metric:
        name: node_network_receive_bytes
      target:
        type: AverageValue
        averageValue: 100000m
```

4. Status Conditions

Check whether your autoscaler is able to scale or whether a restriction is preventing it. The kubectl describe hpa subcommand shows this information:

```bash
kubectl describe hpa python-hpa
```

The Conditions section of the output (stored in the HPA's status.conditions field) reports AbleToScale, ScalingActive and ScalingLimited, each with a status and a reason.

Vertical Scaling

1. VPA Components

Admission Controller: Sets the recommended resource requests on new pods as they are created
Recommender: Compares past and present resource consumption to predict and recommend values for memory and CPU requests
Updater: Checks which pods have correct resources and evicts those whose requests differ significantly from the recommendations, so that their controllers recreate them with updated requests

2. VPA Operation Modes

The VPA operates in the following four modes:

Recreate: The VPA assigns resource requests when pods are created and updates running pods by evicting and recreating them. Updates occur when the current requests differ substantially from the recommendations.
Auto: The default mode. The VPA documentation describes it as currently equivalent to Recreate: recommendations are applied at pod creation and by evicting running pods.
Off: The VPA only provides recommendations without automatically updating requests.
Initial: The VPA sets requests when pods are created but never updates them afterward.
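Once the VPA is installed, a VerticalPodAutoscaler object selects its mode with updatePolicy.updateMode. The sketch below uses Off, so it only publishes recommendations, and bounds those recommendations with a resource policy. The VPA and deployment names and the min/max values are hypothetical; target a deployment that isn't also scaled by a CPU- or memory-based HPA:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa              # hypothetical VPA name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                # hypothetical deployment
  updatePolicy:
    updateMode: "Off"           # quoted so YAML doesn't read Off as a boolean
  resourcePolicy:
    containerPolicies:
    - containerName: "*"        # apply to every container in the pod
      minAllowed:
        cpu: 100m
        memory: 64Mi
      maxAllowed:
        cpu: "1"
        memory: 512Mi
```

With this object applied, `kubectl describe vpa my-app-vpa` should show the recommended requests without evicting any pods; switching updateMode to "Initial", "Recreate" or "Auto" lets the VPA apply them.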

3. Limits Control

Resource policies bound what the VPA recommends. In the VPA object, spec.resourcePolicy.containerPolicies sets minAllowed and maxAllowed values per container, and the VPA also respects the min and max values of any LimitRange (its spec.limits section) in the namespace. By default, the VPA adjusts limits in proportion to the requests it sets.

4. Commands for Setting Up VPA and Their Parameters

For this section, you need the following:

Git installed and configured on your workstation
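Metrics Server (or another provider of the resource metrics API) deployed to your cluster
If a previous version of VPA is installed, tear it down first by running the following command from the vertical-pod-autoscaler directory of the cloned repository: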

```bash
./hack/vpa-down.sh
```

To install the Vertical Pod Autoscaler (VPA), download the source code and change into the VPA directory:

```bash
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
```

Set up the VPA:

```bash
./hack/vpa-up.sh
```

Tear down the VPA if you want to stop running it:

```bash
./hack/vpa-down.sh
```

Note that the VPA should not be used together with an HPA that scales on CPU or memory; combining them is only recommended when the HPA scales on custom or external metrics. See the official guide for more information on how to use the VPA.

5. Kubernetes-Based Event-Driven Autoscaler (KEDA)

KEDA is a lightweight component that scales workloads according to the number of events waiting to be processed. You can add it to your cluster to extend the standard HPA and work alongside other components.

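KEDA scales Kubernetes deployments up and down, including to and from zero replicas when there are no events to process.
KEDA also acts as a metrics adapter, exposing event-source data to the HPA through the Kubernetes external metrics API.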

Installing KEDA gives you these four custom resources:

ScaledObject: Describes the desired mapping between an event source (such as RabbitMQ) and your deployment or another scalable workload
ScaledJob: Describes the mapping between an event source and your Kubernetes job
TriggerAuthentication: Holds the authentication configuration that a ScaledObject or ScaledJob uses to monitor its event source. It is namespaced, so it can only be referenced from its own namespace.
ClusterTriggerAuthentication: The cluster-scoped equivalent of TriggerAuthentication, which ScaledObjects and ScaledJobs in any namespace can reference
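As a sketch of how these resources fit together, the following manifests scale a hypothetical queue-worker deployment on the length of a RabbitMQ queue. The Secret, queue and deployment names and the target value are placeholders; the TriggerAuthentication passes the connection string to KEDA's rabbitmq scaler through its host parameter:

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-auth
spec:
  secretTargetRef:
  - parameter: host            # the rabbitmq scaler reads its AMQP URL from "host"
    name: rabbitmq-secret      # hypothetical Secret
    key: host                  # key in that Secret holding the connection string
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: queue-worker         # hypothetical deployment that consumes the queue
  minReplicaCount: 0           # scale to zero when the queue is empty
  maxReplicaCount: 10
  triggers:
  - type: rabbitmq
    metadata:
      queueName: orders        # hypothetical queue
      mode: QueueLength        # scale on the number of waiting messages
      value: "20"              # target messages per replica
    authenticationRef:
      name: rabbitmq-auth
```

KEDA creates and manages an HPA for the ScaledObject and handles scaling to and from zero itself, since a standard HPA does not scale below one replica.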

6. KEDA Events

KEDA emits Kubernetes events, such as ScaledObjectReady the first time a ScaledObject is ready and ScaledJobDeleted when a ScaledJob is deleted. Check the KEDA Events webpage for an exhaustive list of the events KEDA emits.

Conclusion

This cheat sheet provides a concise but comprehensive guide to using and managing horizontal and vertical autoscaling in the Kubernetes environment. With this information, you can confidently implement autoscaling in your Kubernetes workload. StormForge helps you save money, time, and effort by using machine learning to automate the management and optimization of your Kubernetes resources, including acting as a VPA and adjusting your HPA settings to increase efficiency. Using a solution like StormForge can provide improved efficiency compared to using the HPA or VPA standalone, and it allows for vertical and horizontal autoscaling to work together without thrashing.

Brian Likosar is an open source geek with a passion for working at the intersection of people and technology. Throughout his career, he's been involved in open source, whether that was with Linux, Ansible and OpenShift/Kubernetes while at Red Hat,...