4 Ways to Use Kernel Security Features for Process Monitoring

An effective runtime security strategy takes each layer into consideration and monitors the process within each container.

Jan 5th, 2023 7:13am by Amit Gupta


The large attack surface of Kubernetes’ default pod provisioning is susceptible to critical security vulnerabilities, including malicious exploits and container breakouts. I believe one of the most effective workload runtime security measures to prevent such exploits is layer-by-layer process monitoring within the container. It may sound like a daunting task that requires additional resources, but in reality, the opposite is true. In this article, I will walk you through how to use existing Linux kernel security features to implement layer-by-layer process monitoring and prevent threats.

Threat Prevention and Process Monitoring

Containerized workloads in Kubernetes are composed of numerous layers. An effective runtime security strategy takes each layer into consideration and monitors the process within each container, also known as process monitoring. Threat detection in process monitoring involves integrating mechanisms that isolate workloads or control access. With these controls in place, you can effectively prevent malicious behavior, reduce your workload’s attack surface and limit the blast radius of security incidents. Fortunately, we can use existing Kubernetes mechanisms and Linux defenses to achieve this.

Kernel Security Features

By pulling Linux defenses closer to the container, we can leverage existing Kubernetes mechanisms to monitor processes and reduce the attack surface of individual layers. Let’s take a look at seccomp, AppArmor, SELinux and sysctl, kernel security features that let you restrict containerized applications to the system calls they need, and isolate and customize individual containers for the workloads they are running. These features can also help prevent container breakouts by using mandatory access control (MAC) to govern access to resources such as volumes or the filesystem. You can considerably reduce the attack surface within your cluster by simply using the default settings of these four features.

Seccomp

A Linux kernel feature, seccomp can filter the system calls made by a container at a granular level. Kubernetes lets you automatically apply seccomp profiles that are loaded onto a node through container runtimes, including podman, Docker and CRI-O. A simple seccomp profile consists of a list of syscalls and the action to take whenever one of them is made. With this feature enabled, your attack surface is reduced to the allowed syscalls, and dangerous syscalls, which can lead to kernel exploits, privilege escalation and container breakouts, are forbidden.
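As a minimal sketch of what this looks like in practice, the pod below asks Kubernetes to apply the container runtime's default seccomp profile through the pod-level securityContext. The pod name is a hypothetical placeholder, and the image and args are borrowed from the SELinux example later in this article; only the seccompProfile field is the point of the example.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-seccomp-default      # hypothetical name, for illustration only
  namespace: default
spec:
  securityContext:
    seccompProfile:
      # RuntimeDefault applies the default profile shipped with the
      # container runtime to every container in this pod.
      type: RuntimeDefault
  containers:
  - name: app-container
    image: alpine:latest
    args: ["sleep", "10000"]
```

To use a custom profile instead, the type can be set to Localhost with a localhostProfile path that is relative to the kubelet's seccomp profile directory on the node. The profile file itself would have to be distributed to each node separately.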

SELinux

If you take a look at CVE-2019-5736, CVE-2016-9962, CVE-2015-3627, and others, you will find that every recent container runtime breakout was a type of filesystem breakout. You can mitigate this problem by using SELinux, which provides control over who can access the filesystem and over the interaction between resources, such as directories, files and memory. I recommend applying SELinux profiles to workloads in cloud computing, as this helps reduce the attack surface by limiting access to the host filesystem and allows for better isolation practices.

SELinux can also reinforce the traditional Linux discretionary access control (DAC) system, as it offers mandatory access control (MAC). Traditional Linux DAC lets users change the permissions of the files, directories and processes they own. The same applies to root users. With SELinux MAC, however, the kernel labels every OS resource, and the labels are stored as extended file attributes. SELinux policies within the kernel are evaluated against these labels to decide whether an interaction is allowed. With SELinux in place, even root users in a container cannot access host files in a mounted volume unless the labels match.

SELinux operates in three modes, Enforcing, Permissive and Disabled, and its policies come in two types, Targeted and Strict. As their names suggest, Enforcing enforces SELinux policies and Disabled turns them off, while Permissive only sends out warnings. You can enforce policies on specific workloads using Targeted, or apply policies to all processes using Strict.

To further reinforce SELinux, I recommend labeling resources with a category using multicategory security (MCS). This option ensures users or processes can only access files labeled with the category the user or process belongs to. Once you’ve enabled SELinux, Docker, CRI-O, podman and other container runtimes will randomly pick an MCS label to run the container. Unless a file is labeled correctly, a container won’t be able to access it on a host or Kubernetes volume. This creates an essential barrier between resources that helps prevent vulnerabilities related to container breakouts.

Take a look at the example below, in which a pod is deployed with an SELinux level. Unless the files are labeled s0:c123,c456 on the host, this pod won’t be able to access any files in the host volume mount, even though the entire host filesystem is made available to the pod as a volume.

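    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-se-linux-label
      namespace: default
      labels:
        app: normal-app
    spec:
      containers:
      - name: app-container
        image: alpine:latest
        args: ["sleep", "10000"]
        securityContext:
          seLinuxOptions:
            # MCS level the container process runs with
            level: "s0:c123,c456"
        volumeMounts:
        # the mount path is illustrative; it is not part of the original example
        - name: rootfs
          mountPath: /host
      volumes:
      - name: rootfs
        hostPath:
          path: /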

SELinux policies can be challenging to maintain. However, SELinux policies are vital for a defense-in-depth strategy. The table below lists container escape CVEs that can be prevented by simply implementing and enforcing SELinux on hosts.

AppArmor

Similar to SELinux, an AppArmor profile defines what a process can access. Below is an example of an AppArmor profile:

    #include <tunables/global>
    /{usr/,}bin/ping flags=(complain) {
      #include <abstractions/base>
      #include <abstractions/consoles>
      #include <abstractions/nameservice>
      capability net_raw,
      capability setuid,
      network inet raw,
      /bin/ping mixr,
      /etc/modules.conf r,
      # Site-specific additions and overrides. See local/README for details.
      #include <local/bin.ping>
    }

As you can see, the ping utility here is granted only three permissions of note: the net_raw and setuid capabilities, and read access to /etc/modules.conf. With these controls in place, the ping utility has a reduced attack surface: It cannot modify or write to the filesystem, including keys, settings and binaries, or load any modules. If it is compromised, the ping utility will have only a limited area in which to carry out malicious activity. Your container runtime, such as Docker, CRI-O or podman, will provide you with an AppArmor profile by default. Since AppArmor is flexible and easy to maintain, I recommend that you have a policy per microservice.
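To illustrate how a profile is attached to a workload, here is a minimal sketch of a pod that requests the runtime's default AppArmor profile for one container by using the per-container annotation. The pod name is a hypothetical placeholder, and the image and args are borrowed from the SELinux example above. Note that Kubernetes 1.30 and later also offer an appArmorProfile field in securityContext, so treat the annotation form as one option rather than the only one.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-apparmor-default     # hypothetical name, for illustration only
  namespace: default
  annotations:
    # The key ends with the name of the container the profile applies to.
    # runtime/default selects the container runtime's default profile.
    container.apparmor.security.beta.kubernetes.io/app-container: runtime/default
spec:
  containers:
  - name: app-container
    image: alpine:latest
    args: ["sleep", "10000"]
```

For a policy per microservice, the annotation value would instead take the form localhost/<profile-name>, where the named profile must already be loaded on the node that runs the pod.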

Sysctl

With Kubernetes sysctl, you can use the sysctl interface to configure kernel parameters in your cluster. Sysctl can also allow you to modify kernel behavior for specific workloads without affecting the rest of the cluster. For example, you can use sysctls to manage containers and resource-hungry workloads together while dealing with a large number of concurrent connections, or if you require a special parameter set to run workloads efficiently. Sysctls are categorized into two groups, safe sysctls and unsafe sysctls. You can set both groups at your own discretion. Safe sysctls only affect containers, while unsafe sysctls affect both the container and the node they are running on. If you ever need to use a sysctl that applies to the node, I recommend using node affinity to schedule workloads on nodes that have sysctls applied.
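As a rough sketch of how this fits together, the pod below sets one safe and one unsafe sysctl through the pod-level securityContext and uses node affinity to land on a suitably prepared node. The pod name, the node label example.com/sysctl-tuned and the chosen values are assumptions made up for this illustration. Keep in mind that unsafe sysctls are disabled by default, so a cluster administrator has to allow them on the node (through the kubelet's allowed-unsafe-sysctls setting) before a pod like this can start.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-sysctl-example       # hypothetical name, for illustration only
  namespace: default
spec:
  securityContext:
    sysctls:
    # Safe sysctl: can be set without extra node configuration.
    - name: kernel.shm_rmid_forced
      value: "0"
    # Unsafe sysctl: must be explicitly allowed on the node first.
    - name: net.core.somaxconn
      value: "1024"
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          # Hypothetical label marking nodes prepared for these sysctls.
          - key: example.com/sysctl-tuned
            operator: In
            values: ["true"]
  containers:
  - name: app-container
    image: alpine:latest
    args: ["sleep", "10000"]
```

Using a required node affinity rule keeps the pod off nodes where the unsafe sysctl has not been allowed and the pod would otherwise fail to launch.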

Summary

Layer-by-layer process monitoring is one of the most effective solutions to combat security incidents such as container breakouts and unauthorized access to host resources. While you should always remember to choose a solution that is tailored to your threat model, the solutions I’ve offered above are great ways to start or improve your process monitoring by leveraging existing Kubernetes mechanisms with Linux defenses. If you would like to see in-depth examples of seccomp, SELinux, AppArmor and sysctl in action, I recommend taking a look at Chapter 4 of “Kubernetes Security and Observability: A Holistic Approach to Securing Containers and Cloud Native Applications,” an e-book I co-authored. To learn more about new cloud native approaches for establishing security and observability for containers and Kubernetes, check out this O’Reilly e-book by Tigera.

Amit Gupta is chief product officer at Tigera where he is responsible for the strategy and vision of Tigera’s products and leads the delivery of the company’s roadmap. Amit is a hands-on product executive with expertise in building software products...