Tigera sponsored this post. Insight Partners is an investor in Tigera and TNS.
Kubernetes’ default pod provisioning leaves a large attack surface that is exposed to critical security vulnerabilities, including malicious exploits and container breakouts. I believe one of the most effective workload runtime security measures for preventing such exploits is layer-by-layer process monitoring within the container.
It may sound like a daunting task that requires additional resources, but in reality it is the opposite. In this article, I will walk you through how to use existing Linux kernel security features to implement layer-by-layer process monitoring and prevent threats.
Threat Prevention and Process Monitoring
Containerized workloads in Kubernetes are composed of numerous layers. An effective runtime security strategy takes each layer into consideration and monitors the processes within each container, a practice known as process monitoring.
Threat detection in process monitoring involves integrating mechanisms that isolate workloads or control access. With these controls in place, you can effectively prevent malicious behavior, reduce your workload’s attack surface and limit the blast radius of security incidents. Fortunately, we can use existing Kubernetes mechanisms and Linux defenses to achieve this.
Tigera provides Calico, a unified network security and observability platform to prevent, detect and mitigate security breaches in Kubernetes clusters. Tigera’s open-source offering, Calico Open Source, is the most widely adopted container networking and security solution.
Kernel Security Features
By pulling Linux defenses closer to the container, we can leverage existing Kubernetes mechanisms to monitor processes and reduce the attack surface of individual layers.
Let’s take a look at seccomp, AppArmor, SELinux and sysctl. These kernel features let you control which system calls your containerized applications can make, and virtually isolate and customize individual containers for the workloads they run. SELinux and AppArmor can also help prevent container breakouts by using mandatory access control (MAC) to restrict access to resources such as volumes and the filesystem. Simply using the default settings of these four features can considerably reduce the attack surface within your cluster.
Seccomp
Seccomp, a Linux kernel feature, can filter the system calls a container makes at a granular level. Kubernetes lets you apply seccomp profiles that are loaded onto a node, including the default profiles provided by container runtimes such as podman, Docker and CRI-O.
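As a minimal sketch, the pod manifest below (built as a Python dict and printed as JSON, which `kubectl apply -f` accepts) asks Kubernetes to apply the container runtime's default seccomp profile through the `seccompProfile` field of the pod's `securityContext`. The pod name and image are placeholders chosen for illustration.

```python
import json


def pod_with_runtime_default_seccomp(name: str, image: str) -> dict:
    """Build a Pod manifest that applies the runtime's default seccomp profile."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Pod-level securityContext: applies to every container in the pod.
            "securityContext": {
                "seccompProfile": {"type": "RuntimeDefault"},
            },
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    # Hypothetical pod name and image, for illustration only.
    manifest = pod_with_runtime_default_seccomp("demo", "nginx:1.25")
    print(json.dumps(manifest, indent=2))
```

To use a custom profile instead, set `type` to `Localhost` and `localhostProfile` to a path relative to the kubelet's seccomp directory (by default `/var/lib/kubelet/seccomp`).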
A simple seccomp profile contains a default action and a list of syscalls, each paired with the action to take when that syscall is made. With seccomp enabled, your attack surface shrinks to the allowed syscalls, and dangerous syscalls, which can lead to kernel exploits, privilege escalation and container breakouts, are forbidden.
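To make the structure of a seccomp profile concrete, here is a minimal sketch of a deny-by-default profile in the JSON format used by Docker, podman and CRI-O (`defaultAction`, `architectures`, `syscalls`). The allowlist is hypothetical and far too short for a real workload, and the `action_for` helper is a simplified illustration of how a syscall name maps to an action: it ignores argument filters, which real seccomp profiles can also express.

```python
import json

# Hypothetical allowlist for illustration; a real application needs many more syscalls.
ALLOWED = ["read", "write", "exit", "exit_group", "futex", "rt_sigreturn"]

PROFILE = {
    # Any syscall not matched by a rule below is rejected with an error.
    "defaultAction": "SCMP_ACT_ERRNO",
    "architectures": ["SCMP_ARCH_X86_64"],
    "syscalls": [
        {"names": ALLOWED, "action": "SCMP_ACT_ALLOW"},
    ],
}


def action_for(profile: dict, syscall: str) -> str:
    """Simplified lookup: return the action the profile assigns to a syscall name."""
    for rule in profile["syscalls"]:
        if syscall in rule["names"]:
            return rule["action"]
    return profile["defaultAction"]


if __name__ == "__main__":
    print(action_for(PROFILE, "read"))   # SCMP_ACT_ALLOW
    print(action_for(PROFILE, "mount"))  # SCMP_ACT_ERRNO: not on the allowlist
    # The JSON printed here is what you would place on the node as a profile file.
    print(json.dumps(PROFILE, indent=2))
```

Starting from a deny-by-default action and allowing only what the application needs is what shrinks the attack surface to the allowed syscalls.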
SELinux
If you look at CVE-2019-5736, CVE-2016-9962, CVE-2015-3627 and others, you will find that these recent container runtime breakouts were all filesystem breakouts. You can mitigate this class of problem with SELinux, which controls who can access the filesystem and how resources such as directories, files and memory interact. I recommend applying SELinux profiles to cloud workloads, as this reduces the attack surface by limiting what containerized processes can access on the host filesystem and enables better isolation.
SELinux also reinforces the traditional Linux discretionary access control (DAC) system by adding mandatory access control (MAC). Under DAC, users can change the permissions of files, directories and processes they own, and the root user can do so for any resource.
With SELinux MAC, however, the kernel labels every OS resource, and file labels are stored as extended file attributes. The kernel checks these labels against the loaded SELinux policy to decide whether an interaction is allowed. With SELinux in place, even a root user inside a container cannot access host files in a mounted volume unless those files are labeled correctly.
SELinux operates in one of three modes: Enforcing, Permissive and Disabled. Enforcing and Disabled, as their names suggest, enforce or disable SELinux policies, while Permissive logs policy violations as warnings without blocking them. Separately, SELinux policies come in two types: Targeted, which enforces policies on specific workloads, and Strict, which applies policies to all processes.
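On a Linux host, the current SELinux mode is exposed through selinuxfs: `/sys/fs/selinux/enforce` contains `1` in Enforcing mode and `0` in Permissive mode. The sketch below mirrors what the `getenforce` command reports. It assumes selinuxfs is mounted at its usual location and treats a missing file as Disabled, which is the common case but not guaranteed on every distribution.

```python
from pathlib import Path


def selinux_mode(enforce_file: str = "/sys/fs/selinux/enforce") -> str:
    """Return 'Enforcing', 'Permissive' or 'Disabled', similar to `getenforce`."""
    path = Path(enforce_file)
    if not path.exists():
        # selinuxfs is not mounted here, which usually means SELinux is disabled.
        return "Disabled"
    value = path.read_text().strip()
    return "Enforcing" if value == "1" else "Permissive"


if __name__ == "__main__":
    print(selinux_mode())
```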
To further reinforce SELinux, I recommend labeling resources with a category using Multi-Category Security (MCS). MCS ensures that users or processes can only access files labeled with a category they belong to. Once SELinux is enabled, container runtimes such as Docker, CRI-O and podman randomly pick an MCS label for each container they run.
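A simplified model of the MCS check may help. An MCS level such as `s0:c123,c456` carries a sensitivity (`s0`) and a set of categories, and access passes the MCS check only if the process holds every category on the file. The sketch below compares categories only, handles comma-separated lists and `cX.cY` ranges, and ignores SELinux type enforcement, which applies in addition; it is an illustration, not a policy engine.

```python
def categories(level: str) -> set[str]:
    """Parse the categories of an MCS level like 's0:c123,c456' or 's0:c0.c3'."""
    _, _, cats = level.partition(":")
    result: set[str] = set()
    for item in filter(None, cats.split(",")):
        if "." in item:
            # A range such as c0.c3 expands to c0, c1, c2, c3.
            start, end = (int(part.lstrip("c")) for part in item.split("."))
            result.update(f"c{i}" for i in range(start, end + 1))
        else:
            result.add(item)
    return result


def mcs_allows(process_level: str, file_level: str) -> bool:
    """Simplified MCS check: the process must hold every category on the file."""
    return categories(file_level) <= categories(process_level)


if __name__ == "__main__":
    container = "s0:c123,c456"  # label a runtime might pick for one container
    print(mcs_allows(container, "s0:c123,c456"))  # True
    print(mcs_allows(container, "s0:c123,c789"))  # False: c789 belongs elsewhere
```

Because each container gets its own random pair of categories, files labeled for one container fail this check for every other container.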
Unless a file is labeled correctly, a container can’t access it, whether the file is on the host or in a Kubernetes volume. This creates an essential barrier between resources that helps prevent container breakouts.
Take a look at the example below, in which a pod is deployed with an SELinux profile. Unless files on the host are labeled s0:c123,c456, this pod won’t be able to access any files in its host volume mounts, even though the entire host filesystem is mounted into the pod.
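A pod like the one described above can be approximated with the following sketch. It sets `seLinuxOptions.level` to `s0:c123,c456` and mounts the host's root directory through a `hostPath` volume. The pod name, image and mount path are hypothetical, and a pod that mounts the host root belongs on a test cluster only. With SELinux enforcing on the node, the container should be denied access to host files that do not carry a matching label.

```python
import json


def selinux_pod(level: str) -> dict:
    """Pod that runs with a fixed MCS level and mounts the host root filesystem."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "selinux-demo"},  # hypothetical name
        "spec": {
            # Every container in the pod runs with this SELinux level.
            "securityContext": {"seLinuxOptions": {"level": level}},
            "containers": [{
                "name": "shell",
                "image": "busybox:1.36",  # hypothetical image
                "command": ["sleep", "3600"],
                "volumeMounts": [{"name": "host-root", "mountPath": "/host"}],
            }],
            # The whole host filesystem is mounted, but SELinux still gates access.
            "volumes": [{"name": "host-root", "hostPath": {"path": "/"}}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(selinux_pod("s0:c123,c456"), indent=2))
```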
SELinux policies can be challenging to maintain, but they are vital to a defense-in-depth strategy. The table below lists container escape CVEs that can be prevented simply by implementing and enforcing SELinux on hosts.
AppArmor
Similar to SELinux, an AppArmor profile defines what a process can access. Below is an example of an AppArmor profile:
As you can see, ping here has only two capabilities, net_raw and setuid, plus read access to /etc/modules.conf. With these controls in place, the ping utility has a reduced attack surface: it cannot modify or write to the filesystem, including keys, settings and binaries, and it cannot load any modules. If ping is compromised, an attacker has very little room to carry out malicious activity.
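The ping rules described above can be written out in AppArmor's profile language. The sketch below renders them from Python. The profile name, the attachment path `/usr/bin/ping` and the `abstractions/base` include are assumptions, and a working ping profile on a given distribution may need additional rules (for example, `network inet raw,`).

```python
def render_profile(name: str, attach: str, capabilities: list[str],
                   read_paths: list[str]) -> str:
    """Render a minimal AppArmor profile that grants only the listed rules."""
    lines = [
        "#include <tunables/global>",
        "",
        f"profile {name} {attach} {{",
        "  #include <abstractions/base>",
    ]
    # Each Linux capability must be granted explicitly.
    lines += [f"  capability {cap}," for cap in capabilities]
    # 'r' grants read-only access; without 'w', these paths cannot be written.
    lines += [f"  {path} r," for path in read_paths]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_profile("ping", "/usr/bin/ping",
                         ["net_raw", "setuid"], ["/etc/modules.conf"]))
```

The rendered text could then be loaded on a node with `apparmor_parser -r <file>`.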
Your container runtime, such as Docker, CRI-O or podman, provides an AppArmor profile by default. Since AppArmor is flexible and easy to maintain, I recommend having a policy per microservice.
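One way to follow the one-policy-per-microservice advice is to name each profile after its service and reference it from the pod spec. This sketch assumes a Kubernetes release that supports the `appArmorProfile` field in `securityContext` (older releases used the `container.apparmor.security.beta.kubernetes.io/<container>` annotation instead) and that a profile with the given name is already loaded on the node. The `svc-` naming scheme, service names and images are hypothetical.

```python
import json


def pod_with_apparmor(service: str, image: str) -> dict:
    """Pod whose container is confined by a per-service AppArmor profile."""
    profile = f"svc-{service}"  # hypothetical convention: one profile per service
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": service},
        "spec": {
            "containers": [{
                "name": service,
                "image": image,
                "securityContext": {
                    # Localhost: use a profile already loaded on the node.
                    "appArmorProfile": {"type": "Localhost", "localhostProfile": profile},
                },
            }],
        },
    }


if __name__ == "__main__":
    for svc in ("cart", "checkout"):
        print(json.dumps(pod_with_apparmor(svc, f"example.com/{svc}:1.0"), indent=2))
```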
Sysctl
Kubernetes lets you use the sysctl interface to configure kernel parameters in your cluster. Because many sysctls are namespaced, you can also change kernel behavior for specific workloads without affecting the rest of the cluster. For example, you can tune kernel parameters for a resource-hungry workload that handles a large number of concurrent connections, or set a special parameter that a workload needs to run efficiently.
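Under the hood, each sysctl name maps to a file under `/proc/sys`, with the dots in the name replaced by slashes. The sketch below reads a value that way. The `root` parameter exists only so the functions can be pointed at a test directory, and the mapping assumes no component of the name itself contains a dot.

```python
from pathlib import Path


def sysctl_path(name: str, root: str = "/proc/sys") -> Path:
    """Map a sysctl name such as 'net.core.somaxconn' to its /proc/sys file."""
    # Simplification: assumes no component of the name contains a dot.
    return Path(root, *name.split("."))


def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    """Read the current value, similar to `sysctl -n <name>`."""
    return sysctl_path(name, root).read_text().strip()


if __name__ == "__main__":
    path = sysctl_path("net.core.somaxconn")
    print(path)  # /proc/sys/net/core/somaxconn
    if path.exists():
        print(read_sysctl("net.core.somaxconn"))
```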
Sysctls fall into two groups: safe and unsafe. Safe sysctls are enabled by default, while unsafe sysctls are disabled by default and must be explicitly allowed on each node by a cluster administrator. Safe sysctls are isolated to the pod that sets them, while unsafe sysctls can affect other pods or the node itself. If you ever need a sysctl that applies to the node, I recommend using node affinity to schedule those workloads only on nodes that have the sysctl applied.
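A minimal sketch of both ideas: the pod below sets one safe sysctl (`kernel.shm_rmid_forced`) and one unsafe sysctl (`net.core.somaxconn`) through `securityContext.sysctls`, and uses required node affinity so it only lands on nodes carrying a hypothetical `example.com/sysctl-somaxconn=allowed` label. Those nodes must also permit the unsafe sysctl through the kubelet's `--allowed-unsafe-sysctls` setting, otherwise the kubelet rejects the pod. The values, pod name and image are illustrative.

```python
import json


def pod_with_sysctls() -> dict:
    """Pod that sets a safe and an unsafe sysctl and pins itself to prepared nodes."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "sysctl-demo"},  # hypothetical name
        "spec": {
            "securityContext": {
                "sysctls": [
                    # Safe: namespaced and allowed by default.
                    {"name": "kernel.shm_rmid_forced", "value": "1"},
                    # Unsafe: the node's kubelet must explicitly allow it.
                    {"name": "net.core.somaxconn", "value": "1024"},
                ],
            },
            "affinity": {
                "nodeAffinity": {
                    "requiredDuringSchedulingIgnoredDuringExecution": {
                        "nodeSelectorTerms": [{
                            "matchExpressions": [{
                                "key": "example.com/sysctl-somaxconn",  # hypothetical label
                                "operator": "In",
                                "values": ["allowed"],
                            }],
                        }],
                    },
                },
            },
            "containers": [{"name": "app", "image": "nginx:1.25"}],  # hypothetical image
        },
    }


if __name__ == "__main__":
    print(json.dumps(pod_with_sysctls(), indent=2))
```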
Summary
Layer-by-layer process monitoring is one of the most effective solutions to combat security incidents such as container breakouts and unauthorized access to host resources. While you should always remember to choose a solution that is tailored to your threat model, the solutions I’ve offered above are great ways to start or improve your process monitoring by leveraging existing Kubernetes mechanisms with Linux defenses.
If you would like to see in-depth examples of seccomp, SELinux, AppArmor and sysctl in action, I recommend taking a look at Chapter 4 of “Kubernetes Security and Observability: A Holistic Approach to Securing Containers and Cloud Native Applications,” an e-book I co-authored.
To learn more about new cloud native approaches for establishing security and observability for containers and Kubernetes, check out this O’Reilly e-book by Tigera.