Using Linux kernel hardening tools to secure a K8s cluster

Rafael Natali
Mar 3
Some Linux kernel hardening tools can be integrated with Kubernetes to control how Pods and containers interact with the host operating system. For example, we can restrict Pods from creating files or executing programs. This article will discuss two of these tools: AppArmor and seccomp. In the following subsections, we will use these tools to restrict some operations between Kubernetes and the host operating system.

AppArmor

AppArmor is a Linux Security Module (LSM) that offers fine-grained access control for programs running on Linux systems. An AppArmor profile consists of rules that specify what a program is allowed or not allowed to do.

An AppArmor profile is loaded at the node level and activated in one of two modes. The first mode is complain. In this mode, AppArmor doesn't block anything; it only logs the actions that violate the profile. This is useful for discovering what a Pod actually does before restricting it. The second mode is enforce. When a profile is loaded in this mode, AppArmor actively prevents the Pod from doing anything that the profile does not allow. Profiles must be loaded on every worker node where the Pod can be scheduled.

Applying an AppArmor profile to a Pod

Let’s create an AppArmor profile that denies all writes to disk and apply this profile to a Pod. On a worker node, create a file called k8s-deny-write with the following content:

#include <tunables/global>
profile k8s-deny-write flags=(attach_disconnected) {
  #include <abstractions/base>
  file,
  # Deny all file writes.
  deny /** w,
}

To activate an AppArmor profile, use the apparmor_parser command. By default, the profile is loaded in enforce mode. Use the -C flag to load it in complain mode.

>sudo apparmor_parser ./k8s-deny-write

To verify that a profile is loaded, execute the aa-status command:

>sudo aa-status 
apparmor module is loaded.
56 profiles are loaded.
52 profiles are in enforce mode.
...
k8s-deny-write
...
0 processes are unconfined but have a profile defined.
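
Because the profile has to be loaded on every worker node, repeating the step by hand gets tedious. The following is a minimal sketch, assuming passwordless SSH and sudo on each node; the function name, the node names in the usage line, and the SSH_CMD override are hypothetical and not part of any tool. It relies on apparmor_parser reading the profile from standard input when no file is given.

```shell
#!/bin/sh
# Load (or reload) an AppArmor profile on several nodes over SSH.
# Assumptions: passwordless SSH and sudo on every node; node names are
# passed as arguments. SSH_CMD is a hypothetical override for dry runs.
load_profile_on_nodes() {
  profile_file=$1
  shift
  for node in "$@"; do
    # -r replaces a profile that is already loaded, -q suppresses warnings.
    # The profile text is fed to the remote apparmor_parser on stdin.
    ${SSH_CMD:-ssh} "$node" "sudo apparmor_parser -r -q" < "$profile_file"
  done
}
```

For example, `load_profile_on_nodes ./k8s-deny-write worker-1 worker-2` would load the profile on two nodes, and prefixing the call with `SSH_CMD=echo` only prints what would be executed.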

Using the following manifest file, we'll create a Pod that outputs a simple message:

apiVersion: v1
kind: Pod
metadata:
  name: hello-apparmor
spec:
  containers:
  - name: hello
    image: busybox:1.28
    command: [ "sh", "-c", "while true; do echo 'Hello AppArmor!' > /tmp/hello && cat /tmp/hello; sleep 10; done" ]

After creating the Pod and looking at its logs, we can see that the message was written:

>kubectl logs hello-apparmor -f
Hello AppArmor!
Hello AppArmor!

Note: Prior to Kubernetes v1.30, AppArmor was specified through annotations. Check documentation for previous versions at https://kubernetes.io/docs/tutorials/security/apparmor/
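
For readers on clusters older than v1.30, the annotation form mentioned in the note looks roughly like the sketch below. It reuses the Pod and profile names from this article; the annotation key ends with the container name, and the value is the profile name prefixed with localhost/.

```yaml
# Sketch for clusters older than v1.30 (annotation form, since deprecated).
apiVersion: v1
kind: Pod
metadata:
  name: hello-apparmor
  annotations:
    # Key suffix is the container name; value is localhost/<profile name>.
    container.apparmor.security.beta.kubernetes.io/hello: localhost/k8s-deny-write
spec:
  containers:
  - name: hello
    image: busybox:1.28
    command: [ "sh", "-c", "while true; do echo 'Hello AppArmor!' > /tmp/hello && cat /tmp/hello; sleep 10; done" ]
```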

Now, we will configure the AppArmor profile in the Pod manifest file using the securityContext. The new manifest file is the following:

apiVersion: v1
kind: Pod
metadata:
  name: hello-apparmor
spec:
  securityContext:
    appArmorProfile:
      type: Localhost
      localhostProfile: k8s-deny-write 
  containers:
  - name: hello
    image: busybox:1.28
    command: [ "sh", "-c", "while true; do echo 'Hello AppArmor!' > /tmp/hello && cat /tmp/hello; sleep 10; done" ]

Now, delete the current Pod, apply the new manifest file, and check the logs again:

>kubectl delete pods hello-apparmor 
pod "hello-apparmor" deleted

>kubectl apply -f apparmor.yaml 
pod/hello-apparmor created

>kubectl logs hello-apparmor -f
sh: can't create /tmp/hello: Permission denied
sh: can't create /tmp/hello: Permission denied
sh: can't create /tmp/hello: Permission denied

The hello-apparmor Pod is unable to create files because of the AppArmor profile. You can also verify that the profile was applied by executing:

>kubectl exec hello-apparmor -- cat /proc/1/attr/current
k8s-deny-write (enforce)

Here we can see that the k8s-deny-write profile was applied and it’s in enforce mode.

To finalise this AppArmor section, let me show what happens if we configure a Pod with a profile that hasn’t been loaded by creating the following Pod:

>kubectl create -f /dev/stdin <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: hello-apparmor-2
spec:
  securityContext:
    appArmorProfile:
      type: Localhost
      localhostProfile: k8s-apparmor-example-allow-write
  containers:
    - name: hello
      image: busybox:1.28
      command: [ "sh", "-c", "echo 'Hello AppArmor!' && sleep 1h" ]
EOF
pod/hello-apparmor-2 created

Verify the status of the Pod:

>kubectl get pods hello-apparmor-2
NAME               READY   STATUS                 RESTARTS   AGE
hello-apparmor-2   0/1     CreateContainerError   0          12s

Let’s use kubectl describe to investigate the error with the Pod:

>kubectl describe pods hello-apparmor-2
...
  Warning  Failed     79s (x12 over 3m15s)  kubelet            Error: failed to get container spec opts: failed to generate apparmor spec opts: apparmor profile not found k8s-apparmor-example-allow-write
...

In the Events section, we can clearly see that the AppArmor profile was not found. Therefore, a Pod will not start if it’s configured to use a profile that hasn’t been loaded on the node where it is scheduled.
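
A quick check on the node can catch this situation before the Pod is created. The sketch below assumes the kernel exposes the loaded profiles in /sys/kernel/security/apparmor/profiles, one "name (mode)" entry per line, which normally requires root to read. The function name and its optional second argument are hypothetical.

```shell
#!/bin/sh
# Exit 0 if the named AppArmor profile is in the kernel's profile list,
# non-zero otherwise. Each line of the list looks like "name (mode)".
# The second argument is a hypothetical override of the list's path,
# which is handy for testing against a sample file.
apparmor_profile_loaded() {
  profile_name=$1
  profiles_file=${2:-/sys/kernel/security/apparmor/profiles}
  # Compare the first field exactly so that a longer profile name that
  # merely contains the requested name does not count as a match.
  awk -v name="$profile_name" \
    '$1 == name { found = 1 } END { exit !found }' "$profiles_file"
}
```

Run as root on a worker node, `apparmor_profile_loaded k8s-deny-write && echo loaded` would print "loaded" only when the profile is present.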

Seccomp

Seccomp is a Linux kernel feature that allows you to restrict the system calls that applications can make. This is useful for enhancing the security of applications by limiting their ability to interact with the underlying operating system, thus reducing the potential attack surface. Seccomp operates by defining a filter that specifies which system calls are allowed and which are denied.

In the context of Kubernetes, seccomp can be used to define security policies for pods, ensuring that they only use the necessary system calls required for their operation, thus reducing the risk of exploitation.

Creating example seccomp profiles

We will download three example seccomp profiles to test our Pods. These files must be present on every node in the Kubernetes cluster. The first profile, audit.json, logs all system calls of a process. The second, violation.json, does not allow any system calls. The third, fine-grained.json, allows only the system calls listed in its "action": "SCMP_ACT_ALLOW" block. The kubelet resolves a Pod's localhostProfile path relative to its seccomp directory, which is /var/lib/kubelet/seccomp by default. To download the profiles and save them to the local directory /var/lib/kubelet/seccomp/seccomp_profiles, execute the following commands:

>sudo mkdir -p /var/lib/kubelet/seccomp/seccomp_profiles

>sudo curl -L -o /var/lib/kubelet/seccomp/seccomp_profiles/audit.json https://k8s.io/examples/pods/security/seccomp/profiles/audit.json

>sudo curl -L -o /var/lib/kubelet/seccomp/seccomp_profiles/violation.json https://k8s.io/examples/pods/security/seccomp/profiles/violation.json

>sudo curl -L -o /var/lib/kubelet/seccomp/seccomp_profiles/fine-grained.json https://k8s.io/examples/pods/security/seccomp/profiles/fine-grained.json

>sudo ls /var/lib/kubelet/seccomp/seccomp_profiles/
audit.json  fine-grained.json  violation.json
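
To make the profile format more concrete, here is a sketch that writes a small allow-list profile in the same JSON shape. The file name, the PROFILE_DIR variable, and the syscall list are illustrative assumptions; the list is far too short to run a real program. The defaultAction applies to every system call that is not listed: SCMP_ACT_ERRNO makes it fail, whereas an audit-style profile would use SCMP_ACT_LOG to log it and let it through.

```shell
#!/bin/sh
# Write a hypothetical allow-list seccomp profile to illustrate the format.
# File name and syscall list are illustrative only; a real workload needs
# a much longer list of allowed system calls.
PROFILE_DIR=${PROFILE_DIR:-./seccomp_profiles}
mkdir -p "$PROFILE_DIR"
cat > "$PROFILE_DIR/example-allowlist.json" <<'EOF'
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": ["read", "write", "exit_group", "futex", "nanosleep"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
EOF
```

Entries in the syscalls array override the default for the names they list, which is how a deny-by-default profile still lets the required calls through.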

Creating a Pod that logs all system calls

Configure the audit.json profile in the .spec.securityContext field of a Pod. Here is a Pod manifest file that will use the seccomp profile:

apiVersion: v1
kind: Pod
metadata:
  name: audit-pod
  labels:
    app: audit-pod
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: seccomp_profiles/audit.json
  containers:
  - name: test-container
    image: hashicorp/http-echo:1.0
    args:
    - "-text=just made some syscalls!"
    securityContext:
      allowPrivilegeEscalation: false

Let’s create the Pod:

>kubectl apply -f audit.yaml  
pod/audit-pod created

This profile doesn’t block any action, so the Pod will run without any problem. Now, log into the worker node where the Pod is running and check the syslog for the system calls that audit-pod is executing:

>sudo tail -f /var/log/syslog | grep 'http-echo'

Jul 23 22:09:48 5600791c132c kernel: [ 2070.311535] audit: type=1326 audit(1721772588.695:700): auid=4294967295 uid=65532 gid=65532 ses=4294967295 subj=cri-containerd.apparmor.d pid=16021 comm="http-echo" exe="/http-echo" sig=0 arch=c000003e syscall=35 compat=0 ip=0x4685d7 code=0x7ffc0000

Jul 23 22:09:48 5600791c132c kernel: [ 2070.311662] audit: type=1326 audit(1721772588.695:701): auid=4294967295 uid=65532 gid=65532 ses=4294967295 subj=cri-containerd.apparmor.d pid=16021 comm="http-echo" exe="/http-echo" sig=0 arch=c000003e syscall=202 compat=0 ip=0x468ba3 code=0x7ffc0000
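
Each record identifies the system call only by number. On x86_64 (arch=c000003e), 35 is nanosleep and 202 is futex. As a rough sketch, the hypothetical helper below reads log lines from standard input and counts how often each system call number appears in the seccomp audit records (type=1326), which is a useful starting point for building an allow list.

```shell
#!/bin/sh
# Summarise seccomp audit records (type=1326) read from stdin.
# Prints "<count> <syscall number>" per distinct syscall, most frequent
# first; ties are ordered by syscall number.
summarise_syscalls() {
  grep 'type=1326' \
    | sed -n 's/.* syscall=\([0-9][0-9]*\).*/\1/p' \
    | sort -n | uniq -c | sort -k1,1nr -k2,2n \
    | awk '{ print $1, $2 }'
}
```

It could be used as `sudo grep 'http-echo' /var/log/syslog | summarise_syscalls`. If the audit userspace tools are installed on the node, the ausyscall utility should be able to translate the numbers into names.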

Creating a Pod with a profile that does not allow any system call

The violation.json profile does not allow any system call, so a Pod configured with it should fail to start. The following Pod uses the violation.json profile:

apiVersion: v1
kind: Pod
metadata:
  name: violation-pod
  labels:
    app: violation-pod
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: seccomp_profiles/violation.json
  containers:
  - name: test-container
    image: hashicorp/http-echo:1.0
    args:
    - "-text=just made some syscalls!"
    securityContext:
      allowPrivilegeEscalation: false

Now, we will create the Pod and verify its status:

>kubectl apply -f violation.yaml 
pod/violation-pod created

>kubectl get pods violation-pod 
NAME            READY   STATUS              RESTARTS     AGE
violation-pod   0/1     RunContainerError   2 (1s ago)   26s

Looking in the syslog, we will see an error indicating that the Pod is unable to start a process:

>sudo tail -f /var/log/syslog | grep 'http-echo'

Jul 23 22:28:14 5600791c132c kubelet[769]: E0723 22:28:14.220131     769 kuberuntime_manager.go:1256] container &Container{Name:test-container,Image:hashicorp/http-echo:1.0,Command:[],Args:[-text=just made some syscalls!],WorkingDir:,Ports:[]ContainerPort{},Env:[]EnvVar{},Resources:ResourceRequirements{Limits:ResourceList{},Requests:ResourceList{},Claims:[]ResourceClaim{},},VolumeMounts:[]VolumeMount{VolumeMount{Name:kube-api-access-lmsz5,ReadOnly:true,MountPath:/var/run/secrets/kubernetes.io/serviceaccount,SubPath:,MountPropagation:nil,SubPathExpr:,RecursiveReadOnly:nil,},},LivenessProbe:nil,ReadinessProbe:nil,Lifecycle:nil,TerminationMessagePath:/dev/termination-log,ImagePullPolicy:IfNotPresent,SecurityContext:&SecurityContext{Capabilities:nil,Privileged:nil,SELinuxOptions:nil,RunAsUser:nil,RunAsNonRoot:nil,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:*false,RunAsGroup:nil,ProcMount:nil,WindowsOptions:nil,SeccompProfile:nil,AppArmorProfile:nil,},Stdin:false,StdinOnce:false,TTY:false,EnvFrom:[]EnvFromSource{},TerminationMessagePolicy:File,VolumeDevices:[]VolumeDevice{},StartupProbe:nil,ResizePolicy:[]ContainerResizePolicy{},RestartPolicy:nil,} start failed in pod violation-pod_default(9743bb58-804b-40c8-9775-babfa69e80d1): RunContainerError: failed to start containerd task "28a0542f2cb20f3747de7518a1bafeeb0c470bd0cbac07482f4bca82f1b32279": cannot start a stopped process: unknown

As expected, the seccomp profile prevented the Pod from executing any system call.

Creating a Pod with a profile that allows the required system calls

For the image http-echo to run successfully, it needs to have permission to perform some system calls. The fine-grained.json profile allows the required system calls. The following is a Pod using the fine-grained.json profile:

apiVersion: v1
kind: Pod
metadata:
  name: fine-grained-pod
  labels:
    app: fine-grained-pod
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: seccomp_profiles/fine-grained.json
  containers:
  - name: test-container
    image: hashicorp/http-echo:1.0
    args:
    - "-text=just made some syscalls!"
    securityContext:
      allowPrivilegeEscalation: false

Applying the Pod and checking the status:

>kubectl apply -f fine-grained.yaml 
pod/fine-grained-pod created

>kubectl get pods fine-grained-pod 
NAME               READY   STATUS    RESTARTS   AGE
fine-grained-pod   1/1     Running   0          67s

As the profile allows all system calls necessary for the image, there is no error log in the syslog and the Pod is running successfully.

Summary

Using Linux kernel hardening tools such as AppArmor and seccomp strengthens the isolation between Pods and the host operating system, reducing the attack surface of your Kubernetes cluster. Some points to remember:

· Before Kubernetes v1.30, AppArmor profiles were configured in the Pod using annotations. From v1.30 onwards, they are configured in the securityContext appArmorProfile field. Seccomp profiles have been configured through the securityContext seccompProfile field since v1.19.

· AppArmor and seccomp profiles must be present on all nodes of the cluster.

· Review the Kubernetes documentation about using AppArmor and seccomp available at https://kubernetes.io/docs/tutorials/security/
