A member of a Slack channel I frequent recently asked how you go about patching existing Kubernetes resources. After responding, I thought that others might benefit from the same information. In today's example, we're going to resize a PersistentVolumeClaim (PVC).
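For reference, a resize like that usually comes down to a single kubectl patch against the claim's storage request. A minimal sketch, assuming a hypothetical PVC named data-claim and a StorageClass that permits expansion:

# data-claim and 20Gi are placeholders; expansion only works if the
# backing StorageClass has allowVolumeExpansion: true.
$ kubectl patch pvc data-claim -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'

The same kubectl patch approach works for most mutable spec fields; if a strategic merge patch doesn't fit, you can pass --type=merge or --type=json instead.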
How I Passed the CKA
After passing the CKAD at the end of 2019, I was amped and ready to knock out the CKA. I planned to take the exam in March, and then all hell broke loose. 2020 is the year of Covid, and nothing seems to be working out as expected. Fast forward to November, and I decided it was finally time to get back to it. So, what did I do this time around?
Static Pods
Static pods are not managed by the kube-apiserver, but rather by the kubelet itself. While there is no Deployment, ReplicaSet, etc., the kubelet will work to keep the pod(s) up and running.
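As a quick sketch of how that works: the kubelet watches a manifest directory on the node (whatever staticPodPath points at in the kubelet config; /etc/kubernetes/manifests on a typical kubeadm install), and any Pod manifest dropped there is run by the kubelet directly. The file below is a hypothetical example:

# Hypothetical static pod manifest, placed on the node itself, e.g.
# /etc/kubernetes/manifests/static-web.yaml
apiVersion: v1
kind: Pod
metadata:
  name: static-web
spec:
  containers:
  - name: web
    image: nginx
    ports:
    - containerPort: 80

The kubelet also registers a read-only mirror Pod with the API server (named static-web-<node-name>), so the Pod shows up in kubectl get pods, but deleting the mirror doesn't stop it; removing the manifest file from the node does.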
Upgrading Your Cluster
One of the things I noticed while studying for my CKAD exam is that my test cluster was out of date. It's been up and running for over 200 days at this point and is several versions behind: version 1.17 is out, the exam was based on 1.16, and I'm running 1.13.
NAME                         STATUS   ROLES    AGE    VERSION
runlevl41c.mylabserver.com   Ready    master   207d   v1.13.5
runlevl42c.mylabserver.com   Ready    <none>   207d   v1.13.5
runlevl43c.mylabserver.com   Ready    <none>   207d   v1.13.5
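For what it's worth, that jump can't be made in one step: kubeadm only moves one minor version at a time, so it's 1.13 → 1.14 → 1.15 and so on up to the target. A rough sketch of a single hop on a kubeadm-based cluster like this one, assuming apt-based nodes (the official kubeadm upgrade docs have the complete per-version steps):

# On the control plane node: upgrade kubeadm, review the plan, then apply.
$ sudo apt-get update && sudo apt-get install -y kubeadm=1.14.10-00
$ sudo kubeadm upgrade plan
$ sudo kubeadm upgrade apply v1.14.10

# Then, for each node: drain it, upgrade the kubelet, and bring it back.
$ kubectl drain runlevl42c.mylabserver.com --ignore-daemonsets
$ sudo apt-get install -y kubelet=1.14.10-00 kubectl=1.14.10-00
$ sudo systemctl restart kubelet
$ kubectl uncordon runlevl42c.mylabserver.com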
How I Passed the CKAD Exam
Updated 12/2020
Last year, I decided that since I'd been pushing my employer to embrace a Cloud-native mentality, I should lead by example and get certified. We had chosen Kubernetes as our base container management platform and ultimately inked a deal with Red Hat to roll out OpenShift throughout the enterprise.
If you've read the blog for any length of time, you'll remember that I was originally pursuing RHEL certification. At the time, even though I was in a development role, I found myself needing more and more Linux skills. My role then transitioned to a dedicated focus on Cloud technologies, so I abandoned RHEL and began focusing on Kubernetes. Even so, it wasn't until about November of this year that I really buckled down to prepare.
In the spirit of openness, I'll confess that I've had the benefit of spending the past several months deploying multiple OpenShift clusters in a dedicated fashion. This has involved a great deal of tweaking, testing, and troubleshooting, which has provided invaluable real-world, hands-on experience with Kubernetes. Now, with that out of the way, I thought I'd share my personal take on how I prepared…and passed…the CKAD exam.
Separate, But Equal
One of the things we noticed in our dev cluster at work during the initial stages of our OpenShift deployment is that while we were deliberately mucking with things, Pods would end up getting scheduled together on the same node. This didn't seem prudent to us since we were purposely killing nodes, and that kind of co-location could have a negative impact on our clients. This is what led me down the path of anti-affinity. I have to admit that I find the entire premise and behavior of the Kubernetes scheduler fascinating.
The goal of this post is to describe how to spread your workload amongst your nodes for greater fault tolerance. We'll discuss both hard and soft affinities. Unfortunately, I only have two worker nodes in my lab environment, so it'll be a little more challenging to demonstrate, but I think you'll get the gist.
Let’s say I just run a simple imperative command to create a quick Deployment with four NGINX replicas:
$ kubectl run no-affinity --replicas=4 --image=nginx --labels=app=no-affinity
deployment.apps/no-affinity created
I can see that the scheduler actually did a great job of evenly distributing the workload.
$ kubectl get po -o wide -l app=no-affinity
NAME                           READY   STATUS    RESTARTS   AGE   IP             NODE                         NOMINATED NODE   READINESS GATES
no-affinity-6dc48758bf-lwdd5   1/1     Running   0          23s   10.244.2.25    runlevl43c.mylabserver.com   <none>           <none>
no-affinity-6dc48758bf-nbgzl   1/1     Running   0          23s   10.244.1.111   runlevl42c.mylabserver.com   <none>           <none>
no-affinity-6dc48758bf-sx6mr   1/1     Running   0          23s   10.244.2.24    runlevl43c.mylabserver.com   <none>           <none>
no-affinity-6dc48758bf-xmq2r   1/1     Running   0          23s   10.244.1.112   runlevl42c.mylabserver.com   <none>           <none>
Let’s presume that I don’t want any identical pods running together. This is where hard affinity comes into play. Let’s look at a Deployment descriptor which defines hard affinity. Since our goal is to separate our Pods, rather than keep them together, we’re going to use anti-affinity.
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: affinity
  name: affinity
spec:
  replicas: 4
  selector:
    matchLabels:
      app: affinity
  template:
    metadata:
      labels:
        app: affinity
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - affinity
            topologyKey: kubernetes.io/hostname
      containers:
      - image: nginx
        name: affinity
In this example, we're telling the scheduler that we are going to require that any Pods whose label matches app=affinity cannot be scheduled together. Let's create the Deployment and see what happens.
$ kubectl apply -f req.yaml
$ kubectl get po -o wide -l app=affinity
NAME                      READY   STATUS    RESTARTS   AGE   IP             NODE                         NOMINATED NODE   READINESS GATES
affinity-dc6c5999-7n6bk   0/1     Pending   0          39s   <none>         <none>                       <none>           <none>
affinity-dc6c5999-brtsx   0/1     Pending   0          39s   <none>         <none>                       <none>           <none>
affinity-dc6c5999-gn4k8   1/1     Running   0          39s   10.244.2.26    runlevl43c.mylabserver.com   <none>           <none>
affinity-dc6c5999-mhjzt   1/1     Running   0          39s   10.244.1.113   runlevl42c.mylabserver.com   <none>           <none>
Notice that now only two of our four declared Pods are running. The first two listed are in a Pending state because the scheduler is adhering to our rule that the Pods cannot be scheduled together. That probably isn't what we really want; our aim is simply to have the scheduler do its best. Let's change the descriptor so that we tell the scheduler that we prefer this behavior, but that it isn't required.
apiVersion: apps/v1
kind: Deployment
metadata:
  creationTimestamp: null
  labels:
    app: affinity
  name: affinity
spec:
  replicas: 4
  selector:
    matchLabels:
      app: affinity
  strategy: {}
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: affinity
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              topologyKey: kubernetes.io/hostname
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - affinity
      containers:
      - image: nginx
        name: affinity
In this example, we're now saying that the anti-affinity pattern is preferred, not required. This is known as soft affinity. Note that the weight field [1] is required with preferred scheduling. If we delete the existing Deployment and re-create it with this new descriptor, we should see behavior similar to our initial example.
$ kubectl get po -o wide -l app=affinity
NAME                       READY   STATUS    RESTARTS   AGE   IP             NODE                         NOMINATED NODE   READINESS GATES
affinity-944d8c9f9-498hz   1/1     Running   0          10s   10.244.1.114   runlevl42c.mylabserver.com   <none>           <none>
affinity-944d8c9f9-jljmq   1/1     Running   0          10s   10.244.1.115   runlevl42c.mylabserver.com   <none>           <none>
affinity-944d8c9f9-m582c   1/1     Running   0          10s   10.244.2.27    runlevl43c.mylabserver.com   <none>           <none>
affinity-944d8c9f9-plmbf   1/1     Running   0          10s   10.244.2.28    runlevl43c.mylabserver.com   <none>           <none>
And this is what we see. Again, if you have more worker nodes available, it will be easier to see the Pods spread across them. Whether or not this would be a problem for us in production is TBD, but I plan on accounting for it in our deployments regardless, to err on the side of safety and resiliency.
[1] The weight field in preferredDuringSchedulingIgnoredDuringExecution is in the range 1-100. For each node that meets all of the scheduling requirements (resource request, RequiredDuringScheduling affinity expressions, etc.), the scheduler will compute a sum by iterating through the elements of this field and adding "weight" to the sum if the node matches the corresponding MatchExpressions. This score is then combined with the scores of other priority functions for the node. The node(s) with the highest total score are the most preferred. Source: Kubernetes documentation.
Executing Commands Against a Pod
I recently responded to another user in a Slack channel regarding this topic and thought I'd post it here as well. The discussion revolved around when you need to use the -- command demarcation with the kubectl exec command.
As for whether or not the -- is necessary, it depends on what you're doing. If you're not passing any arguments to the command, it's not needed.
Let's create a new pod based on busybox to test the theory:
$ kubectl run bb --image=busybox --restart=Never --command -- sleep 3600
If we want to list the directories with ls, we can run it without the separator:
$ kubectl exec bb ls
bin
dev
etc
home
proc
root
sys
tmp
usr
var
Let’s say we want to know more about the contents so we try to do a long listing. This will generate an error.
$ kubectl exec bb ls -al
Error: unknown shorthand flag: 'a' in -al
However, if we include the --, the command and argument(s) are handled properly and we get the desired results.
$ kubectl exec bb -- ls -al
total 44
drwxr-xr-x 1 root root 4096 Dec 8 21:17 .
drwxr-xr-x 1 root root 4096 Dec 8 21:17 ..
-rwxr-xr-x 1 root root 0 Dec 8 21:17 .dockerenv
drwxr-xr-x 2 root root 12288 Dec 2 20:12 bin
drwxr-xr-x 5 root root 360 Dec 8 21:17 dev
drwxr-xr-x 1 root root 4096 Dec 8 21:17 etc
drwxr-xr-x 2 nobody nogroup 4096 Dec 2 20:12 home
dr-xr-xr-x 238 root root 0 Dec 8 21:17 proc
drwx------ 2 root root 4096 Dec 2 20:12 root
dr-xr-xr-x 13 root root 0 Dec 8 21:16 sys
drwxrwxrwt 2 root root 4096 Dec 2 20:12 tmp
drwxr-xr-x 3 root root 4096 Dec 2 20:12 usr
drwxr-xr-x 1 root root 4096 Dec 8 21:17 var
Hopefully this will clear up any confusion.
CKAD Speed Tip
As I prepare for the Certified Kubernetes Application Developer exam, I’m looking for every possible way to shave time off the tasks at hand. One of the things that’s annoyed me when trying to move quickly from task to task is the delay in waiting for resources to be killed off.
One of the things you need to take into consideration is that Kubernetes Pods have a default termination grace period of 30 seconds.
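That default comes from the Pod spec itself, so if you control the manifest you can also lower it there. A minimal sketch (the 5-second value is just an illustration):

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  terminationGracePeriodSeconds: 5   # defaults to 30 when unset
  containers:
  - name: nginx
    image: nginx

For exam speed, though, the delete-time flags below are usually the quicker lever.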
Let's take a look at the time it takes (your numbers will vary) to kill your average pod. We'll use the time command to capture the numbers. In this example, I've created a simple pod running nginx. I'm also efficient and have kubectl aliased to k.
$ time k delete po nginx
pod "nginx" deleted

real    0m13.416s
user    0m0.080s
sys     0m0.008s
Now let's run the same command using the --now flag to tell kubectl that we want to set the grace period to 1.
$ time k delete po nginx --now
pod "nginx" deleted

real    0m2.012s
user    0m0.080s
sys     0m0.000s
In this example (and in several other tests), I was able to shave 10 or more seconds off each run. Now, if you're really in a hurry and don't care about waiting at all, you can take it one step further. Be forewarned (the command will warn you as well) that Kubernetes won't confirm that the pod was terminated, and it may continue running on the cluster. We can set the grace period to 0 and force the deletion.
$ time k delete po nginx --grace-period=0 --force=true
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "nginx" force deleted

real    0m0.094s
user    0m0.064s
sys     0m0.020s