Kubernetes – Set max pod (replica) limit per node

We have to allocate exactly N pods of service A per node. When a new pod of A (the N+1th) arrives, it cannot be scheduled due to lack of capacity, and new nodes are added by the Cluster Autoscaler.

We can find a similar use case in this issue on the Kubernetes GitHub repo: add a new predicate: max replicas limit per node · Issue #71930

From Kubernetes 1.16, it seems we can use Pod Topology Spread Constraints to solve this.

You can use topology spread constraints to control how Pods are spread across your cluster among failure-domains such as regions, zones, nodes, and other user-defined topology domains. This can help to achieve high availability as well as efficient resource utilization.
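On a cluster where this feature is available, a Deployment along these lines would spread the pods of service A evenly across nodes. Note that it keeps the pod count balanced rather than enforcing an absolute cap of N per node; the name service-a, the image, and the replica count below are placeholders:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-a
spec:
  replicas: 6
  selector:
    matchLabels:
      app: service-a
  template:
    metadata:
      labels:
        app: service-a
    spec:
      topologySpreadConstraints:
      - maxSkew: 1                          # max allowed difference in pod count between any two nodes
        topologyKey: kubernetes.io/hostname # treat each node as its own topology domain
        whenUnsatisfiable: DoNotSchedule    # keep the pod Pending rather than over-packing a node
        labelSelector:
          matchLabels:
            app: service-a                  # count only pods of service A
      containers:
      - name: service-a
        image: nginx                        # placeholder image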

Unfortunately, I am working on an old version of a Kubernetes cluster (v1.12), so this problem requires a workaround.

I found a workaround that uses the Kubernetes QoS class Guaranteed to implement it.

When Kubernetes creates a Pod it assigns one of these QoS classes to the Pod:

  • Guaranteed
  • Burstable
  • BestEffort

Kubernetes QoS class

For a Pod to be given a QoS class of Guaranteed, every container must have both a limit and (optionally) a request set for all resources (CPU and memory), and their values must be equal; if only the limit is set, the request defaults to it. These pods are the highest priority, so they are terminated only if they exceed their limits and there are no lower-priority pods left to evict.

Example: one t3.xlarge node has 4 CPUs and 16 GB of memory. To spread 3 pods per node, we set both the resource limits and requests to the same value of 1 CPU, leaving roughly 1 CPU for Kubernetes system pods and logging pods such as Fluentd. Since those reservations push the node's allocatable CPU below 4, a fourth 1-CPU pod no longer fits, so it stays Pending and the Cluster Autoscaler adds a new node.

Pod manifest

containers:
- name: pong
  resources:
    limits:
      cpu: 1
      memory: 1Gi
    requests:
      cpu: 1
      memory: 1Gi
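After the pod starts, you can check which QoS class Kubernetes assigned to it (assuming the pod is also named pong):

kubectl get pod pong -o jsonpath='{.status.qosClass}'
# Guaranteed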

This workaround works like a charm in my case.

[til] Prevent deleting running pods on Kubernetes

Image by opskumu via Github

Kubernetes sends the preStop event immediately before the Container is terminated. Kubernetes’ management of the Container blocks until the preStop handler completes, unless the Pod’s grace period expires. For more details, see Termination of Pods.

So we can add a preStop event handler to keep a running pod from being terminated before it finishes its work.

pods/prestop-test.yaml
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
  - name: lifecycle-demo-container
    image: nginx
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
      preStop:
        exec:
# Stop the Nginx process and wait for all of its processes to be killed
          command: ["/bin/sh","-c","nginx -s quit; while killall -0 nginx; do sleep 1; done"]

Alternatively, the preStop handler can simply wait until a running-job counter file is deleted, as in the sample below:

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "while [ -f \"/tmp/running-job-counter\" ]; do sleep 1; done"]

Container Lifecycle Hooks – Kubernetes

Attach Handlers to Container Lifecycle Events – Kubernetes