Improve Kubernetes Pod Availability and Resilience with Liveness Probe

Hazel Raoult

Could there be anything wrong with a pod that is identified as “running” by the Kubernetes controllers? Yes. There are instances when a pod is detected as running by the default Kubernetes automatic checks, yet the application within the pod is not functioning properly.

Scenarios like these are not uncommon and should be addressed as soon as possible. They are not that different from pod or container failures, since a malfunctioning application affects an organization’s operations just the same.

What’s great about Kubernetes is that an effective solution for situations like these already exists; users simply need to learn and master it.

The health probe mechanism


Through its standard controllers, namely Deployment, DaemonSet, and StatefulSet, Kubernetes automatically checks whether a pod is running. If it is not, Kubernetes forcibly restarts it according to the restart policy configured in the pod. The drawback of this default behavior, however, is that it cannot detect anomalies in which a pod appears to be working but the application inside it is not.

This is where the health probe mechanism comes in. Kubernetes allows health probes to be implemented for a more granular evaluation of a pod’s status, making it possible to check whether the health state reported by the pod actually reflects the state of the application running inside it.

With the health probe mechanism implemented, Kubernetes continuously keeps track of the health state of pods to establish patterns and make better decisions regarding load balancing and traffic routing. Three types of Kubernetes health probes can be implemented: the liveness, readiness, and startup probes. For this discussion, the focus is on the liveness probe, particularly its role in improving pod availability and resilience.

 

The Kubernetes liveness probe


The liveness probe determines whether a container is operational. If the probe succeeds, no issues were found: the container is working as intended, no event is logged, and no action is taken. If the probe fails, the container is terminated and restarted.

Like the other Kubernetes probes, the liveness probe is managed by the kubelet, so it is the kubelet that terminates and restarts a container found to have issues. Note, however, that the pod’s “restartPolicy” must be set to “Always” or “OnFailure” for the kubelet to restart a container when an error is detected, as in the sketch below.
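
As a minimal sketch of how this fits together (the pod name, image, file path, and timing values below are illustrative placeholders, not taken from the article), a liveness probe is declared in the pod spec and the kubelet runs it against the container:

```yaml
# Illustrative pod spec with an exec-based liveness probe.
# Image name, command, and timing values are hypothetical placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: liveness-demo
spec:
  restartPolicy: Always                     # Always or OnFailure so the kubelet restarts a failed container
  containers:
    - name: app
      image: registry.example.com/app:1.0   # placeholder image
      livenessProbe:
        exec:
          command: ["cat", "/tmp/healthy"]  # succeeds only while the app keeps this file present
        initialDelaySeconds: 5              # wait before the first check
        periodSeconds: 10                   # how often the kubelet probes
```

If the probed command returns a non-zero exit code, the kubelet treats the container as unhealthy and restarts it according to the restart policy.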

It is advisable to use the liveness probe when the admin cannot be sure that a container will crash on its own in the event of a significant failure. As mentioned, there are instances when pods appear normal even though the applications inside them are malfunctioning. The liveness probe gives the kubelet more granular detail about the application in a container, so it can determine more accurately whether a pod is truly operational. A running pod with a malfunctioning application is, after all, not a normally operating pod.

The liveness probe is unnecessary when there is reasonable assurance that the container will crash whenever errors come up. For example, if the application is configured to crash when it encounters errors, it is safe to assume that Kubernetes will detect the problem and terminate and restart the pod accordingly; a liveness probe would be redundant in these cases.

 

Maximizing the liveness probe function

Liveness probes play a significant role in enhancing pod resilience and availability. It is important to define the probe properly with the help of the following best practices.

  • Using the liveness probe for apps with varying startup times – Applications with fluctuating or irregular startup times can be troublesome for the default Kubernetes checks, which cannot tell whether such an application is still starting up or has genuinely failed. For these applications, a liveness probe, tuned with an appropriate initial delay or paired with a startup probe, is more suitable.
  • More than one probe on the same endpoint – It is possible to use the liveness and readiness probes on the same endpoint, but they should perform different functions: the readiness probe should determine whether the container is ready to receive traffic, while the liveness probe checks whether the container is still healthy. The liveness and startup probes can also share an endpoint; in that case, the startup probe’s “failureThreshold” value should be higher than the liveness probe’s to accommodate long startup times (see the sketch after this list).
  • Preference for lightness – Make the liveness probe as light as possible. Steer clear of expensive operations so that checks execute quickly and efficiently, and take advantage of the lightweight nature of HTTP probes. Even if the app being probed is not an HTTP server, a small HTTP server can be embedded in the app just to answer the liveness probe. When Kubernetes pings that path, any HTTP response code in the 200–399 range indicates that the app has no issues.
  • Ensuring command independence – Make sure the target of the probe’s command is independent of the main application, so the command can run and complete even when the application fails or suffers issues. A probe served by the main application entry point can produce inaccurate results, for example when required external dependencies are unavailable or the framework fails to start as expected.
  • Execution consistency and timing – A liveness probe that runs too often wastes resources; it is inefficient and can defeat the purpose of running the probe at all. Set execution intervals that match the circumstances of the system. There are no universal benchmarks or rules of thumb: to determine the best interval, delay, and timeout values, observe real-world workloads and derive the values from them rather than relying on the defaults supplied by Kubernetes.
  • Updating probe configurations – Probes need to change as the system changes. New features, regressions, and optimization measures can all affect how the liveness probe contributes to pod resilience, so it is advisable to revisit the probe’s settings whenever the system changes.
  • Not for all – It is not necessary to have liveness probes for all containers. Relatively simple containers configured to terminate whenever errors or failures arise do not require a liveness probe. Similarly, it is not necessary to use the probe on low-priority services. Doing so would be inefficient, especially when complex commands are used to achieve accurate health readings.
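
The following sketch illustrates several of the practices above in one place (the /healthz path, port numbers, and threshold values are illustrative assumptions, not prescriptions): a liveness probe and a startup probe share a single lightweight HTTP endpoint, the startup probe gets a much higher failureThreshold to tolerate long startups, and the timing fields are set explicitly rather than left at the defaults:

```yaml
# Illustrative container snippet: liveness and startup probes on the same
# lightweight HTTP endpoint. Path, port, and numeric values are placeholders.
containers:
  - name: app
    image: registry.example.com/app:1.0      # placeholder image
    ports:
      - containerPort: 8080
    startupProbe:
      httpGet:
        path: /healthz                        # any 2xx or 3xx response counts as success
        port: 8080
      failureThreshold: 30                    # higher threshold to tolerate slow startups
      periodSeconds: 10                       # allows roughly 300 s before giving up
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      periodSeconds: 10                       # tune to the workload, not the defaults
      timeoutSeconds: 1
      failureThreshold: 3                     # restart after three consecutive failures
```

Until the startup probe succeeds, Kubernetes holds off the liveness check; once it does, the liveness probe’s tighter failureThreshold and periodSeconds apply, so these numbers should be tuned against observed startup and response times.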

 

The takeaway


Liveness probes are highly useful. They address the possible disconnect between the actual state of the applications inside pods and Kubernetes’ somewhat superficial view of pod state. Used improperly, however, they become a source of inefficiency. It is advisable to adopt the liveness probe mechanism, but only after becoming proficient in using it. That proficiency can be reached faster with specialized Kubernetes tools that guide probe configuration, usage, and updating with the help of change intelligence, in-depth visibility, integration, and seamless notifications.