Is it guaranteed that user pods always receive a SIGTERM and have up to 15s for graceful shutdown, even on preemptible node shutdowns?
No, it is not guaranteed that user pods will always receive a `SIGTERM` signal or be given the full 15-second graceful shutdown window during preemptible node shutdowns in GKE. The GKE documentation describes the 15-second termination period for non-system Pods as best-effort. When a preemptible node is shut down, the kubelet tries to send a `SIGTERM` to non-system pods and allows them up to 15 seconds to shut down, followed by another 15 seconds for system pods with the `system-cluster-critical` or `system-node-critical` priority classes. However, this process is not guaranteed, particularly in situations involving resource constraints, node overload, or rapid VM termination.
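Because the window is best-effort, it helps to design containers so they can shut down well inside 15 seconds. Below is a minimal sketch of a container entrypoint that traps `SIGTERM` and exits quickly; the script name, messages, and timings are illustrative assumptions, not something prescribed by GKE.

```sh
#!/bin/sh
# entrypoint.sh (illustrative): handle SIGTERM quickly, because on a
# preemptible GKE node the pod may get far less than 15 seconds to stop.

cleanup() {
  echo "SIGTERM received, flushing state and exiting"
  # Keep this to fast, essential work only (flush buffers, close connections).
  exit 0
}

trap cleanup TERM

echo "worker started"
# Sleep in short intervals so the TERM trap runs promptly instead of
# waiting on a long-running foreground command.
while true; do
  sleep 1
done
```

Keeping the shutdown path short matters more than raising `terminationGracePeriodSeconds`, because the node-level shutdown window caps whatever grace period the Pod requests.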
Are there any known scenarios where this best-effort period is skipped or shortened (e.g. under load, node problems, shutdown method)?
Yes, there are scenarios where the best-effort 15-second graceful termination period may be shortened or skipped. These include:

- Compute Engine enforces a 30-second window for preemptible VM termination. If the `kubelet`'s graceful shutdown process (such as `SIGTERM` delivery and pod cleanup) takes longer than expected, the VM may be forcibly terminated before all pods finish their graceful shutdown.
- System pod prioritization: pods with the `system-cluster-critical` or `system-node-critical` priority classes are given priority during the second 15-second window of the 30-second shutdown period. If these pods consume significant resources or time, non-system pods may receive less than the intended 15 seconds.
- If the node is under heavy CPU or memory pressure from pods or system processes, the `kubelet` may struggle to process pod terminations promptly.
- If the `kubelet` is overloaded, misconfigured, or crashes during the shutdown process, `SIGTERM` delivery may be skipped entirely. This can happen due to bugs, misconfiguration, or resource exhaustion.
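If you want to observe how your workloads behave under these constraints, one approach is to simulate a maintenance event on the preemptible node's underlying VM, which preempts it and exercises the same roughly 30-second shutdown path. This is a sketch; the node name and zone are placeholders, and it assumes the node pool uses preemptible or Spot VMs.

```sh
# GKE node names match their Compute Engine instance names.
NODE_NAME="gke-my-cluster-pool-1-abcdef12-xyz1"   # placeholder
ZONE="us-central1-a"                              # placeholder

# Simulating a maintenance event on a preemptible/Spot instance preempts it,
# letting you watch the kubelet's best-effort graceful shutdown end to end.
gcloud compute instances simulate-maintenance-event "${NODE_NAME}" --zone="${ZONE}"

# In another terminal, watch pod events while the node shuts down.
kubectl get events --watch --field-selector involvedObject.kind=Pod
```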
How can I diagnose whether the kubelet failed to deliver the SIGTERM or the Pod didn’t get time to shut down?
You can check the pod and node events. Use `kubectl describe pod <pod-name>` or `kubectl describe node <node-name>` to inspect events related to the pod's termination.
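For example (the pod, namespace, and node names below are placeholders):

```sh
# The Events section shows messages such as Killing or NodeNotReady around
# the shutdown, with timestamps you can compare against the preemption time.
kubectl describe pod my-app-7d9c6b5f4-abcde -n my-namespace
kubectl describe node gke-my-cluster-pool-1-abcdef12-xyz1

# Or list only the events for that pod, sorted by time.
kubectl get events -n my-namespace \
  --field-selector involvedObject.name=my-app-7d9c6b5f4-abcde \
  --sort-by=.lastTimestamp
```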
You can inspect the kubelet logs on the affected node (if it is still available) to check for errors or warnings during the shutdown process. Look for messages about `SIGTERM` delivery and pod termination.
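If the node still exists, you can SSH to it and read the kubelet's journal directly; this assumes a standard GKE node image (Container-Optimized OS or Ubuntu) where the kubelet runs under systemd, and the instance name and zone are placeholders.

```sh
# SSH to the node's underlying Compute Engine instance.
gcloud compute ssh gke-my-cluster-pool-1-abcdef12-xyz1 --zone=us-central1-a

# On the node: show kubelet log lines from around the shutdown window.
sudo journalctl -u kubelet --since "30 minutes ago" | grep -iE "sigterm|shutdown|killing"
```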
Check the GKE node logs or container runtime logs (in Cloud Logging, if cluster system logging is enabled).
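Because a preempted node disappears quickly, Cloud Logging is often the only place its logs survive. Here is a sketch of a query, assuming system logging is enabled for the cluster and that the kubelet and container runtime write to the `kubelet` and `container-runtime` log IDs (project ID and node name are placeholders):

```sh
# Pull kubelet and container-runtime logs for one node from Cloud Logging;
# these persist after the preempted VM itself is gone.
gcloud logging read '
  resource.type="k8s_node"
  resource.labels.node_name="gke-my-cluster-pool-1-abcdef12-xyz1"
  (log_id("kubelet") OR log_id("container-runtime"))
' --project=my-project --freshness=1h --limit=200
```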
Monitor node preemption events so you can correlate VM preemptions with pod terminations.
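To confirm that the node's VM really was preempted, and when, you can list the Compute Engine operations for the project and match the timestamps against the pod's termination events:

```sh
# Preemptions show up as compute.instances.preempted operations.
gcloud compute operations list \
  --filter="operationType=compute.instances.preempted" \
  --format="table(name, targetLink, insertTime)"
```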
For further information and reference, you can refer to the following documentation: