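As already mentioned in this topic, a process that runs with PID 1 in its own pid namespace gets a special behaviour for signals such as SIGINT and SIGTERM: as long as it has not installed a handler for them, they are simply ignored (only SIGKILL and SIGSTOP sent from an ancestor pid namespace are forcibly delivered).
This is exactly what happens when running a Docker container, but the behaviour is not specific to Docker.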
For example, run this command in a shell as root:
# unshare --pid --fork --mount-proc sleep infinity
This runs a sleep infinity command in its own pid namespace. You can verify this by running the lsns command in another shell:
# lsns
NS TYPE NPROCS PID USER COMMAND
4026532363 pid 1 292 root sleep infinity
If you try to send a SIGINT to this process (with Ctrl+C in the first shell, or with the kill -s SIGINT <PID> command in the second shell), it has no effect.
To get rid of this process, you have to hard-kill it with the kill -s SIGKILL <PID> command in the second shell.
You can check that this process runs with PID 1 in its pid namespace by running the ps command the same way:
# unshare --pid --fork --mount-proc ps
PID TTY TIME CMD
1 pts/0 00:00:00 ps
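Whether PID 1 reacts to such a signal depends on whether it has installed a handler for it: the kernel only delivers it to a namespace's init process in that case (SIGKILL and SIGSTOP from an ancestor namespace excepted). On Linux, the signals a process catches appear as a hexadecimal bitmask on the SigCgt line of /proc/<PID>/status. The has_handler function below is a hypothetical helper, not part of any tool. It tests one bit of that mask and is only a sketch, limited to the standard signals 1 to 32.

```shell
#!/bin/bash
# Hypothetical helper: has_handler PID SIGNUM
# Succeeds if process PID has installed a handler for signal number SIGNUM,
# by testing bit (SIGNUM - 1) of the SigCgt mask in /proc/PID/status.
# Linux-specific sketch, limited to the standard signals 1..32.
has_handler() {
  local pid=$1 signum=$2 key value mask=""
  while read -r key value; do
    if [ "$key" = "SigCgt:" ]; then mask=$value; fi
  done < "/proc/$pid/status" || return 2
  [ -n "$mask" ] || return 2
  mask=${mask: -8}                      # the low 32 bits are enough here
  (( (16#$mask >> (signum - 1)) & 1 ))
}

# Usage: does PID 1 (of the current pid namespace) catch SIGTERM (15)?
if has_handler 1 15; then
  echo "PID 1 handles SIGTERM"
else
  echo "PID 1 would ignore SIGTERM"
fi
```

A plain sleep installs no handler, so has_handler reports nothing for it. A bash script that ran trap ... TERM does get the SIGTERM bit set.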
You can observe essentially the same thing with a Docker container:
# docker run -d --rm --name ubuntu ubuntu sleep infinity
d13fc1da3609407332c511f68d5b0513b31fa55df2e9b545044f53bfd0b2dc4b
# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d13fc1da3609 docker.io/library/ubuntu:latest sleep infinity 2 seconds ago Up 3 seconds ubuntu
# lsns
4026532384 pid 1 1062 root sleep infinity
Trying to kill the sleep infinity process with SIGHUP or SIGTERM has no effect, and only SIGKILL gets rid of it. This is the same behaviour as previously explained, because this process runs with PID 1 in its own pid namespace:
# docker exec ubuntu ps x
PID TTY STAT TIME COMMAND
1 ? Ss 0:00 sleep infinity
2 ? R 0:00 ps x
What does docker stop do?
Without any extra options, it sends a SIGTERM to the process running with PID 1 in the container's pid namespace. If the process is still running after a 10-second timeout, it sends a SIGKILL.
This is why a container whose main process does not handle signals properly is slow to stop: the first signal (SIGTERM) is ignored, the second one (SIGKILL) is not.
Documentation: Docker stop docs
You can verify this with the following commands:
# TIMEFORMAT="==> Execution time = %Rs"
# time docker stop ubuntu
ubuntu
==> Execution time = 10.518s
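The default docker stop sequence described above (SIGTERM, a grace period, then SIGKILL) can be sketched for an ordinary process with a few lines of bash. graceful_stop is a hypothetical name, and this is not Docker's implementation. It only mimics the observable behaviour, polling every 0.1 s.

```shell
#!/bin/bash
# Hypothetical sketch of docker stop's default sequence (not Docker's code):
# send SIGTERM, wait up to TIMEOUT seconds, then send SIGKILL.
# Usage: graceful_stop PID [TIMEOUT]   (TIMEOUT defaults to 10, like docker stop)
# Returns 0 if the process exited after SIGTERM, 1 if SIGKILL was needed.
graceful_stop() {
  local pid=$1 timeout=${2:-10} ticks=0
  kill -s TERM "$pid" 2>/dev/null || return 0   # already gone
  while kill -0 "$pid" 2>/dev/null; do
    if (( ticks >= timeout * 10 )); then
      kill -s KILL "$pid" 2>/dev/null || true
      return 1
    fi
    sleep 0.1
    ticks=$(( ticks + 1 ))
  done
  return 0
}
```

A process that ignores SIGTERM, like the sleep infinity running as PID 1 above, always ends up on the SIGKILL branch. That is why the stop takes the full timeout.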
The simplest fix is to use the --init option when creating the container. Docker then adds a small init binary, built from the tini project on GitHub, to the new container. It runs that binary with PID 1 in the container's pid namespace and has it fork and run the container's command as a child process.
Running the same commands as before shows this:
# docker run --init -d --rm --name ubuntu ubuntu sleep infinity
27fc4026c264f48c8ee148796f77e7705411691845e4267467b5bc9f2aba609a
# docker exec ubuntu ps x
PID TTY STAT TIME COMMAND
1 ? Ss 0:00 /sbin/docker-init -- sleep infinity
7 ? S 0:00 sleep infinity
8 ? Rs 0:00 ps x
A simple docker stop is now very quick. This shows that the SIGTERM signal is handled by the docker-init process, which forwards it to its child process and exits gracefully once the child has terminated.
# time docker stop ubuntu
ubuntu
==> Execution time = 0.501s
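The forwarding part of what docker-init does can be illustrated with a few lines of bash. This is a rough, hypothetical sketch: forward_signals is not a real tool, and this is not how tini is implemented. It relies on the fact that bash's wait builtin returns as soon as a trapped signal arrives, unlike a foreground command. It also makes no attempt to handle orphaned processes the way a real init should.

```shell
#!/bin/bash
# Hypothetical sketch (not tini's code): run "$@" as a child process and
# forward SIGTERM/SIGINT to it, roughly like docker-init does.
forward_signals() {
  "$@" &                                   # the real workload, as a child
  fs_child=$!
  # Background jobs of a non-interactive shell start with SIGINT ignored,
  # so both signals are forwarded to the child as SIGTERM.
  trap 'kill -s TERM "$fs_child" 2>/dev/null' TERM INT
  local status=0
  # wait returns early (status > 128) when a trapped signal arrives, so
  # keep waiting until the child has really exited.
  wait "$fs_child" || status=$?
  while kill -0 "$fs_child" 2>/dev/null; do
    wait "$fs_child" && status=0 || status=$?
  done
  return "$status"
}

# Usage, e.g. as the last line of an entrypoint script:
#   forward_signals sleep infinity
```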
What if you don't want to use the docker --init option?
Then you have to make sure that your init process installs its own signal handlers. If you only plan to run a simple sleep infinity command in your container, you can wrap it in a bash script that runs the trap command first.
BUT while bash runs sleep as a foreground command, it waits for sleep to finish before running any trap handler. With exec sleep, bash is even replaced by sleep, so its traps are gone altogether. As a consequence, the trap command becomes ineffective.
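The deferred trap can be observed without Docker. In this small demonstration (not from the original answer), a child bash receives SIGTERM while sleep 3 runs in the foreground, and its trap only runs once sleep has finished.

```shell
#!/bin/bash
# A TERM trap is deferred while bash waits for a foreground external
# command: here the child shell only reacts once "sleep 3" has finished.
script='trap "echo trapped; exit 0" TERM; sleep 3; echo finished'
bash -c "$script" &
pid=$!
sleep 0.5                      # give the child shell time to set its trap
kill -s TERM "$pid"
sleep 1
if kill -0 "$pid" 2>/dev/null; then
  still_running=yes            # SIGTERM received but not acted upon yet
else
  still_running=no
fi
echo "still running 1s after SIGTERM: $still_running"
wait "$pid"                    # the child prints "trapped" once sleep ends
status=$?
echo "exit status: $status"
```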
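A workaround is to wait with a bash builtin that stays responsive to trapped signals, such as read, reading from a unix pipe created with mkfifo and opened in read/write mode. On Linux, opening the pipe read/write does not block. Since the script itself holds the write end, read never sees end-of-file and simply waits.
Note that once a file descriptor is open on this pipe, you can even delete the pipe file. The descriptor keeps the pipe alive, so read keeps waiting without polluting your container with unnecessary pipe files.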
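The following sketch is an illustration only, and it is Linux-specific because POSIX leaves opening a FIFO read/write undefined. It shows that the pipe remains usable through its file descriptor after the file is deleted, and that a read on it waits when no data is available.

```shell
#!/bin/bash
# The FIFO stays usable through fd 3 after its file has been deleted.
tmpdir="$(mktemp -d)"
mkfifo "$tmpdir/pipe"
exec 3<>"$tmpdir/pipe"        # read/write open: does not block on Linux
rm -r "$tmpdir"               # no pipe file is left on disk
echo "hello" >&3              # this process also holds the write end...
read -r -u3 first             # ...so read returns the line just written
echo "read back: $first"
if ! read -r -t 1 -u3 rest; then
  echo "no more data: read would block forever without -t"
fi
exec 3>&-                     # close the descriptor
```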
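Here is an example:
#!/bin/bash
# Exit cleanly on SIGINT/SIGTERM (PID 1 ignores them without a handler)
trap "exit 0" SIGINT SIGTERM
# Create a private FIFO and open it read/write on fd 3
tmpdir="$(mktemp -d)"
mkfifo "$tmpdir/pipe"
exec 3<>"$tmpdir/pipe"
# The open descriptor keeps the pipe alive, so the file can be removed
rm -r "$tmpdir"
# Wait forever: no data arrives on fd 3, but trapped signals interrupt read
read -u3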
Put this content in a scripts/run.sh file on your Docker host, and don't forget to make it executable with chmod +x.
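Before involving Docker, you can check the script's signal handling locally. This sketch writes the same script to a temporary directory (instead of scripts/run.sh), starts it in the background, sends SIGTERM and measures how long it takes to exit. The script is not PID 1 here, but the point being tested is the same: the trap runs while bash is blocked in read.

```shell
#!/bin/bash
# Local check of the run.sh signal handling, outside Docker.
workdir="$(mktemp -d)"
cat > "$workdir/run.sh" <<'EOF'
#!/bin/bash
trap "exit 0" SIGINT SIGTERM
tmpdir="$(mktemp -d)"
mkfifo "$tmpdir/pipe"
exec 3<>"$tmpdir/pipe"
rm -r "$tmpdir"
read -u3
EOF
chmod +x "$workdir/run.sh"

"$workdir/run.sh" &
pid=$!
sleep 0.5                 # let the script reach the blocking read
start=$SECONDS
kill -s TERM "$pid"
wait "$pid"
status=$?
elapsed=$(( SECONDS - start ))
echo "exit status: $status, stopped within ${elapsed}s"
rm -r "$workdir"
```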
Now let's run the whole bunch of commands mentioned previously, using this script as the "init" program, with PID 1 in the container.
# docker run -d --rm -v "$PWD/scripts:/scripts" --name ubuntu ubuntu /scripts/run.sh
8d947443ae6eaf0093378ffb4480c3a67ea221ff240bab251d9f92c9216385f6
# lsns
4026532384 pid 1 2551 root /bin/bash /scripts/run.sh
# TIMEFORMAT="==> Execution time = %Rs"
# time docker stop ubuntu
ubuntu
==> Execution time = 0.441s
Here's a quick docker stop without the --init option, with bash mimicking the sleep command and providing the signal handling needed to stop without a hard kill. :-)
Short answer: not a good idea. On a Linux system, the init process (PID 1 in its pid namespace) is responsible for reaping zombie processes: orphaned processes are re-parented to it, and once they exit, only it can collect their exit status. Of course, the minimal bash script above does not do this. More information about zombie processes: this link
You can spawn a zombie process that lives for about 100 seconds by adding the (sleep 1 & exec sleep 101) & command before the read command in the previous bash script, and display it with docker exec ubuntu ps fx.
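You can also spot such a zombie without ps, directly from /proc. This Linux-specific sketch reuses the same (sleep 1 & exec sleep ...) trick with a shorter delay. It lists processes whose parent is the exec'ed sleep and whose state, the third field of /proc/<PID>/stat, is Z. It assumes command names without spaces when splitting that line.

```shell
#!/bin/bash
# Spawn a short-lived zombie with the same trick, then find it in /proc.
# In /proc/PID/stat, field 3 is the state and field 4 the parent PID.
(sleep 1 & exec sleep 5) &
parent=$!                 # after exec, this PID runs "sleep 5"
sleep 2                   # "sleep 1" has exited, but its parent never waits
zombies=""
for stat in /proc/[0-9]*/stat; do
  { read -r pid comm state ppid _ < "$stat"; } 2>/dev/null || continue
  if [ "$ppid" = "$parent" ] && [ "$state" = Z ]; then
    zombies+="$pid $comm"$'\n'
    echo "zombie: PID $pid $comm, parent $ppid"
  fi
done
# When "sleep 5" dies, the zombie is re-parented to PID 1 (or a subreaper),
# which is then expected to reap it.
kill "$parent"
wait "$parent" 2>/dev/null || true
```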
Your init process in your container must handle signals properly and reap zombie processes. The --init
option in the docker command line ensures that.