Suppose we write a script that spawns one or more long-running processes. If the script receives a signal such as SIGINT or SIGTERM, we probably want its children to be terminated too (normally, when the parent dies, the children survive). We may also want to perform some cleanup tasks before the script itself exits. To reach this goal, we must first learn about process groups and how to execute a process in the background.
In this tutorial you will learn:
- What is a process group
- The difference between foreground and background processes
- How to execute a program in the background
- How to use the shell wait built-in to wait for a process executed in the background
- How to terminate child processes when the parent receives a signal
Software requirements and conventions used
| Category | Requirements, Conventions or Software Version Used |
|---|---|
| System | Distribution independent |
| Software | No specific software needed |
| Other | None |
| Conventions | # – requires given linux commands to be executed with root privileges, either directly as the root user or by use of the sudo command; $ – requires given linux commands to be executed as a regular non-privileged user |
A simple example
Let’s create a very simple script and simulate the launch of a long running process:
#!/bin/bash
trap "echo signal received!" SIGINT
echo "The script pid is $$"
sleep 30
The first thing we did in the script was to create a trap to catch SIGINT and print a message when the signal is received. We then made our script print its pid, which we can get by expanding the $$ variable. Next, we executed the sleep command to simulate a long-running process (30 seconds).
We save the code inside a file (say it is called test.sh), make it executable, and launch it from a terminal emulator. We obtain the following result:
The script pid is 101248
If we are focused on the terminal emulator and press CTRL+C while the script is running, a SIGINT signal is sent and handled by our trap:
The script pid is 101248
^Csignal received!
Although the trap handled the signal as expected, the script was interrupted anyway. Why did this happen? Furthermore, if we send a SIGINT signal to the script using the kill command, the result we obtain is quite different: the trap is not executed immediately, and the script goes on until the child process exits (after 30 seconds of “sleeping”). Why this difference? Let’s see…
Process groups, foreground and background jobs
Before we answer the questions above, we must better grasp the concept of process group.
A process group is a group of processes which share the same pgid (process group id). When a member of a process group creates a child process, that process becomes a member of the same process group. Each process group has a leader; we can easily recognize it because its pid and its pgid are the same.
We can visualize pid and pgid of running processes using the ps command. The output of the command can be customized so that only the fields we are interested in are displayed: in this case CMD, PID and PGID. We do this by using the -o option, providing a comma-separated list of fields as argument:
$ ps -a -o pid,pgid,cmd
If we run the command while our script is running, the relevant part of the output is the following:
   PID   PGID CMD
298349 298349 /bin/bash ./test.sh
298350 298349 sleep 30
We can clearly see two processes: the pid of the first one is 298349, same as its pgid: this is the process group leader. It was created when we launched the script as you can see in the CMD column.
This main process launched a child process with the command sleep 30: as expected the two processes are in the same process group.
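As a quick check of the rule that a child joins the process group of its parent, the following sketch starts a background sleep and prints the pid and pgid of both processes. It assumes a ps implementation that accepts -o pgid= and -p (procps on Linux does), and that it runs as a non-interactive script, where job control is off. The variable names are only illustrative.

```shell
#!/bin/bash
# Sketch: compare the process group of this script with that of a background child.
sleep 5 &
child_pid=$!

# "-o pgid=" prints the column without a header; tr removes the padding spaces.
script_pgid=$(ps -o pgid= -p "$$" | tr -d ' ')
child_pgid=$(ps -o pgid= -p "$child_pid" | tr -d ' ')

echo "script: pid=$$ pgid=$script_pgid"
echo "child:  pid=$child_pid pgid=$child_pgid"

# Stop the child so the sketch does not leave a stray process behind.
kill "$child_pid"
wait "$child_pid" 2>/dev/null || true
```

Both lines should report the same pgid, even though the two pids differ.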
When we pressed CTRL-C while focusing on the terminal from which the script was launched, the signal was not sent only to the parent process, but to the entire process group. Which process group? The foreground process group of the terminal. All the processes that are members of this group are called foreground processes; all the others are called background processes. The Bash manual describes this distinction in its section on job control.
When we sent the SIGINT signal with the kill command, instead, we targeted only the pid of the parent process; Bash exhibits a specific behavior when a signal is received while it’s waiting for a program to complete: the “trap code” for that signal is not executed until that process has finished. This is why the “signal received” message was displayed only after the sleep command exited.
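The deferred execution of the trap can be observed with a small experiment. In this sketch the script signals itself from a helper subshell after about one second, while a three-second sleep runs in the foreground. SIGUSR1 is used instead of SIGINT only to keep the experiment independent of how the script is launched; the helper subshell and the trap_time variable are illustrative assumptions, and SECONDS is a Bash-specific variable.

```shell
#!/bin/bash
# Sketch: a trap is deferred while Bash waits for a foreground command.
trap 'trap_time=$SECONDS; echo "trap ran after ${trap_time}s"' USR1

SECONDS=0                          # Bash counter of elapsed seconds
( sleep 1; kill -USR1 "$$" ) &     # signal this script after about one second
sleep 3                            # foreground command: the trap must wait for it
wait                               # reap the helper subshell
trap - USR1                        # restore the default disposition
```

Although the signal is delivered after roughly one second, the message is printed only once the foreground sleep has finished.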
To replicate what happens when we press CTRL-C in the terminal using the kill command to send the signal, we must target the process group. We can send a signal to a process group by passing the negated pid of the process group leader, so, supposing the pid of the leader is 298349 (as in the previous example), we would run:
$ kill -2 -298349
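To try this without a second terminal, the sketch below creates a separate process group and then signals it through its negated pgid. It relies on one assumption not covered so far: enabling job control with set -m, which makes Bash place each background job in its own process group, so that the pid stored in $! is also the pgid. The leader and status names are illustrative.

```shell
#!/bin/bash
# Sketch: signal a whole process group through its negated pgid.
set -m    # assumption: with job control on, each background job gets its own group

# The inner shell starts one sleep in the background and one in the foreground;
# all of these processes share the process group led by the inner shell.
bash -c 'sleep 30 & sleep 30' &
leader=$!                     # pid of the group leader, hence also the pgid

sleep 1                       # give the inner shell time to start its children
kill -TERM -- "-$leader"      # "--" ends option parsing before the negative id

status=0
wait "$leader" || status=$?
echo "group leader exited with status $status"   # 128 + 15 when killed by SIGTERM
```

A single kill invocation reaches the inner shell and both of its sleep commands.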
Manage signal propagation from inside a script
Now, suppose we launch a long-running script from a non-interactive shell, and we want the script to manage signal propagation automatically, so that when it receives a signal such as SIGINT or SIGTERM it terminates its potentially long-running child, possibly performing some cleanup tasks before exiting. How can we do this?
Like we did previously, we can handle the situation in which a signal is received in a trap; however, as we saw, if a signal is received while the shell is waiting for a program to complete, the “trap code” is executed only after the child process exits.
This is not what we want: we want the trap code to be processed as soon as the parent process receives the signal. To achieve our goal, we must execute the child process in the background: we can do this by placing the & symbol after the command. In our case we would write:
#!/bin/bash
trap 'echo signal received!' SIGINT
echo "The script pid is $$"
sleep 30 &
If we left the script this way, the parent process would exit right after launching the sleep 30 command, leaving us without the chance to perform cleanup tasks after it ends or is interrupted. We can solve this problem by using the shell wait built-in which, according to its help page (help wait), waits for job completion and returns its exit status.
After we set a process to be executed in the background, we can retrieve its pid in the $! variable. We can pass it as an argument to wait to make the parent process wait for its child:
#!/bin/bash
trap 'echo signal received!' SIGINT
echo "The script pid is $$"
sleep 30 &
wait $!
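Besides keeping the parent alive, wait also reports how the child ended. The following minimal sketch, in which a subshell stands in for a real long-running program, captures the exit status that wait returns; the child_pid and status names are illustrative.

```shell
#!/bin/bash
# Sketch: wait returns the exit status of the process it waited for.
( sleep 1; exit 3 ) &       # stand-in for a long-running child that fails with status 3
child_pid=$!

status=0
wait "$child_pid" || status=$?
echo "child $child_pid exited with status $status"
```

Here the reported status is 3, the value passed to exit inside the subshell.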
Are we done? No, there is still a problem: the reception of a signal handled by a trap inside the script causes the wait built-in to return immediately, without actually waiting for the termination of the command in the background. This behavior is documented in the Bash manual, which specifies that in this case wait returns with an exit status greater than 128, immediately after which the trap is executed.
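The early return of wait can be verified with a short experiment. In this sketch the script signals itself from a helper subshell while it waits for a five-second sleep, then checks whether the child is still alive. SIGUSR1, the helper subshell and the variable names are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch: a trapped signal makes wait return before the child has finished.
trap 'echo "trap ran"' USR1

sleep 5 &
child_pid=$!
( sleep 1; kill -USR1 "$$" ) &     # signal this script after about one second

status=0
wait "$child_pid" || status=$?     # interrupted by the trapped signal
echo "wait returned $status"

# kill -0 sends no signal: it only checks that the process still exists.
if kill -0 "$child_pid" 2>/dev/null; then
    child_alive=yes
else
    child_alive=no
fi
echo "child still running: $child_alive"

kill "$child_pid"                  # clean up the child and the helper
wait
```

The status returned by wait is greater than 128, and the child is still running when wait returns.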
To solve this problem we have to use wait again, perhaps as part of the trap itself. Here is what our script could look like in the end:
#!/bin/bash
cleanup() {
    echo "cleaning up..."
    # Our cleanup code goes here
}
trap 'echo signal received!; kill "${child_pid}"; wait "${child_pid}"; cleanup' SIGINT SIGTERM
echo "The script pid is $$"
sleep 30 &
child_pid="$!"
wait "${child_pid}"
In the script we created a cleanup function where we could insert our cleanup code, and made our trap catch also the SIGTERM signal. Here is what happens when we run this script and send one of those two signals to it:
- The script is launched and the sleep 30 command is executed in the background;
- The pid of the child process is “stored” in the child_pid variable;
- The script waits for the termination of the child process;
- The script receives a SIGINT or SIGTERM signal;
- The wait command returns immediately, without waiting for the child termination;
At this point the trap is executed. In it:
- A SIGTERM signal (the kill default) is sent to the child_pid;
- We wait to make sure the child is terminated after receiving this signal;
- After wait returns, we execute the cleanup function.
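The sequence of steps above can be exercised with a small driver. This is only a sketch: it writes a copy of the script to a temporary file, starts it in the background, sends it SIGTERM after one second and prints what the script wrote. The temporary files, the one-second delay and the variable names are assumptions made for the experiment.

```shell
#!/bin/bash
# Sketch: run a copy of the script and send it SIGTERM, as a supervisor would.
script=$(mktemp)
output=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
cleanup() { echo "cleaning up..."; }
trap 'echo signal received!; kill "${child_pid}"; wait "${child_pid}"; cleanup' SIGINT SIGTERM
sleep 30 &
child_pid="$!"
wait "${child_pid}"
EOF

bash "$script" > "$output" &
script_pid=$!
sleep 1                          # give the script time to install its trap
kill -TERM "$script_pid"

status=0
wait "$script_pid" || status=$?  # returns as soon as the script has cleaned up
result=$(cat "$output")
echo "$result"
rm -f "$script" "$output"
```

The driver returns after about one second rather than thirty, and the captured output contains both the trap message and the cleanup message.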
Propagate the signal to multiple children
In the example above we worked with a script which had only one child process. What if a script has many children, and what if some of them have children of their own?
In the first case, one quick way to get the pids of all the children is to use the jobs -p command: this command displays the pids of all the active jobs in the current shell. We can then use kill to terminate them. Here is an example:
#!/bin/bash
cleanup() {
    echo "cleaning up..."
    # Our cleanup code goes here
}
trap 'echo signal received!; kill $(jobs -p); wait; cleanup' SIGINT SIGTERM
echo "The script pid is $$"
sleep 30 &
sleep 40 &
wait
The script launches two processes in the background: by using the wait built-in without arguments, we wait for all of them, and keep the parent process alive. When SIGINT or SIGTERM is received by the script, we send a SIGTERM to both of them, having their pids returned by the jobs -p command (jobs is itself a shell built-in, so no external program has to be executed to run it).
If the children have child processes of their own, and we want to terminate them all when the ancestor receives a signal, we can send a signal to the entire process group, as we saw before.
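To see what jobs -p returns before relying on it inside a trap, we can run it on its own. The sketch below starts two background jobs, stores and counts their pids, and then terminates them; the pids and job_count names are illustrative.

```shell
#!/bin/bash
# Sketch: list the pids of the active background jobs, then terminate them.
sleep 30 &
sleep 40 &

pids=$(jobs -p)                  # one pid per line, one line per active job
set -- $pids                     # split the list into positional parameters
job_count=$#
echo "found $job_count jobs:" "$@"

kill "$@"                        # SIGTERM (the kill default) to every job
wait                             # returns once all the children have been reaped
```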
This, however, presents a problem, since by sending a termination signal to the process group, we would enter a “signal-sent/signal-trapped” loop. Think about it: in the trap for SIGTERM we send a SIGTERM signal to all members of the process group; this includes the parent script itself!
To solve this problem and still be able to execute a cleanup function after child processes are terminated, we must change the trap for SIGTERM just before we send the signal to the process group, for example:
#!/bin/bash
cleanup() {
    echo "cleaning up..."
    # Our cleanup code goes here
}
trap 'trap " " SIGTERM; kill 0; wait; cleanup' SIGINT SIGTERM
echo "The script pid is $$"
sleep 30 &
sleep 40 &
wait
In the trap, before sending SIGTERM to the process group, we changed the SIGTERM trap to a command that does nothing, so that the parent process survives the signal and only its descendants are terminated by it. Notice also that in the trap, to signal the process group, we used kill with 0 as pid. This is a sort of shortcut: when the pid passed to kill is 0, all the processes in the current process group are signaled.
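Since kill 0 reaches every process in the caller's process group, it also reaches whatever launched the script when the two share a group. The following sketch is one way to try the script safely: it runs a copy in its own process group and sends it SIGTERM. It assumes job control is enabled in the driver with set -m, so that the background job becomes a group leader; the temporary files, the nested bash -c child and the variable names are illustrative.

```shell
#!/bin/bash
# Sketch: run the "kill 0" script in its own process group and terminate it.
set -m    # assumption: with job control on, each background job gets its own group

script=$(mktemp)
output=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/bash
cleanup() { echo "cleaning up..."; }
trap 'trap " " SIGTERM; kill 0; wait; cleanup' SIGINT SIGTERM
sleep 30 &
bash -c 'sleep 40 & wait' &      # a child that has a child of its own
wait
EOF

bash "$script" > "$output" &
script_pid=$!
sleep 1                          # give the script time to install its trap
kill -TERM "$script_pid"

status=0
wait "$script_pid" || status=$?
result=$(cat "$output")
echo "$result"
rm -f "$script" "$output"
```

The driver itself is not affected by the kill 0 issued inside the script, because it belongs to a different process group.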
Conclusions
In this tutorial we learned about process groups and the difference between foreground and background processes. We learned that CTRL-C sends a SIGINT signal to the entire foreground process group of the controlling terminal, and we learned how to send a signal to a process group using kill. We also learned how to execute a program in the background, and how to use the wait shell built-in to wait for it to exit without losing the parent shell. Finally, we saw how to set up a script so that when it receives a signal it terminates its children before exiting. Did I miss something? Do you have your personal recipes to accomplish the task? Don’t hesitate to let me know!
