Killing sons (Linux)

Maybe this isn’t strictly a Ruby question, but I hope someone here can
help:

I have a job-management application with a central daemon that
receives job requests. Upon receiving a request, it forks and then
calls “system” to run bash, which in turn runs the Matlab job. I use bash
for this in order to redirect the input and output from Matlab. pstree
output looks like this:

init-+-apache2---8*[apache2]
     |-atd
     |-ruby-+-4*[ruby---bash---MATLAB-+-matlab_helper]
     |      |                         `-15*[{MATLAB}]]
     |      |-{ruby}

Legend:
daemon ^         ^ daemon fork

Now, my system also allows a ‘kill’ command, intended to stop the job in
progress. This has been causing me a lot of trouble, and I suddenly
(after the system has been in production for quite a while, how
embarrassing) realized why: the PID I’m keeping is that of the daemon
fork. Killing it doesn’t kill all of its sons; it causes bash to get
reparented to init!

Any idea of a clean, quick way to fix this?

On 5/28/07, Ohad L. wrote:

Any idea of a clean, quick way to fix this?

Make your parent process the leader of its own process group with
setpgid(0,0). When you fork, add each child to the parent’s process
group with setpgid(0, getppid()). If you fork subchildren, make sure
they get added to the same process group. Now, to send a signal to the
whole group, send it to (0 - pid), where pid is that of the parent. If
you want them all to die without killing the leader, use a signal whose
default behavior is terminate-process and ignore it in the parent.
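
The scheme described above could look roughly like this in Ruby. This is
only a sketch: demo_group_kill is a hypothetical helper, the workers just
sleep instead of running a real job, and the whole thing runs inside a
forked supervisor so the calling process keeps its own group. Children
created with fork inherit the parent's process group, so the sketch
relies on that rather than calling setpgid in each worker.

```ruby
# Hypothetical helper: returns how many workers were ended by the signal.
def demo_group_kill(worker_count = 3)
  reader, writer = IO.pipe
  supervisor = fork do
    reader.close
    Process.setpgid(0, 0)               # lead a new group; pgid == own pid
    workers = Array.new(worker_count) do
      # Forked children inherit the supervisor's process group.
      fork do
        writer.close
        sleep 300
      end
    end
    # Ignore TERM only after forking, so the workers keep the default action.
    Signal.trap("TERM", "IGNORE")
    Process.kill("TERM", -Process.pid)  # negative pid addresses the whole group
    terminated = workers.count do |pid|
      _, status = Process.wait2(pid)
      status.signaled?
    end
    writer.puts terminated
    writer.close
    exit! 0                             # the leader itself survived the TERM
  end
  writer.close
  terminated = reader.read.to_i
  Process.wait(supervisor)
  terminated
end

puts demo_group_kill if __FILE__ == $PROGRAM_NAME
```

The order matters here: an ignored signal disposition is inherited across
fork, so trapping TERM before forking would make the workers ignore it
too.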

Francis C. wrote:

Make your parent process the leader of its own process group with
setpgid(0,0). [...] Now, to send a signal to the whole group, send it
to (0 - pid), where pid is that of the parent.

Just to be sure - if I run the following Ruby code on a Linux system:

child = fork do
  Process::setpgid 0,0
  system 'bash -c "sleep 300"'
end
Process::kill 9, -child

Then I am guaranteed that no child bash, sleep or ruby process will
remain? It works, I just want to be sure I can count on that behaviour.
For contrast, in my original code, bash gets reparented to init:

child = fork do
  system 'bash -c "sleep 300"'
end
Process::kill 9, child

And this code doesn’t even work (ESRCH: No such process):

child = fork do
  system 'bash -c "sleep 300"'
end
Process::kill 9, -child
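
A small probe can show where the ESRCH comes from: without setpgid the
child stays in its parent's process group, so no group whose ID equals
the child's PID exists. The sketch below uses hypothetical helper names
(group_exists?, demo_child_group) and signal 0, which only checks that
the target exists. It also calls setpgid from the parent side, a common
way to make sure the group exists before the parent signals it, since
the child may not have run its own setpgid yet.

```ruby
# Hypothetical helper: true if a process group with this ID currently exists.
def group_exists?(pgid)
  Process.kill(0, -pgid)   # signal 0 performs the checks but sends nothing
  true
rescue Errno::ESRCH
  false
end

# Hypothetical demo: returns [group existed before setpgid, and after].
def demo_child_group
  child = fork { sleep 300 }
  before = group_exists?(child)   # child is still in the parent's group
  Process.setpgid(child, child)   # parent puts the child into its own group
  after = group_exists?(child)
  Process.kill("KILL", -child)    # clean up the whole group
  Process.wait(child)
  [before, after]
end

p demo_child_group if __FILE__ == $PROGRAM_NAME
```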

Thank you for your help!

On 5/28/07, Ohad L. wrote:

It works, I just want to be sure I can count on that behaviour.

With signals, nothing is ever “guaranteed.”

Why are you using signal 9 instead of something more system-friendly? 9
should be your last resort when all else fails, because it doesn’t give
your processes a chance to exit cleanly.
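
One way to follow this advice is to send TERM to the group first and
fall back to KILL only after a grace period. The sketch below assumes
the job's leader is a direct child whose PID equals its process group
ID; stop_group and the five-second default are hypothetical choices, not
something from this thread.

```ruby
# Hypothetical helper: ask the group to exit, then force it if needed.
# Returns :terminated if the leader exited within the grace period,
# :killed if SIGKILL had to be sent.
def stop_group(leader_pid, grace_seconds = 5.0)
  Process.kill("TERM", -leader_pid)     # polite request to the whole group
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + grace_seconds
  loop do
    # WNOHANG makes wait return nil while the leader is still running.
    return :terminated if Process.wait(leader_pid, Process::WNOHANG)
    break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep 0.05
  end
  Process.kill("KILL", -leader_pid)     # last resort, cannot be trapped
  Process.wait(leader_pid)
  :killed
end
```

Note that this only waits for the leader. Other members of the group may
take longer to exit after TERM, so a stricter version might send KILL to
the group unconditionally once the grace period is over.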

I have the uncomfortable feeling that your code is working by accident,
because the subprocesses which the “system” call creates aren’t
explicitly added to the process group of the fork child. If you run this
program in a shell, it will probably work, because of the process-group
semantics defined by most shells. However, if you run it as a headless
daemon or from a cron job, it may not work. Try it and see.
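
To "try it and see", one option is to have the process started by system
report its own process group and compare it with the PID of the fork
child. The sketch below is an assumption-laden check, not a proof: it
starts a second Ruby interpreter instead of bash so that it can call
Process.getpgrp directly, pgid_seen_by_system_child is a hypothetical
name, and the result only describes the environment it runs in. It may
be worth running it from the daemon or from cron as well.

```ruby
require "rbconfig"

# Hypothetical check: returns [pid of the fork child,
#                              pgid reported by the process system started].
def pgid_seen_by_system_child
  reader, writer = IO.pipe
  child = fork do
    reader.close
    Process.setpgid(0, 0)   # the fork child leads its own group
    # The grandchild prints its process group ID into the pipe.
    system(RbConfig.ruby, "-e", "puts Process.getpgrp", out: writer)
    exit! 0
  end
  writer.close
  reported = reader.read.to_i
  Process.wait(child)
  [child, reported]
end

if __FILE__ == $PROGRAM_NAME
  child, reported = pgid_seen_by_system_child
  puts "fork child pid: #{child}, grandchild pgid: #{reported}"
end
```

If both numbers match, the process started by system inherited the
group of the fork child, and a signal sent to -child would reach it.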

When I have to do what you’re trying to do, I usually avoid calling
system. Instead I call fork, and in the child I call
setpgid(0, getppid()), and then exec.

I’ve learned how to do redirection from within ruby
($stdwhatever.reopen), and switched to using exec instead of system - so
now I avoid two levels of depth. I also use setpgid and kill the whole
group for good measure.
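
For readers who want to see the shape of that final approach, here is a
minimal sketch. The helper name start_job, the file paths, and the
choice to make each job the leader of its own group (setpgid(0, 0), as
in the earlier snippet) are assumptions, not the code that went into
production.

```ruby
# Hypothetical helper: run `command` (an array of program and arguments)
# in its own process group, with stdin/stdout redirected to files.
# Returns the child's pid, which is also the process group ID.
def start_job(command, stdin_path, stdout_path)
  pid = fork do
    Process.setpgid(0, 0)               # job leads its own group
    $stdin.reopen(stdin_path, "r")      # redirection without a shell
    $stdout.reopen(stdout_path, "w")
    $stderr.reopen($stdout)
    begin
      exec(*command)                    # replaces the Ruby child, no extra level
    rescue SystemCallError
      exit!(127)                        # exec failed, e.g. command not found
    end
  end
  begin
    Process.setpgid(pid, pid)           # parent side too, in case the child is slow
  rescue Errno::EACCES, Errno::ESRCH
    # The child has already exec'd (and set its own group) or has exited.
  end
  pid
end
```

A job started this way can then be stopped with something like
Process.kill("TERM", -pid), followed by Process.wait(pid).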

Many thanks for your help!