Forum: Mongrel Re: Why not ignore stale PID files?

Hongli Lai (Guest)
on 2008-06-11 01:26
(Received via mailing list)
Zed A. Shaw wrote:
 > That would be the ideal situation, but Ruby doesn't have good enough
 > process management APIs to do this portably.

Erik Hetzner:
 > ... but not the edge case where a process is running, with
 > the same owner, but is no longer a mongrel process.

I feel obligated to reply. :) PID files suck. I think it's really stupid
that modern operating systems don't provide some kind of mechanism to
automatically delete a file when a process exits (even when it exits
abnormally).

Anyway, I've written a fair share of daemons in the past. What I tend to
do is to combine PID files with a number of lock files:
- foo.pid. This is obviously the PID file.
- foo.lock. This is a lock file whose lock is held for the lifetime of
the daemon. If the daemon exits, whether normally or abnormally, the
lock on that file is released. To check whether foo.pid is stale, we
simply check whether foo.lock is locked.

The only way to check whether foo.lock is locked is to try to lock it
with the non-blocking parameter. If locking fails then it means it's
already locked, meaning that the PID file is not stale. However, this
could result in a race condition. Suppose that you are starting a daemon,
while simultaneously checking whether the daemon is already started:
1. The checker acquires a non-blocking lock on foo.lock. This succeeds,
so it knows that the PID file is stale. It prints "stale PID file
detected" on screen, and is about to release the lock on foo.lock.
2. All of a sudden, before the lock is released, a context switch
occurs. The daemon that is being started tries to acquire a lock on
foo.lock. This fails because the checker still has the lock, so the
daemon thinks that there's already a daemon running, and exits.

So we need another lock file to serialize all PID file related actions:
- foo.global.lock

So the code for checking whether the daemon's running is something like
this:
   def check():
      lock(foo.global.lock)
      if try_lock(foo.lock):
         # Locking succeeded, so we have a stale PID file here.
         unlock(foo.lock)
         unlock(foo.global.lock)
         return nil
      else:
         # Locking failed. Process is still running.
         # Of course, your code should also check whether the PID file
         # actually exists.
         pid = read_pid_file(foo.pid)
         unlock(foo.global.lock)
         return pid

Daemon code:
   lock(foo.global.lock)
   write_pid_file(foo.pid)
   lock(foo.lock)
   unlock(foo.global.lock)

   main_loop()

   lock(foo.global.lock)
   delete_file(foo.pid)
   unlock(foo.lock)
   unlock(foo.global.lock)

NOTE: lock() creates the lock file if it doesn't already exist.

This works great, even on Windows. The only gotchas are:
- flock() doesn't work over NFS. You'll have to use some kind of fcntl()
call to lock files over NFS, but I'm not sure whether Ruby provides an
API for that.
- foo.global.lock is never deleted. You cannot safely delete it without
creating some kind of race condition.
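
For readers who want something runnable, the check and daemon pseudocode above might be sketched in Ruby roughly as follows. This is only an illustration under some assumptions: the class name PidFileGuard and its methods are hypothetical, locking uses File#flock (so the NFS gotcha applies), and start tries foo.lock without blocking so that a second daemon refuses to start. It has only been reasoned about for POSIX systems.

```ruby
require "tmpdir"

# Hypothetical helper; the class and method names are assumptions made
# for illustration and are not part of Mongrel.
class PidFileGuard
  def initialize(base)
    @pid_path    = "#{base}.pid"
    @lock_path   = "#{base}.lock"
    @global_path = "#{base}.global.lock"
    @held        = nil # open handle that keeps foo.lock locked while running
  end

  # Serializes every PID-file action through foo.global.lock.
  def with_global_lock
    File.open(@global_path, File::RDWR | File::CREAT, 0644) do |g|
      g.flock(File::LOCK_EX) # blocking lock
      yield
    end # closing the file releases the lock
  end

  # Returns the daemon's PID, or nil if the PID file is stale or missing.
  def check
    with_global_lock do
      File.open(@lock_path, File::RDWR | File::CREAT, 0644) do |f|
        if f.flock(File::LOCK_EX | File::LOCK_NB)
          nil                       # lock acquired, so nobody holds it: stale
        elsif File.exist?(@pid_path)
          File.read(@pid_path).to_i # lock is held: the daemon is alive
        end
      end
    end
  end

  # Returns true if this process became the daemon, false if one is running.
  def start
    with_global_lock do
      f = File.open(@lock_path, File::RDWR | File::CREAT, 0644)
      unless f.flock(File::LOCK_EX | File::LOCK_NB)
        f.close
        return false
      end
      File.write(@pid_path, Process.pid.to_s)
      @held = f
    end
    true
  end

  # Normal shutdown: remove the PID file and release foo.lock.
  def stop
    with_global_lock do
      File.delete(@pid_path) if File.exist?(@pid_path)
      @held&.close # closing the handle releases the flock
      @held = nil
    end
  end
end
```

A daemon would call start before entering its main loop and stop afterwards; a control script would call check. If the daemon crashes, the operating system drops its lock on foo.lock, so a later check returns nil even though foo.pid is still on disk.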

Hongli Lai (Guest)
on 2008-06-11 01:29
(Received via mailing list)
Hongli Lai wrote:
> This works great, even on Windows. The only gotchas are:
> - flock() doesn't work over NFS. You'll have to use some kind of fcntl()
> call to lock files over NFS, but I'm not sure whether Ruby provides an
> API for that.
> - foo.global.lock is never deleted. You cannot safely delete it without
> creating some kind of race condition.

I forgot to mention that it is safe to delete foo.lock. So the shutdown
part of the daemon code should look like this:

   lock(foo.global.lock)
   delete_file(foo.pid)
   unlock(foo.lock)
   delete_file(foo.lock)     # added this line
   unlock(foo.global.lock)

Jos Backus (Guest)
on 2008-06-11 01:57
(Received via mailing list)
On Wed, Jun 11, 2008 at 01:25:41AM +0200, Hongli Lai wrote:
> PID files suck.

Agreed. Just use daemontools or runit or some other process manager - no
pidfiles or complicated locking code needed.

Scott Windsor (Guest)
on 2008-06-11 03:25
(Received via mailing list)
Has anyone considered turning mongrel_cluster into a process manager
daemon?

I know that many people rely on other applications (such as monit) to
ensure that mongrels are up and running, but it seems that integrated
process management out of the box would be a large win. mongrel_cluster
could remain running (rather than exiting) and keep track of the running
mongrels (potentially restarting them if they die or become zombies). At
that point, PID files become unneeded for tracking running mongrels. The
only exception would be if the mongrel cluster itself dies: at that point
it would orphan the child processes, and it would be up to the cluster to
kill off (or resume ownership of) any orphaned processes.

Thoughts?

- scott
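
The supervising approach described above could be sketched in Ruby along these lines. This is a minimal illustration, not how mongrel_cluster works: MiniSupervisor is a hypothetical name, the worker block stands in for starting a mongrel, the restart cap exists only so the example terminates, and it relies on fork, so it assumes a POSIX system. It does not handle the orphaned-children case mentioned above.

```ruby
# Hypothetical supervisor: forks a fixed number of workers and restarts
# any worker that exits, without using PID files.
class MiniSupervisor
  attr_reader :restarts

  def initialize(count, max_restarts: 5, &worker)
    @count        = count
    @max_restarts = max_restarts
    @worker       = worker
    @children     = {} # pid => slot number
    @restarts     = 0
  end

  def run
    @count.times { |slot| spawn_worker(slot) }
    until @children.empty?
      pid  = Process.wait # blocks until any child exits, and reaps it
      slot = @children.delete(pid)
      next if slot.nil?
      if @restarts < @max_restarts
        @restarts += 1
        spawn_worker(slot)
      end
    end
  end

  private

  def spawn_worker(slot)
    pid = fork do
      begin
        @worker.call(slot)
        exit!(0) # skip at_exit handlers inherited from the parent
      rescue StandardError
        exit!(1)
      end
    end
    @children[pid] = slot
  end
end
```

Because the supervisor is the parent of every worker, Process.wait tells it exactly which child died, and reaping the child there also prevents zombies.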

Jos Backus (Guest)
on 2008-06-11 03:42
(Received via mailing list)
On Tue, Jun 10, 2008 at 06:24:58PM -0700, Scott Windsor wrote:
> Has anyone considered turning the mongrel_cluster into a process manager
> daemon?

I'm not using this myself (I use standalone daemontools) but
mongrel_runit should fit the bill at least somewhat:

    https://wiki.hjksolutions.com/display/MR/Home

Zed A. Shaw (Guest)
on 2008-06-11 22:29
(Received via mailing list)
On Tue, 10 Jun 2008 16:50:39 -0700
Jos Backus <jos@catnook.com> wrote:

> On Wed, Jun 11, 2008 at 01:25:41AM +0200, Hongli Lai wrote:
> > PID files suck.
>
> Agreed. Just use daemontools or runit or some other process manager - no
> pidfiles or complicated locking code needed.

You ever read the code to runit?  I wouldn't touch that thing with a
10' pole.  Haven't used daemontools though.

--
Zed A. Shaw
- Hate: http://savingtheinternetwithhate.com/
- Good: http://www.zedshaw.com/
- Evil: http://yearofevil.com/

Jos Backus (Guest)
on 2008-06-12 01:47
(Received via mailing list)
On Wed, Jun 11, 2008 at 04:23:10PM -0400, Zed A. Shaw wrote:
> You ever read the code to runit?  I wouldn't touch that thing with a
> 10' pole.  Haven't used daemontools though.

Haven't looked at runit code, no. Daemontools so far has worked great
for me for over a decade.