Hi,

I have an application which is dying horrible deaths (i.e. segmentation faults) in mid-flight, in production... And of course, I should fix it. But while I find and fix the bugs, I found something I think should be different. I can work on submitting a patch, as it is quite simple, but I might be missing something in my rationale.

When Mongrel segfaults, it obviously does not get to clean up after itself, so it does not remove the PID files. As an example:

$ sudo /etc/init.d/mongrel-cluster start
Starting mongrel-cluster: Starting all mongrel_clusters...
mongrel-cluster.
$ sudo cat tmp/pids/mongrel.8203.pid | xargs kill -9
$ sudo /etc/init.d/mongrel-cluster status
(...)
found pid_file: tmp/pids/mongrel.8203.pid
missing mongrel_rails: port 8203
(...)
$ sudo /etc/init.d/mongrel-cluster restart
Restarting mongrel-cluster: Restarting all mongrel_clusters...
** !!! PID file tmp/pids/mongrel.8203.pid already exists. Mongrel could be running already. Check your log/mongrel.8203.log for errors.
** !!! Exiting with error. You must stop mongrel and clear the .pid before I'll attempt a start.
mongrel-cluster.

So, what's the solution? I must manually do:

$ sudo rm tmp/pids/mongrel.8203.pid
$ sudo /etc/init.d/mongrel-cluster restart

And now it works.

What should happen? Well, 'status' already found that there is a stale PID. Of course, the 'status' action means exactly that: get the status, do nothing else. But the 'stop' action should clean up the PID files whose processes no longer exist, and the 'start' action should check whether the process with that PID is alive, and ignore the PID file if it's not. At least, this behaviour should be specifiable via the configuration file.

What do you think?

--
Gunnar Wolf - gwolf@iiec.unam.mx - (+52-55)5623-0154 / 1451-2244
PGP key 1024D/8BB527AF 2001-10-23
Fingerprint: 0C79 D2D1 2C4E 9CE4 5973 F800 D80E F35A 8BB5 27AF
on 2008-06-06 04:19
on 2008-06-06 07:08
On Thu, 5 Jun 2008 16:08:06 -0500 Gunnar Wolf <gwolf@gwolf.org> wrote:
> What should happen? Well, 'status' already found that there is a stale
> PID. Of course, the 'status' action means exactly that: Get the
> status, do nothing else. But the 'stop' action should clean the PIDs
> if they do no longer exist, and the 'start' action should check
> whether the process with that PID is alive, and ignore it if it's
> not. At least, this behaviour should be specifiable via the
> configuration file.

That would be the ideal situation, but Ruby doesn't have good enough process management APIs to do this portably. To make it work you'd have to portably be able to take a PID and see if there's a mongrel running with that PID. You can't use /proc or /sys because that's Linux only. You can't use `ps` because the OSX morons changed everything, Solaris has a different format, etc.

If you were to do this, you'd have to dip into C code to pull it off.

Now, if you're only on Linux then you could write yourself a small little hack to the mongrel_rails script that did this with info out of /proc.

--
Zed A. Shaw
- Hate: http://savingtheinternetwithhate.com/
- Good: http://www.zedshaw.com/
- Evil: http://yearofevil.com/
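For illustration, the Linux-only /proc approach mentioned above could look roughly like the sketch below. It is not part of mongrel_rails: the helper name `mongrel_running?`, the `proc_root` parameter (there only so the check can be pointed at a test directory), and the idea of matching the substring "mongrel" in the command line are all assumptions made for this example.

```ruby
# Linux-only sketch: decide whether a PID belongs to a live Mongrel process
# by reading /proc/<pid>/cmdline. The helper name and the "mongrel"
# substring match are assumptions made for illustration.
def mongrel_running?(pid, proc_root = '/proc')
  cmdline_path = File.join(proc_root, pid.to_s, 'cmdline')
  # Arguments in cmdline are separated by NUL bytes; turn them into spaces.
  cmdline = File.read(cmdline_path).tr("\0", ' ')
  cmdline.include?('mongrel')
rescue Errno::ENOENT, Errno::ESRCH
  # No /proc entry (or the process vanished while reading): not running.
  false
end
```

A start script could call `mongrel_running?(File.read(pid_file).to_i)` and treat the PID file as stale when it returns false. The substring match is deliberately loose and would need tightening for a real deployment.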
on 2008-06-06 20:08
_______________________________________________ Mongrel-users mailing list Mongrel-users@rubyforge.org http://rubyforge.org/mailman/listinfo/mongrel-users
on 2008-06-06 20:58
kill -0 `cat pid_file` >& /dev/null

more like

kill -0 $(<pid_file) >& /dev/null

regards,
Istvan
on 2008-06-06 23:41
Zed A. Shaw wrote [Fri, Jun 06, 2008 at 01:01:32AM -0400]:
>
> Now, if you're only on linux then you could write yourself a small
> little hack to the mongrel_rails script that did this with info out
> of /proc.

Oh, silly me... I thought Ruby's Process class dealt with the architectural incompatibilities...

What I wrote to check for the status is quite straightforward:

------------------------------------------------------------
#!/usr/bin/ruby
require 'yaml'

confdir = '/etc/mongrel-cluster/sites-enabled'
restart_cmd = '/etc/init.d/mongrel-cluster restart'
needs_restart = false

(Dir.open(confdir).entries - ['.', '..']).each do |site|
  conf = YAML.load_file "#{confdir}/#{site}"
  # 'mongrel.pid' in the config becomes the glob 'mongrel*.pid'
  pid_location = [conf['cwd'], conf['pid_file']].join('/').gsub(/\.pid$/, '*.pid')
  pid_files = Dir.glob(pid_location)
  pid_files.each do |pidf|
    pid = File.read(pidf).strip
    begin
      # Raises Errno::ESRCH if no process with this PID exists
      Process.getpgid(pid.to_i)
    rescue Errno::ESRCH
      warn "Process #{pid} (cluster #{site}) is dead!"
      File.unlink pidf
      needs_restart = true
    end
  end
end

system(restart_cmd) if needs_restart
------------------------------------------------------------

(periodically run via cron)

I guess this works in any Unixy environment... I have no idea whether Windows implements something similar to Process.getpgid, or for that matter, anything about Windows' process management.

Greetings,

--
Gunnar Wolf - gwolf@gwolf.org - (+52-55)5623-0154 / 1451-2244
PGP key 1024D/8BB527AF 2001-10-23
Fingerprint: 0C79 D2D1 2C4E 9CE4 5973 F800 D80E F35A 8BB5 27AF
on 2008-06-06 23:41
Tikhon Bernstam wrote [Thu, Jun 05, 2008 at 07:29:22PM -0700]:
> use the mongrel_cluster --clean option
Very good addition to the overall logic, keeps things cleaner :-)
--
Gunnar Wolf - gwolf@gwolf.org - (+52-55)5623-0154 / 1451-2244
PGP key 1024D/8BB527AF 2001-10-23
Fingerprint: 0C79 D2D1 2C4E 9CE4 5973 F800 D80E F35A 8BB5 27AF
on 2008-06-07 01:18
Gunnar Wolf <gwolf@gwolf.org> wrote:
> > If you were to do this, you'd have to dip into C code to pull it off.
>
> I guess this works in any Unixy environment... I have no idea on
> whether Windows implements something similar to Process.getpgid, or
> for that matter, anything on Windows' process management.

Process.kill(0, pid) also works and is (in my experience) more widely used.
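As a rough sketch of the Process.kill(0, pid) approach on a Unix-like system: signal 0 runs the existence and permission checks without delivering any signal. The helper names `process_alive?` and `remove_stale_pid_file` are hypothetical and not part of Mongrel or mongrel_cluster. Treating Errno::EPERM as "alive" and treating a PID file without a positive number as stale are assumptions of this example.

```ruby
# Sketch: probe a PID with signal 0, which performs the existence and
# permission checks without actually delivering a signal.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false   # no such process
rescue Errno::EPERM
  true    # process exists but belongs to another user
end

# Hypothetical helper: delete a PID file whose process is gone.
# A file that does not contain a positive PID is treated as stale too.
# Returns true if the file was removed, false if the process is alive.
def remove_stale_pid_file(path)
  pid = File.read(path).to_i
  return false if pid > 0 && process_alive?(pid)
  File.unlink(path)
  true
end
```

A 'start' action could call `remove_stale_pid_file('tmp/pids/mongrel.8203.pid')` before refusing to start. Note that a recycled PID belonging to an unrelated process would still count as alive here, which is the limitation raised earlier in the thread.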