Try this:
- Set "disable-time" => 0 inside your SCGI lighttpd config (same place
you put host and port).
- Restart lighttpd.
- Shut down your scgi.
- Hit the web page.
Now, if you get a 500 error, that’s good. Then do this:
- Start scgi, then hit the page right away.
You should get the page back immediately.
Do this about 10 times until you’re confident that yes, lighttpd has now
stopped putting a ridiculous 60-second death time on your scgi (and
fastcgi) processor.
Once you do this, you can restart any of the three servers
independently.
Extra geek points for making a special 500 page that tells people the
server is down temporarily.
The cause? As mentioned before, lighttpd disables a processor (scgi or
fcgi) for a default of 60 seconds whenever it finds it down. Since this
means that a quick restart of the scgi process could knock your app out
for 60 seconds, this turns out to be an incredibly bad design choice.
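
For reference, here is roughly where the setting from the steps above
would sit. This is only a sketch: the URL prefix, host, port, and
error-page path are made-up values, so substitute whatever your config
already uses. The server.errorfile-prefix line is one possible way to
get the custom 500 page; with it lighttpd looks for a file named
status-500.html under that prefix.

```
scgi.server = (
  "/" => ((
    # Assumed address and port of the scgi process.
    "host" => "127.0.0.1",
    "port" => 9999,
    "check-local" => "disable",
    # Do not keep a downed backend disabled; retry it right away.
    "disable-time" => 0
  ))
)

# Assumed path. lighttpd appends "500.html" for a 500 error.
server.errorfile-prefix = "/srv/www/errors/status-"
```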
Another alternative I’ve toyed with is “scgi flopping”. I haven’t
implemented it, but the idea is that you’d run a cluster of, say, 3
processors like normal. The “flop” command would do the following:
- Take down processors 1 and 2 and wait for them to exit. Lighttpd sees
that those are down, so starts redirecting all traffic to #3.
- Lighttpd disables them for 60 seconds like normal, so you now have
about a minute to get them back up. The flop command would restart #1
and #2.
- Once #1 and #2 were up, it would shut down #3. Lighttpd disables #3
for 60 seconds and starts sending requests to #1 and #2. The tricky part
here is figuring out when lighttpd has enabled #1 and #2. Probably look
at the log files.
- Finally, flop would then restart #3 with the new code and you’d be
back in business.
Again, this is ultra freaking complicated, but would probably give you a
very very graceful restart to a new code base.
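
To make the ordering of the flop steps concrete, here is a minimal
Python sketch. It is not an implementation of anything that exists: the
flop function and its stop, start, and wait_until_enabled callables are
all hypothetical, and the hard part (detecting that lighttpd has
re-enabled a processor) is left to whatever wait_until_enabled you
supply.

```python
from typing import Callable, List, Sequence


def flop(
    backends: Sequence[str],
    stop: Callable[[str], None],
    start: Callable[[str], None],
    wait_until_enabled: Callable[[List[str]], None],
) -> None:
    """Restart every backend while always leaving one group serving.

    All three callables are hypothetical hooks supplied by the caller;
    stop() is expected to block until the process has exited.
    """
    if len(backends) < 2:
        raise ValueError("flop needs at least two backends")

    # Split into the group restarted first and the one left serving.
    *first_group, last = backends

    # Take down all but the last processor; traffic goes to the last one.
    for name in first_group:
        stop(name)

    # Bring them back up on the new code.
    for name in first_group:
        start(name)

    # Tricky part: block until lighttpd is using them again.
    wait_until_enabled(first_group)

    # Only now is it safe to restart the last processor.
    stop(last)
    start(last)


if __name__ == "__main__":
    # Dry run that just prints the order of operations.
    flop(
        ["scgi1", "scgi2", "scgi3"],
        stop=lambda n: print("stop", n),
        start=lambda n: print("start", n),
        wait_until_enabled=lambda group: print("wait for", group),
    )
```

Passing the process control in as callables keeps the ordering logic
separate from however you actually start and stop your scgi processes.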
Anyway, try the trick above.
Zed A. Shaw