[ANN] Mongrel 0.3.13 Pre-Release -- Caulking Release

Zed_S · June 15, 2006, 9:08pm

Hello Everyone,

Francois S. inadvertently found a way to replicate a rare but deadly
bug right as I was working up the official release of Mongrel 0.3.13.
This bug only happened to a few people, but thanks to the wonderful
fuzzing tool Apache Bench[1] he could replicate the slow select
starvation people were seeing.

TESTING

This bug is now fixed in the current pre-release, and I’d like everyone
to grab it in the usual way:

$ gem install mongrel --source=http://mongrel.rubyforge.org/releases/

And run Francois’s little Mongrel killer script (in a bash shell):

while true; do ab -n 1000 -c 30 http://localhost:3000/ 2>/dev/null |
grep 'quest.*mean' ; echo -- ; done

You will need Apache Bench for this to work.

Specifically, point this at a static file and, if you can, a Rails action
or two. Also try raising the -c (concurrency) option to higher levels.

EXPECTED RESULTS

You will probably see a bunch of Broken Pipe or other errors when you
hit files, but otherwise you should see the same performance for the
same -c settings. If you see the performance degrade over time (very
quickly) then shoot me an e-mail with the errors you see and the
operating system you use.

OFFICIAL RELEASE

If nobody has problems with this release then I’ll make it official
tonight.

BUG DESCRIPTION

In general, if you are using Ruby threads and a socket produces an
error, then don’t use that socket anymore. Turns out that Ruby will
happily let you continue using the socket, but most OS select()
functions have strange semantics with dead sockets. In this case, Ruby
is most likely waiting for a read or write event on the socket, but
since it’s dead there will be none.

On most operating systems this turns out to put any thread that uses
that socket into a permanently sleeping state.

What should happen is that any socket that throws an exception should
be put into an invalid state so that further operations on it blow up
and then any invalid sockets are not put into the select loop for
threads.

I’ve got tests going which exercise this bug and have refactored all of
the socket usage so that exceptions are detected and the socket isn’t
used anymore after that. Hopefully this squashes it permanently.
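
To illustrate the rule above, here is a minimal Ruby sketch. It is not
Mongrel's actual code: GuardedSocket, DeadSocketError, read_chunk and the
16 KB default are hypothetical names and values made up for this example.
The idea is only that the first exception closes the socket and marks it
dead, so later calls blow up right away instead of waiting on a
descriptor that will never become ready.

```ruby
require 'socket'

# Hypothetical wrapper, not part of Mongrel. After the first I/O error the
# underlying socket is closed and every later operation raises immediately.
class GuardedSocket
  class DeadSocketError < IOError; end

  def initialize(socket)
    @socket = socket
    @dead = false
  end

  # True once an operation has failed or close has been called.
  def dead?
    @dead
  end

  def write(data)
    guard { @socket.write(data) }
  end

  def read_chunk(max_bytes = 16 * 1024)
    guard { @socket.readpartial(max_bytes) }
  end

  def close
    @dead = true
    @socket.close unless @socket.closed?
  rescue IOError, SystemCallError
    nil
  end

  private

  # Runs the block unless the socket has already failed. Any I/O error
  # closes the socket, marks it dead, and is re-raised to the caller.
  def guard
    raise DeadSocketError, 'socket already failed; refusing to reuse it' if @dead
    yield
  rescue DeadSocketError
    raise
  rescue IOError, SystemCallError
    close
    raise
  end
end
```

A caller that keeps a list of connections could then drop anything where
dead? is true before handing the rest to IO.select.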

Thanks for the help.

--
Zed A. Shaw

http://mongrel.rubyforge.org/

[1] Apache Bench is the worst piece of crap on the planet. The reason
this bug shows up is because ab actually violently closes sockets on any
connections that “take too long”. It of course doesn’t define what “too
long” is and doesn’t tell you it’s going to do this. Use httperf. This
behavior is available with httperf but you have to turn it on
explicitly. If you need to see what happens when really nasty clients
hit your server in the thousands, then use ab. Otherwise its
performance measurements are total crap since it can’t possibly be
getting an accurate reading if it’s closing most of the sockets.
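
For anyone without ab handy, the "violently closes sockets" behavior can
be roughly approximated in Ruby. This is a sketch under assumptions: the
rude_request helper is a hypothetical name, and a zero-timeout SO_LINGER
close (which typically sends a reset instead of a clean shutdown) is only
one guess at what such a client looks like from the server side, not a
description of what ab does internally.

```ruby
require 'socket'

# Hypothetical helper: sends a request and then slams the connection shut
# without reading the response. Returns true if the request was sent,
# false if connecting or writing failed.
def rude_request(host, port, path = '/')
  sock = TCPSocket.new(host, port)
  sock.write("GET #{path} HTTP/1.0\r\nHost: #{host}\r\n\r\n")
  # A zero linger timeout makes close abort the connection on most systems.
  sock.setsockopt(Socket::Option.linger(true, 0))
  sock.close
  true
rescue SystemCallError
  false
end
```

Calling rude_request('localhost', 3000) in a loop from a few threads
should produce the kind of abruptly dropped connections described above.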

Zed_S · June 15, 2006, 9:24pm

Caulking Release? Is this still Enterprisey ready?

Joe

Zed_S · June 16, 2006, 3:16pm

Zed S. wrote:

If nobody has problems with this release then I’ll make it official
tonight.

How did the testing go? I’m just getting my new Mac systems up and
running and looking forward to dropping Mongrel on them.

Zed_S · June 17, 2006, 12:58am

On Fri, 2006-06-16 at 15:16 +0200, Roy Jenkins wrote:

Zed S. wrote:

If nobody has problems with this release then I’ll make it official
tonight.

How did the testing go? I’m just getting my new Mac systems up and
running and looking forward to dropping Mongrel on them.

So far so good. I’ve got a fresh little pre-release that fixes up some
of the logging and outputs, and prints a warning if you run mongrel in
daemon mode twice. Just now trying the install on Mac OS X, and finished
win32.

I’m looking to have it out in about an hour unless something horrible
comes up.

--
Zed A. Shaw

http://mongrel.rubyforge.org/