Mongrel 0.1.1 -- A Fast Ruby Web Server (It Works Now, Maybe)

On Jan 21, 2006, at 3:07 PM, PA wrote:

Requests per second: 164.92 [#/sec] (mean)
[LuaWeb]
Requests per second: 948.32 [#/sec] (mean)

This is great.
Can you add Apache to your list of benchmarks?

Jim F.

Noob question here.

No intent to impugn Zed’s mad skilz or the need for something like
Mongrel. I’m just confused by why it would be common to develop or
deploy a ruby on rails app with something other than production servers
like Apache.

So far, all of the rails demos I have seen are using webrick. This has
been true even for setups like macosx that come with apache already set
up and running.

Does apache not come standard with everything needed to serve a rails
app? If not, is there an add-on module for apache that makes it
rails-savvy?

Or is it the case that all rails apps have to be served by a special
rails server like mongrel or webrick?

thanks,
jp

On Jan 22, 2006, at 16:28, Jim F. wrote:

Can you add Apache to your list of benchmarks?

[httpd]
% httpd -v
Server version: Apache/1.3.33 (Darwin)
% ab -n 10000 http://localhost/test.txt
Requests per second: 1218.47 [#/sec] (mean)

[lighttpd]
% lighttpd -v
lighttpd-1.4.9 - a light and fast webserver
% ab -n 10000 http://localhost:8888/test.txt
Requests per second: 3652.30 [#/sec] (mean)

Cheers

On Mon, 2006-01-23 at 02:42 +0900, Jeff P. wrote:

Does apache not come standard with everything needed to serve a rails
app? If not, is there an add-on module for apache that makes it
rails-savvy?

Or is it the case that all rails apps have to be served by a special
rails server like mongrel or webrick?

Rails can be run through flat CGI with Apache, but that’s really slow
because every single time you make a request, the code has to be
reloaded into memory from disk. The step up from this is something like
FastCGI or SCGI, which keep the code in memory between requests. The
performance is then WAY better, but in development, changes you make to
the code won’t take effect until you reload the server. Obviously that’s
no good, so WEBrick is a lightweight server intended for development
use, which just reloads the parts of the code you change between
requests. It’s not recommended for deployment though because it isn’t
fast enough.
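
For the curious, the standalone-WEBrick style of serving that Rails’ script/server wraps looks roughly like this minimal sketch (a hand-rolled example, not Rails’ actual dispatcher; the handler is hypothetical):

    require 'webrick'

    server = WEBrick::HTTPServer.new(:Port => 3000)

    # mount_proc runs this block on every request; Rails' development
    # dispatcher does its code reloading at roughly this point.
    server.mount_proc('/') do |req, res|
      res['Content-Type'] = 'text/plain'
      res.body = "Hello from WEBrick at #{Time.now}\n"
    end

    trap('INT') { server.shutdown }
    server.start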

This is a very good article to read:
http://duncandavidson.com/essay/2005/12/railsdeployment

Hope that helps

Jon

Zed S. wrote:

On Jan 22, 2006, at 12:42 PM, Jeff P. wrote:

snip…

Go talk to someone who’s forced to IIS and you’ll see why something
other than WEBrick is really needed. Actually, WEBrick would be fine
if it weren’t so damn slow.

snip…

Thanks for your work on this. Can you elaborate on what makes Mongrel so
much faster than WEBrick? What kind of optimization techniques did you
use to make it faster? Are you using C extensions etc. in part to speed
things up? (I guess I’m looking for a bit of an architectural overview,
with a WEBrick architecture comparison to boot if you used that as
inspiration.)

Just curious! :)

-Amr

Thanks for your work on this. Can you elaborate on what makes Mongrel so
much faster than WEBrick? What kind of optimization techniques did you
use to make it faster? Are you using C extensions etc. in part to speed
things up? (I guess I’m looking for a bit of an architectural overview,
with a WEBrick architecture comparison to boot if you used that as
inspiration.)

Just curious! :)

Yeah, me too. What I wonder about specifically is why not just rewrite
the performance-critical parts of webrick in C. That way you would
already have the massive amount of features webrick offers without
having to duplicate all of this. I wonder if you have been thinking
about that and the reason you might have decided against doing it this
way.

Just curious, too! :)

-Sascha

On Jan 22, 2006, at 12:42 PM, Jeff P. wrote:

Noob question here.

I like noobs. Especially with BBQ sauce. :)

No intent to impugn Zed’s mad skilz or the need for something like
Mongrel. I’m just confused by why it would be common to develop or
deploy a ruby on rails app with something other than production
servers like Apache.

Good question. It really comes down to nothing more than the fastest
simplest way to serve up a Rails (or Nitro, Camping, IOWA, etc.)
application. You’ve currently got various options:

  • CGI – slow, resource hogging, but works everywhere.
  • FastCGI – Fast, current best practice, a pain in the ass to
    install and real painful for win32 people.
  • SCGI – Fast, pure ruby (runs everywhere Ruby does), works with a
    few servers, very simple to install, use, and cluster, good
    monitoring (warning, I wrote this).
  • mod_ruby – Works but haven’t heard of a lot of success with it,
    couples your app to your web server making upgrades difficult.
  • WEBrick – Runs in pure ruby, easy to deploy, you can put it behind
    any web server supporting something like mod_proxy. Fairly slow.

Now, the sweet spot would be something that was kind of at the
optimal axis of FastCGI, SCGI, and WEBrick:

  • Runs everywhere Ruby does and is easy to install and use.
  • Fast as hell with very little overhead above the web app framework.
  • Uses plain HTTP so that it can sit behind anything that can proxy
    HTTP: Apache, lighttpd, IIS, squid. A huge number of deployment
    options open up.

This would be where I’m trying to place Mongrel. It’s not intended
as a replacement for a full web server like Apache, but rather just
enough web server to run the app frameworks efficiently as backend
processes. Based on my work with SCGI (which will inherit some stuff
from Mongrel soon), it will hopefully meet a niche that’s not being
met right now with the current options.

So far, all of the rails demos I have seen are using webrick. This has
been true even for setups like macosx that come with apache already set
up and running.

Does apache not come standard with everything needed to serve a rails
app? If not, is there an add-on module for apache that makes it
rails-savvy?

Apache or lighttpd are the big ones on Unix systems. When you get
over to the win32 camp though, lighttpd just doesn’t work, and many
people insist on using IIS. In my own experience, if you can’t hook
it into a portal or Apache without installing any software then
you’re dead. Sure, this is probably an attempt to stop a disruptive
technology, but if there’s a solid, fast way to deploy using HTTP then
that’s one more chink in the armor sealed up.

Go talk to someone who’s forced to IIS and you’ll see why something
other than WEBrick is really needed. Actually, WEBrick would be fine
if it weren’t so damn slow.

Or is it the case that all rails apps have to be served by a special
rails server like mongrel or webrick?

Well, they have to be served by something running Ruby. I know
there’s people who have tried with mod_ruby, but I haven’t heard of a
lot of success. I could be wrong on that. Also, many people don’t
like tightly coupling their applications into their web server.

On Jan 22, 2006, at 4:07 PM, Amr M. wrote:

Thanks for your work on this. Can you elaborate on what makes Mongrel
so much faster than WEBrick? What kind of optimization techniques did
you use to make it faster? Are you using C extensions etc. in part to
speed things up? (I guess I’m looking for a bit of an architectural
overview, with a WEBrick architecture comparison to boot if you used
that as inspiration.)

You’re going to laugh but right now it’s down to a bit of Ruby and a
nifty C extension. Seriously. No need yet of much more than some
threads that crank on output, a parser (in C) that makes a hash, and
a way to quickly lookup URI mappings. The rest is done with handlers
that process the result of this. It may get a bit larger than this,
but this core will probably be more than enough to at least service
basic requests. I’m currently testing out a way to drop the threads
in favor of IO.select, but it looks like that messes with threads in
some weird ways.
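
As a hedged sketch of that handler idea, assuming a made-up HelloHandler and plain Hashes for the request variables and the response (illustrative names, not Mongrel’s actual API):

    # Anything that responds to `process` can be wired up to a URI.
    class HelloHandler
      def process(params, response)
        response[:status] = 200
        response[:body]   = "hello, #{params['REMOTE_ADDR']}\n"
      end
    end

    params   = { 'REQUEST_URI' => '/hello', 'REMOTE_ADDR' => '127.0.0.1' }
    response = {}
    HelloHandler.new.process(params, response)
    p response   # => {:status=>200, :body=>"hello, 127.0.0.1\n"}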

Once I figure out all the nooks and crannies of the thing then I’ll
do a more formal design, but even then it’s going to be ruthlessly
simplistic.

Zed A. Shaw

Well, this may be mean, but have you ever considered that “the massive
amount of features webrick offers” is part of the problem? It’s
difficult to go into a large (or even medium) code base, profile it,
and then bolt on performance improvements. It can be done, but it
usually ends up as a wart on the system.

So, rather than try to “fix” WEBrick I’m just considering it a different
solution to a different set of problems. Mongrel may pick up all the
features WEBrick has, but right now it’s targeted at just serving Ruby
web apps as fast as possible.

I suspected something along those lines :) I would probably do the same
because starting from the beginning is always more fun than trying to
understand a large code base. Although I personally think that the
latter doesn’t have to be slower. How long could it take to find a dozen
slow spots in webrick? Maybe 2-3 days? Another 2-3 days to tune them?

Anyway, I was just curious, and I am looking forward to following along
and learning from the C code. I personally never had the need for
anything to be faster than Ruby except the HTTP stuff. But since I have
never actually written more than a couple of lines of C, I shied away
from starting such a thing.

Another tip: Maybe you want to look at Will Glozer’s Cerise.

http://rubyforge.org/projects/cerise/

It has a minimal, bare-bones HTTP server written entirely in Ruby. Maybe
it is of help. Just a thought.

-Sascha

Zed S. wrote:

You’re going to laugh but right now it’s down to a bit of Ruby and a
nifty C extension. Seriously. No need yet of much more than some
threads that crank on output, a parser (in C) that makes a hash, and
a way to quickly lookup URI mappings. The rest is done with handlers
that process the result of this.

That’s a PATRICIA trie for URL lookup, a finite state machine compiled
Ragel->C->binary for HTTP protocol parsing and an implicit use of
select(2) (via Thread), for the even-more-curious out there ;) (first
hit on Google for “Ragel” will tell you what you need to know about
that)
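
For the even-more-curious, a toy longest-prefix registry in plain Ruby gives a feel for what that URI lookup does; the real structure is a trie, so treat this as a sketch of the interface rather than of Mongrel’s code:

    class PrefixRegistry
      def initialize
        @routes = {}                       # URI prefix => handler
      end

      def register(prefix, handler)
        @routes[prefix] = handler
      end

      # Longest registered prefix wins; returns the handler plus the
      # SCRIPT_NAME / PATH_INFO split of the URI, or nil on no match.
      def resolve(uri)
        matches = @routes.keys.select { |p| uri.index(p) == 0 }
        return nil if matches.empty?
        prefix = matches.max { |a, b| a.length <=> b.length }
        [@routes[prefix], prefix, uri[prefix.length..-1]]
      end
    end

    reg = PrefixRegistry.new
    reg.register('/app', :app_handler)
    p reg.resolve('/app/users/1')   # => [:app_handler, "/app", "/users/1"]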

It may get a bit larger than this,
but this core will probably be more than enough to at least service
basic requests. I’m currently testing out a way to drop the threads
in favor of IO.select, but it looks like that messes with threads in
some weird ways.

Ok, so here’s where I fell off your train. On your Ruby/Event page, you
said that you killed the project b/c Ruby’s Thread class multiplexes via
the use of select(2), which undermines libevent’s ability to effectively
manage events (which I had discovered while writing some extensions a
while back and thought “how unfortunate”). But I have some questions
about the above:

  1. As above, the Thread class uses select(2) (or poll(2)) internally;
    what would be the difference in using IO::select explicitly besides more
    code to write to manage it all?

  2. What are these “weird ways” you keep referring to? I got the
    select-hogging-the-event-party thing, but what else?

I am interested b/c I am currently trying to write a microthreading
library for Ruby based on some of the better-performing event
multiplexing techniques (kqueue, port_create, epoll, etc) so I can use
it for other stuff I want to write (^_^)
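
Since both question 1 and the microthreading idea boil down to driving sockets from one explicit IO.select loop, here is a minimal single-threaded sketch of that pattern (assumptions: canned response, no real HTTP parsing, nothing Mongrel-specific):

    require 'socket'

    server  = TCPServer.new(8080)
    clients = []

    loop do
      # Block until the listening socket or any connected client is readable.
      readable, = IO.select([server] + clients)
      readable.each do |io|
        if io == server
          clients << server.accept
        else
          begin
            io.readpartial(2048)   # the request; ignored in this sketch
            io.write("HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nok\n")
          rescue EOFError
            # client closed its end before sending anything
          ensure
            io.close
            clients.delete(io)
          end
        end
      end
    end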

Once I figure out all the nooks and crannies of the thing then I’ll
do a more formal design, but even then it’s going to be ruthlessly
simplistic.

Simple is good, m’kay? ;) Great show in any case! I know I’ll be using
this for my next internal Rails app.

On Jan 22, 2006, at 5:10 PM, Sascha E. wrote:

you would already have the massive amount of features webrick
offers without having to duplicate all of this. I wonder if you
have been thinking about that and the reason you might have decided
against doing it this way.

Well, this may be mean, but have you ever considered that “the
massive amount of features webrick offers” is part of the problem?
It’s difficult to go into a large (or even medium) code base, profile
it, and then bolt on performance improvements. It can be done, but it
usually ends up as a wart on the system.

So, rather than try to “fix” WEBrick I’m just considering it a
different solution to a different set of problems. Mongrel may pick
up all the features WEBrick has, but right now it’s targeted at just
serving Ruby web apps as fast as possible.

Zed A. Shaw

PA wrote:

On Jan 20, 2006, at 13:31, Zed S. wrote:

Mongrel is a web server I wrote this week that performs much better
than WEBrick (1350 vs 175 req/sec) and only has one small C extension.

Being a sucker for meaningless benchmarks I had to run this as well :))

Hey, that was cool. Any chance you could see how they would run with -c 10? (And I wonder how fast twisted.web would be :)

On Jan 23, 2006, at 9:27 PM, Toby DiPasquale wrote:

hit on Google for “Ragel” will tell you what you need to know about
that)

Ooohh, that’s what people want to know. You’re right. Here’s the
main gear involved in the process:

  1. Basic Ruby TCPServer is used to create the server socket. No
    magic here. A thread then just runs in a loop accepting connections.
  2. When a client is accepted it’s passed to a “client processor”.
    This processor is a single function that runs in a loop doing a
    readpartial on the socket to get a chunk of data.
  3. That chunk’s passed to an HTTP parser which makes a Ruby Hash with
    the CGI vars in it. The parser is written with Ragel 5.2 (which has
    problems compiling on some systems). This parser is the first key to
    Mongrel’s speed.
  4. With a completed HTTP parse, and the body of the request waiting
    to be processed, Mongrel tries to find the handler for the URI. It
    does this with a modified trie that returns the handler as well as
    breaking the prefix and postfix of the URI into SCRIPT_NAME and
    PATH_INFO components.
  5. Once I’ve got the handler, the request hash variables, and a
    request object I just call the “process” method and it does its
    work. (A rough sketch of this whole loop follows below.)
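
Pulling those five steps together, here is a rough plain-Ruby sketch of the loop (illustrative only: StubParser stands in for the Ragel/C extension, the trie lookup is reduced to a single hard-wired handler, and none of these names are Mongrel’s actual code):

    require 'socket'

    # Fake parser standing in for the Ragel-generated C extension; it only
    # pulls the request line apart to fill a CGI-style hash.
    class StubParser
      def execute(params, chunk)
        method, uri, = chunk.split(' ', 3)
        params['REQUEST_METHOD'] = method
        params['REQUEST_URI']    = uri
        params
      end
    end

    server  = TCPServer.new(3000)          # step 1: plain TCPServer
    parser  = StubParser.new
    handler = lambda do |params, client|   # steps 4-5 reduced to one stub handler
      body = "you asked for #{params['REQUEST_URI']}\n"
      client.write("HTTP/1.1 200 OK\r\nContent-Length: #{body.length}\r\n\r\n#{body}")
    end

    loop do
      client = server.accept
      Thread.new(client) do |c|
        chunk  = c.readpartial(2048)       # step 2: read a chunk off the socket
        params = parser.execute({}, chunk) # step 3: parse it into a Hash
        handler.call(params, c)            # step 5: run the handler's logic
        c.close
      end
    end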

Unhandled issues are:

  • The trie was written in ruby and isn’t all that fast. A trie might
    also be overkill for what will typically be a few URIs. I was
    thinking though that the trie would be great for storing cached
    results and looking them up really fast.
  • The thread handling has limitations that make it not quite as
    efficient as I’d like. For example, I read 2k chunks off the wire
    and parse them. If the request doesn’t fit in the 2k then I have to
    reset the parser, keep the data, and parse it again. I’d really much
    rather use a nice ring buffer for this.
  • The threads create a ton of objects which can make the GC cause
    large pauses. I’ve tried a group of threads waiting on a queue of
    requests, but that’s not much faster or better. So far the fastest
    is using IO.select (see below).

the use of select(2), which undermines libevent’s ability to
effectively manage events (which I had discovered while writing some
extensions a while back and thought “how unfortunate”). But I have some
questions about the above:

Yes, that’s still true since Ruby and libevent don’t know about each
other. They fight like twenty rabid cats in a pillow case. The main
difference is that IO.select knows about Ruby’s threads, so it’s
supposed to be safe to use.

  1. As above, the Thread class uses select(2) (or poll(2)) internally;
    what would be the difference in using IO::select explicitly besides
    more code to write to manage it all?

It does use select transparently, but it seems to add a bunch of
overhead to the select processing it uses. I’m sorting out the
IO.select and thread relationship.

  2. What are these “weird ways” you keep referring to? I got the
    select-hogging-the-event-party thing, but what else?

Basically select hogs the party, threads just kind of stop for no
reason, select just stops, etc. I really wish they’d just use pth
so I could get on with my life. :) I’ve been playing with it, and
I think I have something that might work.

I am interested b/c I am currently trying to write a microthreading
library for Ruby based on some of the better-performing event
multiplexing techniques (kqueue, port_create, epoll, etc) so I can use
it for other stuff I want to write (^_^)

You know, having tried this, I have to say you’ll be fighting a
losing battle. Ruby’s thread implementation just isn’t able to work
with external multiplexing methods. I couldn’t figure it out, so if
you do then let me know.

Once I figure out all the nooks and crannies of the thing then I’ll
do a more formal design, but even then it’s going to be ruthlessly
simplistic.

Simple is good, m’kay? ;) Great show in any case! I know I’ll be
using this for my next internal Rails app.

Thanks!

Zed A. Shaw

Zed S. wrote:

  • The threads create a ton of objects which can make the GC cause
    large pauses. I’ve tried a group of threads waiting on a queue of
    requests, but that’s not much faster or better. So far the fastest
    is using IO.select (see below).

Have you checked to see if your C extension is “leaking” memory by
virtue of Ruby not correctly handling it? This happened to me recently
with a similarly purposed C extension, so much so that I had to do it in
pure C and simply fork/pipe in Ruby to use it. The problem was that my
extension was using ALLOC() and friends for allocation, but Ruby didn’t
understand that it could release that memory, even after the process’s
memory usage was 3GB+. I moved on, but I will eventually get back there
to find out why that was happening…

Yes, that’s still true since Ruby and libevent don’t know about each
other. They fight like twenty rabid cats in a pillow case. The main
difference is that IO.select knows about Ruby’s threads, so it’s
supposed to be safe to use.

As far as I understand, at the base of it, IO::select’s C handler,
rb_f_select() calls rb_thread_select() to do the actual select’ing. It
appears that there are more functions on top of the rb_thread_select()
when coming at it from the co-op thread scheduling callchain, however.
This would be in line with what you were saying.

  2. What are these “weird ways” you keep referring to? I got the
    select-hogging-the-event-party thing, but what else?

Basically select hogs the party, threads just kind of stop for no
reason, select just stops, etc. I really wish they’d just use pth
so I could get on with my life. :) I’ve been playing with it, and
I think I have something that might work.

Does Ruby spawn Thread objects even when not requested by the
programmer? I seem to remember that GC was in a Thread? Is that right?

If not, can you just not spawn any and avoid these issues altogether
(perhaps alias Thread’s new method to raise an exception to make sure it
doesn’t happen?)
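
The “alias Thread’s new to raise” idea is easy to sketch (illustrative only; note that Thread.start and Thread.fork are separate class methods, so they would need the same treatment):

    class Thread
      class << self
        alias_method :original_new, :new

        # Any attempt to create a thread now blows up loudly.
        def new(*args, &block)
          raise 'Thread.new is disabled in this process'
        end
      end
    end

    Thread.new { sleep 1 }   # => RuntimeError: Thread.new is disabled in this process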

You know, having tried this, I have to say you’ll be fighting a
losing battle. Ruby’s thread implementation just isn’t able to work
with external multiplexing methods. I couldn’t figure it out, so if
you do then let me know.

I’m not at all put off by simply replacing select(2) in the Ruby core
with something else, just so I can get what I need,
[porta|releasa]bility be damned. I know this is not the best solution,
but it might be the fastest. I would really like something I could
gem-ify, though, if at all possible. I thought about trying to work this
into YARV and just use that, but that’s nigh-on-unusable at the moment
for other reasons.

On Jan 24, 2006, at 04:51, Zed S. wrote:

Ooohh, that’s what people want to know. You’re right. Here’s the
main gear involved in the process:

Have you tried something like LibHTTPD perhaps?

http://www.hughes.com.au/products/libhttpd/

Cheers

On Jan 24, 2006, at 16:43, gabriele renzi wrote:

Hey, that was cool. Any chance you could see how they would run with -c 10?

[Mongrel]
% ruby -v
ruby 1.8.4 (2005-12-24) [powerpc-darwin7.9.0]
% ruby simpletest.rb
% ab -n 10000 -c 10 http://localhost:3000/test
Requests per second: 386.31 [#/sec] (mean)

[Webrick]
% ruby -v
ruby 1.8.4 (2005-12-24) [powerpc-darwin7.9.0]
% ruby webrick_compare.rb >& /dev/null
% ab -n 10000 -c 10 http://localhost:3000/test
Requests per second: 27.58 [#/sec] (mean)

[Cherrypy]
% python -V
Python 2.4.2
% python tut01_helloworld.py
% ab -n 10000 -c 10 http://localhost:8080/
Requests per second: 164.77 [#/sec] (mean)

[LuaWeb]
% lua -v
Lua 5.1 Copyright (C) 1994-2006 Lua.org, PUC-Rio
% lua Test.lua
% ab -n 10000 -c 10 http://localhost:1080/hello
Requests per second: 927.04 [#/sec] (mean)

[httpd]
% httpd -v
Server version: Apache/1.3.33 (Darwin)
% ab -n 10000 -c 10 http://localhost/test.txt
Requests per second: 1186.10 [#/sec] (mean)

[lighttpd]
% lighttpd -v
lighttpd-1.4.9 - a light and fast webserver
% ab -n 10000 -c 10 http://localhost:8888/test.txt
Called sick today (fdevent.c.170: aborted)

Cheers

PA wrote:

On Jan 24, 2006, at 16:43, gabriele renzi wrote:

Hey, that was cool. Any chance you could see how they would run with -c 10?

great, thanks for realizing my wish :)

Yep, libhttpd is pretty cool. I’ve used it before. It’s also GPL so
it might not work for most folks. It also uses a select loop I
believe so it would fight with Ruby’s threads the same way as other
external select methods.

Zed A. Shaw


Sascha E. wrote:

I suspected something along those lines :) I would probably do the same
because starting from the beginning is always more fun than trying to
understand a large code base. Although I personally think that the latter
doesn’t have to be slower. How long could it take to find a dozen slow
spots in webrick? Maybe 2-3 days? Another 2-3 days to tune them?

If it were that easy somebody would have already done it. Profiling
and optimizing languages like Ruby can be quite difficult. If you do it
at the C level you often get results that are very difficult to
interpret or even do anything useful with, e.g. the profiler shows you
spending 80% of your time in some basic underlying routine of Ruby. If
you do it at a higher level, the overhead of benchmarking can often
skew the results badly.

So it’s hard to get the data, and optimizing w/o real profiling
data is one of the great evils of programming. With simpler
apps you can often make a good guess, but in my experience
guessing where the time is spent in a more complex application
is almost always wrong.
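
As a small illustration of that higher-level overhead problem, Ruby’s bundled profiler (ruby -rprofile, or require 'profile') is built on set_trace_func and slows the code it measures dramatically, which is easy to see with a quick timing sketch (numbers are machine-dependent; the method here is made up for the example):

    require 'benchmark'

    def busy
      (1..20_000).inject(0) { |sum, i| sum + i.to_s.length }
    end

    plain = Benchmark.realtime { busy }

    require 'profile'                # every call is traced from here on
    traced = Benchmark.realtime { busy }

    $stderr.puts "plain: #{plain}s, under profile: #{traced}s"
    # The profiler's own report is dumped to stderr at exit.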

_ Booker C. Bense


Zed S. wrote:

On Jan 23, 2006, at 9:27 PM, Toby DiPasquale wrote:

I am interested b/c I am currently trying to write a microthreading
library for Ruby based on some of the more performing event
multiplexing…

You know, having tried this, I have to say you’ll be fighting a
losing battle. Ruby’s thread implementation just isn’t able to work
with external multiplexing methods. I couldn’t figure it out, so if
you do then let me know.

I’ve been meaning to ask about this as well ever since I saw you killed
Ruby/Event. In your experience is it only a Bad Idea™ to use
poll/libevent in your Ruby app if you’ll also be using Threads, or is
it always a bad idea, even if you can guarantee that “require ‘thread’”
is never issued?

Also, did you ever get a chance to write a post mortem discussing your
findings and the problems you ran into?

This seems like a fairly serious problem with Ruby that should be
addressed. Event-driven programming really enables the whole “pieces
loosely joined” paradigm. I mean, it’s kind of embarrassing that the
best way to do async programming with Ruby is to use Rails’ javascript
libraries. (Okay, I’m enough of a web geek to think that that is
actually kind of cool, but we’ll ignore that.)

Thanks,
kellan