Decent benchmark results?

roller8 · March 9, 2007, 8:13pm

Hi again folks! Everything is going really well since I last posted
and I’m very close to live deployment!

Anyway, I was benchmarking various Nginx + Mongrel cluster configs and
came up with what seems like my best performance for a regular Hello
Rails page. I was hoping I could get some estimates so I can save the
time of having to try out other solutions (ie, Apache+Mongrel, etc,
etc).

Hardware: Dual dual-core Xeon, 16gb ram, SCSI SAS mirror.

Best test results: 6 Nginx and 5 Mongrels were enough to meet this.
I tried more and less of both in different combos. I got approx. 215
req/sec +/- over about 5 httperf’s tests with these params:

httperf --server 127.0.0.1 --port 80 --uri /say/hello --rate 250 --num-conn 10000 --num-call 1 --timeout 5

I am using Ezra’s latest nginx.conf with the necessary modifications
to root directory and mongrel cluster block and it’s all working fine.

I really just want to know if this sounds like a correct average. I
realize I can do a lot more with this hardware so I plan to do some
virtualization as recommended earlier with a hardware load balancer up
front.

Thanks everyone!

Raul

roller8 · March 9, 2007, 9:05pm

On Mar 9, 2:12 pm, “roller8” [email protected] wrote:

Anyway, I was benchmarking various Nginx + Mongrel cluster configs and
came up with what seems like my best performance for a regular Hello
Rails page.

Could you explain what a “regular” Hello Rails page is?

Hardware: Dual dual-core Xeon, 16gb ram, SCSI SAS mirror.

That’s a lot of hardware.

Best test results: 6 Nginx and 5 Mongrels were enough to meet this.
I tried more and less of both in different combos. I got approx. 215
req/sec +/- over about 5 httperf’s tests with these params:

Those numbers sound a bit low to me, actually. I’d expect at least 40
pages per second of session creation + rhtml render (no other DB
activity) per core, which would be around 320/second.

Is the DB on the same box? Was the test running on the same box?

I’d recommend you run 8 nginx (1 per core), and 32 mongrels (4 per
core).
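
As a rough illustration of the sizing rule above (about 40 req/sec per core, 1 nginx worker and 4 Mongrels per core), here is a minimal Ruby sketch. The method and constant names are made up for illustration; the figures are the ones quoted in this post. Note that 8 nginx and 32 Mongrels correspond to 8 cores, while a dual dual-core box has 4, so both cases are printed.

```ruby
# Rough sizing sketch based on the rules of thumb quoted above.
# All names here are illustrative, not part of any library.
REQ_PER_SEC_PER_CORE = 40   # session creation + rhtml render, per the post
NGINX_PER_CORE       = 1
MONGRELS_PER_CORE    = 4

# Returns the suggested process counts and the expected throughput
# for a machine with the given number of cores.
def sizing_for(cores)
  {
    :nginx_workers    => cores * NGINX_PER_CORE,
    :mongrels         => cores * MONGRELS_PER_CORE,
    :expected_req_sec => cores * REQ_PER_SEC_PER_CORE
  }
end

[4, 8].each do |cores|
  puts "#{cores} cores: #{sizing_for(cores).inspect}"
end
```

These are only starting points for tuning, not targets; the real numbers depend on what the application does per request.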

Also, 10,000 concurrent requests seems a bit high, but shouldn’t
really affect aggregate performance.

–
– Tom M.

roller8 · March 10, 2007, 12:23am

Hi Tom,

On 3/9/07, [email protected] [email protected] wrote:

Is the DB on the same box? Was the test running on the same box?

I’d recommend you run 8 nginx (1 per core), and 32 mongrels (4 per
core).

Is ~4 mongrels per core a normal baseline you start with for hardware
like that? A site I’m working on will have a similar setup – 3 web
servers with dual dual-core xeons, though with less ram (4 to 8 gigs).
This would be behind Apache, though, would that change the
recommendation of 4 per core?

I plan on doing plenty of tests with httperf, of course, just
looking for a good starting point.

Rob

roller8 · March 10, 2007, 4:08am

On Mar 9, 6:22 pm, “Rob S.” [email protected] wrote:

Is ~4 mongrels per core a normal baseline you start with for hardware
like that? A site I’m working on will have a similar setup – 3 web
servers with dual dual-core xeons, though with less ram (4 to 8 gigs).
This would be behind Apache, though, would that change the
recommendation of 4 per core?

There’s a general understanding in Unix performance tuning that a load
of 4.0, which means that at any moment in time there are 4 processes
running or waiting to be scheduled (i.e. in the run queue), is
considered saturation.

This is a very indirect measurement of exactly what is going on in a
system and assumes that those processes are CPU bound, and not
bound on other things such as network I/O, disk I/O, etc.

So, very simplistically speaking, a good place to start in just about
any tuning project is 4 running processes per core. Again, this is an
ultra simplistic way of looking at things.
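
To make the rule of thumb concrete, here is a minimal Ruby sketch that compares the 1-minute load average with 4 processes per core. It assumes a Linux box exposing /proc/loadavg and a Ruby recent enough to have Etc.nprocessors; the method and constant names are made up for illustration, and the threshold is only the rough starting point described above.

```ruby
require 'etc'

PROCESSES_PER_CORE = 4.0  # the rule of thumb from the post, not a hard limit

# Parse the 1-minute load average out of a /proc/loadavg style line,
# e.g. "3.20 2.90 2.50 2/345 6789".
def one_minute_load(loadavg_line)
  loadavg_line.split.first.to_f
end

# True when the run queue is at or above 4 processes per core.
def saturated?(load, cores)
  load >= PROCESSES_PER_CORE * cores
end

if File.readable?('/proc/loadavg')   # Linux only
  load  = one_minute_load(File.read('/proc/loadavg'))
  cores = Etc.nprocessors
  puts "load #{load} on #{cores} cores, saturated: #{saturated?(load, cores)}"
end
```

As noted above, this only says something useful when the processes are CPU bound; a box waiting on disk or network can show a high load for other reasons.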

In a typical Rails deployment scenario, the front-end web servers are
highly unlikely to break a sweat compared to the back-end application
servers. Since we’re talking about such a rough measurement and only
a place to begin tuning at, I wouldn’t even factor the front-end
processes into this equation.

If MySQL is running on the same box, however, that would figure into
the equation.

Unix system performance tuning is a very complex subject, and it
changes all the time. That said, some of the best books written on the
subject were written a long time ago, when resources such as CPU
cycles, RAM, and disk storage were all scarce and expensive.

Here are two of the best books I’ve ever read on the subject, and
would highly recommend everyone interested in deployment read:

I’d also recommend a book by Adrian Cocroft, which I believe was
called Solaris Performance Tuning. I’m shocked to find almost
zero references to that book in Google. It was a really great book.

If it’s out of print, it makes me want to run to my bookshelf and
make sure it’s still around, because it’s a really great book.

–
– Tom M., CTO
– Engine Y.

roller8 · March 10, 2007, 3:00am

On 3/9/07, Rob S. [email protected] wrote:

Best test results: 6 Nginx and 5 Mongrels were enough to meet this.
I tried more and less of both in different combos. I got approx. 215
req/sec +/- over about 5 httperf’s tests with these params:

Those numbers sound a bit low to me, actually.

Indeed. I can get over 700 dynamic “Hello, World” actions per second
with 3 Mongrels on a Dell D620 laptop. My definition of Hello, world
is this:

class TestsController < ApplicationController
  session :off
  def say_hi
    render :text => 'Hi!'
  end
end

Note the “session :off” bit - this is very important.

Alex

roller8 · March 10, 2007, 4:25am

On 3/9/07, [email protected] [email protected] wrote:

What’s the point of a test that has so little basis in real world
usage?

Are we talking about “hello world”, or real world?

For real world (Mephisto with page caching off), the same laptop does
~40 req/sec.

Without a session creation, you might just as well have served a
static HTML page, which would have returned higher numbers yet.

True. And that was exactly what I wanted to establish with the hello
world test: that it’s not much slower than serving static files.

Alex

roller8 · March 10, 2007, 4:11am

On Mar 9, 8:59 pm, “Alexey V.” [email protected]
wrote:

class TestsController < ApplicationController
  session :off
  def say_hi
    render :text => 'Hi!'
  end
end

Note the “session :off” bit - this is very important.

What’s the point of a test that has so little basis in real world
usage?

The idea is to measure realistic performance, not see how large a
number you can generate!

Without a session creation, you might just as well have served a
static HTML page, which would have returned higher numbers yet.

–
– Tom M., CTO
– Engine Y.

roller8 · March 10, 2007, 6:44am

inline…

-Michael
http://javathehutt.blogspot.com

On Mar 9, 2007, at 7:07 PM, [email protected] wrote:

There’s a general understanding in Unix performance tuning that a load
[...]

If MySQL is running on the same box, however, that would figure into
the equation.

Assuming the scenario with everything running on one box can you
elaborate about how MySQL would figure into the equation? At the
moment it seems like maybe RAM would be the only factor because ruby
seems to be the CPU bottleneck by a long shot. I can’t really even
get MySQL to blink. From my testing thus far (granted not as extensive
as I’d like or as you and Ezra have no doubt performed) I don’t see
how in a single box environment tuning anything other than the web
server process count and ruby/rails is going to make any perf diff.
Just curious if I’m way out of line with that thinking. I’m basing my
observation on my log file time division where 80-90%+ is spent in
ruby compared to 10% or less for MySQL.

Unix system performance tuning is a very complex subject, and it
changes all the time. That said, some of the best books written on the
subject were written a long time ago, when resources such as CPU
cycles, RAM, and disk storage were all scarce and expensive.

Here are two of the best books I’ve ever read on the subject, and
would highly recommend everyone interested in deployment read:

O'Reilly Media - Technology and Business Training

Thanks for the book recommendations… Time to go see if it’s in
safari so I can read it online

Best,
-Michael

roller8 · March 10, 2007, 7:05am

On Mar 10, 12:42 am, Michael K. [email protected] wrote:

If MySQL is running on the same box, however, that would figure into
the equation.

Assuming the scenario with everything running on one box can you
elaborate about how MySQL would figure into the equation? At the
moment it seems like maybe RAM would be the only factor because ruby
seems to be the CPU bottleneck by a long shot. I can’t really even
get MySQL to blink.

Are you using ActiveRecordStore for sessions, or using the default
disk based sessions?

If you’re writing to disk, it could well be that your disks are
limiting your throughput, particularly if you are creating many
thousands of files in the same directory, which can have nasty
processor punishing performance implications.
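
A quick way to check for that pile-up is to count the files in the session directory. This is a minimal Ruby sketch, assuming the default file store writes under tmp/sessions in the application root; the helper name is made up for illustration.

```ruby
# Count the regular files directly inside a directory (no recursion).
def file_count(dir)
  Dir.entries(dir).count { |name| File.file?(File.join(dir, name)) }
end

session_dir = 'tmp/sessions'   # assumed location, relative to the Rails root
if File.directory?(session_dir)
  puts "#{file_count(session_dir)} session files in #{session_dir}"
else
  puts "#{session_dir} not found (run this from the Rails root)"
end
```

If the count grows by roughly one file per benchmark request, each httperf connection is creating a fresh session on disk.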

If you run top while the benchmark is running, what does the header
above the process list look like?

From my testing thus far (granted not as extensive
as I’d like or as you and Ezra have no doubt performed) I don’t see
how in a single box environment tuning anything other than the web
server process count and ruby/rails is going to make any perf diff.
Just curious if I’m way out of line with that thinking. I’m basing my
observation on my log file time division where 80-90%+ is spent in
ruby compared to 10% or less for MySQL.

No question you should focus the most on where you spend the most
time. But, why are you spending the time where you are?

You cannot tune anything in general terms. If you want really good
results, you have to get the tests as close to reality as possible.
This is why really good performance tuning guys must have access to
production systems.

–
– Tom M., CTO
– Engine Y.

roller8 · March 10, 2007, 7:44am

On 3/9/07, [email protected] [email protected] wrote:

You should be getting closer to
4 kreq/sec for static files, perhaps even faster on a local system.

That’s right, 3.9 kreq/sec it is. I had an error in the httpd.conf.
Thanks
for the heads-up.

Alex

roller8 · March 12, 2007, 5:08am

Hmm, I’m a little confused about where I sit in this thread as it’s
grown a life of its own! But I’m glad I get to read through the whole
conversation. Lots of insight.

OK, well, my hello world is just a piece of the Instant Gratification
chapter at the start of Agile Web Dev. It’s a single controller named
“Say” with one action titled “hello” and one rhtml template that has a
little html and one call to Time.now. Then I use httperf with the
params I typed in the original post.

My MySQL is sitting over on a different machine (Dell 2950, dual
dual-core Xeon with 16gb ram, CentOS 4.4 on a SCSI SAS mirror, with
the MySQL db’s living on a local RAID 5 SCSI SAS set). So no local DB
action. These will all be connected to gigabit switches.

So, if I’m benchmarking low, are there any clues as to where I could
begin looking? Also, I realize that Nginx is supposed to serve up the
static content, which is really fast. So does that mean that rhtml
pages with ruby code, html and images will get partially served by
mongrels and Nginx, or is it that once we’re in an rhtml template it’s
100% mongrels?

Hmm, a bit at a loss here. I’ll benchmark my static html again just to
be sure, but I’m pretty certain I scored low there too according to
your numbers (~4000 req/sec).

Raul


roller8 · March 10, 2007, 7:17am

On Mar 9, 10:25 pm, “Alexey V.” [email protected]
wrote:

On 3/9/07, [email protected] [email protected] wrote:

What’s the point of a test that has so little basis in real world
usage?

Are we talking about “hello world”, or real world?

Good point, and a fair statement.

“hello world” isn’t real world, that’s true, but it isn’t quite a
fairy tale either, which is a little closer to what I’d call a
sessionless “hello world.”

Without a session creation, you might just as well have served a
static HTML page, which would have returned higher numbers yet.

True. And that was exactly what I wanted to establish with the hello
world test: that it’s not much slower than serving static files.

Well, that’s a good test and interesting in and of itself, but it’s
not really fair to post the results of that test as an example of how
his results were a bit lower than expected.

And, not much slower than serving static files? You should be getting
closer to 4 kreq/sec for static files, perhaps even faster on a local
system.

Here’s a static test of a single Engine Y. slice, from another slice,
over gigabit ethernet.

ey00-s00070 ~ # ab2 -n 5000 -c 4 http://www.engineyard.com/404.html

This is ApacheBench, Version 2.0.40-dev <$Revision: 1.146 $> apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking 10.0.128.71 (be patient)
Completed 500 requests
Completed 1000 requests
Completed 1500 requests
Completed 2000 requests
Completed 2500 requests
Completed 3000 requests
Completed 3500 requests
Completed 4000 requests
Completed 4500 requests
Finished 5000 requests

Server Software: nginx/0.4.13
Server Hostname: 10.0.128.71
Server Port: 80

Document Path: /404.html
Document Length: 619 bytes

Concurrency Level: 4
Time taken for tests: 1.179527 seconds
Complete requests: 5000
Failed requests: 0
Write errors: 0
Total transferred: 4150000 bytes
HTML transferred: 3095000 bytes
Requests per second: 4238.99 [#/sec] (mean)
Time per request: 0.944 [ms] (mean)
Time per request: 0.236 [ms] (mean, across all concurrent requests)
Transfer rate: 3435.28 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       3
Processing:     0    0   1.0      0      32
Waiting:        0    0   1.0      0      31
Total:          0    0   1.1      0      32

Percentage of the requests served within a certain time (ms)
50% 0
66% 0
75% 0
80% 0
90% 1
95% 1
98% 1
99% 1
100% 32 (longest request)
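
For anyone reading ab output for the first time, the headline figures follow directly from the request count, the concurrency level and the elapsed time. A minimal Ruby sketch, using the numbers from the run above (the method name and hash keys are made up for illustration):

```ruby
# Figures copied from the ApacheBench output above.
requests    = 5000
concurrency = 4
seconds     = 1.179527

# Recompute the summary lines ab prints from its raw totals.
def ab_summary(requests, concurrency, seconds)
  {
    # "Requests per second ... (mean)"
    :req_per_sec          => (requests / seconds).round(2),
    # "Time per request ... (mean)"
    :ms_per_request       => (concurrency * seconds * 1000.0 / requests).round(3),
    # "Time per request ... (mean, across all concurrent requests)"
    :ms_per_request_total => (seconds * 1000.0 / requests).round(3)
  }
end

p ab_summary(requests, concurrency, seconds)
```

This gives 4238.99 req/sec, 0.944 ms and 0.236 ms, matching the corresponding lines of the report.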

–
– Tom M.

roller8 · March 12, 2007, 5:47pm

What is your session storage config? If you didn’t do anything
explicitly, sessions are persisted as files in ./tmp/sessions, and
that’s slow.

Alex Verkhovsky

roller8 · March 12, 2007, 6:46am

On Mar 12, 12:08 am, Roller8 [email protected] wrote:

So, if I’m benchmarking low are there any clues as to where I could begin
looking?

Just the ones already mentioned above. Have you adjusted the number of
mongrels per machine, and are you using ActiveRecordStore for
sessions?

Don’t forget that increasing the number of mongrels will require a
change to nginx.conf…

Also, I realize that Nginx is supposed to serve up the static
content which is really fast. So does that mean that rhtml pages with ruby
code, html and images will get partially served by mongrels and Nginx or is
it that once we’re in an rhml template that it’s 100% mongrels?

In your configuration it looks like nginx is involved in every
request, and handles static content all by itself, with no mongrel
involvement.

For dynamic content, such as your benchmark, nginx takes the request,
but proxies back to mongrel for each request.

Hmm, a bit at a loss here. I’ll benchmark my static html again just to be
sure but I’m pretty certain i scored low there too according to your numbers
(~4000 req/secs).

4k req/sec is for static content only. I’m sure you’d see similar if
not higher numbers for your configuration.

–
– Tom M., CTO
– Engine Y.

roller8 · March 12, 2007, 6:38pm

OK, well I haven’t reached that point yet but I’ve noted this. Thanks
for the information. Also, I think I may have had too many nginx
processes running for my machine. I left 6 from Ezra’s config but
I’ve moved it to 4, one for each processor for now. Early benchmarks
show an improvement to about 260 req/sec but I think there’s still
some tweaking that needs to be done before I even start tweaking the
app with caching and stuff. I’m sure I’m missing something somewhere.

Tom, I’m also moving to 4 mongrels per CPU to see where that takes
me. Oh, and lastly, I’m using the hugemem kernel (Linux
2.6.9-42.0.10.ELhugemem). I wonder if I should try the regular smp
kernel for this? I guess I will try it out anyway.

Raul


roller8 · March 12, 2007, 7:22pm

On 3/12/07, roller8 [email protected] wrote:

some tweaking that needs to be done before I even start tweaking the
app with caching and stuff. I’m sure I’m missing something somewhere.

Well, if you haven’t done anything with sessions, I would bet my 2
cents that session persistence through file system is your current
bottleneck. At least, it was for many people in a similar stage.

The typical price of it, if I remember correctly, is around 20 to 30
msec per request, and it just keeps on growing.
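
To see what 20 to 30 msec per request means for throughput, here is a back-of-the-envelope Ruby sketch. It assumes each Mongrel serves one Rails request at a time and that the session cost is the only time spent, so the result is an upper bound rather than a prediction; the method name is made up for illustration.

```ruby
# Upper bound on throughput if every request spent `ms_per_request`
# milliseconds inside a Mongrel that serves one Rails request at a time.
def max_req_per_sec(mongrels, ms_per_request)
  mongrels * 1000.0 / ms_per_request
end

[20, 30].each do |ms|
  printf("5 Mongrels, %d ms/request: at most %.0f req/sec\n",
         ms, max_req_per_sec(5, ms))
end
```

With 5 Mongrels this works out to roughly 167 to 250 req/sec, which is in the same range as the ~215 req/sec reported at the start of the thread.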

Alex

roller8 · March 12, 2007, 10:42pm

Yes, I think that was a definite problem. I had tons of files in my
tmp/sessions too that I’ve since cleared. I’ve now switched to
ActiveRecordStore even though I don’t yet have any DB access going on.
I’ll be reporting back within minutes with my new results.

Raul

roller8 · March 12, 2007, 9:46pm

On Mar 12, 1:37 pm, “roller8” [email protected] wrote:

If reducing mongrels improved your score on that box, then it’s almost
certainly using disk based sessions. Fewer mongrels means less disk
thrashing…

Tom, I’m also moving to 4 mongrels per CPU to see where that takes
me.

That’s 4 mongrels per core.

Oh, and lastly, I’m using the hugemem kernel (Linux
2.6.9-42.0.10.ELhugemem). I wonder if I should try the regular smp
kernel for this? I guess I will try it out anyway.

You’re going to get far better results by checking your session config
and reporting back to us…

If you switch to DB backed sessions with ActiveRecordStore, which
is very easy, you’re likely going to see a major performance increase.

–
– Tom M., CTO

roller8 · March 12, 2007, 11:22pm

OK I’m now running 4 nginx and 16 mongrels (4 per core) and getting
420 to 450 req/sec. So I think that may sound a little more proper
for the hardware I’m running then?

roller8 · March 17, 2007, 10:00pm

Yes! That’s it exactly.

I’m surprised the Google didn’t help me out on that one.

Sorry for the mis-reference, everyone!