Making Mongrels Faster

So I have a rather high traffic site that is starting to slow down
significantly. Here is the setup.

4 boxes,
75 mongrels spread across them all.
One of them is the “master” that is using apache mod_proxy_balancer to
balance the traffic across them all.
And one database server (which does not seem to be getting overly taxed)

What I am looking for are any tweaks or tips people have used or know of
to make the rendering that the mongrels do faster. I am looking into
memcached, but so far in testing on my Linux servers I have found it
horribly slow compared to the same code and test on a dev Windows box.

Any help would be greatly appreciated.

Nes++

Nestor Camacho said the following on 02/14/2007 11:44 AM:

What I am looking for are any tweaks or tips people have used or know of
to make the rendering that the mongrels do faster. I am looking into
memcached, but so far in testing on my Linux servers I have found it
horribly slow compared to the same code and test on a dev Windows box.

Without knowing the performance profiles of the boxes, it’s difficult to
generalise.

I’d start by looking to see if there was ‘starvation’ of memory or of
network bandwidth … heck, sometimes using a correctly configured switch,
or reconfiguring your switch, can make a massive difference!

I’d then look to see how the mod_proxy is interacting and distributing.

In the past, I’ve achieved astounding performance just with round-robin
DNS
and no other ‘balancing’. At one site they were cynical and installed a
hardware load balancer and performance dropped compared to the RR-DNS.

At another site, they thought that locking critical parts of the
application
into memory would speed things up. In fact the OS paging algorithm was
smarter than they were - the app ran faster when not locked.

Tuning often requires deep knowledge of the architecture. I can tune most
versions of *NIX but haven’t a clue when it comes to Windows.

But as I say, the generalizations we can make in the absence of details
and
measurements may not be very helpful or informative.


My definition of a free society is a society where it is safe to be
unpopular.
Adlai E. Stevenson Jr., Speech in Detroit, 7 Oct. 1952

What is the CPU utilization breakdown from top while under load?

What does vmstat 5 5 report while under load?

There is a Google Group, started by Robby and Planet Argon, that focuses
entirely on Rails deployment issues:

http://groups.google.com/group/rubyonrails-deployment


– Tom M., CTO
– Engine Y., Ruby on Rails Hosting
– Reliability, Ease of Use, Scalability
– (866) 518-YARD (9273)

Nestor Camacho said the following on 02/14/2007 12:56 PM:

Unfortunately, I do not have access to the network side of things; this
is at a colocation facility states away.

LOL!
That pretty much guarantees it’s going to be a network problem! :)

However, I don’t think the problem is the network. I am able to do a lot
of bandwidth-related duties, both small and large transfers, very
quickly, and it is very responsive. I have sustained 10 megs with no
degradation.

I read that to mean your connection from the outside world into the
machines. The connection between the machines may have other
considerations. As I said, setting up switching hubs can influence
performance in odd ways.

You might google for the excellent papers written by the guys at
Wikipedia (as well, of course, as those at Google and eBay) on how they
grew their networks, do load balancing, and the trade-offs between
database, rendering, file/image/stylesheet/javascript serving, static
pages, etc.

One thing they point out is that when scaling out, FTP is the
kiss-of-death. Don’t share code that way. Push it with rsync.

I wish I knew as much about Ruby and Rails as I do about hardware.
Guess where I misspent my youth?


“I am always ready to learn, although I do not always like being
taught”.
– Winston Churchill

First, thanks for the quick response, and sorry for not giving more
details. Hardware details below.

Vendor:        Dell PowerEdge
OS:            CentOS 4.2
CPU:           2x P4 3 GHz
Memory:        2 GB
Hard drive(s): 2x 160 GB

Unfortunately, I do not have access to the network side of things; this
is at a colocation facility states away. However, I don’t think the
problem is the network. I am able to do a lot of bandwidth-related
duties, both small and large transfers, very quickly, and it is very
responsive. I have sustained 10 megs with no degradation.

I will try and collect some measurements and update everyone. But from
what I am seeing, I am not quite maxing out on memory or CPU. When
traffic starts to come in, the mongrels spike the CPU to do the
rendering, database calls, etc., then cool down, then spike again to
render more pages, make more database calls, and so on. Other than going
through and removing as many code bottlenecks as we can, from my point
of view I wanted to try and squeeze out all I can from the
mongrels/servers.

Nes++

There is definitely a point of diminishing returns on performance
when you add too many mongrels. I would say scale back to 10 mongrels
or less per box. How many page views/day are you serving?

You can change the production log level to fatal and gain performance by
only logging fatal errors. Without knowing more about your app I can’t
say much more. With that many mongrels you may want to try HAProxy for
the load balancing; it is a lot more intelligent than
mod_proxy_balancer.
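
For reference, a minimal sketch of that log-level change, assuming a
standard Rails app where config/environments/production.rb is loaded by
the environment initializer:

# config/environments/production.rb
# Only messages at :fatal and above get written, so each request skips
# the usual per-request log I/O.
config.log_level = :fatal

Restart the mongrels after the change so they pick up the new setting.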

-Ezra


– Ezra Z.
– Lead Rails Evangelist
[email protected]
– Engine Y., Serious Rails Hosting
– (866) 518-YARD (9273)

Hi,

How optimized is the app? Can you get Rails to do less work using page
caching (caches_page)? Is the app public so we can have a look to get
some ideas?
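
In case it helps, a minimal sketch of what that looks like, assuming a
hypothetical MainController whose index page is the expensive one:

# app/controllers/main_controller.rb
class MainController < ApplicationController
  # Writes the rendered page out to public/ on the first hit, so Apache
  # can serve it as a plain static file and the mongrels never see the
  # request again until the cached page is expired.
  caches_page :index

  def index
    # ... expensive rendering happens only on a cache miss ...
  end
end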

Cheers,
Carl.



Carl W.
0412218979
[email protected]

Carl W. said the following on 02/15/2007 01:19 AM:

How optimized is the app? Can you get Rails to do less work using page
caching (caches_page)? Is the app public so we can have a look to get
some ideas?

Caching is not always a good strategy. It assumes that the items being
cached have a high rate of reuse. There are many circumstances where
this does not apply.

eBay, for example, has factored out the ‘static’ pages and serves them
from a dedicated machine (or cluster). The ‘static’ things include
javascript and style sheets.

The reality is that a UNIX box is caching a lot of things - the pages of
the
files that contain the binary of the Ruby interpreter, the text files
that
make up the code for RoR and the application, and of course the
directory
and i-node information for all those files.

So when you cache fragments or pages, the OS sees them as competing with
all
this.

If the ‘static’ pages are taking up space in the system cache then they
are loading down your ability to cache fragments and dynamic pages. Even
if they don’t get flushed, they still present a load to the virtual
memory page-use check algorithm, and so eat CPU cycles.

Please don’t try to tweak the VM caching. I’ve found that even the
“poorer” (by whatever criteria) virtual memory systems are better than
application programmers think.

The best short-term solution is to throw more memory on the machine. You
may need one of the enhanced kernels. Many are built for a 4 GB limit,
but it’s easy to build or procure one for a 64 GB limit - just make sure
you don’t pick the ones built for a laptop or desktop :)

The longer term solution is to study what Google, Yahoo, EBay and others
have done and written about, make measurements of your own system and
experiment.

Don’t expect to get it right the first time!


Follow your inclinations with due regard to the policeman round the
corner.
W. Somerset Maugham, ‘Of Human Bondage’, 1915

I agree there is a point where you will get diminishing returns. We
originally had 4 mongrels per server; as the load increased, so did the
number of mongrels, until we reached 75 in total. The issue was the
spikes: we would drum along without any issues all day, then, slam, we
would get 1000-2000 visitors in a 20-30 minute span. As it stands, from
my perf tests I was getting 4 req/s to load our main page. Not very
good…

Last night I spent a few hours trying out Apache/FastCGI, since some
people swear by it and some people swear about it ;), and I was able to
get 17-18 req/s. Much better… Will I move over to FastCGI? Maybe. I want
to try out your idea about setting the mongrels’ logging to fatal
errors.

How do I do that? I tried looking at the docs and did not see anything
in there.

Thanks for everyone’s feedback! It has been a tremendous help.

Nes++

4 boxes,
75 mongrels spread across them all.
One of them is the “master” that is using apache mod_proxy_balancer to
balance the traffic across them all.
And one database server (which does not seem to be getting overly taxed)

That’s what… almost 19 mongrels per box? We run 4… 5 was too
many…
see here for more info:

http://mongrel.rubyforge.org/docs/how_many_mongrels.html

What I am looking for are any tweaks or tips people have used or know of
to make the rendering that the mongrels do faster. I am looking into
memcached, but so far in testing on my Linux servers I have found it
horribly slow compared to the same code and test on a dev Windows box.

Also, memcache shouldn’t be slow… except on OSX, and then only if you
don’t apply the patch by Hodel… memcache should be fast… so
something
is wrong there…

The other thing to look at is DB caching… MySQL, for instance, won’t
cache any query involving NOW()… so if you can tweak some of those you
might save some DB time as well…
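
To make that concrete, a hedged sketch in Rails 1.x finder style (the
Post model and published_at column are hypothetical): MySQL’s query
cache skips any statement whose text contains NOW(), but will cache a
statement with a literal timestamp, so truncating the time makes the
same SQL text repeat.

# Never query-cached -- the SQL text contains NOW():
Post.find(:all, :conditions => "published_at <= NOW()")

# Cacheable -- truncate to the minute so identical SQL text is issued
# for a whole minute and MySQL can serve repeats from its query cache:
now = Time.now.change(:sec => 0)
Post.find(:all, :conditions => ["published_at <= ?", now])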

Good luck!

-philip

Nestor Camacho wrote:

So I have a rather high traffic site that is starting to slow down
significantly. Here is the setup.

4 boxes,
75 mongrels spread across them all.
One of them is the “master” that is using apache mod_proxy_balancer to
balance the traffic across them all.
And one database server (which does not seem to be getting overly taxed)

What I am looking for are any tweaks or tips people have used or know of
to make the rendering that the mongrels do faster. I am looking into
memcached, but so far in testing on my Linux servers I have found it
horribly slow compared to the same code and test on a dev Windows box.

If you don’t mind, how many pages per second are you serving, and how
long do they each take according to Rails’ self-timing?

What do you mean by horribly slow?

Stephan


The main page is not static; it is actually dynamic. In my httpd.conf
file I set it up so that Apache delivers what few files are static
(stylesheets, images, etc.). Besides that, though, a lot of dynamic
rendering is done to load the main page. I already have the developers
combing through the code to find places where they can make things more
streamlined, as well as trying to start to use memcache (which is
reporting slower responses on Linux, but that is a whole other issue).

As to Stephan W.'s question:

According to Rails’ own timing, it is rendering things at 14-27 req/s,
depending on what part of the site is being rendered.

Nes++

Nestor Camacho said the following on 02/15/2007 02:22 PM:

I agree there is a point where you will get diminishing returns. We
originally had 4 mongrels per server; as the load increased, so did the
number of mongrels, until we reached 75 in total. The issue was the
spikes: we would drum along without any issues all day, then, slam, we
would get 1000-2000 visitors in a 20-30 minute span. As it stands, from
my perf tests I was getting 4 req/s to load our main page. Not very
good…

If your main page is ‘static’ or has many static elements, it’s a
candidate to move out of that cluster.

You might try, just as an example, setting up the INET daemon with
entries like these in /etc/inetd.conf (each entry hands a single file to
/bin/cat, with no web server involved; the http-808x service names need
matching port entries in /etc/services):

http-8081 stream tcp nowait nobody /bin/cat cat /cdrom/index.html
http-8082 stream tcp nowait nobody /bin/cat cat /cdrom/docs/index.html
http-8083 stream tcp nowait nobody /bin/cat cat /cdrom/text/doc1.html
http-8084 stream tcp nowait nobody /bin/cat cat /cdrom/text/doc2.html
http-8085 stream tcp nowait nobody /bin/cat cat /cdrom/text/doc3.html

Fast, low CPU load. Oh, and VERY secure as well :)

Now, set up a machine with no applications and no Apache that does that,
and add it to your DNS, naming it “styleserver.mydomain.com”.

Now your headers read:

.... ..... ....

and so forth.
Similarly for javascript.
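
A minimal Rails-side sketch of the same idea, assuming the hypothetical
styleserver host above and the asset_host setting present in Rails of
that era:

# config/environments/production.rb -- point the asset helpers at the
# dedicated static box instead of the app servers:
config.action_controller.asset_host = "http://styleserver.mydomain.com"

# stylesheet_link_tag 'main' and javascript_include_tag :defaults will
# then emit URLs such as
# http://styleserver.mydomain.com/stylesheets/main.css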


Whitehead’s Law:
The obvious answer is always overlooked.

I did a test setup of Memcached with my app. The main page also has
several sections of dynamic content. Using memcache to store fragments
of the page for a short time (you can set their expiration time)
increased throughput to approximately three times the uncached numbers.
Memcached is very easy to set up and use; you just have to make sure you
expire the appropriate caches when updates to data are made, otherwise
you have to wait for them to expire to get the update.
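
A minimal sketch of that pattern using the memcache-client gem directly;
the key name, TTL, and render helper are illustrative, not Jason’s
actual code:

require 'memcache'

CACHE = MemCache.new('127.0.0.1:11211')

def main_page_section
  CACHE.get('main_page_section') || begin
    html = render_main_page_section   # hypothetical expensive render
    CACHE.set('main_page_section', html, 300)   # expire after 300 s
    html
  end
end

# When the underlying data changes, expire explicitly instead of
# waiting out the TTL:
CACHE.delete('main_page_section')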

Also, what, 19 Mongrels per server? That is far, far too many. I can max
out my CPUs with 5 mongrels per machine. I actually have 4 apps with 4
instances each running on my production web nodes. That is probably too
many, but I have plenty of RAM so why not?

Let us know what happens with your site. Also, what is your site?

Jason

Nestor Camacho said the following on 02/15/2007 04:30 PM:

The main page is not static; it is actually dynamic. In my httpd.conf
file I set it up so that Apache delivers what few files are static
(stylesheets, images, etc.).

That’s good. It makes it easy to move it away from the overhead involved
in the extra work done by that big blob of code that is Apache, and onto
a dedicated ‘static server’ machine.

It’s not the size of the files, it’s the ridiculous amount of work
Apache has to do to serve them up.

Factoring them off takes a load off the rendering servers.

K.I.S.S.

http://meta.wikimedia.org/wiki/Why_Wikipedia_ran_slow_in_late_2003
http://meta.wikimedia.org/wiki/Why_Wikipedia_runs_slow
http://meta.wikimedia.org/wiki/November_2005_image_server

From the Wikimedia servers page on Meta:

The Squid systems maintain large caches of pages, so that common or
repeated
requests don’t need to touch the Apache or database servers. They serve
most
page requests made by visitors who aren’t logged in. They are currently
running at a hit-rate of approximately 75%, effectively quadrupling the
capacity of the Apache servers behind them. This is particularly
noticeable
when a large surge of traffic arrives directed to a particular page via
a
web link from another site, as the caching efficiency for that page will
be
nearly 100%. They also load balance the Apaches. Round-robin DNS is
balancing the load among the Squids. See cache strategy for more
details.


Opportunities multiply as they are seized.
–Sun Tzu