Multiple apps on the same server, all should be able to survive spikes

Dear all,

I am researching solutions for the “how do you squeeze as many Rails apps
as you can onto a cluster” problem.

Environment constraints are as follows:

  • 4 commodity web servers (2 CPUs, 8 GB of RAM each)
  • shared file storage and database (big, fast, not a bottleneck)
  • multiple Rails apps running on it
  • normally, the load is insignificant, but from time to time any of
    these apps can have a big, unpredictable spike in load that takes
    (say) 8 Mongrels to handle.

The bottleneck, apparently, is RAM. At 100 MB per Mongrel process, you can
only fit 320 Mongrel processes on those boxes, and under the stated
parameters you can only handle 40 apps on the hardware described above.
PHP can handle thousands of sites under the same set of constraints.

We could use the lighty + FastCGI combo, but it has a bad reputation. I
wonder if that’s because of bugs in the implementation, or because it’s
just not designed for these scenarios (if not, what’s the limitation, and
can it be fixed?).

If anybody knows a ready-made solution to this problem, please let me
know.
The last thing I want to do is reinvent the wheel.

If anybody knows a load balancer smart enough to start and kill multiple
processes across the entire cluster, based on demand per application,
please let me know about that, too.

Finally, I’ve been thinking about making Rails execution within Mongrel
concurrent by spawning multiple Rails processes as children of Mongrel, and
talking to them through local pipes (just like FastCGI does, but a
Ruby-specific solution). This may allow a single Mongrel to scale 3-4 times
better than now, and also to scale down if no requests have come in for the
last, say, 10 minutes. A “blank” Ruby process only takes 7 MB of RAM;
perhaps a “blank” Mongrel is not much more (I haven’t checked yet). I
wonder if this makes sense, or am I just crazy.
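A minimal sketch of that idea, assuming nothing about Mongrel’s real
internals (the class name and the on-demand/idle-reaping policy are made
up for illustration; a real version would load Rails in the children and
dispatch requests instead of echoing):

```ruby
# Hypothetical sketch: a front process (standing in for Mongrel) forks
# workers on demand, hands them requests over local pipes, and reaps
# workers that have been idle too long. Each child here just echoes the
# request back, prefixed with its pid.
class PipeWorkerPool
  Worker = Struct.new(:pid, :to_child, :from_child, :last_used)

  def initialize(idle_timeout: 600)
    @idle_timeout = idle_timeout   # seconds before an idle worker is reaped
    @workers = []
  end

  def handle(request)
    worker = @workers.shift || spawn_worker
    worker.to_child.puts(request)       # send the request down the pipe
    response = worker.from_child.gets   # block until the child answers
    worker.last_used = Time.now
    @workers.push(worker)
    reap_idle
    response
  end

  private

  def spawn_worker
    from_child, child_out = IO.pipe   # child -> parent
    child_in,   to_child  = IO.pipe   # parent -> child
    to_child.sync = true
    pid = fork do
      from_child.close
      to_child.close
      child_out.sync = true
      # Serve requests until the parent closes its end of the pipe.
      while (line = child_in.gets)
        child_out.puts("#{Process.pid}: #{line.chomp}")
      end
    end
    child_in.close
    child_out.close
    Worker.new(pid, to_child, from_child, Time.now)
  end

  def reap_idle
    @workers.reject! do |worker|
      next false if Time.now - worker.last_used < @idle_timeout
      worker.to_child.close
      worker.from_child.close
      Process.wait(worker.pid)
      true
    end
  end
end
```

Something like `pool = PipeWorkerPool.new; pool.handle("GET /")` would then
serve a request, forking a worker only on the first hit.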

I think we can implement (and open-source) any solution that needs weeks
rather than years of effort.

Thoughts?

Best regards,
Alex Verkhovsky
ThoughtWorks


If anybody knows a ready-made solution to this problem, please let me know.
The last thing I want to do is reinvent the wheel.

Look into LiteSpeed. I recently switched to it for my home server for the
same reason (on a much much much smaller scale :). It can automatically
spawn LSAPI (Rails) instances on demand (with a variety of cap levels, such
as process count, memory, etc.) and then shut them down after they are idle
for a certain amount of time.

Their free version limits you to 150 connections at a time (active
connections, not connections queued in the network), so it sounds like
you’d need to pay for it, but it still might be worth looking at. I think
the free version also only utilizes one CPU…

I got it because I have several apps (in addition to all the ‘normal’ Unix
server things) on a small box that hardly get any traffic. LiteSpeed keeps
Rails “off” until I hit them, starts them up, then shuts them down 5
minutes after I’m done.

-philip

We’ve experimented a bit with a LiteSpeed/LSAPI combo for hosting as
many Rails apps as possible w/o overcrowding. LiteSpeed seems to have
a couple of features particularly favorable for this sort of setup:

  • It will start w/ one LSAPI process, but scale up and down
    dynamically as necessary, up to a specified limit
  • You can set resource limits at the web server level on your LSAPI
    processes

So far, our experience has shown us that LiteSpeed/LSAPI works great
for running a lot of small sites on a single server, and is dead-
simple to set up (LiteSpeed comes with a web-based admin which allows
you to do most of the legwork of setting up new sites). However,
LiteSpeed did not work well for us for larger applications. For that,
we still use good ol’ Apache/Mongrel.


Chris - Can you share what issues you had with larger apps? And was that
with their 2.x or 3.x server?

At work we’ve got a couple of servers running LiteSpeed and a couple
running Apache/Mongrel, and so far we haven’t noticed any issues… but
maybe our apps are different…

Thanks!

I don’t know all the intimate details of what was happening… but on
one of our higher traffic applications, LiteSpeed seemed to be
spawning an excess number of processes, leaving behind several dead
processes. This was causing our application to run very slowly. We
didn’t dig too much into it. Instead, we switched the app to Nginx (I
mis-spoke earlier when I said Apache) and 10 mongrels, which seems to
run great.

LiteSpeed runs very well for our smaller sites (typically 1-3
processes each). I think we have this setup on 20-30 of these smaller
sites and we haven’t had any issues.

On 3/2/07, Chris A. [email protected] wrote:

However, LiteSpeed did not work well for us for larger applications.

What distinguishes a large app from a small one? And what do the problems
look like?

Alex

On Mar 2, 12:49 pm, “Alexey V.” [email protected]
wrote:

Finally, I’ve been thinking about making Rails execution within Mongrel
concurrent by spawning multiple Rails processes as children of Mongrel, and
talking to them through local pipes (just like FastCGI does, but a
Ruby-specific solution). This may allow a single Mongrel to scale 3-4 times
better than now, and also to scale down if no requests have come in for the
last, say, 10 minutes. A “blank” Ruby process only takes 7 MB of RAM;
perhaps a “blank” Mongrel is not much more (I haven’t checked yet). I
wonder if this makes sense, or am I just crazy.

I think we can implement (and open-source) any solution that needs weeks
rather than years of effort.

Alex, sorry if I am being Captain Obvious here, but do you cache at
all? If you can get away with caching complete pages as static HTML,
you wouldn’t have to worry about Mongrel and Rails at all. If that’s
not possible, have you considered Rails’ page fragment caching or a
more custom caching solution?

It feels like any web app with that much traffic in a short amount of
time would be rendering the same data and making the same calculations
repeatedly.
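The mechanism Scott is describing can be sketched in a few lines of plain
Ruby (names and paths here are made up; this is a toy stand-in, not the
Rails API):

```ruby
require "tmpdir"
require "fileutils"

# Toy illustration of what page caching buys: the expensive render runs
# once, and every later hit is served from a static file that the
# front-end web server could serve directly, without touching Rails.
CACHE_DIR = File.join(Dir.tmpdir, "page_cache_demo")

# Stand-in for a full Rails render: imagine this is slow and CPU-hungry.
def render_page(path)
  "<html><body>rendered #{path}</body></html>"
end

def serve(path)
  cached = File.join(CACHE_DIR, "#{path}.html")
  return File.read(cached) if File.exist?(cached)   # cache hit: no render
  html = render_page(path)
  FileUtils.mkdir_p(File.dirname(cached))
  File.write(cached, html)                          # first hit fills the cache
  html
end
```

In Rails itself this is `caches_page :show` in a controller (paired with
`expire_page` when the data changes), or the `cache` view helper for
fragments.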

Scott

We could use lighty + FastCGI combo, but it has a bad reputation. I wonder
if it’s because of bugs in implementation, or it’s just not designed for
these scenarios (if not, what’s the limitation, and can it be fixed?)

I’m not sure where the bad reputation of Lighty + FastCGI comes from; the
only problems with Lighty I know of are with the proxy module. Zed doesn’t
recommend it because of stalled development that prevented some bugs from
being fixed.

Considering your scenario, it looks like FastCGI with a well-tuned set of
rules should be able to handle both your normal low load and the peak loads
your apps receive from time to time. FastCGI was designed to handle this
kind of setup.
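For what it’s worth, the knobs in question map onto lighttpd’s mod_fastcgi
directives roughly like this (paths and numbers are illustrative, not a
tested config):

```
fastcgi.server = ( ".fcgi" => ( "app1" => (
  "socket"       => "/tmp/app1.fcgi.socket",
  "bin-path"     => "/var/www/app1/public/dispatch.fcgi",
  "min-procs"    => 1,     # small footprint while the app is quiet
  "max-procs"    => 8,     # room for the 8-Mongrel-sized spike above
  "idle-timeout" => 600    # reap workers after 10 idle minutes
) ) )
```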


Aníbal Rojas

http://www.hasmanydevelopers.com

On Mar 2, 10:17 pm, Chris A. [email protected] wrote:

I don’t know all the intimate details of what was happening… but on
one of our higher traffic applications, LiteSpeed seemed to be
spawning an excess number of processes, leaving behind several dead
processes. This was causing our application to run very slowly. We
didn’t dig too much into it. Instead, we switched the app to Nginx (I
mis-spoke earlier when I said Apache) and 10 mongrels, which seems to
run great.

We ran into this issue and it just turned out to be a configuration
issue that was fixed with some fine tuning. The litespeed config for
2.2 can be somewhat confusing in the wording and throws many people
off. They cleaned much of this up in 3.0. I came across this thread
which helped:

After a few config tweaks everything was kosher.

One issue with LiteSpeed is that it’s closed source. A key piece of your IT
infrastructure, completely dependent on a small company? This situation has
created some interesting problems in the past. Not to mention the licensing
costs ($1400 per box).

It’s not a show-stopper, but is there an open-source alternative that
actually works?

Re page caching: yes, absolutely, page caching rocks, but it’s not always
possible, and even when it is possible, it is not always implemented.

Re the bad reputation of the lighty + FastCGI combo: most open source
software is written out of frustration (either with existing solutions, or
with your day job, and usually both). As far as I know, both Mongrel and
mod_fcgid were written out of frustration with mod_fastcgi.

Best regards,
Alex

I would also recommend LiteSpeed. Though we’re only running one app, we
were running into the same memory issue with Nginx + Mongrel_cluster. Since
we switched to LiteSpeed we’ve had much more memory available, plus better
performance. To top it off, it’s a BREEZE to set up and maintain.

On Mar 2, 9:49 pm, “Alexey V.” [email protected] wrote:

Alex,

Re bad reputation of lighty + FastCGI combo: most open source software is
written out of frustration (either with existing solutions, or with your day
job, and usually both). As far as I know, both Mongrel and mod_fcgid were
written out of frustration with mod_fastcgi.

That’s kind of a pessimistic view of “scratching your own itch” ;)

As far as I understand, mod_fcgid with Apache 2.x is the way to go because
of the troubles with the mod_fastcgi implementation of FastCGI for Apache
1.x. Mongrel is an alternative and simpler approach, based on the use of a
well-known and working technology: reverse proxies. SCGI (Zed’s first try)
is really simple as a protocol compared to FastCGI, which is really
powerful but complex.

But I am not sure where the Lighty + FastCGI issues are reported… What has
been reported are the problems with the proxy module.

Best regards,


Aníbal Rojas