RE: Mongrel + RubyOnRails + FileUploads = Problems?

So I could run multiple mongrel instances on one server and use a proxy
load balancer?

Chris

Yes.


– Tom M.

Wait… this is kind of bad… if I understand this right…

1 dispatcher means that if someone uploads a file, nobody else can use the site?

So 2 dispatchers works until two people upload a file at the same time… so really, this is not going to scale well at all if you’re trying to do something like Flickr.

With that logic you’d need 100 dispatchers if you expected 100 users to be concurrently using your site to upload files.

YouTube gets 25,000 submissions per day… does that mean Rails could never be used to build a site like that?

I just can’t believe that’s the case.

Hey Brian,

On 5/3/06 5:12 PM, “Brian H.” [email protected] wrote:

Wait… this is kind of bad… if I understand this right…

1 dispatcher means that if someone uploads a file, nobody else can use the site?

Yep, for the time it takes Rails to process the uploaded file with cgi.rb. If you’ve got an insane amount of upload content then 1 won’t work at all.

So 2 dispatchers works until two people upload a file at the same time… so really, this is not going to scale well at all if you’re trying to do something like Flickr.

Yep, then there’s a delay. But, the story is a bit more complex.

With that logic you’d need 100 dispatchers if you expected 100 users to be concurrently using your site to upload files.

100 concurrent users is an insane number of actual people in the process queue. The thing to keep in mind is that you can’t measure users, since that’s behavior dependent. You could have a site that users hit rarely, where 100 concurrent means you have billions of users. You could have a chat site like Campfire where 100 concurrent could mean thousands. It all depends on user behavior.

The real way to figure out what kind of req/sec equates to a concurrency level is to use simulation. There are mathematical methods using queuing theory that work decently if you need to figure out your required concurrency before you build the system. If you have the system built already, then you need to write a simulator tool, or use an existing one, that will perform a simulation against your live site.
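As a rough sketch of the queuing-theory side of this, Little’s Law (L = λW) ties arrival rate and service time to average concurrency. The numbers below are made-up assumptions for illustration, not measurements from any real site:

```ruby
# Little's Law: mean concurrency L = arrival rate (lambda) * mean service time (W).
# Hypothetical figures for an upload-heavy site:
arrival_rate = 10.0 # uploads arriving per second (assumed)
service_time = 3.0  # seconds Rails is tied up handling one upload (assumed)

concurrency = arrival_rate * service_time
puts "average in-flight uploads: #{concurrency}" # 30.0

# With one single-threaded dispatcher tied up per in-flight upload,
# you'd want at least that many dispatchers just to keep up on average.
puts "dispatchers needed (minimum): #{concurrency.ceil}" # 30
```

Note this gives an average; bursts above the mean still queue unless you provision headroom.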

A key thing with this, though, is that you can’t really use the results as a measurement of performance. It does tell you how well the site handles load, generally how fast that simulated process might be, and what kind of additional hardware you’d need. But to really find out how to tune a particular part of the process, you need to go back to performance measurements with a tool like httperf.
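In the spirit of the simulator tool described above, here is a toy sketch (every parameter is an assumption): Poisson arrivals hitting a fixed pool of single-threaded dispatchers, counting how many uploads would find every dispatcher busy. A real simulator would queue those requests rather than just count them, so this understates the delay:

```ruby
# Toy load simulation, not a benchmark. Assumed parameters:
ARRIVAL_RATE = 5.0    # uploads/sec
SERVICE_TIME = 2.0    # seconds a dispatcher is tied up per upload
DISPATCHERS  = 12     # pool size under test (~10 busy on average by Little's Law)
DURATION     = 3600.0 # simulate one hour of traffic

srand(42)                                   # reproducible run
busy_until = Array.new(DISPATCHERS, 0.0)    # time at which each dispatcher frees up
blocked = 0
t = 0.0

while t < DURATION
  t += -Math.log(1.0 - rand) / ARRIVAL_RATE # exponential inter-arrival time
  idle = busy_until.index { |free_at| free_at <= t }
  if idle
    busy_until[idle] = t + SERVICE_TIME     # a free dispatcher takes the upload
  else
    blocked += 1                            # every dispatcher busy: caller waits
  end
end

puts "uploads that found all #{DISPATCHERS} dispatchers busy: #{blocked}"
```

Varying DISPATCHERS up and down shows how quickly the blocked count falls off, which is the kind of answer a simulation gives that a raw req/sec figure can’t.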

YouTube gets 25,000 submissions per day… does that mean Rails could never be used to build a site like that?

This is something else that’s kind of weird. I’m assuming most people here have a basic understanding of math, but for some reason they throw out these kinds of measurements. Any “I get X per day” figure is really pretty useless since it doesn’t include a distribution. If you assume that this 25k is steady and average it out, that’s about .29 req/second. No way that’s right, since there’s probably some kind of statistical distribution to the requests.

The measurement would have to be augmented with a peak measurement over a smaller period of time. For example, “YouTube gets 25k submissions/day with a peak of 1400/hour and .38/second.” Then you start to get a bigger picture.
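The arithmetic behind that is easy to check, and shows why the daily number alone misleads (the peak figure here just reuses the hypothetical rate from the example above):

```ruby
# Averaging a daily total hides the distribution entirely.
submissions_per_day = 25_000
seconds_per_day     = 24 * 60 * 60 # 86_400

average_rate = submissions_per_day.to_f / seconds_per_day
puts format("average: %.2f req/sec", average_rate) # 0.29 req/sec

peak_rate = 0.38 # hypothetical peak-second rate from the example
puts format("peak is %.1fx the average", peak_rate / average_rate)
```

You provision for something like the peak, not the average, which is why the distribution matters.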

Anyway, hope that clears some things up. The gist is that you can usually handle a lot more concurrent connections with lower numbers of backends than you think, but you need to run simulations to really determine this.

Zed A. Shaw

http://mongrel.rubyforge.org/