Dynamically generating 10k pages per second

Hi,

Anyone got an idea of how many web and database servers I’d need to
push out 10,000 dynamic pages per second? Fairly simple pages and
database queries. I’d appreciate recommendations for hardware.

The clients for this project are anticipating large amounts of burst
traffic.

Joe

That question is rather hard to answer, since “dynamic” isn’t really
well-specified… your best bet would be to use caching heavily, and in
the best of situations you could just serve static pages, and then it’s up to
the web server - not Rails.
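
One way to read "use caching heavily" is a short-lived page cache sitting in front of the rendering code. The following is only a minimal sketch in Python, not Rails' own caching: `PageCache`, `fetch`, and the 5-second TTL are made-up names and values for illustration, and it assumes a page can be served a few seconds stale.

```python
import time

class PageCache:
    """Tiny in-memory page cache with a time-to-live (illustrative only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self._store = {}            # key -> (time stored, rendered body)

    def fetch(self, key, render):
        """Return the cached body for key, calling render() only on a miss."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        body = render()             # the expensive "dynamic" part
        self._store[key] = (now, body)
        return body

renders = []

def render_home():
    renders.append(1)               # count how often we really render
    return "<html>home</html>"

cache = PageCache(ttl_seconds=5)
cache.fetch("/", render_home)
cache.fetch("/", render_home)       # served from the cache
print(len(renders))                 # 1
```

With something like this in place, the database is only touched once per TTL window per page, however many requests arrive in that window.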

On 7/28/06, Joe Van D. wrote:

Hi,

Anyone got an idea of how many web and database servers I’d need to
push out 10,000 dynamic pages per second? Fairly simple pages and
database queries. I’d appreciate recommendations for hardware.

The clients for this project are anticipating large amounts of burst traffic.

(I think that estimate is quite a bit on the high end of what will
actually happen, but anyways)

I want to be able to tell them that going up to that number of visits
will be just a matter of adding another machine or ten to the rack at
the data center. In other words, I’d like to have a hardware and
software setup that could handle that number of visits simply by
adding another web server.

Thanks,
Joe

On 7/28/06, Joe Van D. wrote:

Hi,

Anyone got an idea of how many web and database servers I’d need to
push out 10,000 dynamic pages per second? Fairly simple pages and
database queries. I’d appreciate recommendations for hardware.

The clients for this project are anticipating large amounts of burst traffic

What makes them think they will get that much traffic? I can probably
count on one hand the number of sites that do that much, burst or
otherwise. Sounds fishy to me.

On 7/28/06, Joe Van D. wrote:

(I think that estimate is quite a bit on the high end of what will
actually happen, but anyways)

I want to be able to tell them that going up to that number of visits
will be just a matter of adding another machine or ten to the rack at
the data center. In other words, I’d like to have a hardware and
software setup that could handle that number of visits simply by
adding another web server.

Even assuming that they will get say 5-10 million hits per day, if the
site is database driven more than likely it’s going to take more than
just adding X number of servers per X amount of traffic. How you
set up your database servers to handle reads/writes would probably be
one of the bigger issues, as well as how you handle your sessions and
caching. I would start out with some of the basics in place, like a
database cluster of some type and hardware-based load balancers on the
front end such as ServerIrons. Some things are easier to change once
you get going than others. Switching from a single database to a
cluster while you are already getting a million hits a day is not fun.
You will also be spending some money on routers, probably something
like the Cisco 28XX or 38XX series. You could easily use 20 Mbps or so
when bursting.

And if your clients have unrealistic expectations, I would be very
very careful. Personally I tell my clients to be prepared for the
worst, and if they accept that, only then will I work for them. That
way when something does go wrong (and it will), they won’t be coming
back to you yelling and screaming. They might not like things going
wrong, but they will remember that you told them that things like this
were bound to happen, and to be prepared for it.

I’d say it’ll take about 100 times as much as it would take to push
out 100 dynamic pages per second…

Seriously, you need a WHOLE lot of info about app, infrastructure,
hardware etc. before anyone could make any such recommendation.
Anyone who says otherwise has no idea what they’re talking about.

If you get a sample of the hardware you intend to use, then load up
your app, tune it and push it as hard as it can go, you’ll get some
sort of idea. If it can handle 100 pages/second, then you need ~100
times as much hardware as you’ve already got. Yep, I know that’s
overly simplistic, but it’s relatively cheap, simple to extrapolate,
and it’ll get you within 20-50% of a reasonable estimate in dollar
terms. You’ll have to factor in load balancing hardware (which you
won’t be able to drive to breaking point without a sizeably greater
investment), database replication and the costs of running a suitable
data centre, at some point, but to counter that you’ll probably be
able to squeeze out some more pages/second from your existing hardware
(and in any case, $$$/hardware grunt continues to drop so that by the
time you’ve bought your 10k pages/second hardware, the cost of the
hardware will have dropped considerably).
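
The extrapolation above can be written down as a few lines of Python. This is only a rough sketch: `servers_needed` and the `headroom` parameter are illustrative names, not anything from the thread, and the 100 pages/second figure is just the example number used above. It assumes throughput scales linearly with machine count, which is exactly the simplification already admitted to.

```python
import math

def servers_needed(target_pps, measured_pps_per_server, headroom=0.0):
    """Estimate how many benchmarked servers are needed for target_pps.

    headroom is a hypothetical safety margin (0.25 means 25% spare capacity).
    Linear scaling is assumed; load balancers and databases are not modelled.
    """
    if measured_pps_per_server <= 0:
        raise ValueError("measured throughput must be positive")
    required = target_pps * (1.0 + headroom) / measured_pps_per_server
    return math.ceil(required)

print(servers_needed(10_000, 100))                 # 100
print(servers_needed(10_000, 100, headroom=0.25))  # 125
```

The point of running the benchmark first is that `measured_pps_per_server` comes from your own app on your own hardware instead of from a guess.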

Alternately, you could just listen to the guy who replies e.g. “27 Web
servers and 13 database servers”, and accept that at face value ;->

Sorry if that’s very little help, but at least it’s the truth. Nobody
could answer your question without doing a lot of research on your
specific app first.

Regards

Dave M.

10,000,000 (ten million) hits a day is only 115.74 hits per second.
10,000 hits per second is 864,000,000 hits per day.
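
For anyone who wants to redo the conversion with their own numbers, here is a small Python sketch of the arithmetic. The function names are made up for illustration, and it assumes traffic spread evenly over a 24-hour day.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_day_to_per_second(hits_per_day):
    """Average hits per second for a given daily total."""
    return hits_per_day / SECONDS_PER_DAY

def per_second_to_per_day(hits_per_second):
    """Daily total if a per-second rate were sustained all day."""
    return hits_per_second * SECONDS_PER_DAY

print(round(per_day_to_per_second(10_000_000), 2))  # 115.74
print(per_second_to_per_day(10_000))                # 864000000
```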

There’s no way I’m going to believe that your client is going to be
getting 864,000,000 page views per day. Any company that is getting that
many page views per day, even in bursts, already has the architecture and
infrastructure in place to handle it, so they wouldn’t be asking your
advice, furthermore they’d know HOW to handle that kind of load too, so
again, they wouldn’t be asking an outsider for help.

Find out how many hits your customer is going to REALISTICALLY be
expecting, then come back and re-submit your question.

This is of course excluding sites whose sole purpose is to support
massive botnets and other tools of evil.

-masukomi

On 7/30/06, kate rhodes wrote:

10,000,000 (ten million) hits a day is only 115.74 hits per second.
10,000 hits per second is 864,000,000 hits per day.

There’s no way I’m going to believe that your client is going to be getting
864,000,000 page views per day. Any company that is getting that many page
views per day, even in bursts, already has the architecture and
infrastructure in place to handle it, so they wouldn’t be asking your
advice, furthermore they’d know HOW to handle that kind of load too, so
again, they wouldn’t be asking an outsider for help.

I thought I said it’s a lot of burst traffic (meaning a lot of traffic
in a short amount of time). The site is not going to sustain that
much traffic throughout the day.

Joe

Still… even for getting anywhere near that kind of traffic, the
previous responses are correct: it just doesn’t sound right!!

Any idea what actual web server you’re going to be using? lighttpd is
capable of dynamic load balancing, so if you haven’t already, take a good
long hard look at it!

Tim

I thought I said it’s a lot of burst traffic (meaning a lot of traffic
in a short amount of time). The site is not going to sustain that
much traffic throughout the day.

Doesn’t really matter, it’s still out in lala land IMO. Look, the
other guy is right, this just isn’t how it’s done. Sharp business
people know how to find talent, whether it’s because they have
experience in the industry, or through VC’s who know how to get
talent, or simply because they are smart enough to talk to people who
have done it before. If they had any clue at all they would know
that they needed people with prior experience in this area. It’s just
common sense.

That said, let’s assume for the moment that it’s legit and you are
going to do this. Why haven’t you given any details? Several people
have asked for more detail, and they are right in saying that no one
can give you any meaningful information without a lot more detail.
You have some of the sharpest minds in the Ruby community here that
would be willing to help, but you essentially deny their help by not
giving them the information they need to help you. Which is another
reason people are probably disinclined to believe this whole thing is
legitimate.

In any case, good luck with it all.

On 7/30/06, Francis C. wrote:

I assume 10,000 hits/second is an average, and I also assume it’s a
global average so the peak rate at particular times will be closer to
40,000/second.

It’s not. 5-10k hits per second would be at the very high end for a
short amount of time.

Joe

I assume 10,000 hits/second is an average, and I also assume it’s a
global average so the peak rate at particular times will be closer to
40,000/second. You are among the very top sliver of the most heavily
trafficked sites in the world. Many companies (I assume you’re a
company) in this position create their own application software on top
of modified kernels. (I’ve been involved in several such efforts with
traffic loads similar to yours.) One thing you will not do if you’re
like most people is use an RDBMS to back this site. You’ll probably
design your own well-customized and highly denormalized data-query system.
There are a lot of different approaches to this, but the commercial
value of such a well-trafficked site is such that you should already
have lined up more than enough funding to do this job right. And there
are plenty of Internet-bubble veterans around who’ve been there and done
that, that you can hire. I’m still trying to decide if you’re playing
with us here.

On Jul 30, 2006, at 9:02 AM, kate rhodes wrote:

There’s no way I’m going to believe that your client is going to be
getting 864,000,000 page views per day. Any company that is getting
that many page views per day, even in bursts, already has the
architecture and infrastructure in place to handle it, so they
wouldn’t be asking your advice, furthermore they’d know HOW to
handle that kind of load too, so again, they wouldn’t be asking an
outsider for help.

(Sorry to pick on you, Kate, because this fits many others as well.)

I cannot imagine why anyone would be so closed-minded.

Does anything that you mention in your response fit Google, eBay,
or Amazon at inception?

Is it impossible to imagine that someone has a good idea, has done
some research, is slightly overoptimistic (but not necessarily
wrong!), and wants to get an idea of what it might take to handle that
sort of load?


– Tom M.

Rasmus has done several talks on how to architect a system which can
handle the load of a place such as Yahoo.

Might be worth sifting through his slides for the diagrams he mentioned.
Whether it’s PHP or Rails, they are similar enough that you can leverage
his knowledge when you get down to something like system architecture.

Not sure why everyone is jumping down this guy’s throat. Who cares if he
landed a client like Digg or YouTube? He was just asking how it would be
done.

-NSHB


Rails mailing list
[email protected]
http://lists.rubyonrails.org/mailman/listinfo/rails


Kind regards,

Nathaniel B.
President & CEO
Inimit Innovations Inc. - http://inimit.com

On 7/30/06, Nathaniel B. wrote:

Rasmus has done several talks on how to architect a system which can handle
the load of a place such as Yahoo.

Any chance you could link to those slides?

Might be worth sifting through his slides for the diagrams he mentioned.
Whether it’s PHP or Rails, they are similar enough that you can leverage his
knowledge when you get down to something like system architecture.

Not sure why everyone is jumping down this guy’s throat. Who cares if he
landed a client like Digg or YouTube? He was just asking for how it would be
done.

:)

The intent of my question was to figure out what changes when you move
from, say, 100 dynamic pages per second, which my laptop can handle, to
1000 dynamic pages per second, which a couple of servers could probably
handle, to 10k pages per second.

I probably should’ve phrased the original question better.

Again, this is a large amount of traffic in a burst, a short amount
of time. Think victoriasecret.com advertised during the Super Bowl.
Not quite at that level, but the general idea applies.

In my situation, handling 500 to 1,000 dynamic pages per second
without any slowdown would be great, and quite honestly, is probably
all we’ll ever need. But, the folks I’m doing this for want to be
assured that it’s not too difficult to go higher. I don’t have much
experience at that level of performance, hence the question.

Another question: Assuming I’ve got some initial architecture in
place, how do I test everything? Using the Apache benchmark ‘ab’
program seems to only measure the performance of downloading one
single page, so it wouldn’t measure the effect of having a couple
images, javascript includes, css files, etc.
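
One possible way around the single-URL limitation is to first work out every URL a browser would request for a page, then point a load generator at the whole list. The sketch below, in Python, only does the first half: `AssetCollector`, `page_urls`, and the `example.test` address are hypothetical names for illustration, and it assumes assets are referenced with plain `img`, `script`, and `link rel="stylesheet"` tags. Actually issuing the requests is left to whatever load tool you end up using.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class AssetCollector(HTMLParser):
    """Collects URLs of images, scripts and stylesheets referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script") and attrs.get("src"):
            self.assets.append(urljoin(self.base_url, attrs["src"]))
        elif (tag == "link"
              and (attrs.get("rel") or "").lower() == "stylesheet"
              and attrs.get("href")):
            self.assets.append(urljoin(self.base_url, attrs["href"]))

def page_urls(base_url, html):
    """Return the page URL followed by every asset URL it references."""
    collector = AssetCollector(base_url)
    collector.feed(html)
    return [base_url] + collector.assets

sample = (
    '<html><head><link rel="stylesheet" href="/site.css">'
    '<script src="app.js"></script></head>'
    '<body><img src="/img/logo.png"></body></html>'
)
for url in page_urls("http://example.test/shop/", sample):
    print(url)
```

Requesting the full list per simulated visitor gives a number closer to what a real page view costs than hammering the HTML alone.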

Thanks for all your responses, even the snarky ones :D
Joe

<…>

Is it impossible to imagine that someone has a good idea, has done
some research, is slightly overoptimistic (but not necessarily
wrong!), and wants to get an idea of what it might take to handle that
sort of load?

Regards,
Rimantas

http://rimantas.com/

By the way, there is a rather large archive of PHP talks at
http://talks.php.net

-NSHB


Joe Van D. wrote:

Again, this is a large amount of traffic in a burst, a short amount
of time. Think victoriasecret.com advertised during the Super Bowl.
Not quite at that level, but the general idea applies.

In my situation, handling 500 to 1,000 dynamic pages per second
without any slowdown would be great, and quite honestly, is probably
all we’ll ever need. But, the folks I’m doing this for want to be
assured that it’s not too difficult to go higher. I don’t have much
experience at that level of performance, hence the question.

You’re not defining what a “burst” is. The issue you will face at very
high load-levels is this: depending on a lot of factors in your
application and in your infrastructure, you may find that scalability
barriers emerge at particular load levels that are not easy to break
through simply by adding hardware. How close can you get your
architecture to true shared-nothing? At the end of the day, you’re
sharing the network infrastructure among machines, so you can’t really
get all the way to shared-nothing.

In my experience on extremely high-load sites, I’ve seen the barriers
emerge the earliest in the RDBMS. (And I’ve seen people try to address
this with enormously expensive computers and Oracle licenses, which only
gets you so far. A far better answer is not to use an RDBMS.) Another
barrier is when you try to use a standard web server like Apache and it
runs out of gas. At this point most people write their own custom web
server, generally using an event-driven model, that knows a lot about
how their dynamic data is structured. Yes, this breaks down all of the
commonly-accepted principles of software engineering. And yes, this
breakdown is easy to cost-justify at the highest load-level
requirements. (Remember, Google went so far as to write their own file
system, a very odd duck with engineering parameters that don’t match any
application I can imagine, apart from Google.) At really high levels,
I’ve even seen the interpacket delay on switched ethernet links inside
the server farm become the bottleneck.

Your most important question is about the economics of the site. When
you get those 10,000/second bursts, is it feasible for the site simply
to fail and start sending 503 responses until it recovers? As ugly as it
sounds to a techie, that is often the right answer from a business point
of view. On the other hand, what if most of the actual value of the site
is created in those few seconds of top load? (Victoria’s Secret, your
example, got huge publicity out of those high-traffic events, which
crashed the first time they ran it. But this is a very unusual example.)
In this case, it makes sense for you to design to the highest-traffic
case, and as I’m suggesting, you may hit barriers that you will have to
solve in very non-standard ways.
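
The "fail and send 503s until it recovers" option can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not how any particular web server does it: `LoadShedder`, `handle`, and `max_in_flight` are invented names, and it assumes the only signal for overload is the number of requests currently being rendered.

```python
import threading

class LoadShedder:
    """Rejects work with a 503 once too many requests are in flight."""

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def handle(self, render_page):
        """Return (status, body); shed the request if we are at capacity."""
        with self._lock:
            if self._in_flight >= self.max_in_flight:
                return 503, "Service Unavailable"
            self._in_flight += 1
        try:
            # The lock is not held while rendering, so other requests
            # can still be admitted or shed in the meantime.
            return 200, render_page()
        finally:
            with self._lock:
                self._in_flight -= 1

shedder = LoadShedder(max_in_flight=1)
print(shedder.handle(lambda: "<html>ok</html>"))  # (200, '<html>ok</html>')
```

Picking `max_in_flight` is the business decision described above: set it near what the hardware can serve comfortably, and everything beyond that gets a fast, cheap refusal instead of dragging the whole site down.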

On 7/31/06, Rimantas L. wrote:

<…>

Is it impossible to imagine that someone has a good idea, has done
some research, is slightly overoptimistic (but not necessarily
wrong!), and wants to get an idea of what it might take to handle that
sort of load?

Getting Real

I just knew someone would post a link to that. :D

I think one little 386 box with a 56k modem connection will do nicely.

Anything that is amazing enough to make that many people run to their
computer all at once is worth waiting for!

jp