[email protected] wrote the following on 15.02.2007 11:01 :
> You got the breakdown correct.
> Pizza boxes make great front-end processors, but they do not handle
> database transaction processing on the scale of the big iron. When you have
> large-scale real-time process management problems with dynamic data,
> it is time to bring in the big iron for the back-end processing.
Agreed. But isn't that an argument against your previous statement? Rails
doesn't sit on the back-end server and is inherently scalable on pizza
boxes; only the DB and its related needs can call for big iron.
Whatever application server you use, if your DB is the bottleneck for
your kind of workload, you won't get around it: you have an inherent
cost that you can't hope to bring down by changing the application
server technology. So going with Rails in this kind of situation is
still the smart choice.
The choice of application-layer technology does have an impact on the
DB load, but when you hit the kind of load that mandates big iron, you
should already have the resources (smart people and money) needed to
remove the bottlenecks (or you are doomed anyway). In ActiveRecord I
identified the following problems, along with some existing or projected
solutions:
- no integrated read cache: plugins using MemCache exist (although until
now I have avoided them and preferred to implement caching at a higher
level myself).
- high number of DB connections when handling a large site (one for each
Rails process): you can use connection poolers (I know of at least one
project for PostgreSQL designed for this), but this is not a huge
problem, as modern RDBMSs can handle thousands of simultaneous
connections without trouble.
- basic Ruby drivers and no prepared statements: AFAICT the
ActiveRecord::Base API should allow its code to be modified to handle
prepared statements transparently, so support will probably come as a
plugin (isn't there already one?), but the drivers must support them too.
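To make the first point concrete, here is a minimal sketch of what an integrated read cache does: a read-through lookup with a time-to-live. It assumes an in-process Hash standing in for a shared store such as memcached; `ReadThroughCache`, `fetch` and `invalidate` are hypothetical names for illustration, not the API of ActiveRecord or of any existing plugin.

```ruby
# Hypothetical read-through cache. The Hash is a stand-in for a shared
# store like memcached; this is not ActiveRecord or plugin API.
class ReadThroughCache
  Entry = Struct.new(:value, :expires_at)

  attr_reader :hits, :misses

  # clock is injectable so expiry can be exercised without sleeping.
  def initialize(ttl_seconds, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock
    @store = {}
    @hits = 0
    @misses = 0
  end

  # Return the cached value for key if it has not expired; otherwise run
  # the block (the expensive DB query), cache its result and return it.
  def fetch(key)
    now = @clock.call
    entry = @store[key]
    if entry && entry.expires_at > now
      @hits += 1
      return entry.value
    end
    @misses += 1
    value = yield
    @store[key] = Entry.new(value, now + @ttl)
    value
  end

  # Drop a key after a write so the next read goes back to the DB.
  def invalidate(key)
    @store.delete(key)
  end
end

# Usage with a fake clock and a fake "query" that counts DB round trips.
now = 0.0
cache = ReadThroughCache.new(60, clock: -> { now })
db_calls = 0
load_user = -> { db_calls += 1; "row for user 42" }

cache.fetch("user:42") { load_user.call } # miss: queries the DB
cache.fetch("user:42") { load_user.call } # hit: served from the cache
now += 61
cache.fetch("user:42") { load_user.call } # TTL expired: queries the DB again
```

The hard part in a real deployment is not the lookup but deciding when to call `invalidate`, which is one reason to cache at a higher level (whole fragments or pages) where the invalidation rules are easier to reason about.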
> Does this help?
Yes, I can imagine the kind of data workflow at Walmart that makes a
mainframe more suitable.
A related thought: the size of the data/problem doesn't always mean
moving to bigger iron; sometimes the DB can be distributed instead.
MySpace/YouTube/Google/DailyMotion all have huge DBs to handle and
don't use mainframes but simple x86/x86_64 boxes.
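One common way to distribute a DB over commodity boxes is to partition (shard) it by a key such as the user id, so each box only holds and serves a slice of the data. The sketch below assumes simple hash-modulo partitioning; `ShardRouter` and the host names are hypothetical, and I'm not claiming this is how any of the sites above actually do it.

```ruby
require "zlib"

# Hypothetical router mapping a partitioning key (e.g. a user id) to one
# of several database hosts. CRC32 is used instead of String#hash because
# Ruby seeds String#hash per process, so it would not give the same
# mapping on every application server.
class ShardRouter
  attr_reader :hosts

  def initialize(hosts)
    raise ArgumentError, "need at least one host" if hosts.empty?
    @hosts = hosts.dup.freeze
  end

  # The same key always lands on the same host.
  def host_for(key)
    @hosts[Zlib.crc32(key.to_s) % @hosts.size]
  end
end

router = ShardRouter.new(%w[db1.example db2.example db3.example])
%w[1 2 3 4 5 6].each { |id| puts "user #{id} -> #{router.host_for(id)}" }
```

The catch with plain modulo hashing is that adding a host remaps most keys, which means moving most of the data; consistent hashing or an explicit key-to-shard directory table avoids that, at the cost of more machinery. Queries that span shards (joins, global counts) also have to be handled in the application.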
Lionel