Alex Payne (from Twitter): Rails doesn't scale


#1

Ok Ok Ok… first off, I apologize for the sliiightly misleading
subject line. But, I thought I’d try to draw some attention and see if
I could get some discussion going (hopefully no flames though!).

I was browsing around and I came across this interview with Alex Payne
from Twitter:
http://www.radicalbehavior.com/5-question-interview-with-twitter-developer-alex-payne

The answer to the second interview question is what particularly drew my
attention: he says that they have been running into a number of scaling
issues that they probably wouldn’t have with other frameworks.

There are three main points that caught my attention in the response.

  1. “At this point in time there’s no facility in Rails to talk to more
    than one database at a time.”

  2. “setting up multiple read-only slave databases [is not a quick fix
    to implement]”

  3. Ruby + Rails’ syntactical sugar = slow

Point #3 is pretty well known; the solution always mentioned before
is to scale out. However, Alex says that they can’t because of 1 & 2.
I’ve been under the impression (and still am) that doing 1 & 2 really
isn’t that hard.

So, the question is, what to do if you have a Rails app and are in
Twitter’s place?

-carl


EPA Rating: 3000 Lines of Code / Gallon (of coffee)


#2

Carl L. wrote:

  1. “At this point in time there’s no facility in Rails to talk to more
    than one database at a time.”

You can set up another model to use another DB connection that’s
configured in your database.yml file. Like so:

    # Abstract base class: no table of its own, it only holds the second connection
    class EmailModel < ActiveRecord::Base
      self.abstract_class = true
      establish_connection "#{RAILS_ENV}_email"
    end

I have actual models that then subclass from EmailModel that actually
interact with tables in that connection’s DB.

In this case, the *_email connection just happens to be to another SQL
Server (gasp!) database on the same server, but there isn’t any reason
that I can think of that you shouldn’t be able to configure another
connection to go wherever you want and deal with it.
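
As a rough sketch of how that connection name resolves: the string passed to establish_connection is just a key into database.yml. The file contents, adapter name, and database names below are made up for illustration, and the YAML is parsed by hand here rather than by Rails.

```ruby
require "yaml"

# Hypothetical database.yml contents; every name and value here is an assumption.
DATABASE_YML = <<~YAML
  development:
    adapter: sqlserver
    database: app_development
  development_email:
    adapter: sqlserver
    database: email_development
YAML

# Stand-in for the RAILS_ENV constant that Rails defines at boot.
rails_env = "development"

configs = YAML.safe_load(DATABASE_YML)

# The string handed to establish_connection is a key into this file.
connection_name = "#{rails_env}_email"
email_config = configs.fetch(connection_name)

puts "#{connection_name} -> #{email_config["database"]}"
```

Each environment would need its own *_email entry, otherwise the lookup fails when the app starts in that environment.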

You just have to keep track of your models, etc.

Does that sound right to everyone?

Wes


#3

On 4/12/07, Carl L. wrote:

Ok Ok Ok… first off, I apologize for the sliiightly misleading
subject line. But, I thought I’d try to draw some attention and see if
I could get some discussion going (hopefully no flames though!).

  1. “At this point in time there’s no facility in Rails to talk to more
    than one database at a time.”

I have seen Rails deployments that use multiple databases; there’s
an illustration of one on the Ruby on Rails site for a German web
community.

Basically there are two databases, each attached to a farm
of Rails Pizza boxes. Each Pizza box can only ever see one DB,
but the DBs increment their PKs in 2s (one is even the other odd),
so that they can periodically replicate/merge their data sets.
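
To illustrate the odd/even idea, here is a minimal sketch in plain Ruby. In a real deployment the databases would hand out the keys themselves; the class and variable names here are hypothetical.

```ruby
# Hands out ids from an arithmetic sequence: offset, offset + step, ...
# With step 2, one node gets the odd ids and the other the even ids.
class InterleavedIdSequence
  def initialize(offset:, step:)
    unless offset.between?(1, step)
      raise ArgumentError, "offset must be between 1 and step"
    end

    @next_id = offset
    @step = step
  end

  def next_id
    id = @next_id
    @next_id += @step
    id
  end
end

odd_db  = InterleavedIdSequence.new(offset: 1, step: 2)
even_db = InterleavedIdSequence.new(offset: 2, step: 2)

odd_ids  = Array.new(3) { odd_db.next_id }
even_ids = Array.new(3) { even_db.next_id }

# Because the two sequences never overlap, rows from both
# databases can be merged without primary key collisions.
merged = (odd_ids + even_ids).sort
puts merged.inspect
```

The same scheme extends to N databases by using a step of N and offsets 1 through N.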

In principle you could use multiple DBs and scale horizontally,
but replication overhead will probably eat into you very quickly.

  2. “setting up multiple read-only slave databases [is not a quick fix
    to implement]”

  3. Ruby + Rails’ syntactical sugar = slow

Point #3 is pretty well known; the solution always mentioned before
is to scale out. However, Alex says that they can’t because of 1 & 2.
I’ve been under the impression (and still am) that doing 1 & 2 really
isn’t that hard.

Well, there are different options for 2. I am quite sure that you
actually spend lots of money on your DB to achieve it.

He did say or imply, though, that scaling out incurs additional DB
overhead per Rails instance, which implies an optimal ratio between
DBs & Rails instances.

So, the question is, what to do if you have a Rails app and are in
Twitter’s place?

Pretty much what he is doing already. But then, it’s not like scaling is
easy in the first place.

You have to ask yourself though - Twitter got up and running in ~ 9-12
months or so? With a Java/.NET version, would he even be in production by
now? How would the PHP equivalent fare?


#4

I’m taking the multiple-databases idea one step further: reading from
external databases and dynamically building an AR hierarchy to ease access
to that database (this is an administration site used to administer many
websites).

The real question needs to be: what server is he running? Is he using
Capistrano to help with distributed installs? I’ve heard good things about
Mongrel clusters and other bits of load balancing. Rails is most definitely
scalable; it just may not be as easy as with other (read: older)
technologies.

Jason


#5

Preface: I’ve been working at Obvious (Twitter’s parent company) since
it was Odeo, and Alex is one of my co-workers.

We’re definitely in a distributed setup, using Capistrano, etc, etc.
Our database is not currently a limiting factor, nor is ActiveRecord -
scaling those components was certainly a challenge, but one that we
were glad to have. That we have such a thriving and active community
is, as many have suggested, a testament to the tools that Rails and
the Ruby community in general offer us as developers and designers.

I’ll be giving a talk at the SDForum Silicon Valley Ruby Conference
next weekend on the subject of Scaling Twitter. More details at
http://romeda.org/blog/2007/04/scaling-twitter-talk.html

Blaine
It’s Obvious - http://twitter.com/blaine


#6

You do what you do with any other bottleneck. You take measurements,
understand what your application is doing and then address it. What
you don’t do is panic: Just because Rails might not be able to run
eBay ‘out of the box’ doesn’t mean that it is a pile of junk.
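
As a small sketch of the "take measurements" step, Ruby's standard Benchmark module can time each stage of a request. The two methods below are hypothetical stand-ins, not real application code; swap in the calls you actually suspect.

```ruby
require "benchmark"

# Hypothetical stand-ins for two stages of a request.
def load_rows(count)
  Array.new(count) { |i| { id: i, text: "status #{i}" } }
end

def render_rows(rows)
  rows.map { |row| "<li>#{row[:text]}</li>" }.join
end

rows = nil
html = nil
timings = {}

# Benchmark.realtime returns the elapsed wall-clock time in seconds.
timings[:load]   = Benchmark.realtime { rows = load_rows(10_000) }
timings[:render] = Benchmark.realtime { html = render_rows(rows) }

# Report the slowest stage first so it is obvious where to look.
timings.sort_by { |_, seconds| -seconds }.each do |stage, seconds|
  puts format("%-8s %.4fs", stage, seconds)
end
```

Measuring on production-sized data matters here, since a stage that looks cheap on a development database may dominate under real load.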

There are loads of solutions. A few examples are:

  • Scale, replicate and cluster your database. Oracle cracked this
    years ago.

  • Does all of your data in the database need to talk to all of the
    other data? If it doesn’t then partition the data along the
    appropriate fault lines. Technologies such as Amazon EC2 allow you to
    launch a complete machine per client per application per database if
    you want. ‘Shared nothing’ is more than just avoiding threads. It
    means don’t share anything unless you have to - because bottlenecks
    happen at the coupling points.

  • Split the application and glue it together with REST calls. It
    doesn’t all have to live in one stack. Splitting the stack gives you
    multiple databases. For example you could have a historical
    application that looks after what was happening yesterday, and a live
    application that looks after what happened today, with an archiving
    process moving rows between the two. Today is what is important and
    that has the fast response, triggering an AJAX call to the
    ‘historical’ stack which may take a little longer to respond.
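
The partitioning idea in the second point can be sketched as a router that maps each client to exactly one database. This is only an illustration under assumptions: the class name, partition names, and the modulo placement rule are all hypothetical.

```ruby
# Routes each client to one partition so that no query ever
# needs to touch more than one database.
class PartitionRouter
  def initialize(partitions)
    raise ArgumentError, "need at least one partition" if partitions.empty?

    @partitions = partitions
  end

  # Deterministic placement: the same client id always maps
  # to the same partition.
  def partition_for(client_id)
    @partitions[client_id % @partitions.size]
  end
end

router = PartitionRouter.new(%w[clients_a clients_b clients_c])

[7, 8, 9].each do |client_id|
  puts "client #{client_id} -> #{router.partition_for(client_id)}"
end
```

Modulo placement is the simplest rule but forces data to move when a partition is added; a lookup table from client to partition avoids that at the cost of one more thing to maintain.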

But most importantly don’t throw the baby out with the bathwater.
Ruby/Rails gives you productivity by solving the problems inherent in
web apps. If you suddenly find you have a hit on your hands then that is a
good problem to have - spend some of the money on fixing and improving
the framework/infrastructure/language interpreter, or more likely your
application design.

But in the vast majority of cases scaling is a non-issue. It is much
more important just to get the app written and written quickly -
that’s where Rails still wins.

NeilW