Rails suitable for highly scalable apps?

Hello. I’m new to Ruby & Rails, though a veteran at engineering large-
scale distributed systems.

I have a new project which requires a REST API and simple web UI and
after reading (superficially) about RoR on and off over recent years,
I thought it was time I took it for a spin for a new project. It is a
dream ‘ground-up’ project with no legacy requirements.

However, I’ve hit a speed-bump and I’m unsure if it is a limitation of
Rails or just my lack of in-depth understanding of the code base.

When using the generator to generate a new model class, Rails chooses
an auto-increment int id as the primary key by default(!). This is
obviously pretty poor form for numerous reasons, such as:

  1. Completely at odds with scalability and distributed implementation
    of a DB since it introduces an unnecessary need for centralization
  2. Depending on the DB engine, you might run out of primary keys as
    soon as you hit 2^32 rows
  3. A security vulnerability waiting to happen - unless you pay close
    attention, it is easy to expose ids to the public in a multi-user
    environment, where guessability of some resource ids is bad
    practice

By 1, I’m talking about indefinitely scalable distributed
implementations (since the term ‘scalability’ is used to mean a wide
variety of things, from vertically scaling a web app’s performance by
adding memory to a server, to horizontally scaling with limits, where
adding resources, such as servers, eventually becomes a case of
diminishing returns).
An easy way to check whether your architecture is fully distributed,
with the performance of operations independent of data size etc., is a
quick thought experiment: imagine you have a ridiculous amount of data,
users, etc. For example, would the performance for a user be
significantly affected if your database was so large it needed to be
spread over a trillion servers? If the answer is yes, then your
architecture is not indefinitely scalable: some centralization is
introducing a dependency between performance and data-set size,
user-base size or whatever.

So, if you had a trillion DB servers, auto_increment could never work
because to determine which is the next id would require querying them
all to figure out what the largest existing id is (or, alternatively,
keeping the ‘next id’ stored in a central place - which will be a
performance bottleneck when a trillion servers have to hit it up for
every insert).
(for the purists, notice I said “significantly” above. For example,
consider the design of the DNS system and imagine if records had no
TTL - living on indefinitely. The load on the root servers would be
vanishingly small and it would hardly matter if they were out of
service for short periods).

Obviously, nobody has a trillion servers, but engineering systems to
be highly scalable isn’t hard and is good practice anyway - in case
your client’s service becomes the next Facebook, in which case you
won’t have to touch anything: just spool up more and more cloud
servers and sit back, rather than watch their business fail as users
leave a sinking ship of slow or failed page-loads.

Now, I’ve surfed around the web for information about how to use
custom ids or other primary key columns in Rails, but have only found
confusion (ignoring people who ask why and/or say not to do it).
Examples given seem to differ (perhaps due to changes before Rails 3?)
and I can’t get any of the ideas to work.

For example, supposing I wish to use UUIDs for primary keys. I’ve
tried variations on:

class CreateItems < ActiveRecord::Migration
  def self.up
    create_table :items, {:id => false} do |t|
      t.string :id, :null => false, :limit => 36, :primary => true

      t.timestamps
    end
  end

  def self.down
    drop_table :items
  end
end

However, the :primary doesn’t seem to work (perhaps is invalid) and
the table generated doesn’t have a primary key. I can use add_index
to add a :unique index, but it isn’t primary. Obviously, I’ll need
some hooks to generate the UUIDs - I’ve delved into that part.
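To be concrete about the generation side, here is a plain-Ruby sketch
of the hook I have in mind (illustrative only - the Item class below is
not working ActiveRecord code, and in a real Rails 3 model this would
live in a before_create callback, with set_primary_key :id declared):

```ruby
require 'securerandom'  # Ruby 1.9 stdlib

# Plain-Ruby sketch of the UUID-assignment hook. In a real Rails 3
# model this would be a before_create callback (plus set_primary_key
# :id); the Item class here is illustrative, not ActiveRecord.
module UuidId
  def uuid_id
    @uuid_id ||= SecureRandom.uuid  # 36-char string, stable once assigned
  end
end

class Item
  include UuidId
end

item = Item.new
item.uuid_id  # a fresh v4 UUID such as "550e8400-e29b-41d4-a716-446655440000"
```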

So, can Rails really handle this in a clean way and have scaffolding
work etc? How? Can someone kindly clue me into what I need in the
migration, model class and anywhere else? I’d prefer to avoid DB-
specific SQL execution (while I’m testing this on MySQL, that
obviously isn’t a distributed scalable technology so I’ll be using a
distributed store ultimately).
I’d also like some tables to have natural (domain specific) primary
key values, a related though perhaps separate issue (and less
critical).

I’ve achieved similar on another project using Grails by writing a JPA
implementation. I’m really hoping Rails can do this without having
the source hacked.

Any help or pointers are greatly appreciated.

Cheers,
-David.

On Apr 4, 5:01am, DavidJ [email protected] wrote:

So, if you had a trillion DB servers, auto_increment could never work
because to determine which is the next id would require querying them
all to figure out what the largest existing id is (or, alternatively,
keeping the ‘next id’ stored in a central place - which will be a
performance bottleneck when a trillion servers have to hit it up for
every insert).

I think master master mysql setups do things slightly differently, you
can set things up so that with (for example)
3 servers one of them has auto increment keys that look like 3n, the
next one 3n+1, and the third 3n+2, so a given server only needs to
track the auto increment it last assigned. No idea how far this
scales - even with 64-bit ids you wouldn’t have much room with 10^12
servers.
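
A toy sketch of that allocation scheme in plain Ruby (purely
illustrative):

```ruby
# Toy model of the 3n / 3n+1 / 3n+2 scheme: each server owns an offset
# and a shared increment, so it can hand out ids forever without asking
# the other servers what the highest existing id is.
class OffsetIdAllocator
  def initialize(offset, increment)
    @next_id   = offset
    @increment = increment
  end

  def next_id
    id = @next_id
    @next_id += @increment
    id
  end
end

servers = (1..3).map { |k| OffsetIdAllocator.new(k, 3) }
ids = servers.flat_map { |s| Array.new(4) { s.next_id } }
# server 1 hands out 1, 4, 7, 10; server 2: 2, 5, 8, 11; server 3: 3, 6, 9, 12
```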

However, the :primary doesn’t seem to work (perhaps is invalid) and
the table generated doesn’t have a primary key. I can use add_index
to add a :unique index, but it isn’t primary. Obviously, I’ll need
some hooks to generate the UUIDs - I’ve delved into that part.

:primary_key is hardwired to be an integer on mysql, and I believe on
postgres and other dbs too. If you want a primary key of a different
type I think you’ll need to add the column as whatever data type you
want and then run a lump of sql to mark it as the primary key
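
Roughly like this, I think (untested sketch; the items table and id
column are just example names, and the ALTER TABLE statement is
standard enough that both MySQL and Postgres accept it):

```ruby
# Sketch of the approach: create the column yourself (no auto-generated
# id), then promote it to the primary key with a lump of SQL.
add_pk_sql = "ALTER TABLE items ADD PRIMARY KEY (id)"

# In a migration this would look roughly like (not run here):
#
#   def self.up
#     create_table :items, :id => false do |t|
#       t.string :id, :null => false, :limit => 36
#     end
#     execute "ALTER TABLE items ADD PRIMARY KEY (id)"
#   end
```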

So, can Rails really handle this in a clean way and have scaffolding
work etc? How? Can someone kindly clue me into what I need in the
migration, model class and anywhere else? I’d prefer to avoid DB-
specific SQL execution (while I’m testing this on MySQL, that
obviously isn’t a distributed scalable technology so I’ll be using a
distributed store ultimately).

If that is your end goal you might not want to spend too much time
with activerecord since it only really does SQLish things (ie not
mongodb, couchdb etc.)

Fred

On Monday, April 4, 2011 6:35:51 AM UTC-4, Frederick C. wrote:

[…]
:primary_key is hardwired to be an integer on mysql, and I believe on
postgres and other dbs too. If you want a primary key of a different
type I think you’ll need to add the column as whatever data type you
want and then run a lump of sql to mark it as the primary key

Nasty. I was hoping that wasn’t the case. I had a quick look at the
source for Rails and the mysql2 gem and saw code that looked like the
‘primary_key’ type was being hard-coded to
‘INT(11) NOT NULL AUTO_INCREMENT’.

If that is your end goal you might not want to spend too much time
with activerecord since it only really does SQLish things (ie not
mongodb, couchdb etc.)

If I understand correctly, without using ActiveRecord, the scaffolding
won’t work either. I’m guessing there are other parts of Rails that
depend on AR too.
I guess the answer to my subject question is “no” - Rails isn’t
suitable for modern scalable web apps. Sadly.
It baffles me why a relatively young project would cripple itself by
making use of legacy architecture almost mandatory.

btw, there is nothing in the SQL standard that makes an RDBMS
inherently unscalable; it is just that the only publicly available
implementations of RDBMSs are not scalable (in the indefinite sense) -
though that is about to change. I’ve used NoSQL stores before, but I
prefer not to give up SQL unless necessary (and it isn’t). The only
issue with SQL is that there is no standardization on how to handle
domain-level conflict resolution (which is a given in a distributed
system because inter-node communication can never be infinitely fast -
even if networking technology advances, Einstein tells us that much).

My next plan is to spend a little effort creating some custom code to
attempt to use UUIDs as primary keys and, if that proves to be too
much work, I’ll likely use Grails instead.

Appreciate your reply.
Cheers.

On 4 April 2011 15:59, DavidJ [email protected] wrote:


If I understand correctly, without using ActiveRecord, the scaffolding won’t
work either. I’m guessing there are other parts of Rails that depend on AR
too.

I am surprised that whether scaffolding works for you or not is
relevant. It is certainly not appropriate for the sort of app you are
describing.

My next plan is to spend a little effort creating some custom code to
attempt to use UUIDs as primary keys and if that proves to be too much work,
I likely use Grails instead.

This might be helpful
http://amthekkel.blogspot.com/2009/02/ruby-on-rails-how-to-use-guid-for-use.html

Google for
rails guid primary key
and
rails legacy database
for more suggestions

Colin

On Mon, Apr 4, 2011 at 4:59 PM, DavidJ [email protected]
wrote:

If I understand correctly, without using ActiveRecord, the scaffolding won’t
work either.

Scaffolding is a tool for demos and super rapid prototyping, it is not
intended to be used for regular code. It is certainly not relevant to
your concerns.

On Apr 4, 4:19pm, Xavier N. [email protected] wrote:

On Mon, Apr 4, 2011 at 4:59 PM, DavidJ [email protected] wrote:

If I understand correctly, without using ActiveRecord, the scaffolding won’t
work either.

Scaffolding is a tool for demos and super rapid prototyping, it is not
intended to be used for regular code. It is certainly not relevant to
your concerns.

ActiveModel does a good job of smoothing that sort of stuff over (but
I of course agree that if you are scaling to a trillion servers you
won’t be using Rails scaffolds)

Ferd

On Apr 4, 2011, at 11:25 AM, DavidJ wrote:

On Monday, April 4, 2011 11:17:34 AM UTC-4, Colin L. wrote:
I am surprised that whether scaffolding works for you or not is
relevant. It is certainly not appropriate for the sort of app you are
describing.

It isn’t, but being new to Rails I’m uncertain what other Rails functionality
uses or assumes AR. That is, if I don’t use AR, what will the impact be and what
will be left that Rails is actually bringing to the table?
Of course, right now, I am in the ‘prototyping’ phase, so scaffolding would have
been ‘nice’ is all.

Just to ask the obvious question, do you know whether this app will
need to scale yet? There is a thin line between best practices and
premature optimization.

Generally if I’m scaling to a bazillion servers, I’m gonna be using a
functional programming language with one or more NoSQL data stores and
an asynchronous messaging architecture with eventual consistency. If
someone is asking me to build a first cut for a startup, I’m not gonna
try to do that using Scala, Lift and Mongo. I’m gonna build it quickly
in Rails and then I’ll refactor performance critical subsystems -
initially through caching and eventually through re-writing in more
appropriate stacks for scaling if for any reason Rails isn’t taking me
where I need to be.

I’m a big fan of Grails, but I generally build most of my web apps in
Rails unless I’m working with a Java shop (Groovy is less of a
conceptual leap than JRuby) or need really tight integration with
Spring, Hibernate or something else quintessentially Java. I find I can
usually build something quicker in Rails.

So I’d start by just confirming that you really have a scale problem.
If you’re rewriting an existing app that already has substantial load,
it makes perfect sense to be focusing on this now. If it’s a start up
venture (whether within an existing business or not), I’d focus on
failing and iterating quickly. If your problem ends up being that the
stack you started with isn’t scaling the way you want that is a really
high quality (and unfortunately a really rare) problem to have.

Best Wishes,
Peter

On Monday, April 4, 2011 11:17:34 AM UTC-4, Colin L. wrote:

I am surprised that whether scaffolding works for you or not is
relevant. It is certainly not appropriate for the sort of app you are
describing.

It isn’t, but being new to Rails I’m uncertain what other Rails
functionality uses or assumes AR. That is, if I don’t use AR, what
will the impact be, and what will be left that Rails is actually
bringing to the table?
Of course, right now, I am in the ‘prototyping’ phase, so scaffolding
would have been ‘nice’ is all.

This might be helpful

http://amthekkel.blogspot.com/2009/02/ruby-on-rails-how-to-use-guid-for-use.html

Thanks, I saw that one (and the other copies of it) and have spent a day
googling around already.
Cheers.

How many servers does Twitter have? I’m just curious how many
applications - in the real world - need scaling to the level discussed
here. I’m new to RoR and have spent my decades in computers building
much smaller scaled implementations. i.e. Corporate apps. So, I don’t
really have a handle on this.

However, I’m curious as to how big of an issue this really is. If
Twitter runs on RoR (and I’m told it does, but don’t know any
details), that would seem to be a very large implementation. Are they
running into limitations? What systems are bigger than Twitter and how
much? Does anyone have any real data?

Thanks,
Clyde

Twitter had huge scaling problems. While I am a big fan of Rails and
while it is wrong to suggest rails cannot scale, if you really hit
twitter scale you are not going to want to use a general purpose web
framework with a SQL data store. However, almost none of us hit that
scale, which is why I build sites in Rails and am open to
re-architecting if lightning happens to strike.

Best wishes,
Peter

Sent from my iPhone

No, obviously I don’t know. However, I’ve watched as businesses have
gone under precisely because they didn’t architect for scale at the
outset. There is no reason not to do it, as it really is little more
effort than not doing so.

Consider this scenario, which I watched play out: a site had been
implemented using a standard LAMP stack (Linux, Apache, MySQL, PHP)
with master-slave replication. It had been running along for a
reasonable time, collecting customers (around 7 months), when the
exponential adoption started to hit. It was driven by a few factors,
including media buzz and unhappiness with a policy change at a
competing site, which saw users switching over en masse. Within about
a week the number of registrations was just over 1000x what it was the
week or so prior. Their reaction was to throw more web instances
behind the load-balancer and spool up significantly more MySQL slave
instances.

Unfortunately, another week passed and the site was too slow to be
usable for customers - page timeouts and very long load-times ensued
while the devs tried to retrofit the code to handle the ‘eventual
consistency’ that results when you have many slaves replicating from a
master, add a caching layer, implement application-level sharding of
key tables etc. Well, there was a user backlash about the poor
performance and the media picked that up also. Customer support was so
overwhelmed they couldn’t even reply to most help tickets. By the
third week a new competitor had launched a new site that was snappy
(easy when you have little traffic). The same customers were signing
up with the new competitor in droves, and however they had architected
their site (it was run on Amazon EC2) it withstood the test; by the
end of the month there were almost no active users of the original
site left. After the brand had been ruined by bad press, with few new
registrations and not enough revenue to cover the costs, the owners
just closed up shop and the business was history.

I’ve seen something similar happen on two other occasions also. It is
easy to say in hindsight that ‘they should have done this or that’,
but they just couldn’t react quickly enough.

So, if you think you can take a system with tens of thousands of
active users and loads of existing data, re-architect it for
scalability, and migrate a large database to a different technology
within a few days, good luck to you. I don’t want to even try it.

Not long ago, I agree, the best options were non-SQL stores, but that
is changing. While there are not yet any inherently scalable SQL
technologies on the market, a few are getting ready to launch (and
already have limited availability for pilots, beta tests etc).

I’m going to stick with the minor additional effort of just
architecting for scale at the outset and then, if a situation like
that strikes when I’m on vacation, I’ll stay on vacation :wink:

Cheers.

On Apr 5, 2011, at 11:02 AM, DavidJ wrote:

No, obviously I don’t know. However, I’ve watched as businesses have
gone under precisely because they didn’t architect for scale at the
outset. There is no reason not to do it as it really is little more
effort than not doing so.

I’ve built apps for scale and I can say this is definitively not
true. It’s not just a matter of using UUIDs instead of doubles or
ints for your primary key and a few other tweaks. Firstly, what you need
to change depends on the exact load characteristics. Often a number of
levels of caching are a piece of the puzzle, but a message based
asynchronous approach to mutable state in the app is often required. For
example, I often use an event-sourcing style approach where any entity
state in the db is simply a cache of a query of all of the events that
have happened to that entity over time. It’s a great approach for
scaling writes effectively, but it is definitely extra effort to
implement as none of the mainstream web frameworks think in terms of
events and optimizing for immutability. It’s somewhat easier where
you’re scaling for reads than writes, but there are a lot of usage
specific questions that drive the best scaling strategies and IMO unless
you know out of the box that you will have huge load, it’s pure waste
in the lean software development sense of the term.
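
To make that concrete, here is a minimal plain-Ruby sketch of the
event-sourcing idea (the Account class and its events are invented for
illustration, not taken from any framework):

```ruby
# Minimal event-sourcing sketch: the "current state" of an entity is
# just a fold over its append-only event log, so any stored balance
# becomes a cache of that fold rather than the authoritative source.
class Account
  attr_reader :events

  def initialize
    @events = []          # append-only log; trivially shardable by account
  end

  def deposit(amount)
    @events << [:deposited, amount]
  end

  def withdraw(amount)
    @events << [:withdrew, amount]
  end

  # Replay the log to derive current state on demand.
  def balance
    @events.reduce(0) do |sum, (kind, amount)|
      kind == :deposited ? sum + amount : sum - amount
    end
  end
end

acct = Account.new
acct.deposit(100)
acct.withdraw(30)
acct.balance  # => 70
```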

Consider this scenario which I watched play out: […] I’ve seen
something similar happen on two other occasions also. It is easy to
say in hindsight that ‘they should have done this or that’, but they
just couldn’t react quickly enough.

And what percentage of the projects you’ve ever seen have had this
issue? By definition, the number of sites that will be in the (say) top
10,000 for traffic is very small compared to the number of sites that
are built.

Best Wishes,
Peter

On Tuesday, April 5, 2011 11:13:24 AM UTC-4, Peter B. wrote:

I guess I should have qualified that to say that it is little more
effort if you have the tools available. Having built scalable services
several times, I have the experience and (custom) tools available. You
are right that starting from traditional/legacy systems such as
existing RDBMSs and most frameworks out-of-the-box would be a
significant effort until modern tools come to market and mature.

[…] By definition, the number of sites that will be in the (say) top
10,000 for traffic is very small compared to the number of sites that are
built.

True, but irrelevant to the owners of those sites whose businesses
fail as a result, regardless of how improbable it seemed beforehand.
If it happened to a customer whose site I architected, I’d hardly feel
good explaining, after their business was bankrupted, that I didn’t
bother building it for scale as it didn’t seem very likely to need it
- since only a small percentage do.

Anyway, most of our customers have high scalability as a requirement -
so regardless of whether they’re dreaming, that is what they get.

My hope in looking at Rails for this new project (which isn’t
critical, and hence one where I can take the risk of experimenting
with a new technology) was that, being relatively new, it might be
less work to incorporate the features required for scalability.
Unfortunately, it isn’t looking that way.

Cheers.

I think I understand David’s point: “scalable” is true or false.
It would be nice if scalability was planned for from the beginning.
I do not know anything about it, but I hope that the issue discussed
here is only about Rails; I hope that Ruby can be used for scalable
applications.

Alexey.

If the big focus is scalability I’d look at clojure, scala or erlang.

Sent from my iPhone

On Apr 6, 2011, at 4:06 AM, Alexey M. wrote:

I think I understand David’s point: “scalable” is true or false.

I’m not sure that is David’s point at all. It sounds like he has
experience with building scalable applications and notices that some
best practices for scalability, like UUID primary keys, are not the
default way in Rails. It seems to me that he is more than sophisticated
enough to realize that scalability isn’t a binary choice.

Could be nice if scalability was planned for from the beginning.

I think there are some architectural defaults that could be changed
which would help, but scalability is so large, complex and specific to a
given use case I don’t think there’s any way to “just build scalability
in”.

I do not know anything about it, but i hope that the issue discussed
here is only about Rails, i hope that Ruby can be used for scalable
applications.

Rails has some conventions which are not optimal for scaling certain
types of applications. All languages can be used for scalable
applications - although you need to look at the performance
characteristics at runtime for a given application. The main issue for
scaling with Ruby is that it’s an OO rather than functional language
which raises fundamental issues in managing mutable state. You can work
around that by writing Ruby in a more functional style and using
architectural patterns that don’t depend on such shared state.

I think a way more interesting question is whether Rails is productive
than whether it is scalable. I would much rather build something quickly
in a productive framework and then revisit parts of the app if scale
became a concern than spend way more to build a scalable app that nobody
ends up using. I use Rails for its productivity and am enjoying it more
with each project I deliver.

Best Wishes,
Peter

rails has become a lot better over the years thanks to the core devs,
contributors and the community pushing it that way. it definitely
depends on how you build your application.

there are a couple of things i can recommend and i am following them
in practice.

  • optimize your queries like hell.

  • design your database well so you can avoid joins as much as
    possible. for example, say you have posts and comments. in order to
    display the comment count of a blog post on your blog home page,
    instead of running the query “select count(*) as count from comments
    where post_id = x” you can add a column to the posts table called
    “comment_count” and read it from there.

  • use faster solutions wherever you can.

  • use solr or sphinx for full text searching.

  • make sure your views are lean and do not contain more ruby code
    than necessary. remember, rendering takes time, as well.

  • cache whatever you can.

good luck.
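
a toy sketch of the comment_count idea in plain ruby (rails can
automate exactly this with belongs_to :post, :counter_cache => true,
which maintains a comments_count column on posts; the classes below
are illustrative, not AR models):

```ruby
# The comment_count idea in miniature: bump a denormalized counter on
# the post at write time instead of running select count(*) on every
# page view.
class Post
  attr_accessor :comments_count

  def initialize
    @comments_count = 0
  end
end

class Comment
  def initialize(post)
    @post = post
    post.comments_count += 1  # one cheap write instead of many count queries
  end
end

post = Post.new
3.times { Comment.new(post) }
post.comments_count  # => 3
```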

On Sun, Apr 3, 2011 at 11:01 PM, DavidJ [email protected]
wrote:

[…]

Any help or pointers are greatly appreciated.

For now you are going to have to execute some sort of SQL to set
non-standard Rails primary keys. The below article covers all the
steps and gems you will need to make setting primary keys other than
the auto-increment integer work. The SQL call outlined in this article
is pretty standard and you should not have issues moving it from
database to database.

On the issue of scaling, Rails has come a long way since its early
days and the issues Twitter had. Rails can scale - just don’t be
afraid to do some work to make it happen. Here is a 21-part screencast
series from Gregg P. and New Relic that discusses and shows you how to
scale Rails.

http://railslab.newrelic.com/scaling-rails

B.

On Apr 6, 2011, at 10:17 AM, mengu wrote:

rails has become a lot better over the years thanks to the core devs,
contributors and the community pushing it that way. it definitely
depends on how you build your application.
there are couple of things i can recommend and i am following them in
practice.

I think all of those are really good points, but they relate to
incremental scaling up, not scaling out. They will allow you to get more
performance from a single server. For substantial scale, it’s more a
case of architecting so that you can throw multiple servers at the
problem.

I’ve never tried to build out a Rails app with ten front end servers
speaking to a cluster of back end (SQL) db servers, but I’m guessing
there would be problems with database contention and you’d start to have
to take a lot more interest in failed saves - especially if you need to
scale a write-heavy application. That’s where fundamentally different
architectural approaches, based on designing from the get-go for
eventual consistency and asynchronous messaging, are really important
(assuming you don’t need immediate consistency for most of your app).

Personally I quite like the event sourcing model, as it effectively
gets rid of mutable state and makes any database values for entities a
cache rather than an authoritative source. It’s a different way of
writing things, but it makes scaling out trivially simple. If I get
some time I may have a play to see the best way of providing an easy
implementation of this in the Ruby world. I see something here
(GitHub: cavalle/banksimplistic, exploring CQRS, Event Sourcing and
DDD with Ruby) but haven’t had a chance to play with it.

That said, I just got brought in to do some architecture on a JRuby app
that is going to genuinely need substantial write scaling from day one,
so I may just get a chance to play with some of this if it makes sense
to keep this in Rails vs. just using Rails as a thin layer and handling
all the contention logic using a message bus or an eventually
consistent, write scalable NoSQL data store like Cassandra with
callbacks for contention handling.

Best Wishes,
Peter

On Thu, Apr 7, 2011 at 4:12 PM, Peter B. [email protected] wrote:

Rails is dead, long live Rails.
Indeed.

In that post, note that the performance improvement is not necessarily
a function of the language. My interpretation is that the major impact
comes from switching to a new async architecture which has, in
addition, been designed a posteriori - when you know where it hurts,
you have the numbers, and you have a concrete technology ecosystem in
your company to evolve.

You would not design Twitter 2011 in 2008.