Hello. I’m new to Ruby & Rails, though a veteran at engineering large-
scale distributed systems.
I have a new project which requires a REST API and a simple web UI.
After reading (superficially) about RoR on and off over recent years,
I thought it was time I took it for a spin. It is a
dream ‘ground-up’ project with no legacy requirements.
However, I’ve hit a speed-bump and I’m unsure if it is a limitation of
Rails or just my lack of in-depth understanding of the code base.
When using the generator to generate a new model class, Rails chooses
an auto-increment int id as the primary key by default(!). This is
obviously pretty poor form for numerous reasons, such as:
- Completely at odds with scalability and distributed implementation
of a DB since it introduces an unnecessary need for centralization
- Depending on the DB engine, you might run out of primary keys as
soon as you hit 2^32 rows
- A security vulnerability waiting to happen - unless you pay close
attention, it would be easy to expose ids to the public in a
multi-user environment for which guess-ability of some resource ids
is bad practice
By the first point, I’m talking about indefinitely scalable
distributed implementations. The term ‘scalability’ is used to mean a
wide variety of things, from vertically scaling a web app’s
performance by adding memory to a server, to horizontally scaling
within limits, where adding resources such as servers eventually
becomes a case of diminishing returns.
An easy way to check if your architecture is fully distributed, and
whether the performance of operations is independent of data size
etc., is to do a quick thought experiment where you imagine having a
ridiculous amount of data, users etc. For example, would the
performance for a user be significantly affected if your database was
so large it needed to be spread over a trillion servers? If the answer
is yes, then your architecture is not indefinitely scalable, as there
is some centralization introducing a dependency between performance
and data set size, user-base size or whatever.
So, if you had a trillion DB servers, auto_increment could never work
because to determine which is the next id would require querying them
all to figure out what the largest existing id is (or, alternatively,
keeping the ‘next id’ stored in a central place - which will be a
performance bottleneck when a trillion servers have to hit it up for
every insert).
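To make the contrast concrete, here is a rough plain-Ruby sketch (not tied to any real database). The class names CentralCounter, CountingNode and UuidNode are made up for illustration. The first kind of node has to go through one shared counter for every new id, while the second generates ids locally with SecureRandom.uuid and needs no coordination at all.

```ruby
require 'securerandom'

# Hypothetical stand-in for a central "next id" store: every insert,
# from every server, has to pass through this single object.
class CentralCounter
  attr_reader :requests

  def initialize
    @next_id = 0
    @requests = 0
    @lock = Mutex.new
  end

  # Hands out the next sequential id; the lock serializes all callers.
  def next_id
    @lock.synchronize do
      @requests += 1
      @next_id += 1
    end
  end
end

# Hypothetical DB node that asks the central counter for each new id.
class CountingNode
  def initialize(counter)
    @counter = counter
  end

  def new_id
    @counter.next_id
  end
end

# Hypothetical DB node that generates ids locally, with no shared state.
class UuidNode
  def new_id
    SecureRandom.uuid
  end
end

counter = CentralCounter.new
counting_nodes = Array.new(3) { CountingNode.new(counter) }
uuid_nodes = Array.new(3) { UuidNode.new }

# Four inserts on each of three nodes, for both schemes.
sequential_ids = counting_nodes.flat_map { |n| Array.new(4) { n.new_id } }
uuid_ids = uuid_nodes.flat_map { |n| Array.new(4) { n.new_id } }

puts "central counter was hit #{counter.requests} times"
puts "distinct UUIDs generated: #{uuid_ids.uniq.size}"
```

The number of hits on the central counter grows with the total number of inserts across all nodes, whereas the UUID nodes never talk to each other.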
(For the purists, notice I said “significantly” above. For example,
consider the design of DNS and imagine if records had no
TTL - living on indefinitely. The load on the root servers would be
vanishingly small and it would hardly matter if they were out of
service for short periods).
Obviously, nobody has a trillion servers, but engineering systems to
be highly scalable isn’t hard and is good practice anyway. If your
client’s service becomes the next Facebook, you won’t have to touch
anything - just spin up more and more cloud servers and sit back,
rather than watch their business fail as users leave a sinking ship
of slow or failed page loads.
Now, I’ve surfed around the web for information about how to use
custom ids or other primary key columns in Rails, but have only found
confusion (ignoring people who ask why and/or say not to do it).
Examples given seem to differ (perhaps due to changes before Rails 3?)
and I can’t get any of the ideas to work.
For example, suppose I wish to use UUIDs for primary keys. I’ve
tried variations on:
class CreateItems < ActiveRecord::Migration
  def self.up
    create_table :items, {:id => false} do |t|
      t.string :id, :null => false, :limit => 36, :primary => true
      t.timestamps
    end
  end

  def self.down
    drop_table :items
  end
end
However, the :primary option doesn’t seem to work (perhaps it is
invalid) and the generated table doesn’t have a primary key. I can use
add_index to add a :unique index, but it isn’t primary. Obviously,
I’ll need some hooks to generate the UUIDs - I’ve delved into that part.
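As a rough illustration of the kind of hook I mean for generating the UUIDs, here is a minimal plain-Ruby sketch. The module name UuidPrimaryKey, the method assign_uuid and the stand-in Item class are all hypothetical. I’m assuming that in a real ActiveRecord model the method would be registered with before_create rather than called by hand, and I haven’t confirmed how that behaves with a table that has no proper primary key.

```ruby
require 'securerandom'

# Hypothetical mixin: fills in a string id before the record is saved.
module UuidPrimaryKey
  # Assigns a random UUID unless an id is already present.
  def assign_uuid
    self.id ||= SecureRandom.uuid
  end
end

# Plain-Ruby stand-in for the model, so the sketch runs without Rails.
# In an ActiveRecord model the assumption is that the hook would be
# registered with `before_create :assign_uuid` instead of called by hand.
class Item
  include UuidPrimaryKey
  attr_accessor :id
end

item = Item.new
item.assign_uuid
puts item.id        # a 36-character string, matching :limit => 36
```

The `||=` means an id that is supplied explicitly is left alone, which might also suit tables that use natural keys.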
So, can Rails really handle this in a clean way and still have
scaffolding work, etc.? How? Can someone kindly clue me in on what I
need in the migration, the model class and anywhere else? I’d prefer
to avoid DB-specific SQL execution (while I’m testing this on MySQL,
that obviously isn’t a distributed scalable technology so I’ll be
using a distributed store ultimately).
I’d also like some tables to have natural (domain specific) primary
key values, a related though perhaps separate issue (and less
critical).
I’ve achieved similar on another project using Grails by writing a JPA
implementation. I’m really hoping Rails can do this without having
the source hacked.
Any help or pointers are greatly appreciated.
Cheers,
-David.