Rails doesn't scale because RI/transactions are in the app, not the DB!

damnpenguins · May 30, 2007, 12:14am

Now that I have your attention! I’ve been having a discussion with a
respected colleague here:

http://phpbuilder.com/board/showthread.php?t=10340411 (Rails vs. PHP)

He’s basically saying that Rails won’t be able to scale because
referential integrity and transactions are done in the application rather
than the database. Please read the referenced thread for much more info.

He also says that he doesn’t think any large-scale transaction system
built with Rails exists, because he hasn’t heard of any. Does anybody
know of one?

Thanks!
CSN

damnpenguins · May 30, 2007, 12:42am

Cs Sn wrote:

He’s basically saying that Rails won’t be able to scale because
referential integrity and transactions are done in the application rather
than the database. Please read the referenced thread for much more info.

I’m a bit concerned about that too, so I’ve been sticking every piece of
validation I can into the database layer as opposed to in Ruby code.

To be fair, having referential integrity on the Ruby side shouldn’t slow
things down, although it does make your database more fragile;
references aren’t loaded unless you ask for them. A regular index on a
join column would be just as fast as a foreign key index, possibly
faster on updates/deletes.

However, you lose a lot of the integrity part of that if you leave this
up to Rails. The main reasons referential integrity and constraints
exist are to defend your data layer from bugs in the application layer:
stuff like preventing a customer that has an order pending from being
deleted, or making sure an invoice cannot be saved with a total of less
than one cent. Sure, you can code all of this on the Ruby side, but then
if you’re working on something by hand (in script/console or doing raw
SQL queries) you can still mess things up pretty badly.
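A minimal sketch of those two example rules enforced by the database itself. It uses Python's bundled sqlite3 module instead of Rails and PostgreSQL so it runs stand-alone; the table and column names (customers, orders, invoices, total_cents) are hypothetical. Newer SQLite releases do enforce foreign keys, but only after `PRAGMA foreign_keys = ON` on each connection. The foreign key below is stricter than the rule in the text: it refuses to delete a customer with any order, pending or not (limiting it to pending orders would need a trigger).

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this is on, per connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE customers (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE orders (
            id          INTEGER PRIMARY KEY,
            -- RESTRICT: a customer who still has orders cannot be deleted
            customer_id INTEGER NOT NULL
                        REFERENCES customers(id) ON DELETE RESTRICT,
            status      TEXT NOT NULL
        );
        CREATE TABLE invoices (
            id          INTEGER PRIMARY KEY,
            -- money kept as integer cents; anything under one cent is refused
            total_cents INTEGER NOT NULL CHECK (total_cents >= 1)
        );
    """)
    return conn


def rejected(conn: sqlite3.Connection, sql: str, params=()) -> bool:
    """Run one statement in its own transaction; True if the DB refuses it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


conn = make_db()
with conn:
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")
    conn.execute("INSERT INTO orders (customer_id, status) VALUES (1, 'pending')")

# Both statements are refused by the database alone; no application code
# is consulted, which is what protects hand-typed SQL sessions.
print(rejected(conn, "DELETE FROM customers WHERE id = 1"))             # True
print(rejected(conn, "INSERT INTO invoices (total_cents) VALUES (0)"))  # True
```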

The way Rails handles transactions does bug me a bit, though. If I need
to do a complex sequence of changes in a database as one transaction, I
will probably do it as a stored procedure on the database side and just
call out to that. I’m already doing this for “side effects” in several
places in one of my applications; here’s one example: I have a table
for hostnames and a table for domain names. Hosts belong to Domains.
Hosts have a “full_name” column that combines their name with the name
of their parent (e.g., host “foo” in domain “bar.com” gets the
full_name “foo.bar.com”). Instead of coding this as a before_save
action on the Rails side, I made it a “BEFORE INSERT OR UPDATE” trigger
on the PostgreSQL side.
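The same derived-column idea, sketched with Python's sqlite3 module. This is an adaptation rather than the PostgreSQL trigger described above: SQLite does not let a trigger assign to NEW.full_name, so the sketch uses AFTER triggers that write the column back with an UPDATE. The schema is a guess at the tables described; all names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE domains (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE hosts (
    id        INTEGER PRIMARY KEY,
    domain_id INTEGER NOT NULL REFERENCES domains(id),
    name      TEXT NOT NULL,
    full_name TEXT
);
-- SQLite cannot assign to NEW.full_name, so AFTER triggers write the
-- derived value back with an UPDATE instead.
CREATE TRIGGER hosts_full_name_ins AFTER INSERT ON hosts
BEGIN
    UPDATE hosts
       SET full_name = NEW.name || '.' ||
           (SELECT name FROM domains WHERE id = NEW.domain_id)
     WHERE id = NEW.id;
END;
-- Fires only when name or domain_id is in the SET list, so the UPDATE of
-- full_name inside these triggers does not set it off again.
CREATE TRIGGER hosts_full_name_upd AFTER UPDATE OF name, domain_id ON hosts
BEGIN
    UPDATE hosts
       SET full_name = NEW.name || '.' ||
           (SELECT name FROM domains WHERE id = NEW.domain_id)
     WHERE id = NEW.id;
END;
"""


def full_name(conn: sqlite3.Connection, host_id: int) -> str:
    row = conn.execute("SELECT full_name FROM hosts WHERE id = ?", (host_id,)).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:
    conn.execute("INSERT INTO domains (id, name) VALUES (1, 'bar.com')")
    conn.execute("INSERT INTO hosts (id, domain_id, name) VALUES (1, 1, 'foo')")
print(full_name(conn, 1))  # foo.bar.com

with conn:
    conn.execute("UPDATE hosts SET name = 'baz' WHERE id = 1")
print(full_name(conn, 1))  # baz.bar.com
```

Renaming a domain would not refresh its hosts' full_name in this sketch; that would need a trigger on the domains table as well.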

I think leaving as much of the data massaging/validating up to the
database as possible is not just a bit faster, but also probably a lot
safer.

Rails needs to provide these features because not all database engines
support them (e.g., SQLite)… but if something to do with the data model
can be done in the database, I think it should be. That doesn’t mean
you should do something in the database layer JUST because you can. The
database should only be concerned with handling the data. I’ve seen
stored procedures that actually do things like send email, and that,
well, scares me.

Cheers,
Tyler

damnpenguins · May 30, 2007, 1:19am

On May 29, 2007, at 3:14 PM, Cs Sn wrote:

Neither referential integrity nor transactions are handled by Rails,
but Rails does provide methods for handling both elegantly and
simply, including niceties such as object rollback in the application
on transaction failures.

Also, there seems to be a misunderstanding of speed vs. scalability
that I find all too common in these discussions. While the two appear
related, and are related in many circumstances, speed generally comes
from efficiency while scalability generally comes from architecture.

Say, for instance, that one system handles 1,000 requests per second
per unit of computing performance, while another handles 100.

Which is more scalable?

Be careful, it’s a trick question: there is simply not enough
information to answer it!

The important question is: what and where are the bottlenecks?

Say the 1,000 request per second system bottlenecks at the DB at
2,000 requests per second, while the 100 request per second system
bottlenecks at the network at 10,000 requests per second…

Obviously the 1,000 request per second system is more efficient in
terms of $/request/second up to 2,000 requests per second, but does
it scale?
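The trick question can be made concrete with a deliberately simple model, which is an assumption of this sketch rather than something stated in the post: each system scales linearly as hardware units are added, until it hits its fixed bottleneck. The figures are the ones from the example.

```python
import math
from typing import Optional


def peak_throughput(per_unit: float, units: int, bottleneck: float) -> float:
    """Requests/second with `units` of hardware: linear, capped by the bottleneck."""
    return min(per_unit * units, bottleneck)


def units_needed(per_unit: float, target: float, bottleneck: float) -> Optional[int]:
    """Hardware units needed to serve `target` req/s, or None if unreachable."""
    if target > bottleneck:
        return None  # no amount of extra hardware gets past the bottleneck
    return math.ceil(target / per_unit)


# Figures from the example: (req/s per unit, bottleneck in req/s).
SYSTEMS = {
    "A (DB-bound)": (1_000, 2_000),
    "B (network-bound)": (100, 10_000),
}

for target in (1_500, 5_000):
    for name, (per_unit, bottleneck) in SYSTEMS.items():
        print(f"{target:>6} req/s  {name}: {units_needed(per_unit, target, bottleneck)}")
```

Under this model, serving 1,500 req/s takes 2 units of A but 15 of B, so A is far cheaper; at 5,000 req/s A cannot get there at any price, while B needs 50 units.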

As to your discussion with your friend: any work removed from the DB
is good for scaling, IMHO. Stored procedures are efficient, but
scaling the DB is nearly always the most onerous part of a really
large web project, so moving work from the DB layer into the
application layer is likely less efficient, but more scalable.

–
– Tom M., CTO
– Engine Y., Ruby on Rails Hosting
– Support, Scalability, Reliability
– (866) 518-YARD (9273)

damnpenguins · May 30, 2007, 8:43am

I’m certainly no expert, but I read a presentation by Bruce Tate on the
architecture of eBay (probably presented at one conference or another).
They basically decided to implement the RI and constraints in the
application, rather than the database, for performance reasons. He says
their application layer is very carefully written to prevent corrupting
the data.

n

damnpenguins · May 30, 2007, 8:45am

Sorry, not Bruce Tate. Here’s the link:

http://bitworking.org/news/The_eBay_Architecture

n

damnpenguins · May 30, 2007, 10:06am

On 5/30/07, Cs Sn wrote:

Now that I have your attention! I’ve been having a discussion with a
respected colleague here:

http://phpbuilder.com/board/showthread.php?t=10340411 (Rails vs. PHP)

He’s basically saying that Rails won’t be able to scale because
referential integrity and transactions are done in the application rather
than the database. Please read the referenced thread for much more info.

Transactions are handled by the db.

And RoR doesn’t prevent you from doing RI in the db. However, you kind
of need it in the app layer as well, for simple error handling and so
on, which certainly adds some overhead. Yay for MySQL…

My main gripe from a scalability POV is the lack of support for prepared
statements / bind variables. That’s 30-80% more load on your db, which
is the hardest layer to scale out.
However, I think it’s being worked on…
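The bind-variable point can be illustrated outside ActiveRecord with Python's sqlite3 module; the users table and helper names are hypothetical, and the 30-80% figure above is the poster's estimate, not something this sketch measures. With literal interpolation, each distinct value produces different SQL text that has to be parsed and planned afresh; with a placeholder the SQL text stays the same, so a prepared statement can be reused and the value never goes through the SQL parser. Python's sqlite3 keeps a per-connection cache of prepared statements, sized by the `cached_statements` argument to `sqlite3.connect`.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # cached_statements: how many prepared statements this connection keeps
    # for reuse, looked up by the exact SQL text.
    conn = sqlite3.connect(":memory:", cached_statements=64)
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT NOT NULL)")
    conn.executemany("INSERT INTO users (login) VALUES (?)",
                     [("alice",), ("bob",), ("o'brien",)])
    conn.commit()
    return conn


def find_literal(conn: sqlite3.Connection, login: str):
    # Anti-pattern: each login yields different SQL text, so each one is
    # parsed and planned from scratch, and a quote inside the value breaks
    # the statement (or lets the caller inject SQL).
    return conn.execute(f"SELECT id FROM users WHERE login = '{login}'").fetchone()


# One SQL text for every lookup; the value is bound, never parsed as SQL.
FIND_BY_LOGIN = "SELECT id FROM users WHERE login = ?"


def find_bound(conn: sqlite3.Connection, login: str):
    return conn.execute(FIND_BY_LOGIN, (login,)).fetchone()


conn = make_db()
print(find_bound(conn, "bob"))      # (2,)
print(find_bound(conn, "o'brien"))  # (3,)
```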

Isak

damnpenguins · May 30, 2007, 6:27pm

Obviously a database addict.

Database ‘gurus’ who bang on about referential integrity and
transactions are just like those who go on about how you can’t write
correct code without static typing and a compiler.

Basically they are terrified they might be wrong and out of a job.

Rails is an object oriented system. The integrity of the application
depends upon the integrity of the object network within the
application. And that depends upon your testing regime.

The database is treated as just a big hash table with knobs on. If the
filesystem was clever enough we’d probably use that instead and save a
lot of hassle. A RDBMS is frankly overkill in the context of a Rails
app.

damnpenguins · May 30, 2007, 6:44pm

Neil W. wrote:

Err, that’s what the directory marked ‘test’ is for.

All you are doing here is stating that you don’t trust your testing
regime and loading your database with unnecessary checks.

It doesn’t matter who I trust. The database should not trust the
application. The database is the custodian of the data, and if it lets
the data get corrupted by outside hands, that’s its fault and something
that it could have prevented.

The “application” isn’t always the Rails app, either. It might be you
going in and doing queries manually. And there are numerous bugs that
would not get caught by conventional unit testing but would get caught
by a strong database layer: for instance, a dispatcher process being
killed with ‘kill -9’ in the middle of a complex set of data
manipulations (which really should have been a stored procedure on the
SQL side in the first place…)

Cheers,
Tyler

damnpenguins · May 30, 2007, 6:53pm

We can go back and forth on this until the end of time. There’s
obviously no “God-ordained right way” to deal with databases and
referential integrity (see the eBay link above), no matter how hard
people push each side. If you have multiple applications hitting the
same database, then yes, you’ll pretty much need DB integrity built into
your tables; otherwise it’s really a personal choice, taking into
account usage, scaling and what not.

Personally, I HATE stored procedures because, depending on the DBMS,
they are quite difficult to write tests for, and they add another
external dependency to the application.

I wonder if the RevolutionHealth team has anything to say on this
subject. I’m gonna search their blog, see what’s up.

Jason

damnpenguins · May 30, 2007, 7:09pm

Jason R. wrote:

We can go back and forth on this until the end of time. There’s obviously no
“God-ordained right way” to deal with databases and referential integrity
(see the eBay link above) no matter how hard people push each side. If you
have multiple applications hitting the same database, then yes, you’ll
pretty much need DB integrity built into your tables, otherwise it’s really
a personal choice taking into account usages, scaling and what not.

You’re right. The original concern was that the lack of RI/transactions
on the DB side would cause scaling problems. I guess the point I was
trying to illustrate was that when you are developing in Rails, you can
still take advantage of pretty much everything your RDBMS has to offer.

Cheers,
Tyler

damnpenguins · May 30, 2007, 6:31pm

Err, that’s what the directory marked ‘test’ is for.

All you are doing here is stating that you don’t trust your testing
regime and loading your database with unnecessary checks.

On May 29, 11:41 pm, Tyler MacDonald wrote:
The main reasons referential integrity and constraints exist

damnpenguins · May 30, 2007, 7:34pm

Neil W. wrote the following on 30.05.2007 18:27 :

The integrity of the application depends upon the integrity of the
object network within the application. And that depends upon your
testing regime.

The database is treated as just a big hash table with knobs on. If the
filesystem was clever enough we’d probably use that instead and save a
lot of hassle. A RDBMS is frankly overkill in the context of a Rails
app.

That’s the opposite extreme, and I don’t think it matches reality any
better than the opinion of some DBAs who want to put all constraints in
the database.

I wouldn’t have been able to code several of my Rails apps if it weren’t
for ACID support. I even had to use SERIALIZABLE PostgreSQL transactions
with nearly raw SQL UPDATEs to make sure that there was no way some
buckets could drop below 0 in the DB under concurrent access.
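One way to get a similar guarantee, sketched with Python's sqlite3 module. This deliberately swaps in a simpler technique than SERIALIZABLE PostgreSQL transactions: a conditional UPDATE that tests and decrements in one statement, backed by a CHECK constraint as a last line of defence. The buckets table and the take() helper are hypothetical. Since there is no separate read before the write, there is no gap in which another client can change the amount in between.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE buckets (
            id     INTEGER PRIMARY KEY,
            amount INTEGER NOT NULL CHECK (amount >= 0)  -- last line of defence
        )""")
    return conn


def take(conn: sqlite3.Connection, bucket_id: int, qty: int) -> bool:
    """Remove `qty` from a bucket unless that would take it below zero.

    The test (amount >= qty) and the write happen in a single UPDATE, so
    there is no read-then-write gap for another client to slip into.
    """
    with conn:
        cur = conn.execute(
            "UPDATE buckets SET amount = amount - ? WHERE id = ? AND amount >= ?",
            (qty, bucket_id, qty),
        )
    return cur.rowcount == 1  # 0 rows touched means the guard refused it


conn = make_db()
with conn:
    conn.execute("INSERT INTO buckets (id, amount) VALUES (1, 5)")

print(take(conn, 1, 3))  # True: 5 -> 2
print(take(conn, 1, 3))  # False: would go below zero, nothing changed
```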

Lionel.

damnpenguins · May 30, 2007, 8:03pm

Thanks for all the answers!

I agree with Tyler and think I’d probably put redundant RI in the
database just to protect against me or others when using the database’s
command line (which I prefer to Rails’ console) or php*admin. It also
protects against other apps that use the database (which may be
written in PHP, etc.).

Yay for MySQL? Yay for PostgreSQL!

CSN

damnpenguins · May 31, 2007, 11:36am

So if the filesystem was ACID compliant and network available, would
you miss the constant SQL parsing, the unused permissions system,
consistency checks and stored procedures?

AFAICS Rails only uses a SQL RDBMS to store its data because that’s
the best alternative open to it at present. It doesn’t need one.

NeilW

On May 30, 6:33 pm, Lionel B. wrote:

damnpenguins · May 30, 2007, 7:29pm

Tyler MacDonald wrote the following on 30.05.2007 18:44 :

Neil W. wrote:

Err, that’s what the directory marked ‘test’ is for.

All you are doing here is stating that you don’t trust your testing
regime and loading your database with unnecessary checks.

It doesn’t matter who I trust. The database should not trust the
application.

Then code your application in the database. See you next century.

The database is the custodian of the data, and if it lets
the data get corrupted by outside hands, that’s its fault and something
that it could have prevented.

I don’t think you can reasonably describe all the constraints on your
data in the database. It probably could be done with triggers and stored
procedures, but it would be a maintenance hell. Next you would push
access control into the database layer too (some have tried, and it has
been a nightmare for them: RDBMS support for groups and ACLs is not
standardised, which makes migrating between them impossible; and every
time the user/group attributes aren’t enough, you must add other tables
to store basic information, which complicates your object model, …).

At this time the best approach, in my opinion, is to combine the
transactional capabilities and foreign key support of the RDBMS with
validation checks outside the database (which makes it far easier to
report errors to users, BTW; a raw RDBMS error string isn’t exactly
useful to end users).
Putting your validation code outside the DB means that migrating your DB
to new constraints is far easier.
I’ve done several migrations where constraints that held in previous
application versions no longer did, which meant some data became
invalid. It’s far easier to handle this problem with external code: you
can run all your validation code on the current data while it’s still in
use, just to measure the extent of the problem and estimate the time
needed to, for example, reformat some columns. With constraints in the
DB you have far less friendly tools: the DB can refuse to change
constraints because of invalid data, or accept them without detecting
the problem after the fact (when the check is only done at INSERT or
UPDATE time, for example), so your data ends up broken. Solving this is
possible but far more difficult than it is at the application level.
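Running validation code over live data before tightening a rule might look like this minimal sketch (Python and sqlite3; the invoices table and the one-cent rule are hypothetical). The check only reads, so it can run against a database that is still in use, and it reports the size of the problem instead of refusing a schema change.

```python
import sqlite3


def total_is_valid(total_cents) -> bool:
    # Hypothetical new rule: an invoice must total at least one cent.
    return total_cents is not None and total_cents >= 1


def audit_invoices(conn: sqlite3.Connection) -> dict:
    """Read-only check of existing rows against the new rule."""
    rows = conn.execute("SELECT id, total_cents FROM invoices ORDER BY id").fetchall()
    invalid = [inv_id for inv_id, total in rows if not total_is_valid(total)]
    return {"checked": len(rows), "invalid_ids": invalid}


# Data written by an older version of the app, before the rule existed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, total_cents INTEGER)")
conn.executemany(
    "INSERT INTO invoices (id, total_cents) VALUES (?, ?)",
    [(1, 1999), (2, 0), (3, None), (4, 250)],
)
conn.commit()

print(audit_invoices(conn))  # {'checked': 4, 'invalid_ids': [2, 3]}
```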

The “application” isn’t always the rails app, either. It might be you
going in and doing queries manually.

I agree you’re far better off with only one consistent means of
accessing your data. But that doesn’t mean it must be the DB: with Rails
you are usually advised to provide a REST interface so that other
applications use the same validation code. The script/console tool
replaces the DB command-line tool with the same benefit.

And there are numerous bugs that would not get caught by conventional
unit testing but would get caught by a strong database layer: for
instance, a dispatcher process being killed with ‘kill -9’ in the middle
of a complex set of data manipulations (which really should have been a
stored procedure on the SQL side in the first place…)

Rails supports transactions: a “kill -9” doesn’t harm your data unless
you are working outside a transaction. No need for stored procedures,
period.
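A small Python sqlite3 sketch of the atomicity being relied on here. A raised exception stands in for the process dying midway; the accounts table and the transfer() helper are hypothetical. With an actual kill -9 the process gets no chance to roll back anything itself; the database discards the uncommitted transaction during its own recovery, which leads to the same end state.

```python
import sqlite3


class Interrupted(Exception):
    """Stands in for the process dying halfway through the work."""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts (id, balance) VALUES (?, ?)", [(1, 100), (2, 0)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount, fail_midway=False):
    # Both UPDATEs run inside one transaction: `with conn` commits if the
    # block completes and rolls back if anything inside it raises.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        if fail_midway:
            raise Interrupted()
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


conn = make_db()
try:
    transfer(conn, 1, 2, 30, fail_midway=True)
except Interrupted:
    pass
print(balances(conn))  # {1: 100, 2: 0}: the half-done debit was rolled back

transfer(conn, 1, 2, 30)
print(balances(conn))  # {1: 70, 2: 30}
```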

Lionel

damnpenguins · May 31, 2007, 3:48pm

On May 31, 11:40 am, Lionel B. wrote:

The current state is quite nice for me too :

to the point where PostgreSQL is would not be an easy task and I’m
wondering if the result could be much faster or simpler.

Lionel.

An interesting discussion, which predictably has the people with
database backgrounds taking a different view from the people with app
server backgrounds. See if you can guess where my background is :o)

Some recurring points made here are:

Rails should work equally well with any database - according to Tom
Kyte (Oracle expert), drawing on over 12 years’ experience, one of the
biggest causes of failure in database-centric projects is people
treating the database as a black box for keyed reads. To really make
Oracle/MySQL/Postgres/SQL Server scale, you have to use it the way it
was intended. I know Oracle well, and not using bind variables is a
cardinal sin and the most common thing Java app developers (and Rails,
for that matter) get wrong. Rails was designed to use MySQL primarily,
which the last I read doesn’t have any concept of bind variables, hence
Rails not using them. Let’s not even get started on the different
locking, read consistency and transaction mechanisms used by each
database, and the powerful SQL constructs available in some DBs and not
in others.

RI should be in the app - Apps come and go: 5 years ago it was a Perl
CGI app, then came PHP, just before some pointy-haired boss decided
J2EE was the way to go. I bet that in all this time the database
structure hardly changed at all. Let’s not get started on all the
various back-end processes that may need to talk to this database to
do their thing, etc. The database simply must protect the data -
constraints and foreign keys are the way forward here (I tend to avoid
triggers like the plague). Now, that said, why not let the app
validate RI too? If the app server can spot that the Create Customer
form was submitted without a surname, avoid a trip to the database and
generate a nice error message, that’s fantastic! I feel these RI-in-app
vs. RI-in-DB debates are a bit like form validation in JS vs. form
validation on the server: doing both gives the best user experience,
but failing to do it on the server lets hackers have fun corrupting
your data.
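The do-both approach for the Create Customer example, as a minimal Python/sqlite3 sketch: the application check gives the user a friendly message without a database round trip, and the NOT NULL/CHECK constraint still stops a blank surname arriving by any other route. The customers table, the form dictionary and the helper names are all hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Database backstop: no missing or all-blank surnames, whoever the client is.
    conn.execute("""
        CREATE TABLE customers (
            id      INTEGER PRIMARY KEY,
            surname TEXT NOT NULL CHECK (length(trim(surname)) > 0)
        )""")
    return conn


def validate_customer(form: dict) -> list:
    """App-side check: cheap, no DB round trip, readable message."""
    errors = []
    if not (form.get("surname") or "").strip():
        errors.append("Surname can't be blank")
    return errors


def create_customer(conn: sqlite3.Connection, form: dict) -> list:
    errors = validate_customer(form)
    if errors:
        return errors  # rejected before the database is touched
    with conn:
        conn.execute("INSERT INTO customers (surname) VALUES (?)", (form["surname"],))
    return []


conn = make_db()
print(create_customer(conn, {"surname": ""}))       # ["Surname can't be blank"]
print(create_customer(conn, {"surname": "Smith"}))  # []

# A client that skips the app-side check still hits the constraint.
try:
    with conn:
        conn.execute("INSERT INTO customers (surname) VALUES ('   ')")
except sqlite3.IntegrityError as exc:
    print("database refused it:", exc)
```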

New version of the app may require database changes - it’s a rare
release that doesn’t require some; in the scheme of things, I wouldn’t
lose any sleep over a few SQL commands to alter constraints.

Testing will catch all bugs that could corrupt your data - so all
tested software is perfect then? New bugs are never discovered? :o)

Just my two cents. For the record, I love Rails and what it lets us do;
having coded small apps using mod_plsql and Oracle stored procedures, I
can say Rails is light years ahead. But it’s important we don’t get
carried away and forget what databases were designed for!

damnpenguins · May 31, 2007, 12:41pm

Neil W. wrote the following on 31.05.2007 11:35 :

So if the filesystem was ACID compliant and network available, would
you miss the constant SQL parsing, the unused permissions system,
consistency checks and stored procedures?

I’m not sure if ACID covers the SERIALIZABLE aspects. If it does, then
yes, it would probably suit my needs.
But wouldn’t you end up with something really close to an RDBMS?

The current state is quite nice for me too:

SQL parsing is a problem only because ActiveRecord doesn’t support
prepared statements (yet). And you need some standard way to query the
storage repository, because doing full table scans on the client is
simply out of the question. So doing some query syntax parsing isn’t a
problem; in fact it is a solution to avoid huge loads on your storage
repository (the problem is that it isn’t as optimised as it could be
yet: reparsing the same query again and again is a waste of time).
The unused permissions system is not a problem (the overhead is most
probably negligible).
Stored procedures aren’t a problem unless they are used.

So the current state isn’t so bad in my opinion.

To handle a non-trivial amount of data without borking it or suffering
performance problems, in my opinion you need network support,
serializable transactions (even transactional schema updates are a big
plus for migrations), indices, foreign keys and a rich query syntax.
That’s a simplified RDBMS (in fact not so simple: MySQL didn’t even
support transactional schema updates last time I looked, and
serializable transactions were not available in earlier versions).
Trying to reinvent a simple one could be done, but I’m afraid getting
to the point where PostgreSQL is would not be an easy task, and I
wonder if the result could be much faster or simpler.

Lionel.

damnpenguins · May 31, 2007, 4:21pm

On May 30, 2007, at 12:27 PM, Neil W. wrote:

Database ‘gurus’ who bang on about referential integrity and
transactions are just like those who go on about how you can’t
write correct code without static typing and a compiler.

I could see that claim for referential integrity, but transactions?
How do you test the case where you lost power in the middle of a
database update?

-faisal

damnpenguins · May 31, 2007, 4:51pm

On May 31, 2007, at 5:35 AM, Neil W. wrote:

So if the filesystem was ACID compliant and network available,
would you miss the constant SQL parsing, the unused permissions
system, consistency checks and stored procedures?

If the filesystem was ACID compliant we’d live in a different world.

-faisal

damnpenguins · May 31, 2007, 5:58pm

On 5/31/07, Faisal N Jawdat wrote:

-faisal
I don’t understand the claim that “Rails handles transactions in the
app.” It doesn’t. The transactions are all handled by the database. You
write Ruby code (Client.transaction do…) but that just starts a
transaction in the db.

I’ve come up with a pretty simple strategy. Constraints like NOT
NULL, length, etc. are handled by my app. They’re really easy to test,
often change a lot, and overall are easier when managed by my app. I
use foreign key constraints because my RDBMS is meant to handle
relations.

I manage meaning in my app, and relations in the database. This
strategy keeps me happy working with my code and lets me sleep easy at
night.

Pat