Rails validation is inefficient

fipa · May 23, 2008, 9:27pm

Hi,

I have a User model, with a validates_uniqueness_of :login

The generated SQL for the validation is:
SELECT * FROM users WHERE (LOWER(users.login) = ‘fernando’ AND
users.id <> 10001) LIMIT 1;
and it takes 0.13s to happen (my table has 10.000 rows)

When I use the EXPLAIN instruction on it, it shows that it will use the
primary_key as index, but this is not efficient.

With an index on login and the following MySQL query:
SELECT * FROM users WHERE (users.login = ‘fernando’ AND users.id <>
10001) LIMIT 1
It takes less than 0.001s and the EXPLAIN gives me: primary_key and
login.

How can I remove this “LOWER” instruction from MySQL which is
unefficient, I can code that directly in Ruby.

fipa · May 23, 2008, 9:36pm

On May 23, 8:27 pm, Fernando P. [email protected]
wrote:

Hi,
How can I remove this “LOWER” instruction from MySQL which is
unefficient, I can code that directly in Ruby.

Play with the :case_sensitive option

Fred

fipa · May 23, 2008, 10:00pm

No that doesn’t work. I want to have case insensitive validation.

But instead of having it plugged into the MySQL query with LOWER(), I
want to have ruby do login.downcase, then pass it to MySQL.

fipa · May 23, 2008, 10:35pm

On Fri, May 23, 2008 at 10:00 PM, Fernando P.
[email protected] wrote:

No that doesn’t work. I want to have case insensitive validation.

But instead of having it plugged into the MySQL query with LOWER(), I
want to have ruby do login.downcase, then pass it to MySQL.

Looks like you indeed want case-sensitive SQL with normalized values.
As Frederick suggests :case_sensitive => true should do if that’s the
case. I guess logins are normalized in the database because otherwise
that login.downcase approach wouldn’t make sense.

BTW http://dev.rubyonrails.org/changeset/9248 is relevant.

fipa · May 23, 2008, 11:39pm

Xavier N. wrote:

On Fri, May 23, 2008 at 10:00 PM, Fernando P.
[email protected] wrote:

No that doesn’t work. I want to have case insensitive validation.

There are several things you may wish to consider here.

Rails “validates_uniqueness_of” is a trap that will, over time,
eventually permit duplicate values in your database. The only way to
prevent this is to have an INDEX UNIQUE clause on the column that you
wish to enforce the uniqueness of. This constraint properly lives in
the DBMS and not in the application logic in any case, regardless of the
expressed opinions of others.

The way that I handle this in my migrations is to do this inside the .up
method:

14 add_index :table, :column,
15 :name => :idxU_table_column,
16 :unique => true

Removing “validates_uniqueness_of” gets rid of the auto-generated
LOWER in the SQL but that still leaves the problem of denormalized
(mixed-case) entries colliding on the same normalized value (lowercase).
The way that I dealt with this was to write a keycase method and inject
it into string by placing the file containing this code into
./config/initializers:

1 class String
2 # remove extra whitespace, force left, and down shift letters
3 def keycase
4 strip.squeeze(" ").downcase
5 end
6 end

Now, in your model file just use this:

75 # override assignment
76 def column_attribute=(name)
77 # keycase is a local extension of class String.
78 write_attribute(:column_attribute, name.keycase)
79 end

You can use the .titlecase method from ActiveSupport on the normalized
data in the views if you desire a prettier output than straight
lowercase.

The advantage of this approach over simply doing everything in the Rails
application is that you allow the DBMS planner to make the best use of
the index and you only store normalized data, greatly simplifying the
api for non-Rails clients.

HTH
Jim

fipa · May 23, 2008, 11:59pm

On May 23, 2008, at 4:00 PM, Fernando P. wrote:

No that doesn’t work. I want to have case insensitive validation.

But instead of having it plugged into the MySQL query with LOWER(), I
want to have ruby do login.downcase, then pass it to MySQL.

–

Since you’re using MySQL:

http://dev.mysql.com/doc/refman/5.0/en/comparison-operators.html#operator_equal

then scroll up a few lines to see:

“By default, string comparisons are not case sensitive and use the
current character set. The default is latin1(cp1252 West European),
which also works well for English.”

So let ActiveRecord think that the comparison is case sensitive and
needs no LOWER() and MySQL will do what you want.

(And add a comment to your code and a test to prove it so you’ll know
if MySQL changes its behavior in the future.)

-Rob

Rob B. http://agileconsultingllc.com
[email protected]

fipa · May 24, 2008, 5:39pm

Hey James you actually pinpointed-out a a huge failure in Rails that I
hadn’t figured out concerning the duplicate values that
validates_uniqueness_of won’t detect. And yet the Rails bubble
constantly brags about the benefits of TDD/BDD…

Thank you Rob for the link, it works as expected.

fipa · May 24, 2008, 8:05pm

Hey James you actually pinpointed-out a a huge failure in Rails that I
hadn’t figured out concerning the duplicate values that
validates_uniqueness_of won’t detect. And yet the Rails bubble
constantly brags about the benefits of TDD/BDD…

What does TDD have to do with a validation??

Add a :uniq to the migration that creates your table…

fipa · May 25, 2008, 10:59am

Hi,

I have changed my index to be unique, now when I try to update or
insert a record that would duplicate some entry, I get the following
error: Mysql::Error: Duplicate entry …

I wrapped my statement with a begin…rescue…else…end, but I still
get this error displayed on screen, how do you handle that?

PS: The TDD and validation: even using TDD, it is hard to discover that
validates_uniqueness_of is broken.

fipa · May 24, 2008, 12:13am

On Fri, May 23, 2008 at 11:59 PM, Rob B.
[email protected] wrote:

So let ActiveRecord think that the comparison is case sensitive and
needs no LOWER() and MySQL will do what you want.

(And add a comment to your code and a test to prove it so you’ll know
if MySQL changes its behavior in the future.)

Is it a bug that Rails does not add BINARY?

fipa · May 25, 2008, 12:21pm

Using the unique index technique, I have had to overwrite the save
method according to this wiki page:
http://wiki.rubyonrails.org/rails/pages/HowtoHandleStatementInvalid

How can I detect that the error is a duplicate entry problem? I don’t to
show any other error message to the user if the error is not a duplicate
entry.

You could argue that in that case it shouldn’t exist in the first place, but that’s a different issue.
Yeah, that’s what I was thinking about. Okay the race condition has
0.0000001% chances to happen, but as long as this chance exist, it is
unacceptable, and therefore validates_uniqueness_of shouldn’t exist in
its current state, specially as we are actually rewriting it to handle
this kind of problem.

We could probably even submit a patch if it hasn’t been fixed in Rails
2.1 RC1 yet?

fipa · May 25, 2008, 12:33pm

On Sun, May 25, 2008 at 12:21 PM, Fernando P.
[email protected] wrote:

You could argue that in that case it shouldn’t exist in the first place, but that’s a different issue.
Yeah, that’s what I was thinking about. Okay the race condition has
0.0000001% chances to happen, but as long as this chance exist, it is
unacceptable, and therefore validates_uniqueness_of shouldn’t exist in
its current state, specially as we are actually rewriting it to handle
this kind of problem.

We could probably even submit a patch if it hasn’t been fixed in Rails
2.1 RC1 yet?

Problem is the validation is done outside the database, I don’t think
it is “fixable”. Also in the database you can’t express :scope for
example… I believe there’s a trade-off in the programmer’s side
here. The docs explain the race condition and the validation may still
have some valid uses cases that accept the risk.

fipa · May 25, 2008, 11:40am

On Sun, May 25, 2008 at 10:59 AM, Fernando P.
[email protected] wrote:

PS: The TDD and validation: even using TDD, it is hard to discover that
validates_uniqueness_of is broken.

Oh, it is not that tests fails to identify it is broken, it is known
that it implies a race condition. You could argue that in that case it
shouldn’t exist in the first place, but that’s a different issue.

fipa · May 25, 2008, 12:46pm

The docs explain the race condition and the validation may still have some valid uses cases that accept the risk.
Which docs? I didn’t find anything on the rails api:
ActiveRecord::Validations::ClassMethods

Anyway using the unique index feature of MySQL introduces a little
problem. Which is the validation of this unique field doesn’t show up
with the other Rails validations. It is only when all other Rails
validations are passed that the validation of the unique index appears,
this is because previous validations don’t try to save or update the
record.

fipa · May 26, 2008, 4:56pm

Xavier N. wrote:

Problem is the validation is done outside the database, I don’t think
it is “fixable”. Also in the database you can’t express :scope for
example… I believe there’s a trade-off in the programmer’s side
here. The docs explain the race condition and the validation may still
have some valid uses cases that accept the risk.

The problem is that on the alter of “agnosticism” the ability to trap
and report DBMS errors has been “sacrificed”. Rails is a General Purpose
web application development tool. To be as attractive as possible to the
widest audience its authors have elected, quite sensibly in my opinion,
to provide 80% of the value for 20% of the effort. So SQL support is
reduced to the very lowest common denominator (SQLite).

What it comes down to is that the SQL-92/99/03 standard as to what a
DBMS must report for certain error conditions, SQLSTATE, is not
uniformly implemented. In SQLite3 for instance, a duplicate index value
reports some nondescript generic error message similar to the following:

“SQL logic error or missing database”

This is really not very much help outside of the context of the INSERT
call so that determining the exact nature of the error is perforce left
to the application; in other words: “validates_uniqueness_of”.

I digress for a moment, but it seems to me that the proper way to handle
this in Rails would have been to INSERT first, trap any error and THEN
do a find on the index to see if the error was caused by a duplicate.
If a duplicate is found and is different (based on hash contents of
inserted row less id attribute versus hash of returned row lest id
attribute?) than the row just inserted then report a validation error;
if not then raise the original dbms error. That way the race condition
would never occur and there would, on average, be
((1 / ) - 1) fewer SQL reads.

Returning to the main stream of thought, PostgreSQL 8 returns this:

“ERROR: Cannot insert a duplicate key into unique index”.

While MySQL 5 can return two different messages, depending on whether
one is using NDB or not:

NDB: ERROR 1169 (23000): Can’t write, because of unique constraint, to
table

not NDB: ERROR 1062 (23000): Duplicate entry for key

Note: The common element in these latter two messages, (23000) simply
tells us the SQLSTATE falls under “Integrity constraint violation”.

The exact error code that should be returned in this case is (23505)
which is exactly what PostgreSQL [sqlca.sqlstate], DB2 [sqlca.sqlstate],
and MySQL [mysql_sqlstate()] do return, they just choose not to display
it in the human readable message. This sort of error trapping stuff all
should live in the DB adapters really, but that poses serious coding
problems for Rails when accessing DBMS that do not support the full
SQLSTATE implementation; SQLite3 for instance, which seemingly returns
SQLSTATE=HY000 for everything that goes wrong.

My suggestion is that you seriously consider what DBMS best supports the
production requirements for your application and use it for development
and testing. Your design is in any case going to reflect the specific
implementation details of the DBMS you select, so pick the DBMS first
and after you decide do not worry over much about the other
possiblities.

When convenient, use SQLite3 to sketch out design ideas and to code
proof of concepts, but do not break your heart trying to get a
single-user DBMS to act like industrial strength software. Similarly,
while MySQL has its many adherents that advocate its suitability for
production use I am not one of them.

Remember that in the vast majority of business applications, it is the
data that is valuable and not the present method of its presentation.
The choice as to where the data goes and how it is handled, verified and
secured from corruption, tampering and loss, trumps the latest whiz-bang
programming paradigm every day of the week. Remember too, that any
significant persistent data store is eventually going to be accessed by
more than one application and by more than one programming language in
the course of its existence.

fipa · May 25, 2008, 12:58pm

On Sun, May 25, 2008 at 12:46 PM, Fernando P.
[email protected] wrote:

The docs explain the race condition and the validation may still have some valid uses cases that accept the risk.
Which docs? I didn’t find anything on the rails api:
ActiveRecord::Validations::ClassMethods

Yeah, check the paragraph that starts with “Because this check is
performed outside”.

Perhaps that paragraph could also say something about the gotcha you
mention.

fipa · May 26, 2008, 5:47pm

I’m sort of working on a plugin to enforce it through the validation
mechanism in a way (haven’t had much time to hack on it lately). I’ll
Github it when it’s done (if it ends up working like I think it will).

–Jeremy

On Sun, May 25, 2008 at 6:32 AM, Xavier N. [email protected] wrote:

–
http://jeremymcanally.com/
http://entp.com

Read my books:
Ruby in Practice (Ruby in Practice)
My free Ruby e-book (http://humblelittlerubybook.com/)

Or, my blogs:

http://rubyinpractice.com

fipa · May 26, 2008, 10:44pm

Thank you James for your very informative message.

fipa · May 28, 2008, 4:26am

Quoting F. Perez [email protected]:

When I use the EXPLAIN instruction on it, it shows that it will use the
primary_key as index, but this is not efficient.

With an index on login and the following MySQL query:
SELECT * FROM users WHERE (users.login = ‘fernando’ AND users.id <>
10001) LIMIT 1
It takes less than 0.001s and the EXPLAIN gives me: primary_key and
login.

How can I remove this “LOWER” instruction from MySQL which is
unefficient, I can code that directly in Ruby.

Is your Web site so heavily loaded that it is more cost efficient to
spend
expensive programmer time tweaking the code for creating a new and
unique
login name than simply leave it alone or add more cheap hardware?

Note: validations can be restricted to just certain actions, e.g.

validates_uniqueness_of :login, :on=>:create

This may be much more Railish/Rubyish, takes less programmer time, and
is less
likely to be broken by new Rails & Ruby versions.

Just my $0.02USD,
Jeffrey

fipa · May 28, 2008, 10:42am

On 28 May 2008, at 03:25, Jeffrey L. Taylor wrote:

Is your Web site so heavily loaded that it is more cost efficient to
spend
expensive programmer time tweaking the code for creating a new and
unique
login name than simply leave it alone or add more cheap hardware?

Note: validations can be restricted to just certain actions, e.g.

validates_uniqueness_of :login, :on=>:create

I’m sure Jeffrey knows this, but just wanted to make this explicit for
others (as there is sometimes confusion about this). The :on option
refers to whether an object is being created, updated or saved (which
covers the 2 previous options and is the default. It has nothing to do
with the controller action. You can call your create action bananarama
and :on => :create validations will still fire (it pretty much has to
be this way - your models should not know or care about stuff that
happens way above them).

Fred