How to (re)run validations on existing data

Suppose one had a hypothetical situation in which records were entered
into
a database which were later determined to be invalid.

In this supposedly hypothetical situation, the agile programmer might
add
validations to the models to ensure this doesn’t, err, wouldn’t happen
again.

Following that, there might be some records in the database which were
invalid.

I like thinking about hypothetical questions and coming up with
hypothetical
answers, and I thought that perhaps there might be other readers on this
list who enjoy that sort of game too. So, for those of you who enjoy
that
sort of thing, what would you do to address this?

One thing I’ve thought of would be to write a throwaway rake task to
iterate
through the items in the the affected table(s), calling the validate
function on each one and reporting the results back. If there were only
a
few invalid records, I could probably go in and fix them by hand. If
there
were too many, I might write another rake task to identify and fix them.

If I did that, would I really “throwaway” the rake task when I was done?
Probably not… I’d probably check it into source control and keep it
just
for documentation/history’s sake.

What would you do?

–wpd

Patrick D. wrote:

Suppose one had a hypothetical situation in which records were entered
into
a database which were later determined to be invalid.

Because they were invalid in the first place, or because the criteria
for validity changed?

In this supposedly hypothetical situation, the agile programmer might
add
validations to the models to ensure this doesn’t, err, wouldn’t happen
again.

Yes. And also constraints in the database.

Following that, there might be some records in the database which were
invalid.

I like thinking about hypothetical questions and coming up with
hypothetical
answers, and I thought that perhaps there might be other readers on this
list who enjoy that sort of game too. So, for those of you who enjoy
that
sort of thing, what would you do to address this?

One thing I’ve thought of would be to write a throwaway rake task to
iterate
through the items in the the affected table(s), calling the validate
function on each one and reporting the results back. If there were only
a
few invalid records, I could probably go in and fix them by hand. If
there
were too many, I might write another rake task to identify and fix them.

This seems quite reasonable. If it’s a question of the data structure
changing and affecting validity, then it’s probably a good idea to put
the validation routine right in the migration file.

If I did that, would I really “throwaway” the rake task when I was done?
Probably not… I’d probably check it into source control and keep it
just
for documentation/history’s sake.

Right.

What would you do?

Remember to put checks in the DB wherever possible. Rails validations
are not a replacement for that.

–wpd

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

Patrick D. wrote:

On Thu, Sep 17, 2009 at 9:14 AM, Marnen Laibow-Koser
[email protected] wrote:

[…]

Yes. And also constraints in the database.
That’s a new one for me…

Really? It may not be worth implementing complex constraints in the DB,
but simple ones are almost always a good idea. Remember, the database
is not dumb, and it will work best for you if you respect its
intelligence and allow it to reason as much as possible about the data
it contains. This is one reason that we use a database instead of a
flat CSV file or some such.

There is a tendency among some Rails developers to treat the database as
a dumb data store, but that is generally a bad idea. (Of course, the
opposite extreme – putting all the app logic in the DB – is just as
bad.)

I presume there are ways to implement the
same sort of validates_presence_of

Just declare the field not null.

and validates_uniqueness_of checks
in databaseland,

Declare a unique index. These two things are so basic that you really
shouldn’t be doing database design if you don’t know about them.

but what about custom validations that require, for
example, text be formatted a certain way,

This one may be more trouble than it’s worth in the DB. More than that,
if you have such a constraint, it suggests that you are storing your
data improperly. The DB really should store data, not formatting
(unless the formatting is part of the data, as for example in a CMS
that stores HTML fragments in the DB). If you can say more about your
use case here, I may be able to provide more specific advice.

or a product price must be
at least a cent, etc…

This is easy with a check constraint. Unfortunately, Rails migrations
don’t make check constraints easy, and there’s no plugin to add this
feature AFAIK. One of these days I hope to extend
foreign_key_migrations so it supports check constraints. Till then,
you’re stuck with raw SQL.

I guess I’m asking what is the value added to constraints in a
database that will only ever be accessed via the web application
front-end?

You are asking the question backwards. First of all, you can’t
guarantee that the DB will only ever be used through the Rails app.
What happens 5 years from now when the Rails app is retired and
rewritten from scratch in Lolcode on Kittehs (or whatever the hot new
technology is in 2014)? Surely you will still want to have access to
the old data. Your data will almost certainly outlive your application.

Second, remember that validations in the Rails app can be thought of as
“suggestions”. They only work insofar as data in the DB is only
modified through the Rails ActiveRecord interface. It is still possible
to get bad data into the DB by bypassing the ActiveRecord interface
(which you could even do with a naked INSERT query right from the Rails
app!). The only foolproof way to prevent bad data from getting into
the DB is to supply the DB with the tools to keep it out – and that
means constraints.

In other words, the DB constraints are the primary, foolproof means of
keeping bad data out of the DB. The Rails validations are simply a
layer on top of that, to prevent the DB constraints from being triggered
in the first place, and to implement complex constraints that would be
difficult or impossible to implement in the DB.

One thing I’ve thought of would be to write a throwaway rake task

This seems quite reasonable. �If it’s a question of the data structure
changing and affecting validity, then it’s probably a good idea to put
the validation routine right in the migration file.
In this (um, hypothetical) case, there is no migration file. It’s
just been noticed after the fact that there is something wrong.

OK. Then a rake task, or simply running Model.all.select{|m| !m.valid?}
from the console, is probably the way to go.

Fortunately, this is hypothetically a very small, custom application
that is only used (currently) by a single user, so the effects are not
terribly disastrous.

For now. :slight_smile:

–wpd

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

On Thu, Sep 17, 2009 at 9:14 AM, Marnen Laibow-Koser
[email protected] wrote:

Patrick D. wrote:

Suppose one had a hypothetical situation in which records were entered
into a database which were later determined to be invalid.

Because they were invalid in the first place, or because the criteria
for validity changed?
Probably a little of both. Regardless, now there could potentially be
a problem in the database.

In this supposedly hypothetical situation, the agile programmer might
add validations to the models to ensure this doesn’t, err, wouldn’t happen
again.

Yes. And also constraints in the database.
That’s a new one for me… I presume there are ways to implement the
same sort of validates_presence_of and validates_uniqueness_of checks
in databaseland, but what about custom validations that require, for
example, text be formatted a certain way, or a product price must be
at least a cent, etc…

I guess I’m asking what is the value added to constraints in a
database that will only ever be accessed via the web application
front-end?

One thing I’ve thought of would be to write a throwaway rake task

This seems quite reasonable. If it’s a question of the data structure
changing and affecting validity, then it’s probably a good idea to put
the validation routine right in the migration file.
In this (um, hypothetical) case, there is no migration file. It’s
just been noticed after the fact that there is something wrong.

Fortunately, this is hypothetically a very small, custom application
that is only used (currently) by a single user, so the effects are not
terribly disastrous.

–wpd