ActiveRecord and data integrity

tma · August 27, 2006, 7:48pm

I’m running into a bit of a ideological and functional problem with
ActiveRecord and would like to find out if some of these concerns are
valid. Hopefully someone here already knows the answer…

The concerns are that ActiveRecord has a lot of activity involving data
validation (validates_uniqueness_of, validates_associated) and model
definition (belongs_to, has_many, has_one). This falls into three basic
areas of data management that might be better handled elsewhere. At
least that’s my olde school thinking.

model definition: belongs_to, has_many, has_one
These are all concepts that have been previously handled by standard SQL
concepts such as:

belongs_to:
This is addressed by the SQL
execute ‘alter table contacts add constraint fk_contacts_users
foreign key (user_id) references users(id)
on delete cascade on update cascade’

has_many:
This is a data model that is not addressed by SQL directly
but is included in the SQL for belongs_to

has_one:
execute 'create unique index idx_contacts (username)

The common concern that I have for each of these is that the
ActiveRecord model may not always guarantee correct behaviors of the
data.

As an example: I have an application which requires that each user have
a unique name for login. If I have a database table [id, username] and
a ActiveRecord model which validates the uniqueness of the username.

What happens to the application when two people enter in the same
username (for INSERT) concurrently?

Is there something about the ActiveRecord which can manage the
Concurrency effectively enough that the underlying database does not
require an UNIQUE index on username?

My past experience with data systems would tell me that the answer is to
enforce a UNIQUE index on the username field and then validate
uniqueness of the data insert by handling the resulting database error
that is going to be thrown.

Additionally, (and this is my lack of understanding of ActiveRecord) I
am assuming that this validation is going to run a SQL statement to see
how many records match that name, and returning none, will insert the
record. That’s two queries instead of one plus error handling. Plus it
doesn’t guarantee the data integrity that is required of the
application.

Similarly, any specific constraints on what kind of data is permitted
(price > 0) is something that can also be added as a constraint to the
database field. But some DBA’s and others would argue against this
since it’s not a data integrity requirement but a business requirement.
Perhaps it would be more agreeable to state the constraint as price >=
0 since we’re pretty sure we don’t want to sell at negative values.

This last example is something that may not be as much of a hard and
fast rule as the previous requirements about data linking from table to
table or perhaps other requirements like a field cannot be null in the
database and the ActiveRecord validates that the field matches a set of
values.

There’s arguably two layers of data integrity: The rules that make the
business of running the web site make sense (price >=0) and the rules
that keep things from completely falling apart (unique username).

One partition is to consider the Model to contain everything that can be
compared to values in the model code or the data provided (in memory)
for that specific event. This would ensure the price >=0 or price > 0
or that a zip code is sane (5 digits US, 6 chars CA). And to handle the
errors returned from the database when the data integration rules are
violated (referential integrity, unique key violations…) rather than
trying to second guess the database.

This is where my thinking it as.
I’m trying to resolve what my options are in developing an application
that is going to provide both a Rails-ish approach yet still keep the
data sane.
Hints?

tma · August 27, 2006, 8:14pm

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Tom A. wrote:

model definition: belongs_to, has_many, has_one
This is a data model that is not addressed by SQL directly
a unique name for login. If I have a database table [id, username] and
a ActiveRecord model which validates the uniqueness of the username.

What happens to the application when two people enter in the same
username (for INSERT) concurrently?

Is there something about the ActiveRecord which can manage the
Concurrency effectively enough that the underlying database does not
require an UNIQUE index on username?

NO.

Database vendors have spent alot of time investing in ACID compliance.
ActiveRecord allows you to define this because it is domain specific
logic, but if you truly need unique data, add a unique constraint to
your field in the database.

Migrations allow you to set unique constraints as well, and I believe
this is the preferred way to handle this with AR. Although if you don’t
use migrations then just to it manually.

My past experience with data systems would tell me that the answer is to
enforce a UNIQUE index on the username field and then validate
uniqueness of the data insert by handling the resulting database error
that is going to be thrown.

Yep.

Additionally, (and this is my lack of understanding of ActiveRecord) I
am assuming that this validation is going to run a SQL statement to see
how many records match that name, and returning none, will insert the
record. That’s two queries instead of one plus error handling. Plus it
doesn’t guarantee the data integrity that is required of the application.

If you need to know what/why your application is doing something it is
easy to see that in your User model you are enforcing uniqueness on
username, rather then having to go search the DB table structure to
determine that you want uniqueness on username.

table or perhaps other requirements like a field cannot be null in the
database and the ActiveRecord validates that the field matches a set of
values.

There’s arguably two layers of data integrity: The rules that make the
business of running the web site make sense (price >=0) and the rules
that keep things from completely falling apart (unique username).

Business logic (domain logic) goes into the ActiveRecord models. This is
not a end-all for needing features that databases provide. It is simply
a way to keep all business logic in one layer.

I’m trying to resolve what my options are in developing an application
that is going to provide both a Rails-ish approach yet still keep the
data sane.
Hints?

Keep your business logic in your application, do not simply put it in
your database. However, to enforce certain factions of data integrity
for your particular problem domain you will need to use the correct
constraints (like unique indexes) to ensure that your data will be sane.

Validations in ActiveRecord are not a replacement for having proper
database design and constraints in place. They are merely a way to put
all business constraints explicitly in the application layer rather then
spreading them through the app and then the database.

Zach

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE8d+HMyx0fW1d8G0RAmBhAJkBoHpswGX2I/9FNd5GUgu0EqnkgwCdEpJU
BwJsJDf59AGJbpoEvH7MUF8=
=3LTY
-----END PGP SIGNATURE-----