Three ActiveRecord features we want at work - any comments/s

Hi

I thought I would run this past the Rails list before I start doing
any work because I don’t want to re-invent any wheels.

Basically the situation I’m in is that I work for a company that is
currently working on two big non-Rails projects used for financial
purposes (one sells vehicles and vehicle finance, the other is an
internal broker management system used by lenders and brokers).
After about two months of badgering we’re starting to use Rails for
little projects. Assuming these trials (there are at least 3 going
ahead) are successful we may use Rails on the bigger projects. Now,
both of our large projects have grown organically, and it shows in
the database as inconsistencies and redundancy. In future I want to
do things RIGHT. There as three significant ways I want to work in
future to limit the chance of problems:

  • Eliminate ALL nulls in the database. IE, every column in every
    table will be constrained NOT NULL.
  • Easily store temporal (historical) data, and not rely on ad-hoc
    schemes for recording previous information
  • Use class-table inheritance

What I’m interested in is how feasible the possible solutions to
these sound, and whether they could be implemented as acts_as plugins
or would have to patch ActiveRecord directly. I’ve found that
developers coming from C# or Java to Ruby and Rails get Rails-spoilt,
in that Rails does so much for them they expect it to do EVERYTHING.
So I want my solutions to be as straightforward and self-explanatory
as possible.

I’ll only look at eliminating NULLs here because it turns out
explaining even one of these things is a task in itself!!!

I’m sure my first wish will raise a few eyebrows :slight_smile: The sad truth is
that NULLs invalidate pretty much all of the relational database
theory that SQL is (supposedly) based on. I won’t dwell on this, if
you aren’t convinced read pretty much anything by Chris Dates, Fabian
Pascal or Hugh Darwen (eg http://www.comp.leeds.ac.uk/pd31/Notes/
Darwen.pdf)

All three authors have suggested different solutions to the problem,
but the one I like best is proposed in the PDF by Darwen, referenced
above. Basically instead of using a flag (ie NULL) in each table to
indicate a missing value, you use a separate table to record why the
information is not available.

For example consider the following table:
SurveyResult: { id, name, address, age, sex, survey_date,
feedback_comments, follow_up_comments }

assuming the following business rules:
A survey result must contain name, address, feedback_comments and
survey_date
A survey result must have name and address deleted two years after
survey_date (to comply with some data law)
A survey respondent may demand that any of name, address, age, sex
are deleted one year after survey_date (to comply with some other law)

and the following procedure:
Researchers attempt to collect follow_up_comments one month after
survey_date

Now here are “valid” SurveyResults for today’s date (2006-06-30):

{ 1, ‘Billy’, ‘1 My St’, 30, ‘M’, ‘2006-06-15’, ‘Not bad’,
NULL }
{ 2, ‘Bobby’, ‘2 Your Ave’, NULL, NULL, ‘2006-01-01’, ‘Very good’,
‘Okay’ }
{ 3, ‘Bert’, NULL, NULL, NULL, ‘2005-01-01’, ‘Excellent’,
NULL }
{ 4, NULL, NULL, NULL, NULL, ‘2004-01-01’, ‘Very good’,
‘Got worse’ }
{ 5, NULL, NULL, NULL, NULL, ‘2004-01-02’, ‘So so’,
‘Much improved’ }

In the context of the business rules, the first two record are
unambiguous, but the rest lead you to ask the following
(unanswerable) questions:

  • Has Bert requested his personal information deleted, or was it
    never recorded?
  • Why does Bert not have follow-up comments, as it is more than 1
    month since he was surveyed?
  • How many applicants have requested their age deleted?

However, in the absence of the business rules (which you don’t want
to code into every query you write!), given any particular record you
can’t say why a certain bit of information is missing.

The solution I like is to store the optional information in two
satellite tables per attribute, eg:

SurveyResults_Name: { survey_result_id, name }
SurveyResults_Name_Missing: { survey_result_id, reason }

reason could be a id or a string, I haven’t decided which one I
prefer yet. (Reason could be pretty much anything - ‘unknown’,
‘not_provided’, ‘deleted_data_protection_act’, ‘requested_deleted’,
‘pending’, ‘not applicable’ etc. Fabian Pascal thinks that ‘not
applicable’ actually means you have two distinct types of entity, but
that leads us on to CTI. I didn’t give an example of N/A here
because I’m not convinced it is fundamentally different when you are
dealing with ORM).

Obviously, there will be an entry in exactly one of these two tables
for each survey result.

Now on to the ActiveRecord bit (finally). What I want to do is merge
in the attributes so I can access them in the Main class like this:

sr = SurveyResult.find(:first)
puts sr.name
sr.name = ‘William’

and what effectively happens is:

sr = SurveyResult.find(:first)
puts sr.survey_results_name.name
sr.survey_results_name.name = ‘William’

Now I’ve sort of dreamed up syntax for this, along the lines of:

class SurveyResult < ActiveRecord::Base
#has_one :survey_result_name # get rid of this
acts_as_optional :name
end

sr = SurveyResult.find(:first)
sr.name = ‘William’

sr.name_status
=> :present

sr.name_missing :requested_deleted

sr.name_status
=> :requested_deleted

The *_status and *_missing methods I suppose you can create easy
enough in the acts_as_optional call or via an extension to
method_missing.

I’m not sure what to do in the case of a call to “sr.name” if the
value is actually missing. I couldn’t return a missing symbol
because that could be confused as a value. The best solution I can
think of is to raise an AttributeMissingException or some such.

Now I don’t know how much thought has been given to problems like
this. Maybe merging tables like this has already been solved
somewhere? I haven’t dug very far into the ActiveRecord code to know
if this could be implemented without hacking the core. (A plugin
would be a better solution initially.)

I won’t go into it now because this e-mail is already coma-inducingly
long, but a solution to my second problem (storing temporal data) can
be achieved by timestamping the satellite tables used to record
missing information. (I’ve only read about missing information and
temporal data separately, but I can’t see a reason why they can’t be
combined.)

And finally, class-table inheritance seems like an extension of the
general table-merging problem, but obviously more complex than
including just a single field. I know this has been discussed before
but it doesn’t look like much work has gone into it yet.

Feedback very much appreciated on this. Hopefully I will get enough
time at work to add one, two, or all three of these features into
Rails but I’d like some guidance first.

Cheers
Ashley