Models without primary key; n-ary associations

Today for the second time in a short while I’ve come upon a case where
I’d like to have a “model” without a primary key. Instances of such a
model wouldn’t have an identity above the unique combination of (part
of) their attribute values.

The cases I’ve come up with involve expressing facts about two or more
related objects. They can be thought of as n-ary associations, in a
sense, generalizations of habtm.

Here’s an example: There are multiple compilers, each of which
potentially runs on many host systems and can compile code for many
target systems (depending on host system!).

I’d like to express facts and queries like these:

  • compiler c running on host OS h can compile for target OS t
  • find all compilers that run on host OS h and compile for target OS t
  • find all compilers that match some other condition and run on host OS
    h

Much of this can be kludged with :through associations, but it doesn’t
feel right to me as these involve full-fledged association models with
an identity independent of their attribute values.
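
To make the shape of it concrete, here is a rough sketch of what I
picture, in ordinary AR terms; the names (HostTargetMapping,
OperatingSystem, host_id, target_id) are placeholders, not code I’m
committed to, and h and t stand for host and target OS records:

class Compiler < ActiveRecord::Base
  has_many :host_target_mappings
end

class HostTargetMapping < ActiveRecord::Base
  # one row = "compiler c, running on host OS h, can target OS t"
  belongs_to :compiler
  belongs_to :host,   :class_name => 'OperatingSystem', :foreign_key => 'host_id'
  belongs_to :target, :class_name => 'OperatingSystem', :foreign_key => 'target_id'
end

# "find all compilers that run on host OS h and compile for target OS t"
Compiler.find(:all,
  :include    => :host_target_mappings,
  :conditions => ['host_target_mappings.host_id = ? AND host_target_mappings.target_id = ?',
                  h.id, t.id])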

I’m very much interested in suggestions.

Michael


Michael S.
mailto:[email protected]
http://www.schuerig.de/michael/

Michael S. wrote:

I’m very much interested in suggestions.

Michael


With no primary key, it’s not really an ActiveRecord::Base, as all the
associations will simply break anyway.

The usual suggestion is to use habtm if the link table contains only the
two foreign keys; otherwise, make it a model with the primary key that
Rails expects.

If you just want to manage an N-to-N association and that association
doesn’t live in the DB, then you should write the code yourself, as none
of the association machinery that comes with Rails will help you one
bit.
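
For the plain habtm case mentioned above, something like this is what I
mean; the names are made up for the example:

class Compiler < ActiveRecord::Base
  has_and_belongs_to_many :host_systems
end

class HostSystem < ActiveRecord::Base
  has_and_belongs_to_many :compilers
end

# in a migration: the join table carries nothing but the two foreign
# keys, and no id of its own
create_table :compilers_host_systems, :id => false do |t|
  t.integer :compiler_id
  t.integer :host_system_id
end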

ilan

Michael S. wrote:

Today for the second time in a short while I’ve come upon a case where
I’d like to have a “model” without a primary key. Instances of such a
model wouldn’t have an identity above the unique combination of (part
of) their attribute values.

Isn’t that what all records/models are? It’s not the id that’s
important, but all the attribute goodness present in the fields…
99.99999% of the time (accounting systems excepted), I don’t care
whether an id value is 1, 42, or a bazillion except as a fast way to get
at all the rest of the data for that record.

I just think of those “n-way primary attribute” cases as instances where
the id field is just a short-hand method of specifying the unique
combination of attributes however many there might be.

On Tuesday 29 April 2008, Ilan B. wrote:

The usual suggestion is to use habtm if the link table contains only
the two foreign keys; otherwise, make it a model with the primary key
that Rails expects.

Yes, I know that this is how Rails works currently, but things being
flexible, a plugin could add new functionality, provided that it is
useful.

Michael


Michael S.
mailto:[email protected]
http://www.schuerig.de/michael/

I’m having trouble seeing your point. Databases consist of tables that
contain rows, which in turn contain columns. Why do you even care
whether each row contains a simple incrementing primary key? In fact
I’ve used databases (OpenBase and Oracle, for example) in which you
cannot create a table that does not have some form of row id. In
OpenBase that would be _rowid; indeed, every OpenBase table has three
default (and required) columns (_rowid, _timestamp and _version).

The only valid reason I see for supporting tables that have no primary
key is to support legacy databases where the designer did not provide
one and where, for whatever reason, you are not able, or allowed, to
add one.

By the way, Rails is not the only ORM with this type of limitation. I
previously worked with the ORM system in WebObjects, which has a
similar requirement for some form of primary key. ORM systems must
have a way to identify rows uniquely in a database; that is how they
map objects to table rows. How would you map a single object to two or
more rows that happen to contain the exact same data? That’s just bad
design to begin with.

As I said, I’ve run into situations in WebObjects where I had legacy
tables with no PK. The only option I had there was to mark every
column as part of the PK. That’s just silly, but in an ORM system
that’s really your only option. I found that when the table contained
duplicate rows, the WebObjects ORM would give you multiple objects with
duplicate information in their attributes; however, they were all read
from the same physical row. So if I were to make changes to both
objects and save, only one of the rows would receive the updates.
Basically the ORM would “randomly” choose just one of the duplicate
rows and give you multiple objects, but they all referenced the same
physical row in the database.

This is a difficult problem to solve in ORM systems. SQL and object
mapping are not exactly compatible. The requirement for a primary key
solves one of the incompatibilities, namely the issue of identity. All
objects in OOP have an identity, but that is not necessarily the case
for SQL rows.

On Friday 27 June 2008, Robert W. wrote:

I’m having trouble seeing your point. Databases consist of tables
that contain rows, which in turn contain columns. Why do you even
care whether each row contains a simple incrementing primary key? In
fact I’ve used databases (OpenBase and Oracle, for example) in which
you cannot create a table that does not have some form of row id. In
OpenBase that would be _rowid; indeed, every OpenBase table has three
default (and required) columns (_rowid, _timestamp and _version).

Forget about “no PK”. I meant no synthetic key; of course the table
has a primary key, but a composite one. I wouldn’t mind a synthetic
key that is managed and used by the DBMS only, either. The problem is
that ActiveRecord holds onto that numeric ID, even though the semantic
identity of the row/object depends on the other attributes.

Michael


Michael S.
mailto:[email protected]
http://www.schuerig.de/michael/

My understanding of your problem:

The attributes should be able to act as a composite PK. Their
combination will always be unique, and yet you’ve got this synthetic
key which would seem redundant.

But that’s really just aesthetics. It’ll still work with a synthetic
PK. Except…

One implementation problem that I can see is this: using a synthetic
key, it would be possible to have duplicate rows in the database,
differing only by the synthetic PK. In that case you’d have to have
some model validation to check that the new record is unique, which
would be quite messy.
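
Something along these lines is what I have in mind, assuming the
natural key is (compiler_id, host_id, target_id) as in the compiler
example:

class HostTargetMapping < ActiveRecord::Base
  # reject a second row with the same natural key; note that this check
  # is subject to race conditions, so a unique index in the database is
  # still a good idea
  validates_uniqueness_of :compiler_id, :scope => [:host_id, :target_id]
end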

Are there any other implementation problems? Or is it just an
aesthetic issue? (What I mean by aesthetics is that it makes you feel
gooey inside when your code reflects your model cleanly.)

I think to do it you’d have to override your model’s “id” method and
have it return an integer which is derived from the other attributes
and guaranteed to be unique, like a hash of all the attributes. But it
would be very hard to guarantee that the hash could not be reproduced
by another combination of attributes.
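
Purely as an illustration of that idea (and I wouldn’t actually do
this), with made-up attribute names:

require 'digest/sha1'

class HostTargetMapping < ActiveRecord::Base
  # derive "id" from the natural-key attributes; truncating the digest
  # to fit an integer means uniqueness is NOT guaranteed
  def id
    Digest::SHA1.hexdigest(
      [compiler_id, host_id, target_id].join('/')
    )[0, 15].to_i(16)
  end
end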

Hope this makes sense. I’m tired. I’ll probably read over this post
later and be shamed by the nonsense I’ve just polluted the thread
with.

Joel.

There is a dead simple fix to the problem. Add a unique index across
your composite key and let Rails have its synthetic key. Indexes
don’t have to be PKs to be unique.
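
In migration terms, something like this, with the table and column
names assumed from the compiler example:

class AddNaturalKeyIndexToHostTargetMappings < ActiveRecord::Migration
  def self.up
    add_index :host_target_mappings,
              [:compiler_id, :host_id, :target_id],
              :unique => true, :name => 'htm_natural_key'
  end

  def self.down
    remove_index :host_target_mappings, :name => 'htm_natural_key'
  end
end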

On Saturday 28 June 2008, LemmingJoel wrote:

One implementation problem that I can see is this: using a synthetic
key, it would be possible to have duplicate rows in the database,
differing only by the synthetic PK. In that case you’d have to have
some model validation to check that the new record is unique, which
would be quite messy.

That’s exactly the problem. There is even a composite PK plugin/gem (by
Dr. Nic), but I shy away from it, as it hacks a bit too deeply into the
innards of ActiveRecord to be entirely wholesome.

Why would such a problem where id-identity and attribute-identity
diverge ever occur? In my case I have a ternary relationship that can
be edited in a view where I can’t easily keep track of which/whether
ids already belong to a relationship instance. When assigning the
changed association, I’d like to do this as

  associated_objects = build_assoc(association_params)
  my_model.the_assoc.replace(associated_objects)

Association#replace out of the box takes care of removing and adding
only what has actually changed, so foreign key constraints are not
touched for unchanged association instances. It determines changes
based on the object id, which does not work in my case. To circumvent
this, I’ve overridden #== in the associated model class. This works,
but I’m not 100% sure that it won’t break anything in AR’s inner
workings in some arcane case.
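
For the record, the override is roughly this, with the attribute names
from my compiler example:

class HostTargetMapping < ActiveRecord::Base
  belongs_to :compiler
  belongs_to :host,   :class_name => 'OperatingSystem', :foreign_key => 'host_id'
  belongs_to :target, :class_name => 'OperatingSystem', :foreign_key => 'target_id'

  # compare on the natural key instead of the synthetic id; #replace
  # uses #== to decide which objects are already in the association
  def ==(other)
    other.is_a?(HostTargetMapping) &&
      compiler_id == other.compiler_id &&
      host_id     == other.host_id &&
      target_id   == other.target_id
  end
end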

Michael


Michael S.
mailto:[email protected]
http://www.schuerig.de/michael/

But if you’ve got a unique key constraint on the natural key, then there
can’t be > 1 record in your table w/the same values of your natural key
columns. In which case the synthetic key is identity enough, isn’t it?

On Monday 30 June 2008, Robert W. wrote:

There is a dead simple fix to the problem. Add a unique index across
your composite key and let Rails have its synthetic key. Indexes
don’t have to be PKs to be unique.

I already have such an index in place. As I explained, my problem was
that ActiveRecord identifies records by their synthetic key. By that
measure, two records with the same attributes, i.e. the same natural
key, are not considered the same when their synthetic keys diverge.
This can cause trouble if you’re updating (#replace) which objects are
associated through a particular association. The solution, in my case,
was to override #== in the model class of the associated objects.

Michael


Michael S.
mailto:[email protected]
http://www.schuerig.de/michael/

Just write your own association methods. That’s how I get around these
things. Leave the id column around and just ignore it, and don’t
associate it:
class Foo < ActiveRecord::Base
  def bars
    # fetch and memoize the associated records by hand
    @bars ||= Bar.find_all_by_some_other_thing("foo")
  end
end

On Tuesday 01 July 2008, Pardee, Roy wrote:

But if you’ve got a unique key constraint on the natural key, then
there can’t be > 1 record in your table w/the same values of your
natural key columns. In which case the synthetic key is identity
enough, isn’t it?

What already is in the table is not the problem at all. The problem
occurs when I try to update the table via an association. In that
case, when a new record, i.e. one without an id, has the same attribute
values as an already existing record, the existing record should not be
touched.

This is relevant when editing all associated in a single form. Consider
my original example again.

  • compiler c running on host OS h can compile for target OS t
  • find all compilers that run on host OS h and compile for target OS t
  • find all compilers that match some other condition and run on host OS
    h

I have a single form where for a given compiler all host-target mappings
can be edited. Compiler has_many HostTargetMappings; when this
association is updated, it is not possible to simply delete all
associated objects and insert new ones. This approach is bound to
violate foreign key constraints imposed by other objects associated
with the relevant HostTargetMappings.

AR in principle does the right thing: on host_target_mappings#replace
it inserts new objects, deletes the ones no longer pertinent, and
should leave the others untouched. However, this latter part depends on
the identity of an object being tied to its synthetic id, because
#replace does the comparison using #==, which in turn compares AR
objects based on their id.

Now imagine, if you will, that the user creates a mapping duplicating
the attributes of an already present one. Such a duplicate ought simply
to be ignored, but when comparison is based on id, this object is
unlike any existing one, so AR tries to insert it and fails due to the
foreign key violation.

As I’ve explained at least three times already, the way around this is
to override HostTargetMapping#== so that it is based on attribute
values, not the id. This works perfectly in my case, but I can’t
guarantee that it won’t blow up because some part of AR might depend on
#== comparing on the id.

Michael


Michael S.
mailto:[email protected]
http://www.schuerig.de/michael/