Why there is no automatic relationship discovery


#1

When I explain RoR to someone I don’t have a convincing argument that
explains why everything is so automatic and transparent, but
relationships have to be encoded by hand in models. That shows I
don’t understand well that part of AR.

There’s the argument that says that not all databases offer metadata
about foreign keys, but the natural question then is that, if the
reason is that one, why it does not automate this when your database
do offer that metadata. I am sure the real reason is more subtle than
that.

Can anyone elaborate a once and for all explanation for this?

– fxn


#2

On 09 Feb 2006, at 10:35, Xavier N. wrote:

Can anyone elaborate a once and for all explanation for this?

Because it is model logic and should thus belong in your model. Your
database is just a means of storing and retrieving your data. I do
believe database constraints and relations can provide an extra means
of ensuring data integrity, but the fact remains that your
relationships should be defined in your model. Some people think of
the database as their model and put their model logic in the database
while in fact it’s not.

This might not be the best analogy, but it’s all I can come up with
at the moment: it’s not because when I buy a parachute and the box
says: “Ready for you to jump”, I’m just going to hop on a plane and
jump out without checking myself.

Best regards

Peter De Berdt


#3

On Thu, Feb 09, 2006 at 10:35:25AM +0100, Xavier N. wrote:

Can anyone elaborate a once and for all explanation for this?

I wondered the same thing when I first started.

I can think of a few reasons why this would not be automatically
generated. For one, there’s a performance hit in autoloading related
objects if you don’t need them. I think that the objects aren’t
actually loaded until they’re needed, but I’m not sure about that.

Unlike the standard crud operations, there’s a lot that can go
wrong when you try to infer semantic relationships from structure.

Although, yeah, I think it would be nice to have an equivalent for the
scaffold for this, so it can take its best guess to get you up and
running quickly, and then you can fix the parts that aren’t right.

I think the answer you’re looking for is that the relationship
management features in AR are still evolving. That means there may not
be a “once and for all explanation”.


- Adam

** Expert Technical Project and Business Management
**** System Performance Analysis and Architecture
****** [ http://www.everylastounce.com ]

[ http://www.aquick.org/blog ] … Blog
[ http://www.adamfields.com/resume.html ]… Experience
[ http://www.flickr.com/photos/fields ] … Photos
[ http://www.aquicki.com/wiki ]…Wiki
[ http://del.icio.us/fields ] … Links


#4

On Feb 9, 2006, at 14:37, Peter De Berdt wrote:

when your database do offer that metadata. I am sure the real
the database while in fact it’s not.
I see, but in my view by the same reasoning you shouldn’t build
classes based on table metadata. And we do.

I believe the problem is technical, like not being able to guess the
correct relationships in some cases or something like that.

– fxn


#5

On Feb 9, 2006, at 8:27 AM, Xavier N. wrote:

metadata about foreign keys, but the natural question then is
your relationships should be defined in your model. Some people
think of the database as their model and put their model logic in
the database while in fact it’s not.

I see, but in my view by the same reasoning you shouldn’t build
classes based on table metadata. And we do.

I believe the problem is technical, like not being able to guess
the correct relationships in some cases or something like that.

This is exactly the reason. It’s impossible to differentiate a one-to-
many join that currently only has one corresponding row in the ‘many’
table from a one-to-one join merely on table and database metadata.
You can know a join existing based on foreign key constraints, but
you can’t know the type of join. Which is why we specify it.


Jason P.
removed_email_address@domain.invalid

“The computer allows you to make mistakes
faster than any other invention, with the
possible exception of handguns and tequila.”


#6

On Thu, Feb 09, 2006 at 08:31:36AM -0600, Jason P. wrote:
[…]

I believe the problem is technical, like not being able to guess
the correct relationships in some cases or something like that.

This is exactly the reason. It’s impossible to differentiate a one-to-
many join that currently only has one corresponding row in the ‘many’
table from a one-to-one join merely on table and database metadata.
You can know a join existing based on foreign key constraints, but
you can’t know the type of join. Which is why we specify it.

Although I don’t think rails enforces this, 1:1 joins should have the
foreign key reference field as part of the primary key.

It is possible to tell the join types apart:

1:n --> standard foreign key
1:1 --> foreign key is in primary key (or is only primary key)
n:n --> mapping table

It’s a bit more difficult to discern the “acts_as” extensions to the
basic relationship types, but even that should be obvious in some
cases (like if you have a self-referential key).

Still, the scaffolding isn’t perfect, so I wouldn’t see any reason to
expect automatic relationship inference to be. It should be able to be
good enough.


- Adam

** Expert Technical Project and Business Management
**** System Performance Analysis and Architecture
****** [ http://www.everylastounce.com ]

[ http://www.aquick.org/blog ] … Blog
[ http://www.adamfields.com/resume.html ]… Experience
[ http://www.flickr.com/photos/fields ] … Photos
[ http://www.aquicki.com/wiki ]…Wiki
[ http://del.icio.us/fields ] … Links


#7

On 2/9/06, Jason P. removed_email_address@domain.invalid wrote:

On Feb 9, 2006, at 8:27 AM, Xavier N. wrote:

On Feb 9, 2006, at 14:37, Peter De Berdt wrote:

On 09 Feb 2006, at 10:35, Xavier N. wrote:

Type of join? Why not just punt and say if you have a foreign key
defined on you, you belong to the other table, if you are referenced
by the other table, you have many. You will be right 99% of the time,
and if you’re not, you just need to s/many/one/ . This would be a
boon to people who are migrating to rails. I can still feel my
tendons from the time I spent typing all my relationships into models.
I would have loved to do

ruby script/generate model-guess-relationships-please modelname
tablename

especially since I also had to set_table_name and set_primary_key and
:foreign_key => foo for all of them.

  • Ian

#8

On Feb 9, 2006, at 08:42 AM, Adam F. wrote:

you can’t know the type of join. Which is why we specify it.

Although I don’t think rails enforces this, 1:1 joins should have the
foreign key reference field as part of the primary key.

That’s clever. The downside to that is that it makes the database
structure difficult to modify with that information now encoded in
the primary key of a table and I imagine that’s why the normal method
that I’ve seen to do a 1:1 is a foreign key in the second table -
with a unique constraint if your database supports them. If not, no
unique constraint and you encode that information in your application
framework.


Jason P.
removed_email_address@domain.invalid

“As democracy is perfected, the office of president represents,
more and more closely, the inner soul of the people. On some
great and glorious day the plain folks of the land will reach
their heart’s desire at last and the White House will be adorned
by a downright moron.” - H.L. Mencken (1880 - 1956)


#9

On Feb 9, 2006, at 6:42 AM, Adam F. wrote:

you can’t know the type of join. Which is why we specify it.

Although I don’t think rails enforces this, 1:1 joins should have the
foreign key reference field as part of the primary key.

It is possible to tell the join types apart:

1:n --> standard foreign key
1:1 --> foreign key is in primary key (or is only primary key)
n:n --> mapping table

1:1 could also simply be unique constraint (unique index) on the
foreign key field.


– Tom M.


#10

On Thu, Feb 09, 2006 at 08:34:59AM -0800, Tom M. wrote:

It is possible to tell the join types apart:

1:n --> standard foreign key
1:1 --> foreign key is in primary key (or is only primary key)
n:n --> mapping table

1:1 could also simply be unique constraint (unique index) on the
foreign key field.

For 1:1 mappings, I’ve always used the primary key field, because I
think of this as an “is_a” relationship rather than a “has_a”. In that
case, it makes sense to me to share the primary key between the two
tables. I use this mostly for polymorphic associations, and also
sometimes for things like nonflexible property sheets where it doesn’t
make sense to store the additional fields with the main table.

I suppose you might want to do it differently if your target table has
multiple 1:1 relationships with other tables, but I’ve never seen that
come up.

Just curious - can you give me an example of where you’d use this as a
non-primary key preferentially?


- Adam

** Expert Technical Project and Business Management
**** System Performance Analysis and Architecture
****** [ http://www.everylastounce.com ]

[ http://www.aquick.org/blog ] … Blog
[ http://www.adamfields.com/resume.html ]… Experience
[ http://www.flickr.com/photos/fields ] … Photos
[ http://www.aquicki.com/wiki ]…Wiki
[ http://del.icio.us/fields ] … Links


#11

On Feb 9, 2006, at 10:57 AM, Adam F. wrote:

Just curious - can you give me an example of where you’d use this as a
non-primary key preferentially?

User has a login.


Jason P.
removed_email_address@domain.invalid

“The computer allows you to make mistakes
faster than any other invention, with the
possible exception of handguns and tequila.”


#12

On Feb 9, 2006, at 8:57 AM, Adam F. wrote:

For 1:1 mappings, I’ve always used the primary key field, because I
Just curious - can you give me an example of where you’d use this as a
non-primary key preferentially?

I don’t see they’re different in any way, or how one is an advantage
over
the other, except perhaps for DB optimization.

Oracle, at one point, couldn’t use multi-key indexes unless the query
you
wrote only accessed leftmost members of multicolumn indexes. In that
case,
two separate indexes are more flexible than one. I’m not sure that this
limitation exists anymore…and I have no idea whatsoever whether
this is
an issue in PostgreSQL or (shudder) MySQL.

In fact, if your database has the same limitation, and you make a multi
column primary key of (id,foreign_id), joins using that foreign_id are
going to be very costly.


– Tom M.


#13

On Thu, Feb 09, 2006 at 09:21:07AM -0800, Tom M. wrote:
[…]

Your method of adding the foreign key into the primary key and my method
of placing a unique constraint on the foreign key (making it an
alternate
key) have entirely different semantics, and my method is more compatible
with the Rails definition of has_one than yours.

You’ve missed what I was saying. In most cases, the foreign key IS the
primary key, not added to the primary key.

There’s no (id, foreign_id), there’s just id, which is itself a
foreign key.

In Jason’s example of “User has a login”, I’d probably do it this way,
so that user.id = login.id for any given user/login.

The only drawback to that, mentioned previously, is that then login
can only be associated with user in that way. If you want to associate
login with something else (also in a 1:1 relationship), you need to do
it differently. But as I said - I don’t think I’ve seen a case in
practice where a subsidiary table is bound to more than one other
table with a 1:1 relationship.

  1. id is supposed to be unique in a table, period.
  2. foreign table has_two entries in table. :slight_smile:

Right. Using the primary key as a foreign key gets you that.

With my method, you’d get this:

insert into table (id,foreign_id) values (1,1)
insert into table (id,foreign_id) values (1,2) <- errors out (id is
not unique)
insert into table (id,foreign_id) values (2,1)
insert into table (id,foreign_id) values (2,2) <- errors out
(foreign_id is not unique)


- Adam

** Expert Technical Project and Business Management
**** System Performance Analysis and Architecture
****** [ http://www.everylastounce.com ]

[ http://www.aquick.org/blog ] … Blog
[ http://www.adamfields.com/resume.html ]… Experience
[ http://www.flickr.com/photos/fields ] … Photos
[ http://www.aquicki.com/wiki ]…Wiki
[ http://del.icio.us/fields ] … Links


#14

On Feb 9, 2006, at 9:56 AM, Adam F. wrote:

You’ve missed what I was saying. In most cases, the foreign key IS the
primary key, not added to the primary key.

Ah, got it, sorry for that!

Well, my gut is still the same, though for less technical reasons:

You’re method puts you off the rails. The foreign key is misnamed
with respect to Rails’ naming conventions.


– Tom M.


#15

On Feb 9, 2006, at 9:10 AM, Tom M. wrote:

wrote only accessed leftmost members of multicolumn indexes. In
going to be very costly.
Sorry to reply to my own post, but I just realized another HUGE
difference!

Your method of adding the foreign key into the primary key and my method
of placing a unique constraint on the foreign key (making it an
alternate
key) have entirely different semantics, and my method is more compatible
with the Rails definition of has_one than yours.

With your method, this would be allowed:

insert into table (id,foreign_id) values (1,1)
insert into table (id,foreign_id) values (1,2)
insert into table (id,foreign_id) values (2,1)
insert into table (id,foreign_id) values (2,2)

Which means that your method is doubly incompatible with Rails’ has_one
semantics:

  1. id is supposed to be unique in a table, period.
  2. foreign table has_two entries in table. :slight_smile:

With my method, you’d get this:

insert into table (id,foreign_id) values (1,1)
insert into table (id,foreign_id) values (1,2) <- errors out (id is
not unique)
insert into table (id,foreign_id) values (2,1)
insert into table (id,foreign_id) values (2,2) <- errors out
(foreign_id is not unique)


– Tom M.


#16

On Thu, Feb 09, 2006 at 10:04:28AM -0800, Tom M. wrote:

You’ve missed what I was saying. In most cases, the foreign key IS the
primary key, not added to the primary key.

Ah, got it, sorry for that!

Well, my gut is still the same, though for less technical reasons:

You’re method puts you off the rails. The foreign key is misnamed
with respect to Rails’ naming conventions.

Nope. Still called “id”.

There’s nothing that says a foreign key has to be called anything in
particular.

Yes, when you set up the relations in rails, you need to override the
default names, but that’s expected behavior. There’s nothing that says
that you have to stick to the default naming conventions, and there
are plenty of exceptions. Doing so doesn’t “put you off the rails”.


- Adam

** Expert Technical Project and Business Management
**** System Performance Analysis and Architecture
****** [ http://www.everylastounce.com ]

[ http://www.aquick.org/blog ] … Blog
[ http://www.adamfields.com/resume.html ]… Experience
[ http://www.flickr.com/photos/fields ] … Photos
[ http://www.aquicki.com/wiki ]…Wiki
[ http://del.icio.us/fields ] … Links


#17

On Feb 9, 2006, at 10:04 AM, Tom M. wrote:

You’re method puts you off the rails.

Must. Type. More. Slowly…

Your method puts you off the rails.


– Tom M.


#18

On Feb 9, 2006, at 11:56 AM, Adam F. wrote:

You’ve missed what I was saying. In most cases, the foreign key IS the
login with something else (also in a 1:1 relationship), you need to do
it differently. But as I said - I don’t think I’ve seen a case in
practice where a subsidiary table is bound to more than one other
table with a 1:1 relationship.

My concern is what happens when the business rule that one user is
only ever associated with one login becomes one user can have more
than one login.

  1. modifying has_one to has_many is the trivial solution;
  2. modifying a unique constraint on a column is nearly as trivial;
  3. the foreign key as a primary key is going to be the most difficult
    to content with - especially on a production database. I’ll readily
    admit that I have no idea if migrations would help you with this or not.


Jason P.
removed_email_address@domain.invalid

“The computer allows you to make mistakes
faster than any other invention, with the
possible exception of handguns and tequila.”


#19

On Thu, Feb 09, 2006 at 12:15:04PM -0600, Jason P. wrote:

My concern is what happens when the business rule that one user is
only ever associated with one login becomes one user can have more
than one login.

  1. modifying has_one to has_many is the trivial solution;
  2. modifying a unique constraint on a column is nearly as trivial;
  3. the foreign key as a primary key is going to be the most difficult
    to content with - especially on a production database. I’ll readily
    admit that I have no idea if migrations would help you with this or not.

To be honest, I stay away from 1:1 relationships most of the time
anyway, for exactly this reason. If there’s any chance that it will
become a 1:n relationship at some point in the future, you’re better
off modeling it that way up front. True 1:1 relationships are for
something very specific that doesn’t come up very often.

Other than that, this isn’t substantially more difficult than any
other schema change.


- Adam

** Expert Technical Project and Business Management
**** System Performance Analysis and Architecture
****** [ http://www.everylastounce.com ]

[ http://www.aquick.org/blog ] … Blog
[ http://www.adamfields.com/resume.html ]… Experience
[ http://www.flickr.com/photos/fields ] … Photos
[ http://www.aquicki.com/wiki ]…Wiki
[ http://del.icio.us/fields ] … Links


#20

Hi …

You wrote:

I believe the problem is technical, like not being able to guess the
correct relationships in some cases or something like that.

My take is that it is hard to do correctly, and so it is better not to
do it at all and leave it as an exercise for the reader :wink:

Consider how the process of data modeling works: we start with a
conceptual model, then we move to a physical model, then we go to an
implementation. As we move through the chain, our thinking turns more
to practical matters (performance, security, DB specific matters (the
use of stored procedures, etc.)). In the end our implementation has a
lot more stuff than the conceptual model.

So, what we are asking AR, or any other ORM, to do is unwind that
process. It is hard. If the mapping through the model chain remained
consistent, and we could assume a standard SQL, say SQL-92, and we
assumed that the designer worked to 3rd normal form, then we may have
some hope of being accurate.

If they didn’t, then the model generated by an ORM will require work.
Perhaps it could get us 80% of the way, maybe not. It would really
depend on how clever the reverse engineering was.

One way that I have found that can help is to use a good DB modeling
tool, such as Data Architect from Sybase, to reverse engineer databases.
Do that a few times and you get a grasp on why it is so tricky.

My $0.02, the authoritative will have to come from someone else :slight_smile:

Regards,

-mark.