Confusion with expressing many to many relationship


#1

Hi folks, I am in the process of converting an existing non-rails
application to rails, and am not sure what the best approach would be
for specifying the relationship between the two sets of data.

I’ll describe the existing table structures first.

Table 1: Urls
Each url has a unique id and two lists of Phrases, these are currently
setup as varchars with “,” delimitation - so each url has many Phrases.

Table 2: Phrases
Each phrase has a unique id and two lists of urls, each url list is
setup as a varchar containing a “,” delimited list of urls ids - so each
phrase also has many urls.

In trying to convert this to rails and improve the table structure I
find myself confused with the way rails business model allows table
relationships to be specified.

I’d be very grateful if someone more experienced could explain how they
might define these tables and the relationsip they share.

Thanks,
Andy


#2

Andrew C. wrote:

Table 1: Urls
Each url has a unique id and two lists of Phrases, these are currently
setup as varchars with “,” delimitation - so each url has many Phrases.

Table 2: Phrases
Each phrase has a unique id and two lists of urls, each url list is
setup as a varchar containing a “,” delimited list of urls ids - so each
phrase also has many urls.

In trying to convert this to rails and improve the table structure I
find myself confused with the way rails business model allows table
relationships to be specified.

Wow, that sounds gross. Packing foreign keys into a varchar list of ids
means you lose all the power of the database to create relations between
records or do queries with joins. You get more leverage from using
something like a join table to let the database manage those relations
for you. There are two ways to do what you want, has_and_belong_to_many
(join table) or has_many :through (join model). If you go with habtm
you’ll need two join tables, one for each type of relationship. If you
use has_many :through, you can use one join model with an attribute that
indicates the type of the relationship. You can find a lot more
information about doing that on my blog.


Josh S.
http://blog.hasmanythrough.com


#3

Can you explain why there are two lists of phrases and two lists of
urls? Is that because they are split into types?


#4

Carl F. wrote:

Can you explain why there are two lists of phrases and two lists of
urls? Is that because they are split into types?

Yes, one list is for normal phrases and one is for relevant phrases.
Each url has a list of normal-related phrases and a list of relevant
phrases. Each phrase in turn has a list going backwards to the urls it
is normally-related to and the list of urls it is relevant to.

Its a very redundant schema, and years old, at the time it seemed like a
good way to go since it made lookups easy in either direction. But
moving to Rails it seems a horrible approach. :wink:

Thanks
-Andy


#5

The approach would have been very suitable for a flat file database.
Flat file databases have their place and can be useful, imdb was flat
file for a very long time, but you want to move it to Rails so you
need to put it into a more logical (normalized) schema now…

You basically need to add a third table to the schema that will link
the phrases and urls. You should make it a full blown model and take
advantage of the “has_many :through”, which Josh was talking about.
You want a full blown model because you need to apply a “type” field
that describes the relationship (normal phrase vs relevant phrase).
You could use HABTM but that is, in my opinion, a little more
confusing to implement.

Hope that helps!


#6

Thank you both for the suggestions and for the pointer to your blog!

Yep, my approach was and is still gross, but I must admit I am concerned
with the size that the join table will take, as currently the urls table
has about 7000 records and the phrases table has about 87000 records…

So, being a newbie with the has_many :through relationship I’d like to
make sure I understand before commiting to its implementation. A quick
rundown to make sure I am getting what I need to do, please tear this
apart if it becomes apparent that I missed the boat – used the example
from hasmanythrough.com.

Table 1: urls - has id and url
Table 2: phrases - has id and phrase
Table 3: relations - has id, url_id, phrase_id, isrelevant (boolean)

Ignore isrelevant for now as I can add class methods to weed out the
relationship during overloaded finds, am mostly concerned in the url <->
phrases relationship.

class Relation < ActiveRecord::Base
belongs_to :url, :foreign_key => “url_id”, :class_name => “Url”
belongs_to :phrase, :foreign_key => “phrase_id”, :class_name =>
“Phrase”
end

class Url < ActiveRecord::Base
has_many :related_as_phrase, :foreign_key => ‘phrase_id’,
:class_name => ‘Relation’
has_many :phrases, :through => :related_as_phrase
end

class Phrase < ActiveRecord::Base
has_many :related_as_url, :foreign_key => ‘url_id’,
:class_name => ‘Relation’
has_many :urls, :through => :related_as_urls
end

Do the above model rules look right for this? Anything I should be aware
of?

Thanks again, am much appeciated that this was pointed out to me!

-Andy

Wow, that sounds gross. Packing foreign keys into a varchar list of ids
means you lose all the power of the database to create relations between
records or do queries with joins. You get more leverage from using
something like a join table to let the database manage those relations
for you. There are two ways to do what you want, has_and_belong_to_many
(join table) or has_many :through (join model). If you go with habtm
you’ll need two join tables, one for each type of relationship. If you
use has_many :through, you can use one join model with an attribute that
indicates the type of the relationship. You can find a lot more
information about doing that on my blog.


Josh S.
http://blog.hasmanythrough.com


#7

Ahhhh! I like your suggested model changes much better, the node/edges
example on your site put some doubt in my mind that defining the models
would be as natural as the revised models end up being.

Thank you again for taking the time to help clear this up, it is much
appreciated! :slight_smile:

-Andy

Josh S. wrote:

Andrew C. wrote:

Thank you both for the suggestions and for the pointer to your blog!

No prob. That’s what it’s there for.

Yep, my approach was and is still gross, but I must admit I am concerned
with the size that the join table will take, as currently the urls table
has about 7000 records and the phrases table has about 87000 records…

Don’t worry about the size of the join table. Databases are built just
for doing this sort of thing, and unless your table is many millions of
rows long you don’t need to be concerned.

Table 1: urls - has id and url
Table 2: phrases - has id and phrase
Table 3: relations - has id, url_id, phrase_id, isrelevant (boolean)

Ignore isrelevant for now as I can add class methods to weed out the
relationship during overloaded finds, am mostly concerned in the url <->
phrases relationship.

Your tables look fine, but you go a bit overboard in the model classes.
You’ll have much more fun in Rails if you go with the flow, meaning just
follow the conventions and let Rails use the defaults unless you
actually need to specify something different. It looks like you got
confused looking at the polymorphic examples, which is serious overkill
for what you want. Just go with the standard has_many :through, like so:

class Relation < ActiveRecord::Base
belongs_to :url
belongs_to :phrase
end

class Url < ActiveRecord::Base
has_many :relations
has_many :phrases, :through => :relations
end

class Phrase < ActiveRecord::Base
has_many :relations
has_many :urls, :through => :relations
end

One more thing - the model name “relation” is probably too generic.
Having a bad name isn’t going to break anything, but having a meaningful
name can help you think more clearly about your objects. “Magazine
has_many :readers, :through => :subscritions” is a lot more meaningful
than “Magazine has_many :people, :through => :relations”. It’s almost
possible for Ruby code to be self-documenting if you give things good
names. Just something to think about.


Josh S.
http://blog.hasmanythrough.com


#8

Josh, I didn’t get to play with this until this morning as my server was
getting a hardware upgrade last night – so far this is working out
great as far as being able to access url.phrases and phrase.urls …

I fully understand your meaning with renaming the Relation model, but
after much thought I really cannot find anything more approriate so for
the time being am leaving that.

One thing I am wondering is, how am I able to access the specific
relation that a url and phrase share?

Example:

given a list of single url @url
@url.phrases.each do |ph|

How can I access the relation that @url and ph share through the

Relation model here ?

end

I tried ph.relation on a whim and of course found it a non-existent
member.

I’ll keep poking at it but so far am not sure if I need to modify the
model or if something else is needed…

Thanks!
Andy

class Relation < ActiveRecord::Base
belongs_to :url
belongs_to :phrase
end

class Url < ActiveRecord::Base
has_many :relations
has_many :phrases, :through => :relations
end

class Phrase < ActiveRecord::Base
has_many :relations
has_many :urls, :through => :relations
end


#9

Andrew C. wrote:

Thank you both for the suggestions and for the pointer to your blog!

No prob. That’s what it’s there for.

Yep, my approach was and is still gross, but I must admit I am concerned
with the size that the join table will take, as currently the urls table
has about 7000 records and the phrases table has about 87000 records…

Don’t worry about the size of the join table. Databases are built just
for doing this sort of thing, and unless your table is many millions of
rows long you don’t need to be concerned.

Table 1: urls - has id and url
Table 2: phrases - has id and phrase
Table 3: relations - has id, url_id, phrase_id, isrelevant (boolean)

Ignore isrelevant for now as I can add class methods to weed out the
relationship during overloaded finds, am mostly concerned in the url <->
phrases relationship.

Your tables look fine, but you go a bit overboard in the model classes.
You’ll have much more fun in Rails if you go with the flow, meaning just
follow the conventions and let Rails use the defaults unless you
actually need to specify something different. It looks like you got
confused looking at the polymorphic examples, which is serious overkill
for what you want. Just go with the standard has_many :through, like so:

class Relation < ActiveRecord::Base
belongs_to :url
belongs_to :phrase
end

class Url < ActiveRecord::Base
has_many :relations
has_many :phrases, :through => :relations
end

class Phrase < ActiveRecord::Base
has_many :relations
has_many :urls, :through => :relations
end

One more thing - the model name “relation” is probably too generic.
Having a bad name isn’t going to break anything, but having a meaningful
name can help you think more clearly about your objects. “Magazine
has_many :readers, :through => :subscritions” is a lot more meaningful
than “Magazine has_many :people, :through => :relations”. It’s almost
possible for Ruby code to be self-documenting if you give things good
names. Just something to think about.


Josh S.
http://blog.hasmanythrough.com


#10

Actually, now that I have gotten to play with this some more I see that
there is a very real performance hit, so much that this won’t be usable
as-is. Not sure if this is the way has_many :through should work but
I’ll desribe what I am seeing.

Some prelim info:

The box is only about a year or so old, has dual xeon 3.06 HT
processors, 4G ram and dual raid-1 10kMhz scsi drives - it has handled
everything else I have thrown at it without a whimper.

Table 1: urls - has roughly 7,000 records
Table 2: phrases - has roughly 87,000 records
Table 3: relations - has 235,000 or so records, each one having 1 url
and 1 phrase from their respective DBs

Given a single url object, Merely doing a simple:

@phrases = url.phrases

I am watching 87,000 SQL queries fly by in my lighttpd output, similar
to this:

Relation Load (0.049132) SELECT * FROM relations WHERE (url_id=7163
AND phrase_id=77686) LIMIT 1

one for each phrase_id covering every phrase_id in the phrases table.

Ok, that in itself is an application killer because the above operation
would be one of many for a given page display, and just that is showing
noticable lag before the page is rendered.

So, my hunch tells me something is not right? Are these tables just too
big to be using this technique or is something else going on?

-Andy

Andrew C. wrote:

Josh, I didn’t get to play with this until this morning as my server was
getting a hardware upgrade last night – so far this is working out
great as far as being able to access url.phrases and phrase.urls …

I fully understand your meaning with renaming the Relation model, but
after much thought I really cannot find anything more approriate so for
the time being am leaving that.

One thing I am wondering is, how am I able to access the specific
relation that a url and phrase share?

Example:

given a list of single url @url
@url.phrases.each do |ph|

How can I access the relation that @url and ph share through the

Relation model here ?

end

I tried ph.relation on a whim and of course found it a non-existent
member.

I’ll keep poking at it but so far am not sure if I need to modify the
model or if something else is needed…

Thanks!
Andy