Forum: Ruby on Rails Confusion with expressing many to many relationship

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
Af1ee2208795752559b049502ded660c?d=identicon&s=25 Andrew Cowan (k2mia)
on 2006-04-24 19:24
Hi folks, I am in the process of converting an existing non-rails
application to rails, and am not sure what the best approach would be
for specifying the relationship between the two sets of data.

I'll describe the existing table structures first.

Table 1: Urls
Each url has a unique id and two lists of Phrases, these are currently
setup as varchars with "," delimitation - so each url has many Phrases.

Table 2: Phrases
Each phrase has a unique id and two lists of urls, each url list is
setup as a varchar containing a "," delimited list of urls ids - so each
phrase also has many urls.

In trying to convert this to rails and improve the table structure I
find myself confused with the way rails business model allows table
relationships to be specified.

I'd be very grateful if someone more experienced could explain how they
might define these tables and the relationsip they share.

Thanks,
Andy
9f0f89bbd9e1ecfbaab6584e429b7a2f?d=identicon&s=25 Josh Susser (jsusser)
on 2006-04-24 21:49
Andrew Cowan wrote:
> Table 1: Urls
> Each url has a unique id and two lists of Phrases, these are currently
> setup as varchars with "," delimitation - so each url has many Phrases.
>
> Table 2: Phrases
> Each phrase has a unique id and two lists of urls, each url list is
> setup as a varchar containing a "," delimited list of urls ids - so each
> phrase also has many urls.
>
> In trying to convert this to rails and improve the table structure I
> find myself confused with the way rails business model allows table
> relationships to be specified.

Wow, that sounds gross. Packing foreign keys into a varchar list of ids
means you lose all the power of the database to create relations between
records or do queries with joins. You get more leverage from using
something like a join table to let the database manage those relations
for you. There are two ways to do what you want, has_and_belong_to_many
(join table) or has_many :through (join model). If you go with habtm
you'll need two join tables, one for each type of relationship. If you
use has_many :through, you can use one join model with an attribute that
indicates the type of the relationship. You can find a lot more
information about doing that on my blog.

--
Josh Susser
http://blog.hasmanythrough.com
Dcc2d3a8038cd47219af0eebe0a8a78e?d=identicon&s=25 Carl Fyffe (Guest)
on 2006-04-24 22:11
(Received via mailing list)
Can you explain why there are two lists of phrases and two lists of
urls? Is that because they are split into types?
Af1ee2208795752559b049502ded660c?d=identicon&s=25 Andrew Cowan (k2mia)
on 2006-04-24 23:01
Carl Fyffe wrote:
> Can you explain why there are two lists of phrases and two lists of
> urls? Is that because they are split into types?

Yes, one list is for _normal_ phrases and one is for relevant phrases.
Each url has a list of normal-related phrases and a list of relevant
phrases. Each phrase in turn has a list going backwards to the urls it
is normally-related to and the list of urls it is relevant to.

Its a very redundant schema, and years old, at the time it seemed like a
good way to go since it made lookups easy in either direction. But
moving to Rails it seems a horrible approach. ;)

Thanks
-Andy
Dcc2d3a8038cd47219af0eebe0a8a78e?d=identicon&s=25 Carl Fyffe (Guest)
on 2006-04-24 23:53
(Received via mailing list)
The approach would have been very suitable for a flat file database.
Flat file databases have their place and can be useful, imdb was flat
file for a very long time, but you want to move it to Rails so you
need to put it into a more logical (normalized) schema now...

You basically need to add a third table to the schema that will link
the phrases and urls. You should make it a full blown model and take
advantage of the "has_many :through", which Josh was talking about.
You want a full blown model because you need to apply a "type" field
that describes the relationship (normal phrase vs relevant phrase).
You could use HABTM but that is, in my opinion, a little more
confusing to implement.

Hope that helps!
Af1ee2208795752559b049502ded660c?d=identicon&s=25 Andrew Cowan (k2mia)
on 2006-04-25 05:39
Thank you both for the suggestions and for the pointer to your blog!

Yep, my approach was and is still gross, but I must admit I am concerned
with the size that the join table will take, as currently the urls table
has about 7000 records and the phrases table has about 87000 records...

So, being a newbie with the has_many :through relationship I'd like to
make sure I understand before commiting to its implementation. A quick
rundown to make sure I am getting what I need to do, please tear this
apart if it becomes apparent that I missed the boat -- used the example
from hasmanythrough.com.


Table 1: urls - has id and url
Table 2: phrases - has id and phrase
Table 3: relations - has id, url_id, phrase_id, isrelevant (boolean)

Ignore isrelevant for now as I can add class methods to weed out the
relationship during overloaded finds, am mostly concerned in the url <->
phrases relationship.

class Relation < ActiveRecord::Base
   belongs_to :url,    :foreign_key => "url_id",    :class_name => "Url"
   belongs_to :phrase, :foreign_key => "phrase_id", :class_name =>
"Phrase"
end

class Url < ActiveRecord::Base
   has_many :related_as_phrase, :foreign_key => 'phrase_id',
            :class_name => 'Relation'
   has_many :phrases,  :through => :related_as_phrase
end

class Phrase < ActiveRecord::Base
   has_many :related_as_url, :foreign_key => 'url_id',
            :class_name => 'Relation'
   has_many :urls,  :through => :related_as_urls
end


Do the above model rules look right for this? Anything I should be aware
of?

Thanks again, am much appeciated that this was pointed out to me!

-Andy



> Wow, that sounds gross. Packing foreign keys into a varchar list of ids
> means you lose all the power of the database to create relations between
> records or do queries with joins. You get more leverage from using
> something like a join table to let the database manage those relations
> for you. There are two ways to do what you want, has_and_belong_to_many
> (join table) or has_many :through (join model). If you go with habtm
> you'll need two join tables, one for each type of relationship. If you
> use has_many :through, you can use one join model with an attribute that
> indicates the type of the relationship. You can find a lot more
> information about doing that on my blog.
>
> --
> Josh Susser
> http://blog.hasmanythrough.com
9f0f89bbd9e1ecfbaab6584e429b7a2f?d=identicon&s=25 Josh Susser (jsusser)
on 2006-04-25 06:30
Andrew Cowan wrote:
> Thank you both for the suggestions and for the pointer to your blog!

No prob. That's what it's there for.

> Yep, my approach was and is still gross, but I must admit I am concerned
> with the size that the join table will take, as currently the urls table
> has about 7000 records and the phrases table has about 87000 records...

Don't worry about the size of the join table. Databases are built just
for doing this sort of thing, and unless your table is many millions of
rows long you don't need to be concerned.

> Table 1: urls - has id and url
> Table 2: phrases - has id and phrase
> Table 3: relations - has id, url_id, phrase_id, isrelevant (boolean)
>
> Ignore isrelevant for now as I can add class methods to weed out the
> relationship during overloaded finds, am mostly concerned in the url <->
> phrases relationship.

Your tables look fine, but you go a bit overboard in the model classes.
You'll have much more fun in Rails if you go with the flow, meaning just
follow the conventions and let Rails use the defaults unless you
actually need to specify something different. It looks like you got
confused looking at the polymorphic examples, which is serious overkill
for what you want. Just go with the standard has_many :through, like so:

  class Relation < ActiveRecord::Base
     belongs_to :url
     belongs_to :phrase
  end

  class Url < ActiveRecord::Base
     has_many :relations
     has_many :phrases, :through => :relations
  end

  class Phrase < ActiveRecord::Base
     has_many :relations
     has_many :urls,    :through => :relations
  end

One more thing - the model name "relation" is probably too generic.
Having a bad name isn't going to break anything, but having a meaningful
name can help you think more clearly about your objects. "Magazine
has_many :readers, :through => :subscritions" is a lot more meaningful
than "Magazine has_many :people, :through => :relations". It's *almost*
possible for Ruby code to be self-documenting if you give things good
names. Just something to think about.

--
Josh Susser
http://blog.hasmanythrough.com
Af1ee2208795752559b049502ded660c?d=identicon&s=25 Andrew Cowan (k2mia)
on 2006-04-25 06:51
Ahhhh! I like your suggested model changes much better, the node/edges
example on your site put some doubt in my mind that defining the models
would be as natural as the revised models end up being.

Thank you again for taking the time to help clear this up, it is much
appreciated! :)

-Andy




Josh Susser wrote:
> Andrew Cowan wrote:
>> Thank you both for the suggestions and for the pointer to your blog!
>
> No prob. That's what it's there for.
>
>> Yep, my approach was and is still gross, but I must admit I am concerned
>> with the size that the join table will take, as currently the urls table
>> has about 7000 records and the phrases table has about 87000 records...
>
> Don't worry about the size of the join table. Databases are built just
> for doing this sort of thing, and unless your table is many millions of
> rows long you don't need to be concerned.
>
>> Table 1: urls - has id and url
>> Table 2: phrases - has id and phrase
>> Table 3: relations - has id, url_id, phrase_id, isrelevant (boolean)
>>
>> Ignore isrelevant for now as I can add class methods to weed out the
>> relationship during overloaded finds, am mostly concerned in the url <->
>> phrases relationship.
>
> Your tables look fine, but you go a bit overboard in the model classes.
> You'll have much more fun in Rails if you go with the flow, meaning just
> follow the conventions and let Rails use the defaults unless you
> actually need to specify something different. It looks like you got
> confused looking at the polymorphic examples, which is serious overkill
> for what you want. Just go with the standard has_many :through, like so:
>
>   class Relation < ActiveRecord::Base
>      belongs_to :url
>      belongs_to :phrase
>   end
>
>   class Url < ActiveRecord::Base
>      has_many :relations
>      has_many :phrases, :through => :relations
>   end
>
>   class Phrase < ActiveRecord::Base
>      has_many :relations
>      has_many :urls,    :through => :relations
>   end
>
> One more thing - the model name "relation" is probably too generic.
> Having a bad name isn't going to break anything, but having a meaningful
> name can help you think more clearly about your objects. "Magazine
> has_many :readers, :through => :subscritions" is a lot more meaningful
> than "Magazine has_many :people, :through => :relations". It's *almost*
> possible for Ruby code to be self-documenting if you give things good
> names. Just something to think about.
>
> --
> Josh Susser
> http://blog.hasmanythrough.com
Af1ee2208795752559b049502ded660c?d=identicon&s=25 Andrew Cowan (k2mia)
on 2006-04-25 19:57
Josh, I didn't get to play with this until this morning as my server was
getting a hardware upgrade last night -- so far this is working out
great as far as being able to access url.phrases and phrase.urls ...

I fully understand your meaning with renaming the Relation model, but
after much thought I really cannot find anything more approriate so for
the time being am leaving that.

One thing I am wondering is, how am I able to access the specific
relation that a url and phrase share?

Example:

given a list of single url @url
@url.phrases.each do |ph|
   # How can I access the relation that @url and ph share through the
   # Relation model here ?
end

I tried ph.relation on a whim and of course found it a non-existent
member.

I'll keep poking at it but so far am not sure if I need to modify the
model or if something else is needed...

Thanks!
Andy





>
>   class Relation < ActiveRecord::Base
>      belongs_to :url
>      belongs_to :phrase
>   end
>
>   class Url < ActiveRecord::Base
>      has_many :relations
>      has_many :phrases, :through => :relations
>   end
>
>   class Phrase < ActiveRecord::Base
>      has_many :relations
>      has_many :urls,    :through => :relations
>   end
>
Af1ee2208795752559b049502ded660c?d=identicon&s=25 Andrew Cowan (k2mia)
on 2006-04-25 22:24

Actually, now that I have gotten to play with this some more I see that
there is a very real performance hit, so much that this won't be usable
as-is. Not sure if this is the way has_many :through should work but
I'll desribe what I am seeing.

Some prelim info:

The box is only about a year or so old, has dual xeon 3.06 HT
processors, 4G ram and dual raid-1 10kMhz scsi drives - it has handled
everything else I have thrown at it without a whimper.

Table 1: urls - has roughly 7,000 records
Table 2: phrases - has roughly 87,000 records
Table 3: relations - has 235,000 or so records, each one having 1 url
and 1 phrase from their respective DBs

Given a single url object, Merely doing a simple:

@phrases = url.phrases

I am watching 87,000 SQL queries fly by in my lighttpd output, similar
to this:

Relation Load (0.049132)   SELECT * FROM relations WHERE (url_id=7163
AND phrase_id=77686) LIMIT 1

one for each phrase_id covering every phrase_id in the phrases table.

Ok, that in itself is an application killer because the above operation
would be one of many for a given page display, and just that is showing
noticable lag before the page is rendered.

So, my hunch tells me something is not right? Are these tables just too
big to be using this technique or is something else going on?

-Andy



Andrew Cowan wrote:
> Josh, I didn't get to play with this until this morning as my server was
> getting a hardware upgrade last night -- so far this is working out
> great as far as being able to access url.phrases and phrase.urls ...
>
> I fully understand your meaning with renaming the Relation model, but
> after much thought I really cannot find anything more approriate so for
> the time being am leaving that.
>
> One thing I am wondering is, how am I able to access the specific
> relation that a url and phrase share?
>
> Example:
>
> given a list of single url @url
> @url.phrases.each do |ph|
>    # How can I access the relation that @url and ph share through the
>    # Relation model here ?
> end
>
> I tried ph.relation on a whim and of course found it a non-existent
> member.
>
> I'll keep poking at it but so far am not sure if I need to modify the
> model or if something else is needed...
>
> Thanks!
> Andy
>
This topic is locked and can not be replied to.