Ruby Forum Rails-core (closed, excessive spam) > Nested has_many :through patch

Posted by Bernardo de Pádua (Guest)
on 13.07.2007 09:36
(Received via mailing list)
Hi to all,

This is my first post here, I would first like to salute you all and
praise the great work you guys are doing on RoR.

There is a patch (http://dev.rubyonrails.org/ticket/6461) sitting for
quite a while on the Trac for nested has_many :through associations. I
think it is a must and I have seen it requested quite a few times
(http://groups.google.com/group/rubyonrails-talk/browse_thread/thread/
4517fb8bf464921a/6f4c544fb948b9d1  and
http://groups.google.com/group/rubyonrails-talk/browse_thread/thread/1ee28a00febcbc42/b34f05aae9e01688
).

The patch seems to be all sorted out for more than 2 months (all tests
working). It was even made into a plugin by the author. Is there a
reason it was not applied yet? I didn't find anything about it in this
list or elsewhere.

I think it greatly simplifies apps that have complex (=large number)
models. To make a case for it, I will show a simple 'posts and
comments' example where it would be very useful (I am just writing the
essential).

class CommentRating < AR:Base
  belongs_to :comment
  belongs_to :user
end

class Comment < AR::Base
  has_many :comment_ratings
  belongs_to :user
end

class Post < AR::Base
  has_many :comments
  belongs_to :user
end

class User < AR::Base
  has_many :posts
  has_many :comments, :through=>:posts
  has_many :comment_ratings, :through=>:comments
end

If I wanted to allow a user to review only the ratings given to the
comments from his posts, I could use the contextualized finder:
current_user.comment_ratings.find(params[:id])

I know this example seems a little "forced", but as your modelspace
gets larger, without this kind of nesting, you find yourself having to
write more and more custom finders, with joins, scopes, etc, for
things that could be handled in a much simpler way.

Cheers,

Bernardo (from Brazil)
Posted by Steven Bristol (stevenbristol)
on 13.07.2007 13:57
(Received via mailing list)
>
>
>
> I think it greatly simplifies apps




+1

I just started using the plugin last night, so I can't attest to it's
stability yet, but the functionality seems like a good fit for core, 
IMHO.

--
Thank you,
Steven A Bristol
Posted by Pratik Naik (pratik)
on 13.07.2007 14:00
(Received via mailing list)
±0

Nested include could be quite costy at database level. Something like
this would need some more benchmarking.

On 7/13/07, Steven A Bristol <stevenbristol@gmail.com> wrote:
>
> --
> Thank you,
> Steven A Bristol
>
>
>  >
>


--
http://m.onkey.org
Posted by Steven Bristol (stevenbristol)
on 13.07.2007 14:26
(Received via mailing list)
While I agree that any additional trips to the database or additional 
joins
to a query are more costly than not doing them, and I would love to see 
some
benchmarking here and get any performance gains possible, I'm not sure I
agree with you implication that this is a reason for exclusion.

As I understand it, the purpose of the plugin is not to increase
performance, but rather to make it easier and cleaner to do things we 
are
already doing.

From Bernardo's example:

> class Post < AR::Base
>   has_many :comments
>   belongs_to :user
> end
>
> class User < AR::Base
>   has_many :posts
>   has_many :comments, :through=>:posts
>   has_many :comment_ratings, :through=>:comments
> end
>

With out the plugin one would have to do something like:


class User <AR::Base
  has_many :posts
  has_many :comments, :through=>:posts

  def comments_rating #or use find_by_sql
    ratings = []
    comments.each{ |c| ratings << c.comment_ratings}
    ratings.flatten
  end
end


And that is just to get a finder. One would have reimplement every hm 
method
that they need to use.

So I think the performance would be similar to the methods already being
used, but much much cleaner.


My use case is Business -> BusinessOwner -> User -> Contacts (where
BusinessOwner is a special kind of BusinessMember) (I do have a good
business rule reason why the contact is owned by the business owner and 
not
the business).

Here is the code:
 Business.find(5).contacts

Here is the query produced by the plugin:
 SELECT contacts.* FROM contacts INNER JOIN users ON (contacts.owner_id 
=
users.id) INNER JOIN business_members ON (users.id =
business_members.user_id) WHERE (business_members.business_id = 5 AND 
type =
'BusinessOwner')

Which I think is a much better query than the comment_ratings I wrote 
above
would use.


Thanks,
steven a bristol
Posted by Josh Peek (Guest)
on 13.07.2007 17:01
(Received via mailing list)
On Jul 13, 6:59 am, Pratik <pratikn...@gmail.com> wrote:
> ±0
>
> Nested include could be quite costy at database level. Something like
> this would need some more benchmarking.

-1

I agree with Pratik
Posted by Joshua Sierles (Guest)
on 13.07.2007 18:34
(Received via mailing list)
+1

Rails has usually been about clean queries over performant
ones. :include came way after associations, for example, but :include
solved a real problem visible in real applications.

No benchmarking makes sense outside the context of a real
application, and this patch would benefit real applcations which can
then determine
whether there is a painful spot that needs work.

In other words this would help most people most of the time who have
run into this rather common situation and have solved it in ways that
would be equally performant without
making specific optimizations like splitting into multiple queries.

Josh
Posted by Gabe da Silveira (Guest)
on 13.07.2007 19:01
(Received via mailing list)
+1 based on a cursory glance

The only reason to reject this would be that the algorithm is flawed.
The fact that it produces a complex query should not be an issue.  By
and large you WANT these kinds of things to be taken care of by the
database where indexes and other optimizations are more efficiently
brought to bear.  If you need this kind of logic, there's no getting
around the irreducible complexity.  Let's have ActiveRecord do its
best for the general case, and let application writers take care of
the hacky optimizations for individual cases.
Posted by Pratik Naik (pratik)
on 13.07.2007 19:04
(Received via mailing list)
When a new feature is implemented in rails, people will tend to use
it, believing it's the rails way to do things, without digging much
about the internals. Which is the correct way in most of the cases.
Itroducing a feature that could possibly be a bottleneck, needs a bit
of more research.

> In other words this would help most people most of the time who have
> run into this rather common situation and have solved it in ways that
> would be equally performant without
> making specific optimizations like splitting into multiple queries.
>

That makes it a plugin material imho.

Also in the given example above, this *might* already work :

class CommentRating < AR:Base
 belongs_to :comment
 belongs_to :user
end

class Comment < AR::Base
 has_many :comment_ratings
 belongs_to :user
end

class Post < AR::Base
 has_many :comments, :include => :comment_ratings
 belongs_to :user
end

class User < AR::Base
 has_many :posts
 has_many :comments, :through=>:posts
end

Needs to be verified though.

-Pratik
Posted by Josh Peek (Guest)
on 13.07.2007 23:30
(Received via mailing list)
On Jul 13, 11:32 am, Joshua Sierles <jsier...@gmail.com> wrote:
> this rather common situation

LOL

On Jul 13, 12:03 pm, Pratik <pratikn...@gmail.com> wrote:
> That makes it a plugin material imho.

Yeah, lets see this thing working and a little more optimized before
we include it into Core.

To me its just seems like a case of a weak domain and poor modeling.
Posted by Bernardo de Pádua (Guest)
on 14.07.2007 11:14
(Received via mailing list)
I (obviously) agree with Steven, Joshua and Gabe, not only is this
common (in any large app), but I don't see any performance issue with
it (on the contrary). I also think this kind of nesting is the logical
thing to support, as I've seen many users think (myself included) that
it was already there in Rails, try it and be surprised with the errors
it caused. If the Rails way was wanting the user to do custom
"optimized" queries all the time, perhaps it shouldn't even have
associations or finders.

In any reasonably large DB with more than 50 entities it is impossible
to model things in a way that you will never need more than 2 levels
of "INNER JOINING" to query things. So, I agree that the examples
given here may be a case of "bad modeling" (perhaps not bad, but that
could be handled in some other way) if considered separately. My real
app that needs this is way more complex than the example given, I just
made this example up to try to explain it faster. By supporting HMT
nesting Rails will not stimulate (sane) users to make their DB
unnecessarily more complex, but will help (a lot) developer of larger
apps, who can't avoid complexity.

I coded my example ('posts and comments') and run some benchmarks on
it. I implemented some finders in the user class, in 5 different ways.
The last one is just using the nested HMT plugin from the patch (I
needlessly put it in a method as well, just to say I wasn't being
unfair in the benchmark).

 def ratings_ruby_way(*arguments_to_find)
    ratings  = comments.find(:all).inject([]) do |arr, c|
      arr << c.comment_ratings.find(*arguments_to_find)
    end
    return ratings.flatten
  end

  def ratings_include_ruby_way(*arguments_to_find)
    ratings =
comments.find(:all, :include=>:comment_ratings).inject([])do |arr, c|
      arr << c.comment_ratings.find(*arguments_to_find)
    end
    return ratings.flatten
  end

  def ratings_find_all_include_ruby_way
    ratings =
comments.find(:all, :include=>:comment_ratings).inject([])do |arr, c|
      arr << c.comment_ratings
    end
    return ratings.flatten
  end

  def ratings_find_all_by_sql_way
    CommentRating.find_by_sql(
      [%q{
        Select comment_ratings.* FROM comment_ratings
          INNER JOIN comments ON
            comments.id = comment_ratings.comment_id
          INNER JOIN posts ON
            posts.id = comments.post_id
          INNER JOIN users ON
            users.id = posts.user_id
          WHERE
              users.id = ?},
      self.id] )
  end

  def ratings_nested_way(*arguments_to_find)
    ratings = comment_ratings.find(*arguments_to_find)
    return ratings
  end

I've filled the tables with some test data (about 700 rows each, 5000
comment_ratings). In the benchmark each method is run 500 times,
passing ':all' when appropriate, on a set of 500 random users (the
same set for each method, on the same run).

Results for PostgreSQL 8.2 (using pg dlls based adapter):
>> load 'lib/benchmark_hmt.rb'
                              user     system      total        real
ruby_way                  1.172000   0.422000   1.594000 (  3.328000)
ruby_include              2.562000   0.484000   3.046000 (  5.625000)
find_all_ruby_include  2.172000   0.203000   2.375000 (  4.078000)
find_all_sql              0.734000   0.203000   0.937000 (  2.156000)
nested                    1.016000   0.266000   1.282000 (  2.360000)

Results for MySQL 5.0.24a (with default Rails adapter):

>> load 'lib/benchmark_hmt.rb'
                              user     system      total        real
ruby_way                  2.766000   0.235000   3.001000 (  3.407000)
ruby_include              4.703000   0.422000   5.125000 (  7.656000)
find_all_ruby_include  3.453000   0.109000   3.562000 (  3.891000)
find_all_sql              1.485000   0.250000   1.735000 (  1.937000)
nested                    1.812000   0.234000   2.046000 (  2.250000)

It was run on a Pentium M 1.7GHZ, 1GB RAM, WinXP. The DBMSs were
local. I used Rails Edge (from yesterday).

Even though MySQL was slightly faster, when I initially tried it on a
larger set of test data (~ 20000 rows) it was slower than Postgres.
The methods with :include got unusable, and only one query was taking
more than 160 seconds (they virtually froze my notebook). MySQL didn't
like the mix of LEFT OUTER JOIN and INNER JOIN Rails generated. I
don't have the benchmark numbers for those tests and it takes too long
to enter the test data, specially in MySQL, so I decided to use this
smaller set.

The great surprise here is that using :include was slower than just
making a lot of smaller queries (method 1 versus method 3). Using the
nested has_many :through is almost as fast as doing the  find_by_sql
directly, and way faster than anything else. And the great plus of the
nested has_many :through is that you get all those fine association
methods, effortlessly.

If you want check the benchmark app, I made it available at:

http://www.bernardopadua.com/temp/benchmark_nested_has_many_through.zip

Cheers,

Bernardo
Posted by Joshua Sierles (Guest)
on 14.07.2007 11:34
(Received via mailing list)
On Jul 14, 2007, at 11:13 AM, Bernardo de Pádua wrote:


> I also think this kind of nesting is the logical
> thing to support, as I've seen many users think (myself included) that
> it was already there in Rails, try it and be surprised with the errors
> it caused. If the Rails way was wanting the user to do custom
> "optimized" queries all the time, perhaps it shouldn't even have
> associations or finders.
>

This is the most important point to make. If you all remember, nested
includes was merged into core because of its
elegant implementation and the obvious logical fit of the behavior.

http://dev.rubyonrails.org/ticket/3913

Josh
Posted by Steven Bristol (stevenbristol)
on 14.07.2007 14:21
(Received via mailing list)
Thanks Bernardo for doing all of that work!

Maybe we could all make another round of posts to try and discover the
maturity of the plugin. Koz & the rest of core might want to wait until 
the
plugin has been pounded in a few apps for a while before deciding one 
way or
another.

I for one have been using it only for a few days. The code hasn't moved 
to
production yet, but all of my tests are passing so far. Right now my 
feeling
is that if all of my tests are still passing when I am done with my 
feature
(I should be done by tomorrow night) then I will be comfortable with 
pushing
it to production and reporting back here after some real world testing.

How long have you all been using it and what are your experiences with 
it in
a real world app?

Thanks,
Steven A Bristol
Posted by Pratik Naik (pratik)
on 14.07.2007 14:37
(Received via mailing list)
In all the cases, the patch ( for plugin of course ) needs more tests.
E.g. with all different htm options ( uniq => true, :conditions, etc.
)

It also needs to be tested against all allowed association collection
methods ( that's from mapping defined in association.rb )

There should be no exception for nested hmt as well. ( e.g.
collection.foo will not work for nested htm )

Thanks
-Pratik

On 7/14/07, Steven A Bristol <stevenbristol@gmail.com> wrote:
> (I should be done by tomorrow night) then I will be comfortable with pushing
>
>  >
>


--
http://m.onkey.org
Posted by Matt Westcott (Guest)
on 16.07.2007 23:23
(Received via mailing list)
re: http://dev.rubyonrails.org/ticket/6461

Hi,
I'm the author of the patch, or at least the latest one to pick up the
mantle... many thanks to Bernado for pointing me to this thread, and
for doing the benchmarking.

On Jul 13, 1:25 pm, "Steven A Bristol" <stevenbris...@gmail.com>
wrote:
> As I understand it, the purpose of the plugin is not to increase
> performance, but rather to make it easier and cleaner to do things we are
> already doing.

Yep, that's pretty much it - I'd put it a bit more strongly, and say
that it makes it *possible* to do things in ActiveRecord that could
previously only be achieved with workarounds (namely, find_by_sql -
which doesn't support eager loading - or massaging the data set in
Ruby, which is a method I guess I haven't explored as fully as others
here, but just seems intuitively wrong to me when that's just the sort
of thing a database is good at).

It seems that a few people are still under the impression that this
kind of association only comes up in obscure scenarios, but the two
real-world use cases from my latest two projects have been "Find all
my friends' blog posts" (where Friendship has attributes that
necessitate it being a full model, rather than a habtm relationship)
and "Find all songs written by a member of this band" - all pretty
routine stuff.

As far as maturity goes, one of the above projects is in production
<http://footprint.wwf.org.uk/> and using the plugin lightly, and the
other one - still in the works - uses it extensively (in conjunction
with some fairly hefty cascading eager loading), and the 2007-04-28
revision of the patch has Just Worked for me so far. The only issue I
encountered was a table aliasing bug which turned out to be an
existing Rails bug <http://dev.rubyonrails.org/ticket/8267>, just one
that's easier to stumble upon with less twisty relations when you're
using this patch...

One thing it doesn't do yet is allow you to use a nested hmt
association within an :include, which would be a nice addition, but
probably be a mostly separate piece of development (and, sadly, will
be just as big a patch as the present one, I suspect). One of the unit
tests is also failing as of r7119, which I haven't had chance to look
into yet, but I'm hoping it's just triggered by the change in the
fixture.

Cheers for the feedback, everyone - keep me updated about any
deficiencies you find, and I'll give them my best shot.

- Matt