On Sat, Jan 7, 2012 at 12:20 AM, Linus P. <
[email protected]> wrote:
Wow! Thank you for all the help Peter! I really appreciate it.
No prob, thank you for getting closer to the root cause.
I will test that code tomorrow as it’s getting quite late here.
I did a small test now though. It seems that you can’t mix joins and
includes.
Indeed, strange …
subcategories.id: SELECT “categories”.* FROM “categ
“subcategories”.“id” AS t1_r0, “subcategories”.“name” AS
t1_r1, “subcategories”.“category_id” AS t1_r2,
“subcategories”.“created_at” AS t1_r3, “subcategories”.“updated_at” AS t1_r
4, “subcategories”.“permalink” AS t1_r5 FROM “categories” LEFT OUTER JOIN
“subcategories” ON “subcategories”.“category_id”
= “categories”.“id” WHERE (subcategories.id > 0)”
Indeed. I added a Project class with User has_many :projects to my test
project and can confirm:
010:0> a = Account.includes(:users).where(“users.name like ‘peter’”)
SQL (1.1ms) SELECT “accounts”.“id” AS t0_r0, “accounts”.“number” AS
t0_r1, “accounts”.“created_at” AS t0_r2, “accounts”.“updated_at” AS
t0_r3,
“users”.“id” AS t1_r0, “users”.“name” AS t1_r1, “users”.“account_id” AS
t1_r2, “users”.“created_at” AS t1_r3, “users”.“updated_at” AS t1_r4 FROM
“accounts” LEFT OUTER JOIN “users” ON “users”.“account_id” =
“accounts”.“id” WHERE (users.name like ‘peter’)
=> [#<Account id: 1, number: “123”, created_at: “2011-12-01 22:24:16”,
updated_at: “2011-12-01 22:24:16”>]
011:0> a1 = a.first
=> #<Account id: 1, number: “123”, created_at: “2011-12-01 22:24:16”,
updated_at: “2011-12-01 22:24:16”>
012:0> a1.users.first
=> #<User id: 1, name: “peter”, account_id: 1, created_at: “2011-12-01
22:24:16”, updated_at: “2011-12-01 22:24:16”>
This works as expected. 1 large query for accounts and associated users.
013:0> a1.users.first.projects
Project Load (1.0ms) SELECT “projects”.* FROM “projects” WHERE
“projects”.“user_id” = 1
=> [#<Project id: 1, name: “project_of_peter”, user_id: 1, created_at:
“2012-01-07
10:51:51”, updated_at: “2012-01-07 10:51:51”>]
Also as expected. Since “projects” was not joined or included in the
original query,
we need a second query here (which is then not filtered and thus not the
results we want).
014:0> a = Account.includes(:users).where(“users.name like
‘peter’”).joins(:users => :projects)
Account Load (1.3ms) SELECT “accounts”.* FROM “accounts” INNER JOIN
“users” ON “users”.“account_id” = “accounts”.“id” INNER JOIN “projects”
ON
“projects”.“user_id” = “users”.“id” WHERE (users.name like ‘peter’)
User Load (0.9ms) SELECT “users”.* FROM “users” WHERE
“users”.“account_id” IN (1)
=> [#<Account id: 1, number: “123”, created_at: “2011-12-01 22:24:16”,
updated_at: “2011-12-01 22:24:16”>]
This is the getting closer to the root cause as you discovered now. As
soon
as there is a “joins” added,
the :includes fails to Eagerly load only the filtered users in a
single
complex query, but goes to a second
query (that is not filtered on users; that would be “not filtered on
subcategories” in your code).
015:0> a = Account.includes(:users).where(“users.name like
‘peter’”).includes(:users => :projects)
SQL (1.5ms) SELECT “accounts”.“id” AS t0_r0, “accounts”.“number” AS
t0_r1, “accounts”.“created_at” AS t0_r2, “accounts”.“updated_at” AS
t0_r3,
“users”.“id” AS t1_r0, “users”.“name” AS t1_r1, “users”.“account_id” AS
t1_r2, “users”.“created_at” AS t1_r3, “users”.“updated_at” AS t1_r4,
“projects”.“id” AS t2_r0, “projects”.“name” AS t2_r1,
“projects”.“user_id”
AS t2_r2, “projects”.“created_at” AS t2_r3, “projects”.“updated_at” AS
t2_r4 FROM “accounts” LEFT OUTER JOIN “users” ON “users”.“account_id” =
“accounts”.“id” LEFT OUTER JOIN “projects” ON “projects”.“user_id” =
“users”.“id” WHERE (users.name like ‘peter’)
=> [#<Account id: 1, number: “123”, created_at: “2011-12-01 22:24:16”,
updated_at: “2011-12-01 22:24:16”>]
016:0> a.size
=> 1
017:0> a.first.users
=> [#<User id: 1, name: “peter”, account_id: 1, created_at: “2011-12-01
22:24:16”, updated_at: “2011-12-01 22:24:16”>]
018:0> a.first.users.first.projects
=> [#<Project id: 1, name: “project_of_peter”, user_id: 1, created_at:
“2012-01-07
10:51:51”, updated_at: “2012-01-07 10:51:51”>]
So, using :includes everywhere seems to not trigger this problem.
I don’t know if that is an acceptable fix for your project (depending on
much
unneeded data that loads for you)?
I don’t know if this is a “bug” or just misunderstanding the behavior of
:joins
and :includes in Eager loading from our side?
If I assume the latter (us not understanding it well enough), then
conclusions
could be:
- :includes is just a convenience for eager loading (performance
improvement)
and not designed to work together with joins in the way we expected
- trying to use has_many relationships with conditions may be a solution
…
First added a “lang” column to projects and trying a condition on the
projects table
(fails in the same way):
012:0> a = Account.includes(:users).joins(:users =>
:projects).where([‘projects.lang = ?’, ‘ruby’])
Account Load (1.2ms) SELECT “accounts”.* FROM “accounts” INNER JOIN
“users” ON “users”.“account_id” = “accounts”.“id” INNER JOIN “projects”
ON
“projects”.“user_id” = “users”.“id” WHERE (projects.lang = ‘ruby’)
User Load (0.5ms) SELECT “users”.* FROM “users” WHERE
“users”.“account_id” IN (1)
=> [#<Account id: 1, number: “123”, created_at: “2011-12-01 22:24:16”,
updated_at: “2011-12-01 22:24:16”>]
This is still correct. Only 1 account with id 1 has a user (peter) with
a
project with lang “ruby”.
013:0> a1 = a.first
=> #<Account id: 1, number: “123”, created_at: “2011-12-01 22:24:16”,
updated_at: “2011-12-01 22:24:16”>
Getting that first (and only) account from the array.
014:0> a1.users
=> [#<User id: 1, name: “peter”, account_id: 1, created_at: “2011-12-01
22:24:16”, updated_at: “2012-01-07 11:28:20”, gender: “M”>, #<User id:
3,
name: “Sarah”, account_id: 1, created_at: “2012-01-06 21:23:22”,
updated_at: “2012-01-07 11:28:24”, gender: “F”>]
But this is not what we want. We get all users from account 1, not just
the
users with a “ruby” project.
015:0> a1.users.first.projects
Project Load (0.9ms) SELECT “projects”.* FROM “projects” WHERE
“projects”.“user_id” = 1
=> [#<Project id: 1, name: “project_of_peter”, user_id: 1, created_at:
“2012-01-07
10:51:51”, updated_at: “2012-01-07 11:38:15”, lang: “ruby”>, #<Project
id:
2, name: “coffeescript_project”, user_id: 1, created_at: “2012-01-07
11:39:10”,
updated_at: “2012-01-07 11:39:10”, lang: “coffee”>]
And similar problem, for user ‘peter’ we get all projects, also the ones
in
a different language.
016:0> a1.users[1].projects
Project Load (0.8ms) SELECT “projects”.* FROM “projects” WHERE
“projects”.“user_id” = 3
=> []
And user ‘sarah’ has not projects at all, but still in the list of
users.
OK, trying another solution …
class Account < ActiveRecord::Base
has_many :users
attr_accessor :lang
has_many :users_by_project_lang,
:class_name => “User”,
:include => :projects,
:conditions => Proc.new { [“projects.lang = ?”, lang] }
end
I add a special has_many relationship on the Account class
(that would be your Category class). Now using that to
find the “users” LIMITED to a certain project_lang
(in your case that would be subcategories, limited to a certain
product.gender).
018:0> a1.lang=‘ruby’
=> “ruby”
The conditions will be late evaluated against the “self” (that is a1
here).
So
we need to set it on a1.
019:0> a1.users_by_project_lang
SQL (1.5ms) SELECT “users”.“id” AS t0_r0, “users”.“name” AS t0_r1,
“users”.“account_id” AS t0_r2, “users”.“created_at” AS t0_r3,
“users”.“updated_at” AS t0_r4, “users”.“gender” AS t0_r5,
“projects”.“id”
AS t1_r0, “projects”.“name” AS t1_r1, “projects”.“user_id” AS t1_r2,
“projects”.“created_at” AS t1_r3, “projects”.“updated_at” AS t1_r4,
“projects”.“lang” AS t1_r5 FROM “users” LEFT OUTER JOIN “projects” ON
“projects”.“user_id” = “users”.“id” WHERE “users”.“account_id” = 1 AND
(projects.lang = ‘ruby’)
=> [#<User id: 1, name: “peter”, account_id: 1, created_at: “2011-12-01
22:24:16”, updated_at: “2012-01-07 11:28:20”, gender: “M”>]
Now we get back our “complex” query, filtered on a parameter of the the
joined products table.
021:0> a1.users_by_project_lang.each{|user| puts “user = #{user} with
projects #{user.projects}”}
user = #User:0x94cd7c8 with projects [#<Project id: 1, name:
“project_of_peter”, user_id: 1, created_at: “2012-01-07 10:51:51”,
updated_at: “2012-01-07 11:38:15”, lang: “ruby”>]
=> [#<User id: 1, name: “peter”, account_id: 1, created_at: “2011-12-01
22:24:16”, updated_at: “2012-01-07 11:28:20”, gender: “M”>]
And now, only the correct users (‘peter’) with the filtered projects
(with
lang ‘ruby’) are returned.
So, I presume, if you make a similar has_many in Category
has_many :subcategories_by_product_gender
this would work.
But … since the has_many only has a :include (and not a :joins)
options,
this seems
equivalent to the first option of using all :includes for the entire
chain
(and
no :joins). So, in reality, it does not seem to be better than that
first
option
(which has the disavantage of loading all the associated info, even if
you
so not need it).
An alternative might be to make a dedicated has_many and use :finder_sql
to hand-code the SQL …
“:finder_sql
Specify a complete SQL statement to fetch the association. This is a
good
way
to go for complex associations that depend on multiple tables.”
It seems this feature request seems complex enough to warrant that.
I would certainly be interested in a “proper” solution for this issue
(using a
mix of :includes and :joins, where the :includes are eagerly loaded in
the SELECT part in the first query (so these results are filtered) and
the
:joins are only used to eveluate conditions or ordering, etc. but not
have
to be included in the SELECT part).
I looked into scopes, but I failed to find a proper way to apply a scope
to
a has_many relationship … (maybe with_scope {}, but that seems to be
depricated ?).
HTH,
Peter
–
Peter V.
http://twitter.com/peter_v
http://rails.vandenabeele.com