Count optimization

Sorry if this has been asked before…

I have a model named ticket (like a trouble ticket) and a ticket can
have many issues. When I list tickets in a table I want to output the
number of issues each ticket has. So, in my view I do something like:

<%= ticket.issues.count %>

But that hits the database with a query as many times as many tickets
there are.

In SQL I would do something like

SELECT tickets.id, count(*)
FROM tickets join issues on tickets.id = issues.ticket_id
GROUP BY tickets.id

What’s the usual approach for this kind of optimization?

Thanks!

Sergei

On Tue, Sep 16, 2008 at 3:55 PM, surge [email protected] wrote:

Sorry if this has been asked before…

I have a model named ticket (like a trouble ticket) and a ticket can
have many issues. When I list tickets in a table I want to output the
number of issues each ticket has. So, in my view I do something like:

<%= ticket.issues.count %>

You should use eager loading to include the issues when you get the
tickets, Ã la

@tickets = Ticket.find(:all, :include => :issues)

Franz S. wrote:

<%= ticket.issues.count %>

You should use eager loading to include the issues when you get the
tickets, Ã la

@tickets = Ticket.find(:all, :include => :issues)

Or use a counter cache if all you need is the number.


Jack C.
[email protected]

@tickets = Ticket.find(:all, :include => :issues)

That simple, huh? I kind of thought of this but I also thought that if
I did that, then if a ticket had, say, 3 issues, then the same ticket
would be listed 3 times. But I see now that I fell into the trap of
thinking of Rails model relationships as relational db relationships.
Lesson learned.

Jack, thanks for the counter cache tip!

Thanks!

On Sep 16, 9:59 pm, “Franz S.” [email protected] wrote:

You should use eager loading to include the issues when you get the
tickets, à la

@tickets = Ticket.find(:all, :include => :issues)

Depending upon your requirement, you could even optimize it further by
using a view or include that gets the count directly via SQL, and thus
avoid having to load all issues just to get the count.

On Sep 16, 10:51 pm, surge [email protected] wrote:

By “view” you mean an SQL view? Or did you mean something else? Also,
what kind of include did you have in mind?

Yeah, I meant SQL view. MySQL views seem to play nicely with Rails so
you should have no problems using that approach. If you create a
“tickets_with_counts” view with a COUNT subquery in it, you save the
db and AR from having to fetch all the corresponding records from
issues and composing them into AR objects.

I was meaning to say “select” instead of “include”. Assuming that your
association is a has_many, adding:

:select => “, COUNT(SELECT issues.id FROM issues
WHERE issues.ticket_id = tickets.id) AS issue_count”

in your Ticket.find call will give you an additional read-only field
called issue_count which does not rely on fetching and composing issue
records.

Gotcha. I was thinking about playing with the :select option. That was
my next step before I posted, but I decided to run it by the community
first. :select is nice but you lose the convenience of the Rails
automation, don’t you? You can also forget to select an id column for
example. Every new item would require a modification of the select
clause… An SQL view would work but I’m still working with mysql
4.1 :slight_smile: And I kind of want to keep as much as possible within the
application. Feels right that way. Let me know if you think I’m
missing something.

Thank you for your response all the same…

Depending upon your requirement, you could even optimize it further by
using a view or include that gets the count directly via SQL, and thus
avoid having to load all issues just to get the count.

By “view” you mean an SQL view? Or did you mean something else? Also,
what kind of include did you have in mind?

You get the scenario which you stated: a cartesian product of the
resultset. If you have 100 tickets with 100 issues each, you get
100x100 = 10,000 records, which AR has to compose into 100 AR objects
with a 1-to-many associations. It doesn’t mean that the tickets will
be listed 100 times each, it’s just that the resultset will return the
tickets 100 times each. But still, a cartesian product is not a very
good thing from an optimization point of view.

Good points…

:select => “*, COUNT(…) AS issue_count”
Aha! Now, that’s exactly what I was looking for! I will just need to
add a “group by”.

Although there’s a caveat with the :select approach: AFAIK, MySQL 4.1
does not support subqueries,

4.1. supports subqueries, but why would I need them in this case?
Isn’t it just an aggregate function?

On Sep 16, 11:34 pm, surge [email protected] wrote:

Thank you for your response all the same…
You were partially correct about the relational listing concept. Rails
2.1’s eager loading will use 2 SQL statements when you add
the :include => :issues option: 1 for the tickets table, and 1 for the
issues matching those tickets (in the form of a WHERE IN clause). It’s
up to AR to compose those records to AR objects with 1-to-many
associations.

If you have a :condition which specifies something like

:condition => “issues.active = 1”

You get the scenario which you stated: a cartesian product of the
resultset. If you have 100 tickets with 100 issues each, you get
100x100 = 10,000 records, which AR has to compose into 100 AR objects
with a 1-to-many associations. It doesn’t mean that the tickets will
be listed 100 times each, it’s just that the resultset will return the
tickets 100 times each. But still, a cartesian product is not a very
good thing from an optimization point of view.

If you really need all the fields from the table, you can always use:

:select => “*, COUNT(…) AS issue_count”

Although there’s a caveat with the :select approach: AFAIK, MySQL 4.1
does not support subqueries, so you might be stuck with the :include
option afterall.

Or you might want to look here:

http://railsexpress.de/blog/articles/2005/11/06/the-case-for-piggy-backed-attributes

HTH

Ah yes, I’ve totally forgotten about the GROUP BY. That’ll work
perfectly.

Erol, thank for your thoughts on this. I’m all set now. The :select
approach is definitely the fastest because it’s closest to the db.

Thanks!!!

On Sep 17, 12:08 am, surge [email protected] wrote:

4.1. supports subqueries, but why would I need them in this case?
Isn’t it just an aggregate function?

Ah yes, I’ve totally forgotten about the GROUP BY. That’ll work
perfectly.

Hi, using a counter cache would be the most efficient because there’s
no query. After you set it up, you’ll do the following:

ticket.issues.size

Good luck,

-Conrad

Sent from my iPhone

Well, I’m trying to go the easy route first – using eager loading.

Interestingly enough, when I added “:issues” to :include, the query
was correctly modified by rails to include a left join with issues. My
expectation now was that when I did ticket.issues.count, no additional
“count” query would be needed. But nope, Rails still does it.

I kind of don’t want to deal with a cache counter or :joins… I would
like to ride the automation wave as long as possible…

That would be nice to use a counter cache, but the problem is (as it
turns out), I have two counts to maintain. One is the total number of
issues in a ticket and the other, the number of completed issues. The
latter needs a condition. I guess I can’t use counter caches?

On Sep 17, 11:45 am, surge [email protected] wrote:

Well, I’m trying to go the easy route first – using eager loading.

Interestingly enough, when I added “:issues” to :include, the query
was correctly modified by rails to include a left join with issues. My
expectation now was that when I did ticket.issues.count, no additional
“count” query would be needed. But nope, Rails still does it.

I kind of don’t want to deal with a cache counter or :joins… I would
like to ride the automation wave as long as possible…

It should have been:

ticket.issues.size

That way you only count the number of “issues”. Running the scoped
count would do a COUNT from the db, irregardless if you’ve used eager
loading or not.

On Sep 17, 4:38 pm, “Conrad T.” [email protected] wrote:

Total number of issues open for a given ticket.

ticket.issues.open.size # performs a database query

Total number of issues open for a given ticket.

ticket.issues.closed.size # performs a database query

Good luck,

-Conrad

But I think that’s exactly what surge wanted to avoid: a database
query every time he wanted to get the counts, specially if it’s being
displayed on a list. Think of it as an N+1 problem.

On Tue, Sep 16, 2008 at 9:23 PM, surge [email protected] wrote:

That would be nice to use a counter cache, but the problem is (as it
turns out), I have two counts to maintain. One is the total number of
issues in a ticket and the other, the number of completed issues. The
latter needs a condition. I guess I can’t use counter caches?

Hi, you can use a counter cache to track the total number of issues for
a given ticket. Next, you’ll perform the query for the open and closed
ticket counts. The counter cache will track the number of objects
within
the database for a given model. Also, I would add index to the
appropriate
tables. For example, I would add an index to the following table:

issues

At this time, you should have a foreign key on the issue table called
ticket_id.
Thus, you’ll need to create a migration as follows:

script/generate migration add_indexes

Now, you’ll edit the migration file by adding the following to the
following:

class AddIndexes < ActiveRecord:Migration

def self.up
add_index :issues, :ticket_id
end

def self.down
remove_index :issues, :ticket_id
end

end

Now, you’ll need to run the following command:

rake db:migrate

Next, I would create named_scope on the Issue Model:

class Issue < ActiveRecord::Base

named_scope :open , :conditions => { :closed = false }
named_scope :closed, :conditions => { :closed = true }

end

Lastly, you should be able to do the following:

Total number of issues for a given ticket.

ticket.issues.size # uses the counter cache

Total number of issues open for a given ticket.

ticket.issues.open.size # performs a database query

Total number of issues open for a given ticket.

ticket.issues.closed.size # performs a database query

Good luck,

-Conrad

On Sep 18, 5:44 am, surge [email protected] wrote:

Well, I decided to bite the bullet and get closer to the metal by
using :joins and :select. That worked great. Now my query is very
compact – with only those columns I need in the table and there are
no extra count-related queries. It was a bit of a surprise for me that
when :include was used, :select was ignored, but it does make sense.

Yeah, that’s the default behavior of eager-loading; it ignores
your :select option.

So when I use :select and :join rails creates attr_readers for each of
the selected columns? For example, I used to have lines like <%=
ticket.priority.name %> in my view. When I changed to using :select, I
changed that line to <%= ticket.priority_name %> because in my :select
I retrieve the priority name directly from the db (by joining the
tickets table with the priorities table). So, I was expecting to see
something like “the ticket model doesn’t have a member named
‘priority_name’”, but nope, no error occurred. So, my guess is that
rails created an attr_reader for each of the columns mentioned
in :select. Is that approximately what happens?

That’s right. Rails will create an attr_accessor for each column
returned by the SQL, irregardless if it’s a real column (either from a
table or join) or a computed one.

Well, I decided to bite the bullet and get closer to the metal by
using :joins and :select. That worked great. Now my query is very
compact – with only those columns I need in the table and there are
no extra count-related queries. It was a bit of a surprise for me that
when :include was used, :select was ignored, but it does make sense.

So when I use :select and :join rails creates attr_readers for each of
the selected columns? For example, I used to have lines like <%=
ticket.priority.name %> in my view. When I changed to using :select, I
changed that line to <%= ticket.priority_name %> because in my :select
I retrieve the priority name directly from the db (by joining the
tickets table with the priorities table). So, I was expecting to see
something like “the ticket model doesn’t have a member named
‘priority_name’”, but nope, no error occurred. So, my guess is that
rails created an attr_reader for each of the columns mentioned
in :select. Is that approximately what happens?

Thank you, Erol, Conrad and everybody else for helping me!