DATABASE Query performance

Hi Everybody

Suppose We have two models named Book and Author. Make a suggestion on
how we will improve the database query performance on the following
code.

books = Book.find(:all)
books.each do | book |
puts book.author
end

Thanks in advance

On Sep 22, 5:39 pm, Barun A. [email protected] wrote:

Thanks in advance

Use eager-loading:

books = Book.find(:all, :include => :author)

You can use :select and :joins for the best performance, this is what I
use currently.

:include is much less efficient but it is the Rails way of doing things.
It used to create a JOIN but in a very ugly way.

Since Rails 2.x the :include code has been changed as it now makes 1
query on each table. It is still less efficient than custom :joins, but
it is easier to use its derived objects.

On Sep 22, 6:01 pm, Fernando P. [email protected]
wrote:

Since Rails 2.x the :include code has been changed as it now makes 1
query on each table. It is still less efficient than custom :joins, but
it is easier to use its derived objects.

Not for every scenario. For has_many associations, the Rails
2.x :include code is possibly better than a custom join when dealing
with large datasets, since a custom join would give you a cartesian
product while :include will only fetch the necessary records, at the
expense of course, of additional SQL statements. This specially holds
true if your webhost and db resides on different servers. If the data
is large enough, it may merit to add indices to your reference fields
(or foreign keys):

http://erolfornoles.blogspot.com/2008/08/optimize-eager-loading-in-rails-21.html

For 1-to-1 associations like belongs_to, Simpler Piggy-Backing makes
for a good alternative to custom joins:

http://railsexpress.de/blog/articles/2006/05/29/simpler-piggy-backing

Not for every scenario. For has_many associations, the Rails
2.x :include code is possibly better than a custom join

Have you tested? because at the time of Rails 1.2.6 where the :include
created a gigantic JOINs, my custom :joins was at least x3 times faster!
mostly because I only pick the fields I really need and because AR
doesn’t create full objects with their references and all the stuff.

I tried finding some info on MySQL forums and websites about 1 query
with multiple JOIN versus multiple queries, but it is not clear which is
more efficient.

Things get really complicated when I read: “How adding another table to
JOIN can improve performance ?”

On 22 Sep 2008, at 14:01, Fernando P. wrote:

I tried finding some info on MySQL forums and websites about 1 query
with multiple JOIN versus multiple queries, but it is not clear
which is
more efficient.

I wrote the :include code in 2.1 because of performance problems I had
with :include in 1.2.x. :including multiple has_manys from the same
object very quickly leads to the database generating a very large
number of rows (far greater than the number of actual objects you
want) which doesn’t help the database or ActiveRecord (who has to wade
through them and discard portions of rows it has already seen.

Fred

Yeah, in my case the Rails 2.1 :include was significantly faster than
a :join for multiple has_many relationships, like:

Interesting.

What is the size of the DB on which your performed the tests? Mine were
done on tables that have 100.000 and 1.000.000.000 rows. The :joins gave
nice results that were around 0.1s on my good old PowerPC dev machine.
The old :include method (rails 1.2.x) took almost 1s to perform the same
query.

I wrote the :include code in 2.1 because of performance problems I had
with :include in 1.2.x.
Hi Fred, yeah the :include in 1.2.x was pretty lame. Do you have some
test results of the new :include versus custom :joins?

On Sep 22, 9:01 pm, Fernando P. [email protected]
wrote:

Have you tested? because at the time of Rails 1.2.6 where the :include
created a gigantic JOINs, my custom :joins was at least x3 times faster!
mostly because I only pick the fields I really need and because AR
doesn’t create full objects with their references and all the stuff.

Yeah, in my case the Rails 2.1 :include was significantly faster than
a :join for multiple has_many relationships, like:

class Depot
has_many :trucks
has_many :containers
has_many :cranes
end

Of course, we may be talking about apples and bananas here, since my
example involves has_many associations while the scenario I could
think of given your description is a belongs_to or has_one.

I agree that belongs_to or has_one associations are better off piggy-
backed via a :join and :select, but using them eventually leads into a
lot of repetitions in our code. That’s why I prefer using the Simpler
Piggy-Backing plugin:

http://railsexpress.de/blog/articles/2006/05/29/simpler-piggy-backing

The author even claims: “In fact, using the extension is slightly
faster than coding the select and joins manually (1%-3%).”

I tried finding some info on MySQL forums and websites about 1 query
with multiple JOIN versus multiple queries, but it is not clear which is
more efficient.

The Depot example I mentioned earlier is better composed using atleast
2 SQL statements, as against a single one. Not only would the single
SQL approach (using joins) produce a cartesian product, it is
logically incorrect since trucks, cranes and containers are
independent of each other.

Then again, it all boils down to AR being ridiculously slow in
composing AR objects from a resultset.

Things get really complicated when I read: “How adding another table to
JOIN can improve performance ?”

http://www.mysqlperformanceblog.com/2008/08/01/how-adding-another-tab

There are a lot of quirks to doing optimizations in MySQL. The “one
index per one query” is one, which is why I usually shy away from
MySQL when doing data warehousing. I tried using it once to do CDR
(call detail record) reporting, but the performance was such a
nightmare that I had to migrate it to Postgre.

On Sep 22, 10:23 pm, Fernando P. [email protected]
wrote:

What is the size of the DB on which your performed the tests? Mine were
done on tables that have 100.000 and 1.000.000.000 rows. The :joins gave
nice results that were around 0.1s on my good old PowerPC dev machine.
The old :include method (rails 1.2.x) took almost 1s to perform the same
query.

Are the seconds you mentioned DB fetch times? Or do they include AR
times? And what kind of associations are involved?

The Depot model I mentioned is not that large, yet. A couple of dozen
depots, which has a hundred to a few thousand trucks and containers
each, and a couple of dozen cranes each. If you were going to use a
join on them, the yield will be a cartesian product of:

20 depots x 1000 trucks x 1000 containers x 10 cranes = 200,000,000
rows on the average, which I’m pretty sure is going to take AR longer
to compose.

On 22 Sep 2008, at 15:23, Fernando P. wrote:

gave
nice results that were around 0.1s on my good old PowerPC dev machine.
The old :include method (rails 1.2.x) took almost 1s to perform the
same
query.

I wrote the :include code in 2.1 because of performance problems I
had
with :include in 1.2.x.
Hi Fred, yeah the :include in 1.2.x was pretty lame. Do you have some
test results of the new :include versus custom :joins?

Those are almost by definition hard to come by: custom stuff is likely
to be specific to one app (and just some custom joins does not
accomplish the primary goal of :include (ie setup your associations).

Fred

On Sep 22, 10:30 pm, Frederick C. [email protected]
wrote:

Those are almost by definition hard to come by: custom stuff is likely
to be specific to one app (and just some custom joins does not
accomplish the primary goal of :include (ie setup your associations).

I agree. has_one and belongs_to associations can be piggy-backed
via :join and :select, but has_many associations cannot.

Are the seconds you mentioned DB fetch times? Or do they include AR
times? And what kind of associations are involved?

These were “completed” times.

If you were going to use a
join on them, the yield will be a cartesian product of:

20 depots x 1000 trucks x 1000 containers x 10 cranes = 200,000,000
rows on the average, which I’m pretty sure is going to take AR longer
to compose.

Your figures are misleading. You never display all the records at once,
you always pass a :conditions option to the query, don’t you? Even your
:include does, so what are you talking about?

Hi Erol,

I now perfectly understand your point. Thank you for clarifying.

I will definitely update my code from using :joins to the new :include.

However I am having some problems. Using the following find:

@order = Order.find(:all, :include => [:items, :products],
:conditions => [‘orders.id = ?’, session[:order_id]])

I get the following SQL:

SELECT * FROM orders WHERE (orders.id = 15)
SELECT items.* FROM items WHERE (items.order_id IN (15))
CACHE SELECT items.* FROM items WHERE (items.order_id IN (15))
SHOW FIELDS FROM products
SELECT * FROM products WHERE (products.id IN (1,2))
SELECT * FROM products WHERE (products.id = 2)
SELECT * FROM products WHERE (products.id = 1)

Why is Rails making unnecessary duplicate queries?

Woops, I made a typo, instead of [,] one should use { => }.

I will definitely update my code from using :joins to the new :include.

Just like what Fred said, it’s a case-to-case basis. But, uhmmm, it’s
your call I guess.

CACHE SELECT items.* FROM items WHERE (items.order_id IN (15))
SHOW FIELDS FROM products
SELECT * FROM products WHERE (products.id IN (1,2))
SELECT * FROM products WHERE (products.id = 2)
SELECT * FROM products WHERE (products.id = 1)

Why is Rails making unnecessary duplicate queries?

Try Order.find(session[:order_id], :include => [:items, :products])

There’s no way I know of to eliminate SHOW FIELDS FROM …, unless AR
gets updated.

On Sep 22, 11:57 pm, Fernando P. [email protected]
wrote:

These were “completed” times.

And in what type of association are they in?

If you were going to use a
join on them, the yield will be a cartesian product of:

20 depots x 1000 trucks x 1000 containers x 10 cranes = 200,000,000
rows on the average, which I’m pretty sure is going to take AR longer
to compose.

Your figures are misleading. You never display all the records at once,
you always pass a :conditions option to the query, don’t you? Even your
:include does, so what are you talking about?

Nope, those figures aren’t misleading. Those are actual records
fetched for a report, if I use traditional joins. Of course it won’t
be displayed on-screen, but AR has to compose the corresponding AR
objects given that many records. The conditions aren’t present since I
have a rake task scheduled via cron that handles archiving of data,
which means trucks, containers, and cranes for the past month aren’t
included in the resultset. It’s more like an incoming/outgoing report
that is generated weekly and monthly, used for billing and accounting
purposes.

What the new :include does is to break those into manageable SQL.

20 depots x 1000 trucks = 20,000, assuming again, an average of 1000
trucks per depot
20 depots x 1000 containers = 20,000
20 depots x 10 cranes = 200

All in all, AR will only need to compose from 40,200 results on the
average.

This is a sample of the report which needs to be generated:

  • Depot -
    • List of Trucks -
    • List of Containers -
    • List of Cranes -
  • Depot -
    • List of Trucks -
    • List of Containers -
    • List of Cranes -

If you can give me a :select :join construct (without resorting to
multiple SQL’s) which could efficiently generate that kind of report,
I’d be more than happy to use that.

On Sep 23, 1:05 am, Fernando P. [email protected]
wrote:

@order = Order.find(:all, :include => [:items, :products],
:conditions => [‘orders.id = ?’, session[:order_id]])

Plus, there’s a caveat here. Issuing:

Order.find(:all, include => [:items, :products], :conditions =>
“items.disabled IS NULL”)

Or referencing the any of the included tables in :conditions will
force AR to revert to the old eager-loading scheme.

Or referencing the any of the included tables in :conditions will
force AR to revert to the old eager-loading scheme.

No, it depends how you build the :include. I have yet to found how it
actually works. Sometimes I get the old way, sometimes I get the
multi-query, it really depends on how you pass the hash.

There’s no way I know of to eliminate SHOW FIELDS FROM …

You cannot, and you shouldn’t. This is how AR knows which attributes a
model has. I already tried in the past to by-pass AR, and I got more
problems than anything else, and the speed improvement was not that
great. Anyway these queries are fast and they are cached.

Using the following query:

@order = Order.find(:all, :include => [ :products, :items],
:conditions => [‘orders.id = ?’, session[:order_id]])

I get the correct multi SQL queries. But I cannot understand why when I
try to do:

<%- for item in @order.items -%>
<%= debug item %>
<%- end -%>

I get the following error message:

undefined method `items’ for #Array:0x2e9c224

How do I access the list of items associated to the order? Why is it an
array?

On 22 Sep 2008, at 19:06, Fernando P. wrote:

Or referencing the any of the included tables in :conditions will
force AR to revert to the old eager-loading scheme.

No, it depends how you build the :include. I have yet to found how it
actually works. Sometimes I get the old way, sometimes I get the
multi-query, it really depends on how you pass the hash.

Whenever AR thinks you are referencing a column from a table other
that the base table it reverts to the old style :include (that’s not
to say it gets it right all the time)

There’s no way I know of to eliminate SHOW FIELDS FROM …

You cannot, and you shouldn’t. This is how AR knows which attributes a
model has. I already tried in the past to by-pass AR, and I got more
problems than anything else, and the speed improvement was not that
great. Anyway these queries are fast and they are cached.

Yup, it;s only in dev mode that they are repeated anyway.