DATABASE Query performance

On 22 Sep 2008, at 19:15, Fernando P. wrote:


<%- for item in @order.items -%>
<%= debug item %>
<%- end -%>

I get the following error message:

undefined method `items’ for #Array:0x2e9c224

because @order is an array of orders. The items method exists only on
the elements of that array.

Fred

Whenever AR thinks you are referencing a column from a table other
that the base table it reverts to the old style :include (that’s not
to say it gets it right all the time)

In such case, one must check his generated query, and if such condition
happens, then probably one can manually create a “pre-query” to save AR
from trying to be be too clever.

because @order is an array of orders.
Ok I get it.

By the way thank you Fred for having written the new :include code,
because god knows how people have been able to live until now.

On Sep 23, 2:06 am, Fernando P. [email protected]
wrote:

Or referencing the any of the included tables in :conditions will
force AR to revert to the old eager-loading scheme.

No, it depends how you build the :include. I have yet to found how it
actually works. Sometimes I get the old way, sometimes I get the
multi-query, it really depends on how you pass the hash.

Actually, yes, it depends on whether you reference non-main - or
therefore included - tables in :conditions or :order.

Taken directly from
http://api.rubyonrails.com/classes/ActiveRecord/Associations/ClassMethods.html:

Since only one table is loaded at a time, conditions or orders cannot
reference tables other than the main one. If this is the case Active
Record falls back to the previously used LEFT OUTER JOIN based
strategy. For example

Post.find(:all, :include => [ :author, :comments ], :conditions =>
[‘comments.approved = ?’, true])

On Sep 23, 2:15 am, Fernando P. [email protected]
wrote:

<%= debug item %>
<%- end -%>

I get the following error message:

undefined method `items’ for #Array:0x2e9c224

How do I access the list of items associated to the order? Why is it an
array?

@order = Order.find(session[:order_id], :include =>
[:products, :items])

That’ll return a single Order instance instead of an array of Orders.

On Sep 23, 2:49 am, Fernando P. [email protected]
wrote:

Thanks Erol, it is 100% safe, or is it preferable to add the
:conditions?

I never know when Rails sanitizes the user input or not. Even if the
data comes from the session, I do not trust any input at all.

It looks safe; I’ve tried a bunch of SQL-injection-like params on it
but it always end up as WHERE (id = 0), so it’s safe to assume that
it is being sanitized.

20 depots x 1000 trucks x 1000 containers x 10 cranes = 200,000,000
rows on the average, which I’m pretty sure is going to take AR longer
to compose.

Woops I didn’t spot that in the first place. But you are talking about
200.000.000 rows over all the possibilities, but your figure as I said
is misleading, as not all depots hold all the containers at the same
time, this is not possible.

I was talking about my tables being:
products: 100 rows
order: 100.000 rows
items: 1.000.000 rows

using :joins keeps processing time under 1s, whereas the old :include
was around 1s.

So following your theory I would have: 100x100.000x1.000.000 = 10e13
possibilities which is obviously false.

:joins certainly owns :include, but the major drawback is that you
cannot have real objects as the result would be a franken crane
having a diesel engine and 500m2 of storage space and 5t of bananas.

I have talked to MySQL power users, and their advice is that having 1
single query with JOIN is faster than having multiple queries, but the
real answer, is “it depends on how much data your fetching each time”.

Here are my test results:

Product: 100 rows
Orders: 100.000 rows
Items: 1.000.000 rows

@orders = Order.find( :all, :include => { :items => :product },
:conditions => [‘user_id = ?’, session[:user_id]])

Takes: 0.04s of DB time

The equivalent with :joins, only takes 0.004s of DB time.

The big advantage of :joins, is that I can use :select to only fetch the
rows I need, whereas with :include I need to retrieve all the fields.
The drawback of :joins, is that I don’t have objects with relationships
as one could expect with :include, but I consider that having objects is
overwhelming. The goal is to present retrieved data from the DB to the
user or to a reporting system, therefore objects are not that important.

Anyway I would like to see Erol F. come with proof that :include
is faster than :joins.

My conclusion is :joins >> :include by a factor 5-10, and I would not
recommend any advice from the railsenvy.com as they claim that you can
abuse of :include. This is certainly not true and can make your app go
really slow.

Use :include cautiously and only when it is absolutely needed. I know
this is going slightly against the Rails Way of doing things, but I
don’t care as long as my webapp is blazing fast, and that’s possible
with Rails.

Have a nice day.

Thanks Erol, it is 100% safe, or is it preferable to add the
:conditions?

I never know when Rails sanitizes the user input or not. Even if the
data comes from the session, I do not trust any input at all.

On 23 Sep 2008, at 14:58, Fernando P. wrote:

Takes: 0.04s of DB time

The equivalent with :joins, only takes 0.004s of DB time.

That’s not the pathological case. the bad case is if orders had two
has_manys on it (ie you were doing :include =>
[:items, :something_else]) and with orders having quite a few of both.

It largely depends on what you are doing. If what is important is the
business intelligence in your models then it’s really handy to have
them. Or for the classic n+1 problem of displaying a list of orders
along with the items and the products (eg your customer’s account
history): you don’t want to be writing custom selects and what not for
something mundane like that.

Fred

Your setup is definitely puzzling me now. When I have some time, I will
do something similar and make my own tests. One day I might run into it,
so I want to be ready with the best solution :include VS :joins.

I see one possible area of improvement that you can quickly test though:
did you add indexes (I presume on depot_id) for both the trucks and
containers models? By default, Rails only creates indexes for the id
column, not for the referenced columns. JOINs are very sensitive to
indexes.

Can you make a quick test and report back here?

Even if my models have simple relationships, if I remove the indexes
then yes, it is going to become a nightmare, remember I am talking about
a 1M rows table here!

Apologies to the railsenvy.com guys, what they mean by “using and
abusing :include”, is that you can as I demonstrated, run into very long
queries. So they have good advice… so do I :stuck_out_tongue_winking_eye:

On Sep 24, 1:08 am, Fernando P. [email protected]
wrote:

Your setup is definitely puzzling me now. When I have some time, I will
do something similar and make my own tests. One day I might run into it,
so I want to be ready with the best solution :include VS :joins.

The setup is pretty normal. If I was going to do it via Crystal
Reports, I would have had to use sub-reports, which is going to hit
the db hard.

I see one possible area of improvement that you can quickly test though:
did you add indexes (I presume on depot_id) for both the trucks and
containers models? By default, Rails only creates indexes for the id
column, not for the referenced columns. JOINs are very sensitive to
indexes.

Can you make a quick test and report back here?

The benchmark I ran earlier had indexes on both primary keys and
foreign keys. It is actually recommended that you make indices for
foreign keys when using :include for large resultsets, otherwise MySQL
is going to have to do a table scan on a potentially large table.

Even if my models have simple relationships, if I remove the indexes
then yes, it is going to become a nightmare, remember I am talking about
a 1M rows table here!

Uhmmm, 1M rows is not large for some db folks. Look for my post at the
comp.database.mysql and see how most of the MySQL folks replied. I was
actually hoping to get tips on what to configure and if I should
already move it to a cluster but here is what I got instead:

“I work with databases that generate 5TB+/mth. To do some quick
calculations,
162M*30=4.86B records and depending on the record size - I would say
you have a
pretty healthy size database. Let’s say your nominal record length is
250
bytes, then you are storing ~1.22Tb /mth.” - Michael Austin

On Sep 23, 9:58 pm, Fernando P. [email protected]
wrote:

The equivalent with :joins, only takes 0.004s of DB time.

Unfortunately, your findings is also misleading in a sense that your
associations are simplistic compared to mine.

  • Fred mentioned that the bad case is orders having two has_manys,
    which orders doesn’t. My depot model, on the other hand, has 3
    has_manys.
  • My reporting requirements are more intensive than yours.
  • You didn’t even bother to post your :joins and :select construct.

The big advantage of :joins, is that I can use :select to only fetch the
rows I need, whereas with :include I need to retrieve all the fields.
The drawback of :joins, is that I don’t have objects with relationships
as one could expect with :include, but I consider that having objects is
overwhelming. The goal is to present retrieved data from the DB to the
user or to a reporting system, therefore objects are not that important.

Again, your model is too simplistic. Objects are going to be important
when you’re generating a report with a higher-degree of complexity to
what you’ve mentioned.

Anyway I would like to see Erol F. come with proof that :include
is faster than :joins.

Fair enough, here is my benchmark with the following dataset:

  • 10 Depots
  • 1000 Trucks per Depot
  • 1000 Containers per Depot

Don’t bother telling me that this is impossible because this is what’s
really happening; there are on average a hundred truck and a hundred
container movements per day per depot.

Depot.find(:all, :includes => [:trucks, :containers])
= 0.040000 + 1.482000 + 1.732000

Depot.find(:all, :select => “depots.id, depots.name,
trucks.plate_no”, :joins => “LEFT JOIN trucks ON trucks.depot_id =
depots.id”)
= 1.242000

Looks faster than :include, BUT, we haven’t included Containers yet,
so we’ll included it now to go with your 1-SQL-using-joins theme:

Depot.find(:all, :select => “depots.id, depots.name, trucks.plate_no,
containers.serial_no”, :joins => “LEFT JOIN trucks ON trucks.depot_id
= depots.id LEFT JOIN containers ON containers.depot_id = depots.id”)
= 20+ minutes and still going strong!

Now I would like to see Fernando P. make an efficient :select :join
for MULTIPLE has_many relationships (without relying on multiple
SQLs) which can generate the report below, given the dataset
conditions that I have given above.

Depot

  • List of Trucks -
  • List of Containers -
    Depot
  • List of Trucks -
  • List of Containers -

My conclusion is :joins >> :include by a factor 5-10, and I would not
recommend any advice from the railsenvy.com as they claim that you can
abuse of :include. This is certainly not true and can make your app go
really slow.

Again, it depends on the scenario. You’re basing your conclusion on a
simplistic model with a has_many → belongs_to chain. The
circumstances greatly change when you’re dealing with multiple
has_many relations.

Have a nice day.