Find(:all) - lazy/eager

Hello experts,

Given some simple model and controller and simple views, what happens if
I have an Employee table with 100,000+ rows? If my controller calls
.find(:all), then when do all the objects get loaded? I know there are
issues with lazy and eager loading.

The reason I ask is because if I load the Employees into a collection
inside the script/console, then the console echos the collection to the
screen and you can see all the objects and all the “has_many” objects.
At this point, the system loaded everything.

Now, what happens in the controller if my views do not use the
collection variable?

Does the application detect that I never referenced the collection
(other than when I specified that the .find(:all) method should be
used)? Does the application never load the collection?

Philip D. wrote:

Hello experts,

Given some simple model and controller and simple views, what happens if
I have an Employee table with 100,000+ rows? If my controller calls
.find(:all), then when do all the objects get loaded? I know there are
issues with lazy and eager loading.

The reason I ask is because if I load the Employees into a collection
inside the script/console, then the console echos the collection to the
screen and you can see all the objects and all the “has_many” objects.
At this point, the system loaded everything.

Now, what happens in the controller if my views do not use the
collection variable?

Does the application detect that I never referenced the collection
(other than when I specified that the .find(:all) method should be
used)? Does the application never load the collection?

If you assign all of your employees to a variable they will be stored in
RAM. Until that variable gets reassigned or the page finishes loading it
will take up your RAM, regardless if you use it or not. If you just call
find(:all) without assigning it to a variable then it will take up RAM
as it executes, once it finishes executing it will free itself from RAM.

My advice is to only load things you wish to use, obviously. Also don’t
load 100,000 rows into a variable because it will kill your ram.

Ben J. wrote:

If you assign all of your employees to a variable they will be stored in
RAM. Until that variable gets reassigned or the page finishes loading it
will take up your RAM, regardless if you use it or not. If you just call
find(:all) without assigning it to a variable then it will take up RAM
as it executes, once it finishes executing it will free itself from RAM.

My advice is to only load things you wish to use, obviously. Also don’t
load 100,000 rows into a variable because it will kill your ram.

Thanks Ben for the input. This is exactly what is making me question
the “loading” of rails models. I feel like I have been tricked by the
Rails community into thinking that the filtering/loading/finding methods
in Rails are object oriented. (Of course you need to know a little
something about relational databases.) However, If you want to load a
subset of objects based on other objects owned by the objects, then you
can not do it without using SQL or join models or polymorphic code. The
join models are cool and nice but they are not exactly feeling like
first rate objects.

I know this is a cumbersome argument but please understand I would like
to know the trade-offs involved with the different approaches for
loading when using:

  1. use the simple single model dynamic finders
  2. use some sql
  3. use join models

I was trying to understand the first approach and how to load a subset
of parent models based on some value in it’s child model. I can’t seem
to figure out how to do this without resorting to steps 2 or 3.

To put it another way, no matter how clever I think I can be, I still
end up doing some post collection load processing. It feels like rails
is tricking us into thinking that it does not always load every row.
However, when addressing non-simple models, I feel like I am always
forced to load all rows and then do post-processing.

Interestingly, there seems to be a relationship between the
filtering/loading styles and the combination of polymorphic and join
model features. For websites of any serious use, there will be lots of
parent models which own child models and there will be times when I want
a subset of parent models based on the child model values. I do not
think there is a way to address these parent-child type filters without
using sql or join models or polymorphic type objects.

Thanks,

Eric A. wrote:
ActiveRecord is a thin wrapper around the relational model. There are
other ORMs that operate at a higher level although I find that they
often don’t give you the low-level access that you occasionally need. I
think Rails reaches a nice compromise of being a thin wrapper with some
high-level features (such as dynamic finders and :include), but also
providing complete low-level access.


Wow, Eric. You nailed everything in place. That last paragraph you
wrote is the gem of information I was looking for. I appreciate your
very generous response. I have read the Agile Rails book. Sadly the
book fails to re-iterate the point you made. Also, it is easy to forget
(as I did) that everything eventually boils down to find_by_sql.

Thanks for clearing up my head.

Philip

Philip D. wrote:

The reason I ask is because if I load the Employees into a collection
inside the script/console, then the console echos the collection to the
screen and you can see all the objects and all the “has_many” objects.
At this point, the system loaded everything.

I’m not sure I understand what you are doing here. What is the
“has_many” objects you are referring to? You have an employee model:

class Employee < ActiveRecord::Base
end

If also said that you are trying to search based on a related object (in
your other email). So I assume you have something like:

class Employee < ActiveRecord::Base
has_many :tasks, :dependent => :destroy
end
class Task < ActiveRecord::Base
belongs_to :employee
end

Now you want to load all employees that have high priority tasks. So you
do:

@lazy_employees = Employee.find :all, :include => ‘tasks’,
:conditions => [‘tasks.priority = ?’, ‘high’]

Now your complaint is that the tasks that meet the conditions are loaded
with the above code which you feel is a bit wasteful? Am I understanding
your problem correctly?

If that is the case there are a few things you can do:

  1. Make sure it really matters. Does that extra loading of data actually
    cause a performance problems? If not don’t worry about it. The above
    code is simple to understand. If it is an actual performance problem
    then proceed.

  2. You could switch from using :include to :joins. This allows you to
    join without loading the actual related objects. So:

@lazy_employees = Employee.find :all,
:joins => %q{INNER JOIN tasks ON
tasks.employee_id = employees.id AND
tasks.priority = ‘high’}

You are removing the convienance of what :include provides (automatic
joining and automatic object loading of the related object) but you gain
more control. Your join is now a inner join instead of an outer join
which is much more efficient. Also you are not loading those related
objects.

  1. If you want to keep the loading of related objects (for example if
    you use those objects in your code and want to avoid N+1 query problem)
    you can also improve the first attempt using the :select option so only
    the fields you actually use are loaded. For example:

@lazy_employees = Employee.find :all, :include => ‘tasks’,
:select => ‘employees.name, task.name, tasks.due_on’,
:conditions => [‘tasks.priority = ?’, ‘high’]

This limits you to only being able to print the employee name, task name
and due date but it saves on the amount of data that has to be loaded.

  1. If you don’t care about related objects and only want certain fields
    you can combine step 2 and 3 to produce:

@lazy_employees = Employee.find :all,
:select => ‘employees.name’,
:joins => %q{INNER JOIN tasks ON
tasks.employee_id = employees.id AND
tasks.priority = ‘high’}

Now you are only loading the data you need, using an INNER JOIN to
reduce the load in SQL and not loading the related objects that you
don’t plan to print anything out about. But obviously now it is much
uglier. If you really need the performance try to push the code down to
the model level so your controllers and view remain easy to work with.
See Buckblog: Skinny Controller, Fat Model
for more info on this process.

In the end everything resolves to find_by_sql. Depending on your needs
you might be able to operate at a very high level like the first
attempt. Or you may need to operate almost at the SQL-level like the
last attempt. You always still have the option of operating completely
at sql if you need absolute control. Determining at what level to
operate at is determined by a balance of performance and code
readability. I have found in practice I rarely use actual SQL code. Most
my my queries look like:

Employee.find_by_last_name(‘Anderson’)
Employee.find :all, :include => :departments,
:conditions => [‘departments.name = ?’, ‘IT’]

But every app will have a few queries that get closer to SQL. I have
developed around 10 Rails applications thus far and never had a need for
find_by_sql although I don’t rule it out.

ActiveRecord is a thin wrapper around the relational model. There are
other ORMs that operate at a higher level although I find that they
often don’t give you the low-level access that you occasionally need. I
think Rails reaches a nice compromise of being a thin wrapper with some
high-level features (such as dynamic finders and :include), but also
providing complete low-level access.

Eric

Philip D. wrote:

Wow, Eric. You nailed everything in place. That last paragraph you
wrote is the gem of information I was looking for. I appreciate your
very generous response. I have read the Agile Rails book. Sadly the
book fails to re-iterate the point you made. Also, it is easy to forget
(as I did) that everything eventually boils down to find_by_sql.

In addition to the article on skinny controller, fat model Jamis also
has some excellent “under the hood” articles.

http://weblog.jamisbuck.org/under-the-hood

The articles about ActiveRecord::Base.find you might find to be useful.

Eric