ActiveRecord.find.each do

I have noticed that when I use the construct…

ActiveRecordChild.find(:all).each do |record|
  record.do_something
end

it still builds and returns the array of all records. With larger
datasets this can be murder on memory.
I was trying to do this in one of my migrations, and my computer
choked after the migration had consumed 1.8GB of memory.
I ended up having to rewrite the migration as…

max = ActiveRecordChild.maximum(:id)
1.upto(max) do |i|
  # One query per id; find_by_id returns nil for ids with no row.
  if record = ActiveRecordChild.find_by_id(i)
    record.do_something
  end
end

What I would like to propose is having find accept a block like this.

ActiveRecordChild.find(:all) do |record|
  record.do_something
end

where, instead of even composing an array of ActiveRecord objects, it
would just pass each record to the block as it is fetched,
and then forget about it as soon as the block exits.

A possible additional feature might be to collect the results of the
block like a ‘collect’ call, possibly even omitting a result when the
block returns something that evaluates as false (false or nil).
Overall, however, for memory considerations I would rather have
the block form of find return nothing; if I really want to
collect the results, I can always push something to an array.
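
To make the proposal concrete, here is a minimal sketch in plain Ruby, so it runs without a database. FakeTable, find_all, find_and_collect and build_row are hypothetical names invented for this illustration; none of them are ActiveRecord API. The sketch assumes rows can be produced one at a time, and shows both the "return nothing" form and the optional collecting form.

```ruby
# Stand-in for a table: rows are produced on demand, never held in one array.
class FakeTable
  def initialize(row_count)
    @row_count = row_count
  end

  # Hypothetical block form of find(:all): yields each row as it is
  # produced and returns nil, so no result array is built.
  def find_all
    1.upto(@row_count) { |id| yield build_row(id) }
    nil
  end

  # Hypothetical collecting variant: keeps only the block results that
  # are truthy, dropping false and nil.
  def find_and_collect
    results = []
    find_all do |row|
      value = yield(row)
      results << value if value
    end
    results
  end

  private

  # Builds one fake row; a real implementation would read it from the database.
  def build_row(id)
    { :id => id, :name => "record #{id}" }
  end
end

table = FakeTable.new(5)

seen = []
nothing = table.find_all { |row| seen << row[:id] }

evens = table.find_and_collect { |row| row[:id] if row[:id].even? }
```

In the first call only the ids pushed by the caller are retained; in the second, the array holds just the values the block chose to return.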


-Robert Ferney ( Kolbe 4357 Demonstrator / Myer Brigs INTJ )

On Sep 4, 2008, at 12:59 PM, Robert Ferney wrote:

choked after the migration had consumed 1.8GB of memory.

That would cause a query for each record and performance would very
likely suffer. When I’ve had to do a similar thing over a table with
many (100,000+) records, I’ve done something like:

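total = Model.count(to_refresh)
limit = 100
# Page through the table in fixed-size chunks; a stable :order keeps
# the offsets consistent from one query to the next.
0.step(total-1, limit) do |offset|
  Model.find(:all, :order => 'id', :limit => limit, :offset => offset).each do |model|
    # do stuff
  end
end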

If you have a condition that limits what comes back, you might have to
tweak the offset if the “stuff” you do causes records to fall out of
the condition.
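
One way to sidestep the offset tweaking is to page by primary key instead of by offset: remember the last id seen and ask for the next chunk above it. The sketch below simulates that in plain Ruby with an in-memory array standing in for the table. ROWS, next_batch, each_stale_row and the :stale flag are hypothetical names for illustration only, and the SQL in the comment is an assumption about how the query might look. Later Rails releases added find_each and find_in_batches, which batch by primary key in a broadly similar way.

```ruby
# In-memory stand-in for a table; each hash plays the role of a row.
ROWS = (1..10).map { |id| { :id => id, :stale => true } }

# Hypothetical helper: up to `limit` rows matching the condition with
# id > last_id, ordered by id. In SQL this would be roughly:
#   WHERE stale AND id > ? ORDER BY id LIMIT ?
def next_batch(rows, last_id, limit)
  matching = rows.select { |r| r[:stale] && r[:id] > last_id }
  matching.sort_by { |r| r[:id] }.first(limit)
end

# Yields every matching row, one chunk at a time, tracking the last id
# seen rather than an offset.
def each_stale_row(rows, limit)
  last_id = 0
  loop do
    batch = next_batch(rows, last_id, limit)
    break if batch.empty?
    batch.each { |row| yield row }
    last_id = batch.last[:id]
  end
end

processed = []
each_stale_row(ROWS, 3) do |row|
  row[:stale] = false    # the "stuff" makes the row fall out of the condition
  processed << row[:id]
end
```

Because each chunk starts after the last id already handled, rows dropping out of the condition do not shift the window, so no row is skipped.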

Of course, if the “do_something” is simple enough you can
use .update_all (but that’s a small subset of all the things that you
could do).

-Rob

Rob B. http://agileconsultingllc.com

If you’re using will_paginate (why wouldn’t you be using it anyway)
you can just call:

Image.paginated_each(:per_page => 20,
                     :conditions => { :cached => false },
                     :order => 'created_at asc') do |image|
  # do something with your image here
end

The paginated_each method will automatically paginate your objects so
you won’t have to load them all.

On Thu, Sep 4, 2008 at 4:13 PM, Rob B. wrote:



Maurício Linhares
http://alinhavado.wordpress.com/ (pt-br) | http://blog.codevader.com/
(en)
João Pessoa, PB, +55 83 8867-7208

On 4 Sep 2008, at 18:59, “Robert Ferney” wrote:

I have noticed that when I use the construct…

ActiveRecordChild.find(:all).each do |record|
  record.do_something
end

That it still returns the array of all records.

find(:all) has already built a normal array by the time you call each
on it. What each returns is immaterial: as far as memory is concerned,
it’s already too late.

end
end

Yuck. Would have been faster to fetch them in chunks.

A possible additional feature might be to collect the results of the
block like a ‘collect’ call, possibly even omitting a result when the
block returns something that evaluates as false (false or nil).
Overall, however, for memory considerations I would rather have
the block form of find return nothing; if I really want to
collect the results, I can always push something to an array.

Do the database adapters allow you to page through results before
they’ve received them all?