Updating a Field in All Records of a Model, Efficiently?

Hello Everyone,

I have a Model for which there are many thousands (hundreds of
thousands) of records. I’d like to update the value of one field for
each record in a rake task.

A ruby based function must be applied to this field for all records
(i.e. can’t perform the algorithm in an SQL UPDATE alone).

First Thought:
MyModel.find(:all).each { |m| m.my_field = function_call_here;
m.save }
Load all the models into memory, and I’d like to avoid that.

Second Thought:
Get the maximum “id” of the Model’s records and program a loop, doing
a “find_by_id” for each record.
While the model does have an “id” field, some records have been
deleted, so I would have to check that the “find_by_id” is nil…

I’m not worried about new records being added to the database as this
update will be run during a “maintenance” period.

Anyone have any thoughts on how you would update every record for a
model when you have a large number of records?

Thank you for your time and help.

-Jim

I may not understand your second thought, but I have a hard time
imagining it will be faster than the loop over all records in your first
thought. Since you’ve got to retrieve the record & make it an AR
instance in any case, you may as well not mess around guessing IDs…

Are you sure you can’t translate your ruby function to SQL? That’s
going to be loads faster… Can you break your records up into chunks
that should all get the same value of my_field & do a series of subset
UPDATEs maybe?

On Sep 19, 2008, at 6:55 PM, Pardee, Roy wrote:

I have a Model for which there are many thousands (hundreds of
Second Thought:

Thank you for your time and help.

-Jim

Jim,

When I’ve had to do a similar thing over a table with
many (100,000+) records, I’ve done something like:

total = Model.count
limit = [ 100, total ].min
0.step(total-1, limit) do |offset|
Model.find(:all, :limit => limit, :offset => offset).each do |model|
# do stuff
end
end

If you have a condition that limits what comes back, you might have to
tweak the offset if the “stuff” you do causes records to fall out of
the condition.

Roy’s idea of a series of Model.update_all() calls for each unique set
of whatever goes into your function sounds like a winner if the
function is predictable enough.

-Rob

Rob B. http://agileconsultingllc.com
[email protected]