ActiveRecord row-by-row processing


#1

I want to use ActiveRecord outside of Rails to do processing on large
tables. I want to be able to process the rows as they are retrieved
rather than having to store the results in an intermediate array.

Any suggestions?

Looks like I could hack the adapter to provide an accessor to the
underlying DBI handle. Anything cleaner than that?


#2

Check out
http://wiki.rubyonrails.com/rails/pages/HowToUseActiveRecordOutsideRails,
http://wiki.rubyonrails.com/rails/pages/HowToPopulateYourDbFromScript
and
http://wiki.rubyonrails.com/rails/pages/HowToUseActiveRecordThroughStandAloneScript
for some suggestions that might help.

Regards

Dave M.


#3

Once I needed to export a big table to CVS file so I did it with this
soultion:

render :text => Proc.new { |response, output|
    # Connect to database and start generation of CSV.
    # First we send a first row, which is sorted field names, then

we write
# a row for each group of same response_id. Everything is done
by streaming
# into the output
copt = ActiveRecord::Base.configurations[RAILS_ENV]
db = Mysql.real_connect(copt[‘host’], copt[‘username’],
copt[‘password’], copt[‘database’])
mysql_result = db.query(‘SELECT * FROM table ORDER BY id’)

    while r = mysql_result.fetch_row
        # use r[1], r[2], r[3] as columns here
        # output.write(r[1] + "\n")
    end
    db.close
}

The above is used with MySQL, for other DB engine you need to use
different adapter.


Kamil


#4

If you want to do this in ActiveRecord, you wont have much luck. I
worked on some code to provide functionality like you want but there are
multiple problems.

class CoolBob < ActiveRecord::Base
end

CoolBob.each do |bob|
// DO something with each record (as it is returned from the adapter)
end

In this implementation, only 1 CoolBob object is created, and its
@attributes are updated with each row fetch. The problem here is that
Ruby being a GC’d language, it doesnt save you anything OTHER than
object creation time since it doesnt create a new object with each row.
It doesn’t save on any memory use really. Each @attributes (which is the
row data as an array) will still use memory until collected. So not much
different than creating all the objects up front.

Also, the Ruby MySQL adapter (as well as others) doesn’t support
unbuffered queries which is where you would get your speed increase and
judicial use of memory from anyways. So in short, what you want to do
isn’t really possible using AR, but is possible using the method in the
previous post, even though that still buffers the complete result set
before being given access to it.

Bob S.
http://www.railtie.net/