Inserting a Large Number of Records into the DB using Rake and Ruby

I have a text file with a couple hundred thousand rows. I use Ruby to
massage each row and then ActiveRecord to insert it into the DB.
Now… inserting a couple hundred thousand rows using ActiveRecord and
Ruby is crazy, I know.

It was beautiful to write a rake task that does everything for me, but
now the problem is time. It is taking about one minute on average to
process 1,000 records. I have an old PowerBook 867.

Now, I noticed when looking at the log file that the COMMIT operation
is what is taking so much time. I was wondering if I can delay the
commit so that it is done at the end, or after every X updates?

Or any other suggestions on how to speed this up?

Thanks my friends :-).


John K.
[email protected]

http://www.kopanas.com

http://www.soen.info

You might want to consider exporting to a file which can be bulk
loaded into the DB. We are dealing with data sources which have tens
of millions of rows, and bulk loading is the only way to efficiently
load them.
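
A rough sketch of that approach, assuming MySQL and a made-up
"records" table with name and value columns; the massage helper is a
placeholder for whatever your rake task already does to each line:

  require 'csv'

  # Write the massaged rows to a CSV file first...
  CSV.open("/tmp/records.csv", "w") do |csv|
    File.foreach("input.txt") do |line|
      name, value = massage(line)   # placeholder for your massaging step
      csv << [name, value]
    end
  end

  # ...then hand the whole file to MySQL in a single statement.
  ActiveRecord::Base.connection.execute(<<-SQL)
    LOAD DATA LOCAL INFILE '/tmp/records.csv'
    INTO TABLE records
    FIELDS TERMINATED BY ',' ENCLOSED BY '"'
    (name, value)
  SQL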

If you absolutely want to manage the commits manually, you can, via
these ActiveRecord::Base.connection methods:

begin_db_transaction
commit_db_transaction
rollback_db_transaction

Or you may be able to wrap the inserts in a transaction block.
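
Something along these lines (just a sketch; Record and massage are
placeholders for your model and your massaging code) commits once per
1,000 rows instead of once per row:

  BATCH_SIZE = 1000
  batch = []

  File.foreach("input.txt") do |line|
    batch << massage(line)                 # returns an attributes hash
    if batch.size >= BATCH_SIZE
      # One transaction, and therefore one COMMIT, per batch of inserts.
      Record.transaction { batch.each { |attrs| Record.create(attrs) } }
      batch.clear
    end
  end

  # Don't forget the leftover rows at the end of the file.
  Record.transaction { batch.each { |attrs| Record.create(attrs) } } unless batch.empty?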

Still, I’d highly recommend considering the bulk load approach.

V/r
Anthony E.

John K. wrote:

> Now, I noticed when looking at the log file that the COMMIT operation
> is what is taking so much time. I was wondering if I can delay the
> commit so that it is done at the end, or after every X updates?
>
> Or any other suggestions on how to speed this up?

ActiveRecord::Extensions (for MySQL) handles bulk inserts using
multi-value insert statements. It goes up to 40x faster than the
current ActiveRecord behavior.

API looks like:
MyModel.import( columns, array_of_value_sets, options_hash )
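
For instance (the Book model and its columns here are invented, and
:validate is just one of the options described in the docs below):

  columns = [:title, :author]
  values  = [
    ["The Mythical Man-Month", "Fred Brooks"],
    ["Programming Ruby",       "Dave Thomas"]
  ]
  # One multi-value INSERT instead of one INSERT per row.
  Book.import(columns, values, :validate => false)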

Documentation on it can be found here:
http://www.continuousthinking.com/are/import

Here are some MySQL benchmarks for the current ActiveRecord#create vs.
ActiveRecord::Extensions#import, comparing MyISAM, InnoDB, and Memory
tables:
http://www.continuousthinking.com/are/import-benchmarks-for-mysql

You can download the latest version from:
http://rubyforge.org/projects/arext/

What db adapter do you use? I am currently adding PostgreSQL support
over this holiday weekend.

Zach
