Forum: Ruby on Rails how to insert a batch of records into db more efficiently

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
Guanliang L. (Guest)
on 2007-05-21 15:44
I am currently using the standard AR method create to insert records
into oracle db. It seems that the create method generates an INSERT sql
command for each record, and execute them separately. I felt that it was
a bit slow.(For a record with about 50 columns, the insert speed is 25.3
records per second, and for a record with 4 columns, the insert speed is
115.25 records per second)

Since all the data have been stored in an array/hash (as the structure
in db) before the inserting, I wonder if there is a more efficient way
to write all the data into the db.(e.g. a table level command/an one-off
command to insert or update a table)

Is there anyone who is experienced in that field? I really appreciate
your help.


cheers
MichaelLatta (Guest)
on 2007-05-21 19:23
(Received via mailing list)
I would not say this is the best style (pretty ugly in fact), but for
bulk inserts I have gotten much better performance using a raw SQL
insert.

You can get the connection to the DB using the 'connection' method on
an AR class.  Then there are methods to execute raw SQL commands
documented under "database statements" module at
http://api.rubyonrails.org/classes/ActiveRecord/Co....

Using this you can insert multiple rows, or just avoid the AR overhead
for creating and dealing with objects.

On May 21, 4:44 am, Guanliang L. <removed_email_address@domain.invalid>
Ball, Donald A Jr (Library) (Guest)
on 2007-05-21 20:49
(Received via mailing list)
> more efficient way to write all the data into the db.(e.g. a
> table level command/an one-off command to insert or update a table)

You could either use the AR connection directly to write raw sql, or you
could try using the ActiveRecord Extensions module. I don't know that it
supports true batch insertion, but it does obviate the need to construct
a model object for each new row, which is where most of the cost of mass
imports seems to be. My mass imports went from hours to minutes once I
switched to it.

- donald
dasil003 (Guest)
on 2007-05-21 22:00
(Received via mailing list)
Yes, ar::extensions does do a true batch insert, it even supports
MySQL's ON DUPLICATE KEY UPDATE syntax, and it also verifies the size
of the query and breaks it down into individual queries as necessary
(ie. if you are inserting more than the MAX_PACKET_SIZE).  It also
does a lot of other cool stuff.  The hash-based condition extensions
via suffix are my favorite:

http://www.continuousthinking.com/tags/arext



On May 21, 10:47 am, "Ball, Donald A Jr (Library)"
Guanliang L. (Guest)
on 2007-05-22 17:55
Thanks a lot to all you guys :)

I will try the AR::extensions when I finish the current module. It
sounds  a neater way than writing raw sql directly.
Guanliang L. (Guest)
on 2007-05-22 19:35
OK, here is another quick question.

I have installed the AR-E using the following command

gem install ar-extensions

the prompted message said the installation was successful. but when I
try to invoke the *import* method, I got

NoMethodError in PasController#import
undefined method `import' for Paspatient:Class


I did exactly the same thing as the example showed in
http://www.continuousthinking.com/are/import. Is there any setups I
should do before using it? or do I need to add a *require '...'*
somewhere in my Paspatient class or in my controller?


Many thanks.
zdennis (Guest)
on 2007-06-08 07:54
(Received via mailing list)
I answered this in a private email, but in case someone searches
archives and has this problem. The issue was that import was being
called on the controller and not on the model. It must be called on
the model,

  ie: Book.import

See http://www.continuousthinking.com/tags/arext  for more information

Zach

On May 22, 11:35 am, Guanliang L. <removed_email_address@domain.invalid>
EP (Guest)
on 2007-06-15 11:34
Hi, I don't know if anyone can help me but I was trying to do a batch
insert but got an error saying that there was a problem with the
generated SQL.
Anyone know how to fix this?

<2007-06-15 00:10:46><DEBUG>   SQL (0.012701)   INSERT INTO numbers
(`num1`,`num2`) VALUES
(509081924,1620644),(509081924,500231348),(509081924,509659840),(509081924,524396516),(509081924,528220398),(509081924,534085839),(509081924,547781856),(509081924,573547171),(509081924,574312014),(509081924,577805238),(509081924,582805398),(509081924,589756118),(509081924,607286192),(509081924,607425581),(509081924,619176048),(509081924,620010268),(509081924,634632018),(509081924,642685358),(509081924,657185365),(509081924,692756207),(509081924,692790495),(509081924,704706222),(509081924,771555161),(509081924,772265233),(509081924,791020130),(509081924,898075257)
<2007-06-15 00:10:46><DEBUG>   Review Load (0.000000)   Mysql::Error:
#42000You have an error in your SQL syntax; check the manual that
corresponds to your MySQL server version for the right syntax to use
near '=> 509081924,1620644) LIMIT 1' at line 1: SELECT * FROM reviews
WHERE (user_id => 509081924,1620644) LIMIT 1



zdennis wrote:
> I answered this in a private email, but in case someone searches
> archives and has this problem. The issue was that import was being
> called on the controller and not on the model. It must be called on
> the model,
>
>   ie: Book.import
>
> See http://www.continuousthinking.com/tags/arext  for more information
>
> Zach
>
> On May 22, 11:35 am, Guanliang L. <removed_email_address@domain.invalid>
zdennis (Guest)
on 2007-07-11 06:13
(Received via mailing list)
Can you send me the import code you had written that produced this
error? Email me privately, thanks,

Zach
Karthi k. (Guest)
on 2007-12-07 08:11
zdennis wrote:
> Can you send me the import code you had written that produced this
> error? Email me privately, thanks,
>
> Zach

Can any one show the sample code here for how to use the
ActiveRecord::extensions methods?

It will be useful for every one to get some idea on that.
Michael G. (Guest)
on 2010-03-26 18:31
My friend Jonah wrote Crewait, which is easier and faster than
ARExtensions for bulk insertion.

http://www.jonah.org/articles/crewait_go_.html
Fitz F. (Guest)
on 2010-04-01 19:50
I have been facing this problem for a while now.  ar-extensions is
really good but it doesn't preserve the links between models like
one-to-one many-to-many etc.

I am trying to insert ~1M records daily.  ActiveRecord hasn't a hope of
doing this in a reasonable time.

I am hoping that I am using the plugin in the wrong way.  Anyone else
have similar problems?

Cheers,
Patrick
Josh _ (Guest)
on 2010-04-01 23:26
(Received via mailing list)
On Thu, Apr 1, 2010 at 8:50 AM, Fitz F. <removed_email_address@domain.invalid> 
wrote:

> I have been facing this problem for a while now.  ar-extensions is
> really good but it doesn't preserve the links between models like
> one-to-one many-to-many etc.
>

Have you tried doing batches via transactions? By default activerecord
will
make every save operation it's own transaction which adds overhead to
the DB
in many setups. See ActiveRecord::Transactions::ClassMethods, and put
ten or
a hundred records at a time in a transaction wrapper and see what you
gain.
The improvement you see from this depends on the DB engine in use as
well as
how much the DB is actually the bottleneck versus the processing you're
doing application-side per record. I suspect your problem is actually a
combination of both DB overhead and unoptimized application logic. You
might
want to try some profiling of your application to see where the
bottleneck
really is - at the scale of 1M operations per day it's worth doing some
of
that.

That's my 2 cents, the disclaimer is that I am by no means a rails
expert.
Just trying to help out where I can.

jsw
Fernando F. (Guest)
on 2011-05-30 05:02
Josh _ wrote in post #901845:
> On Thu, Apr 1, 2010 at 8:50 AM, Fitz F. <removed_email_address@domain.invalid> wrote:
>
>> I have been facing this problem for a while now.  ar-extensions is
>> really good but it doesn't preserve the links between models like
>> one-to-one many-to-many etc.
>>
>
> Have you tried doing batches via transactions? By default activerecord
> will
> make every save operation it's own transaction which adds overhead to
> the DB
> in many setups. See ActiveRecord::Transactions::ClassMethods, and put
> ten or
> a hundred records at a time in a transaction wrapper and see what you
> gain.

A m a z i n g !

I had outstanding improvements !

Just wrap your bulk create like this:
Model.transaction do
  something.each {
        m = create(...)
        m.save
  }
end

And the best thing is you dont loose your validations.

Thanks a lot,
Fernando.
This topic is locked and can not be replied to.