Forum: Ruby on Rails - Fast Bulk Inserts?

Steve S. (Guest)
on 2005-12-15 21:34
(Received via mailing list)
I need to insert ~100k rows into the database, but using
"Table.new(...).save" is painfully slow (I notice in the logs that it's
wrapping each insert in a transaction -- that can't be fast).

Any suggestions on how I can improve the speed of bulk inserts like
this?

Thanks ...

-- Steve
Pat M. (Guest)
on 2005-12-15 21:37
(Received via mailing list)
I don't know if this is the best option, but based on the fact that it
wraps each insert in a transaction, you could just start a
transaction, insert all the rows, and then commit.  Is that going to
be significantly faster?  I'm not sure.

Pat
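
For illustration, a minimal sketch of the single-transaction approach Pat
describes; the Record model and the rows array of attribute hashes are
placeholder names, not from the thread:

# Open one transaction, do all the inserts inside it, and commit once at
# the end instead of once per row.
Record.transaction do
  rows.each do |attributes|
    Record.new(attributes).save
  end
end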
Obie F. (Guest)
on 2005-12-15 21:43
(Received via mailing list)
On 12/15/05, Pat M. <removed_email_address@domain.invalid> wrote:
> I don't know if this is the best option, but based on the fact that it
> wraps each insert in a transaction, you could just start a
> transaction, insert all the rows, and then commit.  Is that going to
> be significantly faster?  I'm not sure.
>
> Pat

+1 It will be significantly faster in my experience. I used a REXML
and ActiveRecord ruby script to do a mass import of XML data into one
of my app's tables back in June. Not necessarily the fastest
performance, but quick and easy to write. Probably faster now since AR
has had some performance optimizations since then.
Steve S. (Guest)
on 2005-12-16 00:49
(Received via mailing list)
Pat M. wrote:
> I don't know if this is the best option, but based on the fact that it
> wraps each insert in a transaction, you could just start a
> transaction, insert all the rows, and then commit.  Is that going to
> be significantly faster?  I'm not sure.

Yup, had the same thought, and after some benchmarking, wrapping the
whole thing in a single transaction gives about a 22% speed increase.
However, I'm still only (?) getting about 186 rows/sec insert velocity
on MySQL/InnoDB.  Any other ideas?

Also, I'm wondering if this method will work on DBs that don't support
transactions, e.g. MySQL/MyISAM.  Or are we not bothering to support
those silly people? ;)

-- Steve
Andreas S. (Guest)
on 2005-12-16 01:05
Steve S. wrote:
> Pat M. wrote:
>> I don't know if this is the best option, but based on the fact that it
>> wraps each insert in a transaction, you could just start a
>> transaction, insert all the rows, and then commit.  Is that going to
>> be significantly faster?  I'm not sure.
>
> Yup, had the same thought, and after some benchmarking, wrapping the
> whole thing in a single transaction gives about a 22% speed increase.
> However, I'm still only (?) getting about 186 rows/sec insert velocity
> on MySQL/InnoDB.  Any other ideas?

Try disabling indexes.
Steve S. (Guest)
on 2005-12-16 01:17
(Received via mailing list)
Andreas S. wrote:
>> However, I'm still only (?) getting about 186 rows/sec insert velocity on MySQL/InnoDB.
>> Any other ideas?
>
> Try disabling indexes.

Uh, how?  Short of dropping/adding them, that is.

-- Steve
Andreas S. (Guest)
on 2005-12-16 01:36
Steve S. wrote:
> Andreas S. wrote:
>>> However, I'm still only (?) getting about 186 rows/sec insert velocity on MySQL/InnoDB.
>>> Any other ideas?
>>
>> Try disabling indexes.
>
> Uh, how?  Short of dropping/adding them, that is.

MySQL has a statement for it. I can't tell you off-hand what it's called;
you'll have to look it up in the manual.
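
(For reference, the statement presumably meant here is ALTER TABLE ...
DISABLE KEYS. A minimal sketch of issuing it from Ruby follows; note that
it only defers non-unique index updates on MyISAM tables, so it won't help
an InnoDB table, and the "records" table name is a placeholder:)

conn = ActiveRecord::Base.connection
conn.execute("ALTER TABLE records DISABLE KEYS")
# ... perform the bulk insert here ...
conn.execute("ALTER TABLE records ENABLE KEYS")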
Jason E. (Guest)
on 2005-12-16 01:56
(Received via mailing list)
Steve S. wrote:

> Andreas S. wrote:
>
>>> However, I'm still only (?) getting about 186 rows/sec insert
>>> velocity on MySQL/InnoDB. Any other ideas?
>>
>>
>> Try disabling indexes.
>
>
> Uh, how?  Short of dropping/adding them, that is.

Dropping, then re-adding the indexes is correct. That's the only way I
know of on MySQL and PostgreSQL.

Jason
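
For illustration, a minimal sketch of dropping an index and rebuilding it
after the bulk insert, using the schema methods on the ActiveRecord
connection; the table and column names are placeholders:

conn = ActiveRecord::Base.connection
conn.remove_index :records, :some_column   # drop the index up front
# ... perform the bulk insert here ...
conn.add_index :records, :some_column      # rebuild it in one pass afterwards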
snacktime (Guest)
on 2005-12-16 02:14
(Received via mailing list)
On 12/15/05, Steve S. <removed_email_address@domain.invalid> wrote:
> I need to insert ~100k rows into the database, but using "Table.new(...).save"
> is painfully slow (I notice in the logs that it's wrapping each insert in a
> transaction -- that can't be fast).
>
> Any suggestions on how I can improve the speed of bulk inserts like this?

Is using the native database tools an option, or do you need to use
ActiveRecord?  Both PostgreSQL and MySQL have fast methods for bulk
loading data.

Chris
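
For illustration, a sketch of handing the load to MySQL's bulk loader
(LOAD DATA INFILE) instead of doing per-row saves; the file path, table
and column names are placeholders, the client must allow LOCAL INFILE, and
no escaping of tabs or newlines in values is done. PostgreSQL's COPY plays
the same role there:

# Write the rows out as a tab-separated file first (rows is assumed to be
# an array of value arrays in column order).
File.open("/tmp/records.tsv", "w") do |f|
  rows.each { |values| f.puts(values.join("\t")) }
end

# Then let MySQL load the whole file in one statement.
ActiveRecord::Base.connection.execute(<<-SQL)
  LOAD DATA LOCAL INFILE '/tmp/records.tsv'
  INTO TABLE records
  FIELDS TERMINATED BY '\\t'
  (col_a, col_b, col_c)
SQL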
Jeroen H. (Guest)
on 2005-12-16 12:01
(Received via mailing list)
snacktime wrote:
> ActiveRecord?  Both PostgreSQL and MySQL have fast methods for bulk
> loading data.

Does anybody know if ActiveRecord automatically takes advantage of
Postgresql prepared statements?

http://www.postgresql.org/docs/8.1/interactive/sql...

Jeroen
Nicholas S. (Guest)
on 2005-12-17 07:47
(Received via mailing list)
On 15-Dec-05, at 2:33 PM, Steve S. wrote:

> I need to insert ~100k rows into the database, but using "Table.new
> (...).save" is painfully slow (I notice in the logs that it's
> wrapping each insert in a transaction -- that can't be fast).
>
> Any suggestions on how I can improve the speed of bulk inserts like
> this?

It's pretty easy to add a class method #batch_insert to your AR
class, or to all classes. If you are using MySQL, the syntax is
VALUES (...), (...), etc.

I suggest inserting a few thousand records per statement; the string
for 100k records would get pretty big. :-)
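
For illustration, a rough sketch of such a #batch_insert using MySQL's
multi-row VALUES syntax; the Record model, column names and the 2,000-row
chunk size are made up for the example:

class Record < ActiveRecord::Base
  # rows is an array of value arrays in column order, e.g. [[1, "a"], [2, "b"]]
  def self.batch_insert(rows, chunk_size = 2000)
    rows.each_slice(chunk_size) do |chunk|
      values = chunk.map do |row|
        "(" + row.map { |v| connection.quote(v) }.join(", ") + ")"
      end.join(", ")
      connection.execute(
        "INSERT INTO #{table_name} (col_a, col_b) VALUES #{values}"
      )
    end
  end
end

# Usage: Record.batch_insert(rows)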
Steve S. (Guest)
on 2005-12-17 14:42
(Received via mailing list)
Jason E. wrote:
> Steve S. wrote:
>> Andreas S. wrote:
 >>>
>>> Try disabling indexes.
 >
> drop, then adding indexes is correct. That's the only way I know how on
> mysql and postgresql.

Dropping the indexes seems to have a negligible impact (5:23 instead of
5:29), but that's not too surprising, as the table only has 3 columns
(and the index is on two of them).

Any other ideas?  E.g., is there any way to insert multiple rows with a
single SQL statement?  I know MySQL will do it, but PgSQL won't, and I
don't think it's ANSI anyway.

-- Steve
Phillip H. (Guest)
on 2005-12-17 14:42
(Received via mailing list)
On 16/12/2005, at 12:51 PM, Steve S. wrote:

> Any other ideas?  E.g., is there any way to insert multiple rows
> with a single SQL statement?  I know MySQL will do it, but PgSQL
> won't, and I don't think it's ANSI anyway.

PostgreSQL has a handy COPY command which imports TSV or CSV data,
either from stdin or a file:
COPY my_table(column1, column2) FROM STDIN WITH CSV;
"thing","stuff"
"foo","bar"

etc.

See http://www.postgresql.org/docs/8.1/interactive/sql-copy.html for
more details
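
For illustration, the same COPY can also be issued from Ruby once the CSV
has been written somewhere the database server itself can read it; note
that server-side COPY FROM a file generally requires superuser rights, and
the file path here is a placeholder:

# Load the whole CSV file in one statement.
ActiveRecord::Base.connection.execute(
  "COPY my_table(column1, column2) FROM '/tmp/my_table.csv' WITH CSV"
)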


--
Phillip H.
removed_email_address@domain.invalid
Zach D. (Guest)
on 2005-12-17 14:42
(Received via mailing list)
Steve S. wrote:
> Any other ideas?  E.g., is there any way to insert multiple rows with a
> single SQL statement?  I know MySQL will do it, but PgSQL won't, and I
> don't think it's ANSI anyway.
>

Are you still using TableModel.new for each record? If so, this is
incredibly slow.

If you are using MySQL, contact me off-list if you want my ActiveRecord
extension (which only works for MySQL right now); it increases the speed
100x when inserting large sets of data.

Zach