Fast Bulk Inserts?


#1

I need to insert ~100k rows into the database, but using “Table.new(…).save” is painfully slow (I notice in the logs that it’s wrapping each insert in a transaction – that can’t be fast).

Any suggestions on how I can improve the speed of bulk inserts like
this?

Thanks …

– Steve


#2

I don’t know if this is the best option, but based on the fact that it wraps each insert in a transaction, you could just start a transaction, insert all the rows, and then commit. Is that going to be significantly faster? I’m not sure.

Pat


#3

On 12/15/05, Pat M. wrote:

I don’t know if this is the best option, but based on the fact that it
wraps each insert in a transaction, you could just start a
transaction, insert all the rows, and then commit. Is that going to
be significantly faster? I’m not sure.

Pat

+1 It will be significantly faster in my experience. I used a Ruby script with REXML and ActiveRecord to do a mass import of XML data into one of my app’s tables back in June. Not necessarily the fastest approach, but quick and easy to write. It’s probably faster now, since AR has had some performance optimizations since then.


#4

Pat M. wrote:

I don’t know if this is the best option, but based on the fact that it
wraps each insert in a transaction, you could just start a
transaction, insert all the rows, and then commit. Is that going to
be significantly faster? I’m not sure.

Yup, had the same thought, and after some benchmarking, wrapping the whole thing in a single transaction gives about a 22% speed increase. However, I’m still only (?) getting about 186 rows/sec insert velocity on MySQL/InnoDB. Any other ideas?

Also, I’m wondering if this method will work on DBs that don’t support transactions, e.g. MySQL/MyISAM. Or are we not bothering to support those silly people? ;)

– Steve


#5

Andreas S. wrote:

However, I’m still only (?) getting about 186 rows/sec insert velocity on MySQL/InnoDB. Any other ideas?

Try disabling indexes.

Uh, how? Short of dropping/adding them, that is.

– Steve


#6

Steve S. wrote:

Pat M. wrote:

I don’t know if this is the best option, but based on the fact that it
wraps each insert in a transaction, you could just start a
transaction, insert all the rows, and then commit. Is that going to
be significantly faster? I’m not sure.

Yup, had the same thought, and after some benchmarking, wrapping the whole thing in a single transaction gives about a 22% speed increase. However, I’m still only (?) getting about 186 rows/sec insert velocity on MySQL/InnoDB. Any other ideas?

Try disabling indexes.


#7

Steve S. wrote:

Andreas S. wrote:

However, I’m still only (?) getting about 186 rows/sec insert velocity on MySQL/InnoDB. Any other ideas?

Try disabling indexes.

Uh, how? Short of dropping/adding them, that is.

MySQL has a statement for it. I can’t tell you off-hand what it’s called; you’ll have to look it up in the manual.
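For anyone searching later: the statement is most likely ALTER TABLE … DISABLE KEYS. A sketch (table name hypothetical), with the caveat that it only defers non-unique index maintenance on MyISAM tables, so it may not help Steve’s InnoDB table:

```sql
-- MyISAM only: defer non-unique index rebuilds until ENABLE KEYS
ALTER TABLE my_table DISABLE KEYS;
-- ... bulk inserts here ...
ALTER TABLE my_table ENABLE KEYS;

-- InnoDB has no DISABLE KEYS; the closest session-level knobs are:
SET unique_checks = 0;
SET foreign_key_checks = 0;
-- ... bulk inserts here ...
SET unique_checks = 1;
SET foreign_key_checks = 1;
```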


#8

On 12/15/05, Steve S. wrote:

I need to insert ~100k rows into the database, but using “Table.new(…).save”
is painfully slow (I notice in the logs that it’s wrapping each insert in a
transaction – that can’t be fast).

Any suggestions on how I can improve the speed of bulk inserts like this?

Is using the native database tools an option, or do you need to use ActiveRecord? Both PostgreSQL and MySQL have fast methods for bulk loading data.

Chris


#9

Steve S. wrote:

Andreas S. wrote:

However, I’m still only (?) getting about 186 rows/sec insert velocity on MySQL/InnoDB. Any other ideas?

Try disabling indexes.

Uh, how? Short of dropping/adding them, that is.

Dropping, then re-adding the indexes is correct. That’s the only way I know of on MySQL and PostgreSQL.

Jason


#10

snacktime wrote:

ActiveRecord? Both PostgreSQL and MySQL have fast methods for bulk loading data.

Does anybody know if ActiveRecord automatically takes advantage of PostgreSQL prepared statements?

http://www.postgresql.org/docs/8.1/interactive/sql-prepare.html

Jeroen


#11

Jason E. wrote:

Steve S. wrote:

Andreas S. wrote:

Try disabling indexes.

drop, then adding indexes is correct. That’s the only way I know how on
mysql and postgresql.

Dropping the indexes seems to have a negligible impact (5:23 instead of 5:29), but that’s not too surprising, as the table only has 3 columns (and the index is on two of them).

Any other ideas? E.g., is there any way to insert multiple rows with a single SQL statement? I know MySQL will do it, but PgSQL won’t, and I don’t think it’s ANSI anyway.

– Steve


#12

On 15-Dec-05, at 2:33 PM, Steve S. wrote:

I need to insert ~100k rows into the database, but using “Table.new(…).save” is painfully slow (I notice in the logs that it’s wrapping each insert in a transaction – that can’t be fast).

Any suggestions on how I can improve the speed of bulk inserts like
this?

It’s pretty easy to add a class method #batch_insert to your AR class, or to all classes. If you are using MySQL, the syntax is VALUES (…), (…), etc.

I suggest saving a few thousand records per statement; the string for 100k records would get pretty big. :)
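A rough sketch of what such a #batch_insert could build (table/column names are just for illustration, and the quoting below is deliberately naive — real code should use the connection adapter’s quoting):

```ruby
# Build multi-row INSERT statements in slices, so no single SQL string
# gets too large. Quoting handles only integers and simple strings;
# production code should use the adapter's quote method instead.
def batch_insert_sql(table, columns, rows, batch_size: 1000)
  rows.each_slice(batch_size).map do |slice|
    values = slice.map do |row|
      "(" + row.map { |v| v.is_a?(String) ? "'#{v.gsub("'", "''")}'" : v.to_s }.join(", ") + ")"
    end.join(", ")
    "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{values}"
  end
end

sqls = batch_insert_sql("people", %w[id name], [[1, "Ann"], [2, "Bo"]], batch_size: 2)
puts sqls.first
# => INSERT INTO people (id, name) VALUES (1, 'Ann'), (2, 'Bo')
```

Each resulting statement could then be sent through something like ActiveRecord::Base.connection.execute, one round-trip per few thousand rows instead of one per row.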


#13

Steve S. wrote:

single SQL statement? I know MySQL will do it, but PgSQL won’t, and I
don’t think it’s ANSI anyway.

Are you still using TableModel.new for each record? If so, that’s incredibly slow.

If you are using MySQL, contact me off-list if you want my ActiveRecord extension (which only works for MySQL right now); it increases the speed 100x when inserting large sets of data into MySQL.

Zach


#14

On 16/12/2005, at 12:51 PM, Steve S. wrote:

Any other ideas? E.g., is there any way to insert multiple rows
with a single SQL statement? I know MySQL will do it, but PgSQL
won’t, and I don’t think it’s ANSI anyway.

PostgreSQL has a handy COPY command which imports TSV or CSV data, either from stdin or a file:

COPY my_table(column1, column2) FROM STDIN WITH CSV;
"thing","stuff"
"foo","bar"

etc.

See http://www.postgresql.org/docs/8.1/interactive/sql-copy.html for more details.
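If you want to stay in Ruby, one low-tech route is to generate the CSV with the standard library and hand the resulting file to COPY (table/column names and the file path below are hypothetical):

```ruby
require "csv"

# Build the CSV, then load it with a server-side file path:
#
#   COPY my_table (column1, column2) FROM '/tmp/bulk.csv' WITH CSV;
#
# or, from psql, \copy reads a client-side file instead.
rows = [["thing", "stuff"], ["foo", "bar"]]

csv = CSV.generate do |out|
  rows.each { |r| out << r }
end

puts csv
# thing,stuff
# foo,bar
```

Writing the string to a file (e.g. with File.write) and running COPY is then one statement for the whole data set, which is what makes it so much faster than row-by-row inserts.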


Phillip H.