Importing / Parsing Large Excel Files?


#1

Hello,

I’m working on a project where a client has large Excel files
(60,000+ records per file) and needs an application to import them
into a database so the data can be used for reporting, calculations
and so on.

I know about the available libraries:

http://raa.ruby-lang.org/project/parseexcel/
http://rubyforge.org/projects/spreadsheet/
http://rubyforge.org/projects/roo/

I’ve used parseexcel before, but only for small files. The point is
that the parseexcel and spreadsheet libraries have a reputation for
not handling large Excel files (I don’t know about ‘roo’).
The guy here
(http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/223230)
built a bridge between Ruby and Java to use ‘JavaExcelAPI’, and it
worked fine for him!

Has anyone tried importing large Excel files with Ruby? Do I really
need to use a Java library through a bridge to get the job done?
Personally I would rather hire a Java developer to write the Excel
import module than build a bridge; correct me if I’m wrong.

What do you suggest, guys? Is there some great Ruby library for the
job that I’ve never heard of? Or would you suggest a tweak such as
parsing the Excel file with ‘parseexcel’ in chunks of, say, 500
records at a time?

I’m kinda lost and I need your help.

Regards
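
The "500 records at a time" idea from the post above can be sketched in plain Ruby. This is only a sketch under assumptions: `rows` stands for any enumerable of row arrays (whatever the chosen Excel library yields), and `insert_batch` is a hypothetical callback standing in for the real database insert. Batching the inserts keeps each database round trip small, but it does not by itself stop a parser from loading the whole workbook into memory.

```ruby
# Feed rows to the database in fixed-size batches instead of one at a time.
# `rows` is any Enumerable of row arrays; `insert_batch` is a hypothetical
# callable that performs the real INSERT (for example inside a transaction).
def import_in_batches(rows, batch_size: 500, insert_batch:)
  total = 0
  rows.each_slice(batch_size) do |batch|
    insert_batch.call(batch)
    total += batch.size
  end
  total
end

# Usage with fake data: 1,200 rows produce batches of 500, 500 and 200.
batch_sizes = []
fake_rows = (1..1200).map { |i| [i, "name #{i}"] }
imported = import_in_batches(fake_rows, insert_batch: ->(batch) { batch_sizes << batch.size })
puts "imported #{imported} rows in batches of #{batch_sizes.inspect}"
```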


#2

Other options include Ruby’s WIN32OLE and DBI libraries. For the
former, you’ll have to be running on a machine with Excel installed.
You’d use the latter on a Windows machine to read the spreadsheets in
via ODBC. Your connection string would be something like:

connection_string = "dbi:ADO:" +
                    "Provider=MSDASQL;" +
                    "Persist Security Info=False;" +
                    'Extended Properties="dbq=c:/path/to/spreadsheet.xls";'

HTH,

-Roy
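
A minimal sketch of how that connection string might be used, for anyone who wants to try the DBI route. The assumptions are mine, not Roy's: the `dbi` gem with its ADO driver is installed on a Windows machine, and the worksheet is addressed as a table named `[Sheet1$]`. The helper names `excel_dsn` and `read_sheet` are made up for illustration.

```ruby
# Build the ODBC connection string from the post for a given workbook path.
def excel_dsn(path)
  'dbi:ADO:' \
    'Provider=MSDASQL;' \
    'Persist Security Info=False;' \
    "Extended Properties=\"dbq=#{path}\";"
end

# Read every row of one worksheet via DBI (Windows only).
# Assumptions: the `dbi` gem with its ADO driver is installed, and the
# worksheet is exposed as a table named "[Sheet1$]".
def read_sheet(path, sheet = 'Sheet1')
  require 'dbi' # loaded lazily so excel_dsn works without the gem
  dbh = DBI.connect(excel_dsn(path))
  begin
    dbh.select_all("SELECT * FROM [#{sheet}$]")
  ensure
    dbh.disconnect
  end
end

puts excel_dsn('c:/path/to/spreadsheet.xls')
```

Only `excel_dsn` runs here; `read_sheet` needs a real workbook and the ODBC driver to be useful.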


#3

Actually, being on Linux does not rule out WIN32OLE and DBI if you use
Wine.

You still need to install all the supporting software (in this case
Excel).


#4

Hello,

On 16 Oct., 20:02, “AN@S” removed_email_address@domain.invalid wrote:

> I know about the available libraries:
>
> http://raa.ruby-lang.org/project/parseexcel/
> http://rubyforge.org/projects/spreadsheet/
> http://rubyforge.org/projects/roo/

I’m the author of the roo gem. Roo can handle such huge spreadsheet
files without problems, but it may be very slow. You can try it to see
whether it is fast enough for your purposes.

-Thomas
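
One way to check "fast enough" is simply to time a full pass over the rows. The sketch below is an assumption-laden illustration, not Thomas's recommendation: it assumes a current roo release where `Roo::Spreadsheet.open` picks the reader from the file extension (`.xls` files need the separate roo-xls gem), and the helper names `time_rows` and `time_roo` are made up.

```ruby
require 'benchmark'

# Time how long it takes to walk every row of `rows` (any Enumerable).
# Returns [row_count, seconds].
def time_rows(rows)
  count = 0
  seconds = Benchmark.realtime { rows.each { |_row| count += 1 } }
  [count, seconds]
end

# Assumption: a current roo release, where Roo::Spreadsheet.open chooses
# the reader from the file extension and the result yields rows via #each.
def time_roo(path)
  require 'roo' # loaded lazily; only needed for real spreadsheets
  time_rows(Roo::Spreadsheet.open(path))
end

# Dry run with fake rows; swap in time_roo('/path/to/file.xlsx') for real data.
count, seconds = time_rows(Array.new(60_000) { |i| [i, "row #{i}"] })
puts format('%d rows in %.3f s', count, seconds)
```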


#5

I’m running a Linux machine, so I think the WIN32OLE and DBI libraries
are not an option.


#6

If you decide to use roo, you can try running the processing task in
the background.

BackgrounDRb, maybe?
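
To illustrate the "run it in the background" idea without pulling in BackgrounDRb, here is a rough sketch using a plain Ruby thread instead. `BackgroundImport` is a made-up class name, and the rows are fake; in a real application the block would do the database insert, and a thread like this only lives as long as the process that started it.

```ruby
# Run a long import off the main thread and let the caller poll progress.
# Plain Ruby threads are a stand-in here; BackgrounDRb itself is not used.
class BackgroundImport
  attr_reader :processed

  def initialize(rows, &handler)
    @processed = 0
    @thread = Thread.new do
      rows.each do |row|
        handler.call(row)
        @processed += 1
      end
    end
  end

  # True once the worker thread has finished (or died).
  def done?
    !@thread.alive?
  end

  # Block until the import finishes; re-raises any error from the worker.
  def wait
    @thread.join
    self
  end
end

seen = []
job = BackgroundImport.new((1..1000).map { |i| [i] }) { |row| seen << row }
job.wait
puts "processed #{job.processed} rows, done=#{job.done?}"
```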