Some more info:
Recently, I was confronted with a task in one of the apps I’m building
that involved parsing data from an Excel spreadsheet where the
number of rows could be on the order of 30,000 to 50,000 or higher.
Originally, I was using the parseexcel gem to handle the parsing;
however, it proved to be fairly slow and consumed a lot of memory. When
I presented it with a spreadsheet of more than 42,000 rows, it basically cratered.
So I had to figure out another way to handle this problem. Someone
mentioned that there was a nice open source Java-based Excel parser
called JExcelAPI (http://jexcelapi.sourceforge.net/). A quick native
Java test showed that its performance and memory footprint would be much
better.
In order to take advantage of JExcelAPI, I looked at JRuby briefly, but
still had problems implementing that (and I didn’t want to run this app
on it yet since it’s still so young), so I took a look at some of the
Java-Ruby bridges. I gave one called Rjb
(http://arton.no-ip.info/collabo/backyard/?RubyJavaBridge) a shot. I
was very pleasantly surprised: it was really easy to use it to
integrate with JExcelAPI.
If I understand correctly, Rjb uses JNI to start and then interact with
an available JVM (a JDK, not a JRE). It works on Windows or UNIX. You
basically embed a JVM in your Ruby interpreter, then load classes
into it and start using them. Basic type conversion to and from Java types
is done for you. The documentation is terrible, but there’s just enough of
it to get you started.
Here’s what I did:
- Install the Rjb gem using "gem install rjb".
- Put the JAR file that I wanted to use, jxl.jar, in my RAILS_ROOT/lib
directory.
- Start the JVM using Rjb::load("#{RAILS_ROOT}/lib/jxl.jar",
['-Xms256M', '-Xmx512M']); the array is a set of options passed to
the JVM at startup (here, the initial and maximum heap sizes).
- Load classes using Rjb::import(classname).
Here’s an example of using it in my app:
    require 'rjb'
    file_class = Rjb::import('java.io.File')
    workbook_class = Rjb::import('jxl.Workbook')
    workbook = workbook_class.getWorkbook(file_class.new(filename))
Some things to notice:
- filename is a Ruby string; Rjb converts it and passes it to the
java.io.File constructor.
- The value returned by file_class.new is a wrapped Java File object
and can be passed straight to the getWorkbook method.
- workbook is a (wrapped) Java object that can then be used in other
parts of the app.
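Continuing from the workbook above, here is a minimal sketch of walking a sheet row by row. It assumes JExcelAPI’s Sheet methods getRows, getColumns and getCell(column, row) plus Cell#getContents, called through Rjb just like the calls in the example; each_sheet_row is a hypothetical helper name, not part of Rjb or jxl. FakeSheet and FakeCell are plain-Ruby stand-ins so the sketch runs without a JVM.

```ruby
# Hypothetical helper: walk one jxl Sheet row by row and yield each row as a
# plain Ruby array of strings. Assumes the jxl.Sheet methods getRows,
# getColumns and getCell(column, row), plus jxl.Cell#getContents.
def each_sheet_row(sheet)
  row_count = sheet.getRows
  col_count = sheet.getColumns
  row_count.times do |r|
    # jxl indexes cells as (column, row), both zero-based.
    yield Array.new(col_count) { |c| sheet.getCell(c, r).getContents.to_s }
  end
end

# Plain-Ruby stand-ins with the same method names, so the helper can be
# tried without a JVM. They are NOT part of jxl or Rjb.
FakeCell = Struct.new(:getContents)

class FakeSheet
  def initialize(data)
    @data = data
  end

  def getRows
    @data.size
  end

  def getColumns
    @data.map(&:size).max || 0
  end

  def getCell(col, row)
    # Cells missing from a short row come back as empty strings.
    FakeCell.new(@data[row][col] || "")
  end
end

each_sheet_row(FakeSheet.new([%w[name qty], %w[widget 3]])) { |row| p row }
```

With Rjb you would call it as each_sheet_row(workbook.getSheet(0)) { |row| ... }. Yielding one row at a time avoids accumulating a second copy of a 50,000-row sheet on the Ruby side, and converting each cell to a Ruby String at this boundary keeps Java objects out of the rest of the app.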
The good news: once you get past loading a class and/or instantiating an
object, making method calls is as simple as calling the methods on
the Java objects you’ve instantiated or received from other method
calls.
The bad news: this is so seamless that it would be very easy to forget
that some of the objects you’re dealing with are actually Java
objects, and then to forget how to use them correctly.
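One Java habit that is easy to lose here is releasing resources explicitly. JExcelAPI’s Workbook#close is documented as making the memory allocated for the workbook available for garbage collection, so with big spreadsheets it is worth calling rather than leaving cleanup to whenever Ruby collects the Rjb proxy. A minimal sketch, assuming a hypothetical with_workbook helper that takes the opening step as a lambda, so it can wrap the Rjb getWorkbook call or, as below, a plain-Ruby fake:

```ruby
# Hypothetical helper: open a workbook via `opener`, yield it, and always
# call close afterwards, even if the block raises.
# With Rjb the opener could be:
#   ->(path) { workbook_class.getWorkbook(file_class.new(path)) }
def with_workbook(path, opener)
  workbook = opener.call(path)
  begin
    yield workbook
  ensure
    workbook.close
  end
end

# Plain-Ruby stand-in for a jxl Workbook; NOT part of jxl or Rjb.
class FakeWorkbook
  attr_reader :closed

  def initialize
    @closed = false
  end

  def getNumberOfSheets
    1
  end

  def close
    @closed = true
  end
end

fake = FakeWorkbook.new
count = with_workbook("report.xls", ->(_path) { fake }) { |wb| wb.getNumberOfSheets }
puts count        # prints 1
puts fake.closed  # prints true
```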
For production, I may need to change approaches for this app, since I’ll
probably be running multiple Mongrel processes and I don’t know if I want
one embedded JVM per process (if I understand Mongrel deployment
correctly; I’m currently on Apache/FastCGI, so I know it’s a problem
there). That may force me to host this Excel parser component in a
separate process with DRb so it can be used from anywhere (if that
happens, I could also put a Web service-y thing on top of a JRuby process
or whatever).
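If the parser does end up in its own process, the DRb side can be quite small. This is a sketch under assumptions: ExcelParserService and its parse method are hypothetical names, the real Rjb/jxl code would go inside parse (so only that one process embeds a JVM), and here it just returns canned rows. Server and client run in one script only to keep the example self-contained; in practice the client lines would live in each Mongrel process and point at the server’s URI.

```ruby
require 'drb/drb'

# Hypothetical front object for a standalone parser process. In a real
# deployment, Rjb::load and the jxl calls would live here, so only this
# process embeds a JVM. This sketch returns canned rows instead.
class ExcelParserService
  def parse(_path)
    # Plain Ruby arrays of strings are marshalled and copied to the caller.
    [["name", "qty"], ["widget", "3"]]
  end
end

# Server side (normally its own process). Port 0 lets the OS pick a free port.
DRb.start_service("druby://127.0.0.1:0", ExcelParserService.new)
uri = DRb.uri

# Client side (e.g. each Mongrel process) only needs the server's URI.
parser = DRbObject.new_with_uri(uri)
rows = parser.parse("/tmp/big.xls")
p rows

DRb.stop_service
```

Returning arrays of plain Ruby strings keeps the interface simple: clients get ordinary Ruby data back and never handle a Java object across the process boundary.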
Hope this is useful for someone.
Wes