Database and XML parsers

Hi,

I’m converting a RoR app to run under JRuby. Things are going well so
far.

There are some data-intensive portions to this intranet site, parsing
large XML files and populating database (MySQL) tables with the content
from the file. Previously, I was using a native XML parser, which I
cannot do in JRuby.

I am trying to find a replacement SAX-based XML parser (Hpricot?) that
works under JRuby…any suggestions?

Another possibility is to re-write the XML parsing and database
insertion routines in a Java/JAR package and call that from JRuby. Is
it possible from Java to access the MySQL configuration information that
is contained in RoR database.yml? Or does that information need to be
hard-coded in the Java code as well?

Thanks,
Kevin

On Thu, Oct 22, 2009 at 9:53 AM, Kevin T. wrote:

Hi,

I’m converting a RoR app to run under JRuby. Things are going well so
far.

There are some data-intensive portions to this intranet site, parsing
large XML files and populating database (MySQL) tables with the content
from the file. Previously, I was using a native XML parser, which I
cannot do in JRuby.

By native, do you mean libxml?

I am trying to find a replacement SAX-based XML parser (Hpricot?) that
works under JRuby…any suggestions?

On the Ruby side, you could try Nokogiri [1] or REXML + JREXML [2].
The latter is probably at best a stop-gap.

Another possibility is to re-write the XML parsing and database
insertion routines in a Java/JAR package and call that from JRuby. Is
it possible from Java to access the MySQL configuration information that
is contained in RoR database.yml? Or does that information need to be
hard-coded in the Java code as well?
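On the database.yml question: since database.yml is plain YAML, one way to avoid hard-coding is to read it on the Ruby side and hand the values to the Java code as ordinary arguments. The following is only a minimal sketch under stated assumptions. The sample settings, the `jdbc_url` and `jdbc_settings` helper names, and the default port are made up for illustration, and the URL follows the usual `jdbc:mysql://host:port/database` form.

```ruby
require "yaml"

# Sample content standing in for config/database.yml (all values are made up).
SAMPLE_DATABASE_YML = <<YAML
production:
  adapter: mysql
  host: localhost
  port: 3306
  database: intranet_production
  username: importer
  password: secret
YAML

# Hypothetical helper: build a JDBC URL from one environment's settings.
# Falls back to localhost:3306 when host or port is not given.
def jdbc_url(config)
  host = config.fetch("host", "localhost")
  port = config.fetch("port", 3306)
  "jdbc:mysql://#{host}:#{port}/#{config.fetch('database')}"
end

# Hypothetical helper: pick one environment out of the YAML text and
# return the values a JDBC connection would need.
def jdbc_settings(yaml_text, env)
  config = YAML.safe_load(yaml_text).fetch(env)
  { url: jdbc_url(config), user: config["username"], password: config["password"] }
end

settings = jdbc_settings(SAMPLE_DATABASE_YML, "production")
puts settings[:url]
```

In a real app the text would come from `File.read("config/database.yml")` rather than a constant, and a file that contains ERB tags would need to be run through ERB first. The resulting URL, user and password can then be passed to whatever Java class does the inserts, so nothing has to be duplicated in the Java code.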

A third option you didn’t mention would be to drive a Java SAX parser
from Ruby. You can even extend the DefaultHandler class in Ruby and
hand it to the Java XML parser:

require "java"

class RubyHandler < org.xml.sax.helpers.DefaultHandler
  # Called by the parser for each opening tag.
  def startElement(namespace, local, qname, attrs)
    # handle the element here
  end
end

factory = javax.xml.parsers.SAXParserFactory.newInstance
# configure factory if desired
parser = factory.newSAXParser
parser.parse("my/file.xml", RubyHandler.new)

Cheers,
/Nick



Nick S. wrote:

On Thu, Oct 22, 2009 at 9:53 AM, Kevin T. wrote:

Hi,

I’m converting a RoR app to run under JRuby. Things are going well so
far.

There are some data-intensive portions to this intranet site, parsing
large XML files and populating database (MySQL) tables with the content
from the file. Previously, I was using a native XML parser, which I
cannot do in JRuby.

By native, do you mean libxml?

Yes, we were using libxml (expat) previously.

I am trying to find a replacement SAX-based XML parser (Hpricot?) that
works under JRuby…any suggestions?

On the Ruby side, you could try Nokogiri [1] or REXML + JREXML [2].
The latter is probably at best a stop-gap.

I believe I tried to install the Nokogiri gem through JRuby, but it had
native elements (the web site you provided a link to says it uses
libxml2). Is there a fully working JRuby solution yet? I found some
posts from Jan 09 that said it was still in progress.

Another possibility is to re-write the XML parsing and database
insertion routines in a Java/JAR package and call that from JRuby. Is
it possible from Java to access the MySQL configuration information that
is contained in RoR database.yml? Or does that information need to be
hard-coded in the Java code as well?

A third option you didn’t mention would be to drive a Java SAX parser
from Ruby. You can even extend the DefaultHandler class in Ruby and
hand it to the Java XML parser:

require "java"

class RubyHandler < org.xml.sax.helpers.DefaultHandler
  # Called by the parser for each opening tag.
  def startElement(namespace, local, qname, attrs)
    # handle the element here
  end
end

factory = javax.xml.parsers.SAXParserFactory.newInstance
# configure factory if desired
parser = factory.newSAXParser
parser.parse("my/file.xml", RubyHandler.new)

Since I already have the SAX state machine coded up in Ruby, using the
Java SAX parser seems like a good idea as well. Fewer dependencies/gems
to worry about. Thanks so much, I would never have thought of this
option.
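
For anyone reading along, here is a rough sketch of how an existing Ruby state machine might be reused behind the Java SAX parser. Everything in it is an assumption for illustration: `RecordStateMachine` is a hypothetical stand-in for the real state machine, the `record` element name is made up, and `SaxAdapter` and `parse_file` are names invented here. The JRuby-specific part is guarded so the state machine itself stays plain Ruby.

```ruby
# Hypothetical stand-in for an existing Ruby state machine: collects one
# row (a Hash) per <record> element, keyed by child element name.
class RecordStateMachine
  attr_reader :rows

  def initialize
    @rows = []
    @row = nil
    @field = nil
    @text = String.new
  end

  def start_element(name)
    if name == "record"
      @row = {}
    elsif @row
      @field = name
      @text = String.new
    end
  end

  # SAX may deliver text in several chunks, so accumulate it.
  def characters(text)
    @text << text if @field
  end

  def end_element(name)
    if name == "record"
      @rows << @row
      @row = nil
    elsif @field == name
      @row[name] = @text.strip
      @field = nil
    end
  end
end

# Only defined under JRuby, where the Java SAX classes are available.
if RUBY_PLATFORM == "java"
  require "java"

  # Thin adapter: forwards Java SAX callbacks to the Ruby state machine.
  class SaxAdapter < org.xml.sax.helpers.DefaultHandler
    def initialize(machine)
      super()
      @machine = machine
    end

    def startElement(uri, local_name, qname, attrs)
      @machine.start_element(qname)
    end

    def characters(chars, start, length)
      @machine.characters(java.lang.String.new(chars, start, length).to_s)
    end

    def endElement(uri, local_name, qname)
      @machine.end_element(qname)
    end
  end

  # Parse the file at +path+ and feed its events to +machine+.
  def parse_file(path, machine)
    parser = javax.xml.parsers.SAXParserFactory.newInstance.newSAXParser
    parser.parse(java.io.File.new(path), SaxAdapter.new(machine))
    machine
  end
end
```

Keeping the adapter this thin means the state machine never sees a Java type, so it can still be exercised from plain Ruby tests by calling `start_element`, `characters` and `end_element` directly. The collected `rows` can then be inserted into MySQL however the app already does it.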