The MongoDB project is adding a native Java extension to support JRuby
in their Ruby driver. You can see the current code here [1].

There's a "slight" problem though. The native Java code is only about 2x
faster than pure Ruby, which makes it around 5x slower than MRI using the
C extension. To make matters worse, the MRI C extension itself is about
10x slower than the pure C driver and 8x slower than the pure Java
driver. All in all, Ruby performance in general is quite poor.

I've forked the project and submitted a few performance patches back
into this JRuby branch, but its performance still sucks. I have an idea
for improving it but before I go and write a bunch of code I'd like to
ask here to make sure it's a feasible idea.

The current Java code makes sure that it boxes all primitives as JRuby
primitives before doing any real operations on them. For instance, a Map
is allocated as a RubyHash, an integer is allocated as a RubyFixnum and
a string is allocated as a RubyString. Then JavaEmbedUtils is used to
call Ruby methods that operate on these boxed objects.

e.g.

    RubyString rkey = RubyString.newString(_runtime, name);
    JavaEmbedUtils.invokeMethod(_runtime, current, "[]=",
        new Object[] { (IRubyObject)rkey, o }, Object.class);

I'm wondering if all of this boxing of primitives can be avoided or at
least done lazily. I've written a bit of Ruby before that accesses Java
objects like Maps and Lists and I recall that the runtime added a bunch
of syntactic sugar to these classes. So even though Map does not define
the #[]= method, I could use that and JRuby would make sure the right
thing got done. Similarly, I remember from those old experiments that my
Maps and Lists could contain Java primitives (int, Integer, String, etc.)
and I didn't need to do anything special to use them from Ruby. Again,
JRuby did the right thing.

So here is my real question (I love to bury the lede).

Can I modify the driver to just use Maps, Lists and regular Java boxed
primitives and leave it up to the runtime to lazily convert them to Ruby
objects as they are accessed? What are the downsides? Can I count on
this behavior being supported in future versions of JRuby?
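
To make the "convert lazily" idea concrete, here is a minimal sketch in plain Java. It is only an illustration of the access pattern being asked about, not how JRuby's Java integration or the driver is implemented: `LazyDocument`, its `converter` argument, and the tagged-string conversion are all hypothetical names invented for this example. The decoder would hand back ordinary Java values, and a value is converted (and the result remembered) only when something actually reads it.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: LazyDocument and the converter are illustrative
// names, not part of JRuby or the MongoDB driver.
class LazyDocument<V> {
    private final Map<String, Object> raw;        // plain Java values, as a decoder might produce
    private final Function<Object, V> converter;  // stand-in for a Java-to-Ruby coercion
    private final Map<String, V> converted = new HashMap<>();
    private int conversions = 0;                  // how many times the converter actually ran

    LazyDocument(Map<String, Object> raw, Function<Object, V> converter) {
        this.raw = raw;
        this.converter = converter;
    }

    // Converts a value the first time it is read and remembers the result.
    V get(String key) {
        if (converted.containsKey(key)) {
            return converted.get(key);
        }
        if (!raw.containsKey(key)) {
            return null;
        }
        V value = converter.apply(raw.get(key));
        conversions++;
        converted.put(key, value);
        return value;
    }

    int conversions() {
        return conversions;
    }

    public static void main(String[] args) {
        Map<String, Object> raw = new LinkedHashMap<>();
        raw.put("name", "mongo");
        raw.put("count", 3);
        raw.put("ratio", 0.5);

        // The converter here only builds a tagged string; a real one would
        // box the value into the corresponding Ruby object.
        LazyDocument<String> doc =
                new LazyDocument<>(raw, v -> v.getClass().getSimpleName() + ":" + v);
        System.out.println(doc.get("count"));
        System.out.println(doc.conversions());
    }
}
```

With this shape, a document with many fields costs one conversion per field that is actually read, rather than one per field decoded. The trade-off is the extra bookkeeping map, which only pays off if a good share of decoded values are never touched.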

If that takes care of improving the decoding phase, the encoding phase
is still rather pokey. What tricks can be used to rapidly convert Ruby
objects to Java objects so they can be BSON encoded very fast?

I've run the existing code under the VisualVM profiler, but it's *very*
hard to pick out the slow spots when there is so much internal JRuby
stuff in use. :(

Thanks for any suggestions.

cr

[1] http://github.com/mongodb/mongo-ruby-driver/blob/jruby/ext/java/src/org/jbson/
---------------------------------------------------------------------
To unsubscribe from this list, please visit:
http://xircles.codehaus.org/manage_email
on 2010-08-09 20:20
on 2010-08-12 21:47
On Mon, Aug 9, 2010 at 1:20 PM, Chuck Remes <cremes.devlist@mac.com> wrote:
> The Mongodb project is adding a native java extension to support JRuby
> from their Ruby driver. You can see the current code here [1].
>
> There's a "slight" problem though. The native Java code is only about 2x
> faster than pure ruby which makes it around 5x slower than MRI using the
> C extension. To make matters worse, the MRI C extension itself is about
> 10x slower than the pure C driver and 8x slower than the pure Java
> driver. All in all, Ruby performance in general is quite poor.

I assume the JRuby driver just uses the Java driver, and they're
trying to make it an extension so it will perform better than calling
the Java driver from Ruby directly? In this case, I wonder if just
using the Java driver from Ruby might actually be as fast or faster...

> I've forked the project and submitted a few performance patches back
> into this JRuby branch, but its performance still sucks. I have an idea
> for improving it but before I go and write a bunch of code I'd like to
> ask here to make sure it's a feasible idea.
>
> The current Java code makes sure that it boxes all primitives as JRuby
> primitives before doing any real operations on them. For instance, a Map
> is allocated as a RubyHash, an integer is allocated as a RubyFixnum and
> a string is allocated as a RubyString. Then the JavaEmbedUtils are used
> for calling ruby methods to operate on these boxed objects.
>
> e.g.
>
>     RubyString rkey = RubyString.newString(_runtime, name);
>     JavaEmbedUtils.invokeMethod(_runtime, current, "[]=",
>         new Object[] { (IRubyObject)rkey, o }, Object.class);

This seems like a lot of overhead.

Normally, a JRuby extension is implemented such that it does all its
work on the Java side of things and only coerces or calls Ruby when
absolutely necessary. Perhaps they're trying to pattern this after the
MRI C extension, or perhaps they have a lot of Ruby code that they
need to function too?

> I'm wondering if all of this boxing of primitives can be avoided or at
> least done lazily. I've written a bit of Ruby before that access Java
> objects like Maps and Lists and I recall that the runtime added a bunch
> of syntactic sugar to these classes. So even though Map does not define
> the #[]= method, I could use that and JRuby would make sure the right
> thing got done. Similarly, I remember from those old experiments that my
> Maps and Lists could contain Java primitives (int, Integer, String, etc)
> and I didn't need to do anything special to use them from Ruby. Again,
> JRuby did the right thing.
>
> So here is my real question (I love to bury the lede).
>
> Can I modify the driver to just use Maps, Lists and regular Java boxed
> primitives and leave it up to the runtime to lazily convert them to Ruby
> objects as they are accessed? What are the downsides? Can I count on
> this behavior being supported in future versions of JRuby?

This would probably be preferable to actively trying to coerce
everything, whether it's used or not. I don't have a good picture in
my head of how this driver is structured, though.

> If that takes care of improving the decoding phase, the encoding phase
> is still rather pokey. What tricks can be used to rapidly convert Ruby
> objects to Java objects so they can be BSON encoded very fast?

Ok, at this point I decided to look at the code.

The encoder logic looks *extremely* heavy, doing almost all its logic
by manipulating JRuby data structures and Ruby objects.

* the initial entry points into the encoder and decoder/callback are
  using Java integration
  (http://github.com/mongodb/mongo-ruby-driver/blob/jruby/lib/bson/bson_java.rb#L17)
* from there, the encoder walks the Ruby structure, introspecting each
  contained object in turn and writing out to the buffer.
* there are numerous places where a new String is constructed for every
  key lookup
* there are places where java.util.* interfaces are used against Ruby
  arrays or Ruby hashes, which causes them to coerce their elements on
  the way out

The Ruby version has a lot of room for improvement itself:

* it has multiple large case/when statements, which end up doing up to
  N "===" calls. In fact, the result of one very large case/when is
  used to drive another very large case/when
* there are quite a few type checks that could just be dispatches if
  they decorated a few core types

Is there a benchmark you're using to test these? It might be fun to
take a crack at both the pure Ruby and the Java/JRuby versions to see
if we can improve their performance.

> I've run the existing code under the VisualVM profiler, but it's *very*
> hard to pick out the slow spots when there is so much internal JRuby
> stuff in use. :(

Oftentimes, if the profile doesn't show any major standout bottleneck,
then the bottleneck is simply in all those objects being allocated.
That seems like it could easily be the problem here as well (and
potentially for MRI too).

- Charlie
on 2010-08-13 06:05
On Aug 12, 2010, at 2:46 PM, Charles Oliver Nutter wrote:
> On Mon, Aug 9, 2010 at 1:20 PM, Chuck Remes <cremes.devlist@mac.com> wrote:
>> The Mongodb project is adding a native java extension to support JRuby
>> from their Ruby driver. You can see the current code here [1].
>>
>> There's a "slight" problem though. The native Java code is only about 2x
>> faster than pure ruby which makes it around 5x slower than MRI using the
>> C extension. To make matters worse, the MRI C extension itself is about
>> 10x slower than the pure C driver and 8x slower than the pure Java
>> driver. All in all, Ruby performance in general is quite poor.
>
> I assume the JRuby driver just uses the Java driver, and they're
> trying to make it an extension so it will perform better than calling
> the Java driver from Ruby directly? In this case, I wonder if just
> using the Java driver from Ruby might actually be as fast or faster...

As you discovered later, the Java native extension is manipulating JRuby
structures. It isn't a simple wrap of the existing Java driver.

> This seems like a lot of overhead.
>
> Normally, a JRuby extension is implemented such that it does all its
> work on the Java side of things and only coerces or calls Ruby when
> absolutely necessary. Perhaps they're trying to pattern this after the
> MRI C extension, or perhaps they have a lot of Ruby code that they
> need to function too?

Most of the driver logic is in Ruby. The performance-critical parts are
the pieces that deal with encoding/decoding BSON (essentially a binary
JSON format). The BSON code is a small overall part of the driver's
functionality, but it's where 80% of the processing time ends up.

>> I'm wondering if all of this boxing of primitives can be avoided or at
>> least done lazily. I've written a bit of Ruby before that access Java
>> objects like Maps and Lists and I recall that the runtime added a bunch
>> of syntactic sugar to these classes. So even though Map does not define
>> the #[]= method, I could use that and JRuby would make sure the right
>> thing got done. Similarly, I remember from those old experiments that my
>> Maps and Lists could contain Java primitives (int, Integer, String, etc)
>> and I didn't need to do anything special to use them from Ruby. Again,
>> JRuby did the right thing.
>>
>> So here is my real question (I love to bury the lede).
>>
>> Can I modify the driver to just use Maps, Lists and regular Java boxed
>> primitives and leave it up to the runtime to lazily convert them to Ruby
>> objects as they are accessed? What are the downsides? Can I count on
>> this behavior being supported in future versions of JRuby?
>
> This would probably be preferable to actively trying to coerce
> everything, whether it's used or not. I don't have a good picture in
> my head of how this driver is structured, though.

Sorry for my poor job at explaining how it is laid out. I thought a
quickie overview would be sufficient to get a few perf tips.

> contained object in turn and writing out to the buffer.
> * there's numerous places where a new String is constructed for every
> key lookup
> * there are places where java.util.* interfaces are used against Ruby
> arrays or Ruby hashes, which causes them to coerce their elements on
> the way out

I suspected this, which is why I posed my original question. It just
seemed like a lot of extra work to constantly be accessing things inside
the JRuby structures instead of pure Java objects. Convert it (lazily)
at the end and save some work.

> The Ruby version has a lot of room for improvement itself:
>
> * it has multiple large case/when statements, which end up doing up to
> N "===" calls. In fact, the results of one very large case/when is
> used to drive another very large case/when
> * there are quite a few type checks that could just be dispatches if
> they decorated a few core types

I'm interested to hear how the basic types could be decorated to avoid
the giant case statements. When viewing all of the drivers for MongoDB,
they all pretty much follow the same structure. Their BSON logic usually
consists of a few giant case or if/else structures just like this
one. If there's a relatively simple win for this driver, it's likely
applicable to the other drivers too.

> Is there a benchmark you're using to test these? It might be fun to
> take a crack at both the pure Ruby and the Java/JRuby versions to see
> if we can improve their performance.

I just committed the benchmark that I was using. You can find it here:

http://github.com/chuckremes/mongo-ruby-driver/blob/jruby/bin/bson_benchmark.rb

I'm not the original author of the driver (that would be Kyle Banker)
but I am interested in seeing better performance for Ruby (particularly
JRuby). The changes in my fork have already been rolled up to the master
project, so interested parties should fork that repository [1] instead
of mine.

cr

[1] http://github.com/mongodb/mongo-ruby-driver (look at the jruby branch)
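
As a rough illustration of "dispatch instead of a giant case statement", here is a sketch on the Java side. Charlie's suggestion is about Ruby, where core classes can be reopened so each type knows how to encode itself; Java cannot add methods to `Integer` or `String`, so a lookup table keyed by class is used here as the closest analog. `TypeDispatchSketch`, `encode`, and the tagged strings are hypothetical and stand in for real BSON output; none of this is taken from the driver.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: one table lookup replaces a long chain of type checks.
// The "encoders" return tagged strings, not real BSON bytes.
class TypeDispatchSketch {
    private static final Map<Class<?>, Function<Object, String>> ENCODERS = new HashMap<>();

    static {
        ENCODERS.put(Integer.class, v -> "int32:" + v);
        ENCODERS.put(Long.class, v -> "int64:" + v);
        ENCODERS.put(String.class, v -> "string:" + v);
        ENCODERS.put(Boolean.class, v -> "bool:" + v);
    }

    // Looks up the handler by the value's exact runtime class.
    static String encode(Object value) {
        Function<Object, String> encoder = ENCODERS.get(value.getClass());
        if (encoder == null) {
            throw new IllegalArgumentException(
                    "no encoder registered for " + value.getClass().getName());
        }
        return encoder.apply(value);
    }

    public static void main(String[] args) {
        System.out.println(encode(42));
        System.out.println(encode("hello"));
    }
}
```

The lookup cost is one hash probe regardless of how many types are registered, whereas a case/when or if/else chain tests each branch in turn. Note that this table matches exact classes only, so subclasses would need their own entries or a fallback walk up the class hierarchy.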
on 2010-08-13 06:34
On 13 August 2010 14:05, Chuck Remes <cremes.devlist@mac.com> wrote:
> I'm not the original author of the driver (that would be Kyle Banker)
> but I am interested in seeing better performance for Ruby (particularly
> JRuby). The changes in my fork have already been rolled up to the master
> project, so interested parties should fork that repository [1] instead
> of mine.

Just a minor nit, but in RubyBSONEncoder.java, you need to ensure all
accesses to _runtimeCache, and any map retrieved from it, are properly
synchronized. Either wrap them using Collections.synchronizedMap(), or
make all methods that directly interact with them synchronized.

Also, you should make _runtimeCache a WeakHashMap, so when a Ruby
runtime instance is no longer used, it gets ejected from the cache.

Also, if the keys you're using are _always_ constant strings (i.e.
"blah"), then you can use an IdentityHashMap, which will be a bit
faster for lookups.
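
The three suggestions above can be combined roughly as follows. This is a sketch under assumptions, not the driver's code: `RuntimeCacheSketch` and `cacheFor` are made-up names, a plain `Object` stands in for the Ruby runtime, and the cached values are arbitrary objects.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical stand-in for the encoder's per-runtime cache; the names
// RuntimeCacheSketch and cacheFor are illustrative, not from the driver.
class RuntimeCacheSketch {
    // Outer map: weak keys, so an entry can be dropped once its runtime is
    // no longer strongly referenced elsewhere. Wrapped for thread safety.
    private static final Map<Object, Map<String, Object>> RUNTIME_CACHE =
            Collections.synchronizedMap(new WeakHashMap<Object, Map<String, Object>>());

    // Returns the per-runtime map, creating it on first use. The explicit
    // synchronized block makes the get-then-put sequence atomic.
    static Map<String, Object> cacheFor(Object runtime) {
        synchronized (RUNTIME_CACHE) {
            Map<String, Object> inner = RUNTIME_CACHE.get(runtime);
            if (inner == null) {
                // Inner map: compares keys with ==, which is only safe when
                // every key is a string literal (or otherwise interned).
                inner = Collections.synchronizedMap(new IdentityHashMap<String, Object>());
                RUNTIME_CACHE.put(runtime, inner);
            }
            return inner;
        }
    }

    public static void main(String[] args) {
        Object runtime = new Object();
        cacheFor(runtime).put("to_s", "cached value");
        System.out.println(cacheFor(runtime).get("to_s"));
    }
}
```

Two caveats apply. A `WeakHashMap` entry is only reclaimed if its value does not strongly reference the key, so cached objects that hold on to their runtime would keep the entry alive. And an `IdentityHashMap` lookup with a string built at run time (for example `new String("to_s")`) will miss even though the characters match.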
on 2010-08-15 21:26
On Aug 12, 2010, at 11:34 PM, Wayne Meissner wrote:
>
> Also, you should make _runtimeCache a WeakHashMap, so when a Ruby
> runtime instance is no longer used, it gets ejected from the cache.
>
> Also, if the keys you're using are _always_ constant strings (i.e.
> "blah"), then you can use an IdentityHashMap, which will be a bit
> faster for lookups.

Good points. I'll make these fixes.

cr