Hey guys. I just watched the talk by Charles here: http://confreaks.com/videos/1235-aloharuby2012-why-jruby It was really interesting, and Charles makes a very strong case for JRuby. I was struck by one thing he said about running Rails, though. He mentioned that if your Rails app does a lot of database calls, JRuby might not be faster. Maybe I got that wrong, so I wanted to ask here. Is this because the JDBC drivers are slower than the C drivers?
on 2012-10-27 09:53
on 2012-10-27 17:11
My guess (and it's *just* a guess) is that it's because the database would be the bottleneck, which would make any performance differences between Ruby implementations insignificant, so it wouldn't make JRuby faster *or slower*.
- Keith
Keith R. Bennett http://about.me/keithrbennett
on 2012-10-28 18:14
That is part of it, but we also sometimes pay a penalty for Java Charset <-> Ruby bytelist translation, which can make us slower. From memory, we were <2x slower for most generic queries (and at times the same speed). These were queries where the actual query time itself was minimal. For more ordinary queries (where DB engine time becomes most of the time), this difference drops off to insignificance pretty quickly.
-Tom
--
blog: http://blog.enebo.com twitter: tom_enebo mail: tom.enebo@gmail.com
on 2012-10-28 19:08
Now that 1.9-mode encoding support is improved in JRuby, has the possibility of optimizing Java-Ruby interop with UTF-16 internal encoding been considered? In other words, calling a java method that returns java.lang.String could return a ruby String in UTF-16 encoding, wrapped instead of converted to bytes? Since java.lang.String is immutable, it would still require some copy-on-write special handling. Said another way, Ruby may support many internal encodings, but JRuby might run most optimally when its internal encoding matches the JVM's UTF-16? --David
on 2012-10-28 21:28
On Mon, Oct 29, 2012 at 6:12 AM, Thomas E Enebo <tom.enebo@gmail.com> wrote:
> That is part of it, but we also sometimes pay a penalty with Java
> Charset <-> Ruby bytelist translation which can make us slower. From
> my memory we were <2x slower for most generic queries (and at times
> the same speed). These were queries where actual query time itself
> was minimal. For more ordinary queries (where DB engine time becomes
> most of the time) this difference drops off to insignificance pretty
> quickly.

Thanks for the explanation. That's quite interesting, and I guess things like the number of fields, the number of text fields, etc. would also make a difference.
on 2012-10-29 16:20
We know that we can save that one transcode internally by keeping the UTF-16 bytes, but if the DB is returning strings as UTF-8 and Java is translating them back to UTF-16, then we only save part of the transcoding work. Still, it is something which could help a bit. We have talked about native adapters which capture the bytes off the wire and don't bother to translate anything. This obviously means not using JDBC's abstraction.
-Tom
on 2012-10-29 16:24
Yes. The number and type of fields can easily show differences in microbenchmarks. As I said before, in isolation it always looks like a huge difference, but in the picture of more complicated queries, networking, and the rest of the stack it is not so dramatic.
-Tom
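A back-of-the-envelope sketch of the point above, in plain Ruby. Every number here is invented for illustration (the 0.2 ms conversion overhead and the query times are assumptions, not measurements from any driver); it only shows how a fixed per-query cost fades as DB engine time grows.

```ruby
# Illustrative only: all timings are made up to show the shape of the effect.
CONVERSION_OVERHEAD_MS = 0.2 # hypothetical extra per-query transcoding cost

# Ratio of total time with the overhead to total time without it.
def slowdown(db_time_ms, overhead_ms = CONVERSION_OVERHEAD_MS)
  (db_time_ms + overhead_ms) / db_time_ms
end

[0.2, 2.0, 20.0].each do |db_time_ms|
  puts format('db time %5.1f ms -> %.2fx', db_time_ms, slowdown(db_time_ms))
end
```

With these assumed numbers, a trivial query looks about twice as slow, while a query that spends 20 ms in the database is only about 1% slower.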
on 2012-10-29 19:57
Some JDBC drivers do provide access to the raw bytes, but I'm not sure
we've ever attempted to use them directly. For example:
on ResultSet:
byte[] getBytes(int columnIndex)
throws SQLException
Retrieves the value of the designated column in the current row of
this ResultSet object as a byte array in the Java programming
language. The bytes represent the raw values returned by the driver.
- Charlie
on 2012-10-29 20:07
An example using Postgresql:
irb(main):029:0> conn.exec_sql_query 'select * from hello;'
=> #<Java::OrgPostgresqlJdbc4::Jdbc4ResultSet:0x4f38f663>
irb(main):030:0> rs = _
=> #<Java::OrgPostgresqlJdbc4::Jdbc4ResultSet:0x4f38f663>
irb(main):031:0> rs.next
=> true
irb(main):032:0> rs.get_bytes(1)
=> byte[68, 117, 100, 101]@73dab220
irb(main):033:0> bytes = rs.get_bytes(1)
=> byte[68, 117, 100, 101]@73dab220
irb(main):034:0> bytes.each {|i| puts i.chr}
D
u
d
e
=> byte[68, 117, 100, 101]@73dab220
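A minimal plain-Ruby sketch of what using such bytes directly could look like. The array literals are stand-ins for the Java byte[] returned by rs.get_bytes(1), and treating the column as UTF-8 is an assumption about the database encoding. On JRuby itself, String.from_java_bytes should be the equivalent conversion for a real byte[].

```ruby
# Stand-in for the Java byte[] from rs.get_bytes(1). Java bytes are signed,
# so values range from -128 to 127.
raw = [68, 117, 100, 101]

# pack('c*') writes each signed 8-bit value as one byte; force_encoding only
# relabels those bytes as UTF-8 and does not transcode them.
name = raw.pack('c*').force_encoding('UTF-8')
puts name

# Non-ASCII characters arrive as negative byte values; the same steps apply.
accented = [-61, -87].pack('c*').force_encoding('UTF-8')
puts accented.valid_encoding?
```

The design point is that no Charset decoding step runs here: the bytes are copied once and tagged with an encoding.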
on 2012-10-29 20:54
I wonder if those bytes are really the original from the wire, or if the driver always decodes to characters, and in this case is re-encoding as bytes? Also a UTF-16 String optimized jruby could yield perf benefits outside of JDBC. The Nokogiri java port comes to mind, if memory serves. --David
on 2012-10-29 21:26
Oddly enough, using UTF-16 as the encoding for the Ruby String we get
out of a Java String seems to *hurt* perf.
Here's numbers on Java 8 + perf patches for indy for a benchmark that
just causes a lot of Java String => RubyString conversion:
$ jruby -rbenchmark -e "java_import java.lang.System; 10.times { puts
Benchmark.measure { 100000.times { System.getProperty('java.home') } }
}"
1.140000 0.030000 1.170000 ( 0.363000)
0.260000 0.010000 0.270000 ( 0.139000)
0.150000 0.010000 0.160000 ( 0.142000)
0.130000 0.000000 0.130000 ( 0.122000)
0.140000 0.020000 0.160000 ( 0.145000)
0.190000 0.010000 0.200000 ( 0.142000)
0.130000 0.000000 0.130000 ( 0.128000)
0.140000 0.000000 0.140000 ( 0.127000)
0.130000 0.000000 0.130000 ( 0.127000)
0.130000 0.000000 0.130000 ( 0.128000)
This is always turning Java strings into UTF-8 or whatever the default
internal encoding in JRuby is set to. Generally, it will allocate a
byte[] for a single-byte encoding and transcode to that using Java's
Charset stuff.
Here's if we instead always transcode into UTF-16:
$ jruby -rbenchmark -e "java_import java.lang.System; 10.times { puts
Benchmark.measure { 100000.times { System.getProperty('java.home') } }
}"
1.570000 0.040000 1.610000 ( 0.510000)
0.410000 0.010000 0.420000 ( 0.192000)
0.220000 0.020000 0.240000 ( 0.172000)
0.160000 0.000000 0.160000 ( 0.152000)
0.170000 0.020000 0.190000 ( 0.177000)
0.290000 0.010000 0.300000 ( 0.176000)
0.150000 0.000000 0.150000 ( 0.150000)
0.160000 0.000000 0.160000 ( 0.145000)
0.150000 0.000000 0.150000 ( 0.140000)
0.190000 0.000000 0.190000 ( 0.138000)
Performance is on average worse. I would guess this is because:
* Transcoding still has to walk the same number of characters, even
though it can just dump them as two-byte chunks
* UTF-16 allocates at least length * 2 bytes, where UTF-8 can allocate
something less than that usually
So the savings we get from not transcoding into a one-byte format
doesn't seem to outweigh the fact that we have to allocate and
populate a larger array.
This does not say anything about cases where we can get the original
bytes first. That could be much faster.
- Charlie
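To make the allocation point concrete, here is a small plain-Ruby sketch comparing encoded sizes. The path string is an invented stand-in for a java.home value, and it says nothing about JRuby's actual conversion code; it only shows the size of the byte array each encoding needs.

```ruby
# A path-like ASCII string, similar in kind to a java.home value (this exact
# path is made up for illustration).
path = '/usr/lib/jvm/java-8/jre'

utf8  = path.encode('UTF-8')
utf16 = path.encode('UTF-16LE') # the LE variant avoids a byte-order mark

puts "chars:  #{path.length}"
puts "UTF-8:  #{utf8.bytesize} bytes"
puts "UTF-16: #{utf16.bytesize} bytes"
```

For mostly-ASCII data like this, the UTF-16 array is twice the size of the UTF-8 one, which fits the guess that the larger allocation offsets the cheaper per-character copy.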
on 2012-10-29 21:46
A little while ago I tried to get the lowdown on ActiveRecord caching. From the scant documentation out there, it seems like AR does not execute the same query twice in the same request. If that's true, then that may also affect the benchmarks. I guess if you are doing aggressive caching then this whole translation thing might be moot, as you are (hopefully) caching translated results.
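A rough sketch of the general idea, assuming a cache keyed by SQL text. RequestQueryCache is a hypothetical class written for this illustration, not ActiveRecord's implementation; it only shows why a repeated query would skip both the database and the string conversion.

```ruby
# Hypothetical per-request cache; NOT ActiveRecord's code, just the general
# idea of memoizing already-converted results by SQL text.
class RequestQueryCache
  attr_reader :hits, :misses

  def initialize
    @results = {}
    @hits = 0
    @misses = 0
  end

  # Runs the block only the first time a given SQL string is seen.
  def fetch(sql)
    if @results.key?(sql)
      @hits += 1
    else
      @misses += 1
      @results[sql] = yield
    end
    @results[sql]
  end
end

cache = RequestQueryCache.new
2.times { cache.fetch('select * from hello') { ['Dude'] } }
puts "hits=#{cache.hits} misses=#{cache.misses}"
```

A benchmark that repeats one identical query inside a single request would mostly measure the cache, not the driver.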
on 2012-10-30 17:19
I had a more radical (and possibly more misguided!) optimization in
mind: When calling java methods that return String (or any
CharSequence), wrap as a Ruby String without copy and serve at least the
character read operations via CharSequence.charAt(), etc. This Ruby
String would then need to support copy-on-write semantics for any
mutation. Also possibly any ruby String passed in as a Java String would
have its conversion preserved with the same representation, such that my
manual optimization in the second bench below would no longer yield any
benefit:
% jruby -rbenchmark -rjava -e "java_import java.lang.System; 5.times {
puts Benchmark.measure { 1000000.times { System.getProperty('java.home')
} } }"
1.699000 0.000000 1.699000 ( 1.699000)
0.741000 0.000000 0.741000 ( 0.741000)
0.612000 0.000000 0.612000 ( 0.612000)
0.615000 0.000000 0.615000 ( 0.615000)
0.614000 0.000000 0.614000 ( 0.614000)
% jruby -rbenchmark -rjava -e "java_import java.lang.System; 5.times {
jhome = 'java.home'.to_java; puts Benchmark.measure { 1000000.times {
System.getProperty(jhome) } } }"
0.981000 0.000000 0.981000 ( 0.981000)
0.895000 0.000000 0.895000 ( 0.896000)
0.525000 0.000000 0.525000 ( 0.525000)
0.471000 0.000000 0.471000 ( 0.471000)
0.452000 0.000000 0.452000 ( 0.452000)
(Note I changed the number of iterations from your original.)
--David
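The wrap-then-copy-on-write idea can be sketched in plain Ruby as follows. CowString is a hypothetical class invented for this illustration and has nothing to do with JRuby's internals; a frozen Ruby String stands in for the immutable java.lang.String.

```ruby
# Hypothetical wrapper, not JRuby's internals: reads go to the shared source
# until the first mutation, which takes a private copy.
class CowString
  def initialize(source)
    @source = source # shared and treated as immutable
    @copied = false
  end

  def copied?
    @copied
  end

  # Read operations never copy.
  def [](index)
    @source[index]
  end

  def length
    @source.length
  end

  # The first mutation copies; later ones reuse the private copy.
  def <<(other)
    unless @copied
      @source = @source.dup
      @copied = true
    end
    @source << other
    self
  end

  def to_s
    @source.dup
  end
end

shared = 'java.home'.freeze # stand-in for an immutable Java String
s = CowString.new(shared)
first = s[0]                # read: no copy yet
s << '.dir'                 # first write triggers the copy
puts s.to_s
puts shared
```

The cost of the copy is paid only by strings that are actually mutated, which is the case the proposal is betting is rare for values returned from Java.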
on 2012-10-30 17:39
Yeah I would like to see us do this at some point. It's tricky because we have so much code built around byte[] (for obvious reasons) that many/most things you might do to manipulate or search a String would want to use the raw bytes anyway. We also cache (inconsistently) the java.lang.String object created, which helps reduce overhead when we're not recreating the Ruby side every time as well. I wish Ruby had just made literal Strings be immutable or something...would make our job a lot easier :) - Charlie
on 2012-10-30 19:23
I've used byte access with JDBC on the H2 database (a Java-based SQL server). I was storing MD5 sums as bytes using Sequel as the ORM, then later on went down to JDBC prepared statements.
-- Christian
on 2012-10-30 21:53
> I wish Ruby had just made literal Strings be immutable or
> something...would make our job a lot easier :)

Please excuse my stupidity, but could you not create an ImmutableString type or something?
on 2012-10-31 03:05
What was your experience wrt performance? I would hope the bytes coming out of there are as close to "off the wire" as possible, but I have not looked into the implementation of any specific JDBC driver to see if that's the case. If we could get close-to-the-wire bytes to use for AR-JDBC, it could mean a tremendous reduction in object overhead for walking a result set.
- Charlie
on 2012-10-31 03:06
On Tue, Oct 30, 2012 at 3:53 PM, Tim Uckun <timuckun@gmail.com> wrote:
> Please excuse my stupidity but could you not create a ImmutableString
> type or something?

We certainly could, but no existing code uses it. If we were to use it for literal strings, all code that expects a literal string in code to create a new mutable String object would start to fail. Mutable Strings by default just seems to be a bad decision.
- Charlie
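A small plain-Ruby sketch of the compatibility problem described above. A frozen String stands in for how an immutable literal would behave, and build_greeting is a hypothetical method written for this illustration.

```ruby
# Plenty of existing code appends to a string that started out as a literal.
# String.new makes the mutable copy explicit so this runs on any Ruby setup.
def build_greeting(name)
  greeting = String.new('Hello, ')
  greeting << name
end

puts build_greeting('Charlie')

# The same append on a frozen string raises instead of mutating.
frozen = 'Hello, '.freeze
begin
  frozen << 'Charlie'
rescue StandardError => e # FrozenError on Ruby 2.5+, RuntimeError before
  puts "#{e.class}: #{e.message}"
end
```

If every literal behaved like the frozen string here, code written in the style of build_greeting without the explicit copy would start raising.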