Forum: JRuby The jruby talk on aloha ruby.

Posted by Tim Uckun (Guest)
on 2012-10-27 09:53
(Received via mailing list)
Hey Guys.

I just watched the talk by Charles here:
http://confreaks.com/videos/1235-aloharuby2012-why-jruby
It was really interesting, and Charles makes a very strong case for
JRuby. I was struck by one thing he said about running Rails, though.
He mentioned that if your Rails app makes a lot of database calls, JRuby
might not be faster. Maybe I got that wrong, so I wanted to ask here.

Is this because the JDBC drivers are slower than the C drivers?
Posted by Keith B. (keith_b)
on 2012-10-27 17:11
(Received via mailing list)
My guess (and it's *just* a guess) is that it's because the database
would be the bottleneck, which would make any performance differences
between Ruby implementations insignificant.

...so that it wouldn't make JRuby faster *or slower*.

- Keith

Keith R. Bennett
http://about.me/keithrbennett
Posted by Thomas E Enebo (Guest)
on 2012-10-28 18:14
(Received via mailing list)
That is part of it, but we also sometimes pay a penalty for Java
Charset <-> Ruby bytelist translation, which can make us slower. From
memory, we were <2x slower for most generic queries (and at times
the same speed). These were queries where the actual query time itself
was minimal. For more ordinary queries (where DB engine time accounts
for most of the time), this difference drops off to insignificance
pretty quickly.
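
A rough sketch of that arithmetic, in case it helps. The timings are made up for illustration (they are not measurements), and the constant and method names are hypothetical. It only shows how a fixed 2x conversion cost shrinks relative to the total as DB time grows:

```ruby
# Hypothetical per-query costs in milliseconds. These are assumptions
# chosen for illustration, not numbers from any real benchmark.
CONVERSION_MRI_MS   = 0.05 # assumed string handling cost on MRI
CONVERSION_JRUBY_MS = 0.10 # assumed 2x cost on JRuby (Charset <-> bytelist)

# Ratio of total JRuby time to total MRI time for a given DB engine time.
def slowdown(db_ms)
  (db_ms + CONVERSION_JRUBY_MS) / (db_ms + CONVERSION_MRI_MS)
end

[0.0, 0.1, 1.0, 10.0].each do |db_ms|
  puts format('db time %5.1f ms -> JRuby/MRI ratio %.3f', db_ms, slowdown(db_ms))
end
```

With zero DB time the ratio is the full 2x; once the DB time is a couple of orders of magnitude larger than the conversion cost, the ratio is very close to 1.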

-Tom

--
blog: http://blog.enebo.com       twitter: tom_enebo
mail: tom.enebo@gmail.com
Posted by David Kellum (Guest)
on 2012-10-28 19:08
(Received via mailing list)
Now that 1.9-mode encoding support is improved in JRuby, has the
possibility of optimizing Java-Ruby interop with UTF-16 internal
encoding been considered?  In other words, calling a java method that
returns java.lang.String could return a ruby String in UTF-16 encoding,
wrapped instead of converted to bytes?  Since java.lang.String is
immutable, it would still require some copy-on-write special handling.

Said another way, Ruby may support many internal encodings, but JRuby
might run most optimally when its internal encoding matches the JVM's
UTF-16?

--David
Posted by Tim Uckun (Guest)
on 2012-10-28 21:28
(Received via mailing list)
On Mon, Oct 29, 2012 at 6:12 AM, Thomas E Enebo <tom.enebo@gmail.com> 
wrote:
> That is part of it, but we also sometimes pay a penalty with Java
> Charset <-> Ruby bytelist translation which can make us slower.  From
> my memory we were <2x slower for most generic queries (and at times
> the same speed).   These were queries where actual query time itself
> was minimal.  For more ordinary queries (where DB engine time becomes
> most of the time) this difference drops off to insignificance pretty
> quickly.
>

Thanks for the explanation. That's quite interesting, and I guess
things like the number of fields, the number of text fields, etc. would
also make a difference.
Posted by Thomas E Enebo (Guest)
on 2012-10-29 16:20
(Received via mailing list)
We know that we can save that one transcode internally by saving the
UTF-16 bytes, but if the DB is returning strings as UTF-8 and Java is
retranslating that back to UTF-16, then we only save part of the
transcoding work.  Still it is something which could help a bit.

We have talked about native adapters which capture the bytes off the
wire and don't bother to translate anything.  This obviously means not
using JDBC's abstraction.

-Tom

Posted by Thomas E Enebo (Guest)
on 2012-10-29 16:24
(Received via mailing list)
Yes.  Number and type of fields can easily show differences in
microbenches.  As I said before, in isolation, it always looks like a
huge difference but in the picture of more complicated queries,
networking, and the rest of the stack it is not so dramatic.

-Tom

Posted by Charles Nutter (headius)
on 2012-10-29 19:57
(Received via mailing list)
Some JDBC drivers do provide access to the raw bytes, but I'm not sure
we've ever attempted to use them directly. For example:

On ResultSet:

byte[] getBytes(int columnIndex) throws SQLException
Retrieves the value of the designated column in the current row of
this ResultSet object as a byte array in the Java programming
language. The bytes represent the raw values returned by the driver.

- Charlie
Posted by Charles Nutter (headius)
on 2012-10-29 20:07
(Received via mailing list)
An example using Postgresql:

irb(main):029:0> conn.exec_sql_query 'select * from hello;'
=> #<Java::OrgPostgresqlJdbc4::Jdbc4ResultSet:0x4f38f663>
irb(main):030:0> rs = _
=> #<Java::OrgPostgresqlJdbc4::Jdbc4ResultSet:0x4f38f663>
irb(main):031:0> rs.next
=> true
irb(main):032:0> rs.get_bytes(1)
=> byte[68, 117, 100, 101]@73dab220
irb(main):033:0> bytes = rs.get_bytes(1)
=> byte[68, 117, 100, 101]@73dab220
irb(main):034:0> bytes.each {|i| puts i.chr}
D
u
d
e
=> byte[68, 117, 100, 101]@73dab220
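
A minimal sketch of turning such bytes into a Ruby String without going through java.lang.String. It uses a plain Ruby Array as a stand-in for the Java byte[] so it runs on any Ruby, and it assumes the column holds UTF-8 text. Under JRuby, String.from_java_bytes should be the equivalent for a real byte[]:

```ruby
# Stand-in for rs.get_bytes(1); under JRuby this would be a Java byte[].
raw = [68, 117, 100, 101]

# pack('c*') treats each element as a signed 8-bit value, matching Java's
# byte type. The result is a binary string, so we tag it as UTF-8
# (an assumption about what the database sent) without transcoding.
str = raw.pack('c*').force_encoding(Encoding::UTF_8)

puts str
puts str.valid_encoding?
```

Note that force_encoding only relabels the bytes; nothing is copied or converted, which is the point of grabbing the raw bytes in the first place.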

Posted by David Kellum (Guest)
on 2012-10-29 20:54
(Received via mailing list)
I wonder if those bytes are really the original from the wire, or if the
driver always decodes to characters, and in this case is re-encoding as
bytes?

Also a UTF-16 String optimized jruby could yield perf benefits outside
of JDBC.  The Nokogiri java port comes to mind, if memory serves.

--David
Posted by Charles Nutter (headius)
on 2012-10-29 21:26
(Received via mailing list)
Oddly enough, using UTF-16 as the encoding for the Ruby String we get
out of a Java String seems to *hurt* perf.

Here are numbers on Java 8 + perf patches for indy for a benchmark that
just causes a lot of Java String => RubyString conversion:

$ jruby -rbenchmark -e "java_import java.lang.System; 10.times { puts Benchmark.measure { 100000.times { System.getProperty('java.home') } } }"
  1.140000   0.030000   1.170000 (  0.363000)
  0.260000   0.010000   0.270000 (  0.139000)
  0.150000   0.010000   0.160000 (  0.142000)
  0.130000   0.000000   0.130000 (  0.122000)
  0.140000   0.020000   0.160000 (  0.145000)
  0.190000   0.010000   0.200000 (  0.142000)
  0.130000   0.000000   0.130000 (  0.128000)
  0.140000   0.000000   0.140000 (  0.127000)
  0.130000   0.000000   0.130000 (  0.127000)
  0.130000   0.000000   0.130000 (  0.128000)

This is always turning Java strings into UTF-8 or whatever the default
internal encoding in JRuby is set to. Generally, it will allocate a
byte[] for a single-byte encoding and transcode to that using Java's
Charset stuff.

Here's if we instead always transcode into UTF-16:

$ jruby -rbenchmark -e "java_import java.lang.System; 10.times { puts Benchmark.measure { 100000.times { System.getProperty('java.home') } } }"
  1.570000   0.040000   1.610000 (  0.510000)
  0.410000   0.010000   0.420000 (  0.192000)
  0.220000   0.020000   0.240000 (  0.172000)
  0.160000   0.000000   0.160000 (  0.152000)
  0.170000   0.020000   0.190000 (  0.177000)
  0.290000   0.010000   0.300000 (  0.176000)
  0.150000   0.000000   0.150000 (  0.150000)
  0.160000   0.000000   0.160000 (  0.145000)
  0.150000   0.000000   0.150000 (  0.140000)
  0.190000   0.000000   0.190000 (  0.138000)

Performance is on average worse. I would guess this is because:

* Transcoding still has to walk the same number of characters, even
though it can just dump them as two-byte chunks
* UTF-16 allocates at least length * 2 bytes, where UTF-8 can allocate
something less than that usually

So the savings we get from not transcoding into a one-byte format
don't seem to outweigh the cost of allocating and populating a
larger array.
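
The size difference is easy to see from plain Ruby. This is only a sketch of the byte counts, not of JRuby's internal conversion path, and the variable names are just for illustration:

```ruby
# ASCII-only text: UTF-8 needs one byte per character, UTF-16 needs two.
ascii = 'java.home'
puts ascii.encode(Encoding::UTF_8).bytesize
puts ascii.encode(Encoding::UTF_16LE).bytesize

# Text with one non-ASCII character: UTF-8 grows a little,
# UTF-16 still needs two bytes for every character here.
accented = "caf\u00e9"
puts accented.encode(Encoding::UTF_8).bytesize
puts accented.encode(Encoding::UTF_16LE).bytesize
```

For mostly-ASCII data, which is common for things like property values and column names, the UTF-16 array is about twice the size of the UTF-8 one.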

This does not say anything about cases where we can get the original
bytes first. That could be much faster.

- Charlie
Posted by Tim Uckun (Guest)
on 2012-10-29 21:46
(Received via mailing list)
A little while ago I tried to get the lowdown on ActiveRecord caching.
From the scant documentation out there it seems like AR does not
execute the same query twice in the same request. If that's true, then
that may also affect the benchmarks.

I guess if you are doing aggressive caching then this whole
translation thing might be moot as you are (hopefully) caching
translated results.
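
To make the idea concrete, here is a toy sketch of a per-request query cache keyed on the SQL text. TinyQueryCache and its methods are hypothetical names for illustration; this is not ActiveRecord's implementation, just the general shape of "run each distinct query once, hand back the already-converted result after that":

```ruby
# Toy per-request cache: the block stands in for the real database call.
class TinyQueryCache
  attr_reader :executions

  def initialize(&runner)
    @runner = runner
    @results = {}
    @executions = 0
  end

  # Runs the query only the first time a given SQL string is seen.
  def select_all(sql)
    @results.fetch(sql) do
      @executions += 1
      @results[sql] = @runner.call(sql)
    end
  end

  # Would be called at the end of a request (or after a write).
  def clear
    @results.clear
  end
end

cache = TinyQueryCache.new { |_sql| [{ 'name' => 'Dude' }] }
2.times { cache.select_all('select * from hello') }
puts cache.executions
```

In a setup like this, any string translation cost is paid once per distinct query per request, so a benchmark that repeats the same query in a loop would mostly measure the cache.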
Posted by David Kellum (Guest)
on 2012-10-30 17:19
(Received via mailing list)
I had a more radical (and possibly more misguided!) optimization in
mind:  When calling java methods that return String (or any
CharSequence), wrap as a Ruby String without copy and serve at least the
character read operations via CharSequence.charAt(), etc.  This Ruby
String would then need to support copy-on-write semantics for any
mutation. Also, possibly any Ruby String passed in as a Java String would
have its conversion preserved with the same representation, such that my
manual optimization in the second bench below would no longer yield any
benefit:


% jruby -rbenchmark -rjava -e "java_import java.lang.System; 5.times { puts Benchmark.measure { 1000000.times { System.getProperty('java.home') } } }"
  1.699000   0.000000   1.699000 (  1.699000)
  0.741000   0.000000   0.741000 (  0.741000)
  0.612000   0.000000   0.612000 (  0.612000)
  0.615000   0.000000   0.615000 (  0.615000)
  0.614000   0.000000   0.614000 (  0.614000)
% jruby -rbenchmark -rjava -e "java_import java.lang.System; 5.times { jhome = 'java.home'.to_java; puts Benchmark.measure { 1000000.times { System.getProperty(jhome) } } }"
  0.981000   0.000000   0.981000 (  0.981000)
  0.895000   0.000000   0.895000 (  0.896000)
  0.525000   0.000000   0.525000 (  0.525000)
  0.471000   0.000000   0.471000 (  0.471000)
  0.452000   0.000000   0.452000 (  0.452000)


(Note I changed the number of iterations from your original.)
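
A very rough sketch of the copy-on-write idea in plain Ruby. CowString is a hypothetical name, and a frozen Ruby String stands in for the immutable java.lang.String; the real thing would have to live inside JRuby's String implementation and cover far more operations:

```ruby
# Wraps an immutable source and only copies it on the first mutation.
class CowString
  attr_reader :copies

  def initialize(immutable_source)
    raise ArgumentError, 'source must be frozen' unless immutable_source.frozen?

    @source = immutable_source
    @own = nil
    @copies = 0
  end

  # Reads are served from the shared source until a write happens.
  def [](index)
    current[index]
  end

  def length
    current.length
  end

  # First write takes a private copy; later writes reuse it.
  def <<(other)
    materialize << other
    self
  end

  def to_s
    current.dup
  end

  private

  def current
    @own || @source
  end

  def materialize
    return @own if @own

    @copies += 1
    @own = @source.dup
  end
end

shared = 'java.home'.freeze
s = CowString.new(shared)
s[0]        # read only: no copy is made
s << '.dir' # first mutation: one copy is made
puts s.to_s
```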

--David
Posted by Charles Nutter (headius)
on 2012-10-30 17:39
(Received via mailing list)
Yeah I would like to see us do this at some point. It's tricky because
we have so much code built around byte[] (for obvious reasons) that
many/most things you might do to manipulate or search a String would
want to use the raw bytes anyway. We also cache (inconsistently) the
java.lang.String object created, which helps reduce overhead when
we're not recreating the Ruby side every time as well.

I wish Ruby had just made literal Strings be immutable or
something...would make our job a lot easier :)

- Charlie
Posted by Christian MICHON (Guest)
on 2012-10-30 19:23
(Received via mailing list)
I've used byte access with jdbc on H2 database (java based SQL
server). I was storing md5sum in bytes using Sequel as ORM, then later
on went down to jdbc prepared statements.



--
Christian
Posted by Tim Uckun (Guest)
on 2012-10-30 21:53
(Received via mailing list)
> I wish Ruby had just made literal Strings be immutable or
> something...would make our job a lot easier :)
>


Please excuse my stupidity, but could you not create an ImmutableString
type or something?
Posted by Charles Nutter (headius)
on 2012-10-31 03:05
(Received via mailing list)
What was your experience wrt performance? I would hope the bytes
coming out of there are as close to "off the wire" as possible, but I
have not looked into the implementation of any specific JDBC driver to
see if that's the case.

If we could get close-to-the-wire bytes to use for AR-JDBC, it could
mean a tremendous reduction in object overhead for walking a result
set.

- Charlie

Posted by Charles Nutter (headius)
on 2012-10-31 03:06
(Received via mailing list)
On Tue, Oct 30, 2012 at 3:53 PM, Tim Uckun <timuckun@gmail.com> wrote:
> Please excuse my stupidity, but could you not create an ImmutableString
> type or something?

We certainly could, but no existing code uses it. If we were to use it
for literal strings, all code that expects a literal string in code to
create a new mutable String object would start to fail.

Mutable Strings by default just seems to be a bad decision.
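
A small sketch of the failure mode, using freeze as a stand-in for a hypothetical immutable literal. build_greeting is a made-up example of code that mutates whatever string it is handed:

```ruby
# Typical Ruby code that appends to its argument in place.
def build_greeting(base)
  base << ', world'
end

# With an ordinary mutable String this works as expected.
mutable = String.new('hello')
puts build_greeting(mutable)

# With an immutable one, the same code raises instead of appending.
# (Rescuing RuntimeError also covers FrozenError on newer Rubies.)
immutable = 'hello'.freeze
begin
  build_greeting(immutable)
rescue RuntimeError => e
  puts "#{e.class}: #{e.message}"
end
```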

- Charlie