Ruby extensions question

ktheory · June 21, 2007, 7:22pm

So I’m getting slightly confused about writing Ruby extensions and what
is best for performance.

I’m working on writing an extension for Ruby, but almost everything I’m
doing at the C level is with rb_* functions from the Ruby C API. I
understand how it works, but isn’t that just as slow as using Ruby?

thanks

ktheory · June 21, 2007, 8:01pm

The performance improvement is not found in the Ruby interface to C
but rather in what is going on in the C code itself.

Example:

I write C code that does something computationally intensive - say it
takes about 1 hour to run. If I were to write a Ruby equivalent (only
in Ruby) then let’s say that it takes 2 hours. But if I extend that C
code using the Ruby C API then my performance would be 1 hour plus
interface (rb_*) overhead - maybe on the order of a few seconds or
perhaps minutes.
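
The same effect can be seen from plain Ruby, since Array#pack is itself implemented in C inside the interpreter. The following is only a rough sketch: the helper names pack_each and pack_bulk are made up for illustration, and the timings will vary by machine and Ruby version.

```ruby
require 'benchmark'

# Hypothetical helpers for illustration; not taken from RubyAMF.

# Crosses the Ruby/C boundary once per element.
def pack_each(ints)
  out = String.new
  ints.each { |n| out << [n].pack('N') }
  out
end

# Crosses the boundary once; the loop over elements runs inside C.
def pack_bulk(ints)
  ints.pack('N*')
end

ints = Array.new(100_000) { |i| i }

puts "same bytes: #{pack_each(ints) == pack_bulk(ints)}"
puts format('per-element: %.4fs', Benchmark.realtime { pack_each(ints) })
puts format('bulk:        %.4fs', Benchmark.realtime { pack_bulk(ints) })
```

Both versions produce the same bytes; the difference is only in how many times control passes between Ruby and C.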

ktheory · June 21, 2007, 9:29pm

On Fri, 22 Jun 2007 03:55:11 +0900, Aaron S.
[email protected] wrote:

One more question. Maybe this will end up having the same answer… But
let’s say I have an object that I pass to the Ruby extension, and all
the operations I’m doing in the C code are on that object, using
rb_funcall, rb_iv_get, etc. Would I still see performance gains with C?

Usually not much. Local variables are a little cheaper, and there’s no
AST-walking for evaluation, but that’s about the extent of it. Ruby
method calls are a lot more expensive, so the big wins are when you can
avoid Ruby method calls (i.e. rb_funcall) altogether. Otherwise, it’s
just not worth the maintainability and portability problems.

-mental
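
The cost of method calls relative to local variables can be felt in plain Ruby too. This is a minimal sketch under assumptions: the Record class and its method names are invented for illustration, and how large the gap is (if any) depends on the Ruby version.

```ruby
require 'benchmark'

# Hypothetical class for illustration only.
class Record
  attr_reader :fields

  def initialize(fields)
    @fields = fields
  end

  # Calls the `fields` accessor several times on every iteration.
  def total_via_calls
    sum = 0
    i = 0
    while i < fields.size
      sum += fields[i]
      i += 1
    end
    sum
  end

  # Reads the accessor once, then works on local variables.
  def total_via_locals
    data = fields
    len = data.size
    sum = 0
    i = 0
    while i < len
      sum += data[i]
      i += 1
    end
    sum
  end
end

record = Record.new(Array.new(200_000) { |i| i })
puts "same result: #{record.total_via_calls == record.total_via_locals}"
puts format('method call per iteration: %.4fs', Benchmark.realtime { record.total_via_calls })
puts format('locals only:               %.4fs', Benchmark.realtime { record.total_via_locals })
```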

ktheory · June 21, 2007, 8:55pm

Thanks that helps.

One more question. Maybe this will end up having the same answer… But
let’s say I have an object that I pass to the Ruby extension, and all
the operations I’m doing in the C code are on that object, using
rb_funcall, rb_iv_get, etc. Would I still see performance gains with C?

thanks

ktheory · June 21, 2007, 10:28pm

Thanks, that helps more. I think where I’m going with this is that it’s best
to write a C program that does everything needed, then come back and
extend Ruby with those pieces. Instead of worrying about Ruby
integration now, wait until the straight C functionality is there… Then
there should only be the slight overhead of a few rb_* calls.

ktheory · June 21, 2007, 11:47pm

Thanks. Yes, I have the Ruby application RubyAMF already done, but I’m
trying to optimize the AMF de/serialization and figured C would be the
best solution. So now I just need to write the C instead of worrying about
Ruby/AMF integration… Once the C is working, I’ll tie that back into
Ruby/AMF.

ktheory · June 22, 2007, 1:18am

Ah, {de,}serialization code is one of those places where C is more often
helpful.

That said, have you done profiling and taken advantage of all of the
pure Ruby optimization opportunities you’ve had first?

Performance bottlenecks can happen in very unexpected places, and heavily
optimizing one part of the code won’t help you much if 90% of the execution
time is happening in a different part.

Another question to consider is if you are interested in supporting other
Ruby implementations (e.g. JRuby or Ruby.NET). If so, then it’s best to
minimize the amount of code you push into C.

Along those lines, profiling may highlight one or two specific methods that
need to be done in C, versus rewriting a large part of the application.

-mental
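
The "90% somewhere else" point can be put into numbers with the usual Amdahl's law arithmetic. This is just a worked example; the method name overall_speedup and the 10x factor are assumptions chosen for illustration.

```ruby
# Overall speedup when only `fraction` of the run time
# is made `factor` times faster (Amdahl's law).
def overall_speedup(fraction, factor)
  1.0 / ((1.0 - fraction) + fraction / factor)
end

# The optimized part is 10% of the run time, made 10x faster.
puts overall_speedup(0.10, 10.0).round(3)  # prints 1.099

# The optimized part is 90% of the run time, made 10x faster.
puts overall_speedup(0.90, 10.0).round(3)  # prints 5.263
```

So a 10x faster serializer buys roughly 10% overall if serialization is only a tenth of the run time, which is why the profile should come first.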

ktheory · June 21, 2007, 10:33pm

I’d actually say it’s the other way around: write the program in Ruby, then
profile it to see where the slow spots are. Optimize those, and when
you’ve run out of things to optimize in Ruby, then see if you can cut down
on method calls in critical places by pushing things into C, if you need
to (you may not).

Trying to optimize too early just makes life harder for you, and there are a
surprising number of gotchas when embedding Ruby in a C program (rather than
vice-versa); the Ruby interpreter is not the most embedding-friendly.

-mental
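
As a rough first pass before reaching for a full profiler, the "measure first" step can be sketched with the standard library's Benchmark module. The helper names time_stages and slowest_stage and the stand-in workloads are hypothetical; real code paths would go in their place.

```ruby
require 'benchmark'

# Runs each named stage and returns { name => seconds }.
def time_stages(stages)
  stages.each_with_object({}) do |(name, work), timings|
    timings[name] = Benchmark.realtime { work.call }
  end
end

# Returns the name of the stage that took the longest.
def slowest_stage(timings)
  timings.max_by { |_name, seconds| seconds }.first
end

# Stand-in workloads (assumptions); replace with real code paths.
stages = {
  query:     -> { sleep 0.01 },
  serialize: -> { 200_000.times.map { |i| i.to_s }.join },
  respond:   -> { 'ok' * 10 }
}

timings = time_stages(stages)
timings.each { |name, seconds| puts format('%-10s %.4fs', name, seconds) }
puts "slowest: #{slowest_stage(timings)}"
```

Only the stage that dominates is worth considering for a C rewrite.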

ktheory · June 22, 2007, 4:49am

Hm. I haven’t thought about JRuby or Ruby.NET. Don’t know if it’s
something I should worry about. I’ve done some profiling and overall the
application is pretty well tuned. The application only performs slowly
when returning thousands of records from MySQL, which is when it’s in
the serializing code.

Here are a couple of ruby-prof call graphs that compare the timings
of a MySQL query versus returning a string:

returning 3000 mysql records:
http://blog.rubyamf.org/profiling/serialize/mysql_3000_flat.txt
http://blog.rubyamf.org/profiling/serialize/mysql_3000_graph.txt

returning 1 string:
http://blog.rubyamf.org/profiling/serialize/string_flat.txt
http://blog.rubyamf.org/profiling/serialize/string_graph.txt

returning a number:
http://blog.rubyamf.org/profiling/serialize/bignum_flat.txt
http://blog.rubyamf.org/profiling/serialize/fixnum_flat.txt
http://blog.rubyamf.org/profiling/serialize/float_flat.txt

You can start to see processing times increase in the MySQL one at the
end of the flat version… Take a look and let me know what you think. I think
the performance gains would definitely be seen by writing the
de/serializers in C…

Thanks for talking shop.
-Aaron

ktheory · June 22, 2007, 6:45am

On Fri, 2007-06-22 at 11:49 +0900, Aaron S. wrote:

Here are a couple of ruby-prof call graphs that compare the timings
of a MySQL query versus returning a string:

Yeah, it does indeed look like the serializer is a good candidate for a
C rewrite, which should remove the need for a lot of method calls.

I was also concerned about RUBYAMF::Util::BinaryString, given its heavy
use of delegation (which is quite expensive in principle), except
looking at the profile it doesn’t seem to contribute to the times
anywhere near as much.

Keep up the good work,

-mental
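
To see what delegation costs "in principle", here is a small comparison using the standard library's SimpleDelegator. Both classes are hypothetical stand-ins written for this sketch; the real RUBYAMF::Util::BinaryString may be structured quite differently.

```ruby
require 'delegate'
require 'benchmark'

# Hypothetical stand-ins; not the actual RubyAMF classes.

# Wraps a String and forwards calls to it through the delegator.
class DelegatedBuffer < SimpleDelegator
  def initialize
    super(::String.new)
  end

  def write_int(n)
    self << [n].pack('N') # forwarded to String#<< on the wrapped object
  end
end

# Subclasses String, so String methods are called directly.
class DirectBuffer < String
  def write_int(n)
    self << [n].pack('N')
  end
end

delegated = DelegatedBuffer.new
direct = DirectBuffer.new

t1 = Benchmark.realtime { 100_000.times { |i| delegated.write_int(i) } }
t2 = Benchmark.realtime { 100_000.times { |i| direct.write_int(i) } }

puts "same bytes: #{delegated.__getobj__.bytes == direct.bytes}"
puts format('delegated: %.4fs', t1)
puts format('direct:    %.4fs', t2)
```

Whether the difference matters depends on how often the buffer is written to, which is what the profile is for.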

ktheory · June 22, 2007, 6:58am

I’ve been looking through these profiles a bit more. Look at this line
(a little more than halfway down the page):

33.45 11.50 11.31 11.31 0.00 5159152 0.00 0.00 String#==

After looking it over more, the de/serialization times don’t
seem that bad, but there is a ton of time being spent in String#==.

5 million calls? That’s nuts. I’m wondering if some of those calls
come from ruby-prof itself. If not, it’s strange to me that == is
being called so much. Hmm. String#== has the most calls of any
method in the profile.

Do you think the de/serialization times could be noticeably improved,
compared to trying to figure out why String#== is called so much?

Any thoughts?
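
Without knowing where the calls come from, one generic way string comparisons pile up is a linear search by name repeated for every field of every record. The sketch below only illustrates that multiplication; the field names and helper methods are invented, and it is not a diagnosis of RubyAMF, WEBrick, or ruby-prof.

```ruby
# Hypothetical field list; not taken from any real schema.
def field_names
  %w[id name email city state zip phone created updated notes]
end

# Finds a field's position by comparing strings one at a time.
# Returns [index, number_of_comparisons].
def linear_index(names, wanted)
  comparisons = 0
  names.each_with_index do |name, i|
    comparisons += 1
    return [i, comparisons] if name == wanted
  end
  [nil, comparisons]
end

# Total == comparisons needed to look up every field of every record.
def comparisons_for(record_count, names)
  per_record = names.sum { |wanted| linear_index(names, wanted).last }
  per_record * record_count
end

# A Hash built once replaces the repeated scans with hash lookups.
def build_index(names)
  names.each_with_index.to_h
end

puts comparisons_for(3000, field_names)   # prints 165000
puts build_index(field_names)['zip']      # prints 5
```

Ten fields and 3000 records already give 165,000 comparisons, so a few nested lookups like this can reach millions quickly.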

ktheory · June 22, 2007, 5:40am

oh, if you haven’t seen ruby-prof or call graphs… you can google “ruby
call graphs”… here is a good tutorial as well…
http://ruby-prof.rubyforge.org/graph.txt
-Aaron

ktheory · June 22, 2007, 5:47pm

On Fri, 22 Jun 2007 13:58:45 +0900, Aaron S.
[email protected] wrote:

Do you think the de/serialization times could be noticeably improved?
Compared to trying to figure out why the String#== is called so much?

I didn’t have much time to look at the profile carefully last night, but the
impression I got was that a lot of the string comparison that was happening
was coming from WEBrick.

(I would be curious how much difference using Mongrel versus using
WEBrick makes.)

-mental

ktheory · June 22, 2007, 6:00pm

Interesting, I’ll get that set up and test some more. I’ve also got
lighttpd working, so I’ll do some profiling with that as well.

Thanks

ktheory · June 25, 2007, 11:01pm

Hey mental,

Here is a Mongrel code profile of that same MySQL call that returns 3000
records…
http://blog.rubyamf.org/profiling/mongrel_mysql3000_graph.txt

The String#== method is still called 5 million times. That’s strange.
I’ve been trying to figure out where it’s coming from but can’t quite
pinpoint it. I’ll be poking around some more… Any more thoughts?

Also, what do you think about all the delegation that is happening to
the BinaryString mixins? It doesn’t seem like a huge performance hit, but
generally there are 3-4 method calls before any actual BinaryString
operation happens (such as writing the packed bytes onto the
string)… I know someone had mentioned that method calls in Ruby are
taxing. Any thoughts?

Thanks man
Aaron

ktheory · July 10, 2007, 6:07am

Hi Aaron,

Not sure if you still need this, but the newest version of ruby-prof,
0.5.0, keeps track of where code is called from. The easiest way to see
this is to create a graph HTML report and look at the last column, which
is called line. It is a hyperlink, and if you click on it, it will jump
you to the correct file. Unfortunately browsers won’t jump to the right
line in a text file, but look at the end of the URL and you can see what
the line number is.

Hope this helps,

Charlie

ktheory · June 22, 2007, 6:59am

I’ve been looking through these profiles a bit more. Look at this line
(a little more than halfway down the page):

33.45 11.50 11.31 11.31 0.00 5159152 0.00 0.00 String#==

Oh, sorry. That String#== line is in the mysql_3000_flat.txt profile.