Any tricks to speed up Ruby?

So … I assume you tried

$ export CFLAGS='-march=ppc -O3' # -march actually doesn't work on Macs
$ ./configure
$ make

Wow, that does indeed help (once I figured it out). For me, on my G4, it was
(after I figured out I had a 7450 processor)
export CFLAGS='-mtune=7450 -mcpu=7450 -fast -fPIC'

and compile with --disable-pthread

and voila, a faster Ruby (not sure if the real reason was the compiler
flags, the pthread, or the updated version from p110 to p114, but
something helped; I'm guessing it was the compiler options).

new ruby with compiler options:
time ./ruby -e "10000000.times {}"
real 0m4.049s

old ruby (the mac osx port one):
time ruby -e "10000000.times {}"
real 0m5.400s
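
Note that `time` on the command line also counts interpreter startup. A minimal sketch of timing the same empty loop from inside Ruby, which leaves startup out. It assumes a current Ruby (Process.clock_gettime does not exist in 1.8.6), and the method name time_empty_loop is made up for illustration:

```ruby
# Time an empty block loop inside the running interpreter, so that
# interpreter startup is excluded from the measurement.
def time_empty_loop(iterations)
  cpu_before  = Process.times
  wall_before = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times {}
  wall_after = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  cpu_after  = Process.times
  {
    real: wall_after - wall_before,            # wall-clock seconds
    user: cpu_after.utime - cpu_before.utime,  # CPU seconds in user mode
    sys:  cpu_after.stime - cpu_before.stime   # CPU seconds in the kernel
  }
end

if __FILE__ == $0
  t = time_empty_loop(10_000_000)
  printf("real %.3fs  user %.3fs  sys %.3fs\n", t[:real], t[:user], t[:sys])
end
```

Run it once with each build (./ruby script.rb versus ruby script.rb) and compare the three figures the same way as the output of `time`.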

Wow, it's a wonder they don't automatically set these up at compile time
to be optimized, since they're so helpful.
Unfortunately gcc on Macs doesn't seem to have -fwhole-program (at
least mine doesn't), so I will have to wait to try that one till I'm back
in x86 land. Thanks for your help!
-R

On Wed, 16 Jan 2008, M. Edward (Ed) Borasky wrote:

It’s an interesting thought. However, I wasn’t able to get gcc 4.2.2 to do
some simpler things, like profile-based optimization, on the Ruby source, so
I wouldn’t expect something that complex to work out of the box. There are a
lot of great things in gcc, but not many of them are as well tested as, say,
the standard modular C library or program, and, of course, the Linux kernel.

I'm not sure that's simpler… I also looked at that once and decided
there were some distinctly unsimple things going on.

If I'm right about kde, then yes, it's a very well tested feature. Also
there would be major impetus from embedded systems users to use whole
program optimization features.

As far as I know, “-O3 -march=” is about the best you
can get out of gcc without a lot of work.

In the “medium work” category I suspect there are things relating to
function attributes and builtins that could get 5% or so more juice
(but tend to clutter the code with unportable improvements).

And there are a lot more things you can do at the Ruby source level
that have a bigger payoff than that does.

As always. My 100-10-1 rule of thumb is to expect speed up factors of up
to about 100x for using a much better algorithm, factors up to about
10x for code tweaks, factors up to 2x but usually near 1x for compiler
optimization tweaks.
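
To make the "much better algorithm" end of that rule concrete, here is a small sketch (not from the thread; the task and both method names are invented for illustration) that answers the same question with a quadratic and a linear algorithm. The actual speedup depends on input size and Ruby version, so run it rather than trusting any number:

```ruby
require 'set'

# Quadratic: compares every pair of elements.
def duplicate_slow?(items)
  items.each_with_index.any? do |a, i|
    items.each_with_index.any? { |b, j| i < j && a == b }
  end
end

# Linear: remembers what has been seen in a Set.
# Set#add? returns nil when the element was already present.
def duplicate_fast?(items)
  seen = Set.new
  items.any? { |item| !seen.add?(item) }
end

# Wall-clock seconds taken by the given block.
def seconds
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

if __FILE__ == $0
  data = (1..2_000).to_a # no duplicates: worst case for both versions
  printf("slow: %.4fs\n", seconds { duplicate_slow?(data) })
  printf("fast: %.4fs\n", seconds { duplicate_fast?(data) })
end
```

No compiler flag closes a gap like that, which is the point of the rule.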

John C. Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : [email protected]
New Zealand

On Wed, Mar 12, 2008 at 04:58:27PM +0900, Roger P. wrote:

Wow, that does indeed help (once I figured it out). For me, on my G4, it was
(after I figured out I had a 7450 processor)
export CFLAGS='-mtune=7450 -mcpu=7450 -fast -fPIC'

and compile with --disable-pthread

and voila, a faster Ruby (not sure if the real reason was the compiler
flags, the pthread, or the updated version from p110 to p114, but
something helped; I'm guessing it was the compiler options).

I suspect --disable-pthread had the largest impact. The cost of memory
allocations can be high when linking with the threading library. Re-run
your tests without the other options if you want to verify.
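
One way to do that re-run is a small harness that times several interpreter binaries on the same snippet. This is only a sketch: it assumes a current Ruby to drive it, the interpreter paths are whatever you pass on the command line, and time_ruby and best_time are names invented here:

```ruby
require 'rbconfig'

# Wall-clock seconds to run `code` with the interpreter at `ruby_path`.
# This includes process startup, like `time` does.
def time_ruby(ruby_path, code)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  ok = system(ruby_path, '-e', code)
  raise "could not run #{ruby_path}" unless ok
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# Best of several runs, to reduce noise from other processes.
def best_time(ruby_path, code, runs = 3)
  Array.new(runs) { time_ruby(ruby_path, code) }.min
end

if __FILE__ == $0
  # Pass interpreter paths as arguments; with none, the running Ruby is used.
  binaries = ARGV.empty? ? [RbConfig.ruby] : ARGV
  binaries.each do |path|
    startup = best_time(path, '')                  # empty script: startup only
    loop_t  = best_time(path, '10000000.times {}') # startup plus the loop
    printf("%-40s startup %.3fs  loop %.3fs\n", path, startup, loop_t)
  end
end
```

Build one binary per configuration (flags only, --disable-pthread only, both) and pass all of them in one invocation so they are measured under the same conditions.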

I’m also surprised you got improved performance with -fPIC. I thought
position-independent code was supposed to run slower, usually.

Paul

On 12/03/2008, Paul B. wrote:

I suspect --disable-pthread had the largest impact. The cost of memory
allocations can be high when linking with the threading library. Re-run
your tests without the other options if you want to verify.

I’m also surprised you got improved performance with -fPIC. I thought
position-independent code was supposed to run slower, usually.

I suspect this is pretty much a no-op, as most of the code is in libruby
anyway and it usually has to be compiled with -fPIC to link at all.

Thanks

Michal

On Thu, Mar 13, 2008 at 12:34:31AM +0900, Michal S. wrote:

I suspect this is pretty much noop as most of the code is in libruby
anyway and it usually has to be compiled with -fPIC to link at all.

Perhaps it depends on the platform, but on x86 linux, libruby is a
static library by default unless --enable-shared is used.

Paul

James B. wrote:

E) None of the above
Use Pascal.

A recent thread showed how slow Ruby is at generating
a string in which each character in the original string is
duplicated.

// Pascal (FreePascal)
uses sysutils { for timing }, strutils { for DupeString };

function dup_chars( var s: ansistring ): ansistring;
var
  out: ansistring;
  i: longint;
  c: char;
begin
  setlength( out, length(s) * 2 );
  for i := 1 to length(s) do
  begin
    c := s[i];
    out[2*i-1] := c;
    out[2*i] := c;
  end;
  exit( out )
end;

var
  s : ansistring;
  when : tDateTime;
begin
  s := dupeString( 'abc', 1000000 );
  when := Time;
  dup_chars( s );
  writeln( ((time - when) * secsPerDay):0:3, ' seconds' )
end.
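
For comparison, a sketch of the same character-doubling task in Ruby. These are not the versions from the earlier thread (that code is not quoted here); they are two plausible ways to write it, with made-up method names, assuming a current Ruby:

```ruby
# Duplicate every character with a single regexp substitution.
# The /m flag makes "." match newlines as well.
def dup_chars_gsub(s)
  s.gsub(/(.)/m, '\1\1')
end

# Duplicate every character by appending to a result string.
def dup_chars_each(s)
  out = String.new(encoding: s.encoding)
  s.each_char { |c| out << c << c }
  out
end

# Wall-clock seconds taken by the given block.
def seconds
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

if __FILE__ == $0
  s = 'abc' * 1_000_000 # same input as the Pascal program
  printf("gsub:      %.3f seconds\n", seconds { dup_chars_gsub(s) })
  printf("each_char: %.3f seconds\n", seconds { dup_chars_each(s) })
end
```

Which of the two is faster varies between Ruby versions, so measure on the interpreter you actually use.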

and voila, a faster Ruby (not sure if the real reason was the compiler
flags, the pthread, or the updated version from p110 to p114, but
something helped; I'm guessing it was the compiler options).

new ruby with compiler options:
time ./ruby -e "10000000.times {}"
real 0m4.049s

old ruby (the mac osx port one):
time ruby -e "10000000.times {}"
real 0m5.400s

So it turns out the difference seems to be between p110 and p114;
the compiler options made maybe 0.2s of difference.

Some interesting benchmarks:

p110: 5.4s
p111: 5.23s
p114: 4.1s
latest stable snapshot from the ruby-lang page: 5.8s

[latest snapshot build doesn’t run since it appears to be based on 1.9
(?) ]

Anybody have any idea what might be going on here? All were compiled
similarly, yet p114 seems to smoke the rest.

Thanks.
-R

[Fri Mar 21 15:39:11 ~/Downloads/ruby-1.8.6-p114 ]$ time ./ruby -e "10000000.times {}"

real 0m4.143s
user 0m3.601s
sys 0m0.026s
[Fri Mar 21 15:39:18 ~/Downloads/ruby-1.8.6-p114 ]$ time ruby -e "10000000.times {}" # 'normal' ruby p111

real 0m5.742s
user 0m4.616s
sys 0m0.063s

On Sat, Mar 22, 2008 at 06:40:26AM +0900, Roger P. wrote:

Anybody have any idea what might be going on here? All were compiled
similarly, yet p114 seems to smoke the rest.

I’m guessing it’s the way you built it; these are the only changes
listed in the ChangeLog between p111 and p114:

Mon Mar 3 23:34:13 2008 GOTOU Yuuzou

  • lib/webrick/httpservlet/filehandler.rb: should normalize path
    separators in path_info to prevent directory traversal attacks
    on DOSISH platforms.
    reported by Digital Security Research Group [DSECRG-08-026].

  • lib/webrick/httpservlet/filehandler.rb: pathnames which have
    not to be published should be checked case-insensitively.

Mon Dec 3 08:13:52 2007 Kouhei S.

  • test/rss/test_taxonomy.rb, test/rss/test_parser_1.0.rb,
    test/rss/test_image.rb, test/rss/rss-testcase.rb: ensured
    declaring XML namespaces.

Paul

On Tue, Mar 25, 2008 at 06:14:38AM +0900, Roger P. wrote:

The only truly slow one appears to be the macPort version. I don’t know
what compile flags they are using but it appears truly slower.

ruby -rrbconfig -e 'puts Config::CONFIG["configure_args"]'
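
On current Rubies the Config constant is gone and the same data lives in RbConfig::CONFIG. A small sketch that prints the build settings most relevant to this thread; the helper name build_settings is invented here, and which keys are populated depends on how the interpreter was built:

```ruby
require 'rbconfig'

# Collect selected build-time settings of the running interpreter.
# A key that is not recorded for this build maps to nil.
def build_settings(keys = %w[configure_args CFLAGS])
  keys.map { |key| [key, RbConfig::CONFIG[key]] }.to_h
end

if __FILE__ == $0
  build_settings.each do |key, value|
    puts "#{key}: #{value.inspect}"
  end
end
```

Running this under each interpreter (the MacPorts one and the hand-built one) shows whether they were configured with different arguments or compiler flags.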

Paul

Paul B. wrote:

I’m guessing it’s the way you built it; these are the only changes
listed in the ChangeLog between p111 and p114:

Yep you were right on.

stable branch is fast

[Mon Mar 24 15:09:40 ~/Downloads/ruby_stable ]$ time ./ruby -e "10000000.times {}"

real 0m4.222s

p111 is fast
time /usr/bin/ruby_old -e "10000000.times {}"

real 0m4.276s

and the rest in between are similar.

The only truly slow one appears to be the macPort version. I don’t know
what compile flags they are using but it appears truly slower.

time ruby -e "10000000.times {}"

real 0m5.710s
(consistently)
Despite that, both have similar startup speeds.

Thanks for pointing that out!