Basic Ruby performance

dubstep · February 2, 2012, 11:18pm

Hello all!

First, I should point out that I’m new to Ruby, although it seems pretty
similar in some regards to JavaScript and Perl.

Anyway, I’m not sure whether it’s normal, whether it’s specific to Ruby on
Mac OS X, or whether I haven’t compiled it properly (I used rvm to install
the latest version, 1.9.3, on OS X Lion, which I believe comes with 1.8.7
by default), but most things I used to do in Perl run about twice as slow
when I replicate them in Ruby. So I ran some basic benchmarks. Here’s one
example:

Ruby:

for a in 0...1E8
a*2
end

Perl:

for $a ( 0...1E8 ) {
$a*2
}

Ruby takes 22 seconds and Perl 9 seconds to execute this. The results are
very similar in all the other scenarios I tried (one of which is splitting
millions of comma-separated rows into arrays).

I would really appreciate any useful suggestions: I would LOVE to be
able to use Ruby for most of the stuff I do (it’s not that I don’t like
Perl, but I love Ruby’s syntax).

Thanks!

dniq · February 2, 2012, 11:55pm

Here’s another example with significantly bigger performance difference:

Ruby:

s = "This is a test string"

re = Regexp.new( / test / )

for a in 0...1E7
re.match( s )
end

Perl:

my $s = "This is a test string";

for my $a ( 0...1E7 ) {
$s =~ / test /;
}

Perl takes about 1.5 seconds to execute this, while Ruby takes a
whopping 16!!! :((( I have a very strong feeling that I didn’t compile
Ruby properly; there can’t be such a huge difference in regexp matching.

dniq · February 3, 2012, 12:16am

On Feb 2, 2012, at 14:20 , Dmitry N. wrote:


Ruby takes 22 seconds and Perl 9 seconds to execute this. The results are
very similar in all the other scenarios I tried (one of which is splitting
millions of comma-separated rows into arrays).

I would really appreciate any useful suggestions: I would LOVE to be
able to use Ruby for most of the stuff I do (it’s not that I don’t like
Perl, but I love Ruby’s syntax).

Choosing the right language is a lot less important than choosing the
right algorithm:

5461 % time ruby -e 'n = 10**8; p (n + 3*n**2 + 2*n**3)/6'
333333338333333350000000

real 0m0.009s
user 0m0.004s
sys 0m0.004s

In most cases (depending on the domain, of course (*)), ruby is “fast
enough”. Often, with my slower ruby, I’ll finish coding long before you
would in your faster language. This coding-time difference is usually
sufficient to deal with run-time differences.

(*) your domain is fast enough unless you work for the IRS, NASA,
Wall Street (**), or Pixar (***).
(**) sufficient examples exist to show that those domains are also fast
enough.
(***) prolly not here tho.
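Ryan's one-liner evaluates the closed form (n + 3*n**2 + 2*n**3)/6, which equals n(n+1)(2n+1)/6, i.e. 1**2 + 2**2 + ... + n**2. A minimal sketch that checks the formula against a brute-force loop for a small n; the method names are hypothetical:

```ruby
# Brute force: one multiplication per element, O(n).
def sum_of_squares_loop(n)
  total = 0
  i = 1
  while i <= n
    total += i * i
    i += 1
  end
  total
end

# Closed form, written exactly as in the one-liner: O(1).
def sum_of_squares_closed(n)
  (n + 3 * n**2 + 2 * n**3) / 6
end

p sum_of_squares_loop(1_000)    # → 333833500
p sum_of_squares_closed(1_000)  # → 333833500
p sum_of_squares_closed(10**8)  # → 333333338333333350000000
```

The loop's cost grows with n while the formula's does not, which is the point Ryan is making about algorithms versus languages.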

dniq · February 3, 2012, 12:20am

On Feb 2, 2012, at 14:55 , Dmitry N. wrote:

whopping 16!!! :((( I have a very strong feeling that I didn’t compile
Ruby properly - there can’t be such a huge difference in regexp matching

It’s all the parens, whitespace, and use of tabs that slows ruby down:

takes 26.6 seconds on my laptop:

s = "This is a test string"

re = Regexp.new( / test / )

for a in 0...1E7
re.match( s )
end

takes 8.67 seconds on my laptop:

s = "This is a test string"

for a in 0...1E7
s =~ / test /
end

dniq · February 3, 2012, 12:36am

On 02/02/2012 05:20 PM, Ryan D. wrote:


Perl takes about 1.5 seconds to execute this, while Ruby takes a
whopping 16!!! :((( I have a very strong feeling that I didn’t compile
Ruby properly - there can’t be such a huge difference in regexp matching

It’s all the parens, whitespace, and use of tabs that slows ruby down:

Ryan is being a little facetious about the parentheses and whitespace, in
case that isn’t clear. He has strong preferences about coding style.

Your test above runs in about 10 seconds on my system under Ruby 1.9.2.
The following equivalent code runs in about 6 seconds and is fairly
idiomatic Ruby:

s = "This is a test string"
(0...1E7).each do
s =~ / test /
end

This code runs in about 4 seconds, but it is a bit less pretty to my
eyes:

s = "This is a test string"
i = 0
while i < 1E7 do
s =~ / test /
i += 1
end

I’m sure there are other solutions as well. The thing to keep in mind
is that method calls in Ruby are relatively expensive, so if you need
speed, you should try to avoid them.
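Jeremy's variations can be timed side by side in one script. A rough sketch, assuming Ruby 2.1+ for `Process.clock_gettime` and that wall-clock timing is precise enough here; `N`, `S` and `elapsed` are hypothetical names, and the `times` entry is an extra variant that was not part of the thread:

```ruby
# Kept small so the sketch finishes quickly; raise it toward 10_000_000
# to approximate the runs discussed above.
N = 200_000
S = "This is a test string"

# Wall-clock seconds spent running the given block (monotonic clock).
def elapsed
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0
end

results = {
  "for"   => elapsed { for a in 0...N; S =~ / test /; end },
  "each"  => elapsed { (0...N).each { S =~ / test / } },
  "while" => elapsed { i = 0; while i < N; S =~ / test /; i += 1; end },
  "times" => elapsed { N.times { S =~ / test / } }
}

results.each { |name, secs| printf("%-6s %.3fs\n", name, secs) }
```

Absolute numbers depend on the machine and Ruby version, and a single run can be noisy, so it is worth repeating each measurement before drawing conclusions.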

Don’t get hung up on micro benchmarks like the above though! They can
really be deceiving with respect to real world applications.

-Jeremy

dniq · February 3, 2012, 2:02am

On Feb 2, 2012, at 15:33 , Peter V. wrote:

The same “formatted” code with just replacing re.match( s) by
s =~ /test/ also causes the same change from 22 to 7 seconds
on my system (with the same formatting, spaces, etc.).

You’re replacing a method call (a.match b) with a syntactic construct
(a =~ b), the latter of which bypasses method dispatch and goes straight to
the C implementation. Nothing else is really different, just a more
direct code path. The match data is still available via the usual
globals.
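Ryan's last point can be checked directly: after `s =~ / test /`, Ruby still fills in the last-match globals such as `$~` and `$&`. A minimal sketch reusing the string and pattern from this thread:

```ruby
s = "This is a test string"

# =~ hands back only the offset of the first match (or nil)...
pos = s =~ / test /
p pos            # → 9

# ...but the full MatchData is still recorded in the last-match globals.
p $~             # → #<MatchData " test ">
p $&             # → " test "
p $~.pre_match   # → "This is a"
```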

IMHO re.match is just as useless as Regexp.new, Array.new, and Hash.new
(assuming no args/blocks passed). They’re throwbacks to Java devs and
serve no purpose but to make things more verbose. In this specific case,
there are tangible reasons to use =~ over #match.
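To make the equivalence concrete: the literal forms build the same kinds of objects as the constructors called without arguments. A small sketch:

```ruby
literal = / test /
built   = Regexp.new(" test ")

# Same pattern source and the same (default) options.
p(literal.source == built.source)    # → true
p(literal.options == built.options)  # → true

# Empty literals and no-argument constructors build equal objects.
p([] == Array.new)   # → true
p({} == Hash.new)    # → true
```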

dniq · February 3, 2012, 2:04am

On Feb 2, 2012, at 15:36 , Jeremy B. wrote:

Don’t get hung up on micro benchmarks like the above though! They can
really be deceiving with respect to real world applications.

I tried to drive that point home by showing a ruby solution that was
1000x faster than his perl solution, but unfortunately, rationality and
micro-benchmarking don’t often play well together.

dniq · February 3, 2012, 12:34am

On Fri, Feb 3, 2012 at 12:20 AM, Ryan D.
[email protected] wrote:


Perl takes about 1.5 seconds to execute this, while Ruby takes a
whopping 16!!! :((( I have a very strong feeling that I didn’t compile
Ruby properly - there can’t be such a huge difference in regexp matching

It’s all the parens, whitespace, and use of tabs that slows ruby down:

Euhmmm, I doubt that …

takes 8.67 seconds on my laptop:

s = "This is a test string"

for a in 0...1E7
s =~ / test /
end

The same “formatted” code with just replacing re.match( s) by
s =~ /test/ also causes the same change from 22 to 7 seconds
on my system (with the same formatting, spaces, etc.).

I rather expect it’s because match and =~ do quite different things:

match returns a complete MatchData object

=~ returns the index (position) of the first match

017:0> re.match( s )
=> #<MatchData " test ">
018:0> s =~ /test/
=> 10

Maybe (speculation) the MatchData object takes more dynamic Object allocation and thus more calls to the GC?
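Peter's distinction shows up in the return values. A minimal sketch, assuming Ruby 2.4 or later: on 1.9 the offset's class prints as `Fixnum`, and `Regexp#match?` did not exist yet.

```ruby
s  = "This is a test string"
re = / test /

# Regexp#match builds and returns a MatchData object (or nil).
md = re.match(s)
p md.class        # → MatchData
p md[0]           # → " test "

# String#=~ returns the offset of the first match (or nil).
idx = s =~ re
p idx             # → 9
p(s =~ /absent/)  # → nil

# Ruby 2.4+: match? returns true/false and does not set $~.
p re.match?(s)    # → true
```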

HTH,

Peter

dniq · February 3, 2012, 2:36am

On Thu, Feb 2, 2012 at 7:01 PM, Ryan D. [email protected]
wrote:

You’re replacing a method call (a.match b) with a syntactic construct
(a =~ b), the latter of which bypasses method dispatch and goes straight to
the C implementation.

Wow, I never knew that. I don’t understand how it accomplishes this: a
could be any kind of object with =~ defined anywhere on it, so how can it
bypass method dispatch?

IMHO re.match is just as useless as Regexp.new, Array.new, and Hash.new
(assuming no args/blocks passed).

I usually use meth Hash.new instead of meth({}); I think it looks
cleaner.

dniq · February 3, 2012, 3:04am

On Feb 2, 2012, at 17:36 , Josh C. wrote:

bypass method dispatch?
MAGIC!

The code does extra type-checking at runtime.
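One way to look for that special handling on MRI is to disassemble the bytecode the compiler emits for `=~`. Depending on the Ruby version, it may appear as a specialized instruction (with `regexpmatch` in its name) or as an ordinary method send, so treat this as an exploratory sketch rather than proof of the mechanism. It is MRI-only, since `RubyVM::InstructionSequence` is a CRuby implementation detail:

```ruby
source  = 's = "This is a test string"; s =~ / test /'
iseq    = RubyVM::InstructionSequence.compile(source)
listing = iseq.disasm

# Show only the instructions that mention the match operator.
puts listing.lines.grep(/regexpmatch|=~/)
```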

IMHO re.match is just as useless as Regexp.new, Array.new, and Hash.new
(assuming no args/blocks passed).

I usually use meth Hash.new instead of meth({}); I think it looks
cleaner.

def meth h = {}

…

end

takes care of this entirely.
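Ryan's version works because a default-argument expression is evaluated again on every call that omits the argument, so each such call gets its own new Hash. A small sketch; the method body is a hypothetical stand-in for the elided one:

```ruby
# Ryan's signature; the body just returns the argument.
def meth h = {}
  h
end

a = meth
b = meth
p a.equal?(b)       # → false: each call without an argument gets a new Hash
p meth({ x: 1 })    # the caller's hash is used when one is passed
```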

dniq · February 3, 2012, 3:19am

Jeremy B. wrote in post #1043805:

(0...1E7).each do
s =~ / test /
end

Hmmm… Didn’t realize this would make a difference. Thanks!

Don’t get hung up on micro benchmarks like the above though! They can
really be deceiving with respect to real world applications.

Well, I started doing these benchmarks after I tried to rewrite parts
of the project I’m working on in Ruby. The project is rather
complicated, so it seemed as if Ruby’s neat, clean syntax would make it
easier to handle, but the performance was dreadful. Initially I tried
the 1.8.7 that came natively with OS X Lion, then installed 1.9.3, without
much difference in performance: it’s still mostly several times slower
than the Perl version I have. The problem with the Perl version, though,
is that once it grows beyond a certain size, it becomes rather hard to manage
(especially if you focus on performance above all: there are tricks
in Perl that make code run significantly faster, but make it virtually
unreadable).

dniq · February 3, 2012, 3:21am

Ryan D. wrote in post #1043813:

IMHO re.match is just as useless as Regexp.new, Array.new, and Hash.new
(assuming no args/blocks passed). They’re throwbacks to java devs and
serve no purpose but to make things more verbose. In this specific case,
there are tangible reasons to use =~ over #match.

The reason I tried to use Regexp.new is because I figured it would
pre-compile the regexp - the way “qr/ test /” in Perl would do, so that
it doesn’t have to re-compile it on every iteration.

dniq · February 3, 2012, 3:22am

On Fri, Feb 03, 2012 at 10:36:11AM +0900, Josh C. wrote:

I usually use meth Hash.new instead of meth({}); I think it looks
cleaner.

That’s only because it does look cleaner.

dniq · February 3, 2012, 3:14am

Ryan D. wrote in post #1043801:

Choosing the right language is a lot less important than choosing the
right algorithm:

5461 % time ruby -e 'n = 10**8; p (n + 3*n**2 + 2*n**3)/6'
333333338333333350000000

real 0m0.009s
user 0m0.004s
sys 0m0.004s

My code was merely an example of a very simple loop. Its purpose was not
to calculate something, but to run through the loop and execute a
multiplication on every iteration.

My main area of development is processing rather large amounts of
data (billions of entries, primarily processed by regular expressions,
with some statistical analysis on top, and potentially NLP added
later). You have to iterate through every entry of the incoming data
(which might already be in the database or a plain text file, or might be
just a “fire hose” of data pouring into the system in real time).

While I’d LOVE to have a nice and clean syntax, performance is still
number 1 on my list of priorities, therefore I asked if maybe there are
ways to improve Ruby performance.
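For the kind of workload Dmitry describes, a common Ruby shape is to stream records one at a time rather than load them all at once. A minimal sketch, assuming line-oriented, comma-separated input; `INTERESTING` and `each_matching_row` are hypothetical names, and the pattern is the one from the earlier benchmarks:

```ruby
require 'stringio'

# Hypothetical filter, created once and reused for every line.
INTERESTING = / test /

# Yields the comma-separated fields of every line that matches INTERESTING.
# Works on any IO-like object (File, $stdin, a socket, a StringIO) and reads
# one line at a time instead of loading the whole input into memory.
def each_matching_row(io)
  io.each_line do |line|
    next unless line =~ INTERESTING   # cheap filter before splitting
    yield line.chomp.split(",")
  end
end

input = StringIO.new("a,This is a test string,c\nno match here\nx, test ,y\n")
each_matching_row(input) { |fields| p fields }
```

Filtering before splitting means the cost of `split` is only paid for lines that are actually kept.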

dniq · February 3, 2012, 3:27am

I have to say, however, that string.split() works much faster in Ruby
than it does in Perl, for some odd reason.

dniq · February 3, 2012, 3:40am

On Fri, Feb 03, 2012 at 11:03:28AM +0900, Ryan D. wrote:

On Feb 2, 2012, at 17:36 , Josh C. wrote:

I usually use meth Hash.new instead of meth({}); I think it looks
cleaner.

def meth h = {}

…

end

takes care of this entirely.

. . . unless you have a different default for the method’s argument.

dniq · February 3, 2012, 3:42am

On 2/2/2012 9:14 PM, Dmitry N. wrote:

One thing you can do is to replace for loops with while loops. For loops
in Ruby are translated into calls to the collection’s #each method
(Range#each here), and in Ruby 1.9, #each is slower than ordinary while
loops because of the overhead of processing enumerators. It is actually
even slower than Ruby 1.8’s #each because 1.8 does not have
enumerators.
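The first half of Su's point, that `for` delegates to the collection's `each` method, is easy to confirm. A minimal sketch with a hypothetical `Noisy` class that counts how often `each` is called:

```ruby
# A hypothetical collection that counts calls to #each.
class Noisy
  attr_reader :each_calls

  def initialize(items)
    @items = items
    @each_calls = 0
  end

  def each(&block)
    @each_calls += 1
    @items.each(&block)
  end
end

coll = Noisy.new([1, 2, 3])
for x in coll
  # body intentionally empty
end
p coll.each_calls   # → 1
```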

dniq · February 3, 2012, 3:45am

Su Zhang wrote in post #1043831:

On 2/2/2012 9:14 PM, Dmitry N. wrote:

One thing you can do is to replace for loops with while loops. For loops
in Ruby are translated into calls to the collection’s #each method
(Range#each here), and in Ruby 1.9, #each is slower than ordinary while
loops because of the overhead of processing enumerators. It is actually
even slower than Ruby 1.8’s #each because 1.8 does not have
enumerators.

Hmmm… Thanks! Definitely useful advice!

dniq · February 3, 2012, 3:58am

On 02/02/2012 08:21 PM, Dmitry N. wrote:

Ryan D. wrote in post #1043813:

IMHO re.match is just as useless as Regexp.new, Array.new, and Hash.new
(assuming no args/blocks passed). They’re throwbacks to java devs and
serve no purpose but to make things more verbose. In this specific case,
there are tangible reasons to use =~ over #match.

The reason I tried to use Regexp.new is because I figured it would
pre-compile the regexp - the way “qr/ test /” in Perl would do, so that
it doesn’t have to re-compile it on every iteration.

Everything in Ruby is an object, even regexps, so you can save your
regexp to a variable or a constant to avoid a recompile. In addition,
the // expression is pretty much just syntactic sugar for
Regexp.new("some string") or Regexp.new(/some regexp/), so you can
forgo that noise. The sugar is probably faster too, since it should
avoid Ruby method calls, unlike Regexp.new, not that it should be an
issue in this example.

To see if this helps at all, try changing the code to the following:

s = "This is a test string"
re = / test /
for a in 0...1E7
s =~ re
end

Try a similar change to the other looping variations that have been
discussed and see if and how much they may improve. For me I didn’t
really see any difference between using re as above or using the simple
regexp directly; however, the code was almost an order of magnitude
slower when I replaced the comparison as follows:

s =~ / test#{} /

It seems that Ruby is smart enough to see that the simple regexp will
never need to be re-evaluated. The regexp used above must switch that
optimization off because #{}, while always evaluating to the empty
string, is technically dynamic, so the regexp needs to be re-evaluated
in every iteration of the loop.
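Jeremy's observation can be checked by comparing object identities, assuming (as holds on MRI) that a literal without interpolation is created once and reused. In the sketch, `WORD` and `PATTERN` are hypothetical names, and a variable interpolation stands in for `#{}` so the pattern is unambiguously dynamic:

```ruby
WORD = "test"

# A plain literal is built once; every evaluation yields the same object.
plain = Array.new(3) { / test / }
p plain.map(&:object_id).uniq.size     # → 1

# Interpolation makes the literal dynamic: a new Regexp per evaluation.
dynamic = Array.new(3) { / #{WORD} / }
p dynamic.map(&:object_id).uniq.size   # → 3

# The /o flag interpolates once and reuses the result afterwards.
once = Array.new(3) { / #{WORD} /o }
p once.map(&:object_id).uniq.size      # → 1

# Keeping the Regexp in a constant plays roughly the role of Perl's qr//.
PATTERN = / test /
p("This is a test string" =~ PATTERN)  # → 9
```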

If you really need performance in the end, however, you might want to
consider coding your critical code paths in something like C and then
calling those from Ruby as a direct extension or using something like
ffi to call into a DLL containing the logic. Your overall code base may
be a little messy, but sometimes the speed you need requires such a
trade-off. Hopefully, you can keep the mess limited to only a small set
of your overall application logic. Of course, the same holds true for
Perl in this regard.

-Jeremy

dniq · February 3, 2012, 4:52am

On Fri, Feb 3, 2012 at 6:20 AM, Dmitry N. [email protected]
wrote:

Ruby:

for a in 0…1E8
a*2
end

omg, careful w big numbers in ruby…

x=Time.now();for a in 0...1E3;end; puts(Time.now-x)
0.000380635

x=Time.now();for a in 0...1E4;end; puts(Time.now-x)
0.00373148

x=Time.now();for a in 0...1E5;end; puts(Time.now-x)
0.029043426

x=Time.now();for a in 0...1E6;end; puts(Time.now-x)
0.201745265

x=Time.now();for a in 0...1E7;end; puts(Time.now-x)
1.939860867

x=Time.now();for a in 0...1E8;end; puts(Time.now-x)
19.276653266

note, the jump…
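A detail in all of these loops that may be worth ruling out: `1E8` is a Float literal, so `0...1E8` is a Range with an Integer start and a Float end. MRI's `Range#each` has historically had a fast path for ranges whose endpoints are both small Integers, and a mixed range may not take it. That is an assumption to measure, not a conclusion. A sketch of the difference and the Integer alternatives:

```ruby
# 1E8 is a Float, so this Range mixes an Integer start with a Float end.
r_float = (0...1E8)
p r_float.end.class       # → Float

# Integer spellings of the same bound.
r_int = (0...100_000_000)
p r_int.end.class         # → Integer (Fixnum on Ruby < 2.4)
p(10**8 == 1E8)           # → true

# Both kinds of range yield the same Integers; only the end point's type differs.
p((0...1E1).to_a == (0...10).to_a)   # → true
```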

best regards -botp