1.9 significantly slower than 1.8 on Mac

I know 1.9.0 is still a work in progress, but I was expecting to see
some benefit from YARV already. However, this simple script showing a
typical bit of logfile processing takes over 70% LONGER under Ruby 1.9
on a Mac Pro:

{6AADF426-0D0C-4C20-A027-06A6DC8C6CE2}; src: 172.25.40.88; dst:
500000.times do
src=Hash.new(0)
logfile.each_line do |line|
if (m=line.match /src: (.*?);/)
src[m[1]]= src[m[1]] +1
end
end
end
end
end

$ ruby -v

ruby 1.8.6 (2007-09-24 patchlevel 111) [universal-darwin9.0]

$ ruby logreport2.rb

              user     system      total        real

Simple: 7.960000 0.000000 7.960000 ( 7.965567)

$ ruby19 -v

ruby 1.9.0 (2007-12-25 revision 14709) [i686-darwin9.1.0]

$ ruby19 logreport2.rb

Derek C. pisze:

$ ruby19 logreport2.rb

              user     system      total        real

Simple: 13.750000 0.010000 13.760000 ( 13.754742)

Ruby 1.9 uses Oniguruma which is an encoding agnostic engine.
e.g: /src: (.*?);/ compiles down to:

[exact5:src: ]
[mem-start:1]
[jump:(1)]
[anychar]
[push:(-6)]
[mem-end:1]
[exact1:;]
[end]

seemingly simple match:

/src: (.*?);/ =~ “src: 172.25.20.79;”

turns out to:

0> "src: 17..."         [exact5:src: ]
5> "172.25...."         [mem-start:1]
5> "172.25...."         [jump:(1)]
5> "172.25...."         [push:(-6)]
5> "172.25...."         [mem-end:1]
5> "172.25...."         [exact1:;]
5> "172.25...."         [anychar]
6> "72.25.2..."         [push:(-6)]
6> "72.25.2..."         [mem-end:1]
6> "72.25.2..."         [exact1:;]
6> "72.25.2..."         [anychar]
7> "2.25.20..."         [push:(-6)]
7> "2.25.20..."         [mem-end:1]
7> "2.25.20..."         [exact1:;]
7> "2.25.20..."         [anychar]
8> ".25.20...."         [push:(-6)]
8> ".25.20...."         [mem-end:1]
8> ".25.20...."         [exact1:;]
8> ".25.20...."         [anychar]
9> "25.20.7..."         [push:(-6)]
9> "25.20.7..."         [mem-end:1]
9> "25.20.7..."         [exact1:;]
9> "25.20.7..."         [anychar]

10> “5.20.79…” [push:(-6)]
10> “5.20.79…” [mem-end:1]
10> “5.20.79…” [exact1:;]
10> “5.20.79…” [anychar]
11> “.20.79;” [push:(-6)]
11> “.20.79;” [mem-end:1]
11> “.20.79;” [exact1:;]
11> “.20.79;” [anychar]
12> “20.79;” [push:(-6)]
12> “20.79;” [mem-end:1]
12> “20.79;” [exact1:;]
12> “20.79;” [anychar]
13> “0.79;” [push:(-6)]
13> “0.79;” [mem-end:1]
13> “0.79;” [exact1:;]
13> “0.79;” [anychar]
14> “.79;” [push:(-6)]
14> “.79;” [mem-end:1]
14> “.79;” [exact1:;]
14> “.79;” [anychar]
15> “79;” [push:(-6)]
15> “79;” [mem-end:1]
15> “79;” [exact1:;]
15> “79;” [anychar]
16> “9;” [push:(-6)]
16> “9;” [mem-end:1]
16> “9;” [exact1:;]
16> “9;” [anychar]
17> “;” [push:(-6)]
17> “;” [mem-end:1]
17> “;” [exact1:;]
18> “” [end]
match at 0

The bottleneck here is the anychar (OP_ANYCHAR in regexec.c) instruction
which uses ‘enclen’ and ‘is_mbc_newline’ (which are function pointers,
so impossible to be inlined).

Joni goes a bit further and uses a bunch of singlebyte specialized
opcodes when it detects singlebyte encoding being used.

lopex

Marcin Mielżyński wrote:

The bottleneck here is the anychar (OP_ANYCHAR in regexec.c) instruction
which uses ‘enclen’ and ‘is_mbc_newline’ (which are function pointers,
so impossible to be inlined).

Joni goes a bit further and uses a bunch of singlebyte specialized
opcodes when it detects singlebyte encoding being used.

FYI, for those not following it, Joni is the new regular expression
engine for JRuby. It is a port of Oniguruma, the regexp engine for Ruby
1.9.

  • Charlie

On 13 Jan 2008, at 02:15, Marcin Mielżyński wrote:

Ruby 1.9 uses Oniguruma which is an encoding agnostic engine.

So Ruby 1.9.0’s Regexp is slower than 1.8’s?
Damn - All my Ruby scripts make extensive use of Regexp.
Will this always be the case, or might 1.9.1 be worth me upgrading to?

Thomas H. pisze:

When those features are not used they don’t affect performance at all
(for matching). The main reasons of bottlenecks are the encoding
agnostic functions and main switch execution instead of inner loops in
some cases.

Oniguruma’s pretty mature by now, I wouldn’t expect the general
performance profile to change that significantly, although the recent
port to Java might trigger some changes.

Joni jvm bytecode compiler is on the way (now it uses int[] array for
the compiled code), it should greatly surpass Oniguruma performance once
it’s finished (it already does for some patterns even without the asm
compiler).

lopex

On 14 Jan 2008, at 17:03, Thomas H. wrote:

On 13 Jan 2008, at 02:15, Marcin Miel??y??ski wrote:

Ruby 1.9 uses Oniguruma which is an encoding agnostic engine.
So Ruby 1.9.0’s Regexp is slower than 1.8’s?

I wouldn’t be too worried; I’ve used it for well over a year and not
really noticed any significant performance issues. In the worst case,
alternative engines could always be supported through extensions.

In your usage maybe. But in my experiments with 1.9, ALL my scripts
[all of them using Regexp extensively to parse files] are a minimum
of 40% slower, and the simple script in my original post takes 70%
more time to run.

On 13 Jan 2008, at 02:15, Marcin Miel??y??ski wrote:

Ruby 1.9 uses Oniguruma which is an encoding agnostic engine.

So Ruby 1.9.0’s Regexp is slower than 1.8’s?

In some cases, probably; it does quite a bit more, not just in encoding
support, but more regexp features. e.g, from a quick glance through
サービス終了のお知らせ :

Zero-width lookbehind assertions (/(?<=foo)bar/ =~ ‘foobar’ # $& =>
“bar”)

Named captures (/(?bar))/ =~ ‘foobar’ # $~[:foo] => “bar”)

Fancier backreferences (/(?foo|bar)\k/ =~ “foobar” # => nil,
… =~ ‘foofoo’ # $& => “foofoo”)

The ability to call subexpressions
(/(?foo|bar)\g/ =~ “foobar” # $& => “foobar”, $2 => bar)

You might consider installing a separate Ruby 1.8 with Oniguruma patched
in, or installing the Oniguruma gem, so you can test the performance
without bringing YARV into the equation.

Damn - All my Ruby scripts make extensive use of Regexp. Will this
always be the case, or might 1.9.1 be worth me upgrading to?

Oniguruma’s pretty mature by now, I wouldn’t expect the general
performance profile to change that significantly, although the recent
port to Java might trigger some changes.

I wouldn’t be too worried; I’ve used it for well over a year and not
really noticed any significant performance issues. In the worst case,
alternative engines could always be supported through extensions.