I know 1.9.0 is still a work in progress, but I was expecting to see
some benefit from YARV already. However, this simple script showing a
typical bit of logfile processing takes over 70% LONGER under Ruby 1.9
on a Mac Pro:
Each line of the logfile looks like:

{6AADF426-0D0C-4C20-A027-06A6DC8C6CE2}; src: 172.25.40.88; dst:

and the script is:

logfile = File.read(ARGV[0])  # assumed setup; not shown in the original post

500000.times do
  src = Hash.new(0)
  logfile.each_line do |line|
    if (m = line.match(/src: (.*?);/))
      src[m[1]] += 1
    end
  end
end
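To isolate the regexp cost from the rest of YARV, here is a minimal timing sketch of my own using the stdlib Benchmark module (run the same file under ruby 1.8 and ruby 1.9 and compare the totals):

require 'benchmark'

line = '{6AADF426-0D0C-4C20-A027-06A6DC8C6CE2}; src: 172.25.40.88; dst:'
re = /src: (.*?);/

# Time just the match, nothing else
puts Benchmark.measure { 500000.times { re.match(line) } }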
The bottleneck here is the anychar instruction (OP_ANYCHAR in regexec.c),
which calls 'enclen' and 'is_mbc_newline' through function pointers, so
they can't be inlined.
Joni goes a bit further and uses a set of single-byte specialized
opcodes when it detects that a single-byte encoding is in use.
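From the Ruby side, one rough way to see the encoding-related overhead is a sketch like the following (1.9-only, my own; the data is plain ASCII, and I'm assuming the /u and /n flags to tag the pattern's encoding): match the same bytes tagged with a multi-byte versus a single-byte encoding.

require 'benchmark'

line = '{6AADF426-0D0C-4C20-A027-06A6DC8C6CE2}; src: 172.25.40.88; dst:'
utf8   = line.dup.force_encoding("UTF-8")       # multi-byte encoding
binary = line.dup.force_encoding("ASCII-8BIT")  # single-byte encoding

Benchmark.bm(7) do |b|
  b.report("utf8")   { 500000.times { utf8   =~ /src: (.*?);/u } }
  b.report("binary") { 500000.times { binary =~ /src: (.*?);/n } }
end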
FYI, for those not following it, Joni is the new regular expression
engine for JRuby. It is a port of Oniguruma, the regexp engine for Ruby
1.9.
On 13 Jan 2008, at 02:15, Marcin Mielżyński wrote:
Ruby 1.9 uses Oniguruma which is an encoding agnostic engine.
So Ruby 1.9.0’s Regexp is slower than 1.8’s?
Damn - All my Ruby scripts make extensive use of Regexp.
Will this always be the case, or might 1.9.1 be worth me upgrading to?
When those features are not used they don't affect matching performance
at all. The main sources of the bottlenecks are the encoding-agnostic
functions and, in some cases, executing opcodes through the main switch
instead of in tight inner loops.
Oniguruma’s pretty mature by now; I wouldn’t expect the general
performance profile to change that significantly, although the recent
port to Java might trigger some changes.
A Joni JVM bytecode compiler is on the way (for now it compiles patterns
to an int[] array); it should greatly surpass Oniguruma's performance once
it's finished (it already does for some patterns, even without the ASM
compiler).
On 13 Jan 2008, at 02:15, Marcin Mielżyński wrote:
Ruby 1.9 uses Oniguruma which is an encoding agnostic engine.
So Ruby 1.9.0’s Regexp is slower than 1.8’s?
I wouldn’t be too worried; I’ve used it for well over a year and not
really noticed any significant performance issues. In the worst case,
alternative engines could always be supported through extensions.
In your usage maybe. But in my experiments with 1.9, ALL my scripts
[all of them using Regexp extensively to parse files] are a minimum
of 40% slower, and the simple script in my original post takes 70%
more time to run.
On 13 Jan 2008, at 02:15, Marcin Mielżyński wrote:
Ruby 1.9 uses Oniguruma which is an encoding agnostic engine.
So Ruby 1.9.0’s Regexp is slower than 1.8’s?
In some cases, probably; it does quite a bit more, not just in encoding
support but also in regexp features. E.g., from a quick glance through the
Oniguruma syntax documentation:
The ability to call subexpressions
(/(?<x>foo|bar)\g<x>/ =~ "foobar" # $& => "foobar", $1 => "bar")
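Subexpression calls also allow recursion; here's a small sketch of my own
(not from that doc) matching balanced parentheses under 1.9:

re = /(?<paren>\((?:[^()]|\g<paren>)*\))/
re =~ "f(a(b)c)"
$&  # => "(a(b)c)"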
You might consider installing a separate Ruby 1.8 with Oniguruma patched
in, or installing the Oniguruma gem, so you can test the performance
without bringing YARV into the equation.
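For the gem route, something like this should work (a sketch from memory;
Oniguruma::ORegexp and its #match are the gem's API as I recall it, so
double-check against the gem's docs):

require 'rubygems'
require 'oniguruma'  # gem install oniguruma; wraps the Oniguruma C library

re = Oniguruma::ORegexp.new('src: (.*?);')
m = re.match('{6AADF426-0D0C-4C20-A027-06A6DC8C6CE2}; src: 172.25.40.88; dst:')
puts m[1] if m  # => 172.25.40.88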