Fwd: 1.9 significantly slower than 1.8 on Mac

derek · January 15, 2008, 4:31pm

Hmm. Simplifying my test script further, I am not sure that Regexp is
the problem at all!
With the each_line block, my script takes more than TWICE as long in
1.9 vs. 1.8.
Without the each_line block, but keeping the Regexp, it is 10%
FASTER.

So unless there is some internal optimisation that occurs when the
block is removed, it looks like each_line is the problem, not Regexp???

500000.times do
end
$ ruby logreport3.rb
                        user     system      total        real
WITH each_line:     1.710000   0.000000   1.710000 (  1.717034)
WITHOUT each_line:  1.080000   0.000000   1.080000 (  1.077098)

$ ruby19 logreport3.rb
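
The script itself is cut off above, so here is only a rough sketch of the kind of comparison being described, not the original logreport3.rb. TEXT, PATTERN, N and both method names are made up for illustration, and what "without each_line" meant in the real script is an assumption (here: matching the same string directly, with no line iteration).

```ruby
require 'benchmark'

# Hypothetical log line; the real logfile is not shown in this thread.
TEXT = "127.0.0.1 - - [15/Jan/2008:16:31:00] \"GET /index.html HTTP/1.1\" 200 1234\n"
# Hypothetical pattern; the original Regexp is not shown either.
PATTERN = /"GET (\S+) HTTP/
# Smaller than the 500000 iterations used in the thread, to keep the run short.
N = 50_000

# Iterate over the text with each_line and match every line.
def match_with_each_line(text, pattern, n)
  hits = 0
  n.times { text.each_line { |line| hits += 1 if line =~ pattern } }
  hits
end

# Match the same text directly, without going through each_line.
def match_without_each_line(text, pattern, n)
  hits = 0
  n.times { hits += 1 if text =~ pattern }
  hits
end

Benchmark.bm(20) do |bm|
  bm.report('WITH each_line:')    { match_with_each_line(TEXT, PATTERN, N) }
  bm.report('WITHOUT each_line:') { match_without_each_line(TEXT, PATTERN, N) }
end
```

Running the same file under both interpreters (ruby and ruby19 above) should show whether the gap follows each_line or the Regexp.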

derek · January 15, 2008, 4:56pm

“D” == Derek C. writes:

D> So unless there is some internal optimisation that occurs when the
D> block is removed, it looks like each_line is the problem, not
D> Regexp???

Well, some part of #each_line for 1.8.6

for (s = p, p += rslen; p < pend; p++) {
    if (rslen == 0 && *p == '\n') {
        if (*++p != '\n') continue;
        while (*p == '\n') p++;
    }

easy : increment p and test

the same for 1.9

while (p < pend) {
    int c = rb_enc_codepoint(p, pend, enc);
    int n = rb_enc_codelen(c, enc);

    if (rslen == 0 && c == newline) {
        while (p < pend && rb_enc_codepoint(p, pend, enc) == newline) {
            p += n;
        }
        p -= n;
    }

a little more complex :

retrieve the code point
retrieve its length
etc,
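
The same contrast can be sketched at the Ruby level. This is only an analogy, not the C implementation: both method names are hypothetical, and it assumes Ruby 1.9 or later with a UTF-8 string. The first method looks at raw bytes, as the 1.8 loop does; the second has to decode each character before comparing, as the 1.9 loop does.

```ruby
# Byte-wise scan, in the spirit of the 1.8 loop: compare every byte to "\n".
def count_newlines_bytewise(str)
  count = 0
  str.each_byte { |b| count += 1 if b == 10 } # 10 is the byte value of "\n"
  count
end

# Character-wise scan, in the spirit of the 1.9 loop: each character is
# decoded according to the string's encoding before it is compared.
def count_newlines_charwise(str)
  count = 0
  str.each_char { |ch| count += 1 if ch == "\n" }
  count
end

sample = "h\u00e9llo\nw\u00f6rld\n\nend"
puts count_newlines_bytewise(sample)
puts count_newlines_charwise(sample)
```

Both give the same count for UTF-8 input, since no multi-byte UTF-8 sequence contains the byte 10; the second one just does more work per step.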

Guy Decoux

derek · January 15, 2008, 7:21pm

Derek C. wrote:

> Hmm. Simplifying my test script further, I am not sure that Regexp is
> the problem at all!
> With the each_line block, my script takes more than TWICE as long in
> 1.9 vs. 1.8.
> But without the each_line block, but keeping the Regexp, it is 10%
> FASTER.

Oops, it seems you're right. Just split the original logfile and use
each instead of each_line and it gets a whole lot faster
(rb_str_each_line is encoding-aware). Anyway, that doesn't change the
fact that Oniguruma might be optimized here as well.
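
A minimal sketch of that workaround, assuming the whole log fits in memory as one string. The sample data and both method names are hypothetical. Note that split("\n") drops the line terminators while each_line keeps them, so the each_line version needs a chomp to give the same result.

```ruby
# Hypothetical log contents standing in for the real logfile.
log = "GET /a 200\nGET /b 404\nGET /c 200\n"

# Original approach: walk the string with each_line.
def statuses_via_each_line(log)
  result = []
  log.each_line { |line| result << line.chomp.split.last }
  result
end

# Suggested approach: split once, then iterate the resulting Array with each.
def statuses_via_split(log)
  result = []
  log.split("\n").each { |line| result << line.split.last }
  result
end

p statuses_via_each_line(log)
p statuses_via_split(log)
```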

lopex