String#chop slow? REALLY slow?

Mat_S · July 27, 2006, 6:14pm

I just did a quick benchmark to prove something to myself. But I’d
like to get a sanity check from the people on the list.

Basically I want to drop what will be a trailing “\n” from input.
But it appears that using String#[] and if statements is nearly 200
times more efficient than chop. Which just seems really weird, so
here’s the benchmark. Maybe I’m doing something wrong.

Does this seem right? Anyone care to comment?

---- index_vs_chop.rb

require 'benchmark'

n = 100_000
bigstring = "I am a big string " * 5_000

Benchmark.bmbm do |bench|
  bench.report("Indexing") {
    n.times do
      bigstring[0..-1]
    end
  }

  bench.report("Chop") {
    n.times do
      bigstring.chop
    end
  }
end

---- end index_vs_chop.rb

output:

Rehearsal --------------------------------------------
Indexing   0.100000   0.000000   0.100000 (  0.102362)
Chop       7.190000  13.890000  21.080000 ( 22.477807)
---------------------------------- total: 21.180000sec

               user     system      total        real
Indexing   0.100000   0.000000   0.100000 (  0.108777)
Chop       7.290000  14.050000  21.340000 ( 22.755782)

Mat_S · July 27, 2006, 6:23pm

On Fri, 28 Jul 2006, Mat S. wrote:

---- index_vs_chop.rb
end

output:

Rehearsal --------------------------------------------
Indexing 0.100000 0.000000 0.100000 ( 0.102362)
Chop 7.190000 13.890000 21.080000 ( 22.477807)
---------------------------------- total: 21.180000sec
          user     system      total        real
Indexing 0.100000 0.000000 0.100000 ( 0.108777)
Chop 7.290000 14.050000 21.340000 ( 22.755782)

on my node:

harp:~ > ruby a.rb
Rehearsal --------------------------------------------
Indexing   0.150000   0.000000   0.150000 (  0.145923)
Chop       4.210000  16.200000  20.410000 ( 20.910127)
Chop2      4.210000   0.220000   4.430000 (  4.536517)
---------------------------------- total: 24.990000sec

               user     system      total        real
Indexing   0.140000   0.000000   0.140000 (  0.142257)
Chop       0.110000   0.000000   0.110000 (  0.104612)
Chop2      0.150000   0.000000   0.150000 (  0.152083)

harp:~ > cat a.rb
require 'benchmark'

n = 100_000
bigstring = "I am a big string " * 5_000

Benchmark.bmbm do |bench|
  bench.report("Indexing") {
    n.times do
      bigstring[0..-1]
    end
  }

  bench.report("Chop") {
    n.times do
      bigstring.chop
    end
  }

  bench.report("Chop2") {
    n.times do
      bigstring = bigstring[0..-2]
    end
  }
end

-a

Mat_S · July 27, 2006, 6:31pm

On Jul 27, 2006, at 12:20 PM, [email protected] wrote:

Indexing 0.140000 0.000000 0.140000 ( 0.142257)
Chop 0.110000 0.000000 0.110000 ( 0.104612)
Chop2 0.150000 0.000000 0.150000 ( 0.152083)

Now that’s interesting. I wonder why the rehearsal and the real run
are so different…
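
One possible explanation, which the thread itself does not spell out: in the a.rb script above, the "Chop2" block reassigns bigstring on every iteration, so the string loses one character per pass. The string starts at 90,000 characters (18 x 5,000) and the rehearsal runs 100,000 iterations, so by the time the real run starts the string may already be empty, and chopping an empty string costs almost nothing. A scaled-down sketch of that effect (the sizes below are reduced for illustration and are not the thread's values):

```ruby
# Scaled-down version of the "Chop2" block: reassigning the variable
# shrinks the string by one character per iteration.
bigstring = "I am a big string " * 50   # 900 characters (the thread used 5_000 repeats)
n = 1_000                               # more iterations than characters, as in the thread

lengths = []
n.times do |i|
  bigstring = bigstring[0..-2]
  lengths << bigstring.length if (i % 250).zero?   # sample the length now and then
end

puts "sampled lengths: #{lengths.inspect}"
puts "empty after the loop? #{bigstring.empty?}"
```

If this is what happened, the real-run figures for "Chop" and "Chop2" in that output measure work on an empty string rather than on the original 90,000-character one.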

Mat_S · July 27, 2006, 6:31pm

Mat S. wrote:
…

output:

Rehearsal --------------------------------------------
Indexing 0.100000 0.000000 0.100000 ( 0.102362)
Chop 7.190000 13.890000 21.080000 ( 22.477807)
---------------------------------- total: 21.180000sec
            user     system      total        real
Indexing 0.100000 0.000000 0.100000 ( 0.108777)
Chop 7.290000 14.050000 21.340000 ( 22.755782)

You might want to use chop!:

Rehearsal --------------------------------------------
Indexing   0.843000   0.000000   0.843000 (  0.844000)
Chop!      0.235000   0.000000   0.235000 (  0.234000)
----------------------------------- total: 1.078000sec

               user     system      total        real
Indexing   1.437000   0.015000   1.452000 (  1.453000)
Chop!      0.203000   0.000000   0.203000 (  0.203000)

cheers
Chris

Mat_S · July 27, 2006, 6:38pm

Basically I want to drop what will be a trailing “\n” from input.
But it appears that using String#[] and if statements is nearly 200
times more efficient than chop. Which just seems really weird, so
here’s the benchmark. Maybe I’m doing something wrong.

Well, if you implement chop fully, you get very similar results:

RubyMate r4106 running Ruby v1.8.4 (/usr/local/bin/ruby)

untitled

Rehearsal -------------------------------------------------
Indexing        1.790000   3.950000   5.740000 (  7.099300)
Chop            1.680000   3.930000   5.610000 (  7.135508)
Indexing crlf   1.780000   3.970000   5.750000 (  6.895291)
Chop crlf       1.670000   3.930000   5.600000 (  6.573193)
--------------------------------------- total: 22.700000sec

                    user     system      total        real
Indexing        1.780000   3.980000   5.760000 (  7.033924)
Chop            1.670000   3.970000   5.640000 (  7.297766)
Indexing crlf   1.790000   4.020000   5.810000 (  8.969243)
Chop crlf       1.680000   4.000000   5.680000 (  7.480123)

require 'benchmark'

n = 10_000
bigstring = "I am a big string " * 5_000

Benchmark.bmbm do |bench|
  bench.report("Indexing") {
    n.times do
      bigstring[0..-2] == "\r\n" ? bigstring[0..-2] : bigstring[0..-1]
    end
  }

  bench.report("Chop") {
    n.times do
      bigstring.chop
    end
  }

  bigstring << "\r\n"

  bench.report("Indexing crlf") {
    n.times do
      bigstring[0..-2] == "\r\n" ? bigstring[0..-2] : bigstring[0..-1]
    end
  }

  bench.report("Chop crlf") {
    n.times do
      bigstring.chop
    end
  }
end

Mat_S · July 27, 2006, 6:38pm

Mat S. wrote:

I just did a quick benchmark to prove something to myself. But I’d like
to get a sanity check from the people on the list.

Basically I want to drop what will be a trailing “\n” from input. But
it appears that using String#[] and if statements is nearly 200 times
more efficient than chop. Which just seems really weird, so here’s the
benchmark. Maybe I’m doing something wrong.

Does this seem right? Anyone care to comment?

As someone else pointed out, you’ll probably want to use String#chop!
for faster performance, since it uses the current object instead of
creating a new one.
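
A small illustration of that difference, as a sketch rather than anything taken from the thread (the variable names are made up for the example):

```ruby
# dup so the example also works if string literals are frozen
line = "hello world\n".dup

copy = line.chop                 # returns a new String; the receiver is untouched
puts line.inspect                # still ends with "\n"
puts copy.equal?(line)           # false: a different object

result = line.chop!              # modifies the receiver in place
puts result.equal?(line)         # true: chop! hands back the receiver itself
puts line.inspect                # the trailing "\n" is gone

empty_result = "".dup.chop!      # chop! returns nil when the string is empty
puts empty_result.inspect
```

The nil return on an empty string is worth keeping in mind before chaining further calls onto chop!.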

Also note that str[0..-2] is not quite the same as str.chop when "\r\n"
is involved:

irb(main):001:0> str = "hello world\r\n"
=> "hello world\r\n"
irb(main):002:0> str[0..-2]
=> "hello world\r"
irb(main):003:0> str.chop
=> "hello world"

I wouldn’t think the extra work of checking for "\r\n" would add that
much overhead, though.

Regards,

Dan


Mat_S · July 27, 2006, 6:45pm

On Jul 27, 2006, at 11:12 AM, Mat S. wrote:

Basically I want to drop what will be a trailing “\n” from input.

String#chomp would probably be a better idea for this, but that’s OT
I suppose. Regardless, its performance is the same as chop, it seems.
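
For readers unsure of the difference, a quick sketch (the sample strings are made up for illustration): chomp only removes a trailing line ending, while chop always removes the last character (or a trailing "\r\n" pair).

```ruby
samples = ["line\n", "line\r\n", "line"]

chomped = samples.map(&:chomp)   # strips a trailing "\n", "\r\n" or "\r" only
chopped = samples.map(&:chop)    # always strips the last character ("\r\n" counts as one unit)

samples.each_with_index do |s, i|
  puts "#{s.inspect}: chomp=#{chomped[i].inspect} chop=#{chopped[i].inspect}"
end
```

So for input that may or may not end in a newline, chomp leaves an unterminated last line intact, whereas chop would eat a real character.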

Here are my modifications:

require 'benchmark'

class String
  def my_chop
    self[0..-2]
  end
end

n = 100_000
bigstring = "I am a big string " * 5_000

Benchmark.bmbm do |bench|
  bench.report("Indexing") {
    n.times do
      bigstring[0..-1]
    end
  }

  bench.report("Chop") {
    n.times do
      bigstring.chop
    end
  }

  bench.report("My Chop") {
    n.times do
      bigstring.my_chop
    end
  }
end

And here are my results:

Rehearsal --------------------------------------------
Indexing   0.310000   0.000000   0.310000 (  0.347943)
Chop      11.940000  30.330000  42.270000 ( 44.501066)
My Chop   12.620000  30.720000  43.340000 ( 46.339651)
---------------------------------- total: 85.920000sec

               user     system      total        real
Indexing   0.230000   0.000000   0.230000 (  0.258177)
Chop      11.980000  30.680000  42.660000 ( 44.966923)
My Chop   12.610000  30.860000  43.470000 ( 45.859064)

Let’s see how String#chop is implemented…

static VALUE
rb_str_chop(str)
    VALUE str;
{
    str = rb_str_dup(str);
    rb_str_chop_bang(str);
    return str;
}

So it’s in C… interesting…
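
In Ruby terms, the C function quoted above amounts to roughly the following. This is only a sketch of the same two steps; chop_via_dup is a hypothetical helper name, not something in Ruby's standard library.

```ruby
# Hypothetical helper mirroring the quoted C source: duplicate, then chop in place.
def chop_via_dup(str)
  copy = str.dup   # corresponds to rb_str_dup(str)
  copy.chop!       # corresponds to rb_str_chop_bang(str); its return value is ignored
  copy             # the duplicate is returned, so the caller's string is untouched
end

inputs = ["abc\n", "abc\r\n", "abc", ""]
inputs.each do |s|
  puts "#{s.inspect} -> #{chop_via_dup(s).inspect}"
end
```

The point of interest for the benchmark is the first step: every call to chop starts by duplicating the whole string.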

Jake McArthur

Mat_S · July 27, 2006, 6:41pm

On 2006-07-27, at 13:36 , Caio C. wrote:

Basically I want to drop what will be a trailing “\n” from input.
But it appears that using String#[] and if statements is nearly
200 times more efficient than chop. Which just seems really
weird, so here’s the benchmark. Maybe I’m doing something wrong.

Well, if you implement chop fully, you get very similar results:

Ah, but rangeless indexing yields much much better results:

RubyMate r4106 running Ruby v1.8.4 (/usr/local/bin/ruby)

untitled

Rehearsal -------------------------------------------------
Indexing        0.110000   0.000000   0.110000 (  0.151018)
Chop            3.430000   7.920000  11.350000 ( 15.030196)
Indexing crlf   0.110000   0.000000   0.110000 (  0.128584)
Chop crlf       3.430000   7.920000  11.350000 ( 14.815128)
--------------------------------------- total: 22.920000sec

                    user     system      total        real
Indexing        0.110000   0.000000   0.110000 (  0.134087)
Chop            3.430000   7.980000  11.410000 ( 14.305555)
Indexing crlf   0.110000   0.000000   0.110000 (  0.125122)
Chop crlf       3.420000   7.990000  11.410000 ( 13.869411)

require 'benchmark'

n = 20_000
bigstring = "I am a big string " * 5_000

Benchmark.bmbm do |bench|
  bench.report("Indexing") {
    n.times do
      bigstring[-2,2] == "\r\n" ? bigstring[-2,2] : bigstring[-1,1]
    end
  }

  bench.report("Chop") {
    n.times do
      bigstring.chop
    end
  }

  bigstring << "\r\n"

  bench.report("Indexing crlf") {
    n.times do
      bigstring[-2,2] == "\r\n" ? bigstring[-2,2] : bigstring[-1,1]
    end
  }

  bench.report("Chop crlf") {
    n.times do
      bigstring.chop
    end
  }
end

Mat_S · July 27, 2006, 7:21pm

On 7/27/06, Mat S. wrote:

I just did a quick benchmark to prove something to myself. But I’d
like to get a sanity check from the people on the list.
[snip]
Benchmark.bmbm do |bench|
  bench.report("Indexing") {
    n.times do
      bigstring[0..-1]
    end
  }
[snip]

No-one seems to have noticed the typo…? I think that 4th line should
be:

bigstring[0..-2]

Which is slower. That should account for part of the performance gap.

Mat_S · July 27, 2006, 7:30pm

On Jul 27, 2006, at 1:18 PM, Caleb C. wrote:

[snip]

No-one seems to have noticed the typo…? I think that 4th line
should be:

bigstring[0..-2]

Which is slower. That should account for part of the performance gap.

You’re totally right! [0..-1] is the same string. Thanks for the
catch. I’m surprised it took that long.

Thanks for all the advice, everyone. Sorry to be a little brain-dead.
-Mat
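
With the typo fixed, the comparison the original post was aiming for can be sketched roughly as follows. The iteration count is lowered from the thread's 100_000 just to keep the run short, and no timings are shown because they depend heavily on the Ruby version and machine.

```ruby
require 'benchmark'

n = 1_000   # far fewer iterations than the thread's 100_000, to keep the sketch quick
bigstring = "I am a big string " * 5_000

# Sanity checks on what the two ranges actually select.
same_string = bigstring[0..-1] == bigstring        # 0..-1 covers the whole string
drops_one   = bigstring[0..-2] == bigstring.chop   # 0..-2 drops the last character

puts "bigstring[0..-1] == bigstring:      #{same_string}"
puts "bigstring[0..-2] == bigstring.chop: #{drops_one}"

# Compare like with like: both blocks now produce a string one character shorter.
results = Benchmark.bmbm do |bench|
  bench.report("Indexing [0..-2]") { n.times { bigstring[0..-2] } }
  bench.report("Chop")             { n.times { bigstring.chop } }
end
```

Note that the two only agree here because the string does not end in "\r\n"; with that ending, chop removes two characters.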

Mat_S · July 27, 2006, 6:48pm

On 7/27/06, Mat S. wrote:

I just did a quick benchmark to prove something to myself. But I’d
like to get a sanity check from the people on the list.

Using Ara’s code:

Rehearsal --------------------------------------------
Indexing   0.109000   0.000000   0.109000 (  0.109000)
Chop       6.766000   8.250000  15.016000 ( 15.110000)
Chop2      2.656000   3.781000   6.437000 (  6.468000)
---------------------------------- total: 21.562000sec

               user     system      total        real
Indexing   0.156000   0.000000   0.156000 (  0.156000)
Chop       0.094000   0.000000   0.094000 (  0.094000)
Chop2      0.187000   0.000000   0.187000 (  0.187000)

ruby -v
ruby 1.8.4 (2005-12-24) [i386-mswin32]

I think the difference in performance is because internally chop does
a dup on the string then calls chop! whereas the index operation
creates a new string which shares the old string but with a different
length. I guess this is also why the rehearsal and final results
differ - cutting out the cost of GC doesn’t reflect the true cost of
using chop (especially with big strings).

Regards,
Sean
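
One way to see the allocation side of this from Ruby itself is to count allocated objects around each loop. This is a rough sketch, assuming a CRuby recent enough to support GC.stat(:total_allocated_objects); it counts objects, not bytes, so it says nothing directly about how much memory each duplicate copies.

```ruby
n = 1_000
s = "I am a big string " * 500   # 9_000 characters, smaller than in the thread

# chop returns a fresh String on every call.
before = GC.stat(:total_allocated_objects)
n.times { s.chop }
chop_allocations = GC.stat(:total_allocated_objects) - before

# chop! shortens the receiver in place, so no result object is created.
before = GC.stat(:total_allocated_objects)
n.times { s.chop! }
chop_bang_allocations = GC.stat(:total_allocated_objects) - before

puts "objects allocated by #{n} x chop:  #{chop_allocations}"
puts "objects allocated by #{n} x chop!: #{chop_bang_allocations}"
```

Every object counted for chop is garbage as soon as the call returns, which is the collector pressure described above.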

Mat_S · July 27, 2006, 7:46pm

On 2006-07-27, at 13:40 , Caio C. wrote:

Ah, but rangeless indexing yields much much better results:

Speaking of catching typos, I apparently got carried away with my
de-ranging and implemented the wrong thing. Here are the actual
results. Pretty much the same as with ranges:

RubyMate r4106 running Ruby v1.8.4 (/usr/local/bin/ruby)

untitled

Rehearsal -------------------------------------------------
Indexing        3.690000   7.910000  11.600000 ( 13.937017)
Chop            3.480000   7.890000  11.370000 ( 13.911387)
Indexing crlf   3.690000   7.980000  11.670000 ( 15.256540)
Chop crlf       3.530000   8.040000  11.570000 ( 16.200714)
--------------------------------------- total: 46.210000sec

                    user     system      total        real
Indexing        3.700000   8.050000  11.750000 ( 14.579216)
Chop            3.520000   8.100000  11.620000 ( 15.165561)
Indexing crlf   3.730000   8.090000  11.820000 ( 15.573669)
Chop crlf       3.520000   8.100000  11.620000 ( 15.706817)

require 'benchmark'

n = 20_000
s = "I am a big string " * 5_000

Benchmark.bmbm do |bench|
  bench.report("Indexing") {
    n.times do
      s[-2,2] == "\r\n" ? s[0, s.length - 2] : s[0, s.length - 1]
    end
  }

  bench.report("Chop") {
    n.times do
      s.chop
    end
  }

  s << "\r\n"

  bench.report("Indexing crlf") {
    n.times do
      s[-2,2] == "\r\n" ? s[0, s.length - 2] : s[0, s.length - 1]
    end
  }

  bench.report("Chop crlf") {
    n.times do
      s.chop
    end
  }
end

Mat_S · August 3, 2006, 1:09pm

“Sean O’Halpin” writes:

---------------------------------- total: 21.562000sec
a dup on the string then calls chop! whereas the index operation
creates a new string which shares the old string but with a different
length. I guess this is also why the rehearsal and final results
differ - cutting out the cost of GC doesn’t reflect the true cost of
using chop (especially with big strings).

Regards,
Sean

Indexing allocates a new string. It has to since (1) Ruby strings are
mutable, (2) Ruby strings have \0 at the end.

Steve

Mat_S · July 29, 2006, 3:54pm

Jake McArthur writes:

}

So it’s in C… interesting…

And that’s the reason why it’s slower… it always dups the string,
while string[a..b] creates a shared substring (copy-on-write).
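
Whether the interpreter shares the underlying buffer or copies it is an internal detail that Ruby code cannot observe directly. What can be checked is that the substring behaves as an independent object either way, which is what copy-on-write has to guarantee. A minimal sketch, with made-up variable names:

```ruby
# dup so the example also works if string literals are frozen
original  = "hello world\n".dup
substring = original[0..-2]      # a separate String object, however it is stored internally

original << "!!!"                # mutate the original...
substring.upcase!                # ...and the substring, independently

puts original.inspect
puts substring.inspect
puts substring.equal?(original)
```

Neither mutation leaks into the other string, so from the caller's point of view the only difference between sharing and copying is cost.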