Duplicating characters in a string

On 07.03.2008 19:41, Joel VanderWerf wrote:

Robert K. wrote:

What version did you test with? I get much more dramatic differences:

$ ruby -v
ruby 1.8.6 (2007-09-24 patchlevel 111) [i686-linux]

Surprising. The slower cases in your results are all the block cases.
Does cygwin ruby perform poorly with blocks in general? Or is it because
my ruby is compiled for i686 instead of i386?

Probably. But note that blocks must be slower because they are Ruby
code while the replacement without block is done in compiled C code.

Kind regards

robert

On Mar 7, 6:56 pm, 7stud wrote:

(I refuse to consider any code that uses perl syntax)

So you won’t be able to do this any more:

2 + 2

You’ll have to use Forth:

2 2 +

or Lisp:

(+ 2 2)

Some awk syntax:

gsub( /./, "&&", s )

On Mar 7, 11:45 pm, Joel VanderWerf wrote:

7stud 4.100000 0.020000 4.120000 ( 4.300772)
7stud 2.030000 0.010000 2.040000 ( 2.068190)

Pascal (FreePascal):

uses sysutils; // for timing

function dup_chars( s: ansistring ): ansistring;
var
  out: ansistring;
  i,j: longint;
  c: char;
begin
  setlength( out, length(s) * 2 );
  for i := 1 to length(s) do
  begin
    c := s[i]; j := (i shl 1);
    out[j-1] := c;
    out[j] := c;
  end;
  exit( out )
end;

var
  s : ansistring;
  i : longint;
  when : tDateTime;
begin
  s := '';
  for i:=1 to 1000000 do s += 'abc';
  when := Time;
  dup_chars( s );
  writeln( ((time - when) * secsPerDay):0:3, ' seconds' )
end.

--> 0.031 seconds

On Mar 11, 9:26 am, Robert K. wrote:

function dup_chars( s: ansistring ): ansistring;
out[j] := c;
for i:=1 to 1000000 do s += 'abc';


use.inject do |as, often| as.you_can - without end

On my computer:

s.gsub(re, …) → 9.641000 seconds
Lua → 1.078 seconds

Ruby’s gsub is far too slow. 7stud even beat it with an each_byte
loop!
Disgraceful. Disgusting. Inexcusable.

The Lua code:

s = string.rep( 'abc', 1000000 )

time = os.clock()
string.gsub( s, '(.)', "%1%1" )
print( os.clock() - time, "seconds" )

William J. wrote:

Ruby’s gsub is far too slow. 7stud even beat it with an each_byte
loop!

I have not been able to reproduce a benchmark in which each_byte was
faster than gsub (at least, without resorting to the loop unrolling
trick).

2008/3/11, William J.:

s.gsub(re){…} 5.910000 0.020000 5.930000 ( 5.993456)
var
end;
when := Time;
dup_chars( s );
writeln( ((time - when) * secsPerDay):0:3, ' seconds' )
end.

--> 0.031 seconds

Your point being?

robert

Adam A. wrote:

If I have a string "abc" and want to make it like this "aabbcc", how do
I go about it?

I thought I could convert it into an array and then use string.map{|x| x << x},

though I haven't tested that yet. Is there a better way, or a string method
that saves me from having to convert it into an array?

Another approach:

"abc".split(//).map{|c| c*2}.join
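
For reference, the approaches mentioned in this thread can be put side by side. This is only a minimal sketch, not code from any poster: the method names dup_gsub, dup_split_map and dup_each_char are made up for illustration, and String#each_char assumes Ruby 1.8.7 or later.

```ruby
# Hypothetical helper names, used only to compare the approaches.

# Replacement string: \0 stands for the whole match, so no capture group
# is needed. Note that "." does not match a newline unless /m is given.
def dup_gsub(s)
  s.gsub(/./, '\0\0')
end

# Split into single characters, double each one, and join again.
def dup_split_map(s)
  s.split(//).map { |c| c * 2 }.join
end

# Build the result by appending every character twice.
def dup_each_char(s)
  out = "".dup              # unfrozen empty string
  s.each_char { |c| out << c << c }
  out
end

puts dup_gsub("abc")        # aabbcc
puts dup_split_map("abc")   # aabbcc
puts dup_each_char("abc")   # aabbcc
```

All three return a new string and leave the argument unchanged. Which one is fastest depends on the Ruby version, as the benchmarks elsewhere in this thread suggest.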

On 11.03.2008 19:53, William J. wrote:

Btw, is there any particular reason why you duplicated the mail content?

                   user     system      total        real

c: char;
var
--> 0.031 seconds
Lua --> 1.078 seconds
string.gsub( s, '(.)', "%1%1" )
print( os.clock() - time, "seconds" )

I do not know Lua. Does this return a copied string or does it do it
inplace?

Kind regards

robert

Some things to keep in mind:

  • captures can be slow, so if you can avoid them, you can get better
    performance
  • looping using “for” is usually faster than using “each { … }”,
    because it doesn’t have to create a new scope every time the block
    is called
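
The first point can be checked directly. The sketch below is an illustration under stated assumptions, not a definitive measurement: the two helper names are hypothetical, the input size is arbitrary, and the timings will differ between Ruby versions and machines, so no numbers are shown here.

```ruby
require 'benchmark'

# Hypothetical helper names, used only for this illustration.

# Uses a capture group; \1 in the replacement refers to group 1.
def double_with_capture(s)
  s.gsub(/(.)/, '\1\1')
end

# No capture group; \0 in the replacement refers to the whole match.
def double_without_capture(s)
  s.gsub(/./, '\0\0')
end

s = "abc" * 10_000

# Both forms should produce exactly the same string.
puts double_with_capture(s) == double_without_capture(s)

# Benchmark.realtime returns the elapsed wall-clock time in seconds.
t_capture = Benchmark.realtime { double_with_capture(s) }
t_plain   = Benchmark.realtime { double_without_capture(s) }
printf("with capture:    %.6f s\n", t_capture)
printf("without capture: %.6f s\n", t_plain)
```

Since the two calls are interchangeable for this task, dropping the capture costs nothing in readability.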

Given the above, I added three alternative solutions (my computer is slower
than yours, so I also reduced the number of iterations). Interestingly,
I was able to write an enumerator class in Ruby that was faster
than the built-in enumerator class (which is written in C). Also
interesting is how much slower the enumerator solution is on YARV:

require 'benchmark'
require 'enumerator'

class EachByteEnumerator
  def initialize(obj)
    @obj = obj
  end

  def each(&block)
    @obj.each_byte(&block)
  end
end

Benchmark.bmbm do |b|
  s = "abc" * 100_000
  re = /(.)/

  b.report("s.gsub(re) {|x| x*2}") do
    s.gsub(re) {|x| x*2}
  end

  b.report("s.gsub(re, '\\1\\1')") do
    s.gsub(re, '\1\1')
  end

  b.report("s.gsub(/./, '\\0\\0')") do
    s.gsub(/./, '\0\0')
  end

  b.report("7stud1") do
    new_str = ""
    s.each_byte do |byte|
      2.times do
        new_str << byte
      end
    end
  end

  b.report("7stud2") do
    new_str = ""
    s.each_byte do |byte|
      new_str << byte << byte
    end
  end

  b.report("7stud_enum") do
    new_str = ""
    o = s.to_enum(:each_byte)
    for byte in o do
      new_str << byte << byte
    end
  end

  b.report("7stud_enum2") do
    new_str = ""
    o = EachByteEnumerator.new(s)
    for byte in o do
      new_str << byte << byte
    end
  end
end

[pbrannan@zaphod tmp]$ ruby test.rb
Rehearsal --------------------------------------------------------
s.gsub(re) {|x| x*2}   2.410000   0.010000   2.420000 (  2.431675)
s.gsub(re, '\1\1')     0.900000   0.140000   1.040000 (  1.035942)
s.gsub(/./, '\0\0')    0.840000   0.070000   0.910000 (  0.919175)
7stud1                 1.430000   0.000000   1.430000 (  1.430979)
7stud2                 0.580000   0.010000   0.590000 (  0.585925)
7stud_enum             0.820000   0.010000   0.830000 (  0.837720)
7stud_enum2            0.510000   0.000000   0.510000 (  0.511776)
----------------------------------------------- total: 7.730000sec

                           user     system      total        real
s.gsub(re) {|x| x*2}   2.450000   0.000000   2.450000 (  2.455824)
s.gsub(re, '\1\1')     0.860000   0.010000   0.870000 (  0.861705)
s.gsub(/./, '\0\0')    0.820000   0.000000   0.820000 (  0.818279)
7stud1                 1.460000   0.010000   1.470000 (  1.483312)
7stud2                 0.590000   0.000000   0.590000 (  0.589141)
7stud_enum             0.820000   0.000000   0.820000 (  0.817231)
7stud_enum2            0.520000   0.000000   0.520000 (  0.514009)

[pbrannan@zaphod tmp]$ ruby1.9 test.rb
Rehearsal --------------------------------------------------------
s.gsub(re) {|x| x*2}   2.880000   0.030000   2.910000 (  2.961747)
s.gsub(re, '\1\1')     1.850000   0.120000   1.970000 (  1.972712)
s.gsub(/./, '\0\0')    1.770000   0.040000   1.810000 (  1.815295)
7stud1                 1.040000   0.000000   1.040000 (  1.056692)
7stud2                 0.640000   0.000000   0.640000 (  0.660844)
7stud_enum             1.710000   0.080000   1.790000 (  1.786316)
7stud_enum2            1.640000   0.120000   1.760000 (  1.770983)
---------------------------------------------- total: 11.920000sec

                           user     system      total        real
s.gsub(re) {|x| x*2}   2.950000   0.030000   2.980000 (  2.999795)
s.gsub(re, '\1\1')     1.910000   0.030000   1.940000 (  1.938242)
s.gsub(/./, '\0\0')    1.760000   0.030000   1.790000 (  1.811339)
7stud1                 1.040000   0.000000   1.040000 (  1.055916)
7stud2                 0.650000   0.030000   0.680000 (  0.685329)
7stud_enum             1.520000   0.120000   1.640000 (  1.661215)
7stud_enum2            1.450000   0.120000   1.570000 (  1.590461)

[pbrannan@zaphod tmp]$ ruby1.9 -v
ruby 1.9.0 (2008-01-24 revision 0) [i686-linux]

On Wed, Mar 12, 2008 at 03:57:59AM +0900, William J. wrote:

print( os.clock() - time, "seconds" )

I don’t think your code is doing what you think it is doing:

[pbrannan@zaphod tmp]$ cat test.lua
-- s = string.rep( 'abc', 100000 )
s = 'abcabcabc'

time = os.clock()
string.gsub( s, '(.)', "%1%1" )
print(s)
print( os.clock() - time, "seconds" )

[pbrannan@zaphod tmp]$ lua test.lua
abcabcabc
0 seconds
[pbrannan@zaphod tmp]$ lua -v
Lua 5.0.2 Copyright © 1994-2004 Tecgraf, PUC-Rio
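
The transcript above shows s unchanged after string.gsub, so the result is returned as a new string rather than written into s. For comparison, Ruby offers both behaviours: String#gsub returns a copy and String#gsub! modifies the receiver. A small sketch follows, in which the helper name gsub_copy_vs_in_place is hypothetical and exists only for this illustration:

```ruby
# Hypothetical helper: applies the non-destructive and then the destructive
# variant to a private copy of text and reports what the receiver held
# after each call.
def gsub_copy_vs_in_place(text)
  s = text.dup
  copy = s.gsub(/./, '\0\0')   # returns a new string, s is untouched
  after_gsub = s.dup           # snapshot of s after the plain gsub
  s.gsub!(/./, '\0\0')         # rewrites s itself
  { :copy => copy, :after_gsub => after_gsub, :after_gsub_bang => s }
end

r = gsub_copy_vs_in_place("abcabcabc")
puts r[:after_gsub]        # abcabcabc
puts r[:copy]              # aabbccaabbccaabbcc
puts r[:after_gsub_bang]   # aabbccaabbccaabbcc
```

When benchmarking, it is worth printing or otherwise using the returned value, so it is clear which of the two behaviours was actually measured.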

Paul