Jan E. wrote in post #1051180:
> The part “.*?” of the regular expression is very inefficient
Is it? Have you measured it?
> , because it
> will at first consume every character until the end of the line and then
> try to find the minimum of characters needed.
Does it? There are many implementations of Ruby; which particular one(s)
are you referring to?
Your argument suggests that
/(.+?)(.+?)(.+?)(.+?)(.+?)/ =~ "a"*1_000_000
would be extremely inefficient, but actually it runs very fast for me.
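
To make that concrete, here is a minimal sketch of what greedy and lazy quantifiers actually capture on the same input. The helper name first_capture is my own and purely illustrative; it only looks at the match result and says nothing about how any particular engine gets there internally.

```ruby
# Hypothetical helper: return the first capture group of a match, or nil.
def first_capture(pattern, text)
  match = pattern.match(text)
  match && match[1]
end

text = "a" * 20

# Greedy: the group extends as far as it can.
puts first_capture(/(.+)/, text).length    # 20

# Lazy: the group stops as soon as the overall match can succeed.
puts first_capture(/(.+?)/, text).length   # 1

# The five-group pattern from above: each lazy group takes one character.
m = /(.+?)(.+?)(.+?)(.+?)(.+?)/.match("a" * 1_000_000)
p m.captures                               # ["a", "a", "a", "a", "a"]
```

The match is found at the very start of the string, so the length of the input hardly matters here.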
So let’s measure whether you are right or wrong:

    require 'benchmark'

    # One million "a" characters, frozen so every iteration sees the same string
    LONGSTR = ("a" * 1_000_000).freeze

    Benchmark.bmbm do |x|
      x.report("chars")      { 1_000_000.times { /aaaaa/ =~ LONGSTR } }
      x.report("non-greedy") { 1_000_000.times { /.*?.*?.*?.*?.*?/ =~ LONGSTR } }
    end

And the results for me, using ruby 1.8.7 under Mac OS X Lion on a MacBook
Air i7:

    Rehearsal ----------------------------------------------
    chars        0.520000   0.000000   0.520000 (  0.516861)
    non-greedy   0.510000   0.000000   0.510000 (  0.511089)
    ------------------------------------- total: 1.030000sec

                     user     system      total        real
    chars        0.510000   0.000000   0.510000 (  0.505664)
    non-greedy   0.510000   0.000000   0.510000 (  0.511662)

I see no difference there.
> Also you don’t need the block version of gsub. You can simply use a
> substitute string and refer to the parenthesized subexpression by \1:
You can, but the block version is often clearer, especially if you are
doing things like backslash-escaping strings:

    # clear
    a.gsub(/(.)/) { "\\#{$1}" }
    # same result but horrible
    a.gsub(/(.)/, "\\\\\\1")
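
If you want to convince yourself that the two forms really agree, here is a small sketch. The method names and the sample string are my own, chosen only for illustration. In the string form, the double-quoted literal needs six backslashes: Ruby's string parser reduces them to three, and gsub then reads the first two as a literal backslash and the remaining `\1` as the first group.

```ruby
# Hypothetical helper: block form, the backslash is written once in the string.
def escape_with_block(str)
  str.gsub(/(.)/) { "\\#{$1}" }
end

# Hypothetical helper: replacement-string form, escaped for both the
# string literal and gsub's own backreference syntax.
def escape_with_string(str)
  str.gsub(/(.)/, "\\\\\\1")
end

sample = "a.b"
puts escape_with_block(sample)                                # \a\.\b
puts escape_with_string(sample)                               # \a\.\b
puts escape_with_block(sample) == escape_with_string(sample)  # true
```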