Confused by 'test'.gsub(/.*/,'x')

Why do I get “xx” instead of “x” in the following:

$ irb

'test'.gsub(/.*/,'x')
=> "xx"

and even more confusing (to me):

"x\n".gsub(/.*/,'y')
=> "yy\ny"

(I expected “y\n”)

On Wed, Apr 2, 2008 at 10:12 PM, Wybo D. wrote:

Why do I get “xx” instead of “x” in the following:

$ irb

'test'.gsub(/.*/,'x')
=> "xx"

.* matches both NO characters and ALL characters, so gsub() substitutes x for
both ‘’ (the empty string) and ‘test’, so you get ‘xx’

and even more confusing (to me):

"x\n".gsub(/.*/,'y')
=> "yy\ny"

Same goes here as above. If you want to replace each character, use
'test'.gsub(/./,'x') #=> "xxxx"
or, if you want to replace all the characters on each line, use
"test\ntest".gsub(/.+/,'x') #=> "x\nx"
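To see which two matches gsub is replacing, String#scan with the same pattern should list the same matches in the same order. A small sketch (the variable name is just for illustration); note that the greedy "test" comes first and the empty string second:

```ruby
# scan walks the string the same way gsub does and returns every match.
matches = 'test'.scan(/.*/)
p matches                        # => ["test", ""]

# The alternatives suggested above:
p 'test'.gsub(/./, 'x')          # => "xxxx"  (one x per character)
p "test\ntest".gsub(/.+/, 'x')   # => "x\nx"  (one x per non-empty line)
```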

On Wed, Apr 2, 2008 at 10:55 PM, Yossef M. wrote:

Yeah, it confuses me too, but I talked myself into that explanation
when I read it here once. I’d also expect ‘x’ instead of ‘xx’

On Apr 2, 3:35 pm, Thomas W. wrote:

On Wed, Apr 2, 2008 at 10:12 PM, Wybo D. wrote:

Why do I get “xx” instead of “x” in the following:

$ irb

'test'.gsub(/.*/,'x')
=> "xx"

.* matches both NO characters and ALL characters, so gsub() substitutes x for
both ‘’ (the empty string) and ‘test’, so you get ‘xx’

That sounds more like an explanation of why ''.gsub(/.*/, 'x') is 'x'
than of why 'test'.gsub(/.*/, 'x') is 'xx'. It seems to me that the .*
should match [empty string]test[empty string] just once.

and even more confusing (to me):

"x\n".gsub(/.*/,'y')
=> "yy\ny"

This makes sense because . doesn’t normally match \n, so there is an
empty match (and a replacement) both before and after the newline. Still,
the double replacement when there are actual characters is just weird.
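Recording the offset of each match makes the "before and after" visible. A sketch using the block form of String#scan, where $~ holds the current MatchData (the positions array is an illustrative name, not anything from the thread):

```ruby
# Record the start offset of every match of /.*/ in "x\n".
# In the block form of scan, $~ holds the current MatchData.
positions = []
"x\n".scan(/.*/) { positions << [$~.begin(0), $~[0]] }
p positions                 # => [[0, "x"], [1, ""], [2, ""]]
p "x\n".gsub(/.*/, 'y')     # => "yy\ny"
```

The "x" at offset 0 is the greedy match; the empty matches at offset 1 (just before the newline) and offset 2 (the end of the string) account for the other two y's.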

Thomas W. [2008-04-02 22:59]:

Yeah, it confuses me too, but I talked myself into that explanation
when I read it here once. I’d also expect ‘x’ instead of ‘xx’
can’t explain it either, i’m afraid. but you can see what it does
like so:

irb> 'test'.gsub(/.*/) { |m| p m; 'x'}
"test"
""
=>"xx"

as soon as you anchor the regexp at the beginning of the string it
gives the expected result:

irb> 'test'.gsub(/\A.*/) { |m| p m; 'x'}
"test"
=>"x"

or just do:

irb> 'test'.sub(/.*/) { |m| p m; 'x'}
"test"
=>"x"

:wink:

cheers
jens
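Jens's block trick can be extended to print where each match starts; log_matches is a hypothetical helper for illustration. It suggests that the greedy match of "test" comes first and the empty match sits at offset 4, the end of the string, while the \A version stops after a single match:

```ruby
# Log each match gsub yields, together with its start offset.
# Inside the gsub block, $~ is the MatchData for the current match.
def log_matches(str, re)
  log = []
  result = str.gsub(re) { log << [$~.begin(0), $~[0]]; 'x' }
  [result, log]
end

p log_matches('test', /.*/)     # => ["xx", [[0, "test"], [4, ""]]]
p log_matches('test', /\A.*/)   # => ["x", [[0, "test"]]]
```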

Jens W. wrote:

"test"
=>"x"

sure, that works, and so does test.gsub(/.+/,'x').
The point is that I don’t understand why test.gsub(/.*/,'x') gives me
'xx', since .* means: zero or more of any character, except the newline
character, i.e.: all of the string should be replaced with a single x,
as far as I can see.

Seems wrong to me as well. If you do a destructive gsub and test for
individual letters, e.g. /s.*/,'x', you get 'tex' as you’d expect. It seems
wrong to get the double 'x' when you use your example. Of course my
background is Perl and I believe that’s how it would work there.

irb(main):016:0> 'test'.gsub!(/.*/, 'x')
=> "xx"
irb(main):017:0> 'test'.gsub!(/e.*/, 'x')
=> "tx"
irb(main):018:0> 'test'.gsub!(/s.*/, 'x')
=> "tex"
irb(main):019:0> 'test'.gsub!(/t.*/, 'x')
=> "x"
irb(main):020:0> 'test'.gsub!(/st.*/, 'x')
=> "tex"

Ken
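One way to read Ken's results: after a match that runs to the end of the string, gsub searches once more, starting at the end. A sketch using String#match(pattern, pos) to start the search at offset 4 of 'test'; this is an illustration of that reading, not a description of gsub's internals:

```ruby
# Try each pattern starting at the end of 'test' (offset 4).
# Only a pattern that can match zero characters succeeds there,
# which would explain why /.*/ gets a second replacement and the others do not.
s = 'test'
[/.*/, /e.*/, /s.*/, /t.*/, /st.*/].each do |re|
  m = s.match(re, s.length)
  puts "#{re.inspect}: #{m ? m[0].inspect : 'no match'}"
end
```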

On Apr 2, 5:13 pm, Jens W. wrote:

Yeah, it confuses me too, but I talked myself into that explanation
when I read it here once. I’d also expect ‘x’ instead of ‘xx’

can’t explain it either, i’m afraid. but you can see what it does
like so:

irb> 'test'.gsub(/.*/) { |m| p m; 'x'}
"test"
""
=>"xx"

That seems like a bug to me. The entire string is matched/consumed
by .*, so why try matching again? Or, if you are going to continue,
why stop with just one additional match? Is there code in gsub to
“only match one time after the string is consumed”?

irb(main):001:0> 'test' =~ /(.*)(.*)(.*)/
=> 0
irb(main):002:0> $1
=> "test"
irb(main):003:0> $2
=> ""
irb(main):004:0> $3
=> ""
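MatchData#offset makes the same point within a single match: the two trailing groups match the empty string at offset 4, after the first group has consumed "test". A short sketch (note this is one match with three groups, not gsub's repeated search):

```ruby
# One match, three groups: where did each group match?
m = 'test'.match(/(.*)(.*)(.*)/)
p m.captures                          # => ["test", "", ""]
offsets = (1..3).map { |i| m.offset(i) }
p offsets                             # => [[0, 4], [4, 4], [4, 4]]
```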

Januski, Ken [2008-04-03 00:08]:

Of course my background is Perl and I believe that’s how it would
work there.
no, works the same way there:

sh> perl -e '$s = "test"; $s =~ s/.*/x/g; print "$s\n"'
xx

(only a lot more complicated ;-)

btw: python, php and javascript, too.

oh, and here’s what oniguruma does:

irb> Oniguruma::ORegexp.new('.*').gsub('test', 'x')
=>"xx"

cheers
jens

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The point is that I don’t understand why test.gsub(/.*/,'x') gives me
'xx', since .* means: zero or more of any character, except the newline
character, …

Wybo
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I would venture to say this is exactly what it does. It finds two
matches and replaces them both with ‘x’. The first match is an empty
string, while the second match is the full string.

Alex

Right you are. For all the years I’ve used Perl, and for all that I
thought I knew about regexes, I never would have thought I would get
that result.

I would have expected one greedy match for the entire text. Instead I
guess it’s first getting the zero match and then the full match.

Perl, PHP:

perl -le '$str="test"; $str =~ s/.*?/x/g; print $str;'
xxxxxxxxx

preg_replace('/.*?/', 'x', 'test');
xxxxxxxxx

Ruby:
print 'test'.gsub(/.*?/, 'x')
xtxexsxtx

Zaki
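In Ruby the lazy version seems to produce only empty matches, one at each of the five positions in 'test'. A sketch listing them (the lazy array is an illustrative name). As far as I can tell, after an empty match Ruby steps over one character instead of retrying at the same position for a non-empty match, which would explain why the letters survive in Ruby's result; Perl and PCRE do retry for a non-empty match there, which fits the nine x's above.

```ruby
# The lazy .*? settles for zero characters at every position.
lazy = []
'test'.scan(/.*?/) { lazy << [$~.begin(0), $~[0]] }
p lazy                        # => [[0, ""], [1, ""], [2, ""], [3, ""], [4, ""]]
p 'test'.gsub(/.*?/, 'x')     # => "xtxexsxtx"
```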

From: Wybo D.

sure, that works, and so does test.gsub(/.+/,'x').

The point is that I don’t understand why test.gsub(/.*/,'x') gives me
'xx', since .* means: zero or more of any character, except the newline
character, i.e.: all of the string should be replaced with a
single x, as far as I can see.

you can start (slowly) by comparing these two examples:

irb(main):077:0> ''.gsub(/.*/, 'x')
=> "x"

irb(main):078:0> ''.gsub(/.+/, 'x')
=> ""

kind regards -botp

I would have expected one greedy match for the entire text.
Instead I guess it’s first getting the zero match and then
the full match.

Actually, it’s vice versa. It matches the whole string (greedy), then
matches the end of string. The "test" string is seen by the regex engine
as:

test[end of string]

.* first matches "test". The end of string is a special ‘character’ that
is not consumed by ".", so what remains is then the empty string "". This
is also matched, as it contains zero or more characters (but it is not
then matched infinitely, as the position in the string has not advanced).

Dan.
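Dan's description can be turned into a small Ruby model of the loop. This is a sketch under stated assumptions: model_gsub is a hypothetical name, it assumes gsub resumes searching where a non-empty match ended and steps over one character after an empty match, and it ignores backreferences in the replacement. Comparing it with the real gsub on the examples from this thread is a quick way to check the model:

```ruby
# Simplified model of the gsub loop described above. A sketch, not Ruby's
# C implementation; \1-style references in the replacement are ignored.
def model_gsub(str, re, replacement)
  out = +''
  pos = 0
  while (m = re.match(str, pos))
    out << str[pos...m.begin(0)] << replacement  # text before the match, then the replacement
    pos = m.end(0)
    next if m.end(0) > m.begin(0)                 # non-empty match: search again where it ended
    break if pos >= str.length                    # empty match at the end of the string: stop
    out << str[pos]                               # empty match: copy one character...
    pos += 1                                      # ...so the next search starts further on
  end
  out << str[pos..]                               # whatever follows the last match
end

[['test', /.*/], ["x\n", /.*/], ['test', /.*?/], ['', /.*/], ['', /.+/], ['test', /\A.*/]].each do |s, re|
  puts "#{s.inspect} #{re.inspect}: model #{model_gsub(s, re, 'x').inspect}, gsub #{s.gsub(re, 'x').inspect}"
end
```

If the model agrees with String#gsub on these inputs, that supports Dan's reading: the extra "x" comes from a single empty match at the end of the string, and the one-character step is what keeps it from repeating forever.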
