Regular expression: Is this a bug or feature?


#1

Hi ruby experts!

Is this intended behaviour?

irb(main):001:0> s1='a=1'
=> "a=1"
irb(main):002:0> s2='b=1'
=> "b=1"
irb(main):003:0> s1 =~ /a|b=(.)/
=> 0 <------ expression matches
irb(main):004:0> $1
=> nil <------ but where is argument?
irb(main):005:0> s2 =~ /a|b=(.)/
=> 0 <------ expression matches
irb(main):006:0> $1
=> "1" <------ this has been expected
irb(main):007:0> s1 =~ /(a|b)=(.)/
=> 0 <------ expression matches
irb(main):012:0> $2
=> "1" <------ this has been expected

Tested on ruby 1.8.2 (2004-12-22) [i686-linux]

Thanks for your help in advance
Martin.


#2

Martin K. wrote:

irb(main):003:0> s1 =~ /a|b=(.)/
=> 0 <------ expression matches
irb(main):004:0> $1
=> nil <------ but where is argument?

I assume this is the one you need an explanation for? I think you are simply
misreading the regexp. /a|b=(.)/ is an alternation of the two regexps /a/
and /b=(.)/. For s1 only the first alternative matches, and it contains no
capture group, so $1 is nil. The regexp you are probably looking for is
/(?:a|b)=(.)/. Try that.
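
To see which part of the string each pattern actually consumes, here is a minimal sketch. The helper name match_summary is made up for illustration; it only wraps the standard Regexp#match and returns the matched text together with the first capture.

```ruby
# Hypothetical helper (not part of Ruby): returns [matched text, first capture],
# or nil when the pattern does not match at all.
def match_summary(string, pattern)
  m = pattern.match(string)
  m && [m[0], m[1]]
end

# The left alternative /a/ wins, so only "a" is matched and group 1 stays unset.
p match_summary('a=1', /a|b=(.)/)      # => ["a", nil]

# The right alternative /b=(.)/ matches, so group 1 is filled.
p match_summary('b=1', /a|b=(.)/)      # => ["b=1", "1"]

# Grouping the alternation makes "=(.)" apply to both letters.
p match_summary('a=1', /(?:a|b)=(.)/)  # => ["a=1", "1"]
```

The first result shows that the match stops after "a", which is why $1 was nil in the original session.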


#3

On 1/23/07, Martin K. removed_email_address@domain.invalid wrote:

Hi ruby experts!

Is this intended behaviour?

irb(main):001:0> s1='a=1'
=> "a=1"
irb(main):002:0> s2='b=1'
=> "b=1"

Call this part 1:

irb(main):003:0> s1 =~ /a|b=(.)/
=> 0 <------ expression matches
irb(main):004:0> $1
=> nil <------ but where is argument?

Part 2:

irb(main):005:0> s2 =~ /a|b=(.)/
=> 0 <------ expression matches
irb(main):006:0> $1
=> "1" <------ this has been expected

Part 3:

irb(main):007:0> s1 =~ /(a|b)=(.)/
=> 0 <------ expression matches
irb(main):012:0> $2
=> "1" <------ this has been expected

I’m not sure why you think it might be a bug. The '|' operator just
binds very loosely, so you have to group the "a|b" in parens. Note
that in part 1, the bit that matches is the left side of the '|',
namely 'a' (no parens), so nothing is captured. In part 2, the
right side ('b=(.)') matches, so there’s 1 capture. In part 3, it
matches the whole thing ('(a|b)=(.)'), so there are 2 captures.
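
As a rough sketch of the three parts, MatchData#captures lists what each group picked up. The helper captures_for is a made-up name for illustration. Note that in part 1 the group still exists in the pattern but took no part in the match, so it shows up as nil rather than disappearing.

```ruby
# Hypothetical helper (not part of Ruby): the capture groups of a match,
# or nil when the pattern does not match.
def captures_for(string, pattern)
  m = pattern.match(string)
  m && m.captures
end

p captures_for('a=1', /a|b=(.)/)    # part 1 => [nil]
p captures_for('b=1', /a|b=(.)/)    # part 2 => ["1"]
p captures_for('a=1', /(a|b)=(.)/)  # part 3 => ["a", "1"]
```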

Does this make sense?

Note that if you only want 1 capture, you can also use the shy
grouping operator (?:…), so:

s1 =~ /(?:a|b)=(.)/

[$1, $2] #=> ["1", nil]