Regular Expression Grouping


#1

Hi!

I couldn’t understand the behavior of this code:

match = 'Today is Feb 23rd, 2003'.match(/Feb 23(rd)?/)
a = match.to_a
puts a.size        # 2
puts a.join(",")   # Feb 23rd,rd
puts a[0]          # Feb 23rd
puts a[1]          # rd

In my understanding, /Feb 23(rd)?/ is equivalent to /Feb 23|Feb 23rd/. So, match should not include ‘rd’.
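The claimed equivalence can be checked directly. The sketch below is an editorial illustration, not code from the thread. Ruby's regex engine tries alternatives left to right and keeps the first one that succeeds, so the order of the alternatives matters. Also, only the pattern with parentheses produces a capture group, which is why `to_a` has an extra element.

```ruby
str = 'Today is Feb 23rd, 2003'

# Alternation: the first alternative that matches at the leftmost position wins.
short_first = str.match(/Feb 23|Feb 23rd/)
long_first  = str.match(/Feb 23rd|Feb 23/)

# Optional group: (rd)? both groups "rd" for the ? and captures it as group 1.
optional = str.match(/Feb 23(rd)?/)

p short_first.to_a  # => ["Feb 23"]
p long_first.to_a   # => ["Feb 23rd"]
p optional.to_a     # => ["Feb 23rd", "rd"]
```

As far as match[0] is concerned, the greedy /Feb 23(rd)?/ behaves like /Feb 23rd|Feb 23/ (longer option tried first), not /Feb 23|Feb 23rd/. The parentheses additionally add the captured text to the MatchData.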

Thanks.


#2

On 19 Oct 2008, at 15:01, mars wrote:

> puts a[1] # rd
>
> In my understanding, /Feb 23(rd)?/ is equivalent to /Feb 23|Feb 23rd/. So, match should not include ‘rd’.

?, + and * are greedy, i.e. they first try to match as much of the string as possible (giving characters back only if the rest of the pattern would otherwise fail), so ‘rd’ is part of the match.
If you want a non-greedy (lazy) quantifier, add a ? after it, for example:
match = 'Today is Feb 23rd, 2003'.match(/Feb 23(rd)??/)
match[0] #=> "Feb 23"
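A minimal sketch comparing the greedy and lazy forms side by side. The second sample string, without "rd", is an illustrative assumption and not from the thread. It shows that a capture group comes back as nil only when the group did not take part in the match.

```ruby
texts = ['Today is Feb 23rd, 2003', 'Today is Feb 23, 2003']

results = texts.map do |text|
  greedy = text.match(/Feb 23(rd)?/)   # greedy ?: tries to include "rd" first
  lazy   = text.match(/Feb 23(rd)??/)  # lazy ??: tries to skip "rd" first
  [text, greedy.to_a, lazy.to_a]       # to_a => [whole match, group 1]
end

results.each do |text, greedy, lazy|
  puts text.inspect
  puts "  greedy: #{greedy.inspect}"
  puts "  lazy:   #{lazy.inspect}"
end
```

With the first string, the greedy form captures "rd" in group 1 while the lazy form leaves group 1 as nil. With the second string there is no "rd" to take, so both forms leave group 1 as nil.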

Fred


#3

Yeah, you’re right:

match = 'Today is Feb 23rd, 2003'.match(/Feb 23(rd)??/)
match[0] #=> "Feb 23"
match[1] #=> nil

I expected:

match = 'Today is Feb 23rd, 2003'.match(/Feb 23(rd)?/) # ? is greedy here
match[0] #=> "Feb 23rd"
match[1] #=> nil

But what I got in ruby 1.8.6 (2007-09-24 patchlevel 111) [i386-mswin32] is:

match[0] #=> "Feb 23rd"
match[1] #=> "rd"

The behavior of ‘?’ being greedy is correct, since it matched “Feb 23rd”, which is stored in match[0]. But shouldn’t match[1] be nil?
The regular expression does not match “rd” alone.

thanks

On Sun, Oct 19, 2008 at 11:34 PM, Frederick C.


#4

Thanks, Fred. I think I misunderstood something.

On Mon, Oct 20, 2008 at 12:14 AM, Frederick C.


#5

On 19 Oct 2008, at 16:04, Marcelino Debajo wrote:

> The regular expression does not match “rd” alone.
match[1] is the text captured by the first group, (rd), which in this example is ‘rd’ (unless you’re talking about something else).
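In a Ruby regex, parentheses both group and capture. If you only need the grouping (so that ? applies to "rd" as a unit), a non-capturing group `(?:...)` keeps the same overall match without creating match[1]. A minimal sketch, not taken from the thread:

```ruby
str = 'Today is Feb 23rd, 2003'

capturing     = str.match(/Feb 23(rd)?/)    # (rd) is capture group 1
non_capturing = str.match(/Feb 23(?:rd)?/)  # (?:rd) groups without capturing

p capturing.captures      # => ["rd"]
p non_capturing.captures  # => []
p non_capturing[0]        # => "Feb 23rd"
p non_capturing[1]        # => nil, because there is no group 1
```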

Fred