On May 5, 7:42 pm, James S. [email protected] wrote:
"Age: 21" =~ /Age.{0,60}?: ([\w]+)/
This returns nil and $1 is set to nil.
This seems like a bug, given:
s = "Age: 21"
s =~ /Age.: (\w+)/ #=> 0
s =~ /Age.?: (\w+)/ #=> 0
s =~ /Age.{0,60}: (\w+)/ #=> 0
s =~ /Age.{0,60}?: (\w+)/ #=> nil
(Perhaps you were paring down a real-world test case; did you know
that you can simply use \w+ instead of [\w]+ to match one or more word
characters? And that \d may be more appropriate, matching only digit
characters?)
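A quick sketch of that aside, in case it helps. The second sample string ("Age: unknown") is made up for illustration and is not from the original post:

```ruby
age_line  = "Age: 21"
text_line = "Age: unknown"   # hypothetical input, for contrast only

bracketed = age_line[/Age: ([\w]+)/, 1]   # character class wrapping \w
bare      = age_line[/Age: (\w+)/, 1]     # same match without the brackets
digits    = age_line[/Age: (\d+)/, 1]     # digits only

word_on_text  = text_line[/Age: (\w+)/, 1]  # \w also accepts letters
digit_on_text = text_line[/Age: (\d+)/, 1]  # \d finds nothing to capture

p [bracketed, bare, digits, word_on_text, digit_on_text]
#=> ["21", "21", "21", "unknown", nil]
```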
My simple experiments lead me to believe this is an edge case that
arises specifically with:
a) a non-greedy range
b) that is matching any-char
c) has a lower-limit of 0
d) and must match 0 times to succeed.
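To see why condition (d) applies to the original example, here is a small sketch that captures what the range consumes. It uses the greedy form (which matches) and adds a capture group of my own around the range; that group is not in the original pattern:

```ruby
s = "Age: 21"

# Capture group 1 wraps the range; group 2 is the age as before.
gap = s[/Age(.{0,60}): (\w+)/, 1]
age = s[/Age(.{0,60}): (\w+)/, 2]

p gap  #=> ""
p age  #=> "21"
```

Nothing sits between "Age" and ": " in the string, so the range has to match zero times for the whole pattern to succeed.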
Here’s my test data, with analysis following.
s = "abbc"
%w|
ab{1,9}c ab{1,9}?c
abb{1,9}c abb{1,9}?c
abbb{1,9}c abbb{1,9}?c
ab{0,9}c ab{0,9}?c
abb{0,9}c abb{0,9}?c
abbb{0,9}c abbb{0,9}?c
a.{1,9}c a.{1,9}?c
ab.{1,9}c ab.{1,9}?c
abb.{1,9}c abb.{1,9}?c
a.{0,9}c a.{0,9}?c
ab.{0,9}c ab.{0,9}?c
abb.{0,9}c abb.{0,9}?c
|.each_with_index{ |pattern,i|
regex = Regexp.new( pattern )
puts "%2i %-15s %s" % [
i, regex.inspect, (s =~ regex).inspect
]
}
#=> 0 /ab{1,9}c/ 0
#=> 1 /ab{1,9}?c/ 0
#=> 2 /abb{1,9}c/ 0
#=> 3 /abb{1,9}?c/ 0
#=> 4 /abbb{1,9}c/ nil
#=> 5 /abbb{1,9}?c/ nil
#=> 6 /ab{0,9}c/ 0
#=> 7 /ab{0,9}?c/ 0
#=> 8 /abb{0,9}c/ 0
#=> 9 /abb{0,9}?c/ 0
#=> 10 /abbb{0,9}c/ 0
#=> 11 /abbb{0,9}?c/ 0
#=> 12 /a.{1,9}c/ 0
#=> 13 /a.{1,9}?c/ 0
#=> 14 /ab.{1,9}c/ 0
#=> 15 /ab.{1,9}?c/ 0
#=> 16 /abb.{1,9}c/ nil
#=> 17 /abb.{1,9}?c/ nil
#=> 18 /a.{0,9}c/ 0
#=> 19 /a.{0,9}?c/ 0
#=> 20 /ab.{0,9}c/ 0
#=> 21 /ab.{0,9}?c/ 0
#=> 22 /abb.{0,9}c/ 0
#=> 23 /abb.{0,9}?c/ nil
In the above, we would expect patterns 4, 5, 16 and 17 to fail, but
not 23.
Notably, pattern #15 succeeds (showing that a non-greedy range
matching any-char can match its lower-limit number of times) and
pattern #11 succeeds (showing that a non-greedy range matching a
specific char can match zero times).
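For anyone who wants to check their own interpreter, here is a rough sketch that automates the comparison. It assumes that making a range non-greedy should change only what is matched, never whether a match exists, which holds for simple patterns like these. The helper name lazy_mismatches is mine, not anything built in:

```ruby
# Returns the greedy pattern sources whose non-greedy variant disagrees
# about whether the subject matches at all.
def lazy_mismatches(subject, greedy_sources)
  greedy_sources.select do |source|
    lazy_source = source.sub("}", "}?")   # "ab{0,9}c" -> "ab{0,9}?c"
    greedy_hit  = !(subject =~ Regexp.new(source)).nil?
    lazy_hit    = !(subject =~ Regexp.new(lazy_source)).nil?
    greedy_hit != lazy_hit
  end
end

sources = %w[
  ab{1,9}c  abb{1,9}c  abbb{1,9}c
  ab{0,9}c  abb{0,9}c  abbb{0,9}c
  a.{1,9}c  ab.{1,9}c  abb.{1,9}c
  a.{0,9}c  ab.{0,9}c  abb.{0,9}c
]

mismatches = lazy_mismatches("abbc", sources)
if mismatches.empty?
  puts "greedy and non-greedy variants agree"
else
  puts "disagreement on: #{mismatches.inspect}"
end
```

On an interpreter showing the behavior above, I would expect this to report only abb.{0,9}c; an empty list means the greedy and non-greedy forms agree for every pattern tried.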