On Tue, Nov 23, 2010 at 5:12 PM, Ammar A. wrote:
"b T T W b".match(/(?<!t t|a b) w/i)
# Look-behind only contains the t t condition now, and T T is back to
# what you asked, so it did match.
That was an alternative! If the RX in the lookbehind can match, the
negative lookbehind must fail IMHO.
The thing is, what’s in the lookbehind (and in all assertions, for that matter)
is not really a regular expression. It is a fixed-length literal. The only
exception, AFAIK, is character sets, because they are also fixed length. The
engine needs to know how many characters to step back and examine.
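The fixed-width requirement can be probed at runtime with `Regexp.new`, which raises `RegexpError` for a pattern the engine rejects (a regexp literal would fail at parse time instead). A minimal sketch, assuming a current Ruby whose Onigmo engine may behave differently from the 1.9.1 engine in this thread; `lookbehind_ok?` is a hypothetical helper name:

```ruby
# Hypothetical helper: true if the pattern compiles, false if the
# engine rejects it (e.g. "invalid pattern in look-behind").
def lookbehind_ok?(source)
  Regexp.new(source)
  true
rescue RegexpError
  false
end

# A character class always matches exactly one char: fixed width.
puts lookbehind_ok?('(?<=[ab])c')  # prints true
# An unbounded quantifier has no fixed width, so it is rejected.
puts lookbehind_ok?('(?<=a+)c')    # prints false
```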
The docs say that the regexp in a lookbehind cannot be of unlimited length. But it
is by far not only a fixed-length literal. “|” is certainly a metacharacter in an
assertion: the second line of the output would not match if the lookbehind
assertion were a literal.
10:45:31 ~$ ruby19 x.rb
bc         /(?<=ab)c/      []
bc         /(?<=a|b)c/     ["c"]
bc         /(?<=a\|b)c/    []
abc        /(?<=ab)c/      ["c"]
abc        /(?<=a|b)c/     ["c"]
abc        /(?<=a\|b)c/    []
a|bc       /(?<=ab)c/      []
a|bc       /(?<=a|b)c/     ["c"]
a|bc       /(?<=a\|b)c/    ["c"]
a\|bc      /(?<=ab)c/      []
a\|bc      /(?<=a|b)c/     ["c"]
a\|bc      /(?<=a\|b)c/    []
10:45:32 ~$ cat x.rb
str = ["bc", "abc", "a|bc", "a\\|bc"]
rxs = [/(?<=ab)c/, /(?<=a|b)c/, /(?<=a\|b)c/]
str.each do |s|
  rxs.each do |r|
    printf "%-10s %-15p %p\n", s, r, s.scan(r)
  end
end
10:45:45 ~$
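The same point from another angle: the top-level alternatives of a lookbehind may even have different lengths, so what sits in the assertion is not a single fixed-length literal. A small sketch, assuming a current Ruby; the pattern and strings are made up for illustration:

```ruby
# "c" preceded by either "a" (1 char) or "xy" (2 chars).
re = /(?<=a|xy)c/

p "ac".scan(re)   # => ["c"]  one-char alternative
p "xyc".scan(re)  # => ["c"]  two-char alternative
p "yc".scan(re)   # => []     "y" alone matches neither alternative
```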
Docs even say “In negative-look-behind, captured group isn’t allowed,
but shy group(?:) is allowed.” So it’s a regexp, albeit a limited one.
http://www.geocities.jp/kosako3/oniguruma/doc/RE.txt
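The doc rule quoted above can be checked directly. A minimal sketch, assuming a current Ruby; the patterns are illustrative and the exact error text may vary by version:

```ruby
# Non-capturing ("shy") group inside a negative look-behind: accepted.
shy = /(?<!(?:a|b))c/
puts "xc ac bc".gsub(shy, "C")  # prints "xC ac bc"

# Capturing group inside a negative look-behind: per the quoted doc,
# the engine should refuse to compile this.
begin
  Regexp.new('(?<!(a|b))c')
  puts "capturing group accepted"
rescue RegexpError => e
  puts "RegexpError: #{e.message}"
end
```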
irb(main):009:0> "b T T W b".match(/(?<!t t|a) w/i)
=> #<MatchData " W">
irb(main):010:0> "b T T W b".match(/(?i:<!t t|a) w/i)
=> nil
That’s not a valid assertion any more; it is now an options specification.
"b <!t t w b".match( /(?i:<!t t|a) w/ )
=> #<MatchData "<!t t w">
Right, apparently we cannot have options in assertions.
RUBY_VERSION
=> "1.9.1"
RUBY_PATCHLEVEL
=> 378
The root issue still exists:
irb(main):014:0> "a ac".scan /(?<!a a|b)c/i
=> []
irb(main):015:0> "A Ac".scan /(?<!a a|b)c/i
=> ["c"]
irb(main):016:0> "ac".scan /(?<!a|b)c/i
=> []
irb(main):017:0> "Ac".scan /(?<!a|b)c/i
=> []
Statement 15 should yield no results, just as 17 does.
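One way to state that expectation mechanically: for ASCII input and a /i pattern, lowering the case of the subject string must not move any match. A sketch under that assumption, with hypothetical helpers `match_offsets` and `case_consistent?`; going by the transcripts above, 1.9.1p378 would be expected to report false for the patterns that combine an alternative with a multi-char branch:

```ruby
# Hypothetical helper: start offset of every match of re in str.
def match_offsets(re, str)
  str.to_enum(:scan, re).map { Regexp.last_match.begin(0) }
end

# Hypothetical helper: with a /i pattern and ASCII-only input, match
# positions must not depend on the case of the subject string.
def case_consistent?(re, str)
  match_offsets(re, str) == match_offsets(re, str.downcase)
end

# Cases taken from the irb sessions in this thread.
[
  [/(?<!a a|b)c/i, "A Ac"],
  [/(?<!aa|b)c/i,  "AAc"],
  [/(?<!aa|b)c/i,  "Aac"],
  [/(?<!aa)c/i,    "AAc"],
  [/(?<!a|b)c/i,   "Ac"],
].each do |re, s|
  printf "%-8s %-15p %p\n", s, re, case_consistent?(re, s)
end
```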
Apparently /i breaks if there is an alternative (“|”) combined with
more than one char in one of the branches:
Fails (more than 1 char AND alternative):
irb(main):018:0> "aac".scan /(?<!aa|b)c/i
=> []
irb(main):019:0> "AAc".scan /(?<!aa|b)c/i
=> ["c"]
irb(main):020:0> "Aac".scan /(?<!aa|b)c/i
=> ["c"]
irb(main):021:0> "aAc".scan /(?<!aa|b)c/i
=> ["c"]
Works (more than 1 char OR alternative):
irb(main):022:0> "aac".scan /(?<!aa)c/i
=> []
irb(main):023:0> "aAc".scan /(?<!aa)c/i
=> []
irb(main):024:0> "Aac".scan /(?<!aa)c/i
=> []
irb(main):025:0> "AAc".scan /(?<!aa)c/i
=> []
irb(main):026:0> "ac".scan /(?<!a)c/i
=> []
irb(main):027:0> "Ac".scan /(?<!a)c/i
=> []
irb(main):028:0> "ac".scan /(?<!a|b)c/i
=> []
irb(main):029:0> "Ac".scan /(?<!a|b)c/i
=> []
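Since the single-branch lookbehinds above honour /i, one possible workaround (a sketch, not verified on 1.9.1p378 itself) is the equivalence (?<!X|Y) == (?<!X)(?<!Y): "c" is preceded neither by X nor by Y. Each lookbehind then contains no alternative. The test strings are illustrative:

```ruby
# (?<!a a|b)c rewritten as two chained negative look-behinds.
# Both assertions are zero-width and are checked at the same position.
re = /(?<!a a)(?<!b)c/i

p "a ac".scan(re)  # => []     preceded by "a a"
p "A Ac".scan(re)  # => []     preceded by "A A", compared case-insensitively
p "Bc".scan(re)    # => []     preceded by "B"
p "xc".scan(re)    # => ["c"]  preceded by neither
```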
IMHO this is a bug.
Kind regards
robert