Oniguruma question

With the following Oniguruma re:

print "aabb".index(/.+?(?<!a)/)
print "\n"

My reasoning tells me I should be getting 3, but I’m getting 0.
Basically, the .+ should force the re ‘cursor’ past at least the first
a; after that, the negative lookback should force it past the first b.
Could someone explain where I’m misunderstanding?

Thanks,
Ken


To unsubscribe from this list, please visit:

http://xircles.codehaus.org/manage_email

On Sep 5, 2008, at 3:04 PM, Kenneth McDonald wrote:


‘.+?’ is matching with ‘aab’, and ‘(?<!a)’ is matching with ‘b’.

irb(main):001:0> "aabb".index(/.+?(?<!a)/)
=> 0
irb(main):002:0> Regexp.last_match
=> #<MatchData "aab">



On Sep 5, 2008, at 3:24 PM, Hirotsugu A. wrote:


That was not exactly correct. (?<!a) is a zero-width assertion, so
nothing is matched in it.

irb(main):003:0> /(.+?)((?<!a))/.match("aabb").captures
=> ["aab", ""]

Anyhow, here’s what I think happens:

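The engine starts out with ‘.+?’. The first ‘a’ matches. It then asks
if the next token (?<!a) is satisfied. It is not, since the character
just before the current position is that ‘a’.

So it adds the next ‘a’ to ‘.+?’, and then asks whether (?<!a) is
satisfied. It still is not, because the preceding character is again ‘a’.

It adds ‘b’ to ‘.+?’, then asks if (?<!a) is satisfied. It is, because
the preceding character is now ‘b’. Thus the match succeeds. It started
at offset 0, which is what index returns.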
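
The walk-through above can be replayed by hand. What follows is only a rough sketch of the idea, not how Oniguruma or Joni is implemented: the helper name lazy_then_not_after_a is made up for illustration, it hard-codes this one pattern, and it only tries a match starting at offset 0 (a real engine would move on to later start offsets if that failed).

```ruby
# Hypothetical helper: mimics /.+?(?<!a)/ for a match starting at offset 0.
# It grows the lazy '.+?' one character at a time and, after each step,
# checks the negative lookbehind at the current cursor position.
def lazy_then_not_after_a(str)
  (1..str.length).each do |cursor|
    # (?<!a) inspects the character just before the cursor.
    return str[0, cursor] unless str[cursor - 1] == "a"
  end
  nil # no match from offset 0
end

by_hand = lazy_then_not_after_a("aabb")
m = /.+?(?<!a)/.match("aabb")

puts by_hand     # aab
puts m[0]        # aab
puts m.begin(0)  # 0, the start of the match, which String#index reports
puts m.end(0)    # 3, where the cursor ends up
```

If the position where the cursor stops is what is wanted, MatchData#end gives it; String#index always reports where the match begins.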



Hirotsugu A. wrote:


With org.joni.Config.DEBUG_ALL set you can watch what’s happening
under the hood. For /(.+?)((?<!a))/.match("aabb") it goes like this:

stack used: true
code length: 22
[mem-start:1]@0(2) [anychar-sb]@2(1) [jump:(1)]@3(2) [anychar-sb]@5(1)
[push:(-3)]@6(2) [mem-end:1]@8(2) [mem-start:2]@10(2)
[push-look-behind-not:1:(3)]@12(3) [exact1:a]@15(2)
[fail-look-behind-not]@17(1) [mem-end:2]@18(2) [end]@20(1)
[finish]@21(1)

onig_search (entry point): str: 0, end: 4, start: 0, range 4
onig_search(apply anchor): end: 4, start 0, range 4
match_at: str: 0, end: 4, start: 0, sprev: 0
size: 4, start offset: 0
0> "aabb" [mem-start:1]@0(2)
0> "aabb" [anychar-sb]@2(1)
1> "abb" [jump:(1)]@3(2)
1> "abb" [push:(-3)]@6(2)
1> "abb" [mem-end:1]@8(2)
1> "abb" [mem-start:2]@10(2)
1> "abb" [push-look-behind-not:1:(3)]@12(3)
0> "aabb" [exact1:a]@15(2)
1> "abb" [fail-look-behind-not]@17(1)
1> "abb" [anychar-sb]@5(1)
2> "bb" [push:(-3)]@6(2)
2> "bb" [mem-end:1]@8(2)
2> "bb" [mem-start:2]@10(2)
2> "bb" [push-look-behind-not:1:(3)]@12(3)
1> "abb" [exact1:a]@15(2)
2> "bb" [fail-look-behind-not]@17(1)
2> "bb" [anychar-sb]@5(1)
3> "b" [push:(-3)]@6(2)
3> "b" [mem-end:1]@8(2)
3> "b" [mem-start:2]@10(2)
3> "b" [push-look-behind-not:1:(3)]@12(3)
2> "bb" [exact1:a]@15(2)
3> "b" [mem-end:2]@18(2)
3> "b" [end]@20(1)
0
0: (0-3)
1: (0-3)
2: (3-3)
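
The three offset pairs at the end of the trace can be cross-checked from plain Ruby with MatchData#offset. This is just a small sketch, assuming a Ruby whose regex engine supports lookbehind (Oniguruma or Joni, as in this thread); the variable names are arbitrary.

```ruby
m = /(.+?)((?<!a))/.match("aabb")

# offset(i) returns [start, end] for group i; group 0 is the whole match.
offsets = (0..2).map { |i| m.offset(i) }

offsets.each_with_index do |(from, to), i|
  puts "#{i}: (#{from}-#{to})"
end
# 0: (0-3)
# 1: (0-3)
# 2: (3-3)
```

Group 2 is empty (3-3) because the lookbehind consumes no characters; it only tests the position reached by group 1.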

