For some reason, lookbehind and alternation seem not to be playing
together in a little Oniguruma test. This is based on the string
splitting thread from a little while ago this evening, and uses a CVS
1.9.0 Ruby acquired about 1/2 an hour ago.
str = %Q{abc def "ghi jkl" mno}
Look for "..." but just get the ... part:
re1 = /(?<=")[^"]+(?=")/
Test that:
p str.scan(re1) # => ["ghi jkl"]
Now, do the same thing or \S+. This should, I think,
pick up the abc, def, and mno substrings too.
re2 = /((?<=")[^"]+(?="))|(\S+)/
But it doesn’t; the part before the alternation never
matches, even though it did before (as shown by the
I know that’s all a bit cluttered, but the basic thing is that a
sub-pattern using lookbehind doesn’t seem to match any more when
there’s an alternation. Instead, only the second alternative ever
matches.
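For anyone who wants to reproduce this, here is a minimal sketch of the two tests above, assuming a Ruby whose regex engine supports lookbehind (1.9 or later). The variable names quoted and pairs are only for illustration and are not from the original test.

```ruby
str = %Q{abc def "ghi jkl" mno}

# Lookbehind/lookahead only: picks out the text between the quotes.
re1 = /(?<=")[^"]+(?=")/
# Same sub-pattern, now as the left-hand side of an alternation.
re2 = /((?<=")[^"]+(?="))|(\S+)/

quoted = str.scan(re1)
# With capture groups, scan returns one [group1, group2] pair per match.
pairs = str.scan(re2)

p quoted  # => ["ghi jkl"]
p pairs   # => [[nil, "abc"], [nil, "def"], [nil, "\"ghi"], [nil, "jkl\""], [nil, "mno"]]
```

The first slot of every pair is nil, which is the "only the second alternative ever matches" behaviour described above.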
If so, then I believe the problem is something to do with the fact that
lookaround is atomic, so that when used with capturing groups and
alternations you sometimes experience problems because the regex
immediately forgets the (zero-width, remember) lookaround match, so that
by the time it comes to that ‘or’ it doesn’t have the information to
compare.
Generally, there are restrictions with lookaround (esp lookbehind)
matching, and especially when matching regexps. So far my experiments
with Oniguruma suggest it’s fairly sophisticated in this respect,
supporting stuff like varying-width alternations, fixed repetition and
optional groups in lookbehind, but of course still no star and plus.
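A rough way to probe two of those claims, offered as a sketch rather than a statement about any particular Oniguruma build: what lookbehind accepts has varied between engine versions, so the second check records the result instead of assuming it. The sample string and the names digits and plus_rejected are made up for the example.

```ruby
# Alternatives of different widths at the top level of a lookbehind.
digits = "a1 bc2 d3".scan(/(?<=a|bc)\d/)
p digits # => ["1", "2"]

# An unbounded quantifier inside a lookbehind. The pattern is built from
# a string so that a rejection shows up as a rescuable RegexpError
# instead of stopping the whole script at parse time.
plus_rejected =
  begin
    Regexp.new('(?<=a+)\d')
    false
  rescue RegexpError
    true
  end
p plus_rejected
```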
Now, do the same thing or \S+. This should, I think,
pick up the abc, def, and mno substrings too.
re2 = /((?<=")[^"]+(?="))|(\S+)/
But it doesn’t; the part before the alternation never
matches,
It shouldn’t: pattern-matching goes left-to-right, so \S+ will match
the quote before the first half of the regexp gets a chance, since that
half wants to start at the first character after the quote.
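To make the left-to-right point concrete, here is a small sketch that records where each match starts and which group produced it. It assumes the same str and re2 as the original post; trace is a name invented for the example.

```ruby
str = %Q{abc def "ghi jkl" mno}
re2 = /((?<=")[^"]+(?="))|(\S+)/

trace = []
# Inside the block, $~ is the MatchData for the current match.
str.scan(re2) { trace << [$~.begin(0), $~[1], $~[2]] }

trace.each { |start, left, right| p [start, left, right] }
# [0, nil, "abc"]
# [4, nil, "def"]
# [8, nil, "\"ghi"]
# [13, nil, "jkl\""]
# [18, nil, "mno"]
```

At offset 8 the engine is sitting on the opening quote. The lookbehind fails there because the preceding character is a space, so \S+ takes the quote together with ghi, and the scan resumes at offset 12, past the one position (offset 9) where the left-hand side could have started.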
OK, I see. I was somehow discounting the fact that the first " itself doesn’t match the left-hand alternate.
On Sat, 07 Jan 2006 03:36:34 -0000, Ross B. wrote:
If so, then I believe the problem is something to do with the fact that
lookaround is atomic
Argh, I meant ‘if not’. I’m too tired now, I’ve spent too long
perfecting this quiz thing…
I was guessing it was dropping the match without the capture and so
always matching the alternate but that’s not right.
See Xavier’s post. My mistake was, essentially, expecting the first "
to “know” that it was supposed to match a zero-width condition
governing the state one character later. Instead, of course, it
asserts itself as a character in its own right; fails to match the
first alternate; and does match the second.
So [^\s"]+ is indeed probably the best thing. (Other than the
appropriate real libraries, of course.)
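A sketch of what that might look like, assuming [^\s"]+ is simply dropped in as the right-hand alternative of re2. The name re3 and the way the groups are collapsed are my own guesses, not something spelled out in the thread.

```ruby
str = %Q{abc def "ghi jkl" mno}

# The right-hand side can no longer swallow a quote, so the scan gets to
# the position just after the opening quote, where the lookbehind holds.
re3 = /((?<=")[^"]+(?="))|([^\s"]+)/

# Each match fills exactly one of the two groups; keep whichever is set.
tokens = str.scan(re3).map { |quoted, bare| quoted || bare }
p tokens # => ["abc", "def", "ghi jkl", "mno"]
```

This is only good for simple input. With two quoted strings on one line, the text between a closing quote and the next opening quote also satisfies the lookbehind and lookahead, so it would be picked up as if it were quoted.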
Oh Damn it, yeah I see now. Wish I’d held my tongue now