On Mon, 16 Apr 2007 19:55:05 +0900, Robert K.
[email protected] wrote:
Here’s an implementation of a Thompson NFA which does Perl-style
capture:
http://swtch.com/~rsc/regexp/nfa-perl.y.txt
Hmm… I was more after how you would notate the RX for a DFA based
engine to do the prioritization.
We note choice points in our tags, and also tag those subsequent
transitions which would disprove a particular choice (for instance, for
/^(?:(abc)|(a)).*$/, the second transition in “axy” disproves the first
alternative, and the third transition in “abc” disproves the second
alternative).
Referring to those tags as we encounter them during FA evaluation, we
can then accumulate sets of plausible captures and cull those which are
disproven as we go.
I do not know a DFA based engine that would actually allow to write a RX
pattern to do that.
A non-backtracking NFA simulation (i.e. Thompson NFA) operates by
constructing and evaluating the corresponding DFA on-the-fly (each
unique set of NFA states corresponds to a DFA state). If you memoize
the construction of DFA states, then you’ve got a straight DFA engine
(the DFA is constructed lazily). Some RX engines have historically
built the entire DFA up front, but as previously discussed, that
approach is wasteful (many of the DFA states so constructed may never
get visited).
The same techniques (using tagged transitions to drive other code as the
FA is evaluated) are applicable in all three cases (non-backtracking NFA
simulation, dynamic DFA construction, and static DFA construction).
Of course, when building engines manually you can do a lot - and
probably a lot more than what actual RX engines allow.
Right, so I thought it appropriate to give an example of an actual RX
engine which allowed the same (and more) as my hand-worked examples by
extending the same class of techniques I had illustrated so far.
But since I’m interested in using them vs. writing RX engines your
solution was not exactly what I expected. 
I’ll admit to being a little confused – you’re interested in using a RX
engine rather than implementing one, but you’d rather see a description
and/or hand-worked example of an implementation technique rather than a
link to a finished implementation using the same techniques under
discussion?
-mental