On Wed, May 23, 2007 at 01:00:04AM +0900, Hans F. wrote:
Well, that works for \w+ and \s+, but what if you want to match /01+0/?
You’d get a syntax error on 0111 even though it’s a valid partial match.
OK, I see the problem: it’s not detecting the end of the expression,
i.e. saying that this expression might match, but only if the right
characters were appended to the end of the source.
In the general case I think you’d have to turn each RE into one which
matches all possible prefixes, perhaps something like
/(0(1+(0)?)?)/ # (note *)
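To see what the prefix form accepts, here is a small Ruby sketch. The names PREFIX_RE and prefix_match are illustrative only. The expression is anchored with \A so it only looks at the front of the input. Group 1 holds the prefix consumed and group 3 the closing 0, so 0111 comes back as an incomplete prefix rather than a syntax error.

```ruby
# Prefix form of /01+0/, anchored at the start of the input.
# Group 1: the prefix consumed. Group 3: the closing 0, if present.
PREFIX_RE = /\A(0(1+(0)?)?)/

# Returns [consumed_text, complete?], or nil if the input does not even
# start with a prefix of a 01+0 token.
def prefix_match(s)
  m = PREFIX_RE.match(s)
  m && [m[1], !m[3].nil?]
end

%w[0 01 0111 01110 x].each { |s| p [s, prefix_match(s)] }
```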
However, if you can guarantee that no individual valid token is going to be
longer than a certain size (let’s say 200 characters), then it would be
simpler to ensure that you read ahead at least 200 characters into a buffer
and then match against that.
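A minimal sketch of that read-ahead approach, assuming every token is at most MAX_TOKEN characters long. MAX_TOKEN, TOKENS and each_token are hypothetical names, and the token table just uses the three expressions mentioned in this thread. The loop tops the buffer up before each match, so an anchored match never sees a token cut off by a chunk boundary (a token longer than MAX_TOKEN could still be split, which is exactly the assumption being made).

```ruby
require 'stringio'

MAX_TOKEN = 200 # assumed upper bound on any token's length

# Tried in order; each pattern is anchored at the start of the buffer.
TOKENS = [
  [:bits,  /\A01+0/],
  [:word,  /\A\w+/],
  [:space, /\A\s+/],
].freeze

# Yields (type, text) for each token read from io, pulling input in
# chunks so the buffer holds at least MAX_TOKEN chars until EOF.
def each_token(io, chunk = 64)
  return enum_for(:each_token, io, chunk) unless block_given?

  buf = +""
  eof = false
  loop do
    # Refill the read-ahead buffer before attempting a match.
    until eof || buf.size >= MAX_TOKEN
      data = io.read(chunk)
      if data then buf << data else eof = true end
    end
    break if buf.empty?

    match = nil
    type, _re = TOKENS.find { |_, re| match = re.match(buf) }
    raise ArgumentError, "syntax error at #{buf[0, 20].inspect}" unless type

    yield type, match[0]
    buf.slice!(0, match[0].size) # drop the consumed token
  end
end
```

It works with any IO-like object, e.g. `each_token(StringIO.new("01110 hello")).to_a`.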
Alternatively, perhaps only a few of your token REs have unlimited
length. Those you can code in the prefix form shown above. The
remainder (of fixed or limited length) can just be matched in the simple way
against a large enough read-ahead buffer.
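A rough sketch of that hybrid under made-up assumptions: the only unbounded token is 01+0, and the fixed tokens (`:=`, `;` and a single space) are at most MAX_FIXED characters long. next_token, FIXED and BITS_PREFIX are hypothetical names. The unbounded token uses the prefix form, accepting a partial match only when it reaches the end of the buffer (the check described in note (*)). The caller keeps appending input to the buffer for as long as it gets :need_more back.

```ruby
# Hypothetical token set: only 01+0 is unbounded; the fixed tokens
# are at most MAX_FIXED characters long.
MAX_FIXED = 2
FIXED = [
  [:assign, /\A:=/],
  [:semi,   /\A;/],
  [:space,  /\A /],
].freeze
BITS_PREFIX = /\A(0(1+(0)?)?)/ # prefix form of /01+0/

# Looks at the front of buf and returns [type, text], or :need_more if
# the caller should append more input first. eof is true once no more
# input can arrive.
def next_token(buf, eof)
  if (m = BITS_PREFIX.match(buf))
    return [:bits, m[1]] if m[3]                       # closing 0 seen
    return :need_more if !eof && m[1].size == buf.size # may still grow
    raise ArgumentError, "bad token #{m[1].inspect}"
  end
  # Fixed-length tokens only need a short, bounded read-ahead.
  return :need_more if !eof && buf.size < MAX_FIXED
  FIXED.each do |type, re|
    m = re.match(buf)
    return [type, m[0]] if m
  end
  raise ArgumentError, "syntax error at #{buf[0, 10].inspect}"
end
```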
(*) Hmm, this isn’t quite right, since it partially matches 011112 as
01111, even though the trailing 2 means it can never become a valid token.
You could check for a partial match (i.e. $3 is nil) and allow it only if
it consumes the whole string.
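In Ruby that check might look like this (classify is a hypothetical helper). Because the expression is anchored with \A, "consumes the whole string" simply means $1 is as long as the input.

```ruby
PREFIX_RE = /\A(0(1+(0)?)?)/

# The problem case: the prefix form happily matches the front of 011112.
p PREFIX_RE.match("011112")[1] # prints "01111"

# :complete if s begins with a whole 01+0 token, :partial if all of s is
# a proper prefix of one (more input needed), otherwise :error.
def classify(s)
  return :error unless PREFIX_RE =~ s
  return :complete if $3                # closing 0 present
  $1.size == s.size ? :partial : :error # $3 nil: accept only if all of s used
end
```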
Alternatively, the RE itself needs to say “must be followed by X or end of
string”. This works, but it’s a bit ugly:
I can’t think of a better formulation off the top of my head, though.
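One possible way to write such an expression in Ruby is sketched below. It is my own formulation, not necessarily the one referred to above, and BITS_RE and scan_bits are hypothetical names. Every optional tail is spelled out as "the next required character or \z", so a run of 1s may stop early only at the end of the string, and 011112 no longer matches at all.

```ruby
# After the 0 comes either \z or a run of 1s; after the 1s comes either
# the closing 0 (captured as group 1) or \z.
BITS_RE = /\A0(?:\z|1+(?:(0)|\z))/

# Returns [:complete, text], [:partial, text] or [:error, nil].
def scan_bits(s)
  m = BITS_RE.match(s)
  return [:error, nil] unless m
  [m[1] ? :complete : :partial, m[0]]
end

%w[0 0111 01110x 011112 0x].each { |s| p [s, scan_bits(s)] }
```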