I’ll start by confessing that this comes originally from something I
worked on in Perl, and I’ve assumed, rightly or wrongly, that regular
expressions are regular expressions are regular expressions.
See
http://www.ilovejackdaniels.com/cheat-sheets/regular-expressions-cheat-sheet/
The context is that there is a whole pile of patterns that must be
preceded by … well, not by word characters or by most punctuation. Call
what may precede them “sort-of zero width”. That is, white space,
beginning of line and some opening sequences, call them ‘[’ and ‘(’ and
‘{’ for the sake of the example, are allowed.
I’m trying to put the RE into a ‘constant’ so that I don’t have to keep
repeating it - all the DRY stuff about changes and so forth!
I’m trying to use the RE lookaround assertions (lookbehind for the start,
lookahead for the end).
This works in Perl:
$STARTWORD = qr/^|(?<=[\s\(\[\{])/m;
There is also the corresponding end word
$ENDWORD = qr/$|(?=[ \t\n\,\.\;\:\!\?\)])/om;
When I translate these into Ruby I get an error.
It doesn’t seem to like the lookbehind.
The error message is
SyntaxError undefined (?...) sequence: /^|(?<=[\s\(])/
Well, possibly. Or it may be that I’m having problems when combining
it with an actual pattern.
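One way to tell those two possibilities apart might be to compile the lookbehind on its own at run time. This is only a sketch: `lookbehind_supported?` is a name I made up, and I'm assuming that `Regexp.new` raises `RegexpError` for a construct the engine rejects, rather than stopping the whole file with a SyntaxError the way a literal does.

```ruby
# Probe whether this interpreter's regex engine accepts lookbehind.
# Regexp.new compiles the pattern at run time, so a rejected construct
# can be rescued instead of aborting the parse of the whole file.
def lookbehind_supported?   # hypothetical helper name
  Regexp.new('(?<=[\s(])x') # single quotes keep the backslash literal
  true
rescue RegexpError
  false
end

puts lookbehind_supported?
```

If this prints `false`, the engine itself lacks lookbehind and the way I combine the patterns is not to blame.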
What I’ve done is separate out the pattern to a constant (and tried to
eliminate things that might confuse the parser)
STARTWORD = %r{^|(?<=[\s(])}m
And LO! The parser chokes on that.
Does it choke because there isn’t actually a pattern being compared?
Well, maybe. If I remove the ‘%r{’ stuff the parser doesn’t choke.
But it doesn’t choke on
ENDWORD = %r{$|(?=[\s,.;:!?)])}m
And I seem to be getting confused when combining these with other
regular expressions because of this inconsistency.
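For what it's worth, here is roughly how I'd expect the combining to look if both constants compiled. It is a sketch that assumes a Ruby whose engine accepts lookbehind (1.9 or later, as far as I know); `bounded` and the sample text are invented for illustration. I've used `/.../` literals because an unbalanced `{` inside `%r{...}` would have to be escaped.

```ruby
# Assumption: the regex engine accepts (?<=...), as in Ruby 1.9 and later.
STARTWORD = /^|(?<=[\s(\[{])/
ENDWORD   = /$|(?=[\s,.;:!?)])/

# Hypothetical helper: put any inner pattern between the two boundaries.
# Interpolating a Regexp inserts it as a non-capturing (?-mix:...) group,
# so the alternation inside each constant stays contained.
def bounded(inner)
  /#{STARTWORD}#{inner}#{ENDWORD}/
end

text    = "say (foo), then bar; but not xfoo or foobar"
matches = text.scan(bounded(/foo|bar/))
p matches
```

Only the `foo` after `(` and the `bar` before `;` should match here; `xfoo` fails the start condition and `foobar` fails either the end or the start condition.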
Right now I don’t know if the problem is having the REs as constants.
Does this make them ‘precompiled’?
ENDWORD.type ==> “Regexp”
so I’m presuming it is. In which case why can’t I precompile STARTWORD?
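A quick way to inspect what the constant actually holds is sketched below. It uses `class`, since `type` is the older spelling of the same query. One thing I'm fairly sure of but mention with caution: in Ruby `^` and `$` already match at line boundaries, and `/m` instead makes `.` match newlines, so the flag may not mean what it does in Perl.

```ruby
ENDWORD = %r{$|(?=[\s,.;:!?)])}m

# The constant refers to a Regexp object built from the literal.
p ENDWORD.class

# The pattern text as the engine received it, without delimiters or flags.
p ENDWORD.source

# Check whether the /m option bit is set on the compiled object.
multiline = (ENDWORD.options & Regexp::MULTILINE) != 0
p multiline
```

Seeing a `Regexp` here only says the literal compiled; it doesn't by itself explain why the lookbehind version fails.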
So: Is it that Ruby can’t handle the ‘?<=’ lookbehind assertion … or
what? Am I completely hung up on a wrong track?
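If it does turn out that the engine has no lookbehind, one fallback I can think of is to consume the opening character in a capture group and put it back when substituting. This is a sketch of that swapped-in technique, not the lookbehind itself; `START_CAPTURE`, `END_LOOKAHEAD` and the sample text are made-up names and data.

```ruby
# Capture the allowed prefix (group 1) instead of asserting it.
START_CAPTURE = /(^|[\s(\[{])/          # hypothetical name
# Lookahead is kept for the end, since that form was accepted.
END_LOOKAHEAD = /(?=$|[\s,.;:!?)])/     # hypothetical name

# Group 1 is the prefix, group 2 is the word itself.
word = /#{START_CAPTURE}(foo)#{END_LOOKAHEAD}/

text   = "a (foo) b xfoo"
# Re-emit the captured prefix so the substitution does not eat it.
result = text.gsub(word) { "#{$1}<#{$2}>" }
puts result
```

The cost is that every replacement has to remember to write `$1` back, which is exactly the sort of repetition I was hoping the constant would avoid.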