“basi” email@example.com writes:
Just having an outstandingly hard time to even come close to being able
to translate the following string sequences in one or a series of
regular expressions. These are allowable prefix combinations in a
language I’m doing some text analysis on.
i pa(ki) pag
The first one allows the following choices:
, pag, ipa, ipag, ipapag, papag, ipaki, ipakipag, paki, pakipag
You need to use alternatives to handle the nonempty constraint.
For instance, the basic structure for the first sequence is this:
/^(strings with i|strings with pa(ki)|strings with pag)$/
For instance, this works:
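A minimal sketch of a pattern with that three-alternative shape, written in Python's `re` syntax. The exact expression is an assumption, not necessarily the one originally posted, and the names `VERBOSE_PREFIX` and `is_prefix` are hypothetical. Each alternative makes one of the three pieces mandatory and leaves the other two optional:

```python
import re

# Assumed pattern: each alternative forces one piece to be present
# (i, then pa(ki), then pag) and keeps the other two optional.
VERBOSE_PREFIX = re.compile(
    r"^(i(pa(ki)?)?(pag)?"      # strings with i
    r"|i?pa(ki)?(pag)?"         # strings with pa or paki
    r"|i?(pa(ki)?)?pag)$"       # strings with pag
)

def is_prefix(s: str) -> bool:
    """Return True if s is a nonempty allowed prefix combination."""
    return VERBOSE_PREFIX.search(s) is not None
```

Because every alternative contains at least one mandatory piece, the empty string cannot match.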
But it’s redundant, since several strings will match more than one of the
alternatives. For instance, the first alt takes care of all the strings with
i, so there’s no need for i? in the other two parts. Similarly, the first and
second parts together handle all the strings with pa (with or without i-,
with or without -ki), so there’s no need to include them in the third part.
This matches all the valid strings, and I haven’t found an invalid string
that it matches - but it’s almost 1 AM, so I could be missing something.
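One way to settle the doubt is a brute-force check. The sketch below assumes a reduced pattern that follows the simplification described above (drop i? from the second and third alternatives, and the optional pa(ki) from the third). The names `REDUCED_PREFIX`, `VALID` and `mismatches` are hypothetical. It builds the valid set from the three optional pieces and compares it with the regex on every string of up to 8 letters, the length of the longest valid prefix:

```python
import re
from itertools import product

# Reduced pattern (an assumed reconstruction): i? is dropped from the
# second and third alternatives, and the optional pa(ki) from the third.
REDUCED_PREFIX = re.compile(r"^(i(pa(ki)?)?(pag)?|pa(ki)?(pag)?|pag)$")

# Every combination of the optional pieces, minus the empty string.
VALID = {
    a + b + c
    for a, b, c in product(["", "i"], ["", "pa", "paki"], ["", "pag"])
} - {""}

def mismatches(max_len: int = 8, alphabet: str = "ipakg") -> list[str]:
    """List strings on which the regex and the VALID set disagree."""
    bad = []
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if (REDUCED_PREFIX.fullmatch(s) is not None) != (s in VALID):
                bad.append(s)
    return bad

print(len(VALID))     # 11 allowed prefix combinations
print(mismatches())   # an empty list means the pattern is exact here
```

The check only covers the letters that occur in the prefixes. Strings containing any other character cannot match, since the pattern has only literal pieces.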
If you need to worry about the difference between lines and strings, you can
use \A and \Z instead of ^ and $. It may be more efficient to use
non-capturing parens (?:…) instead of the plain ones, but I think it
makes it harder to read and type.
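Both points can be sketched in Python, again with an assumed pattern and hypothetical names. One caveat: Python's `\Z` matches only at the very end of the string, whereas in Perl-style engines `\Z` also allows a single trailing newline.

```python
import re

# Non-capturing version of the assumed reduced pattern.
BODY = r"i(?:pa(?:ki)?)?(?:pag)?|pa(?:ki)?(?:pag)?|pag"

# ^ and $ with re.MULTILINE anchor at line boundaries ...
LINE_ANCHORED = re.compile(r"^(?:" + BODY + r")$", re.MULTILINE)
# ... while \A and \Z anchor at the boundaries of the whole string.
STRING_ANCHORED = re.compile(r"\A(?:" + BODY + r")\Z")

text = "xyz\nipag\nfoo"
print(LINE_ANCHORED.search(text) is not None)    # True: the middle line matches
print(STRING_ANCHORED.search(text) is not None)  # False: the whole string does not

# Plain parentheses create capture groups; (?:...) creates none.
CAPTURING = re.compile(r"^(i(pa(ki)?)?(pag)?|pa(ki)?(pag)?|pag)$")
print(CAPTURING.groups, STRING_ANCHORED.groups)  # 6 0
```

Whether the non-capturing form is measurably faster depends on the engine and the input, so it is worth timing before giving up the readability.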