Peter S. wrote:
regexes. Is there a way around this (without going down to 2000 words)?
Thanks for any hint
You could optimize the regex a little for size, e.g. by factoring out
Thought of that.
Good for you.
Yes, that was my next thought, but where to split? Just count the bytes
and split near 1 << 16?
Probably better not to construct the mega-regex in the first place. For
the record, finding yourself on the edges of the language’s capacity
like this might be a sign that refactoring is in order.
Even if you stick with your current technique and work around the size
limitation, it’s probably better not to build a megex™ that you’ll have
to split up afterwards. As you put the pattern together, only add
alternations as long as the cumulative size stays < 0x10000 (or a
well-commented static constant with that value), and start a new pattern
once the next alternation would push it over.
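
For what it's worth, here is a rough sketch of that incremental approach in Ruby. The helper names (build_chunked_regexes, match_any?) and the MAX_PATTERN_BYTES constant are made up for illustration. I'm also assuming the limit applies to the byte size of the pattern source, which may not be exactly how the engine counts, so treat the number as a tunable:

```ruby
# Assumed limit, taken from the figure mentioned in this thread.
# Adjust it for whatever your Ruby version actually enforces.
MAX_PATTERN_BYTES = 0x10000

# Hypothetical helper: turns a word list into several Regexp objects.
# Each pattern source stays under max_bytes, unless a single word is
# already longer than that on its own.
def build_chunked_regexes(words, max_bytes = MAX_PATTERN_BYTES)
  regexes = []
  current = []   # escaped alternatives for the pattern being built
  size = 0       # byte size of current.join("|")

  words.each do |word|
    escaped = Regexp.escape(word)
    # Every alternative after the first also costs one "|" separator.
    added = escaped.bytesize + (current.empty? ? 0 : 1)

    # Close the current pattern before it would reach the limit.
    if !current.empty? && size + added >= max_bytes
      regexes << Regexp.new(current.join("|"))
      current = []
      size = 0
      added = escaped.bytesize
    end

    current << escaped
    size += added
  end

  regexes << Regexp.new(current.join("|")) unless current.empty?
  regexes
end

# A text matches the word list if any one of the chunks matches.
def match_any?(regexes, text)
  regexes.any? { |re| re =~ text }
end
```

Usage would be something like regexes = build_chunked_regexes(word_list) once up front, then match_any?(regexes, line) per input line. The trade-off is one match attempt per chunk instead of a single pass.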
Why is there a limitation at all? I implemented the same thing in Perl
and it doesn’t complain …
Is Perl’s regexp engine that much better?
As Friedl notes, Perl is darned close to being the ideal regex language.
Ruby regexes aren’t necessarily meant to be the one hammer that can
drive every nail. If you want to be able to view every problem through
a regex lens, you’ll probably have to dig a little deeper than
categorizing one language’s engine as simply “better” than another.
Big, static sizes like 2**16 are often used to avoid dynamic allocation,
or otherwise improve runtime efficiency.
Also for the record: I’m a big fan of regexes. Though a lot of people
complain about complexity or efficiency issues, I’ve never had a problem
with either. I would be interested to see a comparison of the relative
merits and limitations of various engines, e.g. regex lengths,
benchmarks, big-O complexity, and ability to handle null bytes.