Slow regular expressions :(

If regexps matter that much for a particular case, then using perl is
a valid suggestion, as it is rightly seen as pretty much the best
regexp engine so far (Oniguruma [1] has made some waves and may be
able to best perl in the end, but it’s not widely available and is
only an extension for now; perhaps it will be the built-in regexp
engine for Ruby2/Rite, who knows or dares to dream).

So I don’t find it unreasonable to say to people who need to do a lot
of regexp work - “use perl”, it was practically designed for it. But
most of the time I find using regexp complicates things unnecessarily
[2].

Most of the time Ruby is good enough. If you are generating regexps
automatically, perhaps perl is the better choice, as it will prevent
this exponential slowdown for you. To say that Ruby must be ‘fixed’
to allow you to write crappy regexps and have the engine take care of
it for you is, in my opinion, not the answer. If you’re going to use a
regexp, you should be aware of what you are doing. If for some
reason you generate a load of them automatically, then I think that
there could possibly be a better way of achieving your goals in Ruby
(perhaps a DSL instead of treating everything as raw text?). And
finally, Ruby might not be the right language to choose, and you’d be
better off throwing your funky generated regexps at perl to handle.
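
For anyone who has not run into the slowdown being discussed, here is a minimal sketch of it. The two patterns and the helper name `time_match` are made up for illustration and do not come from the thread. Timings depend heavily on the Ruby version (newer releases optimise some of these cases), so no numbers are claimed.

```ruby
require 'benchmark'

# Nested quantifiers: on a failing input, a backtracking engine may try
# many ways of splitting the run of "a"s between the inner and outer "+".
NESTED = /\A(a+)+\z/

# Accepts exactly the same strings, but leaves only one way to consume
# a run of "a"s, so there is nothing to backtrack over.
FLAT = /\Aa+\z/

# Hypothetical helper: seconds taken to test +input+ against +regex+.
def time_match(regex, input)
  Benchmark.realtime { regex.match(input) }
end

# Almost matches, then fails on the very last character.
input = ("a" * 20) + "b"

puts format("nested: %.6fs", time_match(NESTED, input))
puts format("flat:   %.6fs", time_match(FLAT, input))
```

The point of the comparison is that both patterns describe the same language, so rewriting the first into the second changes only how much work the engine can be forced to do.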

All this may change as soon as Ruby2/Rite appears with Oniguruma, but
for now Ruby isn’t the best tool for every possible problem - and if
it’s a major worry for you, perhaps lend a hand with Onig… or
indeed write a perl-alike regexp engine extension - competition is
good :)

Kev

[1] http://www.geocities.jp/kosako3/oniguruma/
[2] Some people, when confronted with a problem, think “I know, I’ll
use regular expressions.” Now they have two problems. - Jamie Zawinski,
in comp.lang.emacs

sender: “Michael W. Ryder” date: “Fri, Jul 28, 2006 at 07:30:11PM +0900” <<<EOQ
What’s wrong with people wanting to improve a language?
Nothing. In this case however we are not talking about improving,
but about making the regexp engine to be more forgiving, so that
we can be less careful when crafting a regex.
I wouldn’t prefer Ruby to think for me, and I very much like when
it says: " - You moron, look what you did: I’m gonna stay stuck on
this regex for 56783.99 years, and all because of you!! Can you
even wait that long?!?" so that slowly I become less of a moron…
I would definitely not like it to say: " - All is good, feed me crap,
I can take it. I have a good stomach.", and let me go away with it.
But maybe that’s just me…

I’m not against evolution, I just see it differently.

Alex

Alexandru E. Ungur wrote:

sender: “Michael W. Ryder” date: “Fri, Jul 28, 2006 at 07:30:11PM +0900” <<<EOQ
What’s wrong with people wanting to improve a language?
Nothing. In this case however we are not talking about improving,
but about making the regexp engine to be more forgiving, so that
we can be less careful when crafting a regex.
I wouldn’t prefer Ruby to think for me, and I very much like when
it says: " - You moron, look what you did: I’m gonna stay stuck on
this regex for 56783.99 years, and all because of you!! Can you
even wait that long?!?" so that slowly I become less of a moron…
I would definitely not like it to say: " - All is good, feed me crap,
I can take it. I have a good stomach.", and let me go away with it.
But maybe that’s just me…

First, you are misunderstanding the extent of the problem if you think
this is just about RegExps that are not “carefully crafted”.
As I have pointed out, with very complex expressions or expressions that
get constructed automatically (which happens quite often) it is nearly
impossible to avoid this, even if you are an expert with RegExps.
Why would you prefer an expression engine that stupidly does the wrong
thing?

Second, the whole purpose of programming languages is to think for
humans when the kind of thinking required is not actually helping to
solve a problem but is rather a technicality. Your argument would rule
out optimizing compilers and a lot of other things where a language (or
its compiler/interpreter) is indeed doing a lot of thinking for humans.
We are talking about Ruby here - a high-level language with a design
that makes it easy to learn and use. Not assembler or C.

Lastly, what you want Ruby to do here is even further outside the scope
of a RegExp engine than simply using optimization tricks to speed
up some pathological cases: if you want it to tell you in advance
that it is going to get into exponential processing for a certain case,
you are wishing for the impossible. And if you want it to teach you
how to write better expressions, you are asking for something nearly
as complicated.
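
Since the engine cannot be expected to warn in advance, one practical fallback is to put a deadline on the match instead. The sketch below is only an illustration: `match_with_deadline` is a hypothetical helper, and it assumes the interpreter can interrupt a regexp match that is in progress, which may not hold for every Ruby version or implementation. Recent Ruby releases also offer a built-in `Regexp.timeout` setting, which is not used here.

```ruby
require 'timeout'

# Hypothetical helper: run a match, but give up after +seconds+.
# Returns the MatchData (or nil when there is no match), or the symbol
# :timed_out when the deadline passes first.
# Assumption: the interpreter can interrupt a match that is in progress.
def match_with_deadline(regex, input, seconds)
  Timeout.timeout(seconds) { regex.match(input) }
rescue Timeout::Error
  :timed_out
end

result = match_with_deadline(/\A(\w+)@(\w+)\z/, "kev@example", 1.0)
puts result == :timed_out ? "gave up" : result.inspect
```

This does not make a bad expression any faster, it only turns a silent hang into something the program can report.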

On 7/28/06, Alexandru E. Ungur wrote:

I wouldn’t prefer Ruby to think for me, and I very much like when
it says: " - You moron, look what you did: I’m gonna stay stuck on
this regex for 56783.99 years, and all because of you!! Can you
even wait that long?!?" so that slowly I become less of a moron…
I would definitely not like it to say: " - All is good, feed me crap,
I can take it. I have a good stomach.", and let me go away with it.
But maybe that’s just me…

I don’t talk to my programming language nearly as much, though perhaps
I should :). But I agree with this. I’ve had regexes that degenerated
into such pathological cases once or twice and I’ve just fixed them.
Fixing ruby would hide the bug.

That being said this is just one more reason I’m afraid of regexes.
When they grow beyond very short I start to get nervous. That’s
probably how AI will develop and enslave us all, by some random noise
being interpreted as a regex… Which is one more reason not to fix
this, to slow down our evil overlords. :)

Pedro.

On Sat, 29 Jul 2006 00:14:22 +0900, Roman H. wrote:

First, you are misunderstanding the extent of the problem if you think this is just about RegExps that are not "carefully crafted". As I have pointed out, with very complex expressions or expressions that get constructed automatically (which happens quite often) it is nearly impossible to avoid this, even if you are an expert with RegExps. Why would you prefer an expression engine that stupidly does *the wrong* thing?

If you really need to create such complicated regexps, then maybe
regexps aren’t the right tool. You would be better off using a real
parser or parser generator.
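
As a rough sketch of what that can look like in Ruby, here is a tiny tokenizer built on `StringScanner` from the standard library. The arithmetic mini-language and the method name `tokenize` are assumptions made up for this example. Each step uses a small pattern tried at the current position, so no single regexp has to describe the whole input.

```ruby
require 'strscan'

# Hypothetical tokenizer for a tiny arithmetic language (illustration only).
# Returns an array of [type, value] pairs.
def tokenize(source)
  scanner = StringScanner.new(source)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next                                  # ignore whitespace
    elsif (text = scanner.scan(/\d+/))
      tokens << [:number, text.to_i]        # integer literal
    elsif (text = scanner.scan(%r{[-+*/()]}))
      tokens << [:symbol, text]             # operator or parenthesis
    else
      raise ArgumentError, "unexpected character at offset #{scanner.pos}"
    end
  end
  tokens
end

p tokenize("12 * (3 + 4)")
```

A real parser would then consume these tokens with ordinary Ruby code, which is also where nesting is easier to handle than in a single expression.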

Kristof

Alexandru E. Ungur wrote:

I can take it. I have a good stomach.", and let me go away with it.
But maybe that’s just me…

I’m not against evolution, I just see it differently.

Alex

So you are saying that Ruby should only be used by those who can craft a
perfect Regexp, all others need not bother? Personally, I choose a
language by how well it will do the job, not how arcane it is. Maybe
you would be happier with APL where you can write an entire program in a
single line. Very few other people will ever be able to read it, much
less maintain it.
Computer Languages are supposed to make it easier to accomplish a
purpose, not force you to think about the details. If I wanted to have
to spend hours fine tuning a single line of code so that it would run in
a reasonable amount of time I would use Assembly. People should not
have to spend a large amount of time trying to learn “features” in a
language to become productive. They should be able to create a
“reasonable” program quickly and then spend time later, if they have it
which most of us don’t, to make it better.

On Sat, Jul 29, 2006 at 04:35:05AM +0900, Michael W. Ryder wrote:

So you are saying that Ruby should only be used by those who can craft a
perfect Regexp, all others need not bother? Personally, I choose a
language by how well it will do the job, not how arcane it is. Maybe
you would be happier with APL where you can write an entire program in a
single line. Very few other people will ever be able to read it, much
less maintain it.

Well, that’s not exactly fair to languages in which you can write the
entire (presumably nontrivial) program on a single line. For instance,
Logo allows you to write an entire (nontrivial) program on a single
line, and it’s eminently readable – at least in the same class as Ruby.

Don’t allow this digression to distract you from the point you’re
making, though.

“Alexandru E. Ungur” wrote:

sender: “Michael W. Ryder” date: “Fri, Jul 28, 2006 at 07:30:11PM +0900” <<<EOQ
What’s wrong with people wanting to improve a language?

Nothing. In this case however we are not talking about improving,
but about making the regexp engine to be more forgiving, so that
we can be less careful when crafting a regex.

So, you’re against “defensive programming”? Routines that check their
input for bad values are not an improvement over their earlier
counterparts that would just crash, because forgiveness won’t teach you
a lesson?
Are you against “memory protection” because a program should crash and
take the whole system down with it instead of being forgiving of
people’s mistakes?
Are you against “microkernels” because drivers should crash and halt the
system, and automatically restarting such sub-systems is just too
forgiving and would hide the problem?

I wouldn’t prefer Ruby to think for me, and I very much like when
it says: " - You moron, look what you did: I’m gonna stay stuck on
this regex for 56783.99 years, and all because of you!! Can you
even wait that long?!?" so that slowly I become less of a moron…
I would definitely not like it to say: " - All is good, feed me crap,
I can take it. I have a good stomach.", and let me go away with it.
But maybe that’s just me…

I think you misunderstand the purpose of programming. Programming
languages are not a yardstick to measure the length of your member. This
is not a contest and it’s not a tutorial on regular expressions and how
they work. Why don’t you want Ruby to think for you? That’s the role of
a computer… to do work for you! I wish I could just talk to Hal, my
computer, describe to it what work I want done and then have it do that
work! But no, computers are still primitive, and we must do most of the
thinking for them. However, we do try to get them to do as much work as
possible, including catching our mistakes and optimising our code!
Believe me, removing an exponential search pattern from my regular
expression counts as an optimisation!
Ruby halting is not Ruby saying “You moron, look what you did.” That’s
Ruby sulking in a corner, not doing the work I (thought I) asked it to,
and not telling me what’s wrong.

I’m not against evolution, I just see it differently.

I don’t think you see it at all. Pretend, just for a moment, that you
don’t care if you’re “right” or “wrong” and think about both sides of
the issue. Think about the purpose of computers and think about the pros
and cons…

A little off topic, but here’s another property of Ruby I’d like to
change. I didn’t realize I wanted this until just the other day, so it
remains interesting to me…

array.each { |i| new_var = i if i.some_test }
a_method.do_thing new_var # variables leak out of blocks!
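
For context, in Ruby as it stands a variable first assigned inside a block is local to that block, so the line using `new_var` after the block fails with a NameError. Below is a minimal sketch of two common ways to get the value out today. The method names and the `even?` test are placeholders standing in for `some_test`, not anything from the post.

```ruby
# Option 1: declare the variable before the block. The block then assigns
# to the outer variable instead of creating a block-local one.
def last_even(numbers)
  found = nil
  numbers.each { |n| found = n if n.even? }
  found
end

# Option 2: let an Enumerable method return the value directly.
# Note this returns the first match, while option 1 keeps the last one.
def first_even(numbers)
  numbers.find(&:even?)
end

numbers = [3, 8, 5, 12]
puts last_even(numbers)   # 12
puts first_even(numbers)  # 8
```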

Just Another Victim of the Ambient M. wrote:

Personally, I don’t use Ruby to write fast programs, I use it to
write (correct) programs fast.

That’s a comment worth highlighting, and keeping.
Thanks!