Regexp and string halts ruby

I’ve tried the following regexp/string match:

/^((?:\s*.+[;=][^,]*,?)+)$/ =~ " pid=934, time=478611,
command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U
/home1/eclprp/logs/tx/ULOG -m 0 – -i 10 -t 600 -q AW02,AW02S TMQFORWARD
-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m
0 – -i 10 -t 600 -q AW02, "

On a Debian/Linux system, with ruby version 1.8.4(-7sarge5) and even on
the more recent 1.8.5(-4) ruby enters an infinite active loop, and never
returns.

Can anyone confirm/deny this possible bug on different architectures?
Thanks.

On Fri, 26 Jan 2007, Ricardo R. wrote:

returns.

Can anyone confirm/deny this possible bug on different architectures?
Thanks.

it’s just a bad regex.

harp:~ > cat a.rb
#! /usr/bin/env ruby

re = /^((?:\s*[^,;=]+[;=][^,;=]+,?)+)$/

s = " pid=934, time=478611, command=TMQFORWARD, args=-C dom=ccaApp -g
650 -i
4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m 0 – -i 10 -t 600 -q
AW02,AW02S TMQFORWARD -C dom=ccaApp -g 650 -i 4278 -u pthp68 -U
/home1/eclprp/logs/tx/ULOG -m 0 – -i 10 -t 600 -q AW02, "

m = re.match s

puts m.to_a.first

harp:~ > time ruby a.rb
pid=934, time=478611, command=TMQFORWARD, args=-C dom=ccaApp -g 650
-i
4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m 0 – -i 10 -t 600 -q
AW02,AW02S TMQFORWARD -C dom=ccaApp -g 650 -i 4278 -u pthp68 -U

real 0m0.008s
user 0m0.010s
sys 0m0.000s

google ‘exponential backtracking’

regards.

-a

Ricardo R. schrieb:

returns.

Can anyone confirm/deny this possible bug on different architectures?
Thanks.

There is no infinite loop - it it takes a long time, because your
regular
expression is possibly too complex. These “(……+……)+”
constructs are
dangerous. See “Mastering Regular Expressions - Friedl” for details.

An example with your regular expression and your string, starting with a
short
part an taken longer ones each step. The whole string may take some
hundred
years on my small machine :wink:

Code >>>>>

regex = /^((?:\s*.+[;=][^,]*,?)+)$/

string = " pid=934, time=478611,command=TMQFORWARD, args=-C dom=ccaApp
-g 650 -i
4278 -u pthp68, "
t = Time.new
string.match(regex)
puts “Match need appr. #{Time.now-t} seconds”
STDOUT.flush

string = " pid=934, time=478611,command=TMQFORWARD, args=-C dom=ccaApp
-g 650 -i
4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG, "
t = Time.new
string.match(regex)
puts “Match need appr. #{Time.now-t} seconds”
STDOUT.flush

string = " pid=934, time=478611,command=TMQFORWARD, args=-C dom=ccaApp
-g 650 -i
4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m 0 – -i 10 -t 600 -q
AW02,AW02S
TMQFORWARD, "
t = Time.new
string.match(regex)
puts “Match need appr. #{Time.now-t} seconds”

Output >>>>>

Match need appr. 7.0 seconds
Match need appr. 21.38 seconds
Match need appr. 60.658 seconds

EoE >>>>>

Wolfgang Nádasi-Donner

On 25.01.2007 19:20, Ricardo R. wrote:

returns.
Are you sure that it enters an infinite loop? I see you nest two star
operators and a “.+” inside a + operator - this calls for trouble (i.e.
exponential run time).

I don’t know what you are actually trying to achieve with your regexp
but you should consider changing it. What strikes me odd:

  • you have a group that covers the whole RX, this is superfluous as you
    can get the same more easily

  • You are grouping but not extracting, maybe you can show more code and
    more of the text you match against (if this is just a small part).

Kind regards

robert

Ricardo R. [email protected] writes:

/^((?:\s*.+[;=][^,]*,?)+)$/ =~ " pid=934, time=478611,
command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U
/home1/eclprp/logs/tx/ULOG -m 0 – -i 10 -t 600 -q AW02,AW02S TMQFORWARD
-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m
0 – -i 10 -t 600 -q AW02, "

It is possible to write a regular expression that takes exponential
time to confirm that there’s no match. This is not really anything
too new. You can do similar things to Python and to Perl too,
although with Perl you have to try significantly harder since 5.6.
(There’s some code in there that detects certain exponential-time
matching patterns and adjusts behavior)

You can simplify your example to:

/(?:.+=[^,]*,?)+$/ =~ “x=x,x” * 12

Which will complete on my machine, albeit very slowly. Increasing the
number much more will get you what appears to be a hang.

Note that if the regular expression you gave were to match the string,
then this regular expression should also match:

/(?:\s*.+[;=][^,]*,?)$/

Consider this a “prerequisite” to your match - if this doesn’t match,
then there’s no way your full expression could match your input. In
your code, one thing you could do is check against this short
expression first, and only if that matches then check against the long
expression.

However, it may be much more productive simply to rewrite your
expression as:

/^((?:\s*.+[;=][^,],)(?:\s*.+[;=][^,]*,?))$/

I’ve expanded out your (?:blah)+ into a (?:blah)*(?:blah), and (most
importantly!) I’ve removed the “?” after the “,” in the first clause,
on the assumption that the comma is optional only on the last piece in
the line. With that change, ruby no longer goes nuts trying to match
this regular expression.

On Jan 25, 12:20 pm, Ricardo R. [email protected]
wrote:

returns.

Can anyone confirm/deny this possible bug on different architectures?
Thanks.


Posted viahttp://www.ruby-forum.com/.

Perhaps you should have done it this way:

s = " pid=934, time=478611,
command=TMQFORWARD, args=-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U
/home1/eclprp/logs/tx/ULOG -m 0 – -i 10 -t 600 -q AW02,AW02S
TMQFORWARD
-C dom=ccaApp -g 650 -i 4278 -u pthp68 -U /home1/eclprp/logs/tx/ULOG -m
0 – -i 10 -t 600 -q AW02, "

re = /[^\s;=]+[;=][^;=,\s]+/

p s.scan( re )

— output -----
[“pid=934”, “time=478611”, “command=TMQFORWARD”,
“args=-C”, “dom=ccaApp”, “dom=ccaApp”]

On Jan 26, 2007, at 3:58 AM, Ricardo R. wrote:

I think i’ll go and get Friedl’s Mastering Regular Expressions - i’m
curious and i like O’Reilly’s books anyway… :slight_smile:

You wont be disappointed. It is a tome of black magic. :wink:

James Edward G. II

Thanks guys, i guess that was it, the regexp was too complex. Daniel
Martin’s simplification (?:blah,?)+ into a (?:blah,)*(?:blah,?) really
did the trick! What (mis)led me to believe it was a bug was the fact
that i tried the original regexp/string in perl, and it returned rather
quickly. I didn’t know that, also according to Martin, perl has code in
that detects certain exponential-time
matching patterns and adjusts behavior.

I think i’ll go and get Friedl’s Mastering Regular Expressions - i’m
curious and i like O’Reilly’s books anyway… :slight_smile:

Best regards,
Ricardo R.

This forum is not affiliated to the Ruby language, Ruby on Rails framework, nor any Ruby applications discussed here.

| Privacy Policy | Terms of Service | Remote Ruby Jobs