Forum: Ruby String.scan hangs in some cases?

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- und Ruby-related community platforms.
Bb4bdf2b184027bc38d4fb529770cde5?d=identicon&s=25 Wes Gamble (weyus)
on 2006-05-24 02:29
Has anyone ever see a call to String.scan (the "regular" form, not the
block form) just hang when processing a string?

If so, any insight into how that might happen?

Thanks,
Wes
Aee77dba395ece0a04c688b05b07cd63?d=identicon&s=25 Daniel Berger (Guest)
on 2006-05-24 02:43
(Received via mailing list)
Wes Gamble wrote:
> Has anyone ever see a call to String.scan (the "regular" form, not the
> block form) just hang when processing a string?
>
> If so, any insight into how that might happen?
>
> Thanks,
> Wes
>

Got an example you can share that demonstrates the problem?

Maybe the string is huge?  Maybe there's a recursive lookup happening of
some kind?  Dunno without more detail.

Regards,

Dan
4feed660d3728526797edeb4f0467384?d=identicon&s=25 Bill Kelly (Guest)
on 2006-05-24 02:59
(Received via mailing list)
From: "Daniel Berger" <djberg96@gmail.com>
> Got an example you can share that demonstrates the problem?
>
> Maybe the string is huge?  Maybe there's a recursive lookup happening of
> some kind?  Dunno without more detail.

I'm guessing exponential backtracking when the expression doesn't match.

For ex:

x = "x" * 1000
x << "ab"
x =~ /x*x*a/       # matches quickly
x =~ /x*x*b/       # "hangs" trying exponential combinations with
backtracking


Regards,

Bill
E0d864d9677f3c1482a20152b7cac0e2?d=identicon&s=25 Robert Klemme (Guest)
on 2006-05-24 09:12
(Received via mailing list)
2006/5/24, Bill Kelly <billk@cts.com>:
> >
> x << "ab"
> x =~ /x*x*a/       # matches quickly
> x =~ /x*x*b/       # "hangs" trying exponential combinations with backtracking

Yes, that's a property of NFA based regular expression engines.  Some
patterns have awful runtime characteristics because of the
backtracking.  Without looking too close at your patter I guess in
this case it's caused by having "x*" twice in your RX - not very
useful anyway.  If you are interested in more detail I recommend
reading "Mastering Regular Expressions":

http://www.amazon.com/gp/product/0596002890/

Kind regards

robert
4feed660d3728526797edeb4f0467384?d=identicon&s=25 Bill Kelly (Guest)
on 2006-05-24 10:16
(Received via mailing list)
From: "Robert Klemme" <shortcutter@googlemail.com>
>
> Yes, that's a property of NFA based regular expression engines.  Some
> patterns have awful runtime characteristics because of the
> backtracking.  Without looking too close at your patter I guess in
> this case it's caused by having "x*" twice in your RX - not very
> useful anyway.

Thanks :)  We haven't seen the OP's regexp yet; I was just
offering the above as an example of what can cause the
hanging he described.


Regards,

Bill
Bb4bdf2b184027bc38d4fb529770cde5?d=identicon&s=25 Wes Gamble (weyus)
on 2006-05-24 17:03
Bill Kelly wrote:
> From: "Robert Klemme" <shortcutter@googlemail.com>
>>
>> Yes, that's a property of NFA based regular expression engines.  Some
>> patterns have awful runtime characteristics because of the
>> backtracking.  Without looking too close at your patter I guess in
>> this case it's caused by having "x*" twice in your RX - not very
>> useful anyway.
>
> Thanks :)  We haven't seen the OP's regexp yet; I was just
> offering the above as an example of what can cause the
> hanging he described.
>
>
> Regards,
>
> Bill

All,

Thanks for giving me some idea of what might be wrong.  For now, I've
gone with a simpler regex that I think will work fine.  My original
pattern was attempting to allow several portions of the pattern to be
optional and the more that I think about it, the more it makes sense
that this might occur.

I will keep (?>pattern) in mind as a way to inhibit backtracking.

Thanks,
Wes
Bb4bdf2b184027bc38d4fb529770cde5?d=identicon&s=25 Wes Gamble (weyus)
on 2006-05-24 17:11
Just for kicks, here was the original regex - attempting to match
against a string containing an entire HTML page OR a style sheet.

/(?=<style)*?background.*?(?=-image)*?:.*?url\((.*?)\).*?(?=<\/style>)*?/

I was attempting to make the <style  and </style> parts optional.

As it turns out, there are so many different ways that a style can be
specified in a HTML page, that I just went with the much simpler:

/background.*?:.*?url\((.*?)\)/i

Wes
This topic is locked and can not be replied to.