String.scan hangs in some cases?


#1

Has anyone ever see a call to String.scan (the “regular” form, not the
block form) just hang when processing a string?

If so, any insight into how that might happen?

Thanks,
Wes


#2

Wes G. wrote:

Has anyone ever see a call to String.scan (the “regular” form, not the
block form) just hang when processing a string?

If so, any insight into how that might happen?

Thanks,
Wes

Got an example you can share that demonstrates the problem?

Maybe the string is huge? Maybe there’s a recursive lookup happening of
some kind? Dunno without more detail.

Regards,

Dan


#3

From: “Daniel B.” removed_email_address@domain.invalid

Got an example you can share that demonstrates the problem?

Maybe the string is huge? Maybe there’s a recursive lookup happening of
some kind? Dunno without more detail.

I’m guessing exponential backtracking when the expression doesn’t match.

For ex:

x = “x” * 1000
x << “ab”
x =~ /xxa/ # matches quickly
x =~ /xxb/ # “hangs” trying exponential combinations with
backtracking

Regards,

Bill


#4

From: “Robert K.” removed_email_address@domain.invalid

Yes, that’s a property of NFA based regular expression engines. Some
patterns have awful runtime characteristics because of the
backtracking. Without looking too close at your patter I guess in
this case it’s caused by having “x*” twice in your RX - not very
useful anyway.

Thanks :slight_smile: We haven’t seen the OP’s regexp yet; I was just
offering the above as an example of what can cause the
hanging he described.

Regards,

Bill


#5

2006/5/24, Bill K. removed_email_address@domain.invalid:

x << “ab”
x =~ /xxa/ # matches quickly
x =~ /xxb/ # “hangs” trying exponential combinations with backtracking

Yes, that’s a property of NFA based regular expression engines. Some
patterns have awful runtime characteristics because of the
backtracking. Without looking too close at your patter I guess in
this case it’s caused by having “x*” twice in your RX - not very
useful anyway. If you are interested in more detail I recommend
reading “Mastering Regular Expressions”:

http://www.amazon.com/gp/product/0596002890/

Kind regards

robert


#6

Just for kicks, here was the original regex - attempting to match
against a string containing an entire HTML page OR a style sheet.

/(?=<style)?background.?(?=-image)?:.?url((.?)).?(?=</style>)*?/

I was attempting to make the <style and parts optional.

As it turns out, there are so many different ways that a style can be
specified in a HTML page, that I just went with the much simpler:

/background.?:.?url((.*?))/i

Wes


#7

Bill K. wrote:

From: “Robert K.” removed_email_address@domain.invalid

Yes, that’s a property of NFA based regular expression engines. Some
patterns have awful runtime characteristics because of the
backtracking. Without looking too close at your patter I guess in
this case it’s caused by having “x*” twice in your RX - not very
useful anyway.

Thanks :slight_smile: We haven’t seen the OP’s regexp yet; I was just
offering the above as an example of what can cause the
hanging he described.

Regards,

Bill

All,

Thanks for giving me some idea of what might be wrong. For now, I’ve
gone with a simpler regex that I think will work fine. My original
pattern was attempting to allow several portions of the pattern to be
optional and the more that I think about it, the more it makes sense
that this might occur.

I will keep (?>pattern) in mind as a way to inhibit backtracking.

Thanks,
Wes