String.scan hangs in some cases?

Has anyone ever see a call to String.scan (the “regular” form, not the
block form) just hang when processing a string?

If so, any insight into how that might happen?

Thanks,
Wes

Wes G. wrote:

Has anyone ever see a call to String.scan (the “regular” form, not the
block form) just hang when processing a string?

If so, any insight into how that might happen?

Thanks,
Wes

Got an example you can share that demonstrates the problem?

Maybe the string is huge? Maybe there’s a recursive lookup happening of
some kind? Dunno without more detail.

Regards,

Dan

From: “Daniel B.” [email protected]

Got an example you can share that demonstrates the problem?

Maybe the string is huge? Maybe there’s a recursive lookup happening of
some kind? Dunno without more detail.

I’m guessing exponential backtracking when the expression doesn’t match.

For ex:

x = “x” * 1000
x << “ab”
x =~ /xxa/ # matches quickly
x =~ /xxb/ # “hangs” trying exponential combinations with
backtracking

Regards,

Bill

From: “Robert K.” [email protected]

Yes, that’s a property of NFA based regular expression engines. Some
patterns have awful runtime characteristics because of the
backtracking. Without looking too close at your patter I guess in
this case it’s caused by having “x*” twice in your RX - not very
useful anyway.

Thanks :slight_smile: We haven’t seen the OP’s regexp yet; I was just
offering the above as an example of what can cause the
hanging he described.

Regards,

Bill

2006/5/24, Bill K. [email protected]:

x << “ab”
x =~ /xxa/ # matches quickly
x =~ /xxb/ # “hangs” trying exponential combinations with backtracking

Yes, that’s a property of NFA based regular expression engines. Some
patterns have awful runtime characteristics because of the
backtracking. Without looking too close at your patter I guess in
this case it’s caused by having “x*” twice in your RX - not very
useful anyway. If you are interested in more detail I recommend
reading “Mastering Regular Expressions”:

Kind regards

robert

Just for kicks, here was the original regex - attempting to match
against a string containing an entire HTML page OR a style sheet.

/(?=<style)?background.?(?=-image)?:.?url((.?)).?(?=</style>)*?/

I was attempting to make the <style and parts optional.

As it turns out, there are so many different ways that a style can be
specified in a HTML page, that I just went with the much simpler:

/background.?:.?url((.*?))/i

Wes

Bill K. wrote:

From: “Robert K.” [email protected]

Yes, that’s a property of NFA based regular expression engines. Some
patterns have awful runtime characteristics because of the
backtracking. Without looking too close at your patter I guess in
this case it’s caused by having “x*” twice in your RX - not very
useful anyway.

Thanks :slight_smile: We haven’t seen the OP’s regexp yet; I was just
offering the above as an example of what can cause the
hanging he described.

Regards,

Bill

All,

Thanks for giving me some idea of what might be wrong. For now, I’ve
gone with a simpler regex that I think will work fine. My original
pattern was attempting to allow several portions of the pattern to be
optional and the more that I think about it, the more it makes sense
that this might occur.

I will keep (?>pattern) in mind as a way to inhibit backtracking.

Thanks,
Wes