Lazy regexp is not lazy enough

Consider the string

xyz<h1>x P0 y</h1><h1>x Q1 y</h1><h1>1 Placeholder2 2</h1>abc

and the pattern

<(h\d)>.*?Placeholder2.*?</\1>

the pattern matches

<h1>x P0 y</h1><h1>x Q1 y</h1><h1>1 Placeholder2 2</h1>

I want it to match

<h1>1 Placeholder2 2</h1>

How can I do this? That is, I want to find the nearest <h1> </h1>
surrounding Placeholder2.
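
The behaviour described above follows from how a backtracking regex engine searches: it takes the leftmost position at which any match is possible, and a lazy quantifier only limits how far the match runs to the right from there. Here is a minimal sketch in Ruby, assuming the sample is a single line holding three <h1> elements (the tags and the constant names are assumptions for illustration):

```ruby
# Assumed sample: three <h1> elements on one line (tags are illustrative).
SAMPLE = "xyz<h1>x P0 y</h1><h1>x Q1 y</h1><h1>1 Placeholder2 2</h1>abc"

# Lazy quantifiers on both sides of the target text.
LAZY = %r{<(h\d)>.*?Placeholder2.*?</\1>}

# The engine tries start positions from left to right. At the first <h1>,
# .*? is allowed to grow across the other tags until Placeholder2 is
# reached, so the very first attempt already succeeds.
LAZY_MATCH = LAZY.match(SAMPLE)

puts LAZY_MATCH[0]        # the span from the first <h1> to the last </h1>
puts LAZY_MATCH.begin(0)  # 3, the offset of the first <h1>
```

So the laziness is working as designed. It just has no say over where the match begins.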

On Feb 8, 3:46pm, Ralph S. wrote:

I want it to match

<h1>1 Placeholder2 2</h1>

How can I do this? That is, I want to find the nearest <h1> </h1>
surrounding Placeholder2.

Don’t know if/how to bend ruby’s regular expressions to do this but
trying to parse html with regular expressions is doomed to fail
eventually. Use something like nokogiri.

Fred

Ralph S. wrote in post #980334:

How can I do this? That is, I want to find the nearest <h1> </h1>
surrounding Placeholder2.

First, I’ll +1 Fred on using Nokogiri for parsing HTML.

But you can modify your regex so any markup ‘<’ characters are excluded
using [^<], as in:

p = /<(h\d)>[^<]+Placeholder2.*?<\/\1>/
s = "xyz<h1>x P0 y</h1><h1>x Q1 y</h1><h1>1 Placeholder2 2</h1>abc"
p =~ s
=> 33

$1
=> "h1"

Is that what you meant?

  • ff

Ralph S. wrote in post #980369:

Fearless, your solution seems to work … but I am clueless as to how
and why it works!

I’m FAR from a regex wizard, but it’s worth noting:
[abc] means match any occurrence of a or b or c
[^abc] means match any character that is NOT a or b or c
ergo
[^<] means match anything that is NOT an open bracket
[^<]+ means match one or more things that are not open brackets

so
/<(h\d)>[^<]+Placeholder2.*?<\/\1>/

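matches an open < followed by an h followed by a digit followed by a
close >, then one or more characters as long as they are NOT <,
followed by “Placeholder2”, then as few characters as possible up to
the closing tag for that same h and digit. Because [^<]+ can never
cross a <, the match has to start at the opening tag right before
“Placeholder2”.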
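
To see the walkthrough above in action, here is a small Ruby sketch. It assumes the sample is a single line holding three <h1> elements; the tags and the constant names are assumptions for illustration, not part of the original posts.

```ruby
# Assumed sample: three <h1> elements on one line (tags are illustrative).
TEXT = "xyz<h1>x P0 y</h1><h1>x Q1 y</h1><h1>1 Placeholder2 2</h1>abc"

# [^<]+ cannot step over a '<', so nothing between the opening tag and
# Placeholder2 may contain another tag. %r{} avoids escaping the '/'.
NEGATED = %r{<(h\d)>[^<]+Placeholder2.*?</\1>}

NEGATED_MATCH = NEGATED.match(TEXT)

puts NEGATED_MATCH.begin(0)  # 33
puts NEGATED_MATCH[0]        # <h1>1 Placeholder2 2</h1>
puts NEGATED_MATCH[1]        # h1
```

Note that [^<]+ requires at least one character between the opening tag and Placeholder2. If the target text can come immediately after the tag, [^<]* is probably the safer choice.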

Of course, this will break as soon as someone adds attributes to the
opening tag, which is why we all like Nokogiri. I'm sorry your ISP
doesn't agree! :)
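
The attribute problem can be sketched in a few lines of Ruby. The class attribute below is a made-up example, and the TOLERANT pattern is only one possible workaround, not something proposed in the thread:

```ruby
# Hypothetical input: the class attribute is invented for illustration.
WITH_ATTR = '<h1>x P0 y</h1><h1 class="title">1 Placeholder2 2</h1>'

# The pattern from above expects '>' right after the tag name.
STRICT = %r{<(h\d)>[^<]+Placeholder2.*?</\1>}

# \b[^>]* lets the opening tag carry any attributes up to its closing '>'.
TOLERANT = %r{<(h\d)\b[^>]*>[^<]+Placeholder2.*?</\1>}

puts STRICT.match?(WITH_ATTR)      # false
puts TOLERANT.match(WITH_ATTR)[0]  # <h1 class="title">1 Placeholder2 2</h1>
```

Even the tolerant version stays fragile: an attribute value containing '>' or markup nested inside the heading would defeat it, which is the usual argument for a real parser when one is available.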

In terms of Nokogiri and Hpricot …

I develop on a Windows machine and my ISP’s machine runs Linux.

Nokogiri works great on my development machine. My ISP does not
support Nokogiri on his … unless I am willing to spend the money to
have him install it … which I don’t … and for political reasons, I
can’t move to another ISP.

Hpricot has given me lots and lots of problems …

So I have been reduced to parsing some html myself. I don’t want to do
it … but I gotta.

Fearless, your solution seems to work … but I am clueless as to how
and why it works!