How to grep the shortest match in a string

arowana · December 18, 2005, 11:37am

I used a regular expression to grep content from a web page, but it seems
Ruby always matches the longest string, whereas I need the shortest
match. How can I do that?
Thanks for your help!

arowana · December 18, 2005, 12:00pm

It would be nice to see the regexp. Non-greedy quantifiers might do the
job.
You could also use a negated character class like [^<>] to match the
characters between < and >.
But I don’t know what you want to match, so please show me :>
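
As a rough sketch of the character-class idea (the sample string is made up for illustration and is not taken from the poster's page), a negated class stops the match at the nearest closing bracket:

```ruby
# Hypothetical sample text, assumed only for this illustration.
text = "<b>bold</b> and <i>italic</i>"

# Greedy dot: .+ runs on to the last ">" in the string.
greedy = text[/<.+>/]
# greedy == "<b>bold</b> and <i>italic</i>"

# Negated class: [^<>]+ cannot cross a "<" or ">", so the match ends at the first ">".
first_tag = text[/<[^<>]+>/]
# first_tag == "<b>"

# With scan, every bracketed piece is returned separately.
tags = text.scan(/<[^<>]+>/)
# tags == ["<b>", "</b>", "<i>", "</i>"]
```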

arowana · December 18, 2005, 12:26pm

Thanks Robert!
here is the example.

FormatFormat2Format3

I use the code below to fetch "Format" and "Format2" in the table:

feature=content.scan(/[([\w\s])*<\/td><\/tr>/)

I want to get each row into the array feature, like feature[0]=Format; feature[1]=Format2... but it matches all the rows into feature[0]=FormatFormat2Format3

arowana · December 18, 2005, 2:40pm

Arowana L. wrote:

Format Format2 Format3

I use the code below to fetch "Format" and "Format2" in the table:

feature=content.scan(/[([\w\s])*<\/td><\/tr>/)

I want to get each row into the array feature, like feature[0]=Format; feature[1]=Format2... but it matches all the rows into feature[0]=Format Format2 Format3

Yes, you want the non-greedy version .*? instead of .* there.
You can use ? with the *, + and {,} quantifiers.
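
A small sketch of the difference, assuming table markup along the lines of the question (the exact HTML and the regex here are assumptions for illustration, not the poster's original):

```ruby
# Hypothetical table rows, assumed for illustration.
content = "<tr><td>Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3</td></tr>"

# Greedy: .* extends to the LAST </td></tr>, so all rows end up in one match.
greedy = content.scan(/<tr><td>(.*)<\/td><\/tr>/).flatten
# greedy == ["Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3"]

# Non-greedy: .*? stops at the FIRST </td></tr>, giving one match per row.
feature = content.scan(/<tr><td>(.*?)<\/td><\/tr>/).flatten
# feature == ["Format", "Format2", "Format3"]
```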

E

arowana · December 18, 2005, 5:46pm

Hi –

On Sun, 18 Dec 2005, Ross B. wrote:

Non greedy quantifiers could probably be used to do this, but given that
your data is quite nicely delimited you may as well just scan

s = "FormatFormat2Format3"
s.scan(/([^<]*)<\/td>/) { |it| puts it }

array.each {|it| puts it } == puts array

outputs:

Format
Format2
Format3
=> "FormatFormat2Format3"

Obviously it doesn’t do quite what you want (you need an array) but that part
should be easy to add…

scan returns an array, so just grab it:

results = s.scan(/…/).flatten # flatten because of the ()'s
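
To make the flatten step concrete, here is a minimal sketch. The sample markup and the regex are assumptions for illustration, since the pattern above is elided:

```ruby
# Hypothetical sample markup, assumed only for this illustration.
s = "<table><tr><td>Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3</td></tr></table>"

# Without a block, scan returns an array. Because the pattern has a
# capture group, each match is itself a one-element array.
nested = s.scan(/<td>([^<]*)<\/td>/)
# nested == [["Format"], ["Format2"], ["Format3"]]

# flatten collapses the per-match arrays into a plain list of strings.
results = nested.flatten
# results == ["Format", "Format2", "Format3"]
```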

[And yes, everyone who’s about to say it, we all know that you cannot
parse arbitrary HTML with a single regular expression.]

David

–
David A. Black

“Ruby for Rails”, from Manning Publications, coming April 2006!

arowana · December 18, 2005, 6:28pm

On Sun, 18 Dec 2005 16:45:49 -0000, David A. Black wrote:

Obviously it doesn’t do quite what you want (you need an array) but
that part should be easy to add…

scan returns an array, so just grab it:

results = s.scan(/…/).flatten # flatten because of the ()'s

I knew I was missing something.

In mitigation, I’d like to say that it was my first mail-check of the
day.
Unfortunately, it makes no difference, because I’d completely forgotten
that scan (etc.) can be called with no block anyway, so I had it returning the string.

Thanks, David

arowana · December 18, 2005, 2:33pm

On Sun, 18 Dec 2005 11:26:12 -0000, Arowana L. wrote:

Would be nice to see the regexp. Maybe using non-greedy multipliers will
I want to get each row into the array feature, like
feature[0]=Format; feature[1]=Format2…
but it matches all the rows into
feature[0]=FormatFormat2Format3

Non-greedy quantifiers could probably be used to do this, but given that
your data is quite nicely delimited you may as well just scan:

s = "FormatFormat2Format3"
s.scan(/([^<]*)<\/td>/) { |it| puts it }

outputs:

Format
Format2
Format3
=> "FormatFormat2Format3"

Obviously it doesn’t do quite what you want (you need an array) but that
part should be easy to add…

arowana · December 18, 2005, 7:53pm

David A. Black wrote:

[And yes, everyone who’s about to say it, we all know that you cannot
parse arbitrary HTML with a single regular expression.]

Maybe in Perl 6! Apocalypse 5: Pattern Matching

Devin
And a shoutout to Rubyful Soup is in order, I think…