How to grep the shortest match in a string


#1

I used a regular expression to grep content from a web page, but it seems
Ruby always matches the longest string, and I need to fetch the shortest
match. How can I do that?
Thanks for the help!


#2

It would be nice to see the regexp. Maybe using non-greedy quantifiers will
do.
You could also use character classes like [^<>] to match the chars between <
and >.
But I don’t know what you want to match, so please show me :>
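
A minimal sketch of the two suggestions above, in case it helps. The markup in ROWS is an assumption made up for illustration (one <td> cell per <tr> row), not the poster's actual page.

```ruby
# Hypothetical sample input: one <td> cell per <tr> row (assumed markup).
ROWS = "<tr><td>Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3</td></tr>"

# Greedy: .* runs to the end and backtracks only as far as the LAST
# </td></tr>, so all three rows end up inside a single match.
GREEDY = ROWS.scan(/<td>(.*)<\/td><\/tr>/).flatten

# Non-greedy: .*? stops at the first </td></tr> that lets the match succeed.
LAZY = ROWS.scan(/<td>(.*?)<\/td><\/tr>/).flatten

# Negated character class: [^<>]* can never cross a tag boundary.
NEGATED = ROWS.scan(/<td>([^<>]*)<\/td><\/tr>/).flatten

p GREEDY.length  # => 1
p LAZY           # => ["Format", "Format2", "Format3"]
p NEGATED        # => ["Format", "Format2", "Format3"]
```

The negated class is the stricter of the two: it fails outright if a cell contains nested tags, whereas .*? would still match across them.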


#3

Thanks Robert!
Here is the example.

FormatFormat2Format3

I use the code below to fetch "Format" and "Format2" from the table:

feature=content.scan(/[([\w\s])*<\/td><\/tr>/)

I want to get each row into the array feature, like
feature[0]=Format; feature[1]=Format2...
but it matches all the rows into
feature[0]=FormatFormat2Format3

#4

Yes, you want the non-greedy version .*? instead of .*
there. You can use ? with the *, + and {,} quantifiers.

E
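
For reference, a small sketch of what the trailing ? does to each of those quantifiers. The input string "aaaa" is just a made-up example.

```ruby
# Hypothetical input, chosen so greedy and non-greedy results differ visibly.
TEXT = "aaaa"

# String#[] with a regexp returns the first match (or nil if none).
RESULTS = {
  "a*"      => TEXT[/a*/],       # greedy: as many as possible
  "a*?"     => TEXT[/a*?/],      # non-greedy: as few as possible (zero)
  "a+"      => TEXT[/a+/],
  "a+?"     => TEXT[/a+?/],      # at least one, but no more than needed
  "a{2,3}"  => TEXT[/a{2,3}/],
  "a{2,3}?" => TEXT[/a{2,3}?/]   # stops at the lower bound
}

RESULTS.each { |pattern, match| puts "#{pattern.ljust(8)} -> #{match.inspect}" }
# a*       -> "aaaa"
# a*?      -> ""
# a+       -> "aaaa"
# a+?      -> "a"
# a{2,3}   -> "aaa"
# a{2,3}?  -> "aa"
```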


#5

Hi –

On Sun, 18 Dec 2005, Ross B. wrote:

Non-greedy quantifiers could probably be used to do this, but given that
your data is quite nicely delimited you may as well just scan ;)

s = "<tr><td>Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3</td></tr>"
s.scan(/<td>([^<]*)<\/td>/) { |it| puts it }

array.each {|it| puts it } == puts array :)

outputs:

Format
Format2
Format3
=> "<tr><td>Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3</td></tr>"

Obviously it doesn’t do quite what you want (you need an array) but that part
should be easy to add…

scan returns an array, so just grab it:

results = s.scan(/…/).flatten # flatten because of the ()'s
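
A short sketch of the return values involved, to show why the flatten is needed. SAMPLE and CELL are hypothetical names and the markup is assumed, since the original string is not shown in full here.

```ruby
# Hypothetical sample input and pattern (assumed markup, one capture group).
SAMPLE = "<td>Format</td><td>Format2</td><td>Format3</td>"
CELL   = /<td>([^<]*)<\/td>/

# Without a block, scan returns an array. Because the pattern has a
# capture group, each element is itself an array of the captures.
NESTED = SAMPLE.scan(CELL)

# flatten turns the one-element sub-arrays into a plain array of strings.
FLAT = NESTED.flatten

# With a block, scan yields each match and returns the receiver string.
WITH_BLOCK = SAMPLE.scan(CELL) { |(cell)| cell }

p NESTED                 # => [["Format"], ["Format2"], ["Format3"]]
p FLAT                   # => ["Format", "Format2", "Format3"]
p WITH_BLOCK == SAMPLE   # => true
```

With a single group, NESTED.map(&:first) gives the same result as flatten; with several groups per match, flatten would merge them all into one list, which may or may not be what you want.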

[And yes, everyone who’s about to say it, we all know that you cannot
parse arbitrary HTML with a single regular expression.]

David


David A. Black


#6

On Sun, 18 Dec 2005 16:45:49 -0000, David A. Black wrote:

Obviously it doesn’t do quite what you want (you need an array) but
that part should be easy to add…

scan returns an array, so just grab it:

results = s.scan(/…/).flatten # flatten because of the ()'s

I knew I was missing something.

In mitigation, I’d like to say that it was my first mail-check of the
day.
Unfortunately, it makes no difference because I’d completely forgotten
about scan (etc) with no block anyway, so I had it returning the string
:P

Thanks, David :)


#7

On Sun, 18 Dec 2005 11:26:12 -0000, Arowana L. wrote:

I want to get each row into array feature,like
feature[0]=Format;feature[1]=Format2…
but it match all the row into
feature[0]=Format Format2 Format3

Non-greedy quantifiers could probably be used to do this, but given that
your data is quite nicely delimited you may as well just scan ;)

s = "<tr><td>Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3</td></tr>"
s.scan(/<td>([^<]*)<\/td>/) { |it| puts it }

outputs:

Format
Format2
Format3
=> "<tr><td>Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3</td></tr>"

Obviously it doesn’t do quite what you want (you need an array) but that
part should be easy to add…


#8

David A. Black wrote:

[And yes, everyone who’s about to say it, we all know that you cannot
parse arbitrary HTML with a single regular expression.]

Maybe in Perl 6! http://dev.perl.org/perl6/doc/design/apo/A05.html

Devin
And a shoutout to Rubyful Soup is in order, I think…