Forum: Ruby
How to grep the shortest match in a string

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Arowana Lin (arowana)
on 2005-12-18 11:37
I used a regular expression to grep content from a web page, but it seems
Ruby always matches the longest string, and I need to fetch the shortest
match. How can I do this?
Thanks for help!
Robert Retzbach (Guest)
on 2005-12-18 12:00
(Received via mailing list)
Arowana Lin wrote:

>I used a regular expression to grep content from a web page, but it seems
>Ruby always matches the longest string, and I need to fetch the shortest
>match. How can I do this?
>Thanks for help!
It would be nice to see the regexp. Maybe non-greedy quantifiers will do.
You could also use a negated character class like [^<>] to match the
characters between < and >.
But I don't know what you want to match, so please show me :>
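A minimal sketch of the character-class idea, assuming cells that contain plain text only (the sample string is modeled on the table posted later in the thread). Because `[^<>]*` cannot cross a `<`, each match stops at its own `</td>` regardless of greediness:

```ruby
# Sample input (an assumption: simple cells with no nested tags).
html = "<tr><td>Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3</td></tr>"

# [^<>]* matches any run of characters except < and >, so each match ends at
# the first "<" of "</td>" instead of running on to the last one.
cells = html.scan(/<td>([^<>]*)<\/td>/).flatten

p cells # prints ["Format", "Format2", "Format3"]
```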
Arowana Lin (arowana)
on 2005-12-18 12:26
Robert Retzbach wrote:
> It would be nice to see the regexp. Maybe non-greedy quantifiers will do.
> You could also use a negated character class like [^<>] to match the
> characters between < and >.
> But I don't know what you want to match, so please show me :>

Thanks, Robert!
Here is the example:
<tr><td>Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3</td></tr>
I use the code below to fetch "Format", "Format2", etc. from the table:
feature=content.scan(/[<tr><td>([\w\s])*<\/td><\/tr>/)
I want to get each row into the array feature, like
feature[0]=Format; feature[1]=Format2; ...
but it matches all the rows into a single element:
feature[0]=Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3
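As quoted, the pattern above appears to open a character class with `[` that is never closed, so it was probably garbled in transcription; the single long match described is what a greedy `.*` between the tags would produce. Assuming that was the real pattern, this sketch contrasts it with the non-greedy `.*?` on the posted data:

```ruby
html = "<tr><td>Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3</td></tr>"

# Greedy: .* grabs as much as possible, so one match spans from the first
# <tr><td> to the last </td></tr>, swallowing the rows in between.
greedy = html.scan(/<tr><td>(.*)<\/td><\/tr>/).flatten

# Non-greedy: .*? grabs as little as possible, so each match ends at the
# nearest </td></tr>, giving one element per row.
lazy = html.scan(/<tr><td>(.*?)<\/td><\/tr>/).flatten

p greedy # prints ["Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3"]
p lazy   # prints ["Format", "Format2", "Format3"]
```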
Ross Bamford (Guest)
on 2005-12-18 14:33
(Received via mailing list)
On Sun, 18 Dec 2005 11:26:12 -0000, Arowana Lin <priceagle@gmail.com> wrote:

> I want to get each row into the array feature, like
> feature[0]=Format; feature[1]=Format2; ...
> but it matches all the rows into a single element:
> feature[0]=Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3

Non-greedy quantifiers could probably be used to do this, but given that
your data is quite nicely delimited, you may as well just scan ;)

	s = "<tr><td>Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3</td></tr>"
	s.scan(/<td>([^<]*)<\/td>/) { |it| puts it }

outputs:

	Format
	Format2
	Format3
	=> "<tr><td>Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3</td></tr>"

Obviously it doesn't do quite what you want (you need an array), but that
part should be easy to add...
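The `=>` line above shows that `String#scan` with a block returns the receiver itself, not the matches. One minimal way to get an array while keeping the block form is to collect each capture as it is yielded (a sketch; `rows` is an illustrative name, not from the thread):

```ruby
s = "<tr><td>Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3</td></tr>"

rows = [] # illustrative name for the collected cells
# With a capture group in the pattern, scan yields an array of captures for
# each match; .first picks out the single group's text.
returned = s.scan(/<td>([^<]*)<\/td>/) { |captures| rows << captures.first }

p rows               # prints ["Format", "Format2", "Format3"]
p returned.equal?(s) # prints true: the block form hands back the receiver
```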
Eero Saynatkari (rue)
on 2005-12-18 14:40
Arowana Lin wrote:
> Thanks, Robert!
> Here is the example:
> <tr><td>Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3</td></tr>
> I use the code below to fetch "Format", "Format2", etc. from the table:
> feature=content.scan(/[<tr><td>([\w\s])*<\/td><\/tr>/)
> I want to get each row into the array feature, like
> feature[0]=Format; feature[1]=Format2; ...
> but it matches all the rows into a single element:
> feature[0]=Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3

Yes, you want the non-greedy version .*? instead of .* there. You can
append ? to the *, + and {m,n} quantifiers.
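A small sketch of the other lazy forms mentioned, using a made-up string of `x` characters (chosen only to make match lengths easy to see):

```ruby
text = "xxxxx"

# Greedy quantifiers take as many characters as they can...
p text[/x+/]      # prints "xxxxx"
p text[/x{2,4}/]  # prints "xxxx"

# ...while appending ? makes them take as few as the pattern allows.
p text[/x+?/]     # prints "x"
p text[/x{2,4}?/] # prints "xx"
p text[/x*?/]     # prints "" (zero characters is already a successful match)
```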


E
unknown (Guest)
on 2005-12-18 17:46
(Received via mailing list)
Hi --

On Sun, 18 Dec 2005, Ross Bamford wrote:

> Non-greedy quantifiers could probably be used to do this, but given that
> your data is quite nicely delimited, you may as well just scan ;)
>
> 	s = "<tr><td>Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3</td></tr>"
> 	s.scan(/<td>([^<]*)<\/td>/) { |it| puts it }

array.each {|it| puts it }  ==  puts array  :-)
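The point here is that `puts` already prints each element of an array argument on its own line. A quick sketch comparing the two forms; `captured_output` is a hypothetical helper that temporarily swaps `$stdout` for a `StringIO` so the printed text can be compared:

```ruby
require "stringio"

cells = ["Format", "Format2", "Format3"]

# Hypothetical helper: run the block with $stdout redirected, return the text.
def captured_output
  old = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string
ensure
  $stdout = old
end

each_output = captured_output { cells.each { |cell| puts cell } }
puts_output = captured_output { puts cells }

p each_output == puts_output # prints true
```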

> outputs:
>
> 	Format
> 	Format2
> 	Format3
> 	=> "<tr><td>Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3</td></tr>"
>
> Obviously it doesn't do quite what you want (you need an array) but that part
> should be easy to add...

scan returns an array, so just grab it:

   results = s.scan(/.../).flatten  # flatten because of the ()'s
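Filling in the `/.../` with Ross's pattern (a substitution made here for illustration, not David's exact code), the blockless form looks like this. Without `flatten`, each element is a one-element array because of the capture group:

```ruby
s = "<tr><td>Format</td></tr><tr><td>Format2</td></tr><tr><td>Format3</td></tr>"

# Without a block, scan returns an array of matches. Because the pattern has a
# capture group, each match is itself an array holding that group's text.
nested = s.scan(/<td>([^<]*)<\/td>/)
p nested # prints [["Format"], ["Format2"], ["Format3"]]

# flatten turns the per-match arrays into a plain list of strings.
results = nested.flatten
p results # prints ["Format", "Format2", "Format3"]
```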

[And yes, everyone who's about to say it, we all know that you cannot
parse arbitrary HTML with a single regular expression.]
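To make that caveat concrete: the pattern used in this thread assumes bare `<td>` tags with plain text inside. A sketch with two hypothetical cells that break that assumption:

```ruby
# Hypothetical rows: one cell has an attribute, the other has nested markup.
html = '<tr><td class="name">Format</td></tr><tr><td><b>Format2</b></td></tr>'

# The pattern needs a literal "<td>" followed by text containing no "<",
# so neither cell above is found.
found = html.scan(/<td>([^<]*)<\/td>/).flatten
p found # prints []
```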


David

--
David A. Black
dblack@wobblini.net

"Ruby for Rails", from Manning Publications, coming April 2006!
http://www.manning.com/books/black
Ross Bamford (Guest)
on 2005-12-18 18:28
(Received via mailing list)
On Sun, 18 Dec 2005 16:45:49 -0000, <dblack@wobblini.net> wrote:

>> Obviously it doesn't do quite what you want (you need an array) but
>> that part should be easy to add...
>
> scan returns an array, so just grab it:
>
>    results = s.scan(/.../).flatten  # flatten because of the ()'s
>

I _knew_ I was missing something.

In mitigation, I'd like to say that it was my first mail-check of the day.
Unfortunately, it makes no difference, because I'd completely forgotten
about scan (etc.) with no block anyway, so I had it returning the string :P

Thanks, David :)
Devin Mullins (Guest)
on 2005-12-18 19:53
(Received via mailing list)
dblack@wobblini.net wrote:

> [And yes, everyone who's about to say it, we all know that you cannot
> parse arbitrary HTML with a single regular expression.]

Maybe in Perl 6! http://dev.perl.org/perl6/doc/design/apo/A05.html

Devin
And a shoutout to Rubyful Soup is in order, I think...