Forum: Ruby on Rails Help with regex needed

Kim (Guest)
on 2008-10-22 09:06
(Received via mailing list)
Hi, here is the array I am scanning:
["\n<td>&nbsp;<a href=\"/search~S13?/rWR%20121/rwr+121/1,7,9,B/
frameset~2489041&FF=rwr+121&1,1,\">The Academic Writer: A Brief Guide</
a>\n</td>\n<td >\n&nbsp;Ede, Lisa\n</td>\n\n<td >\n&nbsp;Valley
Reserves -- VR 282  -- AVAILABLE\n</td>\n\n<td >\n&nbsp;\n</td>\n\n</
tr>\n<tr>\n<td>&nbsp;<a href=\"/search~S13?/rWR%20121/rwr+121/1,7,9,B/
frameset~1334646&FF=rwr+121&1,1,\">Cultural literacy : what every
American needs to know / E.D. Hirsch, Jr. ; with an appendix, What li</
a>\n</td>\n<td >\n&nbsp;Hirsch, E. D. (Eric Donald), 1928-\n</td>\n
\n<td >\n&nbsp;Valley Reserves -- LC149 .H57 1987  -- AVAILABLE\n</td>
\n\n<td >\n&nbsp;\n</td>]

I am trying to pull out the essential value (everything but the newlines
and such) between each <td> and </td>.

Here is the regex I am trying:
s.first.scan(/\<td \>(.*?)\<\/td\>/mi)
But it does not return the first <td>, the one containing the a href link.

Any help would be appreciated. Kim
Mukund (Guest)
on 2008-10-22 09:11
(Received via mailing list)
Use the Hpricot plugin to handle HTML parsing.
Mark T. (Guest)
on 2008-10-22 17:27
(Received via mailing list)
On Oct 22, 12:14 am, Kim <removed_email_address@domain.invalid> wrote:
> \n\n<td >\n&nbsp;\n</td>]
>
> I am trying to pull out the essential (everything but the newlines and
> such) value in between the <td></td>.

I agree with Mukund. Use Hpricot:

require 'hpricot'

# Parse the first array element as HTML
html = Hpricot(s.first)

# Print the contents of every <td> cell
html.search( "td" ) do |cell|
  puts cell.inner_html
end
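
As for why the original regex skips the first cell: it requires a space inside the tag (`<td >`), while the first cell in the sample opens with a plain `<td>`. If a regex-only approach is still wanted, something like the following might work. It is only a rough sketch, assuming the markup stays as regular as the sample above; the `html` string is a shortened, made-up stand-in for `s.first`, and `cells` is just an illustrative name.

```ruby
# Shortened stand-in for s.first (assumption: same shape as the sample data).
html = "\n<td>&nbsp;<a href=\"/search\">The Academic Writer: A Brief Guide</a>\n</td>\n" \
       "<td >\n&nbsp;Ede, Lisa\n</td>\n\n<td >\n&nbsp;\n</td>"

# \s* accepts both "<td>" and "<td >"; the /m flag lets . match newlines.
cells = html.scan(/<td\s*>(.*?)<\/td>/mi).flatten.map do |cell|
  cell.gsub(/<[^>]+>/, "")   # drop nested tags such as <a ...>
      .gsub("&nbsp;", " ")   # turn the entity into a plain space
      .strip                 # trim newlines and padding
end

puts cells.inspect
```

Empty cells come back as empty strings, so a `reject(&:empty?)` could be chained on if they are not needed. For anything less regular than this sample, a real parser is still the safer choice.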

-- Mark.