Forum: Ruby on Rails Help with regex needed

Kim (Guest)
on 2008-10-22 07:06
(Received via mailing list)
Hi, here is the array I am scanning:
["\n<td>&nbsp;<a href=\"/search~S13?/rWR%20121/rwr+121/1,7,9,B/
frameset~2489041&FF=rwr+121&1,1,\">The Academic Writer: A Brief Guide</
a>\n</td>\n<td >\n&nbsp;Ede, Lisa\n</td>\n\n<td >\n&nbsp;Valley
Reserves -- VR 282  -- AVAILABLE\n</td>\n\n<td >\n&nbsp;\n</td>\n\n</
tr>\n<tr>\n<td>&nbsp;<a href=\"/search~S13?/rWR%20121/rwr+121/1,7,9,B/
frameset~1334646&FF=rwr+121&1,1,\">Cultural literacy : what every
American needs to know / E.D. Hirsch, Jr. ; with an appendix, What li</
a>\n</td>\n<td >\n&nbsp;Hirsch, E. D. (Eric Donald), 1928-\n</td>\n
\n<td >\n&nbsp;Valley Reserves -- LC149 .H57 1987  -- AVAILABLE\n</td>
\n\n<td >\n&nbsp;\n</td>]

I am trying to pull out the essential (everything but the newlines and
such) value in between the <td></td>.

Here is the regex I am trying:
s.first.scan(/\<td \>(.*?)\<\/td\>/mi)
But I don't get the first <td>, the one that contains the a href value.

Any help would be appreciated. Kim
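
The likely cause is the literal space in the pattern: /\<td \>/ only matches "<td >", while the first cell in the array is written "<td>". A minimal sketch of a more tolerant pattern follows, assuming the cells carry no attributes; the sample string is a shortened, hypothetical stand-in for s.first, not the real data.

```ruby
# Hypothetical, shortened stand-in for s.first from the post above.
sample = "\n<td>&nbsp;<a href=\"/search?x=1\">The Academic Writer</a>\n</td>\n" \
         "<td >\n&nbsp;Ede, Lisa\n</td>\n"

# Pattern equivalent to the one in the post: the literal space after "td"
# means a plain "<td>" is never matched.
strict = sample.scan(/<td >(.*?)<\/td>/mi).flatten

# \s* allows zero or more whitespace characters before ">",
# so both "<td>" and "<td >" are matched.
relaxed = sample.scan(/<td\s*>(.*?)<\/td>/mi).flatten

puts "strict matched #{strict.length} cell(s)"
puts "relaxed matched #{relaxed.length} cell(s)"
```

This still would not match a cell with attributes, such as <td class="x">, which is one reason an HTML parser tends to be the sturdier choice.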
Mukund (Guest)
on 2008-10-22 07:11
(Received via mailing list)
Use the Hpricot library to handle the HTML parsing.
Mark Thomas (Guest)
on 2008-10-22 15:27
(Received via mailing list)
On Oct 22, 12:14 am, Kim <Kim.Gri...@gmail.com> wrote:
> \n\n<td >\n&nbsp;\n</td>]
>
> I am trying to pull out the essential (everything but the newlines and
> such) value in between the <td></td>.

I agree with Mukund. Use Hpricot:

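require 'hpricot'

# Parse the HTML fragment held in the first array element.
html = Hpricot(s.first)

# Visit every <td> element, whether it was written as <td> or <td >.
html.search( "td" ).each do |cell|
  puts cell.inner_html
end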

-- Mark.
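
Since inner_html returns the raw markup inside each cell, the &nbsp; entities, newlines and anchor tags are still present in the output. A minimal plain-Ruby sketch of a cleanup step follows, assuming the cell strings look like those in the original array. clean_cell is a hypothetical helper, not part of Hpricot, and the sample cells are copied by hand with the href shortened.

```ruby
# clean_cell is a hypothetical helper, not an Hpricot method.
def clean_cell(fragment)
  text = fragment.gsub(/<[^>]+>/, "")   # drop tags such as <a href="..."> and </a>
  text = text.gsub("&nbsp;", " ")       # replace the non-breaking-space entity
  text.gsub(/\s+/, " ").strip           # collapse newlines and repeated spaces
end

# Sample cell contents modelled on the array in the first post.
cells = [
  "&nbsp;<a href=\"/search?x=1\">The Academic Writer: A Brief Guide</a>\n",
  "\n&nbsp;Ede, Lisa\n",
  "\n&nbsp;Valley Reserves -- VR 282  -- AVAILABLE\n"
]

cleaned = cells.map { |cell| clean_cell(cell) }
puts cleaned
```

Cells that hold only &nbsp; come back as empty strings, so a reject(&:empty?) on the result may be useful if those should be skipped.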