Help with regex needed

kim · October 22, 2008, 7:06am

Hi, here is the array I am scanning:
["\n

<a href="/search~S13?/rWR%20121/rwr+121/1,7,9,B/
frameset~2489041&FF=rwr+121&1,1,">The Academic Writer: A Brief Guide</
a>\n\n\n Ede, Lisa\n\n\n\n Valley
Reserves – VR 282 – AVAILABLE\n\n\n\n \n\n\n</
tr>\n\n <a href="/search~S13?/rWR%20121/rwr+121/1,7,9,B/
frameset~1334646&FF=rwr+121&1,1,">Cultural literacy : what every
American needs to know / E.D. Hirsch, Jr. ; with an appendix, What li</
a>\n\n\n Hirsch, E. D. (Eric Donald), 1928-\n\n
\n\n Valley Reserves – LC149 .H57 1987 – AVAILABLE\n
\n\n\n \n]

I am trying to pull out the essential value (everything but the newlines and
such) in between the <td> tags.

Here is the regex I am trying:
s.first.scan(/<td >(.*?)<\/td>/mi)
But I don’t get the first a href value.

Any help would be appreciated. Kim

kim · October 22, 2008, 7:11am

Use the Hpricot plugin to handle HTML parsing.

kim · October 22, 2008, 3:27pm

On Oct 22, 12:14 am, Kim wrote:

\n\n\n \n]

I am trying to pull out the essential value (everything but the newlines and
such) in between the <td> tags.

I agree with Mukund. Use Hpricot:

require 'hpricot'

html = Hpricot(s.first)

# Print the inner HTML of every <td> cell
html.search("td").each do |cell|
  puts cell.inner_html
end

– Mark.
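
If installing a parser is not an option, a regex-only approach might look roughly like the sketch below. The exact markup in the original array is not fully shown in the thread, so the sample row, its `class` attribute, and the variable names `cells`, `texts`, and `hrefs` are all assumptions made up for illustration. One possible reason the original pattern misses cells is that a literal `<td >` only matches a tag written with exactly that trailing space; `<td[^>]*>` is more forgiving.

```ruby
# Hypothetical sample input: the real markup in the thread is not shown in
# full, so this row is an assumption made up for illustration.
s = [%Q{<tr>\n<td class="title">\n<a href="/search/frameset~1">The Academic Writer</a>\n</td>\n<td>\n Ede, Lisa\n</td>\n</tr>}]

# <td[^>]*> accepts a bare <td> as well as one carrying attributes or a
# trailing space; the /m flag lets . match the newlines inside a cell.
cells = s.first.scan(/<td[^>]*>(.*?)<\/td>/mi).flatten

# Drop any remaining tags and collapse runs of whitespace.
texts = cells.map { |c| c.gsub(/<[^>]+>/, " ").gsub(/\s+/, " ").strip }

# Pull the href value out of each cell that has one.
hrefs = cells.map { |c| c[/href="([^"]*)"/i, 1] }.compact

puts texts.inspect
puts hrefs.inspect
```

This is only a fallback: a regex like this breaks on nested tables or unusual quoting, which is why a real HTML parser such as Hpricot is usually the safer choice.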