Help with regex needed


#1

Hi, here is the array I am scanning:
["\n

 <a href="/search~S13?/rWR%20121/rwr+121/1,7,9,B/
frameset~2489041&FF=rwr+121&1,1,">The Academic Writer: A Brief Guide</
a>\n\n\n Ede, Lisa\n\n\n\n Valley
Reserves – VR 282 – AVAILABLE\n\n\n\n \n\n\n</
tr>\n\n <a href="/search~S13?/rWR%20121/rwr+121/1,7,9,B/
frameset~1334646&FF=rwr+121&1,1,">Cultural literacy : what every
American needs to know / E.D. Hirsch, Jr. ; with an appendix, What li</
a>\n\n\n Hirsch, E. D. (Eric Donald), 1928-\n\n
\n\n Valley Reserves – LC149 .H57 1987 – AVAILABLE\n
\n\n\n \n]

I am trying to pull out the essential (everything but the newlines and
such) value in between the <td> and </td> tags.

Here is the regex I am trying:
s.first.scan(/<td >(.*?)<\/td>/mi)
But I don’t get the first a href value.

Any help would be appreciated. Kim


#2

Use the Hpricot plugin to handle HTML parsing.


#3

On Oct 22, 12:14 am, Kim removed_email_address@domain.invalid wrote:

I am trying to pull out the essential (everything but the newlines and
such) value in between the <td> and </td> tags.

I agree with Mukund. Use Hpricot:

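require 'hpricot'

# s.first is the HTML string from the original post
html = Hpricot(s.first)

# Print the inner HTML of every <td> cell
html.search("td").each do |cell|
  puts cell.inner_html
end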
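If installing a parser is not an option, here is a rough regex-only sketch using nothing but the standard library. One possible reason the original pattern misses cells is that `<td >` only matches an opening tag written with exactly one space and no attributes, so `<td>` or `<td class="...">` would be skipped; that is a guess, since the raw markup is not shown in the thread. The sample HTML, the `/example/item1` path, and the variable names `cells`, `links`, and `texts` are all made up for illustration.

```ruby
# Made-up sample markup; the real catalog page will differ.
html = <<~HTML
  <tr>
  <td class="briefCitRow">
  <a href="/example/item1">The Academic Writer: A Brief Guide</a>
  </td>
  <td >
  Ede, Lisa
  </td>
  </tr>
HTML

# <td\b[^>]*> accepts "<td>", "<td >" and "<td class=...>" alike.
# The "/" in "</td>" has to be escaped inside a /.../ literal.
cells = html.scan(/<td\b[^>]*>(.*?)<\/td>/mi).flatten

# For cells that contain a link, keep the href and the link text,
# collapsing runs of whitespace (including newlines) to one space.
links = cells.map do |cell|
  m = cell.match(/<a\s+href="([^"]*)"[^>]*>(.*?)<\/a>/mi)
  next unless m
  { href: m[1], text: m[2].gsub(/\s+/, " ").strip }
end.compact

# Plain-text version of every cell: drop tags, then squeeze whitespace.
texts = cells.map { |cell| cell.gsub(/<[^>]+>/, "").gsub(/\s+/, " ").strip }

links.each { |link| puts "#{link[:text]} -> #{link[:href]}" }
texts.each { |text| puts text }
```

This only holds up on simple, well-formed markup. Nested tables or unclosed tags will trip it, which is the main argument for a real parser like Hpricot.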

– Mark.