Hello.
I’m using Hpricot for the first time on a project. I need to get some
URLs from a web site, but I only want certain ones.
I can grab all of the URLs from the page without a problem, but how can
I narrow this down to select http://www.goodsite.com but not
http://www.wrongsite.com?
I’d like to test for the string “goodsite”.
Thanks!
…assuming doc is an hpricot object…
doc.search("a[@href*='goodsite']") do |result|
…
end
Philip H. wrote:
…assuming doc is an hpricot object…
doc.search("a[@href*='goodsite']") do |result|
…
end
Yes, that grabs only the links that I need. Previously, though,
I had used
(doc/:a).each do |link|
and this only gave me the HTML string.
Can I do this the same way, but instead of getting back
<a href= "http://…
get only the http:// part, so that I can use these links?
THANKS!
Actually, I was wrong in my previous post. Sorry!! Both results are
the same, i.e., I get back the <a href…
Is there a way for me to have a clean link? I want to insert this into
a table and then pull up the pages.
Thanks!
Figured it out.
doc.search("a[@href*='goodsite']") do |result|
  # the block variable is result (singular); read its href attribute
  link = result.attributes['href']
  puts link
end
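
One follow-up thought, in case it helps anyone: a substring test on the whole href can also match “goodsite” when it appears in the path or query string of some other site. Below is a minimal sketch of a stricter check on the host name, using only Ruby's standard URI library. It assumes the href strings have already been pulled out of the page (for example with attributes['href'] as above); the helper name good_link? and the sample links are made up for illustration.

```ruby
require 'uri'

# Hypothetical helper: true when the link's host name contains the given text.
# Strings that do not parse as a URI, or that have no host, are treated as no match.
def good_link?(href, name = 'goodsite')
  host = URI.parse(href.to_s).host
  !host.nil? && host.include?(name)
rescue URI::InvalidURIError
  false
end

# Sample hrefs standing in for the values collected from the page.
links = [
  'http://www.goodsite.com/page1',
  'http://www.wrongsite.com/?ref=goodsite',
  'not a url'
]

# Keep only the links whose host matches.
good_links = links.select { |href| good_link?(href) }
puts good_links
```

Collecting the matches into an array like this should also make it easier to insert them into a table afterwards.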