How to extract links of a particular class type

I have a web page which has a number of links.
The only way I can differentiate the links is by their class attribute.
I need to extract the set of links, and their titles, for a particular
class.

I tried using the scRUBYt! extractor, but I have no idea where to
specify the class.

google_data = Scrubyt::Extractor.define do
  fetch 'http://www.google.com/'
  fill_textfield 'q', 'ruby'
  submit
  link "Ruby programming language" do
    url "href", :type => :attribute
  end
end
junk = google_data.to_xml

Also, how do I get the output in text/string format?

On 2008.11.17., at 19:17, Sita Rami R. wrote:

google_data = Scrubyt::Extractor.define do
  fetch 'http://www.google.com/'
  fill_textfield 'q', 'ruby'
  submit
  link "Ruby programming language" do
    url "href", :type => :attribute
  end
end
junk = google_data.to_xml

Also, how do I get the output in text/string format?

By the way, you should get the newest scRUBYt!, 0.4.05, which does not
depend on RubyInline, Ruby2Ruby, ParseTree, etc.

What would you like to do exactly?

  1. class: use an XPath like this: stuff "//td[@class='red']"
  2. text/string: use to_hash instead of to_xml.
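
To make the two points above concrete, here is a minimal sketch that does not use scRUBYt! at all. It applies the same kind of class-based XPath with Ruby's REXML library to a small made-up snippet, collects the matches as hashes, and prints each title and link on one line. The sample markup, the class name 'l' on it, and the helper links_with_class are assumptions for illustration only. Real pages are rarely well-formed XML, so treat this as a way to see the XPath idea in isolation rather than as a replacement for a scraper.

```ruby
require 'rexml/document'

# Hypothetical, well-formed sample markup standing in for a real page.
SAMPLE_HTML = <<~HTML
  <html><body>
    <a class="l" href="http://example.com/one">First result</a>
    <a class="nav" href="http://example.com/about">About</a>
    <a class="l" href="http://example.com/two">Second result</a>
  </body></html>
HTML

# Return an array of {:title, :url} hashes, one per <a> element whose
# class attribute equals css_class exactly.
def links_with_class(markup, css_class)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, "//a[@class='#{css_class}']").map do |a|
    { :title => a.text, :url => a.attributes['href'] }
  end
end

# Plain text output: one "title - url" line per matching link.
links_with_class(SAMPLE_HTML, 'l').each do |link|
  puts "#{link[:title]} - #{link[:url]}"
end
```

Note that @class='l' only matches when the attribute is exactly 'l'. An element carrying several classes would need a contains() test instead.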

HTH,
Peter


http://www.rubyrailways.com
http://scrubyt.org

require 'rubygems'
require 'scrubyt'

google_data = Scrubyt::Extractor.define do
  # The URL of the Google results page for "gap inc" belongs here;
  # the post shows only the page title.
  fetch 'gap inc - Google Search'

  link_title "//a[@class='l']", :write_text => true do
    link_url
  end
  next_page "Next", :limit => 3
end

# Write one "title - url" line per result
File.open("google_results.txt", 'w') do |f|
  google_data.to_hash.each do |result|
    f.puts "#{result[:link_title]} - #{result[:link_url]}"
  end
end

produces:

Shop clothes for women, men, maternity, baby, and kids at gap.com

HTH,
Peter



My program needs to do the following:
navigate to the Google site, enter "ruby" as the search text, and click
the search button.
We then get the results page showing the first 10 results.

I would like to collect those 10 links and their titles and log them in
an output file.
Using the scRUBYt! extractor I achieved something: I got all 10 links
captured, but I am unable to get the titles.
I also know how to extract the data in XML format,

but I need it this way: each title and its link on a single line.

My script goes here:

require 'rubygems'
require 'scrubyt'

google_data = Scrubyt::Extractor.define do
  # Perform the action(s)
  fetch 'http://www.google.com/'
  fill_textfield 'q', 'Gap Inc'
  submit
  # Construct the wrapper
  link "gap" do
    url "href", :type => :attribute
  end
  next_page "Next", :limit => 10
end
junk = google_data.to_xml
puts junk

Please help me out.
Suggest any other way if this doesn't work out.

Thanks,
Sita.

Thank you very much, Peter. It served my purpose.

That's great to hear :) If you have any scRUBYt!/scraping-related
questions, don't hesitate to ask.

Cheers,
Peter




Peter,
Where can I find some good material on scRUBYt! and Ruby? Are there
any sites you would recommend?

Thanks,
Sita.

http://scrubyt.org - check out the older posts dealing with creating
scrapers for different pages.
Also check out the examples:
http://rubyforge.org/frs/download.php/46812/scrubyt-examples-0.4.05.tgz

More is on the way…

Cheers,
Peter



See my other post…

Cheers,
Peter



Hi Peter,

I need to fetch some information from http://www.ebay.in.
My required fields are: name of the product, image, price and the link
to that product.

I am able to get the data using this method:
require 'rubygems'
require 'scrubyt'

google_data = Scrubyt::Extractor.define do
  fetch 'http://www.ebay.in'
  fill_textfield 'satitle', 'ipod shuffle'
  submit

  # One record per row of the results table
  record "/html/body/div[2]/div[4]/div[2]/div/div/div[2]/div[2]/div/div/div[3]/div/div/table/tr" do
    name "/td[2]/div/a"
    price "/td[5]"
    image "/td/a/img" do
      url "src", :type => :attribute
    end
    link "/td[2]/div/a" do
      url "href", :type => :attribute
    end
  end
end

google_data.to_xml.write($stdout, 1)

My problem is that for some products it does not work properly (the div
structure may be different). Is there a better solution for this?

Thanks in advance,
Vipin
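
One common way to make a scraper like this less brittle is to anchor the XPath on an attribute instead of on the full path from /html/body, so that extra or missing wrapper divs no longer matter. The sketch below shows the idea with Ruby's REXML library on two made-up layouts of the same table. The class names ('results', 'name', 'price'), the sample rows, and the helper extract_records are all assumptions; the real eBay markup would have to be inspected to find a stable attribute, and the same relative XPath could then be tried in the record pattern.

```ruby
require 'rexml/document'

# Hypothetical result rows; the cell classes are invented for this sketch.
ROWS = <<~HTML
  <tr>
    <td class="image"><a href="/item/1"><img src="/img/1.jpg"/></a></td>
    <td class="name"><div><a href="/item/1">iPod shuffle 1GB</a></div></td>
    <td class="price">Rs. 2,500</td>
  </tr>
  <tr>
    <td class="image"><a href="/item/2"><img src="/img/2.jpg"/></a></td>
    <td class="name"><div><a href="/item/2">iPod shuffle 2GB</a></div></td>
    <td class="price">Rs. 3,900</td>
  </tr>
HTML

# The same table wrapped in a different number of divs.
LAYOUT_A = "<html><body><div><table class=\"results\">#{ROWS}</table></div></body></html>"
LAYOUT_B = "<html><body><div><div><div><table class=\"results\">#{ROWS}</table></div></div></div></body></html>"

# Find the rows through the table's class attribute, then read each field
# with a path relative to the row.
def extract_records(markup)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, "//table[@class='results']//tr").map do |row|
    link  = REXML::XPath.first(row, "td[@class='name']//a")
    image = REXML::XPath.first(row, ".//img")
    price = REXML::XPath.first(row, "td[@class='price']")
    { :name  => link.text,
      :url   => link.attributes['href'],
      :image => image.attributes['src'],
      :price => price.text.strip }
  end
end

[LAYOUT_A, LAYOUT_B].each do |layout|
  extract_records(layout).each do |r|
    puts "#{r[:name]} - #{r[:price]} - #{r[:url]} - #{r[:image]}"
  end
end
```

Both layouts yield the same records, because nothing in the XPath depends on how deeply the table is nested.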

I also want to store the position of each result on the Google results
page. Example:
rank 1 - Title - url

How can I change the code to do this?

Greetings,
Remco
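
A possible approach, assuming to_hash returns the results as an array of hashes in the same order as they appear on the page (as the file-writing example earlier in the thread suggests), is to number them while writing them out. The sketch below uses a hand-written array in place of google_data.to_hash; SAMPLE_RESULTS and the helper ranked_lines are hypothetical names, and whether the order is preserved across next_page pages would need to be checked against real output.

```ruby
# Hypothetical stand-in for the array returned by google_data.to_hash.
SAMPLE_RESULTS = [
  { :link_title => 'First result',  :link_url => 'http://example.com/one' },
  { :link_title => 'Second result', :link_url => 'http://example.com/two' }
]

# Build one "rank N - title - url" line per result, counting from `start`.
def ranked_lines(results, start = 1)
  results.each_with_index.map do |result, index|
    "rank #{index + start} - #{result[:link_title]} - #{result[:link_url]}"
  end
end

puts ranked_lines(SAMPLE_RESULTS)
```

With the real extractor, SAMPLE_RESULTS would be replaced by google_data.to_hash, and the lines could be written with f.puts inside a File.open block instead of puts.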