Forum: Ruby How to extract links of a particular class type

Sita Rami R. (Guest)
on 2008-11-17 20:21
(Received via mailing list)
I have a web page that contains a number of links.
The only way I can differentiate the links is by their class attribute.
I need to extract the set of links, and their titles, for a particular
class.

I tried using the scRUBYt! extractor, but I don't know where to specify
the class.

google_data = Scrubyt::Extractor.define do
  fetch 'http://www.google.com/'
  fill_textfield 'q', 'ruby'
  submit
  link "Ruby programming language" do
    url "href", :type => :attribute
  end
end
junk = google_data.to_xml


Also, how can I get the output in text/string format?
Peter S. (Guest)
on 2008-11-17 21:03
(Received via mailing list)
On 2008.11.17., at 19:17, Sita Rami R. wrote:

> google_data = Scrubyt::Extractor.define do
> fetch 'http://www.google.com/'
> fill_textfield 'q', 'ruby'
> submit
> link "Ruby programming language" do
> url "href", :type => :attribute
> end
> junk = google_data.to_xml
>
>
> And how to get the output in text/string format.

By the way, you should get the newest scRUBYt!, 0.4.05, which does *not*
depend on RubyInline, Ruby2Ruby, ParseTree, etc.

What would you like to do exactly?

1) class: use an xpath like this: stuff "//td[@class='red']"
2) text/string: use to_hash instead of to_xml.
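
To illustrate both points without scRUBYt!, here is a minimal sketch that uses Ruby's REXML library instead (a substitution, not the scRUBYt! API). The sample markup, the class name 'l' and the helper links_with_class are all made up for the example, and REXML needs well-formed markup, which real pages often are not.

```ruby
require 'rexml/document'

# Hypothetical, well-formed sample markup (not taken from any real page).
SAMPLE_HTML = <<~HTML
  <html><body>
    <a class="l" href="http://example.com/one">First result</a>
    <a class="nav" href="http://example.com/about">About</a>
    <a class="l" href="http://example.com/two">Second result</a>
  </body></html>
HTML

# Return one hash per <a> element whose class attribute equals css_class.
# (Hypothetical helper; the name is not part of any library.)
def links_with_class(markup, css_class)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, "//a[@class='#{css_class}']").map do |a|
    { title: a.text, url: a.attributes['href'] }
  end
end

# Plain-text output: one "title - url" line per matching link.
links_with_class(SAMPLE_HTML, 'l').each do |link|
  puts "#{link[:title]} - #{link[:url]}"
end
```

Note that [@class='l'] only matches when the attribute is exactly 'l'. An element with several classes would need a predicate such as contains(@class, 'l'), which can also match unrelated class names containing that text.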

HTH,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org
Sita Rami R. (Guest)
on 2008-11-17 21:35
(Received via mailing list)
My program needs to do the following:
navigate to the Google site, enter "ruby" as the search text, and click
the search button.
The results page then shows the first 10 results.

I would like to collect those 10 links and their titles and log them to
an output file.
Using the scRUBYt! extractor I got partway: all 10 links are captured,
but I am unable to get the titles.
I also only know how to extract in XML format,

but I need each title and its link on a single line.

My script goes here:

require 'rubygems'
require 'scrubyt'

google_data = Scrubyt::Extractor.define do
  # Perform the action(s)
  fetch 'http://www.google.com/'
  fill_textfield 'q', 'Gap Inc'
  submit
  # Construct the wrapper
  link "gap" do
    url "href", :type => :attribute
  end
  next_page "Next", :limit => 10
end
junk = google_data.to_xml
puts junk

Please help me out.
Please suggest any other way if this doesn't work out.

Thanks,
Sita.
Peter S. (Guest)
on 2008-11-17 21:59
(Received via mailing list)
require 'rubygems'
require 'scrubyt'

google_data = Scrubyt::Extractor.define do
   fetch 'http://www.google.com/search?hl=en&q=gap+inc'

   link_title "//a[@class='l']", :write_text => true do
     link_url
   end
   next_page "Next", :limit => 3
end

File.open("google_results.txt", 'w') do |f|
   google_data.to_hash.each do |result|
     f.puts "#{result[:link_title]} - #{result[:link_url]}"
   end
end

produces:

Shop clothes for women, men, maternity, baby, and kids at gap.com ... - http://www.gap.com/
Gap Inc. - http://www.gapinc.com/
Gap Inc. - Careers - http://www.gapinc.com/public/Careers/careers.shtml
The Gap Inc. News - The New York Times - http://topics.nytimes.com/top/news/business/compan...
Gap (clothing retailer) - Wikipedia, the free encyclopedia - http://en.wikipedia.org/wiki/Gap_(clothing)
GPS: Summary for GAP INC - Yahoo! Finance - http://finance.yahoo.com/q?s=gps
GPS - BloggingStocks - http://gps.bloggingstocks.com/
....
....
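
The write-out step does not depend on scRUBYt! itself. Here is a minimal sketch of it on its own, assuming the results arrive as an array of hashes keyed by :link_title and :link_url (the shape the loop above expects). The sample data and the helper write_results are made up for illustration.

```ruby
require 'stringio'

# Made-up sample data: one hash per result,
# keyed by :link_title and :link_url.
SAMPLE_RESULTS = [
  { link_title: 'Gap Inc.', link_url: 'http://www.gapinc.com/' },
  { link_title: 'Example', link_url: 'http://example.com/' }
].freeze

# Write one "title - url" line per result to any IO-like object.
# (Hypothetical helper; the name is not part of any library.)
def write_results(io, results)
  results.each do |result|
    io.puts "#{result[:link_title]} - #{result[:link_url]}"
  end
end

# An in-memory buffer is used here; File.open(path, 'w') { |f| ... }
# would hand a real file to the same method.
buffer = StringIO.new
write_results(buffer, SAMPLE_RESULTS)
print buffer.string
```

Taking an IO object rather than a file name keeps the formatting separate from where the text ends up, so the same method can write to a file, to $stdout or to a buffer.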



HTH,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org
Sita Rami R. (Guest)
on 2008-11-17 22:43
(Received via mailing list)
Thank you very much, Peter. It served my purpose.
Peter S. (Guest)
on 2008-11-17 22:48
(Received via mailing list)
> Thank you very much, Peter. It served my purpose.

That's great to hear :) If you have any scRUBYt!/scraping related
questions, don't hesitate to ask.

Cheers,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org
Sita Rami R. (Guest)
on 2008-11-18 01:22
(Received via mailing list)
Peter,
  Where can I find some good material on scRUBYt!/Ruby? Any preferred
sites?

Thanks,
Sita.
Peter S. (Guest)
on 2008-11-18 01:58
(Received via mailing list)
http://scrubyt.org - check out the older posts dealing with creating
scrapers for different pages.
Also check out the examples:
http://rubyforge.org/frs/download.php/46812/scruby...

more is on the way...

Cheers,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org
Vipin V. (Guest)
on 2008-12-05 08:55
Hi Peter,

I need to fetch some information from http://www.ebay.in.
The fields I need are: the name of the product, its image, its price,
and the link to that product.

I am able to get the data using this method:
require 'rubygems'
require 'scrubyt'

google_data = Scrubyt::Extractor.define do
  fetch          'http://www.ebay.in'
  fill_textfield 'satitle', 'ipod shuffle'
  submit

  record "/html/body/div[2]/div[4]/div[2]/div/div/div[2]/div[2]/div/div/div[3]/div/div/table/tr" do
    name  "/td[2]/div/a"
    price "/td[5]"
    image "/td/a/img" do
      url "src", :type => :attribute
    end
    link "/td[2]/div/a" do
      url "href", :type => :attribute
    end
  end
end

google_data.to_xml.write($stdout, 1)

My problem is that for some products it does not work properly (the div
structure may be different). Is there a better solution for this?

Thanks in advance,
Vipin
Peter S. (Guest)
on 2008-12-05 11:13
(Received via mailing list)
See my other post...


Cheers,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org
Remco S. (Guest)
on 2009-02-05 23:32

I also want to store each result's position on the Google results page.
Example:
rank 1 - Title - url

How can I change the code to do this?

grtz..remco
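
One possible approach, sketched under assumptions: if the extracted results come back as an array of hashes in the same order as they appear on the results pages (not confirmed anywhere in this thread), the rank is simply each item's position in that array. The sample data and the helper ranked_lines are made up for illustration.

```ruby
# Made-up results, assumed to be in the order they were extracted.
RANK_SAMPLE = [
  { link_title: 'First hit', link_url: 'http://example.com/1' },
  { link_title: 'Second hit', link_url: 'http://example.com/2' },
  { link_title: 'Third hit', link_url: 'http://example.com/3' }
].freeze

# Build "rank N - title - url" lines; ranks start at 1.
# (Hypothetical helper; the name is not part of any library.)
def ranked_lines(results)
  results.each_with_index.map do |result, index|
    "rank #{index + 1} - #{result[:link_title]} - #{result[:link_url]}"
  end
end

puts ranked_lines(RANK_SAMPLE)
```

If results from several pages are collected into one array, the numbering continues across pages only as long as the pages are appended in order.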