Forum: Ruby How to extract links of a particular class type

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Sita Rami Reddy (Guest)
on 2008-11-17 19:21
(Received via mailing list)
I have a web page which has any number of links.
The only way I can differentiate the links is by their class attribute.
I need to extract the set of links, and their titles, for a particular
class type.

I tried using the scrubyt extractor, but I have no idea where to specify
the class type.

google_data = Scrubyt::Extractor.define do
  fetch 'http://www.google.com/'
  fill_textfield 'q', 'ruby'
  submit
  link "Ruby programming language" do
    url "href", :type => :attribute
  end
end
junk = google_data.to_xml


And how do I get the output in text/string format?
Peter Szinek (Guest)
on 2008-11-17 20:03
(Received via mailing list)
On 2008.11.17., at 19:17, Sita Rami Reddy wrote:

> google_data = Scrubyt::Extractor.define do
>   fetch 'http://www.google.com/'
>   fill_textfield 'q', 'ruby'
>   submit
>   link "Ruby programming language" do
>     url "href", :type => :attribute
>   end
> end
> junk = google_data.to_xml
>
> And how do I get the output in text/string format?

BTW, you should get the newest scRUBYt!, 0.4.05, which does *not*
depend on RubyInline, Ruby2Ruby, ParseTree etc.

What would you like to do exactly?

1) class: use an XPath expression like this: stuff "//td[@class='red']"
2) text/string: use to_hash instead of to_xml.
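[Editor's note: scrubyt is no longer maintained, but point 1 — selecting elements by their class attribute with an XPath predicate — works the same way in plain Ruby. A minimal sketch using the bundled REXML library; the markup below is made up for illustration:]

```ruby
require 'rexml/document'

# Toy markup standing in for a fetched page; a real scraper would
# download the HTML first.
html = <<~HTML
  <table>
    <tr><td class='red'>first</td><td class='blue'>skip me</td></tr>
    <tr><td class='red'>second</td></tr>
  </table>
HTML

doc = REXML::Document.new(html)

# The [@class='red'] predicate keeps only cells with that exact class.
red_cells = REXML::XPath.match(doc, "//td[@class='red']").map(&:text)
puts red_cells.inspect
```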

HTH,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org
Sita Rami Reddy (Guest)
on 2008-11-17 20:35
(Received via mailing list)
My program needs to do the following:
Navigate to the Google site, provide "ruby" as the search text, and click
the search button.
Now we get the results page showing the first 10 results.

I would like to collect those 10 links and the titles of those links and
log them to an output file.
Using the scrubyt extractor I achieved something — I got all those 10
links captured — but I am unable to get the titles.
I also know how to extract in XML format...

but I need it this way: each title and its link on a single line.

My script goes here:

require 'rubygems'
require 'scrubyt'

google_data = Scrubyt::Extractor.define do
  # Perform the action(s)
  fetch 'http://www.google.com/'
  fill_textfield 'q', 'Gap Inc'
  submit
  # Construct the wrapper
  link "gap" do
    url "href", :type => :attribute
  end
  next_page "Next", :limit => 10
end
junk = google_data.to_xml
puts junk

Please help me out.
Suggest another way if this doesn't work out.

Thanks,
Sita.
Peter Szinek (Guest)
on 2008-11-17 20:59
(Received via mailing list)
require 'rubygems'
require 'scrubyt'

google_data = Scrubyt::Extractor.define do
   fetch 'http://www.google.com/search?hl=en&q=gap+inc'

   link_title "//a[@class='l']", :write_text => true do
     link_url
   end
   next_page "Next", :limit => 3
end

output_file = open("google_results.txt", 'w') do |f|
   google_data.to_hash.each do |result|
     f.puts "#{result[:link_title]} - #{result[:link_url]}"
   end
end

produces:

Shop clothes for women, men, maternity, baby, and kids at gap.com ...
- http://www.gap.com/
Gap Inc. - http://www.gapinc.com/
Gap Inc. - Careers - http://www.gapinc.com/public/Careers/careers.shtml
The Gap Inc. News - The New York Times -
http://topics.nytimes.com/top/news/business/compan...
Gap (clothing retailer) - Wikipedia, the free encyclopedia -
http://en.wikipedia.org/wiki/Gap_(clothing)
GPS: Summary for GAP INC - Yahoo! Finance -
http://finance.yahoo.com/q?s=gps
GPS - BloggingStocks - http://gps.bloggingstocks.com/
....
....



HTH,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org
Sita Rami Reddy (Guest)
on 2008-11-17 21:43
(Received via mailing list)
Thank you very much, Peter. It served my purpose.
Peter Szinek (Guest)
on 2008-11-17 21:48
(Received via mailing list)
> Thank you very much, Peter. It served my purpose.

That's great to hear :) If you have any scRUBYt!/scraping-related
questions, don't hesitate to ask.

Cheers,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org
Sita Rami Reddy (Guest)
on 2008-11-18 00:22
(Received via mailing list)
Peter,
  Where can I find some good material relating to scRUBYt!/Ruby? Any
preferred sites?

Thanks,
Sita.
Peter Szinek (Guest)
on 2008-11-18 00:58
(Received via mailing list)
http://scrubyt.org - check out the older posts dealing with creating
scrapers for different pages.
Also check out the examples:
http://rubyforge.org/frs/download.php/46812/scruby...

More is on the way...

Cheers,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org
Vipin Vm (vmvipin)
on 2008-12-05 07:55
Hi Peter,

I need to fetch some information from http://www.ebay.in.
My required fields are: name of the product, image, price, and the link
to that product.

I am able to get the data using this method:

require 'rubygems'
require 'scrubyt'

google_data = Scrubyt::Extractor.define do
  fetch          'http://www.ebay.in'
  fill_textfield 'satitle', 'ipod shuffle'
  submit

  record "/html/body/div[2]/div[4]/div[2]/div/div/div[2]/div[2]/div/div/div[3]/div/div/table/tr" do
    name  "/td[2]/div/a"
    price "/td[5]"
    image "/td/a/img" do
      url "src", :type => :attribute
    end
    link "/td[2]/div/a" do
      url "href", :type => :attribute
    end
  end
end

google_data.to_xml.write($stdout, 1)

but my problem is that for some products it is not working properly (the
div structure may change). Is there a better solution for this?

Thanks in advance,
Vipin
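[Editor's note: absolute paths like /html/body/div[2]/... break whenever the page layout shifts. Anchoring each record on a stable class attribute and reading fields relative to it is usually more robust. A minimal sketch using Ruby's bundled REXML; the markup and class names ('item', 'title', 'price') are hypothetical, not eBay's real ones:]

```ruby
require 'rexml/document'

# Toy markup standing in for a search-results page; the class names
# here are made up for illustration only.
xml = <<~XML
  <div>
    <div class='item'>
      <a class='title' href='/p/1'>ipod shuffle 2GB</a>
      <span class='price'>Rs. 3,000</span>
    </div>
    <div class='item'>
      <a class='title' href='/p/2'>ipod shuffle 4GB</a>
      <span class='price'>Rs. 4,500</span>
    </div>
  </div>
XML

doc = REXML::Document.new(xml)

# Anchor each record on the class attribute instead of an absolute
# /html/body/div[2]/... path, then read the fields relative to it.
items = REXML::XPath.match(doc, "//div[@class='item']").map do |item|
  link = REXML::XPath.first(item, ".//a[@class='title']")
  { name:  link.text,
    url:   link.attributes['href'],
    price: REXML::XPath.first(item, ".//span[@class='price']").text }
end

items.each { |i| puts "#{i[:name]} | #{i[:price]} | #{i[:url]}" }
```

If eBay rearranges its wrapper divs, this selector still finds the records, because it depends only on the class attribute of each item.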
Peter Szinek (Guest)
on 2008-12-05 10:13
(Received via mailing list)
See my other post...


Cheers,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org
Remco Swoany (zwaan123)
on 2009-02-05 22:32

I also want to store the position on the Google results page. Example:
rank 1 - Title - URL

How can I fix this in the code?

grtz..remco
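[Editor's note: since scrubyt's to_hash returns an array of records, a rank can be added simply by enumerating that array with each_with_index. A minimal sketch with placeholder results standing in for the extractor's output:]

```ruby
# Placeholder results standing in for google_data.to_hash output;
# a real run would get these from the extractor instead.
results = [
  { link_title: 'Ruby Programming Language', link_url: 'http://www.ruby-lang.org/' },
  { link_title: 'Ruby on Rails',             link_url: 'http://rubyonrails.org/' }
]

# each_with_index counts from 0, so add 1 for a human-friendly rank.
lines = results.each_with_index.map do |result, index|
  "rank #{index + 1} - #{result[:link_title]} - #{result[:link_url]}"
end

puts lines
```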
This topic is locked and cannot be replied to.