Forum: Ruby Data extraction using Scrubyt

Vipin V. (Guest)
on 2008-12-05 09:08
Hi All,

I need to fetch some information from http://www.ebay.in.
My required fields are: name of the product, image, price and the link
to that product.

I am able to get the data using this method:
require 'rubygems'
require 'scrubyt'

google_data = Scrubyt::Extractor.define do
  fetch          'http://www.ebay.in'
  fill_textfield 'satitle', 'ipod shuffle'
  submit
  record "/html/body/div[2]/div[4]/div[2]/div/div/div[2]/div[2]/div/div/div[3]/div/div/table/tr" do
    name "/td[2]/div/a"
    price "/td[5]"
    image "/td/a/img" do
        url "src", :type => :attribute
    end
    link "/td[2]/div/a" do
        url "href", :type => :attribute
    end
  end
end

google_data.to_xml.write($stdout, 1)

But my problem is that for some products it's not working properly (the
div structure may be different). Is there any better solution for this?

Thanks in advance,
Vipin
Peter S. (Guest)
on 2008-12-05 11:13
(Received via mailing list)
You need to create smarter XPaths, relying on CSS id/class attributes
or other properties rather than a full XPath from the root - for
example:

require 'rubygems'
require 'scrubyt'

ebay_data = Scrubyt::Extractor.define  do

      fetch 'http://www.ebay.in/'
      fill_textfield 'satitle', 'ipod'
      submit

      record "//table[@class='nol']" do
        name "//td[@class='details']/div/a"
      end
end

puts ebay_data.to_xml

etc.

This way your scraper will be more robust and less prone to breaking
when the page changes.
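
To illustrate why an attribute-based XPath tends to survive layout changes, here is a minimal sketch that does not use scRUBYt! or the live site at all. It uses Ruby's REXML on two made-up page fragments; the markup, the constant names and the names helper are assumptions for illustration only, and only the class names 'nol' and 'details' are taken from the example above.

```ruby
require 'rexml/document'

# Two hypothetical result pages with the same logical content.
# PAGE_B wraps the table in one extra <div>, as a redesign might.
PAGE_A = <<~HTML
  <html><body>
    <div>
      <table class="nol"><tr>
        <td class="details"><div><a href="/item/1">iPod shuffle 1GB</a></div></td>
      </tr></table>
    </div>
  </body></html>
HTML

PAGE_B = <<~HTML
  <html><body>
    <div><div>
      <table class="nol"><tr>
        <td class="details"><div><a href="/item/1">iPod shuffle 1GB</a></div></td>
      </tr></table>
    </div></div>
  </body></html>
HTML

# Full path from the root: depends on every ancestor staying in place.
ABSOLUTE_XPATH  = '/html/body/div/table/tr/td/div/a'
# Anchored on a class attribute: ignores how deeply the cell is nested.
ATTRIBUTE_XPATH = "//td[@class='details']/div/a"

# Returns the text of every node matched by xpath in the given markup.
def names(markup, xpath)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, xpath).map(&:text)
end

p names(PAGE_A, ABSOLUTE_XPATH)
p names(PAGE_B, ABSOLUTE_XPATH)
p names(PAGE_A, ATTRIBUTE_XPATH)
p names(PAGE_B, ATTRIBUTE_XPATH)
```

In this sketch the absolute path should find the product name only on the first page, because the extra wrapper div in the second page no longer matches it, while the attribute-based path should find the name on both.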

HTH,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org
Vipin V. (Guest)
on 2008-12-06 05:52
Hi Peter,

Thanks for the help... it's working fine :)

Vipin

Peter S. (Guest)
on 2008-12-06 09:31
(Received via mailing list)
On 2008.12.06., at 4:46, Vipin Vm wrote:

> Hi Peter,
>
> Thanks for the Help... its working fine :)

Glad that I could help. I am just working on a new release btw, so
stay tuned!

Cheers,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org