Data extraction using Scrubyt

Hi All,

I need to fetch some information from http://www.ebay.in.
The fields I need are the product name, image, price and the link
to that product.

I am able to get the data using this method:

require 'rubygems'
require 'scrubyt'

google_data = Scrubyt::Extractor.define do
  fetch 'http://www.ebay.in'
  fill_textfield 'satitle', 'ipod shuffle'
  submit

  # One record per result row, located by an absolute path from the root
  record "/html/body/div[2]/div[4]/div[2]/div/div/div[2]/div[2]/div/div/div[3]/div/div/table/tr" do
    name "/td[2]/div/a"
    price "/td[5]"
    # Take the attribute value rather than the element text
    image "/td/a/img" do
      url "src", :type => :attribute
    end
    link "/td[2]/div/a" do
      url "href", :type => :attribute
    end
  end
end

google_data.to_xml.write($stdout, 1)

But my problem is that it does not work properly for some products
(the div structure may be different). Is there a better solution for this?

Thanks in advance,
Vipin

You need to create smarter XPaths, relying on CSS id/class attributes
or other properties rather than a full XPath from the root - for
example:

require 'rubygems'
require 'scrubyt'

ebay_data = Scrubyt::Extractor.define do

  fetch 'http://www.ebay.in/'
  fill_textfield 'satitle', 'ipod'
  submit

  record "//table[@class='nol']" do
    name "//td[@class='details']/div/a"
  end

end

puts ebay_data.to_xml

etc.

This way your scraper will be more robust and less prone to breaking
when the page changes.
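
To see why this matters, here is a minimal sketch using REXML from the Ruby standard distribution instead of scRUBYt. The two page snippets are made up for illustration (they are not real eBay markup, and the constant and method names are hypothetical); the second one only adds a wrapper div around the table. The absolute path stops matching, while the class-based path keeps working:

```ruby
require 'rexml/document'

# Hypothetical markup: the same result row before and after a layout change.
PAGE_V1 = <<~HTML
  <html><body>
    <div><table class="nol"><tr>
      <td class="details"><div><a href="/item/1">iPod shuffle 1GB</a></div></td>
    </tr></table></div>
  </body></html>
HTML

# Same content, but the table is now nested one <div> deeper.
PAGE_V2 = <<~HTML
  <html><body>
    <div><div id="wrapper"><table class="nol"><tr>
      <td class="details"><div><a href="/item/1">iPod shuffle 1GB</a></div></td>
    </tr></table></div></div>
  </body></html>
HTML

ABSOLUTE_XPATH  = '/html/body/div/table/tr/td/div/a'
ATTRIBUTE_XPATH = "//td[@class='details']/div/a"

# Returns the text of every node that xpath matches in markup.
def names_for(markup, xpath)
  doc = REXML::Document.new(markup)
  REXML::XPath.match(doc, xpath).map(&:text)
end

p names_for(PAGE_V1, ABSOLUTE_XPATH)   # matches on the original layout
p names_for(PAGE_V2, ABSOLUTE_XPATH)   # empty: the extra div breaks the path
p names_for(PAGE_V1, ATTRIBUTE_XPATH)  # matches
p names_for(PAGE_V2, ATTRIBUTE_XPATH)  # still matches after the change
```

The class-based expression only breaks if the site renames the class itself, which tends to happen less often than changes to the nesting of layout divs.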

HTH,
Peter


http://www.rubyrailways.com
http://scrubyt.org

On 2008.12.06., at 4:46, Vipin Vm wrote:

Hi Peter,

Thanks for the help… it's working fine :)

Glad that I could help. I am just working on a new release btw, so
stay tuned!

Cheers,
Peter


