Forum: Ruby Data extraction using Scrubyt

Announcement (2017-05-07): www.ruby-forum.com is now read-only since I unfortunately do not have the time to support and maintain the forum any more. Please see rubyonrails.org/community and ruby-lang.org/en/community for other Rails- and Ruby-related community platforms.
Vipin Vm (vmvipin)
on 2008-12-05 08:08
Hi All,

I need to fetch some information from http://www.ebay.in.
My required fields are: the product name, image, price, and the link
to the product.

I am able to get the data using this method:
require 'rubygems'
require 'scrubyt'

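# Search eBay India for "ipod shuffle" and extract one record per result row
google_data = Scrubyt::Extractor.define do
  fetch          'http://www.ebay.in'
  fill_textfield 'satitle', 'ipod shuffle'   # type the query into the 'satitle' search box
  submit

  # Absolute XPath from the document root down to each result row
  record "/html/body/div[2]/div[4]/div[2]/div/div/div[2]/div[2]/div/div/div[3]/div/div/table/tr" do
    name "/td[2]/div/a"
    price "/td[5]"
    image "/td/a/img" do
      url "src", :type => :attribute    # the image's src attribute
    end
    link "/td[2]/div/a" do
      url "href", :type => :attribute   # the product link's href attribute
    end
  end
end

google_data.to_xml.write($stdout, 1)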

But my problem is that for some products it doesn't work properly (the
div structure may have changed). Is there a better solution for this?

Thanks in advance,
Vipin
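To make the failure mode concrete, here is a minimal sketch that uses Ruby's bundled REXML library instead of scrubyt (an assumption for illustration; it is not what the thread used). The markup is a hypothetical, simplified stand-in for a results page, not eBay's real HTML. Inserting a single new div before the content shifts the positional index, so an absolute path in the style of the extractor above stops matching:

```ruby
require 'rexml/document'

# Hypothetical, simplified results-page markup (NOT eBay's real HTML).
PAGE_V1 = <<~HTML
  <html><body>
    <div id="content">
      <table class="nol">
        <tr>
          <td><a href="/item/1"><img src="/img/1.jpg"/></a></td>
          <td class="details"><div><a href="/item/1">iPod shuffle 1GB</a></div></td>
          <td class="price">Rs. 2,999</td>
        </tr>
      </table>
    </div>
  </body></html>
HTML

# The "same" page after a layout change: a new div appears before the content.
PAGE_V2 = PAGE_V1.sub('<body>', '<body><div id="promo">Sale!</div>')

# Full path from the root with positional indexes, like the original extractor.
ABSOLUTE_XPATH = '/html/body/div[1]/table/tr'

# Count how many nodes an XPath matches in the given markup.
def count_rows(html, xpath)
  doc = REXML::Document.new(html)
  REXML::XPath.match(doc, xpath).size
end

puts count_rows(PAGE_V1, ABSOLUTE_XPATH)
puts count_rows(PAGE_V2, ABSOLUTE_XPATH)
```

The first call prints 1 and the second prints 0: the result row itself is unchanged, but `div[1]` now refers to the promo div.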
Peter Szinek (Guest)
on 2008-12-05 10:13
(Received via mailing list)
You need to create smarter XPaths that rely on CSS id/class attributes
or other properties, rather than a full XPath from the root. For
example:

require 'rubygems'
require 'scrubyt'

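ebay_data = Scrubyt::Extractor.define do

  fetch 'http://www.ebay.in/'
  fill_textfield 'satitle', 'ipod'
  submit

  # Anchor the record on the results table's class attribute
  # instead of a full path from the document root
  record "//table[@class='nol']" do
    name "//td[@class='details']/div/a"
  end
end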

puts ebay_data.to_xml

etc.

This way your scraper will be more robust and less prone to breaking when the page changes.

HTH,
Peter
___
http://www.rubyrailways.com
http://scrubyt.org
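Peter's advice can be sketched the same way, again with REXML standing in for scrubyt and with hypothetical markup. The `nol` and `details` class names come from Peter's example, but the surrounding HTML and the `price` class are invented for illustration. Unlike Peter's example, this sketch selects the table's rows so that each product becomes one record, and the four fields Vipin asked for (name, price, image, link) are looked up with paths relative to each row:

```ruby
require 'rexml/document'

# Hypothetical markup (NOT eBay's real HTML); the extra "promo" div
# simulates a layout change that would break an absolute, indexed path.
PAGE = <<~HTML
  <html><body>
    <div id="promo">Sale!</div>
    <div id="content">
      <table class="nol">
        <tr>
          <td><a href="/item/1"><img src="/img/1.jpg"/></a></td>
          <td class="details"><div><a href="/item/1">iPod shuffle 1GB</a></div></td>
          <td class="price">Rs. 2,999</td>
        </tr>
        <tr>
          <td><a href="/item/2"><img src="/img/2.jpg"/></a></td>
          <td class="details"><div><a href="/item/2">iPod shuffle 2GB</a></div></td>
          <td class="price">Rs. 3,999</td>
        </tr>
      </table>
    </div>
  </body></html>
HTML

# Anchor each record on the table's class attribute, one record per row.
RECORD_XPATH = "//table[@class='nol']/tr"

# Return one hash per product; field paths are relative to the row (no leading slash).
def extract(html)
  doc = REXML::Document.new(html)
  REXML::XPath.match(doc, RECORD_XPATH).map do |row|
    {
      name:  REXML::XPath.first(row, "td[@class='details']/div/a")&.text,
      price: REXML::XPath.first(row, "td[@class='price']")&.text,
      image: REXML::XPath.first(row, 'td/a/img/@src')&.value,
      link:  REXML::XPath.first(row, "td[@class='details']/div/a/@href")&.value
    }
  end
end

extract(PAGE).each { |item| p item }
```

In plain XPath, a path without a leading slash such as `td[@class='price']` is evaluated relative to the row, whereas one starting with `//` searches the whole document. The extra `promo` div in this markup does not affect any of these lookups, because nothing here depends on div positions.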
Vipin Vm (vmvipin)
on 2008-12-06 04:52
Hi Peter,

Thanks for the help... it's working fine :)

Vipin

Peter Szinek (Guest)
on 2008-12-06 08:31
(Received via mailing list)

Glad that I could help. I am just working on a new release btw, so
stay tuned!

Cheers,
Peter