Scraping Amazon pages with Nokogiri

addis_a · September 1, 2014, 2:05pm

ruby latest
nokogiri 1.6.3.1

I am trying to get the title and price from Amazon product pages, like this:

require 'open-uri'
require 'nokogiri'

url = 'Amazon.co.jp'

charset = nil
html = open(url) do |f|
  charset = f.charset
  f.read
end

doc = Nokogiri::HTML.parse(html, nil, charset)

p doc.title

The result is:

…1.0/open-uri.rb:353:in `open_http': 503 Service Unavailable (OpenURI::HTTPError)

Does that mean Amazon is refusing the scraping?

Is there any way to do it?

soujiro0725 · September 1, 2014, 11:17pm

Soichi I. wrote in post #1156501:

Does that mean Amazon is refusing the scraping?

When I run your code, I get:

"Amazon.co.jp： Advanced EXCEL: David Bolocan: 洋書"

10.5.4 503 Service Unavailable

The server is currently unable to handle the request due to a temporary
overloading or maintenance of the server. The implication is that this
is a temporary condition which will be alleviated after some delay. If
known, the length of the delay MAY be indicated in a Retry-After header.
If no Retry-After is given, the client SHOULD handle the response as it
would for a 500 response.
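
Since the 503 looks intermittent (the same code returned the title for me), one option is to retry after a delay, honouring Retry-After when the server sends it. Below is a minimal sketch of that idea, not a tested fix for Amazon specifically. The helper name fetch_with_retry, its parameters (max_attempts, default_wait, fetch, pause) and the default values are all made up for illustration. I have not checked whether Amazon actually sends a Retry-After header, and only the "number of seconds" form of the header is handled.

```ruby
require 'open-uri'

# Hypothetical helper (not part of open-uri): fetch a URL and retry on 503.
# `fetch` and `pause` are injectable so the retry logic can be exercised
# without touching the network or actually sleeping.
def fetch_with_retry(url, max_attempts: 3, default_wait: 5,
                     fetch: ->(u) { URI.parse(u).open(&:read) },
                     pause: ->(seconds) { sleep(seconds) })
  attempts = 0
  begin
    attempts += 1
    fetch.call(url)
  rescue OpenURI::HTTPError => e
    # e.io is the failed response; status is [code, message],
    # e.g. ["503", "Service Unavailable"].
    raise unless e.io.status.first == '503' && attempts < max_attempts

    # Header names in `meta` are lowercase. Retry-After can also be an
    # HTTP date; that form falls back to default_wait here.
    retry_after = e.io.meta['retry-after'].to_s
    wait = retry_after =~ /\A\d+\z/ ? retry_after.to_i : default_wait
    pause.call(wait)
    retry
  end
end
```

With that in place, html = fetch_with_retry(url) could replace the open(url) block in the original code, and the result can be handed to Nokogiri::HTML.parse as before. Any other HTTP error, or a 503 on the last attempt, is re-raised unchanged.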