OK guys, let’s say I wanted to grab the source for google.com or
something… it won’t allow it unless I send the correct headers to spoof
the program… Can anyone give me a working example of how to send
headers and download a webpage’s source?
I tried looking through all of the docs and coming up with something, but
I failed…
With open-uri[0] you can open URIs just like local files. That would be
entirely sufficient to get the content of the index page of google.com,
for example. Instead of a plain URL string you can also pass the open call a
URI[1] object, and in either case you can set request headers explicitly
by passing them as a hash of options if you need to.
You could then also use Hpricot[2] to do all sorts of nifty HTML
parsing.
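A minimal sketch of the open-uri approach, assuming the server only checks a few common request headers. The header values, the BROWSER_HEADERS constant and the fetch_source helper are made up for illustration; copy the real values from whatever your browser actually sends.

```ruby
require "open-uri"

# open-uri treats String keys in the options hash as HTTP request headers.
# These values are illustrative placeholders, not a real browser's headers.
BROWSER_HEADERS = {
  "User-Agent"      => "Mozilla/5.0 (X11; Linux x86_64) ExampleFetcher/0.1",
  "Accept"          => "text/html",
  "Accept-Language" => "en-US,en;q=0.8"
}

# Fetch a page with the given headers and return its body as a String.
# (fetch_source is a hypothetical helper name, not part of open-uri.)
def fetch_source(url, headers = BROWSER_HEADERS)
  URI.parse(url).open(headers) { |io| io.read }
end

# Only touches the network when run directly with a URL argument,
# e.g. `ruby fetch.rb http://www.google.com/`.
if __FILE__ == $0 && ARGV[0]
  puts fetch_source(ARGV[0])
end
```

Going through URI.parse and then calling open on the result keeps this working on newer Rubies, where open-uri no longer patches Kernel#open.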
LHH shows all HTTP chatter, so there’s nothing that a server can see
that you can’t. From there it’s just a matter of imitating the headers
with Net::HTTP.
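Roughly, imitating the headers with Net::HTTP could look like the sketch below. REQUEST_HEADERS, build_request and fetch are hypothetical names, and the header values are placeholders to be replaced with the ones you saw your browser send.

```ruby
require "net/http"
require "uri"

# Placeholder headers; substitute the ones captured from a real browser session.
REQUEST_HEADERS = {
  "User-Agent"      => "Mozilla/5.0 (X11; Linux x86_64) ExampleFetcher/0.1",
  "Accept"          => "text/html,application/xhtml+xml",
  "Accept-Language" => "en-US,en;q=0.8"
}

# Build a GET request carrying the given headers. Nothing is sent yet.
def build_request(uri, headers = REQUEST_HEADERS)
  request = Net::HTTP::Get.new(uri.request_uri)
  headers.each { |name, value| request[name] = value }
  request
end

# Send the request and return the Net::HTTPResponse object.
def fetch(url, headers = REQUEST_HEADERS)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.start { |conn| conn.request(build_request(uri, headers)) }
end

# Only touches the network when run directly with a URL argument.
if __FILE__ == $0 && ARGV[0]
  response = fetch(ARGV[0])
  puts response.code
  puts response.body
end
```

Note that this does not follow redirects; a 301 or 302 response comes back as-is, with the target in response["Location"].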
Remember, though, that you have some vague sort of obligation to
maintain netiquette. If a server rejects automated requests, they may
have a good reason to, and you’re going against their wishes by mimicking
a real browser. I doubt the Feds are going to come kicking your door
in over it, but it’s still worth trying to be respectful.
Google, for example, has an API that they encourage for automated
usage. Here are some details:
-rking