Forum: Ruby What's the Best Way to Mimic an HTTP Request?

Daniel M. (Guest)
on 2008-11-05 18:26
I'm trying to write a tool that will take a domain as an argument and
make a request to http://onsamehost.com and then capture the list of
domains that share that same IP. I want to parse out those IPs and put
them into an array that I can print to a file later.

Here's the code I'm trying to use:

--
require 'net/http'
require 'uri'

PATH = '/query.jsp'
USERAGENT = 'Opera'
HOST = 'onsamehost.com'

@http = Net::HTTP.new(HOST, 80)

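# On Ruby 1.9 and later the request returns a single response object;
# the body is read from it with resp.body.
resp = @http.get2(PATH, {'User-Agent' => USERAGENT})

puts resp
puts resp.body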
--

The problem is that I keep getting a redirect
(#<Net::HTTPMovedPermanently:0xb7c35ffc>), which doesn't happen when I
make the request from a regular browser.

So I sniffed the regular request with wireshark, and a browser sends a
bunch of additional headers when it makes the request. Cookies,
referrer, etc.

Are any of these headers more necessary than others, and is there a
preferred way to send the headers using Ruby?

Thanks for any thoughts...
James H. (Guest)
on 2008-11-05 18:32
(Received via mailing list)
Is there a Ruby front end for Curl?

James
Hassan S. (Guest)
on 2008-11-05 18:50
(Received via mailing list)
On Wed, Nov 5, 2008 at 8:24 AM, Daniel M. <removed_email_address@domain.invalid>
wrote:

> The problem is that I keep getting a redirect
> (#<Net::HTTPMovedPermanently:0xb7c35ffc>), which doesn't happen when I
> make the request from a regular browser.

Actually, it does -- you just don't see it.

When you request e.g. `http://example.com`, most servers will send
a redirect to the default page, e.g. `http://example.com/index.html`.

You need to either handle it or pass the default page's full URL.
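
To see where a redirect points before deciding how to handle it, a rough sketch like the following may help. The `redirect_target` helper is a made-up name for illustration, not part of Net::HTTP, and the live request is left as a comment so nothing is fetched when the snippet is loaded:

```ruby
require 'net/http'
require 'uri'

# Return the Location header of a redirect response,
# or nil when the response is not a redirect at all.
def redirect_target(response)
  response.is_a?(Net::HTTPRedirection) ? response['location'] : nil
end

# Example against a live server (not run here):
#   response = Net::HTTP.get_response(URI.parse('http://example.com'))
#   puts "#{response.code} -> #{redirect_target(response).inspect}"
```

If the helper returns a URL, that URL can be requested directly, or the redirect can be followed in code.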

HTH,
Michael L. (Guest)
on 2008-11-05 18:59
(Received via mailing list)
On Wed, Nov 5, 2008 at 10:24 AM, Daniel M. 
<removed_email_address@domain.invalid>
wrote:

> The problem is that I keep getting a redirect
> (#<Net::HTTPMovedPermanently:0xb7c35ffc>), which doesn't happen when I
> make the request from a regular browser.

That site makes heavy use of redirects. Watch closely while running
queries or check your browser history.

> So I sniffed the regular request with wireshark, and a browser sends a
> bunch of additional headers when it makes the request. Cookies,
> referrer, etc.
>
> Are any of these headers more necessary than others, and is there a
> preferred way to send the headers using Ruby?

Headers probably have no effect here.

What you probably want is code like this:

    require 'net/http'
    require 'uri'

    def fetch(uri_str, limit = 10)
      # You should choose a better exception.
      raise ArgumentError, 'HTTP redirect too deep' if limit == 0

      response = Net::HTTP.get_response(URI.parse(uri_str))
      case response
      when Net::HTTPSuccess     then response
      when Net::HTTPRedirection then fetch(response['location'], limit - 1)
      else
        response.error!
      end
    end

    resp = fetch('http://www.ruby-lang.org')
    puts resp.body

(from http://ruby-doc.org/stdlib/libdoc/net/http/rdoc/index.html --
"Following Redirection")
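
A possible variation on the same idea, as a sketch only: it sends the same headers (for example a User-Agent) on every hop and resolves relative Location values with `URI.join`. The names `fetch_with_headers` and `resolve_location` are invented for illustration and are not from the Net::HTTP docs, and whether this particular site needs any header at all is untested.

```ruby
require 'net/http'
require 'uri'

# Resolve a Location header against the URL that produced it,
# so that relative redirects (a bare path) also work.
def resolve_location(current_uri, location)
  URI.join(current_uri.to_s, location)
end

# Follow redirects, sending the same request headers on every hop.
def fetch_with_headers(uri_str, headers = {}, limit = 10)
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0

  uri = URI.parse(uri_str)
  request = Net::HTTP::Get.new(uri.request_uri, headers)
  response = Net::HTTP.start(uri.host, uri.port,
                             use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end

  case response
  when Net::HTTPSuccess
    response
  when Net::HTTPRedirection
    next_uri = resolve_location(uri, response['location'])
    fetch_with_headers(next_uri.to_s, headers, limit - 1)
  else
    response.error!
  end
end
```

Calling `fetch_with_headers('http://onsamehost.com/query.jsp', {'User-Agent' => 'Opera'})` would reuse the host, path and user agent from the original post.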

regards,
Michael L.
Daniel M. (Guest)
on 2008-11-05 19:09
Thanks, much, Michael. Unfortunately I'm not quite tracking on why that
was necessary. It just seems a bit elaborate given what I thought was a
simple problem.

But I totally appreciate it...I just wish it were something simpler.
Michael L. (Guest)
on 2008-11-05 19:57
(Received via mailing list)
On Wed, Nov 5, 2008 at 11:08 AM, Daniel M. 
<removed_email_address@domain.invalid>
wrote:
> Thanks, much, Michael. Unfortunately I'm not quite tracking on why that
> was necessary. It just seems a bit elaborate given what I thought was a
> simple problem.
>
> But I totally appreciate it...I just wish it were something simpler.

The site you're hitting makes heavy use of redirects (and not really
for their intended purpose). What this means is that you submit your
request for a given URL and the server responds with a redirect and a
new URL. If you are working in a browser, your browser automatically
requests that URL, and the server again responds with a redirect and a
new URL. Again, a web browser handles requesting that next URL
automatically. This URL is the actual results page with the data you
want. It's the web site making you jump through hoops to get where you
want to go.

Net::HTTP does not have a built-in facility for following redirects
the way your browser does. So you have to write code to follow
redirects by submitting new requests until you get to one that is not
a redirect, which is what the fetch() method from the Net::HTTP
example does.

-Michael
Daniel M. (Guest)
on 2008-11-05 20:03
Michael L. wrote:

> The site you're hitting makes heavy use of redirects (and not really
> for their intended purpose). What this means is that you submit your
> request for a given URL and the server responds with a redirect and a
> new URL. If you are working in a browser, your browser automatically
> requests that URL, and the server again responds with a redirect and a
> new URL. Again, a web browser handles requesting that next URL
> automatically. This URL is the actual results page with the data you
> want. It's the web site making you jump through hoops to get where you
> want to go.

Ah, I see.

You appear, by my estimation, to rock.

: Daniel :
Avdi G. (Guest)
on 2008-11-05 20:03
(Received via mailing list)
You may want to look into using Mechanize rather than straight-up
Net::HTTP.

--
Avdi

Home: http://avdi.org
Developer Blog: http://avdi.org/devblog/
Twitter: http://twitter.com/avdi
Journal: http://avdi.livejournal.com
Daniel M. (Guest)
on 2008-11-05 20:13
Avdi G. wrote:
> You may want to look into using Mechanize rather than straight-up
> Net::HTTP.

Mechanize for Ruby? Interesting. I didn't know Ruby had an
implementation. Thanks, Avdi.
Uwe P. (Guest)
on 2008-11-21 11:30
Daniel M. wrote:
> The problem is that I keep getting a redirect
> (#<Net::HTTPMovedPermanently:0xb7c35ffc>), which doesn't happen when I
> make the request from a regular browser.
>
> So I sniffed the regular request with wireshark, and a browser sends a
> bunch of additional headers when it makes the request. Cookies,
> referrer, etc.
>
> Are any of these headers more necessary than others, and is there a
> preferred way to send the headers using Ruby?
>

We have had similar issues where we didn't see a redirect when sniffing
the browser but it happened for our code. The reason was HTTP/1.1.
With HTTP/1.1 it is required to specify the host you expect to be
talking with (as more than one virtual host may be serviced by one
server):
  GET / HTTP/1.1
  Host: www.apache.org
(see http://www.apacheweek.com/features/http11 for reference)
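
For what it's worth, Net::HTTP appears to fill in the Host header on its own, so it usually does not need to be set by hand. A small sketch, assuming Ruby 2.0 or later (where a request can be built from a URI), that shows the header without touching the network:

```ruby
require 'net/http'
require 'uri'

uri = URI.parse('http://www.apache.org/')

# Building the request from a URI fills in the Host header automatically.
request = Net::HTTP::Get.new(uri)
host_header = request['Host']

puts host_header
```

Setting `request['Host']` to another value before sending would override it, should a server expect a different virtual host name.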

Hope that helps in avoiding the redirect ;-)

Uwe