Forum: Ruby Getting Response from HTTPS POST

Matt White (Guest)
on 2007-05-31 19:30
(Received via mailing list)
Hello,

I am writing a crawler to parse web pages. One site that I am crawling
requires me to log in, so I send an HTTPS POST to log in. However, once
I send the POST I can't get any further, because every request needs a
valid session id in the URL. If I log in using Firefox, the session id is
appended to the URL of every page that I visit (something like
http://blah.com/page?sessid=5438729057). How can I get this session id
so that I can append it to my URLs and crawl the site? The site used to
send the session id in a cookie, but it no longer uses cookies (you
will still see the attempt to read the cookie in this code). Here is
what I have:

    require 'net/https'
    require 'uri'
    require 'cgi' # needed for CGI::escape below

      url = '<appropriate URL here>'
      uri = URI.parse(url)
      http = Net::HTTP.new(uri.host, uri.port)
      http.use_ssl = uri.scheme == 'https'
      http.verify_mode = OpenSSL::SSL::VERIFY_NONE

      response = self.get_data(http, uri, headers)
      page = response.body

      # Grab the hidden __VIEWSTATE field from the page
      view_state = CGI::escape(page[/<input type="hidden" name="__VIEWSTATE" value="([^"]*)"/, 1])
      post_data = '<post data here>'

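      # On current Ruby, Net::HTTP#post returns only the response object;
      # the body is available as login_response.body
      login_response = http.post('<appropriate path here>', post_data, headers)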

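      cookie = nil
      location = nil
      login_response.each_header do |name, value|
        # Keep only the name=value part; a cookie without attributes has no ';'
        cookie = value.split(';').first if name == 'set-cookie'
        location = value if name == 'location'
      end

      # Only send a Cookie header if the server actually set one
      headers['Cookie'] = cookie if cookie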

      if location
        homepage = get_data(http, URI.parse('<appropriate URI here>' + location), headers).body
      else
        homepage = get_data(http, URI.parse('<default URI here>'), headers).body
      end

      start_with_homepage(homepage, http, headers)

Thanks,
Matt
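
One possible direction, offered only as a sketch: if the site puts the session id into the redirect target it sends back after the login POST (the `location` value the code above already reads), the id could be pulled out of that URL's query string and re-appended to later URLs. Whether this site actually does that is an assumption; it may instead embed the id only in links inside the response body. The helper names `extract_session_id` and `with_session_id` are hypothetical, and the parameter name `sessid` is taken from the example URL in the post.

```ruby
require 'uri'

# Hypothetical helper: return the value of the session parameter found in a
# URL or redirect target, or nil if the parameter is absent.
# The default name 'sessid' comes from the example URL in the post.
def extract_session_id(location, param = 'sessid')
  return nil if location.nil? || location.empty?

  query = URI.parse(location).query
  return nil unless query

  pair = URI.decode_www_form(query).assoc(param)
  pair && pair.last
end

# Hypothetical helper: return `url` with the session parameter set to
# `session_id`, replacing any existing value of that parameter.
def with_session_id(url, session_id, param = 'sessid')
  uri = URI.parse(url)
  pairs = URI.decode_www_form(uri.query || '')
  pairs.reject! { |key, _| key == param }
  pairs << [param, session_id]
  uri.query = URI.encode_www_form(pairs)
  uri.to_s
end

sid = extract_session_id('http://blah.com/page?sessid=5438729057')
puts sid                                            # prints 5438729057
puts with_session_id('http://blah.com/other', sid)  # prints http://blah.com/other?sessid=5438729057
```

In the code above, `extract_session_id(location)` could be called right after the `each_header` loop. If it returns nil, the id is not in the redirect, and the next place to look would be the links in `login_response.body`.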