How to read HTML from a password-protected page?

Hello,

I have a method that gets the html off a page. I give it a word to
look up, and it returns the results. I just switched to read from a
password-protected page, and I can’t figure out how to send along the
password so I can get authorized to read the page. Anyone know how to
do this?

def getWordResults(word)
  require 'open-uri'
  begin
    @page = open("http://joe.com/find?word=#{word}&lb=1&user=joe").read
  rescue # if page won't open
    nil
  end
end

Joe P. wrote:

If it helps at all, I can get to the page using a browser and manually
entering the user and password. Does anyone know how to do that
automatically?

If it is using HTTP authentication, you can put the credentials in the
URL itself, in the form
http://user:password@host/path?restofurl
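One caveat for the open-uri code in the question: as far as I know, open-uri rejects URLs that carry embedded credentials and expects them through its :http_basic_authentication option instead. Below is a minimal sketch of that approach, assuming the page uses HTTP Basic authentication and Ruby 2.5 or later (for URI.open). The helper name word_url and the user and password parameters are illustrative, not from the original post.

```ruby
require 'open-uri'
require 'uri'

# Build the lookup URL, escaping the word so spaces and symbols are safe.
# (word_url is a hypothetical helper; the host and parameters come from the question.)
def word_url(word, user)
  query = URI.encode_www_form(word: word, lb: 1, user: user)
  "http://joe.com/find?#{query}"
end

# Fetch the page, passing Basic credentials as an open-uri option.
# Returns nil on an HTTP error status or a socket-level failure.
def get_word_results(word, user, password)
  URI.open(word_url(word, user), http_basic_authentication: [user, password]).read
rescue OpenURI::HTTPError, SocketError, SystemCallError
  nil
end
```

Calling get_word_results('ruby', 'joe', 'thePassw0rd') would then return the HTML as a string, or nil if the request fails. On older Rubies the same option can be passed to plain open.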


Chris

On Nov 14, 2007, at 12:45 PM, Chris E. wrote:

Hello,
def getWordResults(word)
entering the user and password. Does anyone know how to do that
automatically?

You can use httpclient for this kind of thing:

require 'rubygems'
require 'httpclient'

url = "http://joe.com/find?word=#{word}&lb=1&user=joe"
client = HTTPClient.new
client.debug_dev = STDOUT if $DEBUG
client.set_auth(url, 'joe', 'thePassw0rd')
resp = client.get(url)
@page = resp.content

You might have to make the first arg to .set_auth be just the URL
without the query string.

Also, if the service doesn’t send back an authorization request (which
it sounds like it does since the browser asks you), then HTTPClient
won’t volunteer the credentials you’ve provided.

You can check for failure with something like: if resp.status != 200
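If the server never sends an authorization challenge, one alternative is to attach the credentials up front. Here is a rough sketch using only the standard library's Net::HTTP, assuming HTTP Basic authentication; the method names build_lookup_request and fetch_word_results are made up for illustration. It also folds in the status check, returning the body only on a 2xx response.

```ruby
require 'net/http'
require 'uri'

# Build a GET request that already carries a Basic Authorization header,
# so it does not depend on the server sending a challenge first.
# (Hypothetical helper; host and query parameters are taken from the question.)
def build_lookup_request(word, user, password)
  uri = URI('http://joe.com/find')
  uri.query = URI.encode_www_form(word: word, lb: 1, user: user)
  request = Net::HTTP::Get.new(uri)
  request.basic_auth(user, password)
  [uri, request]
end

# Send the request; return the body on a 2xx response, nil otherwise.
def fetch_word_results(word, user, password)
  uri, request = build_lookup_request(word, user, password)
  response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
  response.is_a?(Net::HTTPSuccess) ? response.body : nil
end
```

Note that Basic credentials are only base64-encoded, not encrypted, so over plain http they are readable on the wire.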

-Rob

Rob B. http://agileconsultingllc.com