Code duplication

Hi all,
The following code extracts the HTML contents of a website. I have
included handling for a URL redirect and for a BadRequest error.

require 'net/http'

# Get the HTTP response from 'uri'
response = Net::HTTP.get_response(uri)
case response
# If the URL redirects, fetch the contents of the redirected URL
when Net::HTTPRedirection
  uri = URI.parse(response['Location'])
  response = Net::HTTP.get_response(uri)
# In case of a bad request error
when Net::HTTPBadRequest
  http = Net::HTTP.start(uri.host, uri.port)
  # Get the HTML data by setting the path to '/' and sending a user agent
  response = http.get("/", "User-Agent" => "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)")
end

data = response.body

My tutor says that there is duplication in the above code, i.e. the
code for reading the HTML is specified twice without any purpose and
should be removed. I have no idea where the mistake is. I'm a newbie
to Ruby and I don't understand the problem or where things went
wrong. Can anyone please help me find the mistake?

Thanks in advance.

Regards
Arun

What is the purpose of the statement below?

response = http.get("/", "User-Agent" => "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)")

Since you have already obtained the response object using get_response,
why is it needed?

Hi,
Thanks for the reply. If I get a bad request error, I have to open a
connection to the host and port myself and then fetch the data. For
example, if I try to fetch the HTML contents from youtube.com, I get a
bad request error. So I used Net::HTTP.start() and then passed the path
and a user agent to retrieve the contents, storing the result in
response. I don't think there is any other way. If I remove that part,
I'm not able to read the HTML.

Thanks
Arun
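
If the 400 comes from the missing User-Agent header (an assumption, nothing in this thread confirms it), the header could be sent with the very first request, so no second request would be needed. The sketch below only builds the request object and makes no network call; build_get and USER_AGENT are hypothetical names used for illustration.

```ruby
require 'net/http'
require 'uri'

USER_AGENT = "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"

# Hypothetical helper: build a GET request that already carries the
# User-Agent header. URI::HTTP#request_uri falls back to "/" when the
# URI has no path, so a bare host such as youtube.com still works.
def build_get(uri)
  Net::HTTP::Get.new(uri.request_uri, { "User-Agent" => USER_AGENT })
end

uri = URI.parse("http://youtube.com")
request = build_get(uri)
puts request.path
puts request["User-Agent"]

# Sending it needs network access, so it is left commented out here:
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
```

With the header attached up front, the redirect case is the only one left that needs a follow-up request.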

On 6 Apr 2009, at 14:50, Arun K. wrote:

My tutor says that there is duplication in the above code, i.e. the
code for reading the HTML is specified twice without any purpose and
should be removed. I have no idea where the mistake is. I'm a newbie
to Ruby and I don't understand the problem or where things went
wrong. Can anyone please help me find the mistake?

There are probably better solutions, but the following illustrates the
point your tutor is making:

require 'net/http'

MOZILLA_HEADER = { "User-Agent" => "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)" }

def get_http_response(uri, max_redirects = 0)
  Net::HTTP.start(uri.host, uri.port) do |connection|
    # request_uri is "/" when the URI has no path
    response = connection.get(uri.request_uri, MOZILLA_HEADER)
    case response
    when Net::HTTPRedirection
      # Follow the redirect, using up one of the allowed redirects
      if max_redirects > 0
        get_http_response(URI.parse(response['Location']), max_redirects - 1)
      else
        raise "Too many redirects"
      end
    when Net::HTTPBadRequest
      # Retry against the root path; give up if this already was the root
      raise "Bad request" if uri.request_uri == "/"
      get_http_response(URI.parse("http://#{uri.host}:#{uri.port}/"), max_redirects)
    else
      response
    end
  end
end

data = get_http_response(my_uri, 3).body

See how get_http_response calls itself when the response is a redirect
or a bad request? This keeps the actual HTTP interaction code in one
place as well as elegantly handling redirects. Whilst this could result
in many more HTTP connections being used, the block form of
Net::HTTP.start also makes each one clean up after itself, which is
always good.
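
To watch the redirect limit at work without touching the network, the recursion can be exercised against canned responses. This is only a sketch: follow_redirects, redirect_to and the fetch callable are hypothetical names, not part of the code above or of Net::HTTP.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: `fetch` is any callable that takes a URI and
# returns a Net::HTTPResponse. Redirects are followed recursively.
def follow_redirects(uri, max_redirects, fetch)
  response = fetch.call(uri)
  return response unless response.is_a?(Net::HTTPRedirection)
  raise "Too many redirects" if max_redirects <= 0

  # URI.join resolves a relative Location against the current URI
  follow_redirects(URI.join(uri.to_s, response['Location']), max_redirects - 1, fetch)
end

# Build a canned 302 response pointing at `location`
def redirect_to(location)
  Net::HTTPFound.new('1.1', '302', 'Found').tap { |r| r['Location'] = location }
end

# Canned responses standing in for a server: /a -> /b -> /c (200 OK)
pages = {
  '/a' => redirect_to('/b'),
  '/b' => redirect_to('/c'),
  '/c' => Net::HTTPOK.new('1.1', '200', 'OK')
}
fake_fetch = ->(uri) { pages.fetch(uri.path) }

start = URI.parse('http://example.com/a')

# Two redirects fit inside a budget of three
puts follow_redirects(start, 3, fake_fetch).code

# A budget of one is used up before the final page is reached
begin
  follow_redirects(start, 1, fake_fetch)
rescue RuntimeError => e
  puts e.message
end
```

Passing the fetch step in as a callable is a design choice made here purely so the limit can be checked offline; in real use it would wrap a Net::HTTP request.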

Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net

raise ArgumentError unless @reality.responds_to? :reason

Thanks Ellie, you gave me a clue not only for solving the code
duplication but also for handling the redirects. Thanks a lot.
Regards
Arun

Eleanor McHugh wrote:

On 7 Apr 2009, at 05:31, Arun K. wrote:

Thanks Ellie, you gave me a clue not only for solving the code
duplication but also for handling the redirects. Thanks a lot.

My pleasure :)

Ellie

Hi Ellie,
Thank you once again for your reply. It helped me a lot. Now I have a
couple of questions for you.

  1. How can I specify the redirect limit without declaring it inside a
    method? Is it possible?
  2. By including a redirect limit, will I be able to make the code for
    URL redirection the most effective one, or should I include some
    additions to the code to handle redirection effectively?

Thanks
Arun

On 7 Apr 2009, at 12:18, Arun K. wrote:

Hi Ellie,
Thank you once again for your reply. It helped me a lot. Now I have a
couple of questions for you.

  1. How can I specify the redirect limit without declaring it inside a
    method? Is it possible?

The redirect limit isn't declared inside the method but as one of the
parameters of the method, which is what allows the recursive calls to
count down as each redirect is received. You'll note that I provided an
initial value as part of the initial function call:

data = get_http_response(my_uri, 3).body

but in a real-world program you would either specify a constant and use that:

MAXIMUM_REDIRECTS = 3
data = get_http_response(my_uri, MAXIMUM_REDIRECTS).body

or else wrap everything together into an object where this value would
be either an instance or class variable depending on your intent.
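
As a rough sketch of the object-based option, the limit could live in an instance variable set once in the constructor. PageFetcher, DEFAULT_MAX_REDIRECTS and the method names are hypothetical, and the sketch assumes plain HTTP (no TLS).

```ruby
require 'net/http'
require 'uri'

# Hypothetical wrapper class: the redirect limit is configured once per
# object instead of being passed on every top-level call.
class PageFetcher
  DEFAULT_MAX_REDIRECTS = 3

  attr_reader :max_redirects

  def initialize(max_redirects = DEFAULT_MAX_REDIRECTS)
    @max_redirects = max_redirects
  end

  # Fetch `uri`, following at most `remaining` redirects.
  def fetch(uri, remaining = max_redirects)
    response = request(uri)
    return response unless response.is_a?(Net::HTTPRedirection)
    raise "Too many redirects" if remaining <= 0

    fetch(URI.join(uri.to_s, response['Location']), remaining - 1)
  end

  private

  # The only place that talks to the network (plain HTTP assumed)
  def request(uri)
    Net::HTTP.start(uri.host, uri.port) { |http| http.get(uri.request_uri) }
  end
end

# Usage (needs network access, so it is left commented out):
# data = PageFetcher.new(5).fetch(URI.parse("http://example.com/")).body
```

A class-level constant supplies the default, while each instance can still carry its own limit; which of the two fits better depends on whether different callers need different limits.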

  2. By including a redirect limit, will I be able to make the code for
    URL redirection the most effective one, or should I include some
    additions to the code to handle redirection effectively?

I can't really answer that question without knowing more about the
real-world problem you're trying to solve. However, in general I'd say
that whenever you have a recursive problem like this it's sensible to
ensure that it's throttled to prevent resource exhaustion. For a very
graphic example of why this is important - especially with network
applications - read up on the Morris Worm :)

Ellie
