Forum: Ruby Code duplication

Arun K. (Guest)
on 2009-04-06 17:51
Hi all,
The following is the code for extracting the HTML contents of a
website. I have included code to handle URL redirects and the BadRequest
error.

# getting the HTTP response from 'uri'
response = Net::HTTP.get_response(uri)
case response
# if the url is redirecting then fetch the contents of the redirected url
when Net::HTTPRedirection
  uri = URI.parse(response['Location'])
  response = Net::HTTP.get_response(uri)
# in case of a bad request error
when Net::HTTPBadRequest
  http = Net::HTTP.start(uri.host, uri.port)
  # getting the html data by setting the path as '/' and using a user agent
  response = http.get("/", "User-Agent"=>"Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)")
end

data = response.body

My tutor says that there is duplication in the above code, i.e. the
code for reading the HTML is specified twice without any purpose and
should be removed. I have no idea where the mistake is. I'm a newbie
to Ruby and I don't fully understand the problem or where things
went wrong. Can anyone please help me find the mistake?

Thanks in advance.

Regards
Arun
Loga G. (Guest)
on 2009-04-06 18:28
What is the purpose of the statement below?
         response = http.get("/", "User-Agent"=>"Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)")

Since you have already obtained the response object using get_response,
why is it needed?
Arun K. (Guest)
on 2009-04-06 18:39
Hi,
Thanks for the reply. If it is a bad request error, then I have to
connect to the host and port explicitly and then fetch the data. For
example, if I try to fetch the HTML contents from youtube.com, I get a
bad request error. So I used Net::HTTP.start() and then used the path
and user agent to retrieve the contents and stored them in response. I
don't think there is any other way. If I remove that part, I'm not
able to read the HTML.

Thanks
Arun
Eleanor McHugh (Guest)
on 2009-04-06 19:21
(Received via mailing list)
On 6 Apr 2009, at 14:50, Arun K. wrote:
> My tutor says that there is duplication in the above code, i.e. the
> code for reading the HTML is specified twice without any purpose and
> should be removed. I have no idea where the mistake is. I'm a newbie
> to Ruby and I don't fully understand the problem or where things
> went wrong. Can anyone please help me find the mistake?


There are probably better solutions, but the following illustrates the
point your tutor is making:

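require 'net/http'
require 'uri'

MOZILLA_HEADER = { "User-Agent" => "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)" }

def get_http_response uri, max_redirects = 0
   # Net::HTTP.start takes a host and port, not a URI object
   Net::HTTP.start(uri.host, uri.port) do |connection|
     # an empty path is not a valid request target, so ask for the root
     path = uri.path.empty? ? "/" : uri.path
     response = connection.get(path, MOZILLA_HEADER)
     response &&= case response
     when Net::HTTPRedirection
       if max_redirects > 0 then
         get_http_response URI.parse(response['Location']), (max_redirects - 1)
       else
         raise "Too many redirects"
       end
     when Net::HTTPBadRequest
       # retry once against the site root, unless that is what just failed
       if path == "/" then
         response
       else
         get_http_response URI.parse("http://#{uri.host}:#{uri.port}/"), max_redirects
       end
     else
       # any other response is handed back unchanged
       response
     end
   end
end

data = get_http_response(my_uri, 3).body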

See how get_http_response is recursive in the case of an erroneous
response? This minimises the actual HTTP interaction code as well as
elegantly handling redirects. Whilst this could result in many more
http connections being used, it also makes them clear up after
themselves which is always good.
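
To see the recursion in isolation, here is a rough sketch that keeps the "what do I do with this response" logic apart from the actual HTTP call, so it can be tried without a network. The names follow, fake_fetch and canned, and the example.com URLs, are made up for illustration and are not part of Net::HTTP; only the response classes and URI.parse come from the standard library. Treat it as one possible shape, not the definitive one.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: follows redirects using any callable that maps a URI
# to a Net::HTTPResponse, so the decision logic lives in exactly one place.
def follow(uri, max_redirects, fetch)
  response = fetch.call(uri)
  case response
  when Net::HTTPRedirection
    raise "Too many redirects" unless max_redirects > 0
    # recurse on the new location with one fewer redirect allowed
    follow(URI.parse(response['Location']), max_redirects - 1, fetch)
  else
    response
  end
end

# A real fetcher would perform the request, for example:
#   fetch = lambda { |uri| Net::HTTP.get_response(uri) }
# Here canned responses stand in for the network (assumed URLs).
redirect = Net::HTTPFound.new('1.1', '302', 'Found')
redirect['Location'] = 'http://example.com/new'
ok = Net::HTTPOK.new('1.1', '200', 'OK')
canned = { 'http://example.com/old' => redirect, 'http://example.com/new' => ok }
fake_fetch = lambda { |uri| canned.fetch(uri.to_s) }

result = follow(URI.parse('http://example.com/old'), 3, fake_fetch)
puts result.code
```

Because the fetch step is passed in, the same method can be exercised with canned responses first and pointed at a real server afterwards.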


Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net
----
raise ArgumentError unless @reality.responds_to? :reason
Arun K. (Guest)
on 2009-04-07 08:32
Thanks Ellie. You gave me a clue not only about removing the code
duplication but also about handling the redirects. Thanks a lot.
Regards
Arun
Eleanor McHugh (Guest)
on 2009-04-07 15:07
(Received via mailing list)
On 7 Apr 2009, at 05:31, Arun K. wrote:
> Thanks Ellie. You gave me a clue not only about removing the code
> duplication but also about handling the redirects. Thanks a lot.

My pleasure :)


Ellie

Arun K. (Guest)
on 2009-04-07 15:19
Hi Ellie,
Thank you once again for your reply. It helped me a lot. Now I have a
couple of questions for you.
1) How can I specify the redirect limit without declaring it inside a
method? Is it possible?
2) By including a redirect limit, will the code handle URL redirection
as effectively as possible, or should I make some additions to the code
to handle redirection effectively?

Thanks
Arun
Eleanor McHugh (Guest)
on 2009-04-07 16:32
(Received via mailing list)
On 7 Apr 2009, at 12:18, Arun K. wrote:
> Hi Ellie,
> Thank you once again for your reply. It helped me a lot. Now I have a
> couple of questions for you.
> 1) How can I specify the redirect limit without declaring it inside a
> method? Is it possible?

The redirect limit isn't declared inside the method but as one of the
parameters of the method, which is what allows the recursive calls to
count down as each redirect is received. You'll note that I provided an
initial value as part of the initial function call:

  data = get_http_response(my_uri, 3).body

but in a real-world program you'd either specify a constant and use that:

  MAXIMUM_REDIRECTS = 3
  data = get_http_response(my_uri, MAXIMUM_REDIRECTS).body

or else wrap everything together into an object where this value would
be either an instance or class variable depending on your intent.
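
As a rough sketch of the object-based option, assuming a made-up class name PageFetcher and a made-up constant DEFAULT_MAX_REDIRECTS (neither exists in the standard library), the limit can live in an instance variable. Note that this version uses a loop with a counter rather than recursion, and it leaves out the bad-request retry to stay short.

```ruby
require 'net/http'
require 'uri'

# Hypothetical wrapper class: the redirect limit is instance state rather
# than an argument threaded through every call.
class PageFetcher
  DEFAULT_MAX_REDIRECTS = 3

  def initialize(max_redirects = DEFAULT_MAX_REDIRECTS)
    @max_redirects = max_redirects
  end

  # Returns the final response, following at most @max_redirects redirects.
  def fetch(uri)
    redirects_left = @max_redirects
    loop do
      response = request(uri)
      return response unless response.is_a?(Net::HTTPRedirection)
      raise "Too many redirects" if redirects_left.zero?
      redirects_left -= 1
      uri = URI.parse(response['Location'])
    end
  end

  private

  # The single place that talks to the network; a subclass can override it.
  def request(uri)
    Net::HTTP.get_response(uri)
  end
end

fetcher = PageFetcher.new(5)
# data = fetcher.fetch(my_uri).body   # my_uri as in the earlier call
```

A class-level default plus a per-instance override covers both intents: most callers use PageFetcher.new, and a caller with special needs passes its own limit.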

> 2) By including a redirect limit, will the code handle URL redirection
> as effectively as possible, or should I make some additions to the code
> to handle redirection effectively?

I can't really answer that question without knowing more about the
real-world problem you're trying to solve. However in general I'd say
that whenever you have a recursive problem like this it's sensible to
ensure that it's throttled to prevent resource exhaustion. For a very
graphic example of why this is important - especially with network
applications - read up on the Morris Worm :)


Ellie
