Open-uri speed

I just discovered this. Lots of people already know, I’m sure, but
maybe some don’t.

net/http ran in: 128.734298 seconds.

open-uri ran in: 268.869359 seconds.

require "net/http"
require "open-uri"

def timing(name)
  start_time = Time.now
  yield
  end_time = Time.now
  puts "#{name} ran in: #{end_time - start_time} seconds."
end

n = 1_000

timing("net/http") {Net::HTTP.start("www.pythonchallenge.com") do |http|
  n.times do |i|
    r = http.get("/pc/def/linkedlist.php")
  end
end}

timing("open-uri") {n.times do |i|
  open("http://www.pythonchallenge.com/pc/def/linkedlist.php") do |x|
    x.read
  end
end}

– Elliot T.

2006/7/4, Elliot T. [email protected]:

I just discovered this. Lots of people already know, I’m sure, but
maybe some don’t.

net/http ran in: 128.734298 seconds.

open-uri ran in: 268.869359 seconds.

Interesting. Are you sure it’s not caused by server-side effects? I
mean, the URL you are retrieving seems to come from a PHP script, and
that can do anything it wants to waste time, including parsing the
client identifier, etc. I would have preferred to test with a static
HTML page.

Btw, something you might not know: there is a Benchmark module that
can easily be used for these kinds of things. It also does nice
printing, differentiates between kernel and user times, ramps up
automatically if needed, etc. :)
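
A minimal sketch of what Benchmark gives you (timing URI.parse locally, no network involved; the iteration count and labels are arbitrary):

```ruby
require 'benchmark'
require 'uri'

n = 10_000

# Benchmark.bm prints user, system, and total (wall) times per
# labelled block, so the kernel/user split comes for free.
Benchmark.bm(12) do |bm|
  bm.report('URI.parse') do
    n.times { URI.parse('http://www.pythonchallenge.com/pc/def/linkedlist.php') }
  end
  bm.report('String#dup') do
    n.times { 'http://www.pythonchallenge.com/pc/def/linkedlist.php'.dup }
  end
end
```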

Kind regards

robert

On Jul 4, 2006, at 12:51 AM, Elliot T. wrote:


Well of course the second one takes longer. It parses the URI and
then does the same thing as the first. There’s at least one more
method call involved with the second every time through the loop.

Elliot T. wrote:

n.times do |i|
  r = http.get("/pc/def/linkedlist.php")
end
end}

Doesn’t this version open the connection only once and then reuses it
for all the requests?

timing("open-uri") {n.times do |i|
  open("http://www.pythonchallenge.com/pc/def/linkedlist.php") do |x|
    x.read
  end
end}

This, I’m sure, doesn’t reuse the connection.
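
Carlos’s point can be checked directly with a throwaway local server that counts how many TCP connections each style opens (a hypothetical harness, not from the thread; the port is chosen at random):

```ruby
require 'socket'
require 'net/http'
require 'open-uri'

# Minimal HTTP server on a random loopback port that counts accepts.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]
connections = 0

serving = Thread.new do
  begin
    loop do
      client = server.accept
      connections += 1
      # Serve any number of requests on this connection (keep-alive).
      while (line = client.gets)
        line = client.gets while line && line != "\r\n" # skip headers
        break if line.nil?
        client.write "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"
      end
      client.close
    end
  rescue IOError
    # server socket closed; shut down
  end
end

Net::HTTP.start('127.0.0.1', port) do |http|
  3.times { http.get('/') }
end
after_net_http = connections                    # accepts used by net/http

3.times { URI.parse("http://127.0.0.1:#{port}/").open(&:read) }
after_open_uri = connections - after_net_http   # accepts used by open-uri

puts "net/http: #{after_net_http}, open-uri: #{after_open_uri}"

server.close
serving.join
```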

On Jul 3, 2006, at 11:50 PM, Logan C. wrote:

Well of course the second one takes longer. It parses the URI and
then does the same thing as the first. There’s at least one more
method call involved with the second every time through the loop.

Method call overhead doesn’t account for 2 minutes when N was only
set to 1000. At 100,000 times, URI parsing only takes a few seconds.

parse curi.us ran in: 4.434573 seconds.

parse http://www.pythonchallenge.com/pc/def/linkedlist.php ran in: 8.940227 seconds.

timing('parse curi.us') {100_000.times do
  URI.parse("curi.us")
end}
timing('parse http://www.pythonchallenge.com/pc/def/linkedlist.php') {100_000.times do
  URI.parse("http://www.pythonchallenge.com/pc/def/linkedlist.php")
end}

– Elliot T.

On Jul 4, 2006, at 3:16 PM, Elliot T. wrote:

Method call overhead doesn’t account for 2 minutes when N was only
set to 1000. At 100,000 times, URI parsing only takes a few seconds.

Yeah, but the first one uses a single object. The second one creates
a new object every time.

On Jul 4, 2006, at 2:26 AM, Carlos wrote:

Doesn’t this version open the connection only once and then reuses
it for all the requests?

I don’t know. But I started testing with n=1 and an outer loop, so
the Net::HTTP.start is repeated, and net/http is still winning (by
more than URI parsing accounts for).

I will run some longer tests later to get more accurate data (and
intermix doing it each way to minimise the effect of random net
traffic fluctuations). I will use a static page as Robert suggested.

– Elliot T.

On Jul 3, 2006, at 9:51 PM, Elliot T. wrote:

I just discovered this. Lots of people already know, I’m sure, but
maybe some don’t.

Your benchmark is not very illustrative of the problem; try this one:

$ cat timing.rb
require 'net/http'
require 'open-uri'

def timing(name, n)
  start_time = Time.now
  n.times do yield end
  end_time = Time.now
  puts "#{name} ran in: #{end_time - start_time} seconds."
end

def test(uri, n)
  timing 'raw socket', n do
    s = TCPSocket.open uri.host, uri.port
    s.write "GET #{uri.request_uri} HTTP/1.0\r\nHost: #{uri.host}\r\n\r\n"
    s.read.split("\r\n\r\n", 2).last
    s.close
  end

  Net::HTTP.start uri.host do |http|
    timing 'net/http cheat', n do
      r = http.get uri.request_uri
    end
  end

  timing 'net/http', n do
    Net::HTTP.start uri.host do |http|
      r = http.get uri.request_uri
    end
  end

  timing 'open-uri', n do
    uri.open do |x|
      x.read
    end
  end
end

n = 100

uri = URI.parse 'http://localhost/manual/'

p uri.read.length

test uri, n

uri = URI.parse 'http://localhost/manual/mod/mod_rewrite.html'

p uri.read.length

test uri, n

$ ruby timing.rb
9187
raw socket ran in: 1.184571 seconds.
net/http cheat ran in: 1.809506 seconds.
net/http ran in: 2.137558 seconds.
open-uri ran in: 2.606976 seconds.
87071
raw socket ran in: 1.729406 seconds.
net/http cheat ran in: 7.434297 seconds.
net/http ran in: 7.740268 seconds.
open-uri ran in: 13.605024 seconds.

You shouldn’t cheat and have Net::HTTP reuse its connection. (It
seems socket setup/teardown costs 3ms over loopback on my machine.)
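
That per-connection cost is easy to re-measure with a throwaway loopback listener (a rough sketch; the port is random and the iteration count arbitrary, and the number will vary by machine):

```ruby
require 'socket'
require 'benchmark'

# Listener on a random loopback port; it only accepts and closes.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]
accepter = Thread.new do
  begin
    loop { server.accept.close }
  rescue IOError
    # listener closed; shut down
  end
end

n = 200
elapsed = Benchmark.realtime do
  n.times { TCPSocket.new('127.0.0.1', port).close }
end
puts format('%.3f ms per connect/close over loopback', elapsed / n * 1000)

server.close
accepter.join
```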

open-uri and Net::HTTP’s performance both degrade significantly on
larger files. I believe this is due to their implementation: both
read into a buffer rather than fetching the entire response in one
call. open-uri buffers differently and provides progress callbacks,
which is probably why it performs worse as files get larger.
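
The chunked reading Eric describes is visible in Net::HTTP’s public streaming API; a sketch (the method name and its parameters are my own, not from the thread):

```ruby
require 'net/http'

# Stream a response body chunk by chunk instead of buffering it whole,
# summing the bytes received. read_body yields data as it arrives
# from the socket.
def stream_length(host, path)
  total = 0
  Net::HTTP.start(host) do |http|
    http.request_get(path) do |response|
      response.read_body do |chunk|
        total += chunk.bytesize
      end
    end
  end
  total
end
```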

I tend to use open-uri because it has a simpler API. I don’t have to
worry about handling redirects because it all gets taken care of for me.
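
For comparison, following redirects by hand with net/http looks roughly like this (a sketch; `fetch` and the recursion limit are names I made up, not an API from the thread):

```ruby
require 'net/http'
require 'uri'

# What open-uri does for you: chase Location headers, with a cap
# on depth to avoid redirect loops.
def fetch(url, limit = 5)
  raise ArgumentError, 'too many redirects' if limit.zero?

  response = Net::HTTP.get_response(URI.parse(url))
  case response
  when Net::HTTPRedirection
    fetch(response['location'], limit - 1)
  else
    response.body
  end
end

# fetch('http://example.com/')  # body of the final, post-redirect page
```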


Eric H. - [email protected] - http://blog.segment7.net
This implementation is HODEL-HASH-9600 compliant

http://trackmap.robotcoop.com