Net::HTTP performance question

I have the following code:

def fetch_into(uri, name)
  http = Net::HTTP.new(uri.host, uri.port)
  req = Net::HTTP::Get.new(uri.path)
  req.basic_auth(USERNAME, PASSWORD)
  start_time = Time.now.to_f
  File.open(name, "w") do |f|
    print " - fetching #{name}"
    http.request(req) do |result|
      f.write(result.body)
      f.close()
      elapsed = Time.now.to_f - start_time
      bps = (result.body.length / elapsed) / 1024
      printf ", at %7.2f kbps\n", bps
    end
  end
end

This is run in a very simple loop that doesn't do anything CPU-intensive. The files downloaded are about 10MB, and since the connection is not that fast (about 15Mbit/sec) I would expect this to consume little CPU, but in fact it gobbles it up: on a 2GHz AMD it eats 65% CPU on average (the job runs for hours on end).

Where are the cycles going? I assumed this would be a somewhat suboptimal way of doing it, since there might be some buffer resizing in there, but not this bad.
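
One quick way to tell whether the cycles are burned in Ruby itself or spent waiting on IO (a generic sketch, not tied to the loop above) is Benchmark.measure, which reports user CPU time separately from wall-clock time:

```ruby
require 'benchmark'

# Compare CPU time (utime) against wall-clock time (real). If utime is
# close to real, the process is compute-bound in Ruby; if utime is tiny,
# the time is going to IO wait. The block below is just a stand-in for
# the per-download body handling.
t = Benchmark.measure do
  1_000.times { "x" * 10_000 }
end
puts "user CPU: #{t.utime}s, wall clock: #{t.real}s"
```

If user CPU dominates, a profiler (`ruby -rprofile`) can then narrow down which methods are responsible.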

anyone care to shed some light on this?

(I would assume there is a way of performing an HTTP request such that you can read chunks of the response body at a time?)

-Bjørn

On 10/31/06, Bjorn B. [email protected] wrote:
|
| http.request(req) do |result|
|
| requires much CPU. the files downloaded are about 10Mb and since the
|
| (I would assume that there is a way of performing an http request in a
| way where you can read chunks of the response body at a time?)

Hi,
there seems to be HTTPResponse#read_body, which can provide the chunks
as they come (not tested, copy & paste from the docs):

# using iterator
http.request_get('/index.html') {|res|
  res.read_body do |segment|
    print segment
  end
}

BTW, you could move the File.open later, saving the f.close() call.
Also try fiddling with the GC: GC.disable while receiving might help
(or not); don't forget to re-enable it between requests.

So:

def fetch_into(uri, name)
  http = Net::HTTP.new(uri.host, uri.port)
  req = Net::HTTP::Get.new(uri.path)
  req.basic_auth(USERNAME, PASSWORD)
  start_time = Time.now.to_f
  print " - fetching #{name}"

  GC.disable # optional

  http.request(req) do |result|
    bytes = 0
    File.open(name, "w") do |f|
      result.read_body do |segment|
        bytes += segment.length # result.body is unavailable after streaming, so count here
        f.write(segment)
      end
    end
    elapsed = Time.now.to_f - start_time
    bps = (bytes / elapsed) / 1024
    printf ", at %7.2f kbps\n", bps
  end

  GC.enable
end

["Jan S." [email protected]]
|
| Hi,
| there seems to be HTTPResponse#read_body, which can provide the chunks
| as they come (not tested, copy & paste from the docs):
|
| # using iterator
| http.request_get('/index.html') {|res|
|   res.read_body do |segment|
|     print segment
|   end
| }

Thanks!

Indeed, this helped a bit, but not much. From the looks of it, the
standard library hard-codes the read buffer size to 1024 bytes
(Ruby 1.8, net/protocol.rb), which results in at least twice the number
of read(2) system calls for the same amount of data. I experimentally
upped the read buffer to 10k, and now I seem to get buffer-fulls
equivalent to the MTU of the interface the data is read from.

When it is at 1024 bytes I consistently get one buffer of 1024 bytes,
then a buffer of approximately MTU - 1024 bytes, then 1024 bytes again,
etc.
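
The doubling of read(2) calls is easy to see without a network: with a 1024-byte buffer it takes two reads to drain a typical 1500-byte MTU-sized chunk, while a 10k buffer drains it in one. A toy model using StringIO (my own illustration, not the actual net/protocol.rb code):

```ruby
require 'stringio'

# Count how many reads it takes to drain one MTU-sized chunk of data at
# a given buffer size. StringIO#read returns nil at EOF, ending the loop.
def reads_needed(total_bytes, buf_size)
  io = StringIO.new("x" * total_bytes)
  calls = 0
  calls += 1 while io.read(buf_size)
  calls
end

mtu = 1500
puts reads_needed(mtu, 1024)      # two reads: 1024 bytes, then 476
puts reads_needed(mtu, 10 * 1024) # one read covers the whole chunk
```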

Even after modifying the hard-coded buffer size to 10k, it still eats
obscene amounts of CPU for what it is doing. I would have expected any
reasonable implementation to eat at most 1% CPU (probably less) for
what is almost pure IO. (It now consumes about 35% CPU on the 2GHz
AMD.)

Anyway, a note to implementors: it might be an idea to pick a buffer
size larger than 1024 bytes if you are going to hard-code it; at the
very least, 4k or 8k would be more sensible. Preferably it should be
configurable (but with a sensible default) so the user can make an
informed decision to increase or decrease the size as needed.
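
As a sketch of what a configurable buffer could look like, here is a generic copy loop with a tunable read size; the helper name and the 16k default are my own assumptions, not part of net/http:

```ruby
require 'stringio'

# Hypothetical helper (not part of net/http): copy src to dst, reusing
# a single buffer string and letting the caller pick the read size,
# which is exactly the knob the hard-coded 1024 in net/protocol.rb lacks.
def copy_body(src, dst, buf_size = 16 * 1024)
  buf = String.new
  dst.write(buf) while src.read(buf_size, buf)
end

src = StringIO.new("a" * 5000)
dst = StringIO.new
copy_body(src, dst, 1024)
```

Passing an output buffer to read avoids allocating a fresh string per chunk, which also reduces the GC pressure mentioned earlier in the thread.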

-Bjørn
