Net/http performance

(note: I posted this to comp.lang.ruby, figuring it would filter
through to the appropriate mailing lists, but it did not).

It seems that net/http’s implementation is extremely inefficient when
it comes to dealing with large files.

I think this is something worth fixing in subsequent versions. It
shouldn’t be as bad as it is. I would also appreciate any hints or
advice on working around the problem.

Specifically, I am interested in HTTP GETs (from net/http) and HTTP PUTs
(both on the net/http side and WEBrick receiving side) that have
adequate streaming performance. I would like to GET and PUT fairly large
files, and don’t want to pay such a large network and CPU performance
overhead.

Below I have attached a test suite that illustrates the problem. I used
WEBrick as the server.

"Host: localhost, port: 12000, request_uri: /ten-meg.bin"
                  user     system      total        real
TCPSocket     0.030000   0.150000   0.180000 (  0.468867)
net/http     10.620000   8.630000  19.250000 ( 21.787785)
LB net/http  10.870000   8.900000  19.770000 ( 22.259448)
open-uri     16.400000  11.900000  28.300000 ( 39.834555)

As you can see, a raw TCPSocket is orders of magnitude faster than
net/http and friends. However, I’m using read_body and receiving the
data in chunks, and I would have expected much better performance as a
result. We’re talking 20MB/s for TCPSocket versus 400KB/s for net/http.

What’s happening here? What can I do to fix it?

Any help appreciated.

Regards,

Luke.

#!/usr/bin/ruby

require 'net/http'
require 'open-uri'
require 'benchmark'
require 'webrick'
include WEBrick

uri = URI.parse("http://localhost:12000/ten-meg.bin")
sourceFolder = "/tmp/"

Kernel.system("dd if=/dev/random of=/tmp/ten-meg.bin bs=1024 count=10240")

port = 12000
server = HTTPServer.new(:Port => port, :DocumentRoot => sourceFolder)

# trap the signal for shutdown
trap("INT") { server.shutdown }
pid = Kernel.fork {
  $stdout.reopen('/tmp/WEBrick.stdout')
  $stderr.reopen('/tmp/WEBrick.stderr')
  server.start
}

at_exit { Process.kill("INT", pid) }

Kernel.sleep 1

p "Host: #{uri.host}, port: #{uri.port}, request_uri: #{uri.request_uri}"

Benchmark.bm(10) do |time|
  out = File.new("/tmp/tcp.tar.bz2", "w")
  time.report("TCPSocket") do
    s = TCPSocket.open uri.host, uri.port
    s.write "GET #{uri.request_uri} HTTP/1.0\r\nHost: #{uri.host}\r\n\r\n"
    temp = s.read.split("\r\n\r\n", 2).last
    s.close
    out.write(temp)
  end
  out.close

  out = File.new("/tmp/net.tar.bz2", "w")
  time.report("net/http") do
    Net::HTTP.start uri.host, uri.port do |http|
      http.request_get(uri.request_uri) do |response|
        response.read_body do |segment|
          out.write(segment)
        end
      end
    end
  end
  out.close

  out = File.new("/tmp/luke.out", "w")
  time.report("LB net/http") do
    http = Net::HTTP.new(uri.host, uri.port)
    http.request_get(uri.path) { |response|
      response.read_body { |segment|
        out.write(segment)
      }
    }
  end
  out.close

  out = File.new("/tmp/uri.tar.bz2", "w")
  time.report("open-uri") do
    uri.open do |x|
      out.write(x.read)
    end
  end
  out.close
end

On Jul 15, 2006, at 3:44 AM, Luke B. wrote:

Below I have attached a test suite that illustrates the problem. I used
WEBrick as the server.

Why don’t you swap out WEBrick for Mongrel and run the same tests?

I suspect that the server is the bottleneck, not the client.

Gary W.

On Sat, 2006-07-15 at 23:37 +0900, [email protected] wrote:

On Jul 15, 2006, at 3:44 AM, Luke B. wrote:

Below I have attached a test suite that illustrates the problem. I used
WEBrick as the server.

Why don’t you swap out WEBrick for Mongrel and run the same tests?

I suspect that the server is the bottleneck, not the client.

Better yet, don’t use a Ruby web server at all, and use another tool you
trust (httperf or ab and curl) to determine a good baseline performance.
Once you’ve got what could be done with net/http then you can run
net/http and compare.

Also, I’m working on a faster alternative to net/http in the RFuzz http
client. Stay tuned for that, but you can play with it right now:

http://www.zedshaw.com/projects/rfuzz/

Gary W. wrote:

I suspect that the server is the bottleneck, not the client.

Zed S. wrote:

Better yet, don’t use a Ruby web server at all, and use another tool you
trust (httperf or ab and curl) to determine a good baseline performance.
Once you’ve got what could be done with net/http then you can run
net/http and compare.

Hi Zed & Gary,

I neglected to mention in my post that I have already double checked
that WEBrick is not the culprit. Fetching from WEBrick using curl is as
fast as using TCPSocket:

$ time curl -O http://localhost:12000/ten-meg.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10.0M  100 10.0M    0     0  21.9M      0 --:--:-- --:--:-- --:--:-- 29.4M

real    0m0.466s
user    0m0.013s
sys     0m0.093s

So I know Ruby can shift the bits around fast enough, it’s just that
net/http isn’t playing the game.

Also, I’m working on a faster alternative to net/http in the RFuzz http
client. Stay tuned for that, but you can play with it right now:

I’d definitely like to check that out!

I just think that net/http’s abysmal speed is somewhat anomalous. I am
confident there is a simple explanation - maybe some tight loop in there
doing something not particularly clever - but I haven’t had the time as
yet to dive right into net/http and find the reason. And I haven’t had
much success with Ruby profilers either. More suggestions welcome here!

I have gone ahead and changed the critical section of my code to use
TCPSocket instead. That solved the HTTP GET problem, but I still
struggle with PUTs:

file = File.open(resultFile, "r")
http = Net::HTTP.new(@uri.host, @uri.port)
http.put("/put/" + URI.escape(File.basename(resultFile)), file.read)

Now that’s not real pleasant because it relies on snarfing the whole
file into memory first. I would have liked to do something like:

http.put("/put/" + URI.escape(File.basename(resultFile))) do |datasocket|
  while file.eof? == false
    datasocket.write(file.read(4096))
  end
end

This is similar to what net/http offers in the case of HTTP GET, but of
course it’s broken because of the aforementioned speed concerns:

http.request_get("/#{file}") { |response|
  response.read_body { |segment|
    # in here, 400 KB/s max and > 70% CPU utilisation ...
    outputFile.write(segment)
  }
}

I still call myself a Ruby newbie, so one of my concerns is that perhaps
I’m Just Not Getting It and that if I followed the Ruby Way my troubles
would vanish :-)

In article [email protected],
"Florian F." [email protected] writes:

I think you’re right. In the GET case (for both open-uri and net/http) I
have found a possibility to speed up the download. The method
BufferedIO#rbuf_fill in net/protocol.rb uses a fixed buffer size of
1024. I changed this to 8192:

def rbuf_fill
  timeout(@read_timeout) {
    @rbuf << @io.sysread(8192)
  }
end

I guess the timeout() is slow.

Try:

def rbuf_fill
  @rbuf << @io.sysread(1024)
end

However, the above is not acceptable in general since
timeout is a feature.

It is possible to implement timeout without timeout() as:

def rbuf_fill
  begin
    @rbuf << @io.read_nonblock(4096)
  rescue Errno::EWOULDBLOCK
    if IO.select([@io], nil, nil, @read_timeout)
      @rbuf << @io.read_nonblock(4096)
    else
      raise Timeout::TimeoutError
    end
  end
end
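Tanaka's hunch about timeout() being slow is easy to test. This rough benchmark is my addition, not from the thread; it wraps a trivial operation in Timeout.timeout the way rbuf_fill does for every 1024-byte read:

```ruby
require 'benchmark'
require 'timeout'

# Compare a bare trivial operation against the same operation wrapped
# in Timeout.timeout, to see the per-call cost rbuf_fill pays.
N = 50_000
Benchmark.bm(12) do |b|
  b.report('bare')      { N.times { 1 + 1 } }
  b.report('timeout()') { N.times { Timeout.timeout(5) { 1 + 1 } } }
end
```

On any Ruby I have tried, the wrapped loop is dramatically slower, which is consistent with timeout() dominating per-chunk cost at small buffer sizes.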

Luke B. wrote:

I just think that net/http’s abysmal speed is somewhat anomalous. I am
confident there is a simple explanation - maybe some tight loop in there
doing something not particularly clever - but I haven’t had the time as
yet to dive right into net/http and find the reason. And I haven’t had
much success with Ruby profilers either. More suggestions welcome here!

I think you’re right. In the GET case (for both open-uri and net/http) I
have found a possibility to speed up the download. The method
BufferedIO#rbuf_fill in net/protocol.rb uses a fixed buffer size of
1024. I changed this to 8192:

def rbuf_fill
  timeout(@read_timeout) {
    @rbuf << @io.sysread(8192)
  }
end

This led to almost the same speed I could achieve with the use of
direct sockets.

http.put("/put/" + URI.escape(File.basename(resultFile))) do |datasocket|
  while file.eof? == false
    datasocket.write(file.read(4096))
  end
end

You can do this by setting Put#body_stream= to your IO object. The only
problem is that the related method also uses a very small buffer size:

def send_request_with_body_stream(sock, ver, path, f)
  raise ArgumentError, "Content-Length not given and Transfer-Encoding is not `chunked'" \
    unless content_length() or chunked?
  unless content_type()
    warn 'net/http: warning: Content-Type did not set; using application/x-www-form-urlencoded' if $VERBOSE
    set_content_type 'application/x-www-form-urlencoded'
  end
  write_header sock, ver, path
  if chunked?
    while s = f.read(1024)
      sock.write(sprintf("%x\r\n", s.length) << s << "\r\n")
    end
    sock.write "0\r\n\r\n"
  else
    while s = f.read(1024)
      sock.write s
    end
  end
end

I think this is rather unfortunate. It would be better if those methods
used larger buffer values and/or made them tweakable where necessary.
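Putting Florian's pointer together, a streaming PUT can be sketched with body_stream. This is a minimal, illustrative example (the method name, URL, and file path are mine, not from the thread), not a definitive recipe:

```ruby
require 'net/http'
require 'uri'

# Sketch: stream a file as a PUT body without slurping it into memory,
# via Net::HTTPGenericRequest#body_stream. Names here are illustrative.
def streaming_put(url, path)
  uri = URI.parse(url)
  File.open(path, 'rb') do |file|
    req = Net::HTTP::Put.new(uri.request_uri)
    req.body_stream = file          # net/http reads this IO in chunks
    req.content_length = file.size  # required unless chunked encoding is used
    Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
  end
end
```

Setting `req['Transfer-Encoding'] = 'chunked'` instead of a Content-Length sends the body chunked, which goes through the 1024-byte loop quoted above.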

On Sun, Jul 16, 2006 at 09:17:35PM +0900, Florian F. wrote:

      raise Timeout::TimeoutError
    end
  end
end

This would be even faster I think, because timeout also represents a lot
of overhead. For 1.8, those methods would have to be backported then.
Hint, hint… ;-)

RUBY_VERSION                         # => "1.8.5"
RUBY_RELEASE_DATE                    # => "2006-06-24"
IO.instance_methods.grep(/nonblock/) # => ["read_nonblock", "write_nonblock"]

(Yes, I must remove them from my 1.8 vs. 1.9 changelog summary)

Tanaka A. wrote:

I guess the timeout() is slow.

Try:

def rbuf_fill
  @rbuf << @io.sysread(1024)
end

The main problem seems to be that the read* methods of BufferedIO all
suffer from the small value in rbuf_fill. If I download a big file it’s
very likely that TCP packets are bigger than 1024 bytes (depending on
my network infrastructure). For every received packet I have to call
lots of Ruby methods to handle it. At the same time this renders
operating-system buffers, which are usually larger than 1024 bytes,
useless. This is a big per-packet overhead, which reduces the maximum
bandwidth that can be achieved.
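A back-of-envelope count (my addition, not from the thread) makes the per-chunk overhead concrete: the number of Ruby-level rbuf_fill round trips needed to move a 10 MB body at each buffer size.

```ruby
# How many rbuf_fill calls does a 10 MB body take at each buffer size?
total = 10 * 1024 * 1024
[1024, 8192, 65_536].each do |chunk|
  puts "#{chunk} bytes/chunk -> #{total / chunk} calls"
end
# 1024 -> 10240 calls; 8192 -> 1280 calls; 65536 -> 160 calls
```

Every one of those calls pays for a timeout() wrapper and several method dispatches, so a 64x larger buffer removes roughly 64x of that fixed cost.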

      @rbuf << @io.read_nonblock(4096)
    else
      raise Timeout::TimeoutError
    end
  end
end

This would be even faster I think, because timeout also represents a lot
of overhead. For 1.8, those methods would have to be backported then.
Hint, hint… ;-)

An addendum: testing with ruby 1.8.4 from a Locomotive bundle I had
handy reveals that my solution does not work on more recent versions of Ruby.

I suspect this might be due to more strictness on doing shonky things,
such as overriding private methods, which is the basis of my hack.

For now I am satisfied (since I am targeting OS X with Ruby 1.8.2), but
if any Ruby gurus can suggest a 1.8.4 compatible hack, I’m all ears.

Regards,

Luke.

Hi all,

Thanks to the many thoughtful suggestions here, I have implemented an
easy workaround to this problem that doesn’t involve giving up the
net/http library completely.

If you go back to my original benchmark testing code in the original
post, I have made the following changes. Basically I override the
necessary methods to tweak the buffer size:

class OverrideInternetMessageIO < Net::InternetMessageIO
  def rbuf_fill
    timeout(@read_timeout) {
      @rbuf << @socket.sysread(65536)
    }
  end
end

class NewHTTP < Net::HTTP
  def NewHTTP.socket_type
    OverrideInternetMessageIO
  end
end

Benchmark.bm(10) do |time|
  out = File.new("/tmp/net.tar.bz2", "w")
  time.report("net/http - bigbuffer") do
    NewHTTP.start uri.host, uri.port do |http|
      http.request_get(uri.request_uri) do |response|
        response.read_body do |segment|
          out.write(segment)
        end
      end
    end
  end
  out.close
end

After making all those changes, we see the following new results for the
10 MB file transfer from WEBrick:

                           user     system      total        real
net/http - big buffer  0.360000   0.390000   0.750000 (  0.991848)

That’s still twice as slow as a raw TCPSocket, but it’s now definitely
in the realm of “usable for large file transfers”.

I couldn’t make any real recommendations on what the buffer size should
be. I imagine it’s a trade off between the OS kernel’s buffer size, TCP
packet size, and memory footprint of your application. Do HTTP clients
normally automatically negotiate a buffer? Do they pick one based on
content type? What are the common optimisations, and should net/http
follow them?

I tested a couple of values and found anything past 65536 bytes started
giving negligible returns, on my G5 running OS X 10.4.7 (ruby 1.8.2 -
i.e. the default OS X install).
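One way to probe this empirically (a sketch of my own, using a temp file in place of the 10 MB payload) is to time plain sysread loops at several chunk sizes and watch for diminishing returns:

```ruby
require 'benchmark'
require 'tempfile'

# Time reading a 10 MB file with plain sysread at several chunk sizes.
# The Tempfile stands in for the test payload; sizes are illustrative.
file = Tempfile.new('bufsweep')
file.write('x' * (10 * 1024 * 1024))
file.flush

[1024, 4096, 16_384, 65_536].each do |bufsize|
  t = Benchmark.realtime do
    File.open(file.path, 'rb') do |f|
      begin
        loop { f.sysread(bufsize) }
      rescue EOFError
        # reached end of file
      end
    end
  end
  printf "%6d-byte reads: %.4fs\n", bufsize, t
end
```

This only measures local file I/O, not TCP, but it shows the same shape: the per-call cost dominates at small sizes and flattens out well before 64K.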

Thanks again for all the pointers and commentary, and I hope to see a
more robust solution in a future version :-)

Regards,

Luke.

Luke B. wrote:

necessary methods to tweak the buffer size:

class OverrideInternetMessageIO < Net::InternetMessageIO
  def rbuf_fill
    timeout(@read_timeout) {
      @rbuf << @socket.sysread(65536)
    }
  end
end

I couldn’t make any real recommendations on what the buffer size should
be. I imagine it’s a trade off between the OS kernel’s buffer size, TCP
packet size, and memory footprint of your application. Do HTTP clients
normally automatically negotiate a buffer? Do they pick one based on
content type? What are the common optimisations, and should net/http
follow them?

I tested a couple of values and found anything past 65536 bytes started
giving negligible returns, on my G5 running OS X 10.4.7 (ruby 1.8.2 -
i.e. the default OS X install).

It would be really interesting to get a record of the actual data sizes
read out from each of the calls to sysread(65536) in your modified code.
If stdout is free, maybe you could do something like:
timeout(@read_timeout) {
  a << @socket.sysread(65536)
  puts "sysread #{a.length} bytes"
  @rbuf << a
}
just for a few trials, especially to compare the values when your large
file is coming in from a LAN, a WAN, and the Internet. The values you
see should give you a clue as to what the sysread buffer size ought to
be. TCP of course has the “sliding congestion window” mechanism in which
it adaptively increases the number of bytes that a peer may send to
another peer before it must wait for an acknowledgement. Most of the
time, this number may not be larger than 64K (because it’s carried in a
16-bit field in the TCP packet header), which explains your observation
that 64K is a practical limit. With a large file transfer on a fast and
otherwise unloaded network, you should see this value quickly reach and
remain at 64K. (This is not something that HTTP clients have to do
themselves, to answer your other question; it’s built into TCP.) If your
application runs on a LAN, I would expect a lot of benefit from 64K
sysreads. Across the Internet, I’d be pretty surprised if you get much
improvement from sysreads above 16K (which is also the typical
network-driver buffer size for Berkeley-derived kernels like OSX, unless
you’ve tweaked yours).

It’s rather a surprise that the Ruby code which handles these raw reads
is so inefficient that cutting down the number of passes through it
makes such a difference. I’m usually pretty surprised when I/O
processing in an application is more than a negligible fraction of the
network transit time.

It would be really interesting to get a record of the actual data sizes
read out from each of the calls to sysread(65536) in your modified code.
If stdout is free, maybe you could do something like:

timeout(@read_timeout) {
  a << @socket.sysread(65536)
  puts "sysread #{a.length} bytes"
  @rbuf << a
}

Yecch, sorry, obviously the second line in my code snippet should be:

  a = @socket.sysread(65536)

Luke B. [email protected] writes:

After making all those changes, we see the following new results for the
10 MB file transfer from WEBrick:

                           user     system      total        real
net/http - big buffer  0.360000   0.390000   0.750000 (  0.991848)

That’s still twice as slow as a raw TCPSocket, but it’s now definitely
in the realm of “usable for large file transfers”.

Does this improve noticeably if you use the read_nonblock suggestion
given earlier in the thread? Say:

class OverrideInternetMessageIO < Net::InternetMessageIO
  def rbuf_fill
    begin
      @rbuf << @io.read_nonblock(65536)
    rescue Errno::EWOULDBLOCK
      if IO.select([@io], nil, nil, @read_timeout)
        @rbuf << @io.read_nonblock(65536)
      else
        raise Timeout::TimeoutError
      end
    end
  end
end

As for the appropriate buffer size, for what it’s worth Apache uses
this buffer to read into when it’s acting as a proxy server and
reading someone else’s output:

char buffer[HUGE_STRING_LEN];

Where HUGE_STRING_LEN is defined in various apr (Apache Portable
Runtime) headers as 8192. (In Apache 1.3 it was in ‘httpd.h’.)

I don’t have time to track through the mozilla source to find out what
buffer size they use.

In article [email protected],
Luke B. [email protected] writes:

I tested this earlier … but my version of Ruby doesn’t have
read_nonblock, unfortunately. I haven’t had a chance to pull down 1.8.5
and re-test.

This should work with 1.8.2.

def rbuf_fill
  if IO.select([@io], nil, nil, @read_timeout)
    @rbuf << @io.sysread(16384)
  else
    raise Timeout::TimeoutError
  end
end

The enlarged buffer size should work well as long as the in-kernel
TCP buffer is large enough to hold the data received between
successive rbuf_fill calls.

If the in-kernel buffer is not large enough, the overhead should
still be reduced. I think timeout() is the first candidate to remove.

Daniel M. wrote:

Does this improve noticeably if you use the read_nonblock suggestion
given earlier in the thread? Say:

Hi Daniel,

I tested this earlier … but my version of Ruby doesn’t have
read_nonblock, unfortunately. I haven’t had a chance to pull down 1.8.5
and re-test.

Additionally, if I were to do so, I’d need to find a new workaround
method. As mentioned above, my OverrideInternetMessageIO class does not
seem to function in Ruby versions newer than 1.8.2. So that makes a nice catch-22.

Of course I could hand-edit the net/http classes, which would suffice for
an academic test. In the real world I can’t run around patching people’s
net/http for them :-(

Taking a step back for a moment - is there a Ruby bugzilla system or
equivalent, where such problems can be logged and prioritised? I would
be happy to do the grunt work of submitting the patch, now that the
solution is pretty clear.

Regards,

Luke.
