Ruby Forum Ruby > net/http performance

Posted by Luke Burton (hagus)
on 15.07.2006 09:44
(Note: I posted this to comp.lang.ruby, figuring it would filter 
through to the appropriate mailing lists, but it did not.)

It seems that net/http's implementation is extremely inefficient when
it comes to dealing with large files.

I think this is something worth fixing in subsequent versions. It
shouldn't be as bad as it is. I would also appreciate any hints or
advice on working around the problem.

Specifically, I am interested in HTTP GETs (from net/http) and HTTP PUTs 
(both on the net/http side and WEBrick receiving side) that have 
adequate streaming performance. I would like to GET and PUT fairly large 
files, and don't want to pay such a large network and CPU performance 
overhead.

Below I have attached a test suite that illustrates the problem. I used
WEBrick as the server.

"Host: localhost, port: 12000, request_uri: /ten-meg.bin"
                user     system      total        real
TCPSocket    0.030000   0.150000   0.180000 (  0.468867)
net/http    10.620000   8.630000  19.250000 ( 21.787785)
LB net/http 10.870000   8.900000  19.770000 ( 22.259448)
open-uri    16.400000  11.900000  28.300000 ( 39.834555)

As you can see, a raw TCPSocket is well over an order of magnitude faster
than net/http and friends. However, I'm using read_body and receiving the
data in chunks, and I would have expected much better performance as a
result. We're talking 20MB/s for TCPSocket versus 400KB/s for net/http.
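As a quick check on those rates, the "real" column of the benchmark can be converted into throughput. This is a small sketch, assuming the file is exactly 10 MiB (10240 blocks of 1024 bytes, as produced by the dd command in the script below); `throughput_kib` is a hypothetical helper name.

```ruby
# "real" timings in seconds, copied from the benchmark table above.
FILE_BYTES = 10240 * 1024 # dd bs=1024 count=10240

timings = {
  'TCPSocket'   => 0.468867,
  'net/http'    => 21.787785,
  'LB net/http' => 22.259448,
  'open-uri'    => 39.834555
}

# Throughput in KiB per second for +bytes+ transferred in +seconds+.
def throughput_kib(bytes, seconds)
  bytes / 1024.0 / seconds
end

timings.each do |label, secs|
  rate  = throughput_kib(FILE_BYTES, secs)
  ratio = secs / timings['TCPSocket'] # how many times the raw socket's time
  printf("%-12s %9.0f KiB/s %6.1fx TCPSocket time\n", label, rate, ratio)
end
```

The raw socket works out to roughly 21,800 KiB/s and net/http to roughly 470 KiB/s, about 46 times slower.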

What's happening here? What can I do to fix it?

Any help appreciated.

Regards,

Luke.

#!/usr/bin/ruby

require 'net/http'
require 'open-uri'
require 'benchmark'
require 'webrick' # lowercase: the library file is webrick.rb (matters on case-sensitive filesystems)
include WEBrick

uri = URI.parse("http://localhost:12000/ten-meg.bin")
sourceFolder = "/tmp/"

# Create a 10 MB test file of random data
Kernel.system("dd if=/dev/random of=/tmp/ten-meg.bin bs=1024 count=10240")

port = 12000
server = HTTPServer.new(:Port => port, :DocumentRoot => sourceFolder)
# trap the signal for shutdown
trap("INT"){ server.shutdown }
pid = Kernel.fork {
  $stdout.reopen('/tmp/WEBrick.stdout')
  $stderr.reopen('/tmp/WEBrick.stderr')
  server.start

}

at_exit { Process.kill("INT", pid) }

Kernel.sleep 1

p "Host: #{uri.host}, port: #{uri.port}, request_uri: #{uri.request_uri}"

Benchmark.bm(10) do |time|
  out = File.new("/tmp/tcp.tar.bz2", "w")
  time.report("TCPSocket") do
    s = TCPSocket.open uri.host, uri.port
    s.write "GET #{uri.request_uri} HTTP/1.0\r\nHost: #{uri.host}\r\n\r\n"
    temp = s.read.split("\r\n\r\n", 2).last
    s.close
    out.write(temp)
  end
  out.close

  out = File.new("/tmp/net.tar.bz2", "w")
  time.report("net/http") do
    Net::HTTP.start uri.host, uri.port do |http|
      http.request_get(uri.request_uri) do |response|
        response.read_body do |segment|
          out.write(segment)
        end
      end
    end
  end
  out.close

  out = File.new("/tmp/luke.out", "w")
  time.report("LB net/http") do
    http = Net::HTTP.new(uri.host, uri.port)
    http.request_get(uri.path) { |response|
      response.read_body { |segment|
        out.write(segment)
      }
    }
  end
  out.close

  out = File.new("/tmp/uri.tar.bz2", "w")
  time.report("open-uri") do
    uri.open do |x|
      out.write(x.read)
    end
  end
  out.close
end
Posted by unknown (Guest)
on 15.07.2006 16:38
(Received via mailing list)
On Jul 15, 2006, at 3:44 AM, Luke Burton wrote:
> Below I have attached a test suite that illustrates the problem. I used
> WEBrick as the server.

Why don't you swap out WEBrick for Mongrel and run the same tests?

I suspect that the server is the bottleneck, not the client.

Gary Wright
Posted by Zed Shaw (Guest)
on 15.07.2006 21:15
(Received via mailing list)
On Sat, 2006-07-15 at 23:37 +0900, gwtmp01@mac.com wrote:
> On Jul 15, 2006, at 3:44 AM, Luke Burton wrote:
> > Below I have attached a test suite that illustrates the problem. I used
> > WEBrick as the server.
> 
> Why don't you swap out WEBrick for Mongrel and run the same tests?
> 
> I suspect that the server is the bottleneck, not the client.

Better yet, don't use a Ruby web server at all, and use another tool you
trust (httperf or ab and curl) to determine a good baseline performance.
Once you've got what *could* be done with net/http then you can run
net/http and compare.

Also, I'm working on a faster alternative to net/http in the RFuzz http
client.  Stay tuned for that, but you can play with it right now:

  http://www.zedshaw.com/projects/rfuzz/
Posted by Luke Burton (hagus)
on 16.07.2006 02:43
Gary Wright wrote:
>> I suspect that the server is the bottleneck, not the client.

Zed Shaw wrote:
> Better yet, don't use a Ruby web server at all, and use another tool you
> trust (httperf or ab and curl) to determine a good baseline performance.
> Once you've got what *could* be done with net/http then you can run
> net/http and compare.

Hi Zed & Gary,

I neglected to mention in my post that I have already double checked 
that WEBrick is not the culprit. Fetching from WEBrick using curl is as 
fast as using TCPSocket:

$ time curl -O http://localhost:12000/ten-meg.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10.0M  100 10.0M    0     0  21.9M      0 --:--:-- --:--:-- --:--:-- 29.4M

real    0m0.466s
user    0m0.013s
sys     0m0.093s

So I *know* Ruby can shift the bits around fast enough, it's just that 
net/http isn't playing the game.

> Also, I'm working on a faster alternative to net/http in the RFuzz http
> client.  Stay tuned for that, but you can play with it right now:

I'd definitely like to check that out!

I just think that net/http's abysmal speed is somewhat anomalous. I am 
confident there is a simple explanation - maybe some tight loop in there 
doing something not particularly clever - but I haven't had the time as 
yet to dive right into net/http and find the reason. And I haven't had 
much success with Ruby profilers either. More suggestions welcome here!

I have gone ahead and changed the critical section of my code to use 
TCPSocket instead. That solved the HTTP GET problem, but I still 
struggle with PUTs:

file = File.open(resultFile, "r")
http = Net::HTTP.new(@uri.host, @uri.port)
http.put("/put/" + URI.escape(File.basename(resultFile)), file.read)

Now that's not real pleasant because it relies on snarfing the whole 
file into memory first. I would have liked to do something like:

http.put("/put/" + URI.escape(File.basename(resultFile))) do |datasocket|
    while file.eof? == false
        datasocket.write(file.read(4096))
    end
end

This is similar to what net/http offers in the case of HTTP GET, but of 
course it's broken because of the aforementioned speed concerns:

http.request_get("/#{file}") { |response|
    response.read_body { |segment|
        # in here, 400 KB/s max and > 70% CPU utilisation ...
        outputFile.write(segment)
    }
}

I still call myself a Ruby newbie, so one of my concerns is that perhaps 
I'm Just Not Getting It and that if I followed the Ruby Way my troubles 
would vanish :)
Posted by Florian Frank (Guest)
on 16.07.2006 04:15
(Received via mailing list)
Luke Burton wrote:

>I just think that net/http's abysmal speed is somewhat anomalous. I am 
>confident there is a simple explanation - maybe some tight loop in there 
>doing something not particularly clever - but I haven't had the time as 
>yet to dive right into net/http and find the reason. And I haven't had 
>much success with Ruby profilers either. More suggestions welcome here!
I think you're right. In the GET case (for both open-uri and net/http), I
have found a way to speed up the download. The method
BufferedIO#rbuf_fill in net/protocol.rb uses a fixed buffer size of
1024. I changed it to 8192:

    def rbuf_fill
      timeout(@read_timeout) {
        @rbuf << @io.sysread(8192)
      }
    end

This led to almost the same speed I could achieve using
direct sockets.

>
>http.put("/put/" + URI.escape(File.basename(resultFile))) do |datasocket|
>    while file.eof? == false
>        datasocket.write(file.read(4096))
>    end
>end
You can do this by setting Put#body_stream= to your IO object. The only
problem is that the related method also uses a very small buffer size:

    def send_request_with_body_stream(sock, ver, path, f)
      raise ArgumentError, "Content-Length not given and Transfer-Encoding is not `chunked'" unless content_length() or chunked?
      unless content_type()
        warn 'net/http: warning: Content-Type did not set; using application/x-www-form-urlencoded' if $VERBOSE
        set_content_type 'application/x-www-form-urlencoded'
      end
      write_header sock, ver, path
      if chunked?
        while s = f.read(1024)
          sock.write(sprintf("%x\r\n", s.length) << s << "\r\n")
        end
        sock.write "0\r\n\r\n"
      else
        while s = f.read(1024)
          sock.write s
        end
      end
    end

I think this is rather unfortunate. It would be better if those methods
used larger buffers and/or made the size tweakable if necessary.
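Florian's body_stream suggestion might look roughly like this on a current Ruby. It is a sketch, not Luke's code: `stream_put` is a hypothetical helper, the application/octet-stream Content-Type is an assumption, and the path must already be URL-escaped. Setting Content-Length avoids the ArgumentError raised by send_request_with_body_stream above. How many bytes net/http copies from the stream per write is internal to the library and differs between Ruby versions.

```ruby
require 'net/http'

# Hypothetical helper: PUT a local file to host:port/path without reading
# the whole file into memory first. net/http pulls data from the IO given
# to body_stream= while it writes the request body to the socket.
def stream_put(host, port, path, file_path)
  File.open(file_path, 'rb') do |file|
    request = Net::HTTP::Put.new(path)
    request.body_stream = file
    # Without Content-Length (or Transfer-Encoding: chunked) net/http
    # refuses to send a streamed body.
    request.content_length = File.size(file_path)
    request.content_type = 'application/octet-stream' # assumed type
    Net::HTTP.start(host, port) { |http| http.request(request) }
  end
end
```

For example, `stream_put('localhost', 12000, '/put/result.bin', '/tmp/result.bin')` returns the server's Net::HTTPResponse.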
Posted by Tanaka Akira (Guest)
on 16.07.2006 13:44
(Received via mailing list)
In article <44B9A127.3020801@nixe.ping.de>,
  "Florian Frank" <flori@nixe.ping.de> writes:

> I think you're right. In the GET case (for both open-uri and net/http) I 
> have found a possibility to speed up the download. The method 
> BufferedIO#rbuf_fill in net/protocol.rb uses a fixed buffer size of 
> 1024. I changed this to 8192:
>
>     def rbuf_fill
>       timeout(@read_timeout) {
>         @rbuf << @io.sysread(8192)
>       }
>     end

I guess the timeout() is slow.

Try:

    def rbuf_fill
      @rbuf << @io.sysread(1024)
    end

However, the above is not acceptable in general since
timeout is a feature.

It is possible to implement timeout without timeout() as:

    def rbuf_fill
      begin
        @rbuf << @io.read_nonblock(4096)
      rescue Errno::EWOULDBLOCK
        if IO.select([@io], nil, nil, @read_timeout)
          @rbuf << @io.read_nonblock(4096)
        else
          raise Timeout::Error
        end
      end
    end
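The same timeout-without-timeout() idea can be written as a standalone helper for current Ruby. This is a sketch under stated assumptions: `read_with_timeout` is a hypothetical name, and it rescues IO::WaitReadable, which only exists from Ruby 1.9.2 on (it does not apply to the 1.8 versions discussed here). As in the code above, EOFError still propagates at end of stream.

```ruby
require 'timeout' # only for the Timeout::Error class; Timeout.timeout is not used

# Hypothetical helper: read up to +size+ bytes from +io+, waiting at most
# +seconds+ for data. IO.select does the waiting, so no Timeout.timeout
# block is set up around each read.
def read_with_timeout(io, size, seconds)
  io.read_nonblock(size)
rescue IO::WaitReadable
  # Nothing buffered yet: wait until readable, or give up after +seconds+.
  raise Timeout::Error, "no data within #{seconds}s" unless IO.select([io], nil, nil, seconds)
  retry
end
```

With a pipe, `read_with_timeout(reader, 4096, 1)` returns whatever has been written so far, or raises Timeout::Error if nothing arrives within a second.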
Posted by Florian Frank (Guest)
on 16.07.2006 14:18
(Received via mailing list)
Tanaka Akira wrote:
> I guess the timeout() is slow.
>
> Try:
>
>     def rbuf_fill
>       @rbuf << @io.sysread(1024)
>     end
>   
The main problem seems to be that the read* methods of BufferedIO all
suffer from the small value in rbuf_fill. If I download a big file, it's
very likely that TCP packets are bigger than 1024 bytes (depending on
my network infrastructure). For every 1024-byte chunk, lots of Ruby
methods have to be called to handle it. At the same time, this renders
operating system buffers, which are usually larger than 1024 bytes,
useless. This is a big overhead per chunk, which reduces the maximum
bandwidth that can be achieved.
>           @rbuf << @io.read_nonblock(4096)
>         else
>           raise Timeout::TimeoutError
>         end
>       end
>     end
>   
This would be even faster, I think, because timeout also represents a lot
of overhead. For 1.8, those methods would have to be backported, then.
Hint, hint... ;)
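To make the per-chunk overhead concrete, one can count how many sysread calls it takes to drain the same amount of data with different buffer sizes. A sketch, assuming a local pipe stands in for the TCP socket (a real network connection will deliver data in different-sized pieces); `count_reads` is a hypothetical name. Every extra call is another pass through the Ruby-level handling in rbuf_fill.

```ruby
# Hypothetical helper: push +total+ bytes through a pipe and count how many
# sysread calls the reader needs when asking for at most +bufsize+ bytes.
def count_reads(total, bufsize)
  reader, writer = IO.pipe
  producer = Thread.new do
    writer.write('x' * total)
    writer.close # the reader sees EOF once everything is consumed
  end
  calls = 0
  bytes = 0
  begin
    loop do
      bytes += reader.sysread(bufsize).bytesize
      calls += 1
    end
  rescue EOFError
    # sysread raises EOFError once the writer is closed and the pipe is empty
  end
  producer.join
  reader.close
  [calls, bytes]
end

total = 1024 * 1024
[1024, 8192, 65536].each do |size|
  calls, bytes = count_reads(total, size)
  puts "bufsize #{size}: #{calls} sysread calls for #{bytes} bytes"
end
```

With a 1024-byte buffer, a 1 MiB transfer needs at least 1024 calls no matter how much data the kernel already has waiting.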
Posted by Mauricio Fernandez (Guest)
on 17.07.2006 00:45
(Received via mailing list)
On Sun, Jul 16, 2006 at 09:17:35PM +0900, Florian Frank wrote:
> >          raise Timeout::TimeoutError
> >        end
> >      end
> >    end
> >  
> This would be even faster I think, because timeout also represents a lot 
> of overhead. For 1.8. those methods would have to be backported then. 
> Hint, hint... ;)

RUBY_VERSION                         # => "1.8.5"
RUBY_RELEASE_DATE                    # => "2006-06-24"
IO.instance_methods.grep(/nonblock/) # => ["read_nonblock", "write_nonblock"]

(Yes, I must remove them from my 1.8 vs. 1.9 changelog summary)
Posted by Luke Burton (hagus)
on 17.07.2006 04:54
Hi all,

Thanks to the many thoughtful suggestions here, I have implemented an 
easy workaround to this problem that doesn't involve giving up the 
net/http library completely.

Starting from the benchmark code in my original post, I have made the
following changes. Basically, I override the necessary methods to tweak
the buffer size:

class OverrideInternetMessageIO < Net::InternetMessageIO
  def rbuf_fill
    timeout(@read_timeout) {
      @rbuf << @socket.sysread(65536)
    }
  end
end

class NewHTTP < Net::HTTP
  def NewHTTP.socket_type
    OverrideInternetMessageIO
  end
end

Benchmark.bm(10) do |time|
  out = File.new("/tmp/net.tar.bz2", "w")
  time.report("net/http - bigbuffer") do
    NewHTTP.start uri.host, uri.port do |http|
      http.request_get(uri.request_uri) do |response|
        response.read_body do |segment|
          out.write(segment)
        end
      end
    end
  end
  out.close
end

After making all those changes, we see the following new results for the 
10 MB file transfer from WEBrick:

                           user     system      total         real
net/http - big buffer  0.360000   0.390000   0.750000 (  0.991848)

That's still twice as slow as a raw TCPSocket, but it's now definitely 
in the realm of "usable for large file transfers".

I couldn't make any real recommendations on what the buffer size should 
be. I imagine it's a trade-off between the OS kernel's buffer size, TCP 
packet size, and the memory footprint of your application. Do HTTP clients 
normally negotiate a buffer size automatically? Do they pick one based on 
content type? What are the common optimisations, and should net/http 
follow them?

I tested a couple of values and found that anything past 65536 bytes gave 
negligible returns on my G5 running OS X 10.4.7 (Ruby 1.8.2, i.e. the 
default OS X install).

Thanks again for all the pointers and commentary, and I hope to see a 
more robust solution in a future version :)

Regards,

Luke.
Posted by Luke Burton (hagus)
on 17.07.2006 05:07
An addendum: testing with Ruby 1.8.4 from a Locomotive bundle I had 
handy reveals that my solution does not work on more recent versions of Ruby.

I suspect this might be due to more strictness about doing shonky things, 
such as overriding private methods, which is the basis of my hack.

For now I am satisfied (since I am targeting OS X with Ruby 1.8.2), but 
if any Ruby gurus can suggest a 1.8.4 compatible hack, I'm all ears.

Regards,

Luke.
Posted by Francis Cianfrocca (blackhedd)
on 17.07.2006 10:48
Luke Burton wrote:
> necessary methods to tweak the buffer size:
> 
> class OverrideInternetMessageIO < Net::InternetMessageIO
>   def rbuf_fill
>     timeout(@read_timeout) {
>       @rbuf << @socket.sysread(65536)
>     }
>   end
> end
> 
>
> I couldn't make any real recommendations on what the buffers size should 
> be. I imagine it's a trade off between the OS kernel's buffer size, TCP 
> packet size, and memory footprint of your application. Do HTTP clients 
> normally automatically negotiate a buffer? Do they pick one based on 
> content type? What are the common optimisations, and should net/http 
> follow them?
> 
> I tested a couple of values and found anything past 65536 bytes started 
> giving negligible returns, on my G5 running OS X 10.4.7 (ruby 1.8.2 - 
> i.e. the default OS X install).

It would be really interesting to get a record of the actual data sizes 
read out from each of the calls to sysread(65536) in your modified code. 
If stdout is free, maybe you could do something like:
  timeout(@read_timeout) {
    a << @socket.sysread(65536)
    puts "sysread #{a.length} bytes"
    @rbuf << a
  }
just for a few trials, especially to compare the values when your large 
file is coming in from a LAN, a WAN, and the Internet. The values you 
see should give you a clue as to what the sysread buffer size ought to 
be.

TCP of course has a sliding-window mechanism: the receiver advertises how 
many bytes a peer may send before it must wait for an acknowledgement, 
and congestion control adaptively ramps the sender up toward that limit. 
Unless the window-scale option is negotiated, the advertised window can't 
be larger than 64K, because it's carried in a 16-bit field in the TCP 
header, which explains your observation that 64K is a practical limit. 
With a large file transfer on a fast and otherwise unloaded network, you 
should see this value quickly reach and remain at 64K. (This is not 
something that HTTP clients have to do themselves, to your other 
question; it's built into TCP.) If your application runs on a LAN, I 
would expect a lot of benefit from 64K sysreads. Across the Internet, I'd 
be pretty surprised if you get much improvement from sysreads above 16K 
(which is also the typical network-driver buffer size for 
Berkeley-derived kernels like OS X, unless you've tweaked yours).

It's rather a surprise that the Ruby code which handles these raw reads 
is so inefficient that cutting down the number of passes through it 
makes such a difference. I'm usually pretty surprised when I/O 
processing in an application is more than a negligible fraction of the 
network transit time.
Posted by Francis Cianfrocca (blackhedd)
on 17.07.2006 11:22
> It would be really interesting to get a record of the actual data sizes 
> read out from each of the calls to sysread(65536) in your modified code. 
> If stdout is free, maybe you could do something like:
>   timeout(@read_timeout) {
>     a << @socket.sysread(65536)
>     puts "sysread #{a.length} bytes"
>     @rbuf << a
>   }


Yecch, sorry, obviously the second line in my code snippet should be:
    a = @socket.sysread(65536)
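The same measurement can also be taken without editing rbuf_fill, using a small delegating wrapper that records how many bytes each sysread actually returned. This is a different technique from the in-place puts above, sketched under assumptions: `ReadSizeRecorder` is a hypothetical class, and a local pipe stands in for the HTTP socket (wiring it into net/http's socket object is not shown).

```ruby
# Hypothetical wrapper: forwards sysread to +io+ and remembers how many
# bytes each call returned (often fewer than requested).
class ReadSizeRecorder
  attr_reader :sizes

  def initialize(io)
    @io = io
    @sizes = []
  end

  def sysread(maxlen)
    data = @io.sysread(maxlen)
    @sizes << data.bytesize
    data
  end

  # Tally of chunk size => number of reads that returned that many bytes.
  def histogram
    @sizes.each_with_object(Hash.new(0)) { |size, counts| counts[size] += 1 }
  end
end

reader, writer = IO.pipe
writer.write('a' * 3000)
writer.close
recorder = ReadSizeRecorder.new(reader)
begin
  loop { recorder.sysread(1024) }
rescue EOFError
  # all 3000 bytes consumed
end
p recorder.histogram
```

Here 3000 buffered bytes come back as two full 1024-byte reads and one short read of 952 bytes; on a live connection the histogram would show how large the chunks really are.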
Posted by Daniel Martin (Guest)
on 17.07.2006 16:24
(Received via mailing list)
Luke Burton <luke@burton.echidna.id.au> writes:

> After making all those changes, we see the following new results for the 
> 10 MB file transfer from WEBrick:
>
>                            user     system      total         real
> net/http - big buffer  0.360000   0.390000   0.750000 (  0.991848)
>
> That's still twice as slow as a raw TCPSocket, but it's now definitely 
> in the realm of "usable for large file transfers".

Does this improve noticeably if you use the read_nonblock suggestion
given earlier in the thread?  Say:

class OverrideInternetMessageIO < Net::InternetMessageIO
  def rbuf_fill
    begin
      @rbuf << @io.read_nonblock(65536)
    rescue Errno::EWOULDBLOCK
      if IO.select([@io], nil, nil, @read_timeout)
        @rbuf << @io.read_nonblock(65536)
      else
        raise Timeout::Error
      end
    end
  end
end

As for the appropriate buffer size, for what it's worth, Apache uses
this structure to read into when it's acting as a proxy server and
reading someone else's output:

    char buffer[HUGE_STRING_LEN];

Where HUGE_STRING_LEN is defined in various APR (Apache Portable
Runtime) headers as 8192. (In Apache 1.3 it was in 'httpd.h'.)

I don't have time to track through the Mozilla source to find out what
buffer size it uses.
Posted by Luke Burton (hagus)
on 18.07.2006 00:12
Daniel Martin wrote:

> Does this improve noticeably if you use the read_nonblock suggestion
> given earlier in the thread?  Say:

Hi Daniel,

I tested this earlier ... but my version of Ruby doesn't have 
read_nonblock, unfortunately. I haven't had a chance to pull down 1.8.5 
and re-test.

Additionally, if I were to do so, I'd need to find a new workaround 
method. As mentioned above, my OverrideInternetMessageIO class does not 
seem to work in versions newer than Ruby 1.8.2. So that makes a nice catch-22.

Of course, I could hand-edit the net/http classes, which would suffice for 
an academic test. In the real world I can't run around patching people's 
net/http for them :(

Taking a step back for a moment - is there a Ruby bugzilla system or 
equivalent, where such problems can be logged and prioritised? I would 
be happy to do the grunt work of submitting the patch, now that the 
solution is pretty clear.

Regards,

Luke.
Posted by Tanaka Akira (Guest)
on 18.07.2006 06:10
(Received via mailing list)
In article <17b6c68ca7a0c35209ac51f136506734@ruby-forum.com>,
  Luke Burton <luke@burton.echidna.id.au> writes:

> I tested this earlier ... but my version of Ruby doesn't have 
> read_nonblock, unfortunately. I haven't had a chance to pull down 1.8.5 
> and re-test.

This should work with 1.8.2.

    def rbuf_fill
      if IO.select([@io], nil, nil, @read_timeout)
        @rbuf << @io.sysread(16384)
      else
        raise Timeout::Error
      end
    end

Enlarging the buffer size should work well as long as the in-kernel
TCP buffer is large enough to store the data received between
successive rbuf_fill calls.

If the in-kernel buffer is not large enough, the overhead
should be reduced.  I think timeout() is the first candidate
to remove.
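Tanaka's select-then-sysread rbuf_fill extends naturally into a loop that streams a whole body to a file in large blocks, with each wait bounded by IO.select rather than timeout(). A sketch under assumptions: `copy_with_timeout` is a hypothetical name, the source is any readable IO (a pipe in the example, not net/http's socket), and the 16384-byte default is taken from his snippet.

```ruby
require 'timeout'

# Hypothetical helper: copy everything from +src+ to +dst+ in blocks of up
# to +bufsize+ bytes. Each wait for data is bounded by +seconds+ via
# IO.select instead of Timeout.timeout. Returns the number of bytes copied.
def copy_with_timeout(src, dst, seconds, bufsize = 16384)
  copied = 0
  loop do
    raise Timeout::Error, "no data within #{seconds}s" unless IO.select([src], nil, nil, seconds)
    begin
      chunk = src.sysread(bufsize)
    rescue EOFError
      return copied # source closed and fully drained
    end
    dst.write(chunk)
    copied += chunk.bytesize
  end
end
```

For instance, `File.open('/tmp/out.bin', 'wb') { |f| copy_with_timeout(io, f, 30) }` would save everything readable from `io`, failing if it stalls for 30 seconds.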