(Note: I originally posted this to comp.lang.ruby, figuring it would filter
through to the appropriate mailing lists, but it did not.)
It seems that net/http's implementation is extremely inefficient when
it comes to dealing with large files.
I think this is something worth fixing in subsequent versions. It
shouldn't be as bad as it is. I would also appreciate any hints or
advice on working around the problem.
Specifically, I am interested in HTTP GETs (from net/http) and HTTP PUTs
(both on the net/http side and WEBrick receiving side) that have
adequate streaming performance. I would like to GET and PUT fairly large
files, and don't want to pay such a large network and CPU performance
overhead.
Below I have attached a test suite that illustrates the problem. I used
WEBrick as the server.
"Host: localhost, port: 12000, request_uri: /ten-meg.bin"
                  user     system      total        real
TCPSocket     0.030000   0.150000   0.180000 (  0.468867)
net/http     10.620000   8.630000  19.250000 ( 21.787785)
LB net/http  10.870000   8.900000  19.770000 ( 22.259448)
open-uri     16.400000  11.900000  28.300000 ( 39.834555)
As you can see, a raw TCPSocket is well over an order of magnitude faster
than net/http and friends (about 46 times faster in wall-clock time). That
surprised me: I'm using read_body and receiving the data in chunks, and I
would have expected much better performance as a result. We're talking
about 20 MB/s for TCPSocket versus under 500 KB/s for net/http.
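To turn the "real" column of the benchmark table into throughput figures, here is a small Ruby sketch. It assumes the test file is exactly 10 MiB (10,240 blocks of 1,024 bytes, as created by the dd command in the script below); the helper name mib_per_sec is made up for illustration.

```ruby
# Size of the test file created by `dd bs=1024 count=10240`: 10 MiB.
FILE_BYTES = 1024 * 10_240

# Wall-clock ("real") seconds copied from the benchmark output above.
REAL_SECONDS = {
  "TCPSocket"   => 0.468867,
  "net/http"    => 21.787785,
  "LB net/http" => 22.259448,
  "open-uri"    => 39.834555,
}

# Hypothetical helper: throughput in MiB per second.
def mib_per_sec(bytes, seconds)
  bytes / seconds / (1024.0 * 1024.0)
end

REAL_SECONDS.each do |name, secs|
  printf("%-12s %6.2f MiB/s\n", name, mib_per_sec(FILE_BYTES, secs))
end

ratio = REAL_SECONDS["net/http"] / REAL_SECONDS["TCPSocket"]
printf("net/http took %.0f times as long as TCPSocket\n", ratio)
```

This prints about 21.33 MiB/s for TCPSocket against 0.46 MiB/s for net/http and 0.25 MiB/s for open-uri, with a ratio of 46, in line with the rough figures quoted above.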
What's happening here? What can I do to fix it?
Any help appreciated.
Regards,
Luke.
#!/usr/bin/ruby
require 'net/http'
require 'open-uri'
require 'benchmark'
require 'socket'
require 'webrick'   # lowercase: 'WEBrick' only loads on case-insensitive filesystems
include WEBrick

uri = URI.parse("http://localhost:12000/ten-meg.bin")
sourceFolder = "/tmp/"
# Create the 10 MB test file (/dev/urandom, because /dev/random can block on Linux)
Kernel.system("dd if=/dev/urandom of=/tmp/ten-meg.bin bs=1024 count=10240")
port = 12000
server = HTTPServer.new(:Port => port, :DocumentRoot => sourceFolder)
# trap the signal for shutdown
trap("INT") { server.shutdown }
# Run WEBrick in a child process, with its logging redirected to files
pid = Kernel.fork {
  $stdout.reopen('/tmp/WEBrick.stdout')
  $stderr.reopen('/tmp/WEBrick.stderr')
  server.start
}
at_exit { Process.kill("INT", pid) }
Kernel.sleep 1   # give the server a moment to start
p "Host: #{uri.host}, port: #{uri.port}, request_uri: #{uri.request_uri}"
Benchmark.bm(10) do |time|
  # Raw socket: read the whole response into memory, then strip the headers
  out = File.new("/tmp/tcp.tar.bz2", "w")
  time.report("TCPSocket") do
    s = TCPSocket.open uri.host, uri.port
    s.write "GET #{uri.request_uri} HTTP/1.0\r\nHost: #{uri.host}\r\n\r\n"
    temp = s.read.split("\r\n\r\n", 2).last
    s.close
    out.write(temp)
  end
  out.close

  # net/http, streaming the body in segments via read_body
  out = File.new("/tmp/net.tar.bz2", "w")
  time.report("net/http") do
    Net::HTTP.start uri.host, uri.port do |http|
      http.request_get(uri.request_uri) do |response|
        response.read_body do |segment|
          out.write(segment)
        end
      end
    end
  end
  out.close

  # net/http without an explicit start block
  out = File.new("/tmp/luke.out", "w")
  time.report("LB net/http") do
    http = Net::HTTP.new(uri.host, uri.port)
    http.request_get(uri.path) { |response|
      response.read_body { |segment|
        out.write(segment)
      }
    }
  end
  out.close

  # open-uri, reading the whole body at once
  out = File.new("/tmp/uri.tar.bz2", "w")
  time.report("open-uri") do
    uri.open do |x|
      out.write(x.read)
    end
  end
  out.close
end
on 15.07.2006 09:44
on 15.07.2006 16:38
On Jul 15, 2006, at 3:44 AM, Luke Burton wrote:

> Below I have attached a test suite that illustrates the problem. I
> used WEBrick as the server.

Why don't you swap out WEBrick for Mongrel and run the same tests?

I suspect that the server is the bottleneck, not the client.

Gary Wright
on 15.07.2006 21:15
On Sat, 2006-07-15 at 23:37 +0900, gwtmp01@mac.com wrote:
> On Jul 15, 2006, at 3:44 AM, Luke Burton wrote:
> > Below I have attached a test suite that illustrates the problem. I
> > used WEBrick as the server.
>
> Why don't you swap out WEBrick for Mongrel and run the same tests?
>
> I suspect that the server is the bottleneck, not the client.

Better yet, don't use a Ruby web server at all, and use another tool you
trust (httperf or ab and curl) to determine a good baseline performance.
Once you've got what *could* be done with net/http then you can run
net/http and compare.

Also, I'm working on a faster alternative to net/http in the RFuzz http
client. Stay tuned for that, but you can play with it right now:

http://www.zedshaw.com/projects/rfuzz/
on 16.07.2006 02:43
Gary Wright wrote:
>> I suspect that the server is the bottleneck, not the client.

Zed Shaw wrote:
> Better yet, don't use a Ruby web server at all, and use another tool you
> trust (httperf or ab and curl) to determine a good baseline performance.
> Once you've got what *could* be done with net/http then you can run
> net/http and compare.

Hi Zed & Gary,

I neglected to mention in my post that I have already double-checked that
WEBrick is not the culprit. Fetching from WEBrick using curl is as fast as
using TCPSocket:

$ time curl -O http://localhost:12000/ten-meg.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 10.0M  100 10.0M    0     0  21.9M      0 --:--:-- --:--:-- --:--:-- 29.4M

real    0m0.466s
user    0m0.013s
sys     0m0.093s

So I *know* Ruby can shift the bits around fast enough; it's just that
net/http isn't playing the game.

> Also, I'm working on a faster alternative to net/http in the RFuzz http
> client. Stay tuned for that, but you can play with it right now:

I'd definitely like to check that out!

I just think that net/http's abysmal speed is somewhat anomalous. I am
confident there is a simple explanation - maybe some tight loop in there
doing something not particularly clever - but I haven't had the time as
yet to dive right into net/http and find the reason. And I haven't had
much success with Ruby profilers either. More suggestions welcome here!

I have gone ahead and changed the critical section of my code to use
TCPSocket instead. That solved the HTTP GET problem, but I still struggle
with PUTs:

  file = File.open(resultFile, "r")
  http = Net::HTTP.new(@uri.host, @uri.port)
  http.put("/put/" + URI.escape(File.basename(resultFile)), file.read)

Now that's not really pleasant, because it relies on snarfing the whole
file into memory first. I would have liked to do something like:

  http.put("/put/" + URI.escape(File.basename(resultFile))) do |datasocket|
    while file.eof? == false
      datasocket.write(file.read(4096))
    end
  end

This is similar to what net/http offers in the case of HTTP GET, but of
course that is broken because of the aforementioned speed concerns:

  http.request_get("/#{file}") { |response|
    response.read_body { |segment|
      # in here, 400 KB/s max and > 70% CPU utilisation ...
      outputFile.write(segment)
    }
  }

I still call myself a Ruby newbie, so one of my concerns is that perhaps
I'm Just Not Getting It and that if I followed the Ruby Way my troubles
would vanish :)
on 16.07.2006 04:15
Luke Burton wrote:
> I just think that net/http's abysmal speed is somewhat anomalous. I am
> confident there is a simple explanation - maybe some tight loop in there
> doing something not particularly clever - but I haven't had the time as
> yet to dive right into net/http and find the reason. And I haven't had
> much success with Ruby profilers either. More suggestions welcome here!

I think you're right. In the GET case (for both open-uri and net/http) I
have found a way to speed up the download. The method
BufferedIO#rbuf_fill in net/protocol.rb uses a fixed buffer size of
1024. I changed this to 8192:

  def rbuf_fill
    timeout(@read_timeout) {
      @rbuf << @io.sysread(8192)
    }
  end

This led to almost the same speed I could achieve by using sockets
directly.

> http.put("/put/" + URI.escape(File.basename(resultFile))) do |datasocket|
>   while file.eof? == false
>     datasocket.write(file.read(4096))
>   end
> end

You can do this by setting Put#body_stream= to your IO object. The only
problem is that the related method also uses a very small buffer size:

  def send_request_with_body_stream(sock, ver, path, f)
    raise ArgumentError, "Content-Length not given and Transfer-Encoding is not `chunked'" unless content_length() or chunked?
    unless content_type()
      warn 'net/http: warning: Content-Type did not set; using application/x-www-form-urlencoded' if $VERBOSE
      set_content_type 'application/x-www-form-urlencoded'
    end
    write_header sock, ver, path
    if chunked?
      while s = f.read(1024)
        sock.write(sprintf("%x\r\n", s.length) << s << "\r\n")
      end
      sock.write "0\r\n\r\n"
    else
      while s = f.read(1024)
        sock.write s
      end
    end
  end

I think this is rather unfortunate. It would be better if those methods
used larger buffers and/or made the size tweakable if necessary.
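The body_stream= approach can be sketched as follows. This is a minimal example assuming a present-day Ruby, where setting Net::HTTP::Put#body_stream= plus a Content-Length header streams a file to the server without reading it into memory first. To keep it self-contained it starts a throwaway TCPServer on an ephemeral local port that only counts the bytes it receives; the helper stream_put and the /put/demo.bin path are hypothetical, modelled on Luke's example rather than on any real server.

```ruby
require 'net/http'
require 'socket'
require 'tempfile'

# Hypothetical helper: PUT a local file by streaming it through
# Net::HTTP::Put#body_stream= instead of reading it all with File#read.
def stream_put(host, port, path, file_path)
  File.open(file_path, 'rb') do |file|
    req = Net::HTTP::Put.new(path)
    req.body_stream = file
    # net/http raises ArgumentError for a body_stream with neither
    # Content-Length nor Transfer-Encoding: chunked.
    req['Content-Length'] = File.size(file_path).to_s
    req['Content-Type'] = 'application/octet-stream'
    # nil proxy address: ignore any http_proxy setting in the environment.
    Net::HTTP.start(host, port, nil) { |http| http.request(req) }
  end
end

# Throwaway test server: accepts one request and records the body length.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]
received = nil
server_thread = Thread.new do
  client = server.accept
  length = 0
  while (line = client.gets) && line != "\r\n"
    length = line.split(':', 2)[1].to_i if line =~ /\AContent-Length:/i
  end
  received = client.read(length).bytesize
  client.write("HTTP/1.1 200 OK\r\nContent-Length: 0\r\nConnection: close\r\n\r\n")
  client.close
end

response = nil
Tempfile.create('put-demo') do |tmp|
  tmp.binmode
  tmp.write(Random.new(42).bytes(256 * 1024))   # 256 KiB of test data
  tmp.flush
  response = stream_put('127.0.0.1', port, '/put/demo.bin', tmp.path)
end
server_thread.join
server.close
puts "#{response.code}: server received #{received} bytes"
```

The copy loop inside net/http has changed since the 1.8 code quoted above (recent versions appear to hand it to IO.copy_stream rather than a fixed 1,024-byte loop), so check the net/http source for your Ruby version before relying on its chunk size.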
on 16.07.2006 13:44
In article <44B9A127.3020801@nixe.ping.de>,
  "Florian Frank" <flori@nixe.ping.de> writes:

> I think you're right. In the GET case (for both open-uri and net/http) I
> have found a possibility to speed up the download. The method
> BufferedIO#rbuf_fill in net/protocol.rb uses a fixed buffer size of
> 1024. I changed this to 8192:
>
>   def rbuf_fill
>     timeout(@read_timeout) {
>       @rbuf << @io.sysread(8192)
>     }
>   end

I guess the timeout() is slow.

Try:

  def rbuf_fill
    @rbuf << @io.sysread(1024)
  end

However, the above is not acceptable in general since timeout is a
feature. It is possible to implement timeout without timeout() as:

  def rbuf_fill
    begin
      @rbuf << @io.read_nonblock(4096)
    rescue Errno::EWOULDBLOCK
      if IO.select([@io], nil, nil, @read_timeout)
        @rbuf << @io.read_nonblock(4096)
      else
        raise Timeout::TimeoutError
      end
    end
  end
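The select-based timeout can be tried on its own, outside net/http. The sketch below is a standalone version for any IO, assuming a modern Ruby: it rescues IO::WaitReadable (not available in 1.8) to catch the would-block case from read_nonblock, raises Timeout::Error (the exception class the timeout library defines) instead of the Timeout::TimeoutError name used in the post, and retries if select wakes up but the read would still block. The method name read_with_timeout is hypothetical.

```ruby
require 'timeout'

# Hypothetical helper: read up to maxlen bytes from io, waiting at most
# `seconds` for data, with IO.select doing the waiting instead of timeout().
def read_with_timeout(io, maxlen, seconds)
  loop do
    begin
      return io.read_nonblock(maxlen)
    rescue IO::WaitReadable
      # Nothing to read yet: wait for readability or give up.
      # (A wake-up that still finds nothing restarts the full wait.)
      unless IO.select([io], nil, nil, seconds)
        raise Timeout::Error, "no data within #{seconds}s"
      end
    end
  end
end

r, w = IO.pipe
w.sync = true
w.write("hello")
p read_with_timeout(r, 4096, 1)   # => "hello"

begin
  read_with_timeout(r, 4096, 0.1)   # pipe is empty but still open
rescue Timeout::Error => e
  puts "timed out: #{e.message}"
end
```

End of file still surfaces as EOFError from read_nonblock, which a caller such as rbuf_fill would let propagate as before.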
on 16.07.2006 14:18
Tanaka Akira wrote:
> I guess the timeout() is slow.
>
> Try:
>
>   def rbuf_fill
>     @rbuf << @io.sysread(1024)
>   end

The main problem seems to be that the read* methods of BufferedIO all
suffer from the small value in rbuf_fill. If I download a big file, it's
very likely that TCP packets are bigger than 1024 bytes (depending on my
network infrastructure). For every received packet I have to call lots of
Ruby methods to handle it. At the same time, this renders operating-system
buffers, which are usually larger than 1024 bytes, useless. This is a big
per-packet overhead, which reduces the maximum bandwidth that can be
achieved.

>       @rbuf << @io.read_nonblock(4096)
>     else
>       raise Timeout::TimeoutError
>     end
>   end
> end

This would be even faster, I think, because timeout also represents a lot
of overhead. For 1.8, those methods would have to be backported then.
Hint, hint... ;)
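The per-call overhead argument is easy to quantify by counting how many reads it takes to drain the same amount of data with different buffer sizes. This sketch uses StringIO as a stand-in for a socket, so every call returns a full buffer; a real socket may return fewer bytes per call, so the counts are a lower bound. The helper count_reads is hypothetical.

```ruby
require 'stringio'

# Hypothetical helper: drain io with readpartial(bufsize) and count the calls.
# Each call stands in for one trip through rbuf_fill and its per-call overhead.
def count_reads(io, bufsize)
  calls = 0
  loop do
    io.readpartial(bufsize)
    calls += 1
  end
rescue EOFError
  calls
end

data = "x" * (10 * 1024 * 1024)   # same size as the 10 MiB test file
[1024, 8192, 16_384, 65_536].each do |size|
  printf("%6d-byte buffer: %6d reads\n", size, count_reads(StringIO.new(data), size))
end
```

This prints 10240, 1280, 640 and 160 reads respectively, so going from 1,024 to 65,536 bytes cuts the number of passes through the Ruby-level read path by a factor of 64.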
on 17.07.2006 00:45
On Sun, Jul 16, 2006 at 09:17:35PM +0900, Florian Frank wrote:
> >         raise Timeout::TimeoutError
> >       end
> >     end
> >   end
> >
> This would be even faster I think, because timeout also represents a lot
> of overhead. For 1.8. those methods would have to be backported then.
> Hint, hint... ;)

RUBY_VERSION                           # => "1.8.5"
RUBY_RELEASE_DATE                      # => "2006-06-24"
IO.instance_methods.grep(/nonblock/)   # => ["read_nonblock", "write_nonblock"]

(Yes, I must remove them from my 1.8 vs. 1.9 changelog summary)
on 17.07.2006 04:54
Hi all,
Thanks to the many thoughtful suggestions here, I have implemented an
easy workaround to this problem that doesn't involve giving up the
net/http library completely.
If you go back to my original benchmark testing code in the original
post, I have made the following changes. Basically I override the
necessary methods to tweak the buffer size:
class OverrideInternetMessageIO < Net::InternetMessageIO
  def rbuf_fill
    timeout(@read_timeout) {
      @rbuf << @socket.sysread(65536)
    }
  end
end

class NewHTTP < Net::HTTP
  def NewHTTP.socket_type
    OverrideInternetMessageIO
  end
end
Benchmark.bm(10) do |time|
  out = File.new("/tmp/net.tar.bz2", "w")
  time.report("net/http - big buffer") do
    NewHTTP.start uri.host, uri.port do |http|
      http.request_get(uri.request_uri) do |response|
        response.read_body do |segment|
          out.write(segment)
        end
      end
    end
  end
  out.close
end
After making all those changes, we see the following new results for the
10 MB file transfer from WEBrick:
                           user     system      total        real
net/http - big buffer  0.360000   0.390000   0.750000 (  0.991848)
That's still twice as slow as a raw TCPSocket, but it's now definitely
in the realm of "usable for large file transfers".
I couldn't make any real recommendations on what the buffer size should
be. I imagine it's a trade-off between the OS kernel's buffer size, TCP
packet size, and memory footprint of your application. Do HTTP clients
normally automatically negotiate a buffer? Do they pick one based on
content type? What are the common optimisations, and should net/http
follow them?
I tested a couple of values and found anything past 65536 bytes started
giving negligible returns, on my G5 running OS X 10.4.7 (ruby 1.8.2 -
i.e. the default OS X install).
Thanks again for all the pointers and commentary, and I hope to see a
more robust solution in a future version :)
Regards,
Luke.
on 17.07.2006 05:07
An addendum: testing with ruby 1.8.4 from a Locomotive bundle I had handy
reveals that my solution does not work on more recent versions of Ruby. I
suspect this might be due to more strictness about doing shonky things,
such as overriding private methods, which is the basis of my hack.

For now I am satisfied (since I am targeting OS X with Ruby 1.8.2), but if
any Ruby gurus can suggest a 1.8.4-compatible hack, I'm all ears.

Regards,
Luke.
on 17.07.2006 10:48
Luke Burton wrote:
> necessary methods to tweak the buffer size:
>
> class OverrideInternetMessageIO < Net::InternetMessageIO
>   def rbuf_fill
>     timeout(@read_timeout) {
>       @rbuf << @socket.sysread(65536)
>     }
>   end
> end
>
> I couldn't make any real recommendations on what the buffers size should
> be. I imagine it's a trade off between the OS kernel's buffer size, TCP
> packet size, and memory footprint of your application. Do HTTP clients
> normally automatically negotiate a buffer? Do they pick one based on
> content type? What are the common optimisations, and should net/http
> follow them?
>
> I tested a couple of values and found anything past 65536 bytes started
> giving negligible returns, on my G5 running OS X 10.4.7 (ruby 1.8.2 -
> i.e. the default OS X install).

It would be really interesting to get a record of the actual data sizes
read out from each of the calls to sysread(65536) in your modified code.
If stdout is free, maybe you could do something like:

  timeout(@read_timeout) {
    a << @socket.sysread(65536)
    puts "sysread #{a.length} bytes"
    @rbuf << a
  }

just for a few trials, especially to compare the values when your large
file is coming in from a LAN, a WAN, and the Internet. The values you see
should give you a clue as to what the sysread buffer size ought to be.

TCP of course has the "sliding congestion window" mechanism, in which it
adaptively increases the number of bytes that a peer may send to another
peer before it must wait for an acknowledgement. Most of the time this
number may not be larger than 64K, because it's carried in a 16-bit field
in the TCP header (unless both ends negotiate the RFC 1323 window-scale
option), which explains your observation that 64K is a practical limit.
With a large file transfer on a fast and otherwise unloaded network, you
should see this value quickly reach and remain at 64K. (To answer your
other question: this is not something HTTP clients have to do themselves;
it's built into TCP.)
If your application runs on a LAN, I would expect a lot of benefit from 64K sysreads. Across the Internet, I'd be pretty surprised if you get much improvement from sysreads above 16K (which is also the typical network-driver buffer size for Berkeley-derived kernels like OSX, unless you've tweaked yours). It's rather a surprise that the Ruby code which handles these raw reads is so inefficient that cutting down the number of passes through it makes such a difference. I'm usually pretty surprised when I/O processing in an application is more than a negligible fraction of the network transit time.
on 17.07.2006 11:22
> It would be really interesting to get a record of the actual data sizes
> read out from each of the calls to sysread(65536) in your modified code.
> If stdout is free, maybe you could do something like:
>
>   timeout(@read_timeout) {
>     a << @socket.sysread(65536)
>     puts "sysread #{a.length} bytes"
>     @rbuf << a
>   }

Yecch, sorry, obviously the second line in my code snippet should be:

    a = @socket.sysread(65536)
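The logging idea can be tried without patching net/http by wrapping a single read in a helper that reports how many bytes actually came back. In this sketch a pipe fed by hand stands in for the socket, which is an assumption for demonstration only: each write simulates a segment arriving, and the read asks for 64 KiB but gets only what is buffered. The helper logged_read is hypothetical.

```ruby
# Hypothetical helper: one read of up to bufsize bytes, logging how many
# bytes actually came back (the idea from the corrected snippet above).
def logged_read(io, bufsize = 65_536, log: $stdout)
  chunk = io.readpartial(bufsize)
  log.puts "sysread #{chunk.bytesize} bytes"
  chunk
end

r, w = IO.pipe
w.sync = true
[1_000, 3_000, 1_448].each do |n|
  w.write("x" * n)   # simulate data arriving in pieces of different sizes
  logged_read(r)     # asks for 64 KiB but gets only what is buffered
end
```

This logs 1000, 3000 and 1448 bytes in turn. Pointed at a real socket, the same logging would show how much data each call receives on a LAN, a WAN or the Internet, which is the measurement proposed above.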
on 17.07.2006 16:24
Luke Burton <luke@burton.echidna.id.au> writes:

> After making all those changes, we see the following new results for the
> 10 MB file transfer from WEBrick:
>
>                            user     system      total        real
> net/http - big buffer  0.360000   0.390000   0.750000 (  0.991848)
>
> That's still twice as slow as a raw TCPSocket, but it's now definitely
> in the realm of "usable for large file transfers".

Does this improve noticeably if you use the read_nonblock suggestion
given earlier in the thread? Say:

  class OverrideInternetMessageIO < Net::InternetMessageIO
    def rbuf_fill
      begin
        @rbuf << @io.read_nonblock(65536)
      rescue Errno::EWOULDBLOCK
        if IO.select([@io], nil, nil, @read_timeout)
          @rbuf << @io.read_nonblock(65536)
        else
          raise Timeout::TimeoutError
        end
      end
    end
  end

As for the appropriate buffer size, for what it's worth, Apache uses this
structure to read into when it's acting as a proxy server and reading
someone else's output:

  char buffer[HUGE_STRING_LEN];

where HUGE_STRING_LEN is defined in various APR (Apache Portable Runtime)
headers as 8192. (In Apache 1.3 it was in 'httpd.h'.)

I don't have time to track through the Mozilla source to find out what
buffer size they use.
on 18.07.2006 00:12
Daniel Martin wrote:
> Does this improve noticeably if you use the read_nonblock suggestion
> given earlier in the thread? Say:

Hi Daniel,

I tested this earlier ... but my version of Ruby doesn't have
read_nonblock, unfortunately. I haven't had a chance to pull down 1.8.5
and re-test.

Additionally, if I were to do so, I'd need to find a new workaround
method. As mentioned above, my OverrideInternetMessageIO class does not
seem to work on versions newer than ruby-1.8.2. So that makes a nice
catch-22.

Of course I could hand-edit the net/http classes, which would suffice for
an academic test. In the real world I can't run around patching people's
net/http for them :(

Taking a step back for a moment - is there a Ruby bugzilla system or
equivalent, where such problems can be logged and prioritised? I would be
happy to do the grunt work of submitting the patch, now that the solution
is pretty clear.

Regards,
Luke.
on 18.07.2006 06:10
In article <17b6c68ca7a0c35209ac51f136506734@ruby-forum.com>,
  Luke Burton <luke@burton.echidna.id.au> writes:

> I tested this earlier ... but my version of Ruby doesn't have
> read_nonblock, unfortunately. I haven't had a chance to pull down 1.8.5
> and re-test.

This should work with 1.8.2:

  def rbuf_fill
    if IO.select([@io], nil, nil, @read_timeout)
      @rbuf << @io.sysread(16384)
    else
      raise Timeout::TimeoutError
    end
  end

Enlarging the buffer size should help as long as the in-kernel TCP buffer
is large enough to hold the data received between successive rbuf_fill
calls. If the in-kernel buffer is not large enough, the per-call overhead
itself should be reduced. I think timeout() is the first candidate to
remove.
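The 1.8.2-compatible variant can likewise be sketched as a standalone function that needs only IO.select and IO#sysread. Once select has reported the descriptor readable, the following sysread returns whatever is available (up to 16 KiB here) rather than blocking, provided nothing else reads from the same IO in between. The name fill_with_select is hypothetical, and Timeout::Error is used in place of the post's Timeout::TimeoutError; the pipe again stands in for a socket.

```ruby
require 'timeout'

# Hypothetical helper: wait up to `seconds` for io to become readable, then
# append a single sysread of at most bufsize bytes to buf.
# Needs only IO.select and IO#sysread, so no read_nonblock is required.
def fill_with_select(io, buf, seconds, bufsize = 16_384)
  if IO.select([io], nil, nil, seconds)
    buf << io.sysread(bufsize)   # returns what is available, up to bufsize
  else
    raise Timeout::Error, "no data within #{seconds}s"
  end
end

r, w = IO.pipe
w.sync = true
buf = String.new
w.write("chunk of data")
fill_with_select(r, buf, 1)
puts buf   # prints "chunk of data"
```

Compared with the read_nonblock version, this costs one extra select call per fill even when data is already waiting, but it avoids the per-call overhead of timeout().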