Potential bug in Net::HTTP, and tentative patch

Hi all,

I am building a nice little system using Ruby (on Rails), one part of
which uses Net::HTTP to retrieve some data over HTTP. Everything seems
to work fine, but on some requests, I get an EOFError.

As I found out, this problem has already been reported, but without any
answer. See:
http://rubyforge.org/forum/forum.php?thread_id=28826&forum_id=6052

I think I may have traced the problem back to a bug in Net::HTTP. Here
is a two-liner to reproduce the error:

$ irb
>> require 'net/http'
=> true
>> res = Net::HTTP.get_response(URI.parse('http://snapcasa.com/get.aspx?code=1000&size=m&url=www.google.com'))
EOFError: end of file reached
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/protocol.rb:133:in `sysread'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/protocol.rb:133:in `rbuf_fill'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/timeout.rb:56:in `timeout'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/timeout.rb:76:in `timeout'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/protocol.rb:126:in `readline'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:2236:in `read_chunked'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:2216:in `read_body_0'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:2182:in `read_body'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:2207:in `body'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:2146:in `reading_body'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:1061:in `request'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:957:in `request_get'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:380:in `get_response'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:547:in `start'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:379:in `get_response'
        from (irb):2

My quick analysis:

In http.rb, at line 2236, the "read_chunked" function calls
@socket.readline. This "readline" function, in protocol.rb line 126,
calls readuntil("\n"). This works fine if the data chunk is
"\n-terminated", but throws an EOFError if it is not.

I am not sure about the underlying standard for chunked HTTP. Maybe each
data chunk is supposed to always be \n-terminated and this is a
misbehaving server, but the fact is: I can get the example image fine
with any browser, but not with Net::HTTP.

As a tentative fix, I wrote a patch that catches the EOFError in
read_chunked. You will find the patch file attached. With the patch,
things work fine:

$ irb
>> require 'net/http'
=> true
>> res = Net::HTTP.get_response(URI.parse('http://snapcasa.com/get.aspx?code=1000&size=m&url=www.google.com'))
=> #<Net::HTTPOK 200 OK readbody=true>
>> File.open('test.jpg','w').write res.body
=> 2881

With the patch, the above gives me a perfectly fine JPEG file. However,
I am afraid my current patch, with a big begin…rescue around most of
the body of the read_chunked function, catches the EOFError at a higher
level than necessary, which is not good practice…
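
To illustrate what a narrower rescue could look like, here is a minimal
sketch that works on an in-memory stream. It is neither the attached patch
nor Net::HTTP's actual read_chunked; the method name read_chunks_leniently
is hypothetical. It rescues EOFError only around the read of the next
chunk-size line, so a stream that ends inside a chunk still raises.

```ruby
require 'stringio'

# Hypothetical helper for illustration only; not Net::HTTP code.
def read_chunks_leniently(io)
  body = ''.b
  loop do
    begin
      size_line = io.readline("\r\n")
    rescue EOFError
      # The stream ended where a chunk-size line was expected:
      # tolerate it and keep the data collected so far.
      break
    end
    size = size_line.hex        # String#hex parses the leading hex digits
    break if size.zero?         # a zero size (or a non-hex line) ends the loop
    data = io.read(size)
    # EOF in the middle of a chunk is still treated as an error.
    raise EOFError, 'stream ended inside a chunk' if data.nil? || data.bytesize < size
    body << data
    io.read(2)                  # CRLF that follows the chunk data
  end
  body
end

# Body whose terminating zero-size chunk is missing.
puts read_chunks_leniently(StringIO.new("4\r\nWiki\r\n5\r\npedia\r\n"))
```

The trade-off is that a response cut off exactly at a chunk boundary is
accepted as complete, which may or may not be acceptable for a given use.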

Anyway, before continuing any further, could someone involved in the
development of Ruby take a look at this, confirm the existence of the
bug, and maybe even come up with a better fix?

PS: please let me know if I posted this in the wrong list, or if I
should open a bug report on some bug tracking system.

Thank you for your help.

Hi,

At Mon, 13 Apr 2009 12:40:32 +0900,
Yves-Eric Martin wrote in [ruby-talk:333704]:

> In http.rb, line 2236 the "read_chunked" function calls
> @socket.readline. This "readline" function, in protocol.rb line 126,
> calls readuntil("\n"). This works fine if the data chunk is
> "\n-terminated", but throws an EOFError if it is not.

Not “\n-terminated”.

According to RFC2616 and RFC2068, each chunk consists of a
chunk-size and a chunk body, and the chunked body is terminated
by a chunk of size "0".

That is, the response doesn’t seem to follow the RFCs.
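
To make the chunk format described above concrete, here is a minimal
sketch of a strict decoder working on an in-memory stream. It is an
illustration only, not Net::HTTP's read_chunked; the method name
decode_chunked and the sample data are assumptions. It shows where an
EOFError comes from when the zero-size chunk is missing.

```ruby
require 'stringio'

# Hypothetical helper for illustration only; not Net::HTTP code.
def decode_chunked(io)
  body = ''.b
  loop do
    # Each chunk starts with a hexadecimal size line ending in CRLF.
    # readline raises EOFError if the stream has already ended.
    line = io.readline("\r\n")
    hex = line[/\A\h+/] or raise ArgumentError, "bad chunk-size line: #{line.inspect}"
    size = hex.hex
    break if size.zero?         # a size of 0 marks the last chunk
    data = io.read(size)
    raise EOFError, 'stream ended inside a chunk' if data.nil? || data.bytesize < size
    body << data
    io.read(2)                  # CRLF that follows the chunk data
  end
  # Skip optional trailer lines up to the final empty line.
  loop { break if io.readline("\r\n") == "\r\n" }
  body
end

complete  = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
truncated = "4\r\nWiki\r\n5\r\npedia\r\n"   # zero-size chunk is missing

puts decode_chunked(StringIO.new(complete))
begin
  decode_chunked(StringIO.new(truncated))
rescue EOFError => e
  puts "EOFError: #{e.message}"
end
```

The first call returns the full body. The second raises EOFError when the
decoder tries to read the next chunk-size line and finds the stream closed.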

Thank you for pointing me to the RFC. Indeed, the response does not
seem RFC-compliant…

Other than my quick and dirty patch, is there a way to tell Net::HTTP
to ignore the EOFError and accept non-compliant input? Again, the point
is that an image, which displays fine in Internet Explorer, Firefox and
Safari, cannot be downloaded with Net::HTTP. While I understand the
“not RFC-compliant” argument, for practical reasons, it does seem a bit
limiting…

Thank you,

PS: I will also contact the administrator of the problem site regarding
this RFC compliance issue.


Yves-Eric

Works like a charm!

Thank you Nobu for your great help. I owe you a beer.


Yves-Eric

Hi,

At Tue, 14 Apr 2009 12:33:01 +0900,
Yves-Eric Martin wrote in [ruby-talk:333820]:

> Other than my quick and dirty patch, is there a way to tell Net::HTTP
> to ignore the EOFError and accept non-compliant input? Again, the point
> is that an image, which displays fine in Internet Explorer, Firefox and
> Safari, cannot be downloaded with Net::HTTP. While I understand the
> "not RFC-compliant" argument, for practical reasons, it does seem a bit
> limiting…

See rdoc of Net::HTTPResponse#read_body and
Net::HTTP#request_get.

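require 'net/http'

# uri is a URI object, as returned by URI.parse
out = ""   # or open(destfile, "wb")
begin
  Net::HTTP.get_response(uri) do |res|
    # read_body with a block yields the body in segments as they are read,
    # so everything received before the EOFError is already in out
    res.read_body {|s| out << s}
  end
rescue EOFError
  # the server closed the connection without sending the last chunk
end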