Potential bug in Net::HTTP, and tentative patch


#1

Hi all,

I am building a nice little system using Ruby (on Rails), one part of
which uses Net::HTTP to retrieve some data over HTTP. Everything seems
to work fine, but on some requests, I get an EOFError.

As it turns out, this problem has already been reported, but it never
got an answer. See:
http://rubyforge.org/forum/forum.php?thread_id=28826&forum_id=6052

I think I may have traced the problem back to a bug in Net::HTTP. Here
is a two-liner to reproduce the error:

$ irb
>> require 'net/http'
=> true
>> res = Net::HTTP.get_response(URI.parse('http://snapcasa.com/get.aspx?code=1000&size=m&url=www.google.com'))
EOFError: end of file reached
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/protocol.rb:133:in `sysread'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/protocol.rb:133:in `rbuf_fill'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/timeout.rb:56:in `timeout'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/timeout.rb:76:in `timeout'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/protocol.rb:132:in `rbuf_fill'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/protocol.rb:116:in `readuntil'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/protocol.rb:126:in `readline'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:2236:in `read_chunked'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:2216:in `read_body_0'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:2182:in `read_body'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:2207:in `body'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:2146:in `reading_body'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:1061:in `request'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:957:in `request_get'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:380:in `get_response'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:547:in `start'
        from /System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/net/http.rb:379:in `get_response'
        from (irb):2
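So that the failure can be reproduced without depending on that remote server, here is a sketch for a recent Ruby that feeds a made-up chunked response, which stops before the terminating "0" chunk, straight into Net::HTTP's response parser. Net::BufferedIO, Net::HTTPResponse.read_new and #reading_body are internal (:nodoc:) APIs, used here the way Ruby's own net/http tests use them, so they may change between Ruby versions. It should end in the same EOFError raised from read_chunked.

```ruby
require 'net/http'
require 'stringio'

# Made-up response: one 5-byte chunk, then the stream simply ends
# (no "0\r\n\r\n" last-chunk), like the misbehaving server.
raw = "HTTP/1.1 200 OK\r\n" \
      "Transfer-Encoding: chunked\r\n" \
      "\r\n" \
      "5\r\nhello\r\n"

io  = Net::BufferedIO.new(StringIO.new(raw))
res = Net::HTTPResponse.read_new(io)   # parses the status line and headers

error = nil
begin
  # Internal helper: attaches the "socket" to the response, then reads the body.
  res.reading_body(io, true) { res.body }
rescue EOFError => e
  error = e
end

p error
```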

My quick analysis:

In http.rb, line 2236 the “read_chunked” function calls
@socket.readline. This “readline” function, in protocol.rb line 126,
calls readuntil("\n"). This works fine if the data chunk is
“\n-terminated”, but throws an EOFError if it is not.
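The readline behaviour described above can be checked in isolation. This sketch wraps a StringIO in Net::BufferedIO, the buffered reader that Net::HTTP puts around its socket (an internal net/protocol class, not public API), and shows that its readline returns a complete line but raises EOFError, instead of returning the leftover bytes, when the data ends without a "\n". Core StringIO#readline is shown for contrast.

```ruby
require 'net/protocol'
require 'stringio'

io = Net::BufferedIO.new(StringIO.new("line one\nno newline here"))

first = io.readline            # readuntil("\n"), line ending chopped off
second = begin
  io.readline                  # no "\n" before the end of the data
rescue EOFError => e
  e                            # "no newline here" is never returned
end

# For comparison, core StringIO#readline returns a final unterminated line.
core = StringIO.new("no newline here").readline

p first
p second
p core
```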

I am not sure about the underlying standards of chunked HTTP. Maybe the
data chunk is always supposed to be \n-terminated, and it may be a
misbehaving server, but the fact is: I can get the example image fine
with any browser, but not with Net::HTTP.

As a tentative fix, I wrote a patch that catches the EOFError in
read_chunked. You will find the patch file attached. With the patch,
things work fine:

$ irb
>> require 'net/http'
=> true
>> res = Net::HTTP.get_response(URI.parse('http://snapcasa.com/get.aspx?code=1000&size=m&url=www.google.com'))
=> #<Net::HTTPOK 200 OK readbody=true>
>> File.open('test.jpg', 'wb') { |f| f.write(res.body) }
=> 2881

With the patch, the above gives me a perfectly fine JPEG file. However,
I am afraid my current patch, with a big begin…rescue around most of
the body of the read_chunked function, catches the EOFError at a higher
level than necessary, which is not good practice…
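One way to keep the rescue narrow is to catch EOFError only where a missing last-chunk would show up: while reading the next chunk-size line. The sketch below is a hypothetical standalone decoder (read_chunked_lenient is not part of Net::HTTP, and this is not the attached patch). It reads from a Net::BufferedIO, an internal net/protocol class, so that readline and read behave as they do inside Net::HTTP. A body cut off in the middle of a chunk still raises EOFError.

```ruby
require 'net/protocol'
require 'stringio'

# Hypothetical helper: decode a chunked body, tolerating a stream that ends
# cleanly at a chunk boundary without the terminating "0" chunk.
def read_chunked_lenient(io)
  body = String.new
  loop do
    begin
      line = io.readline             # chunk-size line, e.g. "1a"
    rescue EOFError
      break                          # missing last-chunk: keep what we have
    end
    hexlen = line[/\A[0-9a-fA-F]+/] or
      raise ArgumentError, "bad chunk size line: #{line.inspect}"
    len = hexlen.hex
    if len.zero?                     # proper last-chunk
      loop { break if io.readline.empty? }   # skip trailers up to blank line
      break
    end
    body << io.read(len)             # EOF inside chunk data still raises
    io.read(2)                       # CRLF after the chunk data
  end
  body
end

truncated = Net::BufferedIO.new(StringIO.new("5\r\nhello\r\n"))
p read_chunked_lenient(truncated)
```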

Anyway, before continuing any further, could someone involved in the
development of Ruby take a look at this, confirm the existence of the
bug, and maybe even come up with a better fix?

PS: please let me know if I posted this in the wrong list, or if I
should open a bug report on some bug tracking system.

Thank you for your help.


#2

Hi,

At Mon, 13 Apr 2009 12:40:32 +0900,
Yves-Eric Martin wrote in [ruby-talk:333704]:

> In http.rb, line 2236 the “read_chunked” function calls
> @socket.readline. This “readline” function, in protocol.rb line 126,
> calls readuntil("\n"). This works fine if the data chunk is
> “\n-terminated”, but throws an EOFError if it is not.

Chunks are not “\n-terminated”.

According to RFC2616 and RFC2068, each chunk consists of a chunk-size
line followed by the chunk data, and the chunked body is terminated
by a chunk of size “0”.

That is, the response doesn’t seem to follow the RFCs.
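To make that framing concrete, here is a minimal sketch that builds a chunked body from a list of strings (encode_chunked is a hypothetical helper; chunk extensions and trailers are left out). Each chunk is its size in hex, CRLF, the data, CRLF, and the body ends with the zero-size last-chunk plus a final CRLF. A response that stops before that "0" chunk leaves Net::HTTP's next readline with nothing to read.

```ruby
# Hypothetical helper: frame string parts as an HTTP/1.1 chunked body
# (no chunk extensions, no trailers).
def encode_chunked(parts)
  body = String.new                    # binary buffer
  parts.each do |part|
    data = part.b
    next if data.empty?                # a zero-size chunk would end the body early
    body << data.bytesize.to_s(16) << "\r\n" << data << "\r\n"
  end
  body << "0\r\n\r\n"                  # last-chunk followed by the final CRLF
end

p encode_chunked(["hello", " world"])
```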


#3

Thank you for pointing me to the RFC. Indeed, the response does not
seem RFC-compliant…

Other than my quick and dirty patch, is there a way to tell Net::HTTP
to ignore the EOFError and accept non-compliant input? Again, the point
is that an image, which displays fine in Internet Explorer, Firefox and
Safari, cannot be downloaded with Net::HTTP. While I understand the
“not RFC-compliant” argument, for practical reasons, it does seem a bit
limiting…

Thank you,

PS: I will also contact the administrator of the problem site regarding
this RFC compliance issue.


Yves-Eric


#4

Works like a charm!

Thank you, Nobu, for your great help. I owe you a beer.


Yves-Eric


#5

Hi,

At Tue, 14 Apr 2009 12:33:01 +0900,
Yves-Eric Martin wrote in [ruby-talk:333820]:

> Other than my quick and dirty patch, is there a way to tell Net::HTTP
> to ignore the EOFError and accept non-compliant input? Again, the point
> is that an image, which displays fine in Internet Explorer, Firefox and
> Safari, cannot be downloaded with Net::HTTP. While I understand the
> “not RFC-compliant” argument, for practical reasons, it does seem a bit
> limiting…

See the RDoc for Net::HTTPResponse#read_body and
Net::HTTP#request_get.

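require 'net/http'
require 'uri'

uri = URI.parse('http://snapcasa.com/get.aspx?code=1000&size=m&url=www.google.com')
out = "" # or open(destfile, "wb")
begin
  Net::HTTP.get_response(uri) do |res|
    res.read_body {|s| out << s}
  end
rescue EOFError
  # Connection closed before the terminating "0" chunk;
  # out keeps everything received up to that point.
end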
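To watch this workaround keep the partial body without relying on the original site, here is a self-contained sketch against a throwaway local server (hypothetical, built with TCPServer) that sends one chunk and then closes the connection. It assumes a recent Ruby: newer versions of Net::HTTP retry idempotent requests such as GET once after errors like EOFError, so max_retries is set to 0 here; otherwise the retry would wait on a server that no longer answers.

```ruby
require 'net/http'
require 'socket'

# Throwaway local server mimicking the misbehaving site: it answers one
# request with a chunked body and closes before sending the "0" chunk.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]
server_thread = Thread.new do
  client = server.accept
  # Consume the request headers, up to the blank line.
  while (line = client.gets) && line != "\r\n"
  end
  client.write("HTTP/1.1 200 OK\r\n" \
               "Transfer-Encoding: chunked\r\n" \
               "\r\n" \
               "5\r\nhello\r\n")      # one chunk, no last-chunk
  client.close
end

out = String.new
truncated = false
http = Net::HTTP.new('127.0.0.1', port, nil)  # nil: never use a proxy
http.max_retries = 0     # don't let Net::HTTP silently re-send the GET
begin
  http.start do |h|
    h.request_get('/') do |res|
      res.read_body { |s| out << s }
    end
  end
rescue EOFError
  truncated = true       # body was cut short; out holds what arrived
end
server_thread.join
server.close

p out
p truncated
```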