Forum: Ruby-core [ruby-trunk - Feature #6492][Open] Inflate all HTTP Content-Encoding: deflate, gzip, x-gzip response

Posted by drbrain (Eric Hodel) (Guest)
on 2012-05-25 00:38
(Received via mailing list)
Issue #6492 has been reported by drbrain (Eric Hodel).

----------------------------------------
Feature #6492: Inflate all HTTP Content-Encoding: deflate, gzip, x-gzip 
responses by default
https://bugs.ruby-lang.org/issues/6492

Author: drbrain (Eric Hodel)
Status: Open
Priority: Normal
Assignee:
Category: lib
Target version: 2.0.0


=begin
This patch moves the compression-handling code from Net::HTTP#get to 
Net::HTTPResponse to allow decompression to occur by default on any 
response body.  (A future patch will set the Accept-Encoding on all 
requests that allow response bodies by default.)

Instead of having separate decompression code for deflate and 
gzip-encoded responses, (({Zlib::Inflate.new(32 + Zlib::MAX_WBITS)})) is 
used which automatically detects and inflated gzip-wrapped streams which 
allows for simpler processing of gzip bodies (no need to create a 
StringIO).
=end
Posted by drbrain (Eric Hodel) (Guest)
on 2012-05-25 02:15
(Received via mailing list)
Issue #6492 has been updated by drbrain (Eric Hodel).

File net.http.inflate_by_default.patch added

Opps, forgot patch.
----------------------------------------
Feature #6492: Inflate all HTTP Content-Encoding: deflate, gzip, x-gzip 
responses by default
https://bugs.ruby-lang.org/issues/6492#change-26810

Author: drbrain (Eric Hodel)
Status: Open
Priority: Normal
Assignee:
Category: lib
Target version: 2.0.0


=begin
This patch moves the compression-handling code from Net::HTTP#get to 
Net::HTTPResponse to allow decompression to occur by default on any 
response body.  (A future patch will set the Accept-Encoding on all 
requests that allow response bodies by default.)

Instead of having separate decompression code for deflate and 
gzip-encoded responses, (({Zlib::Inflate.new(32 + Zlib::MAX_WBITS)})) is 
used which automatically detects and inflated gzip-wrapped streams which 
allows for simpler processing of gzip bodies (no need to create a 
StringIO).
=end
Posted by naruse (Yui NARUSE) (Guest)
on 2012-05-25 03:26
(Received via mailing list)
Issue #6492 has been updated by naruse (Yui NARUSE).


I agree with the concept of the patch, but

> +  ensure
> +    @socket.finish if Inflater === @socket

When @socket is Socket-like object, the object should behave like a 
socket.
Inflater#finish should be Inflater#shutdown or Inflater#close and this 
if-clause is not needed.

> +        inflater.read len, dest
>        ensure
>          total += len
>          @socket.read 2   # \r\n  def

this variable inflater is confusing with the inflater method.

>+    def read clen, dest, ignore_eof = false
> +      temp_dest = ''
> +
> +      @socket.read clen, temp_dest, ignore_eof
> +
> +      dest << @inflate.inflate(temp_dest)
> +    end

This read method return a string whose length is not clen, this is 
wrong.
Other IO-like object for example Zlib::GzipReader returns a string whose 
length is clen.
So Inflater should have a internal buffer and return the string whose 
length is just clen.
----------------------------------------
Feature #6492: Inflate all HTTP Content-Encoding: deflate, gzip, x-gzip 
responses by default
https://bugs.ruby-lang.org/issues/6492#change-26812

Author: drbrain (Eric Hodel)
Status: Open
Priority: Normal
Assignee:
Category: lib
Target version: 2.0.0


=begin
This patch moves the compression-handling code from Net::HTTP#get to 
Net::HTTPResponse to allow decompression to occur by default on any 
response body.  (A future patch will set the Accept-Encoding on all 
requests that allow response bodies by default.)

Instead of having separate decompression code for deflate and 
gzip-encoded responses, (({Zlib::Inflate.new(32 + Zlib::MAX_WBITS)})) is 
used which automatically detects and inflated gzip-wrapped streams which 
allows for simpler processing of gzip bodies (no need to create a 
StringIO).
=end
Posted by drbrain (Eric Hodel) (Guest)
on 2012-05-25 08:16
(Received via mailing list)
Issue #6492 has been updated by drbrain (Eric Hodel).


naruse (Yui NARUSE) wrote:
> I agree with the concept of the patch, but
>
> > +  ensure
> > +    @socket.finish if Inflater === @socket
>
> When @socket is Socket-like object, the object should behave like a socket.
> Inflater#finish should be Inflater#shutdown or Inflater#close and this if-clause 
is not needed.

I considered this, but calling #close would also terminate a persistent 
connection which is undesirable.

I don't see a way to cleanly finish the inflate stream without an 
if-clause.

> […]

I will submit a new patch that includes fixes for the rest of your 
comments.

----------------------------------------
Feature #6492: Inflate all HTTP Content-Encoding: deflate, gzip, x-gzip 
responses by default
https://bugs.ruby-lang.org/issues/6492#change-26820

Author: drbrain (Eric Hodel)
Status: Open
Priority: Normal
Assignee:
Category: lib
Target version: 2.0.0


=begin
This patch moves the compression-handling code from Net::HTTP#get to 
Net::HTTPResponse to allow decompression to occur by default on any 
response body.  (A future patch will set the Accept-Encoding on all 
requests that allow response bodies by default.)

Instead of having separate decompression code for deflate and 
gzip-encoded responses, (({Zlib::Inflate.new(32 + Zlib::MAX_WBITS)})) is 
used which automatically detects and inflated gzip-wrapped streams which 
allows for simpler processing of gzip bodies (no need to create a 
StringIO).
=end
Posted by naruse (Yui NARUSE) (Guest)
on 2012-05-25 08:39
(Received via mailing list)
Issue #6492 has been updated by naruse (Yui NARUSE).


drbrain (Eric Hodel) wrote:
>
> I don't see a way to cleanly finish the inflate stream without an if-clause.

I see,

If Inflater's @socket.read returns nil or a string shorter than clen, it 
means the input is finished and @inflate can finish.
So at that time, you can call @inflate.finish.

But on persistent connection current simple read all may eat another 
content, mustn't it?
I suspect they must see content body.
----------------------------------------
Feature #6492: Inflate all HTTP Content-Encoding: deflate, gzip, x-gzip 
responses by default
https://bugs.ruby-lang.org/issues/6492#change-26821

Author: drbrain (Eric Hodel)
Status: Open
Priority: Normal
Assignee:
Category: lib
Target version: 2.0.0


=begin
This patch moves the compression-handling code from Net::HTTP#get to 
Net::HTTPResponse to allow decompression to occur by default on any 
response body.  (A future patch will set the Accept-Encoding on all 
requests that allow response bodies by default.)

Instead of having separate decompression code for deflate and 
gzip-encoded responses, (({Zlib::Inflate.new(32 + Zlib::MAX_WBITS)})) is 
used which automatically detects and inflated gzip-wrapped streams which 
allows for simpler processing of gzip bodies (no need to create a 
StringIO).
=end
Posted by drbrain (Eric Hodel) (Guest)
on 2012-05-25 19:06
(Received via mailing list)
Issue #6492 has been updated by drbrain (Eric Hodel).


naruse (Yui NARUSE) wrote:
> If Inflater's @socket.read returns nil or a string shorter than clen, it means 
the input is finished and @inflate can finish.
> So at that time, you can call @inflate.finish.

I hadn't thought of that, I will implement it.

> But on persistent connection current simple read all may eat another content, 
mustn't it?
> I suspect they must see content body.

The response must contain Content-Length or Transfer-Encoding: chunked 
to be persistent, so this is OK. Net::HTTP already handles this.
----------------------------------------
Feature #6492: Inflate all HTTP Content-Encoding: deflate, gzip, x-gzip 
responses by default
https://bugs.ruby-lang.org/issues/6492#change-26827

Author: drbrain (Eric Hodel)
Status: Open
Priority: Normal
Assignee:
Category: lib
Target version: 2.0.0


=begin
This patch moves the compression-handling code from Net::HTTP#get to 
Net::HTTPResponse to allow decompression to occur by default on any 
response body.  (A future patch will set the Accept-Encoding on all 
requests that allow response bodies by default.)

Instead of having separate decompression code for deflate and 
gzip-encoded responses, (({Zlib::Inflate.new(32 + Zlib::MAX_WBITS)})) is 
used which automatically detects and inflated gzip-wrapped streams which 
allows for simpler processing of gzip bodies (no need to create a 
StringIO).
=end
Posted by drbrain (Eric Hodel) (Guest)
on 2012-05-30 01:46
(Received via mailing list)
Issue #6492 has been updated by drbrain (Eric Hodel).

File net.http.inflate_by_default.2.patch added

I've updated this patch.  Upon working with the code again and looking 
at RFC 2616, I have made the following changes:

> naruse (Yui NARUSE) wrote:
> > If Inflater's @socket.read returns nil or a string shorter than clen, it means 
the input is finished and @inflate can finish.
> > So at that time, you can call @inflate.finish.
>
> I hadn't thought of that, I will implement it.

Due to read_chunked, and persistent connections I don't see how to make 
this work.

When reading the body's Content-Length or Content-Range this strategy 
would work, but read_chunked reads multiple chunks of the compressed 
body and indicates the input to inflate is finished with a terminating 
"0\r\n\r\n" on the raw socket.  Adding this communication between the 
raw socket and Inflater seems worse.

When the connection is persistent, #read should only return nil when the 
connection was abnormally terminated in which case we will throw away 
the body.

For #read_all, this would work.

Due to all the special cases, I changed Net::HTTPResponse#inflater to 
yield the Inflater and automatically clean it up.  This keeps the 
special information about cleanup out of #read_body_0

> this variable inflater is confusing with the inflater method.

In Net::HTTPResponse#read_chunked, the confusing "inflater" variable has 
been replaced with "chunk_data_io" which comes from RFC 2616 section 
3.6.1.

> This read method return a string whose length is not clen, this is wrong.
> Other IO-like object for example Zlib::GzipReader returns a string whose length 
is clen.
> So Inflater should have a internal buffer and return the string whose length is 
just clen.

Upon review, I think this is OK.

RFC 2616 specifies that Content-Length and Content-Range (which are used 
for clen) refer to the transferred bytes and are used to read the 
correct amount of data from the response to maintain the persistent 
connection.  Net::HTTPResponse#read_body doesn't allow the user to 
specify the amount of bytes they wish to read, so returning more data to 
the user is OK.

I have made an additional change beyond your review:

I've added a Net::ReadAdapter to the Inflater to stream of the encoded 
response body through inflate without buffering it all.  This will 
reduce memory consumption for large responses.
----------------------------------------
Feature #6492: Inflate all HTTP Content-Encoding: deflate, gzip, x-gzip 
responses by default
https://bugs.ruby-lang.org/issues/6492#change-26895

Author: drbrain (Eric Hodel)
Status: Open
Priority: Normal
Assignee:
Category: lib
Target version: 2.0.0


=begin
This patch moves the compression-handling code from Net::HTTP#get to 
Net::HTTPResponse to allow decompression to occur by default on any 
response body.  (A future patch will set the Accept-Encoding on all 
requests that allow response bodies by default.)

Instead of having separate decompression code for deflate and 
gzip-encoded responses, (({Zlib::Inflate.new(32 + Zlib::MAX_WBITS)})) is 
used which automatically detects and inflated gzip-wrapped streams which 
allows for simpler processing of gzip bodies (no need to create a 
StringIO).
=end
Posted by mame (Yusuke Endoh) (Guest)
on 2012-05-31 16:11
(Received via mailing list)
Issue #6492 has been updated by mame (Yusuke Endoh).

Status changed from Open to Assigned
Assignee set to naruse (Yui NARUSE)


----------------------------------------
Feature #6492: Inflate all HTTP Content-Encoding: deflate, gzip, x-gzip 
responses by default
https://bugs.ruby-lang.org/issues/6492#change-26917

Author: drbrain (Eric Hodel)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: lib
Target version: 2.0.0


=begin
This patch moves the compression-handling code from Net::HTTP#get to 
Net::HTTPResponse to allow decompression to occur by default on any 
response body.  (A future patch will set the Accept-Encoding on all 
requests that allow response bodies by default.)

Instead of having separate decompression code for deflate and 
gzip-encoded responses, (({Zlib::Inflate.new(32 + Zlib::MAX_WBITS)})) is 
used which automatically detects and inflated gzip-wrapped streams which 
allows for simpler processing of gzip bodies (no need to create a 
StringIO).
=end
Posted by naruse (Yui NARUSE) (Guest)
on 2012-06-06 21:10
(Received via mailing list)
Issue #6492 has been updated by naruse (Yui NARUSE).


drbrain (Eric Hodel) wrote:
> I've updated this patch.  Upon working with the code again and looking at RFC 
2616, I have made the following changes:
>
> > naruse (Yui NARUSE) wrote:
> > > If Inflater's @socket.read returns nil or a string shorter than clen, it 
means the input is finished and @inflate can finish.
> > > So at that time, you can call @inflate.finish.
> >
> > I hadn't thought of that, I will implement it.
>
> Due to read_chunked, and persistent connections I don't see how to make this 
work.

Yeah, I thought adding an another layer, transport encoding decoder, but 
it is just an idea and I don't suggest it.

> RFC 2616 specifies that Content-Length and Content-Range (which are used for 
clen) refer to the transferred bytes and are used to read the correct amount of 
data from the response to maintain the persistent connection. 
Net::HTTPResponse#read_body doesn't allow the user to specify the amount of bytes 
they wish to read, so returning more data to the user is OK.
Your patch hides content-encoding layer.
Content-Length and Content-Range are the member of the layer.

Net:HTTPRequest#read is on the layer.
A user of net/http can't know whether a request used content-encoding or 
not.
On such situation, it can't be a reason why hidden Content-Encoding 
layer effects the behavior of read method.

> I have made an additional change beyond your review:
>
> I've added a Net::ReadAdapter to the Inflater to stream of the encoded response 
body through inflate without buffering it all.  This will reduce memory 
consumption for large responses.

ok.
----------------------------------------
Feature #6492: Inflate all HTTP Content-Encoding: deflate, gzip, x-gzip 
responses by default
https://bugs.ruby-lang.org/issues/6492#change-27041

Author: drbrain (Eric Hodel)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: lib
Target version: 2.0.0


=begin
This patch moves the compression-handling code from Net::HTTP#get to 
Net::HTTPResponse to allow decompression to occur by default on any 
response body.  (A future patch will set the Accept-Encoding on all 
requests that allow response bodies by default.)

Instead of having separate decompression code for deflate and 
gzip-encoded responses, (({Zlib::Inflate.new(32 + Zlib::MAX_WBITS)})) is 
used which automatically detects and inflated gzip-wrapped streams which 
allows for simpler processing of gzip bodies (no need to create a 
StringIO).
=end
Posted by drbrain (Eric Hodel) (Guest)
on 2012-06-08 02:48
(Received via mailing list)
Issue #6492 has been updated by drbrain (Eric Hodel).


naruse (Yui NARUSE) wrote:
> drbrain (Eric Hodel) wrote:
> > Due to read_chunked, and persistent connections I don't see how to make this 
work.
>
> Yeah, I thought adding an another layer, transport encoding decoder, but it is 
just an idea and I don't suggest it.

I had this idea too, but it would be a larger change. I hope we can 
create something simpler, but also usable.

>
> Net:HTTPRequest#read is on the layer.

I'm confused.

There is Net::BufferedIO#read, but no Net::HTTPResponse#read.  There is 
Net::HTTPResponse#read_body which lets you read the entire body or 
chunks of unknown size.

I don't see a way for the user to say "read 10 bytes of the response 
body" without manually buffering:

  require 'net/http'

  req = Net::HTTP::Get.new '/'

  body_part = Net::HTTP.start 'localhost' do |http|
    buffer = ''
    target_size = 10

    http.request req do |res|
      res.read_body do |chunk|
        break if buffer.length == target_size

        buffer << chunk[0, target_size - buffer.length]
      end
    end

    buffer
  end

  p body_part

Since Net::HTTPResponse is not usable as an IO I don't think IO-like 
behavior should apply to Net::HTTPResponse::Inflater.

> A user of net/http can't know whether a request used content-encoding or not.

I am unsure what you mean by "can't".

Do you mean "a user of net/http must be able to tell content-encoding 
was present"?

When this patch is combined with #6494 they will not be able to know 
whether a request used content-encoding or not.  I think this is good, 
the user should not have to worry about how the bytes were transported. 
(This behavior matches Net::HTTP#get).

If we want the user to be able to handle content-encoding themselves I 
think adding a Net::HTTP#compression = false (which will disable both 
#6492 and #6494) would be best.  We can also add an indicator on 
Net::HTTPResponse that decompression was performed.

For Content-Length with Content-Encoding, the Content-Length will be 
invalid.  I think this is OK because RFC 2616 doesn't contain an 
indicator of the decoded length and the user is most likely interested 
in the decoded body.

Content-Range with Content-Encoding requires special handling.  The 
compressed stream may start anywhere in the underlying block.  (For a 
deflate-based stream the user would need to manually reconstruct the 
complete response in order to inflate it.)  I think such users should 
disable compression.

> On such situation, it can't be a reason why hidden Content-Encoding layer 
effects the behavior of read method.

I agree that in RFC 2616 that Content-Encoding, Content-Length and 
Content-Range are all on the same layer, but without an IO-like 
interface for the Net::HTTPResponse body I don't think a restriction on 
the behavior of the read method should apply.  Since this API is 
entirely internal, I think it is OK if a future addition to the API 
needs to add buffering to be IO-like.
----------------------------------------
Feature #6492: Inflate all HTTP Content-Encoding: deflate, gzip, x-gzip 
responses by default
https://bugs.ruby-lang.org/issues/6492#change-27086

Author: drbrain (Eric Hodel)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: lib
Target version: 2.0.0


=begin
This patch moves the compression-handling code from Net::HTTP#get to 
Net::HTTPResponse to allow decompression to occur by default on any 
response body.  (A future patch will set the Accept-Encoding on all 
requests that allow response bodies by default.)

Instead of having separate decompression code for deflate and 
gzip-encoded responses, (({Zlib::Inflate.new(32 + Zlib::MAX_WBITS)})) is 
used which automatically detects and inflated gzip-wrapped streams which 
allows for simpler processing of gzip bodies (no need to create a 
StringIO).
=end
Posted by naruse (Yui NARUSE) (Guest)
on 2012-06-08 16:08
(Received via mailing list)
Issue #6492 has been updated by naruse (Yui NARUSE).


drbrain (Eric Hodel) wrote:
> > > > So Inflater should have a internal buffer and return the string whose 
length is just clen.
> I'm confused.
> (snip)
> Since Net::HTTPResponse is not usable as an IO I don't think IO-like behavior 
should apply to Net::HTTPResponse::Inflater.

Ah, yes, you are correct, it is only for Net::HTTPResponse::Inflater.

> > A user of net/http can't know whether a request used content-encoding or not.
>
> I am unsure what you mean by "can't".
>
> Do you mean "a user of net/http must be able to tell content-encoding was 
present"?

Yes.

> When this patch is combined with #6494 they will not be able to know whether a 
request used content-encoding or not.  I think this is good, the user should not 
have to worry about how the bytes were transported.  (This behavior matches 
Net::HTTP#get).

Yeah, I think it is acceptable.

> If we want the user to be able to handle content-encoding themselves I think 
adding a Net::HTTP#compression = false (which will disable both #6492 and #6494) 
would be best.  We can also add an indicator on Net::HTTPResponse that 
decompression was performed.

Someone may want such function, but it is not required until someone 
actually request.

> For Content-Length with Content-Encoding, the Content-Length will be invalid.  I 
think this is OK because RFC 2616 doesn't contain an indicator of the decoded 
length and the user is most likely interested in the decoded body.

It is OK for an HTTP client itself because Content-Length is for the 
reader of socket.
A client can know how many bytes should it read by Content-Length.
For this purpose, Content-Length must show the compressed size.

> Content-Range with Content-Encoding requires special handling.  The compressed 
stream may start anywhere in the underlying block.  (For a deflate-based stream 
the user would need to manually reconstruct the complete response in order to 
inflate it.)  I think such users should disable compression.

So on range response with content-encoding, the response won't 
uncompress the body
and keep Content-Encoding field, right?
If so, I agree.

> > On such situation, it can't be a reason why hidden Content-Encoding layer 
effects the behavior of read method.
>
> I agree that in RFC 2616 that Content-Encoding, Content-Length and Content-Range 
are all on the same layer, but without an IO-like interface for the 
Net::HTTPResponse body I don't think a restriction on the behavior of the read 
method should apply.  Since this API is entirely internal, I think it is OK if a 
future addition to the API needs to add buffering to be IO-like.

OK, I agree if the read method has a comment which explain it breaks the 
manner of IO-like object
because it is internal API.
----------------------------------------
Feature #6492: Inflate all HTTP Content-Encoding: deflate, gzip, x-gzip 
responses by default
https://bugs.ruby-lang.org/issues/6492#change-27101

Author: drbrain (Eric Hodel)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: lib
Target version: 2.0.0


=begin
This patch moves the compression-handling code from Net::HTTP#get to 
Net::HTTPResponse to allow decompression to occur by default on any 
response body.  (A future patch will set the Accept-Encoding on all 
requests that allow response bodies by default.)

Instead of having separate decompression code for deflate and 
gzip-encoded responses, (({Zlib::Inflate.new(32 + Zlib::MAX_WBITS)})) is 
used which automatically detects and inflated gzip-wrapped streams which 
allows for simpler processing of gzip bodies (no need to create a 
StringIO).
=end
Posted by drbrain (Eric Hodel) (Guest)
on 2012-06-08 22:48
(Received via mailing list)
Issue #6492 has been updated by drbrain (Eric Hodel).


naruse (Yui NARUSE) wrote:
> > When this patch is combined with #6494 they will not be able to know whether a 
request used content-encoding or not.  I think this is good, the user should not 
have to worry about how the bytes were transported.  (This behavior matches 
Net::HTTP#get).
>
> Yeah, I think it is acceptable.

Ok.

> > If we want the user to be able to handle content-encoding themselves I think 
adding a Net::HTTP#compression = false (which will disable both #6492 and #6494) 
would be best.  We can also add an indicator on Net::HTTPResponse that 
decompression was performed.
>
> Someone may want such function, but it is not required until someone actually 
request.

I will make a future patch, I need such a feature to handle broken 
deflate streams in mechanize.

> > Content-Range with Content-Encoding requires special handling.  The compressed 
stream may start anywhere in the underlying block.  (For a deflate-based stream 
the user would need to manually reconstruct the complete response in order to 
inflate it.)  I think such users should disable compression.
>
> So on range response with content-encoding, the response won't uncompress the 
body
> and keep Content-Encoding field, right?
> If so, I agree.

Ok, I will update the patch

> > > On such situation, it can't be a reason why hidden Content-Encoding layer 
effects the behavior of read method.
> >
> > I agree that in RFC 2616 that Content-Encoding, Content-Length and 
Content-Range are all on the same layer, but without an IO-like interface for the 
Net::HTTPResponse body I don't think a restriction on the behavior of the read 
method should apply.  Since this API is entirely internal, I think it is OK if a 
future addition to the API needs to add buffering to be IO-like.
>
> OK, I agree if the read method has a comment which explain it breaks the manner 
of IO-like object
> because it is internal API.

Ok, I will update the patch
----------------------------------------
Feature #6492: Inflate all HTTP Content-Encoding: deflate, gzip, x-gzip 
responses by default
https://bugs.ruby-lang.org/issues/6492#change-27106

Author: drbrain (Eric Hodel)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: lib
Target version: 2.0.0


=begin
This patch moves the compression-handling code from Net::HTTP#get to 
Net::HTTPResponse to allow decompression to occur by default on any 
response body.  (A future patch will set the Accept-Encoding on all 
requests that allow response bodies by default.)

Instead of having separate decompression code for deflate and 
gzip-encoded responses, (({Zlib::Inflate.new(32 + Zlib::MAX_WBITS)})) is 
used which automatically detects and inflated gzip-wrapped streams which 
allows for simpler processing of gzip bodies (no need to create a 
StringIO).
=end
Posted by drbrain (Eric Hodel) (Guest)
on 2012-06-08 23:48
(Received via mailing list)
Issue #6492 has been updated by drbrain (Eric Hodel).

File net.http.inflate_by_default.3.patch added

This patch adds a comment to Net::HTTPResponse::Inflater#read discussing 
returning more bytes than clen and a link to this ticket.

This patch ignores Content-Encoding when Content-Range is present.
----------------------------------------
Feature #6492: Inflate all HTTP Content-Encoding: deflate, gzip, x-gzip 
responses by default
https://bugs.ruby-lang.org/issues/6492#change-27107

Author: drbrain (Eric Hodel)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: lib
Target version: 2.0.0


=begin
This patch moves the compression-handling code from Net::HTTP#get to 
Net::HTTPResponse to allow decompression to occur by default on any 
response body.  (A future patch will set the Accept-Encoding on all 
requests that allow response bodies by default.)

Instead of having separate decompression code for deflate and 
gzip-encoded responses, (({Zlib::Inflate.new(32 + Zlib::MAX_WBITS)})) is 
used which automatically detects and inflated gzip-wrapped streams which 
allows for simpler processing of gzip bodies (no need to create a 
StringIO).
=end
Posted by Eric Wong (Guest)
on 2012-06-09 02:29
(Received via mailing list)
"drbrain (Eric Hodel)" <drbrain@segment7.net> wrote:
> Feature #6492: Inflate all HTTP Content-Encoding: deflate, gzip, x-gzip 
responses by default
> https://bugs.ruby-lang.org/issues/6492

I like Net::HTTP being able to inflate compressed responses.

However, I think doing this by default is exploitable by an evil server.
A server could compress a huge file of zeroes to trigger an
out-of-memory conditions in existing code that uses Net::HTTP.

100M compresses to 100K for me:

$ dd if=/dev/zero bs=1M count=100 | gzip -c | wc -c
101791
Posted by Eric Hodel (Guest)
on 2012-06-09 03:16
(Received via mailing list)
On Jun 8, 2012, at 5:28 PM, Eric Wong <normalperson@yhbt.net> wrote:

> I like Net::HTTP being able to inflate compressed responses.
>
> However, I think doing this by default is exploitable by an evil server.
> A server could compress a huge file of zeroes to trigger an
> out-of-memory conditions in existing code that uses Net::HTTP.

Net::HTTP#get does this by default already, this patch (and #6494) make 
this the default for all requests.

If you aren't using the API to handle a compressed 100MB request 
(Net::HTTPResponse#read_body with a block) you probably can't handle an 
raw 100MB response, so what is the difference besides bandwidth cost of 
the server?
Posted by Eric Wong (Guest)
on 2012-06-09 07:05
(Received via mailing list)
Eric Hodel <drbrain@segment7.net> wrote:
> On Jun 8, 2012, at 5:28 PM, Eric Wong <normalperson@yhbt.net> wrote:
>
> > I like Net::HTTP being able to inflate compressed responses.
> >
> > However, I think doing this by default is exploitable by an evil server.
> > A server could compress a huge file of zeroes to trigger an
> > out-of-memory conditions in existing code that uses Net::HTTP.

> Net::HTTP#get does this by default already, this patch (and #6494)
> make this the default for all requests.

I've always considered Net::HTTP#get (or anything where slurping is done
blindly) dangerous when talking to untrusted servers regardless of gzip.

> If you aren't using the API to handle a compressed 100MB request
> (Net::HTTPResponse#read_body with a block) you probably can't handle
> an raw 100MB response, so what is the difference besides bandwidth
> cost of the server?

With your patch, I'm getting 16M chunks from read_body.  Maybe on newer
systems, 16M is "safe" to slurp in memory.  I think it's too big, but I
may also be a dinosaur :)

Also, HTTP servers may blindly send Content-Encoding:gzip data
regardless of whether the client requested with Accept-Encoding:gzip or
not.  I seem to recall reading of a major website that forces gzip on
visitors regardless of their Accept-Encoding:.

------------------------------ 8< -----------------------------
require 'uri'
require 'net/http'

# feel free to use this URL for testing
uri = URI('http://yhbt.net/x')

Net::HTTP.start(uri.host, uri.port) do |http|
  request = Net::HTTP::Get.new(uri.request_uri)
  request["Accept-Encoding"] = "gzip"
  http.request request do |response|
    response.read_body do |chunk|
      p [ chunk.bytesize ]
    end
  end
end
------------------------------ 8< -----------------------------

I only used "gzip -9" to generate this test example, I'm not sure if
there are ways to use zlib to compress even more aggressively.

Achieving bzip2-level compression ratios would be very scary :)
  dd if=/dev/zero bs=1M count=1000 | bzip2 -c -9 | wc -c
  => 753
Posted by naruse (Yui NARUSE) (Guest)
on 2012-06-14 22:45
(Received via mailing list)
Issue #6492 has been updated by naruse (Yui NARUSE).


How about adding a new attribute which limits the size of reading body?
----------------------------------------
Feature #6492: Inflate all HTTP Content-Encoding: deflate, gzip, x-gzip 
responses by default
https://bugs.ruby-lang.org/issues/6492#change-27255

Author: drbrain (Eric Hodel)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: lib
Target version: 2.0.0


=begin
This patch moves the compression-handling code from Net::HTTP#get to 
Net::HTTPResponse to allow decompression to occur by default on any 
response body.  (A future patch will set the Accept-Encoding on all 
requests that allow response bodies by default.)

Instead of having separate decompression code for deflate and 
gzip-encoded responses, (({Zlib::Inflate.new(32 + Zlib::MAX_WBITS)})) is 
used which automatically detects and inflated gzip-wrapped streams which 
allows for simpler processing of gzip bodies (no need to create a 
StringIO).
=end
Posted by Eric Wong (Guest)
on 2012-06-15 01:19
(Received via mailing list)
"naruse (Yui NARUSE)" <naruse@airemix.jp> wrote:
> How about adding a new attribute which limits the size of reading body?

Maybe.  I'm not familiar with zlib internals, but does it need to
allocate that memory internally even if it's not returned to the
callers?

Perhaps adding an attribute to toggle transparent inflate and
documenting it with a warning about possible memory usage is
the way to go.
Posted by naruse (Yui NARUSE) (Guest)
on 2012-06-15 03:27
(Received via mailing list)
Issue #6492 has been updated by naruse (Yui NARUSE).


normalperson (Eric Wong) wrote:
> "naruse (Yui NARUSE)" <naruse@airemix.jp> wrote:
>  > How about adding a new attribute which limits the size of reading body?
>
>  Maybe.  I'm not familiar with zlib internals, but does it need to
>  allocate that memory internally even if it's not returned to the
>  callers?

Yeah, currently you are correct.
zlib itself has such mechanism, but ext/zlib doesn't.
I'm inspecting deeper now.
----------------------------------------
Feature #6492: Inflate all HTTP Content-Encoding: deflate, gzip, x-gzip 
responses by default
https://bugs.ruby-lang.org/issues/6492#change-27257

Author: drbrain (Eric Hodel)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: lib
Target version: 2.0.0


=begin
This patch moves the compression-handling code from Net::HTTP#get to 
Net::HTTPResponse to allow decompression to occur by default on any 
response body.  (A future patch will set the Accept-Encoding on all 
requests that allow response bodies by default.)

Instead of having separate decompression code for deflate and 
gzip-encoded responses, (({Zlib::Inflate.new(32 + Zlib::MAX_WBITS)})) is 
used which automatically detects and inflated gzip-wrapped streams which 
allows for simpler processing of gzip bodies (no need to create a 
StringIO).
=end
Posted by drbrain (Eric Hodel) (Guest)
on 2012-07-10 22:30
(Received via mailing list)
Issue #6492 has been updated by drbrain (Eric Hodel).

File net.http.inflate_by_default.4.patch added

This patch uses streaming zlib from #6612 which limits the size of a 
chunk to 16KB (the maximum size of the zlib buffer) for streaming output 
without the potential for DOS due to large memory growth.

running the test from note 15 shows all 16KB chunk byte sizes.
----------------------------------------
Feature #6492: Inflate all HTTP Content-Encoding: deflate, gzip, x-gzip 
responses by default
https://bugs.ruby-lang.org/issues/6492#change-27930

Author: drbrain (Eric Hodel)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: lib
Target version: 2.0.0


=begin
This patch moves the compression-handling code from Net::HTTP#get to 
Net::HTTPResponse to allow decompression to occur by default on any 
response body.  (A future patch will set the Accept-Encoding on all 
requests that allow response bodies by default.)

Instead of having separate decompression code for deflate and 
gzip-encoded responses, (({Zlib::Inflate.new(32 + Zlib::MAX_WBITS)})) is 
used which automatically detects and inflated gzip-wrapped streams which 
allows for simpler processing of gzip bodies (no need to create a 
StringIO).
=end
Posted by naruse (Yui NARUSE) (Guest)
on 2012-07-11 18:55
(Received via mailing list)
Issue #6492 has been updated by naruse (Yui NARUSE).


drbrain (Eric Hodel) wrote:
> This patch uses streaming zlib from #6612 which limits the size of a chunk to 
16KB (the maximum size of the zlib buffer) for streaming output without the 
potential for DOS due to large memory growth.
>
> running the test from note 15 shows all 16KB chunk byte sizes.

OK, commit it!
----------------------------------------
Feature #6492: Inflate all HTTP Content-Encoding: deflate, gzip, x-gzip 
responses by default
https://bugs.ruby-lang.org/issues/6492#change-27944

Author: drbrain (Eric Hodel)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: lib
Target version: 2.0.0


=begin
This patch moves the compression-handling code from Net::HTTP#get to 
Net::HTTPResponse to allow decompression to occur by default on any 
response body.  (A future patch will set the Accept-Encoding on all 
requests that allow response bodies by default.)

Instead of having separate decompression code for deflate and 
gzip-encoded responses, (({Zlib::Inflate.new(32 + Zlib::MAX_WBITS)})) is 
used which automatically detects and inflated gzip-wrapped streams which 
allows for simpler processing of gzip bodies (no need to create a 
StringIO).
=end
Posted by drbrain (Eric Hodel) (Guest)
on 2012-07-20 01:20
(Received via mailing list)
Issue #6492 has been updated by drbrain (Eric Hodel).

Status changed from Assigned to Closed

Committed in r36473
----------------------------------------
Feature #6492: Inflate all HTTP Content-Encoding: deflate, gzip, x-gzip 
responses by default
https://bugs.ruby-lang.org/issues/6492#change-28216

Author: drbrain (Eric Hodel)
Status: Closed
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: lib
Target version: 2.0.0


=begin
This patch moves the compression-handling code from Net::HTTP#get to 
Net::HTTPResponse to allow decompression to occur by default on any 
response body.  (A future patch will set the Accept-Encoding on all 
requests that allow response bodies by default.)

Instead of having separate decompression code for deflate and 
gzip-encoded responses, (({Zlib::Inflate.new(32 + Zlib::MAX_WBITS)})) is 
used which automatically detects and inflated gzip-wrapped streams which 
allows for simpler processing of gzip bodies (no need to create a 
StringIO).
=end
Posted by uggsoutlet (uggsoutlet uggsoutlet) (Guest)
on 2012-11-22 07:33
(Received via mailing list)
Issue #6492 has been updated by uggsoutlet (uggsoutlet uggsoutlet).


=begin
would be an excellent rule. Such an attitude would emphasize  ((<uggs on 
sale|URL:http://www.gooduggboots.org/>)) sharply the value of life. 
Every day we should with gentleness, vigor, hold ((<cheap ugg 
boots|URL:http://www.gooduggboots.org/>)) the heart of thanksgiving to 
life. But when the time for endless days, months and years passed in 
((<uggs outlet|URL:http://www.gooduggboots.org/>)) front of us, we are 
often not the seed feeling. Of course, there is also " eat, drink, enjoy 
.458gyu854
=end
----------------------------------------
Feature #6492: Inflate all HTTP Content-Encoding: deflate, gzip, x-gzip 
responses by default
https://bugs.ruby-lang.org/issues/6492#change-33489

Author: drbrain (Eric Hodel)
Status: Closed
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: lib
Target version: 2.0.0


=begin
This patch moves the compression-handling code from Net::HTTP#get to 
Net::HTTPResponse to allow decompression to occur by default on any 
response body.  (A future patch will set the Accept-Encoding on all 
requests that allow response bodies by default.)

Instead of having separate decompression code for deflate and 
gzip-encoded responses, (({Zlib::Inflate.new(32 + Zlib::MAX_WBITS)})) is 
used which automatically detects and inflated gzip-wrapped streams which 
allows for simpler processing of gzip bodies (no need to create a 
StringIO).
=end
Please log in before posting. Registration is free and takes only a minute.
Existing account (Switch to SSL-encrypted connection)
NEW: Do you have a Google/GoogleMail or Yahoo account? No registration required!
Log in with Google account | Log in with Yahoo account
No account? Register here.