Forum: Ruby-core Zlib::GzipReader produce different outputs for different methods applied

unknown (Guest)
on 2014-08-24 16:43
(Received via mailing list)
Issue #10101 has been updated by Tomoyuki Chikanaga.

Backport changed from 2.0.0: UNKNOWN, 2.1: UNKNOWN to 2.0.0: REQUIRED,
2.1: REQUIRED

Hello, Rafael.
Thank you for your report.

I can reproduce it with your sample on 2.0.0p433 and 2.1.3, and a similar
case can easily be reproduced with a large gzip'ed file as follows.

    $ dd if=/dev/zero of=foo count=5000
    $ gzip foo
    $ ruby test1.rb foo.gz
    Size of read: 2560000
    Size of each_byte: 2097151
    Size of readbyte: 2560000

In this case, only `each_byte' returns a wrong value. I suspect there are
several different causes.
I don't have time to investigate this right now.
And zlib has no maintainer according to
https://bugs.ruby-lang.org/projects/ruby/wiki/Main...
Is there anyone who can handle this?
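
The attached test1.rb is not reproduced in this thread. A minimal sketch of a script that prints the same three sizes might look like the following. The method name `gzip_sizes` is hypothetical, and rewinding a single reader between passes is an assumption rather than something taken from the original script.

```ruby
require 'zlib'

# Hypothetical stand-in for the attached test1.rb (NOT the original script).
# Counts the decompressed bytes seen by read, each_byte and readbyte.
# Assumption: one reader is reused and rewound between the three passes.
def gzip_sizes(path)
  sizes = {}
  Zlib::GzipReader.open(path) do |gz|
    # Pass 1: read everything at once.
    sizes[:read] = gz.read.bytesize

    # Pass 2: iterate with each_byte after rewinding.
    gz.rewind
    count = 0
    gz.each_byte { count += 1 }
    sizes[:each_byte] = count

    # Pass 3: call readbyte until it raises EOFError at end of stream.
    gz.rewind
    count = 0
    begin
      loop do
        gz.readbyte
        count += 1
      end
    rescue EOFError
      # end of the decompressed data
    end
    sizes[:readbyte] = count
  end
  sizes
end

# Command-line use: ruby sizes.rb foo.gz
if ARGV.first
  gzip_sizes(ARGV.first).each { |name, size| puts "Size of #{name}: #{size}" }
end
```

On a Ruby without the bug, the three sizes would be expected to agree with the size of the uncompressed file.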

----------------------------------------
Bug #10101: Zlib::GzipReader produce different outputs for different
methods applied
https://bugs.ruby-lang.org/issues/10101#change-48464

* Author: Rafael Manzo
* Status: Open
* Priority: Normal
* Assignee:
* Category: ext
* Target version:
* ruby -v: ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-linux]
* Backport: 2.0.0: REQUIRED, 2.1: REQUIRED
----------------------------------------
The methods `read`, `readbyte` and `each_byte` produce different
outputs. Compared with the unzipped file, only the result of `readbyte`
has the correct size, but comparing it byte by byte with the original
file sometimes shows differences at the same positions.

I couldn't reproduce this part of the differences in a way that I could
share on the internet, because the original file is a magnetic resonance
image subject to confidentiality.

Fortunately, I was able to reproduce the size discrepancy. I've
attached a script that illustrates the problem, and here is the link to
the file that I used for the following sample output:

https://drive.google.com/file/d/0B3O0CbLN-q0TcmhGR...

Sorry about the size, but I couldn't produce a smaller file.

<code>
[manzo@WALL-A gz_debug]$ ruby -v
ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-linux]
[manzo@WALL-A gz_debug]$ ruby test1.rb sample.gz
Size of read: 45102570
Size of each_byte: 4668
Size of readbyte: 45158752
</code>

I hope I'm right on this report and thank you a lot for your time!


---Files--------------------------------
test1.rb (316 Bytes)

Eric Wong (Guest)
on 2014-08-31 01:57
(Received via mailing list)
nagachika00@gmail.com wrote:
> I don't have time to investigate this right now.
> And zlib has no maintainer according to
> https://bugs.ruby-lang.org/projects/ruby/wiki/Main...
> Is there anyone who can handle this?

Hi, r47327 should fix this:
------------------------------------------------------------------------
r47327 | normal | 2014-08-30 23:53:28 +0000 (Sat, 30 Aug 2014) | 18 lines

zlib: GzipReader#rewind preserves ZSTREAM_FLAG_GZFILE

* ext/zlib/zlib.c (gzfile_reset): preserve ZSTREAM_FLAG_GZFILE
  [Bug #10101]

* test/zlib/test_zlib.rb (test_rewind): test each_byte

We must preserve the ZSTREAM_FLAG_GZFILE flag to prevent
zstream_detach_buffer from:

a) returning Qnil and breaking out of the `each_byte' loop
b) yielding a large string to each_byte

Note: the test case in the bug report takes a long time.  I found this
bug because I noticed the massive time discrepancy between the
`each_byte' and `readbyte' loops before this patch.  With this patch,
`each_byte' and `readbyte' both take very long.
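
A rough sketch of how the rewind case described in the commit message could be checked in memory. This is not the test added to test/zlib/test_zlib.rb; the payload and variable names are made up for illustration.

```ruby
require 'zlib'
require 'stringio'

# Build a small gzip stream in memory (payload chosen arbitrarily).
payload = "hello gzip\n" * 1_000
io = StringIO.new
writer = Zlib::GzipWriter.new(io)
writer.write(payload)
writer.finish   # write the gzip trailer without closing the StringIO
io.rewind

reader = Zlib::GzipReader.new(io)
first_pass = reader.read   # consume the whole stream once
reader.rewind              # the code path the commit message says it changes

# Collect what each_byte yields after the rewind.
bytes = []
reader.each_byte { |b| bytes << b }

puts "Size of read: #{first_pass.bytesize}"
puts "Size of each_byte after rewind: #{bytes.size}"
```

With the fix applied, both sizes would be expected to match, and every value yielded by `each_byte' should be a single Integer byte rather than a String.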