on 2014-08-31 00:04
(Received via mailing list)
Issue #10101 has been updated by cremno phobia.

`read` returns a string with external encoding. In your case it seems to
be `UTF-8`. The encodings of the given `IO` object are ignored. Using
`` doesn't work either, by the way. It still
ignores the `b`, but as a workaround you can change the encoding of the
returned string, pass `external_encoding: Encoding::ASCII_8BIT` as an
additional argument, call `String#bytesize`, etc.
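
To illustrate why the reported size of `read` can be smaller than the byte count, here is a minimal sketch using an in-memory gzip stream. The helper `gzip_bytes` and the four-byte payload are made up for the example and are not taken from the attached script. `external_encoding: Encoding::UTF_8` is passed explicitly only to stand in for a UTF-8 default external encoding.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper for this example: gzip a byte string in memory.
def gzip_bytes(bytes)
  io = StringIO.new("".b)
  gz = Zlib::GzipWriter.new(io)
  gz.write(bytes)
  gz.finish # writes the gzip trailer but leaves the StringIO open
  io.string
end

# Four bytes that happen to form two valid UTF-8 characters.
payload = "\xC3\xA9\xC3\xA9".b
compressed = gzip_bytes(payload)

# Simulates the reported situation: the result is tagged as UTF-8.
as_utf8 = Zlib::GzipReader.new(StringIO.new(compressed),
                               external_encoding: Encoding::UTF_8).read

# Workaround 1: ask for a binary string up front.
as_binary = Zlib::GzipReader.new(StringIO.new(compressed),
                                 external_encoding: Encoding::ASCII_8BIT).read

# Workaround 2: retag a copy of the returned string as binary.
retagged = as_utf8.dup.force_encoding(Encoding::ASCII_8BIT)

puts as_utf8.size      # counts characters: 2
puts as_utf8.bytesize  # counts bytes: 4
puts as_binary.size    # 4
puts retagged.size     # 4
```

`String#size` counts characters in the string's encoding, so only `bytesize` (or a binary-encoded string) should be compared against the size of the unzipped file.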

After some rearranging and duplicating of the remaining two cases, I
can't say why `each_byte` *sometimes* fails. But with the following
lines, `[-2048, 1]` (2048 looks interesting) is printed by `f_gz.rewind`
when it fails.
  p args

* Author: Rafael Manzo
* Status: Open
* Priority: Normal
* Category: ext
* ruby -v: ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-linux]
* Backport: 2.0.0: REQUIRED, 2.1: REQUIRED

The methods `read`, `readbyte` and `each_byte` are producing different
outputs. Compared with the unzipped file, only the result of `readbyte`
has the correct size, but comparing it byte by byte with the original
file sometimes shows differences at the same positions.
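
For readers without the attachment, a comparison of the three methods could look roughly like the sketch below. This is not the attached `test1.rb`: the payload, the helpers `gzip_bytes` and `open_reader`, and the use of an in-memory `StringIO` instead of a file are all assumptions made for illustration. Each method gets a fresh reader so that `rewind` is not involved.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper for this example: gzip a byte string in memory.
def gzip_bytes(bytes)
  io = StringIO.new("".b)
  gz = Zlib::GzipWriter.new(io)
  gz.write(bytes)
  gz.finish # writes the gzip trailer but leaves the StringIO open
  io.string
end

# Hypothetical helper: a fresh reader whose strings are tagged as UTF-8.
def open_reader(compressed)
  Zlib::GzipReader.new(StringIO.new(compressed),
                       external_encoding: Encoding::UTF_8)
end

# Made-up payload with multibyte characters: 12 characters, 14 bytes per line.
payload = "h\u00E9llo w\u00F6rld\n" * 1000
compressed = gzip_bytes(payload)

data = open_reader(compressed).read
read_chars = data.size      # characters, depends on the encoding
read_bytes = data.bytesize  # bytes, comparable to the other two counts

each_byte_count = 0
open_reader(compressed).each_byte { each_byte_count += 1 }

readbyte_count = 0
reader = open_reader(compressed)
begin
  loop do
    reader.readbyte # raises EOFError at the end of the stream
    readbyte_count += 1
  end
rescue EOFError
  # end of data reached
end

puts "Size of read (characters): #{read_chars}"
puts "Size of read (bytes): #{read_bytes}"
puts "Size of each_byte: #{each_byte_count}"
puts "Size of readbyte: #{readbyte_count}"
```

On a Ruby without the reported problem, the three byte counts should agree, while the character count of `read` is smaller whenever the data contains multibyte sequences.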

I couldn't reproduce these byte-level differences in a form that I could
share on the internet, because the original file is a magnetic resonance
image subject to confidentiality.

Fortunately, I was able to reproduce the size discrepancy. I've
attached a script that illustrates the problem and here is the link for
the file that I've used for the following sample output:

Sorry about the size, but I couldn't produce a smaller file.

[manzo@WALL-A gz_debug]$ ruby -v
ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-linux]
[manzo@WALL-A gz_debug]$ ruby test1.rb sample.gz
Size of read: 45102570
Size of each_byte: 4668
Size of readbyte: 45158752

I hope I'm right on this report and thank you a lot for your time!

test1.rb (316 Bytes)
