File::BINARY does not behave as advertised. How do I help to fix this?

I noticed some anomalous behavior opening files with the file mode
flags. If the default internal encoding is set, when using the file
mode flags to open a file, the file’s external encoding is set to
something other than ASCII-8BIT, which can cause binary file
operations (such as Marshal.dump) to blow up.

Please forgive the long message, and let me know if it would be more
appropriate to open some issues, but since I’ve never posted a Ruby
issue before, I wanted to make sure I was not being naive and that I
understand what is really going on.

Here is a simple example of what I mean:

#!/usr/bin/env ruby
Encoding.default_internal = 'UTF-8'

File.open('test', File::CREAT | File::RDWR | File::BINARY) do |f|
  # This should be ASCII-8BIT, right? At least according to io.c, line 10792
  puts "Integer Flags Encoding: #{f.external_encoding.to_s}"
end

File.open('test2', 'w+b') do |f|
  # This actually is ASCII-8BIT
  puts "String Mode Encoding: #{f.external_encoding.to_s}"
end

And running it:

file-binary-test cpope$ ruby simple_file_test.rb
Integer Flags Encoding: UTF-8
String Mode Encoding: ASCII-8BIT

I don’t think that is the intended behavior. If I look at io.c in the
latest Ruby code snapshot:

--- io.c (last night’s snapshot)
10792 #ifndef O_BINARY
10793 # define O_BINARY 0
10794 #endif
10795 /* disable line code conversion and make ASCII-8BIT */
10796 rb_file_const("BINARY", INT2FIX(O_BINARY));

As one can see above, first of all, File::BINARY will be zero in every
case that I can suss out in the Ruby source code - there is nowhere in
the 1.9.x codebase I can see that defines O_BINARY to be anything but
zero, and as was empirically demonstrated above, opening a file with
this constant will not set the encoding to ASCII-8BIT. What is really
bad about this is when using the integer flags to open a file, there
is not a good way to check if a developer intended for it to be
opened as a binary file. There is, of course, a way to manually
specify the encoding for a file opened with the integer flags, which
would be the right thing to do in the case above.
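
To make that concrete, here is a minimal sketch of two ways I believe the encoding can be forced when opening with integer flags: passing external_encoding: in the options hash, or calling IO#binmode right after the open. The constant FLAGS and the two helper method names are made up for illustration, and I have only reasoned about this on Unix-like systems.

```ruby
require 'tmpdir'

Encoding.default_internal = 'UTF-8'

# Same integer flags as the example above (FLAGS is an illustrative name).
FLAGS = File::CREAT | File::RDWR | File::BINARY

# Option 1: pass the encoding explicitly alongside the integer flags.
def encoding_with_option(path)
  File.open(path, FLAGS, external_encoding: Encoding::BINARY) do |f|
    f.external_encoding
  end
end

# Option 2: switch the stream to binary mode right after opening it.
def encoding_with_binmode(path)
  File.open(path, FLAGS) do |f|
    f.binmode
    f.external_encoding
  end
end

# Work in a throwaway directory so nothing is left behind.
Dir.mktmpdir do |dir|
  path = File.join(dir, 'test')
  puts "Explicit option: #{encoding_with_option(path)}"
  puts "IO#binmode:      #{encoding_with_binmode(path)}"
end
```

Encoding::BINARY is an alias for ASCII-8BIT, so either approach should leave the file in the state that 'w+b' produces in the string-mode example.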

So my first question is: How do we address this deficiency? I can’t
think of a better way than to document the ‘catch’ with using the
integer flags in this case. I’ve noticed that many of the File
constants aren’t documented, so I’m happy to give it a shot if that’s
the best approach.

But this brings us to another issue. There are some places in the Ruby
standard library that depend on File::BINARY actually opening a file
suitable for writing binary data. For example, in PStore:

At the top of lib/pstore.rb:

class PStore
  binmode = defined?(File::BINARY) ? File::BINARY : 0
  RDWR_ACCESS = File::RDWR | File::CREAT | binmode
  RD_ACCESS = File::RDONLY | binmode
  WR_ACCESS = File::WRONLY | File::CREAT | File::TRUNC | binmode

These flags are passed to the bottlenecks that open the data file for
reading and writing. Because it is using the integer constants to
define how the file is opened, it’s not hard to make PStore blow up in
the course of normal operation. To conserve space, I’ve put some
sample code in this gist: PStore File Encoding Issue · GitHub

So my second thought is that this is an issue with the PStore library,
and that it would be appropriate to modify the file bottlenecks so
they explicitly specify ASCII-8BIT as the file encoding. Is there any
reason that I’m off target and I should not log that as an issue with
a test and a patch?
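
Roughly, the kind of change I have in mind looks like the sketch below. This is not PStore's actual code: open_binary and marshal_round_trip are hypothetical helpers, and only the RDWR_ACCESS definition mirrors the constant quoted above. The idea is simply that whatever opens the file with integer flags also puts the stream into binary mode before Marshal touches it.

```ruby
require 'tmpdir'

Encoding.default_internal = 'UTF-8'

# Mirrors the flag definition quoted from lib/pstore.rb.
RDWR_ACCESS = File::RDWR | File::CREAT |
              (defined?(File::BINARY) ? File::BINARY : 0)

# Hypothetical bottleneck: open with integer flags, then force binary
# mode so the external encoding is ASCII-8BIT and nothing is transcoded.
def open_binary(path, flags)
  File.open(path, flags) do |f|
    f.binmode
    yield f
  end
end

# Dump data to a temporary file and load it back through the helper.
def marshal_round_trip(data)
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'store')
    open_binary(path, RDWR_ACCESS) { |f| Marshal.dump(data, f) }
    open_binary(path, RDWR_ACCESS) { |f| Marshal.load(f) }
  end
end

# Mix a UTF-8 string with raw bytes that are not valid UTF-8.
sample = { 'name' => "caf\u00e9", 'bytes' => [255, 254].pack('C*') }
puts marshal_round_trip(sample) == sample
```

A real patch would of course need a test that reproduces the failure from the gist first.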

Apologies in advance if I am using the wrong forum or am totally
off-base with my questions.

Thank you for your time,
Cameron

On Sep 12, 2011, at 09:47 , Cameron Pope wrote:

Please forgive the long message, and let me know if it would be more
appropriate to open some issues, but since I’ve never posted a Ruby
issue before, I wanted to make sure I was not being naive and that I
understand what is really going on.

Please send this to ruby-core@

Hi,

In short, the comment was wrong. O_BINARY only disables newline
conversion; it does not change the encoding of the output. I recommend “b”
file mode, which is smarter. Whether we should update PStore is
controversial. The discussion should move to ruby-core.

          matz.
