I noticed some anomalous behavior when opening files with the integer file
mode flags. If the default internal encoding is set and a file is opened
using these flags, the file’s external encoding is set to something other
than ASCII-8BIT, even when File::BINARY is included. This can cause binary
file operations (such as Marshal.dump) to blow up.
Please forgive the long message, and let me know if it would be more
appropriate to open some issues. Since I’ve never posted a Ruby issue
before, I wanted to make sure I was not being naive and that I
understand what is really going on.
Here is a simple example of what I mean:
#!/usr/bin/env ruby
Encoding.default_internal = 'UTF-8'
File.open('test', File::CREAT | File::RDWR | File::BINARY) do |f|
  # This should be ASCII-8BIT, right? At least according to io.c, line 10792
  puts "Integer Flags Encoding: #{f.external_encoding.to_s}"
end
File.open('test2', 'w+b') do |f|
  # This actually is ASCII-8BIT
  puts "String Mode Encoding: #{f.external_encoding.to_s}"
end
And running it:
file-binary-test cpope$ ruby simple_file_test.rb
Integer Flags Encoding: UTF-8
String Mode Encoding: ASCII-8BIT
I don’t think that is the intended behavior. If I look at io.c in the
latest Ruby code snapshot:
--- io.c (last night’s snapshot)
10792 #ifndef O_BINARY
10793 # define O_BINARY 0
10794 #endif
10795 /* disable line code conversion and make ASCII-8BIT */
10796 rb_file_const("BINARY", INT2FIX(O_BINARY));
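A quick way to check this from Ruby itself (a minimal sketch; the variable names base and binary are just for illustration): if File::BINARY is 0, OR-ing it into the flags leaves the integer unchanged, so File.open receives exactly the same value whether or not binary mode was requested.

```ruby
base   = File::CREAT | File::RDWR
binary = base | File::BINARY

puts "File::BINARY = #{File::BINARY}"
# If File::BINARY is 0, OR-ing it in changes nothing, so File.open gets
# identical flags whether or not the caller asked for binary mode.
puts "Flags distinguishable? #{base != binary}"
```

On a platform where the fallback `# define O_BINARY 0` applies, the second line prints false.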
As one can see above, File::BINARY will be zero in every case I can
find in the Ruby source code: nowhere in the 1.9.x codebase is O_BINARY
defined as anything but zero (a Windows C runtime supplies its own
non-zero O_BINARY, but elsewhere the fallback above applies). And, as
demonstrated above, opening a file with this constant does not set the
encoding to ASCII-8BIT. What is really bad about this is that, when a
file is opened with the integer flags, there is no good way to check
whether the developer intended it to be opened as a binary file. There
is, of course, a way to specify the encoding manually for a file opened
with the integer flags, which would be the right thing to do in the case
above.
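For reference, a minimal sketch of the two workarounds I know of, assuming a Ruby that accepts an options hash after the integer flags (1.9 and later, as far as I know) and where IO#binmode switches the stream to ASCII-8BIT (true of current Ruby): pass encoding: explicitly, or call binmode on the opened file. The scratch path and the method names are hypothetical.

```ruby
require 'tmpdir'

Encoding.default_internal = 'UTF-8'

FLAGS = File::CREAT | File::RDWR | File::BINARY
PATH  = File.join(Dir.tmpdir, 'integer_flags_test') # hypothetical scratch file

# Option 1: state the external encoding explicitly next to the integer flags.
def open_with_explicit_encoding(path = PATH)
  File.open(path, FLAGS, encoding: Encoding::ASCII_8BIT) do |f|
    f.external_encoding
  end
end

# Option 2: open with the flags, then switch the stream into binary mode.
def open_then_binmode(path = PATH)
  File.open(path, FLAGS) do |f|
    f.binmode
    f.external_encoding
  end
end

puts "Explicit encoding: #{open_with_explicit_encoding}"
puts "After #binmode:    #{open_then_binmode}"
```

Both report ASCII-8BIT, but only because the caller remembered to ask; the integer flags alone still say nothing about binary intent.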
So my first question is: How do we address this deficiency? I can’t
think of a better way than to document the ‘catch’ with using the
integer flags in this case. I’ve noticed that many of the File
constants aren’t documented, so I’m happy to give it a shot if that’s
the best approach.
But this brings us to another issue. Some places in the Ruby standard
library depend on File::BINARY actually opening a file suitable for
writing binary data. For example, PStore:
At the top of lib/pstore.rb:
class PStore
  binmode = defined?(File::BINARY) ? File::BINARY : 0
  RDWR_ACCESS = File::RDWR | File::CREAT | binmode
  RD_ACCESS = File::RDONLY | binmode
  WR_ACCESS = File::WRONLY | File::CREAT | File::TRUNC | binmode
These flags are passed to the bottlenecks that open the data file for
reading and writing. Because PStore uses the integer constants to
define how the file is opened, it’s not hard to make it blow up in
the course of normal operation (for example, whenever the default
internal encoding is set). To conserve space, I’ve put some sample code
in a gist titled “PStore File Encoding Issue” on GitHub.
So my second thought is that this is an issue with the PStore library,
and that it would be appropriate to modify the file bottlenecks so
they explicitly specify ASCII-8BIT as the file encoding. Is there any
reason I’m off target here, or should I go ahead and log that as an
issue with a test and a patch?
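Roughly what I have in mind, as a sketch only: open_binary is a hypothetical helper standing in for PStore’s bottlenecks, not PStore’s actual code, and the scratch path is made up. It keeps the integer flags but pins the external encoding to ASCII-8BIT, so a Marshal round trip should survive a non-nil Encoding.default_internal.

```ruby
require 'tmpdir'

# Flags mirroring the PStore constants quoted above.
RDWR_ACCESS = File::RDWR | File::CREAT | File::BINARY

# Hypothetical bottleneck: open with integer flags, but pin the external
# encoding so no transcoding happens regardless of Encoding.default_internal.
def open_binary(path, flags)
  File.open(path, flags, encoding: Encoding::ASCII_8BIT) { |f| yield f }
end

Encoding.default_internal = 'UTF-8'

path = File.join(Dir.tmpdir, 'pstore_encoding_sketch') # hypothetical file
data = { 'name' => "caf\u00e9", 'bytes' => "\xFF\x00".b }

open_binary(path, RDWR_ACCESS) do |f|
  f.truncate(0)          # no File::TRUNC in these flags, so clear old data
  Marshal.dump(data, f)  # writes raw bytes; no conversion to UTF-8
end

restored = open_binary(path, RDWR_ACCESS) { |f| Marshal.load(f) }
puts restored == data
```

Pinning the encoding in the bottleneck, rather than relying on File::BINARY, keeps the fix independent of what O_BINARY happens to be on a given platform.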
Apologies in advance if I am using the wrong forum or am totally
off-base with my questions.
Thank you for your time,
Cameron