Abinoam Jr. wrote in post #1150267:
Hi James F.,
The main source of problems I’ve had with encodings is a string whose
bytes are in one encoding while the string itself is “marked” as
another encoding.
If my default encoding is UTF-8 and I open a UTF-16 file, Ruby will
not “auto-detect” it as UTF-16.
The strings coming from that file will be marked as UTF-8, but their
byte representation will be UTF-16.
Ah hah. That is what I’m seeing in the example I posted above. If I
add a line to inspect the string read in by popen3():
require 'open3'
Open3.popen3('ruby do_stuff.rb') do |stdin, stdout, stderr, wait_thr|
  data = stdout.read
  p data.encoding
  p data
end
--output:--
#<Encoding:UTF-8>
"\xFE\xFF\u0000h\u0000e\u0000l\u0000l\u0000o\u0000 \u0000w\u0000o\u0000r\u0000l\u0000d\n"
Clearly, those bytes are not UTF-8: ASCII characters occupy 1 byte in
UTF-8, but here each one is preceded by a null byte, and the leading
\xFE\xFF is a UTF-16 byte order mark.
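To make the "marked as one encoding, bytes in another" situation concrete, here is a minimal sketch that builds such a string by hand and then repairs it. The variable names are only for illustration, and it assumes the bytes really are UTF-16 with a byte order mark, as in the output above.

```ruby
# Build UTF-16 bytes (Ruby prepends the BOM \xFE\xFF), then tag them as UTF-8
# to imitate what a default-encoded read from the pipe hands back.
utf16 = "hello".encode("UTF-16")
mislabeled = utf16.dup.force_encoding("UTF-8")

p mislabeled.encoding         # the tag says UTF-8
p mislabeled.valid_encoding?  # false: \xFE\xFF is not valid UTF-8
p mislabeled.bytesize         # 12 bytes (2-byte BOM + 2 bytes per character)

# force_encoding only changes the tag; encode then transcodes the bytes.
fixed = mislabeled.dup.force_encoding("UTF-16").encode("UTF-8")
p fixed.encoding
p fixed
```

force_encoding never touches the bytes, so it is the right tool only when the tag is wrong; encode is what actually converts from one encoding to another.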
But where does UTF-8 come from?
$ ruby -v
ruby 1.9.3p547 (2014-05-14 revision 45962) [x86_64-darwin10.8.0]
Let’s check with James G. III…
==
There’s another way Strings are commonly created and that’s by reading
from some IO object. It doesn’t make sense to give those Strings the
source Encoding [by default US-ASCII in Ruby 1.9.3] because the external
data doesn’t have to be related to your source code.
The external Encoding is the Encoding the data is in inside the IO
object.
The default external Encoding is pulled from your environment, much like
the source Encoding is for code given on the command-line. Have a look:
$ echo $LC_CTYPE
(Source: Gray Soft / Character Encodings / Ruby 1.9's Three Default Encodings)
I get a blank line for that last command (on a Mac), but I can do
this:
$ echo $LANG
en_US.UTF-8
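A quick way to check the same thing from inside Ruby is sketched here. It assumes a Unix-like system where the locale variables are set in the usual way; IO.pipe is used only as a stand-in for the kind of pipe popen3 hands back.

```ruby
# Locale variables in POSIX precedence order: LC_ALL, then LC_CTYPE, then LANG.
%w[LC_ALL LC_CTYPE LANG].each do |name|
  puts "#{name}=#{ENV[name].inspect}"
end

default_ext = Encoding.default_external
puts "default_external: #{default_ext.name}"

# A read end with no explicit encoding reports the default external encoding.
reader, writer = IO.pipe
pipe_ext = reader.external_encoding
writer.close
reader.close
puts "pipe external:    #{pipe_ext.name}"
```

On my reading, this is why the string from popen3 comes back tagged as UTF-8: nothing told the pipe otherwise, so it fell back to the default taken from the environment.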
It doesn’t appear that
you can pass the encoding in up front with Open3.popen3
No, but after looking through the Ruby docs for a while, I found that
you can do it post festum:
do_stuff.rb:
#!/usr/bin/env ruby
my_str = "hello world"
#puts my_str.encoding
#puts my_str.bytesize
x = my_str.encode('UTF-16')
#puts x.encoding
#puts x.bytesize
#p x
print x
my_prog.rb:
require 'open3'
Open3.popen3('ruby do_stuff.rb') do |stdin, stdout, stderr, wait_thr|
  puts Encoding.default_external.name
  puts "popen3_stdout external: #{stdout.external_encoding.name}"
  stdout.set_encoding 'UTF-16:UTF-8'  # <--- HERE
  # read() data as UTF-16 and convert to UTF-8
  puts "popen3_stdout external: #{stdout.external_encoding.name}"
  data = stdout.read
  puts "data says it's encoded with: #{data.encoding}"
  puts "Let's see if that's true:"
  p data
end
--output:--
UTF-8
popen3_stdout external: UTF-8
popen3_stdout external: UTF-16
data says it's encoded with: UTF-8
Let's see if that's true:
"hello world"
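For anyone who wants to try the set_encoding step without the two files, here is a self-contained sketch. It is an approximation: IO.pipe stands in for the stdout pipe that popen3 yields, and the writer side plays the role of do_stuff.rb.

```ruby
# IO.pipe stands in for the child's stdout pipe (an assumption for this sketch).
reader, writer = IO.pipe
writer.binmode                                # pass the bytes through untouched
writer.write("hello world".encode("UTF-16"))  # BOM + UTF-16 bytes, like do_stuff.rb
writer.close

# "external:internal": decode incoming bytes as UTF-16, return UTF-8 strings.
reader.set_encoding("UTF-16:UTF-8")
external = reader.external_encoding
internal = reader.internal_encoding
data = reader.read
reader.close

puts "external: #{external.name}"
puts "internal: #{internal.name}"
p data.encoding
p data
```

The part before the colon is what the bytes on the pipe are; the part after it is what you want your strings to be. Setting it before the first read matters, since data already read keeps whatever tag it was given.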