Output UTF-16LE BOM to file - 1.9


#1

ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-mswin32]

With this code:

File.open('zz.txt', 'w:UTF-16LE') do |f|
  f.print "Hello Uni-world"
end

…I get no BOM

guts = File.read('zz.txt')
puts guts.bytes.to_a.inspect

#=> [72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 32, 0,…

…and my brain can’t concoct a way to insert it myself, though I know
it must be simple…


#2

On Apr 9, 2009, at 3:31 PM, Chris M. wrote:

> guts = File.read('zz.txt')
> puts guts.bytes.to_a.inspect
>
> #=> [72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 32, 0,…
>
> …and my brain can’t concoct a way to insert it myself, though I know
> it must be simple…

Yeah, it’s easy stuff.

A Unicode BOM is just the character U+FEFF encoded at the beginning of
the document. You can insert that character yourself with Ruby 1.9’s
Unicode escape and it will be transcoded into the proper byte order
based on the external_encoding() you are writing to:

$ cat utf16_bom.rb
# encoding: UTF-8

File.open("utf16_bom.txt", "w:UTF-16LE") do |f|
  f.puts "\uFEFFThis is UTF-16LE with a BOM."
end
$ ruby -v utf16_bom.rb
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-darwin9.6.0]
$ ruby -e 'p File.binread(ARGV.shift)[0..9]' utf16_bom.txt
"\xFF\xFET\x00h\x00i\x00s\x00"
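
If it helps to see the transcoding from both sides, here is a rough sketch that writes the same U+FEFF prefix through a UTF-16LE and a UTF-16BE stream and then looks at the first four raw bytes of each file. The helper names write_with_bom and leading_bytes, the BOM_RESULTS constant, and the temp file names are made up for illustration; they are not part of Ruby. It assumes a Ruby 1.9 or later where a "w:ENCODING" stream transcodes UTF-8 strings on write, as in the session above.

```ruby
# encoding: UTF-8
require "tmpdir"

# Hypothetical helper: write text to path in the given external encoding,
# prefixed with U+FEFF so the stream's transcoding emits the matching BOM bytes.
def write_with_bom(path, text, encoding)
  File.open(path, "w:#{encoding}") do |f|
    f.print "\uFEFF#{text}"
  end
end

# Hypothetical helper: return the first count raw bytes of a file as integers.
def leading_bytes(path, count)
  File.binread(path, count).bytes.to_a
end

# Collected results, keyed by encoding name (illustrative constant).
BOM_RESULTS = {}

Dir.mktmpdir do |dir|
  %w[UTF-16LE UTF-16BE].each do |enc|
    path = File.join(dir, "bom_#{enc}.txt")
    write_with_bom(path, "Hello Uni-world", enc)
    BOM_RESULTS[enc] = leading_bytes(path, 4)
  end
end

p BOM_RESULTS
# UTF-16LE => [255, 254, 72, 0]   (FF FE, then "H" low byte first)
# UTF-16BE => [254, 255, 0, 72]   (FE FF, then "H" high byte first)
```

The same character comes out as FF FE or FE FF depending only on the stream's external encoding, which is why there is no need to write the BOM bytes by hand.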

Hope that helps.

James Edward G. II