Forum: Ruby
Output UTF-16LE BOM to file - 1.9

Chris Morris (Guest)
on 2009-04-09 22:32
(Received via mailing list)
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-mswin32]

With this code:

File.open('zz.txt', 'w:UTF-16LE') do |f|
  f.print "Hello Uni-world"
end

...I get no BOM in the output:

guts = File.read('zz.txt')
puts guts.bytes.to_a.inspect

#=> [72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 32, 0,...

...and my brain can't concoct a way to insert it myself, though I know
it must be simple...
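
A quick way to confirm whether a file starts with the UTF-16LE BOM (bytes FF FE) is to read its first two bytes in binary mode. This is a minimal sketch; the helper name `utf16le_bom?` is hypothetical and not part of Ruby's standard library:

```ruby
# Hypothetical helper: true if the file at +path+ begins with the
# UTF-16LE byte order mark, i.e. the bytes 0xFF 0xFE.
def utf16le_bom?(path)
  # File.binread with a length can return nil for an empty file;
  # .to_s turns that case into an empty string.
  head = File.binread(path, 2).to_s
  head.bytes.to_a == [0xFF, 0xFE]
end
```

Given the bytes shown above (the file begins 72, 0), `utf16le_bom?('zz.txt')` would return false.
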
James Gray (bbazzarrakk)
on 2009-04-12 18:39
(Received via mailing list)
On Apr 9, 2009, at 3:31 PM, Chris Morris wrote:

> guts = File.read('zz.txt')
> puts guts.bytes.to_a.inspect
>
> #=> [72, 0, 101, 0, 108, 0, 108, 0, 111, 0, 32, 0,...
>
> ...and my brain can't concoct a way to insert it myself, though I know
> it must be simple...

Yeah, it's easy stuff.

A Unicode BOM is just the character U+FEFF encoded at the beginning of
the document. You can insert that character yourself with Ruby 1.9's
Unicode escape (\uFEFF), and it will be transcoded into the proper byte
order based on the external_encoding() you are writing to:

$ cat utf16_bom.rb
# encoding: UTF-8
File.open("utf16_bom.txt", "w:UTF-16LE") do |f|
   f.puts "\uFEFFThis is UTF-16LE with a BOM."
end
$ ruby -v utf16_bom.rb
ruby 1.9.1p0 (2009-01-30 revision 21907) [i386-darwin9.6.0]
$ ruby -e 'p File.binread(ARGV.shift)[0..9]' utf16_bom.txt
"\xFF\xFET\x00h\x00i\x00s\x00"

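Since the BOM is just U+FEFF, the same escape comes out as FF FE under UTF-16LE and FE FF under UTF-16BE; Ruby's transcoder picks the byte order. The sketch below, assuming a UTF-8 source file, wraps that idea in two hypothetical helpers, `bom_bytes` and `write_with_bom`, which are illustrative names rather than library methods:

```ruby
# encoding: UTF-8

# Hypothetical helper: the bytes that U+FEFF becomes in +encoding+.
def bom_bytes(encoding)
  "\uFEFF".encode(encoding).bytes.to_a
end

# Hypothetical helper: write +text+ to +path+ in +encoding+, printing
# U+FEFF first so the transcoder emits the matching BOM.
def write_with_bom(path, text, encoding = "UTF-16LE")
  File.open(path, "w:#{encoding}") do |f|
    f.print "\uFEFF", text
  end
end

p bom_bytes("UTF-16LE")  # => [255, 254]
p bom_bytes("UTF-16BE")  # => [254, 255]
```

For example, `write_with_bom("utf16_bom.txt", "This is UTF-16LE with a BOM.")` should give a file whose first two bytes are FF FE, as in the session above. Printing the BOM as a separate argument keeps it out of the text you pass in.
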
Hope that helps.

James Edward Gray II