StringIO and encodings

This surprised me:

$ ruby -v
ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-linux]

$ ruby -e 'p "".encoding'
#<Encoding:ISO-8859-1>

$ cat a.rb

# encoding: utf-8

require 'stringio'
s = StringIO.new

a = "abc"
s.puts(a)

p :string => a.encoding
p :stringio => s.string.encoding

$ ruby a.rb
{:string=>#<Encoding:UTF-8>}
{:stringio=>#<Encoding:ISO-8859-1>} # <- WTF?

$ cat b.rb

# encoding: utf-8

require 'stringio'
s = StringIO.new("") # <- note the constructor parameter

a = "abc"
s.puts(a)

p :string => a.encoding
p :stringio => s.string.encoding

$ ruby b.rb
{:string=>#<Encoding:UTF-8>}
{:stringio=>#<Encoding:UTF-8>} # <- better!

I presume that what’s going on here is a simple matter of defaults. If
you don’t pass an initial string to StringIO.new, it constructs one for
itself using the default external encoding (taken from the locale), and
anything written afterwards is converted to that buffer’s encoding, so
the internal string’s encoding never changes. StringIO.new has no
knowledge of the source encoding of the file it’s called from.
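
A quick way to check that presumption, as a minimal sketch: compare the
encoding of a fresh StringIO's buffer with Encoding.default_external.
This assumes the argument-less constructor tags its buffer with the
default external encoding, which fits the output above but is not
something the documentation spells out.

```ruby
# encoding: utf-8
require 'stringio'

# Assumption: with no argument, StringIO.new builds its own empty buffer
# and tags it with the default external encoding (usually set from the
# locale), regardless of the source encoding of this file.
buffer_encoding = StringIO.new.string.encoding

p :default_external => Encoding.default_external
p :stringio_default => buffer_encoding
p :same => (buffer_encoding == Encoding.default_external)
```

If the last line prints false on your Ruby, the presumption is wrong for
that version.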

This behaviour seems odd to me. I think a better behaviour would be
either to always force a string parameter, so that it never has to
pick a default encoding itself, or that it should not make itself an
internal string on #new, but instead #dup the first string it gets
passed as a parameter to #write or #puts and use that instead.
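
Until something like that exists, the first option can be approximated
in user code. A rough sketch, where stringio_with_encoding is a
hypothetical helper name of my own and not part of StringIO:

```ruby
# encoding: utf-8
require 'stringio'

# Hypothetical helper (not part of StringIO): always hand StringIO an
# explicitly tagged, mutable buffer so it never picks a default itself.
def stringio_with_encoding(encoding = Encoding::UTF_8)
  StringIO.new(String.new.force_encoding(encoding))
end

s = stringio_with_encoding
s.puts("abc")

p :stringio => s.string.encoding
```

This is just b.rb wrapped in a method, so the buffer should stay in the
encoding that was passed in.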

Thoughts?


Alex

On Tue, Sep 20, 2011 at 4:18 PM, Alex Y. wrote:

StringIO.new has no knowledge of the file encoding at the
location it’s called from.

Can it not be changed so that it knows the internal encoding, instead?
That would stop you having to break the argument-less constructor or
doing any #dup’ing, no?

On Sep 20, 2011, at 8:32 AM, Alex Y. wrote:

I don’t know if there’s an API for that, but I suspect there isn’t.
It’s not that hard to check:

$ ri StringIO | grep encoding
external_encoding
internal_encoding
set_encoding

If there were, then yes, that’s the way to do it.

$ ri StringIO.set_encoding
StringIO.set_encoding

(from ruby core)

strio.set_encoding(ext_enc, [int_enc[, opt]]) => strio
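
For illustration, a minimal sketch of calling it on an argument-less
StringIO. set_encoding and external_encoding are the methods listed by
ri above; whether the call also retags the underlying string is an
assumption that may differ between Ruby versions, so the sketch only
prints that value rather than relying on it.

```ruby
require 'stringio'

s = StringIO.new
s.set_encoding(Encoding::UTF_8)  # returns the StringIO itself
s.puts("abc")

p :external => s.external_encoding
# Assumption: the buffer is retagged too; check this on your own Ruby.
p :string => s.string.encoding
```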

Eric H. wrote in post #1022972:

On Sep 20, 2011, at 8:32 AM, Alex Y. wrote:

I don’t know if there’s an API for that, but I suspect there isn’t.
It’s not that hard to check:

$ ri StringIO | grep encoding
external_encoding
internal_encoding
set_encoding

$ ri StringIO
Nothing known about StringIO

is what I get. I never assume ri works.


Alex

Adam P. wrote in post #1022947:

On Tue, Sep 20, 2011 at 4:18 PM, Alex Y. wrote:

StringIO.new has no knowledge of the file encoding at the
location it’s called from.

Can it not be changed so that it knows the internal encoding, instead?
That would stop you having to break the argument-less constructor or
doing any #dup’ing, no?

I don’t know if there’s an API for that, but I suspect there isn’t. If
there were, then yes, that’s the way to do it.


Alex

Ryan D. wrote in post #1023415:

On Sep 23, 2011, at 00:07 , Alex Y. wrote:

$ ri StringIO
Nothing known about StringIO

is what I get. I never assume ri works.

soooo… instead of fixing it and empowering yourself… you choose…
what exactly?

rubydoc.info, usually. Saves fixing it on every single box I ever
touch.


Alex

On Fri, Sep 23, 2011 at 4:05 AM, Alex Y. wrote:

rubydoc.info, usually. Saves fixing it on every single box I ever
touch.

+1, ri has worked for me once before, but rarely does, and I don’t enjoy
the format anyway. I used to build docs and host them with gem server,
but now I turn off ri and rdoc and just use rdoc.info since it has not
only core docs, but also gems.

Occasionally I use ruby-doc.org, and for Rails I use
guides.rubyonrails.org and api.rubyonrails.org.

On Sep 23, 2011, at 00:07 , Alex Y. wrote:

$ ri StringIO
Nothing known about StringIO

is what I get. I never assume ri works.

soooo… instead of fixing it and empowering yourself… you choose…
what exactly?

Alex Y. wrote in post #1022945:

This surprised me:

Nothing surprises me any more about encodings in ruby 1.9.

FWIW, there’s a similar case with String.new. Whereas a string literal
gets its encoding from the source encoding of the file, String.new
doesn’t.

brian@x100:~$ ruby192 -e 'p "".encoding'
#<Encoding:UTF-8>
brian@x100:~$ ruby192 -e 'p String.new.encoding'
#<Encoding:ASCII-8BIT>
brian@x100:~$ echo 'p "".encoding' | ruby192
#<Encoding:UTF-8>
brian@x100:~$ echo 'p String.new.encoding' | ruby192
#<Encoding:ASCII-8BIT>
brian@x100:~$ echo 'p "".encoding' > x.rb && ruby192 x.rb
#<Encoding:US-ASCII>
brian@x100:~$ echo 'p String.new.encoding' > x.rb && ruby192 x.rb
#<Encoding:ASCII-8BIT>

However, String.new doesn’t seem to be getting its encoding from the
environment, which your program suggests StringIO.new does.
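
The same comparison can be put in one file, as a small sketch.
__ENCODING__ is Ruby's keyword for the source encoding of the current
file; that String.new with no argument comes back as ASCII-8BIT is what
the transcript above shows on 1.9.2, and I am assuming other versions
behave the same way.

```ruby
# encoding: utf-8

literal = ""          # a literal takes the source encoding of its file
built   = String.new  # String.new with no argument does not

p :source => __ENCODING__
p :literal => literal.encoding
p :string_new => built.encoding
```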

All of this is completely undocumented, and therefore whatever behaviour
you get is what you get. Fine if you like stamp collecting though.

Regards,

Brian.