This surprised me:
$ ruby -v
ruby 1.9.2p290 (2011-07-09 revision 32553) [x86_64-linux]
$ ruby -e 'p "".encoding'
#<Encoding:ISO-8859-1>
$ cat a.rb
# encoding: utf-8
require 'stringio'
s = StringIO.new
a = "abc"
s.puts(a)
p :string => a.encoding
p :stringio => s.string.encoding
$ ruby a.rb
{:string=>#<Encoding:UTF-8>}
{:stringio=>#<Encoding:ISO-8859-1>} # <- WTF?
$ cat b.rb
# encoding: utf-8
require 'stringio'
s = StringIO.new("") # <- note the constructor parameter
a = "abc"
s.puts(a)
p :string => a.encoding
p :stringio => s.string.encoding
$ ruby b.rb
{:string=>#<Encoding:UTF-8>}
{:stringio=>#<Encoding:UTF-8>} # <- better!
I presume that what’s going on here is a simple matter of defaults. If
you don’t pass an initial string to StringIO.new, it constructs one for
itself using the default encoding from the locale, and the encoding
coercion rules mean that the internal string keeps that encoding no
matter what is written to it. StringIO.new has no knowledge of the file
encoding at the location it’s called from.
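One way to check that presumption, as a rough sketch: compare the encoding of the string an argument-less StringIO builds for itself with Encoding.default_external, which Ruby takes from the locale unless it is overridden. The method name argless_stringio_encoding is made up for this example, and the printed values depend on the environment, so no output is shown.

```ruby
require 'stringio'

# Hypothetical helper: encoding of the string that StringIO.new
# creates for itself when no initial string is passed.
def argless_stringio_encoding
  StringIO.new.string.encoding
end

# Encoding.default_external comes from the locale unless overridden
# (for example with -E on the command line).
p :stringio => argless_stringio_encoding
p :default_external => Encoding.default_external
p :same => (argless_stringio_encoding == Encoding.default_external)
```

If the last line prints true under several different locales, that supports the "default from the locale" explanation.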
This behaviour seems odd to me. I think a better behaviour would be
either to always require a string parameter, so that it never has to
pick a default encoding itself, or to not create an internal string in
#new at all, and instead #dup the first string it gets passed as a
parameter to #write or #puts and use that.
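To make the second option concrete, here is a minimal sketch of the idea as a wrapper class. LazyStringIO is a hypothetical name, not something in the stringio library, and instead of literally #dup'ing the first string it seeds the real StringIO with an empty string carrying that string's encoding, which should amount to the same thing.

```ruby
require 'stringio'

# Hypothetical wrapper, not part of stringio: it delays creating the
# underlying StringIO until the first write, then seeds it with an empty
# string that carries the encoding of the first string written.
class LazyStringIO
  def initialize
    @io = nil
  end

  def write(str)
    io_for(str).write(str)
  end

  def puts(str)
    io_for(str).puts(str)
  end

  # Contents written so far; a fresh empty string if nothing was written.
  def string
    @io ? @io.string : String.new
  end

  private

  # Create the real StringIO on first use, adopting the encoding of
  # whatever is written first.
  def io_for(first)
    @io ||= StringIO.new(String.new.force_encoding(first.to_s.encoding))
  end
end

s = LazyStringIO.new
s.puts("abc".encode(Encoding::UTF_8))
p :stringio => s.string.encoding
```

Only #write, #puts and #string are covered here; a real change would have to live inside StringIO itself and handle reads, seeks and so on.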
Thoughts?
--
Alex
On Tue, Sep 20, 2011 at 4:18 PM, Alex Y. wrote:
> StringIO.new has no knowledge of the file encoding at the
> location it’s called from.
Can it not be changed so that it knows the internal encoding, instead? That
would stop you having to break the argument-less constructor or doing any
#dup’ing, no?
On Sep 20, 2011, at 8:32 AM, Alex Y. wrote:
> I don’t know if there’s an API for that, but I suspect there isn’t.
It’s not that hard to check:
$ ri StringIO | grep encoding
external_encoding
internal_encoding
set_encoding
> If there were, then yes, that’s the way to do it.
$ ri StringIO.set_encoding
StringIO.set_encoding
(from ruby core)
strio.set_encoding(ext_enc, [int_enc[, opt]]) => strio
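For what it's worth, a sketch of how set_encoding could be used for the case in this thread, going only by the signature shown by ri. The helper name utf8_stringio is made up for the example. On the Ruby versions I would expect this to be run on today, set_encoding also retags the internal string of a writable StringIO; I have not checked that on 1.9.2, so treat it as an assumption there.

```ruby
require 'stringio'

# Hypothetical helper: an argument-less StringIO whose encoding is set
# explicitly instead of being left at the locale default.
def utf8_stringio
  s = StringIO.new
  s.set_encoding(Encoding::UTF_8)
  s
end

s = utf8_stringio
s.puts("abc")
p :external => s.external_encoding
p :stringio => s.string.encoding
```

This keeps the argument-less constructor, at the cost of one extra call at every place a StringIO is created.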
Eric H. wrote in post #1022972:
> On Sep 20, 2011, at 8:32 AM, Alex Y. wrote:
>> I don’t know if there’s an API for that, but I suspect there isn’t.
> It’s not that hard to check:
> $ ri StringIO | grep encoding
> external_encoding
> internal_encoding
> set_encoding
$ ri StringIO
Nothing known about StringIO
is what I get. I never assume ri works.
--
Alex
Adam P. wrote in post #1022947:
> On Tue, Sep 20, 2011 at 4:18 PM, Alex Y. wrote:
>> StringIO.new has no knowledge of the file encoding at the
>> location it’s called from.
> Can it not be changed so that it knows the internal encoding, instead? That
> would stop you having to break the argument-less constructor or doing any
> #dup’ing, no?
I don’t know if there’s an API for that, but I suspect there isn’t. If
there were, then yes, that’s the way to do it.
--
Alex
Ryan D. wrote in post #1023415:
> On Sep 23, 2011, at 00:07, Alex Y. wrote:
>> $ ri StringIO
>> Nothing known about StringIO
>> is what I get. I never assume ri works.
> soooo… instead of fixing it and empowering yourself… you choose…
> what exactly?
rubydoc.info, usually. Saves fixing it on every single box I ever
touch.
--
Alex
On Fri, Sep 23, 2011 at 4:05 AM, Alex Y. wrote:
> rubydoc.info, usually. Saves fixing it on every single box I ever
> touch.
+1, ri has worked for me once before, but rarely does, and I don’t enjoy the
format anyway. I used to build docs and host them with gem server but now
I turn off ri and rdoc and just use rdoc.info since it has not only core
docs, but also gems.
Occasionally I use ruby-doc.org, and for Rails I use guides.rubyonrails.org
and api.rubyonrails.org
On Sep 23, 2011, at 00:07, Alex Y. wrote:
> $ ri StringIO
> Nothing known about StringIO
> is what I get. I never assume ri works.
soooo… instead of fixing it and empowering yourself… you choose…
what exactly?
Alex Y. wrote in post #1022945:
> This surprised me:
Nothing surprises me any more about encodings in ruby 1.9.
FWIW, there’s a similar case with String.new. Whereas a string literal
gets its encoding from the source encoding of the file, String.new
doesn’t.
brian@x100:~$ ruby192 -e 'p "".encoding'
#<Encoding:UTF-8>
brian@x100:~$ ruby192 -e 'p String.new.encoding'
#<Encoding:ASCII-8BIT>
brian@x100:~$ echo 'p "".encoding' | ruby192
#<Encoding:UTF-8>
brian@x100:~$ echo 'p String.new.encoding' | ruby192
#<Encoding:ASCII-8BIT>
brian@x100:~$ echo 'p "".encoding' > x.rb && ruby192 x.rb
#<Encoding:US-ASCII>
brian@x100:~$ echo 'p String.new.encoding' > x.rb && ruby192 x.rb
#<Encoding:ASCII-8BIT>
However, String.new doesn’t seem to be getting its encoding from the
environment, which your program suggests StringIO.new does.
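The same comparison can be written as a small script rather than a series of shell one-liners. This is only a sketch: the method names are invented for the example, and the last helper shows one possible way to sidestep both defaults by tagging the blank string explicitly.

```ruby
# Hypothetical helper: a literal takes the source encoding of the file
# it appears in, which __ENCODING__ reports.
def literal_matches_source?
  "".encoding == __ENCODING__
end

# Hypothetical helper: encoding of a string built with String.new and
# no arguments (ASCII-8BIT in the transcripts above).
def blank_encoding
  String.new.encoding
end

# Hypothetical helper: a blank string tagged explicitly, so that neither
# the source encoding nor the String.new default matters.
def tagged_blank
  String.new.force_encoding(Encoding::UTF_8)
end

p :literal_matches_source => literal_matches_source?
p :string_new => blank_encoding
p :tagged => tagged_blank.encoding
```

A string built like tagged_blank could also be handed to StringIO.new as the initial string, which avoids depending on any default at all.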
All of this is completely undocumented, and therefore whatever behaviour
you get is what you get. Fine if you like stamp collecting though.
Regards,
Brian.