Forum: Ruby-core [ruby-trunk - Bug #7964][Open] Writing an ASCII-8BIT String to a StringIO created from a UTF-8 Strin

Posted by Brian Ford (brixen)
on 2013-02-26 08:33
(Received via mailing list)
Issue #7964 has been reported by brixen (Brian Ford).

----------------------------------------
Bug #7964: Writing an ASCII-8BIT String to a StringIO created from a 
UTF-8 String
https://bugs.ruby-lang.org/issues/7964

Author: brixen (Brian Ford)
Status: Open
Priority: Normal
Assignee:
Category:
Target version:
ruby -v: ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-darwin10.8.0]


In the following script, an ASCII-8BIT String is written to a StringIO 
created with a UTF-8 String without error. However, a << b or a + b will 
raise an exception, as will writing an ASCII-8BIT String to a File with 
UTF-8 external encoding.

$ cat file_enc.rb
# encoding: utf-8
require 'stringio'

a = "On a very cold morning, it was -8°F."
b = a.dup.force_encoding "ascii-8bit"

io = StringIO.new a
io.write(b)
p io.string.encoding

File.open "data.txt", "w:utf-8" do |f|
  f.write a
  f.write b
end

$ ruby2.0 -v file_enc.rb
ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-darwin10.8.0]
#<Encoding:UTF-8>
file_enc.rb:13:in `write': "\xC2" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)
  from file_enc.rb:13:in `block in <main>'
  from file_enc.rb:11:in `open'
  from file_enc.rb:11:in `<main>'

$ ruby1.9.3 -v file_enc.rb
ruby 1.9.3p327 (2012-11-10 revision 37606) [x86_64-darwin10.8.0]
#<Encoding:UTF-8>
file_enc.rb:13:in `write': "\xC2" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)
  from file_enc.rb:13:in `block in <main>'
  from file_enc.rb:11:in `open'
  from file_enc.rb:11:in `<main>'
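
The script above exercises only the StringIO and File cases. The String#<< and String#+ cases mentioned in the description can be checked with a minimal sketch like the one below. The helper name raised_by is hypothetical and exists only for this illustration; the degree sign is written as an escape so the snippet does not depend on the source encoding.

```ruby
# Hypothetical helper (illustration only): runs the block and returns the
# class of the EncodingError it raised, or nil if nothing was raised.
def raised_by
  yield
  nil
rescue EncodingError => e
  e.class
end

a = "On a very cold morning, it was -8\u00B0F."  # UTF-8 with one non-ASCII character
b = a.dup.force_encoding("ascii-8bit")           # same bytes, tagged ASCII-8BIT

p raised_by { a + b }       # Encoding::CompatibilityError
p raised_by { a.dup << b }  # Encoding::CompatibilityError (on a copy, so a is unchanged)
```

Both operands contain non-ASCII bytes, which is why the two encodings are treated as incompatible here; an ASCII-only ASCII-8BIT String would concatenate without error.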
Posted by Nobuyoshi Nakada (nobu)
on 2013-02-26 08:51
(Received via mailing list)
Issue #7964 has been updated by nobu (Nobuyoshi Nakada).

Description updated


----------------------------------------
Bug #7964: Writing an ASCII-8BIT String to a StringIO created from a 
UTF-8 String
https://bugs.ruby-lang.org/issues/7964#change-37085

Author: brixen (Brian Shirai)
Status: Open
Priority: Normal
Assignee:
Category:
Target version:
ruby -v: ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-darwin10.8.0]


=begin
In the following script, an ASCII-8BIT String is written to a StringIO 
created with a UTF-8 String without error. However, a << b or a + b will 
raise an exception, as will writing an ASCII-8BIT String to a File with 
UTF-8 external encoding.

+ $ cat file_enc.rb
  # encoding: utf-8
  require 'stringio'

  a = "On a very cold morning, it was -8°F."
  b = a.dup.force_encoding "ascii-8bit"

  io = StringIO.new a
  io.write(b)
  p io.string.encoding

  File.open "data.txt", "w:utf-8" do |f|
    f.write a
    f.write b
  end

+ $ ruby2.0 -v file_enc.rb
  ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-darwin10.8.0]
  #<Encoding:UTF-8>
  file_enc.rb:13:in `write': "\xC2" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)
    from file_enc.rb:13:in `block in <main>'
    from file_enc.rb:11:in `open'
    from file_enc.rb:11:in `<main>'

+ $ ruby1.9.3 -v file_enc.rb
  ruby 1.9.3p327 (2012-11-10 revision 37606) [x86_64-darwin10.8.0]
  #<Encoding:UTF-8>
  file_enc.rb:13:in `write': "\xC2" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)
    from file_enc.rb:13:in `block in <main>'
    from file_enc.rb:11:in `open'
    from file_enc.rb:11:in `<main>'
=end
Posted by Nobuyoshi Nakada (nobu)
on 2013-02-26 08:56
(Received via mailing list)
Issue #7964 has been updated by nobu (Nobuyoshi Nakada).

Category set to ext
Status changed from Open to Assigned
Assignee set to nobu (Nobuyoshi Nakada)
Target version set to current: 2.1.0

Currently, StringIO does not support encoding conversion on write, so 
`io.write(b)' does not raise any exceptions.
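
A minimal sketch of what "no conversion on write" looks like from the caller's side: the bytes of the ASCII-8BIT String are stored in the buffer unchanged, and the buffer keeps its own encoding. The helper name write_to_utf8_buffer is hypothetical and used only for this illustration.

```ruby
require 'stringio'

# Hypothetical helper (illustration only): appends data to an empty UTF-8
# StringIO and returns the underlying String for inspection.
def write_to_utf8_buffer(data)
  io = StringIO.new(String.new(encoding: Encoding::UTF_8))
  io.write(data)
  io.string
end

binary = "\xC2\xB0".b   # the UTF-8 bytes of the degree sign, tagged ASCII-8BIT
result = write_to_utf8_buffer(binary)

p result.encoding       # the buffer's encoding after the write
p result.bytes          # the bytes stored in the buffer
```

Because nothing is transcoded or checked, the same call with bytes that are not valid UTF-8 would presumably leave the buffer holding data that does not match its declared encoding.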
----------------------------------------
Bug #7964: Writing an ASCII-8BIT String to a StringIO created from a 
UTF-8 String
https://bugs.ruby-lang.org/issues/7964#change-37086

Author: brixen (Brian Shirai)
Status: Assigned
Priority: Normal
Assignee: nobu (Nobuyoshi Nakada)
Category: ext
Target version: current: 2.1.0
ruby -v: ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-darwin10.8.0]


Posted by "duerst (Martin Dürst)" <duerst@it.aoyama.ac.jp> (Guest)
on 2013-02-26 10:19
(Received via mailing list)
Issue #7964 has been updated by duerst (Martin Dürst).


nobu (Nobuyoshi Nakada) wrote:
> Currently, StringIO does not support encoding conversion on write, so
> `io.write(b)' does not raise any exceptions.

Should StringIO support encoding conversion? I think it should, because 
it should work like IO. However, the question is whether the resulting 
string should always be BINARY (exactly mirroring what happens with real 
IO), or whether it should have its own encoding (this might allow 
collecting substrings in different encodings into a string with a single 
encoding without any explicit conversions).
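
As a rough sketch of the second option (a StringIO with its own encoding), a subclass could transcode each written String to the encoding of the underlying buffer. The class name TranscodingStringIO is hypothetical, this is not how the stringio extension is implemented, and only #write is covered.

```ruby
require 'stringio'

# Hypothetical subclass (not part of Ruby): transcodes every written String
# to the encoding of the underlying buffer before storing it.
class TranscodingStringIO < StringIO
  def write(*strings)
    target = string.encoding
    super(*strings.map { |s| s.to_s.encode(target) })
  end
end

io = TranscodingStringIO.new(String.new(encoding: Encoding::UTF_8))
io.write("caf\xE9".b.force_encoding(Encoding::ISO_8859_1))  # ISO-8859-1 input
io.write(" at -8\u00B0F")                                    # already UTF-8

p io.string.encoding
p io.string.valid_encoding?
```

With this sketch an ASCII-8BIT String containing non-ASCII bytes raises Encoding::UndefinedConversionError on write, because String#encode has no conversion from ASCII-8BIT to UTF-8 for such bytes.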

I think that somebody should open a feature for this, and of course 
patches would be welcome.

As an aside, I think it would be easier to implement StringIO in Ruby, or 
is StringIO performance-critical?
----------------------------------------
Bug #7964: Writing an ASCII-8BIT String to a StringIO created from a 
UTF-8 String
https://bugs.ruby-lang.org/issues/7964#change-37089

Author: brixen (Brian Shirai)
Status: Assigned
Priority: Normal
Assignee: nobu (Nobuyoshi Nakada)
Category: ext
Target version: current: 2.1.0
ruby -v: ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-darwin10.8.0]


Posted by naruse (Yui NARUSE) (Guest)
on 2013-02-26 11:37
(Received via mailing list)
Issue #7964 has been updated by naruse (Yui NARUSE).



The examples are not equivalent.
The correct comparison is:

 StringIO.open a, "w" do |io|
   io.write(b)
 end

 File.open "data.txt", "w" do |io|
   io.write b
 end

So it would not raise an error even if StringIO supported 
external/internal encodings.
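
To make the difference between the two File modes concrete, here is a small sketch. It assumes Encoding.default_internal is unset (the default) and a Unix-like platform; the helper name file_write_error is hypothetical and used only for this illustration.

```ruby
require 'stringio'
require 'tmpdir'

# Hypothetical helper (illustration only): writes data to a temporary file
# opened with the given mode and returns the class of any EncodingError
# raised, or nil if the write succeeded.
def file_write_error(data, mode)
  Dir.mktmpdir do |dir|
    File.open(File.join(dir, "data.txt"), mode) { |f| f.write(data) }
  end
  nil
rescue EncodingError => e
  e.class
end

b = "-8\u00B0F".b   # ASCII-8BIT String containing non-ASCII bytes

# StringIO opened in plain "w" mode accepts the binary String.
StringIO.open(String.new(encoding: Encoding::UTF_8), "w") { |io| io.write(b) }

p file_write_error(b, "w")        # nil: no external encoding, no conversion
p file_write_error(b, "w:utf-8")  # Encoding::UndefinedConversionError
```

Only the "w:utf-8" mode asks File to convert on write, which is why it is the only case that raises.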


duerst (Martin Dürst) wrote:
> nobu (Nobuyoshi Nakada) wrote:
> > Currently, StringIO does not support encoding conversion on write, so
> > `io.write(b)' does not raise any exceptions.
>
> Should StringIO support encoding conversion? I think it should, because it
> should work like IO. However, the question is whether the resulting string should
> always be BINARY (exactly mirroring what happens with real IO), or whether it
> should have its own encoding (this might allow collecting substrings in different
> encodings into a string with a single encoding without any explicit conversions).

Agreed.

> I think that somebody should open a feature for this, and of course patches
> would be welcome.
>
> As an aside, I think it would be easier implementing StringIO in Ruby, or is
> StringIO performance critical?

see https://bugs.ruby-lang.org/issues/5677
----------------------------------------
Bug #7964: Writing an ASCII-8BIT String to a StringIO created from a 
UTF-8 String
https://bugs.ruby-lang.org/issues/7964#change-37096

Author: brixen (Brian Shirai)
Status: Assigned
Priority: Normal
Assignee: nobu (Nobuyoshi Nakada)
Category: ext
Target version: current: 2.1.0
ruby -v: ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-darwin10.8.0]


Posted by Brian Ford (brixen)
on 2013-02-27 02:38
(Received via mailing list)
Issue #7964 has been updated by brixen (Brian Shirai).


Martin, what do you mean by: "However, the question is whether the 
resulting string should always be BINARY (exactly mirroring what happens 
with real IO)..."?

If StringIO is going to fake aliasing #pos across instances that have 
been #dup'd, it should certainly have the same encoding-related 
behavior. Cf. http://bugs.ruby-lang.org/issues/7220
----------------------------------------
Bug #7964: Writing an ASCII-8BIT String to a StringIO created from a 
UTF-8 String
https://bugs.ruby-lang.org/issues/7964#change-37126

Author: brixen (Brian Shirai)
Status: Assigned
Priority: Normal
Assignee: nobu (Nobuyoshi Nakada)
Category: ext
Target version: current: 2.1.0
ruby -v: ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-darwin10.8.0]

