Forum: Ruby-core [ruby-trunk - Bug #7752][Open] Rational/Float/Fixnum/Bignum `.to_s.encoding` is US-ASCII

Posted by coffeejunk (Maximilian Haack) (Guest)
on 2013-01-29 18:00
(Received via mailing list)
Issue #7752 has been reported by coffeejunk (Maximilian Haack).

----------------------------------------
Bug #7752: Rational/Float/Fixnum/Bignum `.to_s.encoding` is US-ASCII
https://bugs.ruby-lang.org/issues/7752

Author: coffeejunk (Maximilian Haack)
Status: Open
Priority: Normal
Assignee:
Category:
Target version: 2.0.0
ruby -v: 2.0.0dev


=begin
When converting an instance of Rational/Float/Fixnum/Bignum to a string 
with the (({.to_s})) method, the resulting string has the encoding 
US-ASCII. This happens for 1.9.3 as well as 2.0.0rc1.

(({> __ENCODING__}))
(({ => #<Encoding:UTF-8>}))

(({> Encoding.default_internal}))
(({ => #<Encoding:UTF-8>}))

(({> Encoding.default_external}))
(({ => #<Encoding:UTF-8>}))

(({> 1.to_s.encoding}))
(({ => #<Encoding:US-ASCII>}))

(({> (2/1).to_r.to_s.encoding}))
(({ => #<Encoding:US-ASCII>}))

(({> "abc".encoding}))
(({ => #<Encoding:UTF-8>}))

=end
Posted by drbrain (Eric Hodel) (Guest)
on 2013-01-29 21:12
(Received via mailing list)
Issue #7752 has been updated by drbrain (Eric Hodel).

Category set to core

This behavior matches Time#to_s, see #5226

Since there are no non-US-ASCII characters in the result of to_s on 
Rational, Float, Fixnum or Bignum there should be no problem with the 
US-ASCII encoding.  Can you demonstrate one?
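
One way to look for a problem is to combine a numeric to_s result with a string that does contain non-ASCII characters. The following is a minimal sketch, assuming a reasonably recent MRI; the variable names are illustrative and not from the report.

```ruby
number = 1.to_s      # US-ASCII, as described in this issue
accent = "\u00e9"    # a \u escape always produces a UTF-8 string

# Concatenating an ASCII-only US-ASCII string with a UTF-8 string
# does not raise; the result takes the UTF-8 encoding.
combined = number + accent

puts combined.encoding                       # UTF-8
puts Encoding.compatible?(number, accent)    # UTF-8
puts combined.valid_encoding?                # true
```

If this holds, the US-ASCII tag on the numeric string does not get in the way of ordinary string operations.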

Posted by coffeejunk (Maximilian Haack) (Guest)
on 2013-01-31 10:08
(Received via mailing list)
Issue #7752 has been updated by coffeejunk (Maximilian Haack).


The only problem I see is that ruby is lying to the user. It is not 
severe since, as you said, there are no non-ascii characters in the 
resulting string, but I think ruby should respect the set encoding.
Posted by Joshua Ballanco (jballanc)
on 2013-01-31 14:09
(Received via mailing list)
Issue #7752 has been updated by jballanc (Joshua Ballanco).


US-ASCII is a strict subset of UTF-8, so I don't think there's 
necessarily any lying involved.
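
The subset relationship can be checked directly: relabelling the bytes as UTF-8 changes nothing but the encoding tag. A small sketch, with illustrative variable names:

```ruby
ascii = 42.to_s                                    # tagged US-ASCII
utf8  = ascii.dup.force_encoding(Encoding::UTF_8)  # same bytes, new tag

puts ascii.bytes == utf8.bytes   # true: identical byte sequence
puts utf8.valid_encoding?        # true: ASCII bytes are valid UTF-8
puts ascii == utf8               # true: ASCII-only strings compare equal
```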
Posted by naruse (Yui NARUSE) (Guest)
on 2013-02-01 07:18
(Received via mailing list)
Issue #7752 has been updated by naruse (Yui NARUSE).

Status changed from Open to Rejected

Under the current policy, strings that can only ever contain US-ASCII 
characters are US-ASCII.
If there is a practical issue, I may change the policy in the future.

Note that a US-ASCII string is faster than a UTF-8 string for getting 
the length or for index access.
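
To see whether the difference matters for a given workload, it can be measured. This is only a rough sketch: `elapsed` is a hypothetical helper, the string size and iteration count are arbitrary, and the result depends on the Ruby version (an interpreter that caches the fact that a UTF-8 string is ASCII-only may show little or no gap).

```ruby
# Hypothetical helper: wall-clock seconds taken by the block.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

digits = "1234567890" * 100_000
ascii  = digits.dup.force_encoding(Encoding::US_ASCII)
utf8   = digits.dup.force_encoding(Encoding::UTF_8)

[ascii, utf8].each do |s|
  # Time repeated length lookups and a character index in the middle.
  t = elapsed { 1_000.times { s.length; s[s.bytesize / 2] } }
  puts format("%-8s %.6fs", s.encoding.name, t)
end
```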
Posted by "Martin J. Dürst" <duerst@it.aoyama.ac.jp> (Guest)
on 2013-02-01 12:24
(Received via mailing list)
On 2013/01/31 18:07, coffeejunk (Maximilian Haack) wrote:
>
> Issue #7752 has been updated by coffeejunk (Maximilian Haack).
>
>
> The only problem I see is that ruby is lying to the user.

There is 0% lying if one claims that an ASCII-only string is US-ASCII.
There is also 0% lying if one claims it's UTF-8.

> It is not severe since, as you said, there are no non-ascii characters in the
> resulting string, but I think ruby should respect the set encoding.

Setting Encoding.default_internal (or something else) is not a guarantee
that all Strings will be in that encoding. Otherwise, it wouldn't be
called "default".
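
A short sketch of that point, assuming a recent MRI: even with Encoding.default_internal set to UTF-8, strings in other encodings are created routinely. The hash keys below are illustrative labels only.

```ruby
previous = Encoding.default_internal
begin
  Encoding.default_internal = Encoding::UTF_8

  encodings = {
    integer: 1.to_s.encoding,       # numeric to_s, as in this issue
    binary:  "abc".b.encoding,      # String#b returns a binary copy
    escaped: "\u00e9".encoding      # \u escapes produce UTF-8
  }

  # Print each label with the encoding Ruby actually assigned.
  encodings.each { |label, enc| puts "#{label}: #{enc.name}" }
ensure
  # Restore the process-wide setting so the sketch has no side effects.
  Encoding.default_internal = previous
end
```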

Regards,    Martin.