Forum: Ruby-core [Bug #1681] Integer#chr Should Infer Encoding of Given Codepoint

Posted by Run Paint Run Run (Guest)
on 2009-06-23 23:43
(Received via mailing list)
Bug #1681: Integer#chr Should Infer Encoding of Given Codepoint
http://redmine.ruby-lang.org/issues/show/1681

Author: Run Paint Run Run
Status: Open, Priority: Low
Category: M17N
ruby -v: ruby 1.9.2dev (2009-06-21 trunk 23774) [i686-linux]

String#ord and Integer#chr are symmetrical operations on ASCII Strings:

    'a'.ord.chr   #=> "a"

But Integer#chr fails to round-trip when the given codepoint is outside 
the range of ASCII:

    "\u{2563}".ord.chr #=> RangeError: 9571 out of char range

To fix this, the codepoint's encoding needs to be specified:

    "\u{2563}".ord.chr('utf-8')  #=> "╣"

This seems needlessly verbose given that Ruby already knows that my 
source encoding is UTF-8. I suggest, then, that, when invoked with no 
argument, Integer#chr interprets the given codepoint w.r.t. the current 
encoding, raising a RangeError only if the codepoint is out-of-bounds 
for this inferred encoding.
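
As a rough illustration of the round trip being asked for, here is a minimal sketch. The helper name round_trip_first_char is made up for this example and is not part of Ruby; it simply passes the string's own encoding to Integer#chr, which is the information the no-argument call does not have.

```ruby
# Hypothetical helper (not part of Ruby): rebuild the first character of
# +str+ from its codepoint, reusing the string's own encoding so that
# Integer#chr does not have to guess.
def round_trip_first_char(str)
  str.ord.chr(str.encoding)
end

ascii = round_trip_first_char('a')         # ASCII codepoint
box   = round_trip_first_char("\u{2563}")  # non-ASCII codepoint, UTF-8 string
```

Both calls should give back the character they started from, because the encoding travels with the string rather than being inferred from the bare integer.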
Posted by Nobuyoshi Nakada (nobu)
on 2009-06-24 01:54
(Received via mailing list)
Hi,

At Wed, 24 Jun 2009 06:42:29 +0900,
Run Paint Run Run wrote in [ruby-core:23997]:
> This seems needlessly verbose given that Ruby already knows
> that my source encoding is UTF-8.

It's irrelevant to source encoding.  A possibility would be
Encoding.default_internal?
Posted by Run Paint Run Run (Guest)
on 2009-06-24 02:55
(Received via mailing list)
>> This seems needlessly verbose given that Ruby already knows
>> that my source encoding is UTF-8.
>
> It's irrelevant to source encoding.  A possibility would be
> Encoding.default_internal?

Indeed; my mistake. :-)
Posted by Yukihiro Matsumoto (Guest)
on 2009-06-25 11:07
(Received via mailing list)
Hi,

In message "Re: [ruby-core:24001] Re: [Bug #1681] Integer#chr Should 
Infer Encoding of Given Codepoint"
    on Wed, 24 Jun 2009 09:54:06 +0900, Run Paint Run Run 
<runrun@runpaint.org> writes:
|
|>> This seems needlessly verbose given that Ruby already knows
|>> that my source encoding is UTF-8.
|>
|> It's irrelevant to source encoding.  A possibility would be
|> Encoding.default_internal?
|
|Indeed; my mistake. :-)

Source encoding may be different from default internal encoding.
Since codepoint _number_ does not contain any encoding information,
there's information loss.  I am not sure it is OK to use possibly
wrong encoding information (default internal), even as a default.

I'd like to hear opinions from others.

              matz.
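
To make the information-loss point concrete, the sketch below feeds the same integer to Integer#chr under two different encodings. The choice of 0xA4 and of the two ISO-8859 encodings is only an example picked for this note; any pair of encodings that disagree on a codepoint would do.

```ruby
# The bare number 0xA4 carries no encoding information of its own.
code = 0xA4

# Interpret the same number in two single-byte encodings.
as_latin1 = code.chr(Encoding::ISO_8859_1)   # CURRENCY SIGN in ISO-8859-1
as_latin9 = code.chr(Encoding::ISO_8859_15)  # EURO SIGN in ISO-8859-15

# Transcode both to UTF-8 and look at the resulting Unicode codepoints.
latin1_unicode = as_latin1.encode(Encoding::UTF_8).ord
latin9_unicode = as_latin9.encode(Encoding::UTF_8).ord

puts format('ISO-8859-1:  U+%04X', latin1_unicode)
puts format('ISO-8859-15: U+%04X', latin9_unicode)
```

The two results name different characters, so whichever encoding Integer#chr assumes by default will be wrong for some callers.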
Posted by "Martin J. Dürst" (Guest)
on 2009-06-26 11:33
(Received via mailing list)
We have String#encode (without any arguments), which transcodes to
default_internal (and in addition, doesn't raise an exception for
invalid byte sequences, etc., which may be a security issue), so I don't
think using Integer#chr with a default encoding of default_internal
would be such a big problem.

Regards,    Martin.
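
For readers who want to try the default_internal idea, here is a small sketch. It assumes the behaviour of current Ruby releases, where Integer#chr with no argument falls back to Encoding.default_internal for codepoints above 0xFF; the thread itself does not spell out what the eventual changeset does. The helper with_default_internal is invented for this example.

```ruby
# Hypothetical helper: run a block with Encoding.default_internal
# temporarily set to +enc+, then restore the previous value.
def with_default_internal(enc)
  previous = Encoding.default_internal
  Encoding.default_internal = enc
  yield
ensure
  Encoding.default_internal = previous
end

# With default_internal set, a codepoint above 0xFF is interpreted in it
# (assumed behaviour of current Ruby, see the note above).
box = with_default_internal(Encoding::UTF_8) { 0x2563.chr }

# With default_internal unset, the same call raises RangeError.
unset_error = with_default_internal(nil) do
  begin
    0x2563.chr
    nil
  rescue RangeError => e
    e
  end
end

# String#encode with no arguments also targets default_internal.
latin1     = 0xE9.chr(Encoding::ISO_8859_1)
transcoded = with_default_internal(Encoding::UTF_8) { latin1.encode }
```

Note that assigning Encoding.default_internal at runtime affects the whole process, which is why the helper restores the old value in an ensure clause.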
Posted by Yukihiro Matsumoto (Guest)
on 2009-06-26 20:49
(Received via mailing list)
Issue #1681 has been updated by Yukihiro Matsumoto.

Status changed from Open to Closed
% Done changed from 0 to 100

Applied in changeset r23865.
----------------------------------------
http://redmine.ruby-lang.org/issues/show/1681