What is String#ord?

fxn · April 17, 2010, 5:39pm

Ruby 1.9 docs for String#ord say:

Return the Integer ordinal of a one-character string.

What does that mean? Check for example

"×".ord # => 215
"×".bytes.to_a # => [195, 151]

– fxn

fxn · April 17, 2010, 6:42pm

On Sat, Apr 17, 2010 at 5:35 PM, Xavier N. wrote:

Ruby 1.9 docs for String#ord say:

  Return the Integer ordinal of a one-character string.

What does that mean? Check for example

  "×".ord # => 215
  "×".bytes.to_a # => [195, 151]

Trial and error suggests it is the code of the character in the
encoding of the string:

euro = "\u20AC"

euro.ord.to_s(16) # => "20ac"
euro.encode("iso-8859-15").ord.to_s(16) # => "a4"

That is what the source code suggests also:

VALUE
rb_str_ord(VALUE s)
{
    unsigned int c;

    c = rb_enc_codepoint(RSTRING_PTR(s), RSTRING_END(s), STR_ENC_GET(s));
    return UINT2NUM(c);
}
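
The same observation can be checked from Ruby by printing the ordinal next to the raw bytes of the euro sign in a few encodings. This is only a sketch: the variable names are illustrative, and it assumes a Ruby 1.9 or later build with the ISO-8859-15 and UTF-16LE transcoders available.

```ruby
# Sketch: compare String#ord with the raw bytes of one character in
# several encodings. Variable names here are illustrative only.
euro = "\u20AC" # EURO SIGN; a \u escape yields a UTF-8 string

utf8   = euro
latin9 = euro.encode("ISO-8859-15")
utf16  = euro.encode("UTF-16LE")

[utf8, latin9, utf16].each do |s|
  # ord is the character's code in s.encoding; bytes is how it is stored
  puts "#{s.encoding}: ord=0x#{s.ord.to_s(16)} bytes=#{s.bytes.to_a.inspect}"
end
```

The ordinal coincides with the byte value only when the character fits in a single byte of that encoding, as it does in ISO-8859-15.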

fxn · April 17, 2010, 6:49pm

On 17 April 2010 18:41, Xavier N. wrote:

VALUE
rb_str_ord(VALUE s)
{
    unsigned int c;

    c = rb_enc_codepoint(RSTRING_PTR(s), RSTRING_END(s), STR_ENC_GET(s));
    return UINT2NUM(c);
}

p "×".ord # => 215
p "×".bytes.to_a # => [195, 151]
p "×".encoding # => #<Encoding:UTF-8>
p "×".codepoints.to_a # => [215]

In UTF-8 (and in Unicode encodings generally), one byte is not always
a character, and in some encodings it never is.
A codepoint represents a character.

So you can think of ord as codepoints[0], and that number of course
depends on the String’s encoding.
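
A quick way to check that reading, offered as a sketch and not as a statement about how ord is implemented. The helper first_codepoint is a hypothetical name, not a core method.

```ruby
# Sketch: String#ord should agree with the first element of #codepoints.
# first_codepoint is a hypothetical helper, not a core method.
def first_codepoint(str)
  str.codepoints.to_a.first
end

times   = "\u00D7" # MULTIPLICATION SIGN, UTF-8
samples = [times, "abc", "\u20AC", "\u20AC".encode("ISO-8859-15")]

samples.each do |s|
  raise "mismatch for #{s.inspect}" unless s.ord == first_codepoint(s)
end

# One character, but two bytes in UTF-8
puts times.chars.to_a.length
puts times.bytes.to_a.length

# The two differ on the empty string: codepoints is empty, ord raises
begin
  "".ord
rescue ArgumentError => e
  puts "empty string: #{e.class}"
end
```

For a longer string, ord looks only at the first character, so "abc".ord gives 97.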

Regards,
B.D.

fxn · April 17, 2010, 7:48pm

Yes of course, a posteriori that’s the only thing that makes sense. I
was in a different context and the doc was not clear enough for me.

Perhaps I’ll send a patch to define #ord in terms of the code/codepoint
in the string’s character encoding, instead of that bare “ordinal”.