Question about some octal formatted output?

eacute = ""
eacute << 0xC3 << 0xA9 #eacute<< 195 << 169 ; or é

p eacute

--output:--
"\303\251"

That output is in octal, although there is no leading 0.

  1. Where does that format come from, i.e. no leading 0?
  2. Why is the output in octal and not hex?

I looked up String#<< and it says it converts any Fixnum between 0-255
to a character.

  3. Using what character set?

Thanks.

On Oct 14, 2007, at 11:17, 7stud -- wrote:

  2. Why is the output in octal and not hex?

It’s at least as old as C. You’ll probably have to ask some really
old timers for the answer.

$ cat octal.c
#include <stdio.h>

void main() { printf("\303\251\n"); }
$ gcc octal.c
octal.c: In function ‘main’:
octal.c:3: warning: return type of ‘main’ is not ‘int’
$ ./a.out
é

I looked up String#<< and it says it converts any Fixnum between 0-255
to a character.

  3. Using what character set?

ASCII. It’s your terminal that controls how it gets displayed. My
terminal is set to UTF-8.

7stud -- wrote:

  1. Where does that format come from, i.e. no leading 0?
  2. Why is the output in octal and not hex?

I looked up String#<< and it says it converts any Fixnum between 0-255
to a character.

  3. Using what character set?

Actually, what’s your problem with all that?

Your ints, specified in hex, are simply appended to the string as
bytes. Those two bytes, interpreted as UTF-8, encode an é.
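
A minimal sketch of that step, with one assumption stated up front: as far as I know, newer Ruby versions (1.9 and later) treat an integer passed to String#<< as a code point when the receiver is a UTF-8 string, so the sketch starts from a binary string to get the byte-appending behaviour described here. The method name eacute_bytes is made up for illustration.

```ruby
# Hypothetical helper: build the two-byte string from the original post.
def eacute_bytes
  s = "".b              # binary (ASCII-8BIT) copy, so << appends raw bytes
  s << 0xC3 << 0xA9     # same values as 195 and 169
end

raw = eacute_bytes
p raw.bytes             # => [195, 169]

# Reading the same two bytes as UTF-8 yields a single character.
p raw.dup.force_encoding("UTF-8").length   # => 1
```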

The conventional syntax for specifying bytes by their integer value in
string literals, used in C, shells and a number of other environments
(including Ruby) is a backslash followed by octal digits. (The leading 0
is used for specifying integer literals in octal.)
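
To make the difference between the two notations concrete, here is a small sketch. It assumes a UTF-8 source file, which is the default on current Ruby versions.

```ruby
# Inside a string literal, a backslash plus up to three octal digits names one byte.
octal_literal = "\303\251"
hex_literal   = "\xC3\xA9"
p octal_literal.bytes == hex_literal.bytes   # => true

# A leading 0 (octal) or 0x (hex) prefix belongs to *integer* literals.
p 0303 == 0xC3   # => true, both are 195
p 0251 == 0xA9   # => true, both are 169
```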

String#inspect (which I guess p is using) adopts this syntax for
displaying non-ascii and/or non-printing bytes in the string.
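
A rough sketch of that display rule follows. Note that octal_inspect is a hypothetical helper written for this thread, not Ruby’s actual implementation, and it ignores details such as escaping quotes and backslashes. Newer Ruby versions, as far as I can tell, print hex escapes like \xC3 for such bytes instead of octal ones.

```ruby
# Hypothetical helper: show a string the way the output above looks.
def octal_inspect(str)
  body = str.bytes.map do |byte|
    # Printable ASCII is shown as-is; any other byte becomes backslash + 3 octal digits.
    (0x20..0x7E).cover?(byte) ? byte.chr : format("\\%03o", byte)
  end.join
  "\"#{body}\""
end

puts octal_inspect("\303\251")      # prints "\303\251"
puts octal_inspect("caf\303\251")   # prints "caf\303\251"
```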

I really don’t get your third question. There’s no character set
involved here, beyond how you intended your two bytes to be interpreted.
Those two bytes remain the same, regardless of how they are displayed.
They may mean two characters in plain old 8-bit charsets, they may mean
e.g. one é in UTF-8, or they may mean what p displays for them.
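
For illustration only, here is how that looks on a Ruby version that has String#force_encoding (1.9 and later, so not the Ruby this thread was written against). The bytes never change; only the label telling Ruby how to read them does.

```ruby
raw = [0xC3, 0xA9].pack("C*")   # two bytes, no particular character set implied

as_utf8   = raw.dup.force_encoding("UTF-8")
as_latin1 = raw.dup.force_encoding("ISO-8859-1")

p as_utf8.length                      # => 1 (one character, e-acute)
p as_latin1.length                    # => 2 (A-tilde and the copyright sign)
p as_utf8.bytes == as_latin1.bytes    # => true
```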

mortee