[Repost, with Formatting] Trying to understand unicode character entry, goes into postgres DB backi

Apologies for the unformatted post earlier; I hit Enter and the web UI
posted it, to my chagrin.

  1. Examine the Unicode standard’s code chart collection for “Latin
    small letter a with macron”.

  2. This nets U0100.pdf

  3. “Latin small letter a with macron” appears on the chart as 0101. This
    is a hexadecimal number, so the code point is U+0101.
    Converting 0101 to decimal gets you 257, which is the same number used
    in the HTML numeric character reference.
    That is, &#257; gives you ā, and 257 != 325 (the sum I get in step 6).
    OK, so I can link this guy back to the Unicode source. But here’s the
    question: what’s up with the two broken values?

  4. Put the &#257; character into a view via Rails that is backed by a
    PostgreSQL database.

  5. Using script/console, write the collection of models that contain
    this accented character to a YAML file.

  6. “Latin small letter a with macron” is stored in a YAML dump of
    accented characters as: \xC4\x81
    Hm, OK, that’s a start. Somehow 0101 or 257 is linked to C4 81. Let’s
    convert those two to decimal and see if a correlation becomes clear (I
    know, BTW, that the database that holds that entry is in UTF-8).
    C4: 196
    81: 129
    196 + 129 = 325 != 257 (hex 0101). Hm, look at documentation.

  7. Be stumped.
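
For what it’s worth, the link between 0101 and C4 81 in steps 3 and 6 can
be checked from script/console. This is only a minimal sketch, assuming
Ruby 1.9 or later; the variable names are mine and purely illustrative.
It suggests the two bytes are not meant to be added together: UTF-8
splits the bits of the code point across them under fixed prefixes.

```ruby
# Code point for "Latin small letter a with macron", as read off the chart.
code_point = "0101".to_i(16)          # hex string -> integer

# Let Ruby do the UTF-8 encoding and look at the resulting bytes.
char  = [code_point].pack("U")        # one-character UTF-8 string
bytes = char.bytes.to_a               # raw byte values of that string

# The same thing by hand for the two-byte range (U+0080..U+07FF):
# the payload bits are split 5 + 6 and placed under fixed prefixes.
lead  = 0b11000000 | (code_point >> 6)          # 110xxxxx
trail = 0b10000000 | (code_point & 0b00111111)  # 10xxxxxx

puts format("U+%04X = %d", code_point, code_point)
puts bytes.map { |b| format("\\x%02X", b) }.join
puts format("\\x%02X\\x%02X", lead, trail)
```

Both of the last two lines should print \xC4\x81, the same form that
shows up in the YAML dump.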


I’m working up an application that works with foreign languages and
I’m trying to make it easy to enter accented characters. I saved some
base data that I entered as a fixture (so that I could re-load it as
a sample when needed) and I noticed that in this YAML file my
accented characters are in this unusual \x##\x## format that bears
little relation to the code points that I’ve seen before in code point
charts.
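
Going the other way, from the \x##\x## form back to a code point, can be
sketched like this. Again this assumes Ruby 1.9 or later and that the
escaped bytes really are UTF-8; the variable names are just for
illustration.

```ruby
# The escaped form from the YAML dump. In a double-quoted Ruby literal,
# "\xC4\x81" is two raw bytes; tag them explicitly as UTF-8.
raw = "\xC4\x81".dup.force_encoding("UTF-8")

byte_values = raw.bytes.to_a      # the two byte values, in decimal
code_points = raw.unpack("U*")    # decode UTF-8 into code points

puts byte_values.inspect
puts code_points.inspect
puts format("U+%04X", code_points.first)
```

If the decoded code point comes out as U+0101, the fixture is holding
the same character as the chart, just in its UTF-8 byte form.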

I’ve always been scared to jump into the “How does Unicode work,
really” discussion, but maybe it’s time that I try to sort it out a
bit.

Doubtless people from a more multilingual environment understand this
much better than those of us in North America, so I’m hoping this is a
lot easier than I think!