Utf8 encoding problem

adad · June 25, 2009, 11:26pm

Hi,
I am retrieving a string from a txt file.
The file contains some utf8 characters.

I am comparing these characters against a default string.

The problem is that some of the characters are not stored in a default
format.

For example:
A is stored as Ａ

Naturally when I compare the character it fails.
Strangely when I unpacked the character it appears as 65313 which is the
correct utf8 number for A.

Any way around this?

thanks.

adad · June 26, 2009, 12:11am

On Jun 25, 2009, at 14:29, Ad Ad wrote:

A is stored as Ａ

Naturally when I compare the character it fails.
Strangely when I unpacked the character it appears as 65313 which is the
correct utf8 number for A.

Any way around this?

Well, Ａ is “Fullwidth Latin Capital Letter A” from the “Halfwidth and
Fullwidth Forms” block (Unicode FF21), whereas A is “Latin Capital
Letter A” from the “Basic Latin” block (Unicode 0041). They are two
different characters, so a direct comparison fails.

I don’t know of a way to translate between the two blocks, but
maybe that will help.
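
To make the difference concrete, here is a minimal sketch. It assumes Ruby 2.2 or later (for String#unicode_normalize), which is newer than anything available when this thread was written, and the variable names are only illustrative. It shows that the two characters have different code points, and that NFKC normalization is one possible way to fold the fullwidth form onto the plain Latin letter.

```ruby
# Written with \u escapes so the snippet does not depend on the
# encoding of the source file.
fullwidth_a = "\uFF21"   # FULLWIDTH LATIN CAPITAL LETTER A
ascii_a     = "A"        # LATIN CAPITAL LETTER A (U+0041)

fullwidth_a.ord          # => 65313 (0xFF21)
ascii_a.ord              # => 65    (0x41)
fullwidth_a.unpack("U*") # => [65313]
fullwidth_a.bytes        # => [239, 188, 161], the UTF-8 bytes EF BC A1
fullwidth_a == ascii_a   # => false, different characters

# NFKC applies Unicode compatibility mappings, which include
# fullwidth letter -> ordinary letter.
folded = fullwidth_a.unicode_normalize(:nfkc)
folded == ascii_a        # => true
```

Note that NFKC also rewrites other compatibility characters (ligatures, halfwidth katakana and so on), so it may change more than you want if only the letters matter.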

adad · June 26, 2009, 5:22am

Although I haven’t tried it myself, I did a search for 全角半角変換 and
found this page.
It appears people use jcode and tr to solve this problem.

http://www.eml.ele.cst.nihon-u.ac.jp/~momma/wiki/wiki.cgi/Ruby/全角半角変換.html
http://blog.grayproductions.net/articles/the_kcode_variable_and_jcode_library


adad · June 26, 2009, 4:38pm

James Rubingh wrote:

Although I haven’t tried it myself, I did a search for 全角半角変換 and
found this page.
It appears people use jcode and tr to solve this problem.

http://www.eml.ele.cst.nihon-u.ac.jp/~momma/wiki/wiki.cgi/Ruby/全角半角変換.html
http://blog.grayproductions.net/articles/the_kcode_variable_and_jcode_library


Brilliant!
str.tr!('ａ-ｚＡ-Ｚ', 'a-zA-Z') worked like a charm.
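
For anyone reading this later, a small self-contained sketch of the same tr approach. It assumes Ruby 1.9 or later, where String#tr works on characters rather than bytes; on 1.8 you would need $KCODE and jcode as described in the links above. The constant and method names are made up for illustration, and the fullwidth ranges are written as \u escapes (U+FF41 to U+FF5A for ａ-ｚ, U+FF21 to U+FF3A for Ａ-Ｚ) so the snippet does not depend on the source file encoding.

```ruby
# Fullwidth a-z and A-Z, followed by the matching ASCII ranges.
FULLWIDTH_LETTERS = "\uFF41-\uFF5A\uFF21-\uFF3A"
ASCII_LETTERS     = "a-zA-Z"

# Returns a copy of str with fullwidth Latin letters replaced by their
# ASCII equivalents. All other characters are left untouched.
def to_ascii_letters(str)
  str.tr(FULLWIDTH_LETTERS, ASCII_LETTERS)
end

sample = "\uFF32\uFF55\uFF42\uFF59 1.9"  # fullwidth "Ruby" plus ASCII " 1.9"
to_ascii_letters(sample)                 # => "Ruby 1.9"
to_ascii_letters(sample) == "Ruby 1.9"   # => true, comparison now succeeds
```

This only covers the letters. Fullwidth digits and punctuation would need their own ranges added to both strings.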