However, it’s breaking for me: è is turned into “yy”. I think this is
to do with the number of bytes used: the first string passed to tr()
uses 2 bytes per character while the second uses 1 byte per character:
Assuming this is the problem, can anyone tell me how to get around it?
I know next to nothing about character encoding. I tried converting both
translation strings to UTF-8 with String#toutf8, but that didn’t make any
difference.
With Ruby 1.9 your code works fine without modifications. With Ruby 1.8
and its support for Unicode (or lack thereof), it might be quite a
problem to get it working.
UTF-8 is a variable-length encoding: the ASCII range (which includes
a-zA-Z) is stored unchanged as 1 byte per character, and everything else
is encoded as 2-4 bytes per character. Both of the strings are therefore
valid UTF-8, but Ruby 1.8’s tr can’t operate at the character level,
only at the byte level.
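To make the byte/character difference concrete, here is a small sketch. The helper names utf8_bytes and code_points are made up for this illustration, and it assumes the source file is saved as UTF-8. String#unpack is used because it behaves the same way on 1.8 and 1.9.

```ruby
# encoding: utf-8
# Hypothetical helpers, only for illustration.

# Raw bytes of the string: this is the level Ruby 1.8's tr works at.
def utf8_bytes(str)
  str.unpack("C*")
end

# Unicode code points, decoded from UTF-8: one entry per character.
def code_points(str)
  str.unpack("U*")
end

p utf8_bytes("e")   # => [101]        one byte, plain ASCII
p utf8_bytes("è")   # => [195, 168]   two bytes for a single character
p code_points("è")  # => [232]        one code point (U+00E8)
```

Since 1.8’s tr sees two separate bytes for "è", it translates each of them on its own, which would explain getting two output characters for one input character.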
Ah… I’m a bit scared to change our project over to Ruby 1.9 (I didn’t
know there was a 1.9) to solve this problem. I ended up just picking
the most commonly used accents and doing individual gsubs on the strings
to swap them out. Feels dirty but it works.
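For anyone else stuck on 1.8, the gsub workaround could look roughly like the sketch below. The ACCENT_MAP entries and the strip_accents name are only examples, not the actual list used in the project, and the file is assumed to be saved as UTF-8. It works on 1.8 because gsub with a string pattern matches the whole multi-byte sequence, where tr maps single bytes.

```ruby
# encoding: utf-8
# Illustrative mapping only; extend it with whatever accents you need.
ACCENT_MAP = {
  "à" => "a", "â" => "a",
  "è" => "e", "é" => "e", "ê" => "e",
  "ù" => "u", "ç" => "c"
}

# Hypothetical helper: applies one gsub per entry in the map.
def strip_accents(str)
  ACCENT_MAP.inject(str.dup) do |out, (from, to)|
    out.gsub(from, to)
  end
end

puts strip_accents("très café")  # prints "tres cafe"
```

Keeping the pairs in one hash at least puts the "dirty" part in a single place, so adding another accent is a one-line change.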