I posted a similar question in the rails group but this is more specific
to ruby 1.8.2.
I read that ruby has problems with multibyte charsets. And I read that
there might be some problems with ISO-8859-15 related to REXML. And I
read that regex might have problems with ISO-8859-1.
Given the above problems (or rumors), which encoding is recommended for
use with ruby 1.8.2?
UTF-8
ISO-8859-1
ISO-8859-15
I’m certain both UTF-8 and ISO-8859-15 will support all the characters
I’ll ever use. ISO-8859-1 only lacks a couple of characters I might
use on very rare occasions, so I’m just looking for the charset that will
cause the fewest problems with Ruby.
|Given the above problems (or rumors), which encoding is recommended for
|use with ruby 1.8.2?
None. They all cause problems. With UTF-8, most string functions won’t
work correctly (probably including regexps), because they treat a string
as a sequence of bytes rather than characters. There are special
extensions to work around this to some extent.
ISO-8859-1 and ISO-8859-15 should behave pretty much the same. They are
simple 8-bit encodings, so the string functions that expect 1-byte
characters work. They won’t allow you to use slightly more exotic
characters (like Greek letters for maths, …).
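To make the 1-byte versus multibyte point concrete, here is a minimal sketch. It builds the characters from their code points with pack, so it does not depend on how the script file itself is saved; the variable names are just for illustration.

```ruby
# "é" is U+00E9. In ISO-8859-1 it is the single byte 0xE9;
# in UTF-8 it takes two bytes.
latin1 = [0xE9].pack('C')   # one raw byte, as an ISO-8859-1 file would store it
utf8   = [0xE9].pack('U')   # the same character encoded as UTF-8

latin1_bytes = latin1.unpack('C*')
utf8_bytes   = utf8.unpack('C*')

p latin1_bytes   # => [233]
p utf8_bytes     # => [195, 169]

# Greek alpha (U+03B1) has no slot in ISO-8859-1 or ISO-8859-15,
# but UTF-8 can hold it, again as two bytes.
alpha_bytes = [0x3B1].pack('U').unpack('C*')
p alpha_bytes    # => [206, 177]
```

Byte-oriented string functions see one element per character in the first case and two in the others, which is where the trouble with UTF-8 starts.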
Length and indexing do not work very well with UTF-8, since both count
bytes rather than characters.
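A rough sketch of the length and indexing issue, and one possible way around it by going through code points with unpack('U*'). In 1.8, String#length reports the byte count; Ruby 1.9 and later count characters, so the sketch computes both numbers explicitly and gives the same result on either version.

```ruby
# "résumé" built from code points so the result does not depend on
# the encoding of this script file.
word = [0x72, 0xE9, 0x73, 0x75, 0x6D, 0xE9].pack('U*')

byte_count = word.unpack('C*').size   # what 1.8's String#length reports
char_count = word.unpack('U*').size   # number of actual characters

puts byte_count   # => 8
puts char_count   # => 6

# Index by character instead of by byte: take the third code point
# and turn it back into a string.
third_char = word.unpack('U*')[2, 1].pack('U*')
puts third_char   # => s
```

Unpacking the whole string on every lookup is slow for long strings, so this is only meant to show the idea.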
In message “Re: Which encoding causes fewest problems in Ruby 1.8.2?”
on Sun, 11 Jun 2006 10:12:45 +0900, Jim S. writes:
|Given the above problems (or rumors), which encoding is recommended for
|use with ruby 1.8.2?
|
|UTF-8
|ISO-8859-1
|ISO-8859-15
String and Regexp handle all of them in most cases, but
upper/lower case conversion for non-ASCII alphabets is not supported.
Use -Ku for UTF-8 and -Kn for ISO-8859-*.
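As an illustration of both points, the sketch below uses the /u regexp flag, which as far as I know puts a single pattern into UTF-8 mode much as -Ku does for the whole script. The tr call is only a stand-in for an ASCII-only case mapping (newer Ruby versions upcase non-ASCII letters themselves), and the variable names are made up for the example.

```ruby
# "résumé" built from code points, stored as UTF-8.
word = [0x72, 0xE9, 0x73, 0x75, 0x6D, 0xE9].pack('U*')

# With /u, "." matches one whole UTF-8 character, not one byte.
chars = word.scan(/./u)
puts chars.size   # => 6

# ASCII-only case mapping: a-z are converted, the accented
# letters are left exactly as they were.
upper = word.tr('a-z', 'A-Z')
p upper.unpack('U*')   # => [82, 233, 83, 85, 77, 233]
```

The two 233 entries are the untouched "é" characters, so any case conversion for them has to be handled separately.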