On 10/22/06, Wilson B. wrote:
The problem is that proper upcasing and downcasing of characters is
locale-dependent, not just encoding or language-dependent.
As examples, he mentioned that the uppercase version of accented
characters varies from area to area in France.
No, not depending on jurisdiction in France. In French French, one
would capitalize être as Etre. In Canadian French, one would
capitalize it as Être.
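Here is a small Python sketch of the convention just described. It assumes a hypothetical per-locale table (`DROPS_ACCENTS_ON_CAPITALS`, with made-up keys `fr_FR` and `fr_CA`) saying whether a locale drops the accent from a capital letter. No real library supplies that table. The accent is removed by decomposing the capital with NFD and discarding the combining marks.

```python
import unicodedata

# Hypothetical per-locale switch (not from any library): does this locale
# drop accents on capital letters? Mirrors the French/Canadian example above.
DROPS_ACCENTS_ON_CAPITALS = {"fr_FR": True, "fr_CA": False}

def strip_accents(ch: str) -> str:
    """Decompose with NFD and drop combining marks (category Mn)."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")

def capitalize_fr(word: str, locale: str) -> str:
    """Upcase the first letter, dropping its accent if the locale says to."""
    if not word:
        return word
    first = word[0].upper()
    if DROPS_ACCENTS_ON_CAPITALS.get(locale, False):
        first = strip_accents(first)
    return first + word[1:]

print(capitalize_fr("être", "fr_FR"))  # Etre
print(capitalize_fr("être", "fr_CA"))  # Être
```

The same input word gives two different "correct" answers. The only thing that picks between them is the locale key, which the function has to be told from outside.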
Also, in Turkish, there are four different cases of ‘i’, not just two… and which is
correct depends on the jurisdiction.
Not quite. There are two different ‘i’ letters: one with a dot, one
without. The dotted i is capitalized with a dot (İ) and the dotless ı is
capitalized without one (I).
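A rough Python sketch of why this breaks a naive upcase. Python's built-in `str.upper()`/`str.lower()` use the locale-independent Unicode mappings, which pair `i` with `I`. The override tables and the `"tr"` locale argument below are hypothetical, for illustration only. They patch in the Turkish pairs (i/İ and ı/I) before falling back to the default mapping.

```python
# Turkish pairs: dotted i <-> İ (U+0130), dotless ı (U+0131) <-> I.
# Hypothetical override tables, consulted only when locale == "tr".
TR_UPPER = {"i": "\u0130", "\u0131": "I"}
TR_LOWER = {"I": "\u0131", "\u0130": "i"}

def upcase(text: str, locale: str = "en") -> str:
    """Upcase, applying Turkish i/ı rules when the locale is 'tr'."""
    if locale == "tr":
        return "".join(TR_UPPER.get(c, c.upper()) for c in text)
    return text.upper()

def downcase(text: str, locale: str = "en") -> str:
    """Downcase, applying Turkish I/İ rules when the locale is 'tr'."""
    if locale == "tr":
        return "".join(TR_LOWER.get(c, c.lower()) for c in text)
    return text.lower()

print(upcase("istanbul"))        # ISTANBUL  (wrong for Turkish)
print(upcase("istanbul", "tr"))  # İSTANBUL
print(downcase("ILIK", "tr"))    # ılık
```

Without the locale, the default lowercase of `İ` is `i` followed by a combining dot above (two code points). This is one more sign that the default tables are not written with Turkish in mind.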
Also, the German eszett (ß, as in Schloß) would be capitalized as
SCHLOSS, but downcasing that would be schloss, not necessarily schloß.
(Actually, and the Germans here will correct me on this, I’m sure, I
think it would always be Schloss or Schloß because the leading S would
not be lowercased in proper German. Looking at some German webpages
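The ß round trip is easy to see in Python, whose `str` methods apply the Unicode full case mappings with no locale involved. Upcasing turns the single letter ß into the two letters "SS". Nothing records where they came from, so downcasing gives "schloss". `casefold()` is the built-in meant for caseless comparison rather than display.

```python
word = "Schloß"

# Full case mapping: ß -> SS, so the string grows by one character.
upper = word.upper()
print(upper, len(word), len(upper))  # SCHLOSS 6 7

# Downcasing cannot know that "SS" was once "ß".
print(upper.lower())       # schloss
print(upper.capitalize())  # Schloss

# For comparisons, casefold() maps both spellings to the same key.
print(word.casefold() == upper.casefold())  # True
```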
Determining the locale in a correct way is really, really hard. Tim
Bray says it’s basically impossible. Also, all of these rules make
any decent upcase/downcase function ruinously slow.
Not impossible, just fraught with errors and performance issues. One
would not only need the locale lookup stuff, but would also have to do
statistical analysis to be anything better than mostly wrong for
anything but English.