Why not? What reason did he give?
The problem is that proper upcasing and downcasing of characters is
locale-dependent, not just encoding or language-dependent.
As examples, he mentioned that the uppercase version of accented
characters varies from area to area in France. Also, in Turkish,
there are four different forms of ‘i’ (dotted and dotless, each with
an upper and a lower case), not just two… and which is correct
depends on the jurisdiction.
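For anyone who wants to see the Turkish case concretely, here is a minimal sketch. It assumes Ruby 2.4 or later, where String#upcase and String#downcase accept a :turkic option; older Rubies only case-map ASCII. The variable names are just for illustration.

```ruby
# Assumes Ruby 2.4+, where the case-mapping methods accept :turkic.
# The same input letter maps to a different result depending on the rules used.

default_upper = "i".upcase             # default rules: plain "I"
turkic_upper  = "i".upcase(:turkic)    # Turkic rules: "İ" (U+0130, dotted capital I)

default_lower = "I".downcase           # default rules: plain "i"
turkic_lower  = "I".downcase(:turkic)  # Turkic rules: "ı" (U+0131, dotless small i)

puts [default_upper, turkic_upper, default_lower, turkic_lower].join(" ")
```

Nothing in the string itself says which set of rules applies; the caller has to know.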
Determining the locale in a correct way is really, really hard. Tim
Bray says it’s basically impossible. Also, all of these rules make
any decent upcase/downcase function ruinously slow.
He shared a story about the original version of XML. At the time, it
was case-insensitive. The very first XML library was running horribly
slow. After profiling, they found that it was spending 90% of its
time in the Java downcase routine. After that, XML was made
case-sensitive.
Thanks Wilson, that explains everything. I’d never thought about
problems like that.
Cheers, Mike
The problem is that proper upcasing and downcasing of characters is
locale-dependent, not just encoding or language-dependent.
Yes, this is basically it.
Tim B. feels that case changing is more or less impossible in the
practical sense. When you get around to downcasing that string a
user entered into your web form a month back, are you going to know
if that string was entered under a Turkish locale (critical info if it
contains an “i”)?
Even if it were possible, Tim suggests that it’s a performance
killer. See Java, which tries to address as many rules as it
possibly can, for proof.
James Edward G. II
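To make the stored-string problem above concrete, here is a rough sketch of what "knowing the locale" would involve: the locale has to be saved next to the text when it is entered. Submission, TURKIC_LOCALES and locale_aware_downcase are hypothetical names invented for this example, and it assumes Ruby 2.4 or later for the :turkic option. It only handles the Turkic "i" rules and ignores every other locale-specific rule, so treat it as an illustration, not a solution.

```ruby
# Hypothetical record: the text a user typed plus the locale it was typed under.
Submission = Struct.new(:text, :locale)

# Language codes that follow the dotted/dotless "i" rules (Turkish, Azerbaijani).
TURKIC_LOCALES = %w[tr az].freeze

# Downcase using the rules implied by the stored locale.
# Assumes Ruby 2.4+ for String#downcase(:turkic).
def locale_aware_downcase(submission)
  # "tr-TR" or "tr_TR" -> "tr"; a missing locale falls through to the default rules.
  language = submission.locale.to_s.split(/[-_]/).first
  if TURKIC_LOCALES.include?(language)
    submission.text.downcase(:turkic)
  else
    submission.text.downcase
  end
end

english = Submission.new("TITLE", "en-US")
turkish = Submission.new("TITLE", "tr-TR")

puts locale_aware_downcase(english)  # "title"
puts locale_aware_downcase(turkish)  # "tıtle", with a dotless ı
```

If the locale column was never stored, there is no way to recover which of the two results the user meant.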
one caveat that tim did not mention, and which is quite applicable to
many small sites, is that you simply don’t always have to care. for
instance, if your site is in english only you don’t have to care. now,
i’m not saying that is a good idea - but a whole ton of successful
business models work that way: many successful newspapers, for
example, publish in english only. the trick is knowing if that’s what
you want up front. if that’s unacceptable then it does seem like
you’re screwed.
“no”.capitalize, Tim is right, but ruby is a “logical” language for
me. Trying to accommodate 6800 languages with various character
types is not logical. Unfortunately, not manufacturing bikinis
because some people can’t wear them doesn’t seem like the optimal
solution in my opinion. If someone wants to upcase Latin, write a ruby
library and share it.
I don’t think that addresses what I was asking about, i.e., whether
French accents on capital letters differ across France.
As examples, he mentioned that the uppercase version of accented
characters varies from area to area in France.
This is way off topic, but I’d like to know where he heard that. It’s
the first time for me, and I’m a native French speaker…
That’s very interesting. So Tim is mistaken?
I’ve been told that common usage differs in Québec. -Tim
It’s entirely possible I’m mis-remembering that part of Tim’s talk.
Anyone else remember exactly what he said? I know he had a slide that
had an accented ‘e’ character on it.
That’s the way I remember it – he said that a lowercase accented
character was sometimes uppercased differently, and it varied
“from district to district.”
Earlier tonight I think he mentioned Quebec (but with a proper accent
that I don’t know how to type).
I wouldn’t be surprised if the French sometimes sneered a little at
the French spoken in Quebec, the way (sometimes) Brits make fun of
Americans, or Spanish (or Colombians) make fun of Mexicans.
But heck: Even if he was totally mistaken, his point still stands –
that capitalization is an unholy mess and is to be avoided. (Actually
he might have stated it more strongly.) Mistaken or not on that one
point, I thought the talk was excellent and informative.
Tim: Read my ch 4 when you can and give me your opinion.