On 3/12/06, Austin Z. [email protected] wrote:
> > instead of LOWERCASE E ACUTE ACCENT).
> > simple type. You cannot have character arrays. And no library can
> > completely wrap this inconsistency and isolate you from dealing with
> > it.
>
> If you’re simply dealing with text, you don’t need arrays of characters.
>
> Frankly, if you don’t care what Windows, OS X, and ICU use, then you’re
> completely ignorant of the real world and what is useful and necessary
> for Unicode.
The native encoding is bound to be different between platforms. I want
to use an encoding that I like on all platforms, and convert the
strings for filenames or whatever to fit the current platform. That is
why I do not care what encoding any particular platform you name uses.
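For illustration, this is roughly what "pick one internal encoding, convert at the boundary" looks like with the String#encode API that Ruby later gained in 1.9 (the filename here is a made-up example, not anything from this thread):

```ruby
# Keep strings internally in an encoding of my choice (UTF-8) and
# convert them only when handing them to the platform.
name = "smaže.txt"                        # kept internally as UTF-8
for_platform = name.encode("ISO-8859-2")  # e.g. a Latin-2 filesystem

name.bytesize          # 10 -- "ž" takes two bytes in UTF-8
for_platform.bytesize  # 9  -- one byte per character in Latin-2
```

The application logic never sees the platform encoding; only the boundary conversion does.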
> By the way, you are wrong – you can have arrays of characters. It’s
> just that those characters are not guaranteed to be a fixed length. It
> will be the same with Ruby moving forward.
Yes, you can have arrays of strings. Nice. But to get at the characters
of a text string you have to turn it into an array of one-character
strings, instead of just indexing an array of a basic type that
represents a character.
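The indexing point can be made concrete. In UTF-8 a character may span several bytes, so byte counts and character counts diverge; a small sketch in modern Ruby (in the Ruby 1.8 of this thread, `s[1]` returned a byte value instead):

```ruby
# "é" is a single character but two bytes in UTF-8.
s = "héllo"
s.bytesize       # 6 bytes
s.length         # 5 characters
s.unpack("U*")   # [104, 233, 108, 108, 111] -- the codepoints
s[1]             # "é" -- Ruby 1.9+ indexes by character, not by byte
```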
> > And there is a need to look at the actual characters at times. There
> > are programs that actually process the text, not only save what the
> > user entered in a web form. I can think of text editors, terminal
> > emulators, and linguistic tools. I am sure there are others.
> >
> > Even if the library is performant with multiword characters it is
> > complex. That means more prone to errors, both in itself and in the
> > software that interfaces with it.
>
> Nice theory. What reduces the number of errors is no longer thinking in
> terms of arrays of characters, but in terms of text strings.
Or strings of strings of 16-bit words, packed? No, thanks. I want to
avoid that.
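For reference, UTF-16 is not fixed-width either: any character outside the Basic Multilingual Plane is packed into two 16-bit units, a surrogate pair. A sketch in modern Ruby:

```ruby
# A character outside the BMP needs a surrogate pair in UTF-16.
clef = "\u{1D11E}"                # MUSICAL SYMBOL G CLEF
clef.length                       # 1 character
clef.encode("UTF-16BE").bytesize  # 4 bytes: two 16-bit units
```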
> > You say that UTF-16 is more space-conserving for languages like
> > Japanese. Nice. But I do not care. I guess text consumes a very small
> > portion of the memory on my system, both RAM and hard drive. I do not
> > care if that doubles or quadruples. In the very few cases when I want
> > to save space (i.e. when sending email attachments) I can use gzip. It
> > can even compress repetitive text, which no encoding can.
>
> If you don’t care, then why are you arguing here? The Japanese – which
> would include Matz – do care.
I do not care about the space inefficiency, be it inefficiency in
storing Czech text, Japanese text, English text, or any other. It has
nothing to do with the fact that I do not speak Japanese.

I think that most of my RAM and hard drive space is consumed by stuff
other than text. For that reason I do not care about the relative
efficiency of a text encoding; it will have minimal impact on
performance or the amount of memory consumed on the system. And there
is always the possibility of compressing the text.
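The compression point is easy to demonstrate with Ruby's standard Zlib binding: repetitive text shrinks by an order of magnitude, which no choice of character encoding can achieve:

```ruby
require "zlib"

# Repetition compresses far better than any re-encoding could shrink it.
text   = "I do not care about the space inefficiency. " * 100
packed = Zlib::Deflate.deflate(text)
packed.bytesize < text.bytesize / 10   # true -- an order of magnitude smaller
```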
> > subset of UTF-8).
> >
> > Hmm, so you call the possibility to choose your encoding living in
> > the stone age. I would call it living in reality. There are various
> > encodings out there.
>
> Yes, it’s the stone age. The filesystem should allow you to see things
> in UTF-8 or SJIS or EUC-JP if you want, but internally it should be
> using something a hell of a lot smarter than those encodings. This is
> what HFS+ and Windows allow.
Well, libc could store the strings in some UTF-* encoding on the disk
and translate them based on the current locale. I wonder whether that
would violate POSIX or not.

But it is not done, and that is the problem. There are problems of this
kind on Windows as well. It is still not recommended to use non-ASCII
characters in filenames around here…
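As a sketch of that idea in Ruby terms (the helper name is hypothetical; `Encoding.find("locale")` is from the later Ruby 1.9 API): names would be stored in UTF-8 and translated to the current locale's encoding at the boundary:

```ruby
# Hypothetical: filenames stored in UTF-8 on disk, presented to the
# user in whatever encoding the current locale asks for.
def to_locale(utf8_name)
  utf8_name.encode(Encoding.find("locale"))
end

to_locale("readme.txt")   # ASCII-only names survive any locale unchanged
```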
Thanks
Michal