On 6/28/06, Julian ‘Julik’ Tarkhanov [email protected] wrote:
> On 28-jun-2006, at 20:36, Austin Z. wrote:
>> Except that @top is guaranteed to not have an encoding – at least it
>> damned well better not – and @top.bytes is redundant in this case. I
>> see no reason to access #bytes unless I know I’m dealing with a
>> multibyte-encoded string.
> You never know if you are, that’s the problem. And no, it’s NOT
> redundant. You should just get used to the fact that all strings
> might become multibyte.
How can you continue to be so wrong? All strings will not become
multibyte. Matz seems pretty committed to the m17n String, which means
that you’re not going to get a Unicode String. This is good.
When you’re not getting a String that is limited to Unicode, you don’t
need a separate ByteArray. This is also good.
Worse, why would “Not PNG.” be treated as Unicode under your scheme
but “\x89PNG\x0d\x0a\x1a\x0a” not be? I don’t think you’re thinking
this through.
>> @top[0, 8] is sufficient when you can guarantee that sizeof(char) ==
>> sizeof(byte).
> You can NEVER guarantee that. N e v e r. More languages and more
> people use multibyte characters by default than all ASCII users
> combined.
Again, you are wrong. Horribly so. I can guarantee that sizeof(char)
== sizeof(byte) if String#encoding is a single-byte encoding or is “raw”
(or “binary”, whichever Matz uses).
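The guarantee being claimed can be sketched with the Encoding API that Ruby 1.9+ eventually shipped (this thread predates it, so treat the method names as illustrative rather than as anything agreed upon here):

```ruby
raw   = "\x89PNG\x0d\x0a\x1a\x0a".b                 # "raw"/binary (ASCII-8BIT)
latin = "caf\xe9".dup.force_encoding("ISO-8859-1")  # a single-byte encoding

# For raw and single-byte encodings, characters and bytes coincide:
raw.length == raw.bytesize      # => true (8 == 8)
latin.length == latin.bytesize  # => true (4 == 4)
```

The point is that the guarantee is a property of the string’s encoding, not of the string’s class.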
> It seems very pity but you still approach multibyte strings as
> something special.
It seems very sad, but you still aren’t willing to comprehend what I’m
saying. On “raw” strings, sizeof(char) == sizeof(byte) is always the case.
> The only way to distinguish “raw” strings from multibyte strings is to
> subclass (which sucks for you as a byte user and for me as a strings
> user).
Incorrect. I do not need to subclass. Never have. Never will.
What you’re not understanding – and at this point, I am really
thinking that it’s willful – is that I don’t consider multibyte strings
“special.” I consider all encodings special. But I also don’t think I
need full classes to support them. (I know for a fact that I don’t.)
What’s special is the encoding, not the string. Any string – including
a UTF-32 string – is merely a sequence of bytes. The encoding tells
me how large my “characters” are in terms of bytes. The encoding can
tell me more than that, too. This means that an encoding is simply a
lens through which that sequence of bytes gains meaning.
Therefore, I can do:
  s = b"Wh\xc3\xa4t f\xc3\xb6\xc3\xb6l\xc3\xafshn\xc3\xabss."
  s.encoding = :utf8
  s  # => “Whät föölïshnëss.”
Gee. No subclass involved.
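For what it’s worth, this is essentially the shape Ruby 1.9 later adopted: one String class whose encoding tag can be reassigned in place. A hedged sketch with the API that actually shipped (the b"" literal above is hypothetical; `.b` and `force_encoding` are the real names):

```ruby
s = "Wh\xc3\xa4t f\xc3\xb6\xc3\xb6l\xc3\xafshn\xc3\xabss.".b
s.encoding                 # => #<Encoding:ASCII-8BIT>, i.e. raw bytes
s.force_encoding("UTF-8")  # reinterpret the same bytes through a new lens
s                          # => "Whät föölïshnëss."
```

Same object, same bytes, new lens – and still no subclass involved.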
A substring of a “binary” (unencoded) string is simply the bytes at
those offsets.
We’re not talking rocket science here. We’re talking being smart,
instead of being lemmings who apparently want Ruby to be more like Java.
On all strings, @top[0, 8] would return the appropriate number of
characters – not the number of bytes. It just so happens on binary
strings that the number of characters and bytes is exactly the same.
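Character-based indexing works out exactly this way in the semantics Ruby 1.9+ ended up with – shown here as an illustration, since none of this existed when the thread was written:

```ruby
magic = "\x89PNG\x0d\x0a\x1a\x0a".b  # binary: characters == bytes
utf8  = "Whät föölïshnëss."          # UTF-8: some characters are 2 bytes

magic[0, 8].bytesize  # => 8      (8 characters happen to be 8 bytes)
utf8[0, 4]            # => "Whät" (4 characters...)
utf8[0, 4].bytesize   # => 5      (...but 5 bytes)
```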
> This is a very leaky abstraction - you can never expect what you will
> get. What’s the problem with having bytes as an accessor?
What’s the need, if I know that what I’m testing against is going to
be dealt with bytewise? You’re expecting programmers to be stupid. I’m
expecting them to be smarter than that. Uninformed, perhaps, but not
stupid.
(And I would know in this case because the ultimate API that calls this
will have been given image data.)
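The scenario being argued about – sniffing image data by its magic number – might look like this; the `png?` helper and the binary-read idiom are illustrative, not anything proposed in the thread:

```ruby
# Hypothetical sketch of the magic-number check under discussion.
PNG_MAGIC = "\x89PNG\x0d\x0a\x1a\x0a".b

def png?(io)
  io.read(8) == PNG_MAGIC  # the first 8 bytes identify a PNG
end
```

Called as `File.open(path, "rb") { |f| png?(f) }` – the caller knows it is handing over raw image bytes, which is exactly why no #bytes accessor is needed.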
>> What I’m arguing is that while the pragma may work for the
>> less-common encodings, both binary (non-)encoding and Unicode
>> (probably UTF-8) are going to be common enough that specific literal
>> constructors are probably a very good idea.
> Python proved that to be wrong - both the subclassing part and the
> special-literals part.
Python proved squat. Especially since you continue to think that I’m
talking about subclassing. Which I’m not and never have been.
> The fact that you have to designate Unicode strings with literals is a
> bad decision and I can only suspect that it has to do with compiler
> intolerance, and the need to do preprocessing.
Have to nothing. You’re simply not willing to understand anything that
doesn’t bow to the god of Unicode. This has nothing to do with your
stupid assumptions, here. This has everything to do with being smarter
than you’re apparently wanting Ruby to be.
The special literals are convenience items only. Syntax sugar. The real
magic is in the assignment of encodings. And those are always special,
whether you want to pretend such or not.
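As it happens, Ruby 1.9 ultimately took the pragma route for source text, with the sugar living in a magic comment rather than in literal constructors – a hedged illustration, since this outcome postdates the thread:

```ruby
# encoding: utf-8
# The magic comment above assigns an encoding to every string literal
# in this file; the literal syntax itself stays plain.
"héllo".encoding                       # => #<Encoding:UTF-8>
"h\xc3\xa9llo".bytes == "héllo".bytes  # => true (same byte sequence)
```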
I’m through with trying to argue with you and a few others who aren’t
listening and suggesting the same backwards stuff over and over again
without considering that you might be wrong. Contrary to what you might
believe, I have looked at a lot of this stuff and have really reached
the point where I consider Unicode-only strings and separate class
hierarchies a waste of everyone’s time and energy.
Argue for first-class Unicode support. But you should do so within the
framework which Matz has said he prefers (m17n String and no separate
byte array). Think about API changes that can make this valuable. I
think that Matz has settled on the basic data structure, though, and
it’s a fight you probably won’t win with him. Since, as he pointed out
to Charles Nutter, he’s in the percentage of humanity which needs to
deal with non-Unicode more than it needs to deal with Unicode.