On 6/25/06, Phillip H. wrote:
Here you contradict yourself. Regexes are string (character)
operations, and you want them on byte arrays. So the concepts aren’t
Similarly, when you read part of a file and use it to determine
what kind of file it is, you do not want to convert that part into
another class or re-read it because somebody decided String and
ByteVector are separate.
Why not? When I read CGI params I get them as strings, but if I want
to add them together I need to convert them to integers, because
someone decided that “1” != 1. This is a good thing, so you don’t get
“5 purple elephants”+“3 monkeys” = 8, like you do in PHP.
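
A minimal Ruby sketch of the conversion point made above. The `params` hash is a hypothetical stand-in for CGI parameters, not a real CGI API. It shows that Ruby refuses to add a String to an Integer, that `Integer()` is strict, and that `String#to_i` is the lenient option when you ask for it explicitly.

```ruby
# Hypothetical CGI-style parameters: they arrive as strings.
params = { "a" => "1", "b" => "2" }

# Adding a String to an Integer raises TypeError; nothing is coerced silently.
mixed_sum_error =
  begin
    1 + params["a"]
    nil
  rescue TypeError => e
    e.class
  end

# Explicit conversion states the intent.
sum = Integer(params["a"]) + Integer(params["b"])

# Integer() is strict and rejects trailing garbage.
strict_error =
  begin
    Integer("5 purple elephants")
    nil
  rescue ArgumentError => e
    e.class
  end

# String#to_i is lenient: it parses the leading digits and ignores the rest.
lenient = "5 purple elephants".to_i + "3 monkeys".to_i
```

The lenient behaviour is still available, but only when the caller opts in by calling `to_i`.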
Sorry, but “reading” CGI params is a red herring. You may get it as one
thing and then convert it to something else.
Likewise, when you read from a file/socket/whatever you might not be
getting a real string, you might be getting a byte array. They are
fundamentally different things, a byte array may happen to contain
text at some point, but some time later it may be just a stream of
data. Conversely a String always contains human-readable text in
whatever encoding you want.
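
To make the distinction above concrete, here is a small sketch. It assumes Ruby 1.9 or later, where a String carries an encoding tag; that postdates this thread, so treat it as an illustration rather than what was available at the time. The variable names are made up for the example.

```ruby
# A string that is text: characters and bytes are two views of it.
text = "\u00E9t\u00E9"                 # the word "été", encoded as UTF-8
text_chars = text.chars.length         # number of characters
text_bytes = text.bytes                # the underlying byte values

# A buffer that is just data: the eight-byte PNG file signature.
blob = [0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A].pack("C*")
blob_encoding = blob.encoding          # pack("C*") yields a binary string

# Labelling the buffer as UTF-8 does not make it valid text.
blob_as_text_ok = blob.dup.force_encoding(Encoding::UTF_8).valid_encoding?
```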
Okay. What class should I get here?
data = File.open("file.txt", "rb") { |f| f.read }
According to the people who want separate ByteVector and String
classes, I’ll need two APIs:
st = File.open("file.txt", "rb") { |f| f.read_string }
bv = File.open("file.txt", "rb") { |f| f.read_bytes }
Stupid, stupid, stupid, stupid. If I have guessed wrong about the
contents of file.txt, I have to rewind and read it again. Better to
always read as bytes and then say, “this is actually UTF-8”. This
would be as stupid in C++, Java, or C#:
class File
{
    bool read(string& st);
    bool read(byte_vector& bv);
};
Yes, I can’t actually read into the item, but have to call an accessor.
Moronic design, mostly because I can’t do:
class File
{
    string read(void);
    byte_vector read(void);
};
That would help in static languages, but they can’t do that – and Ruby
can’t do it either, since variables are just labels.
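
The "read as bytes, then say what it is" approach argued for above can be sketched as follows. This assumes Ruby 1.9 or later, where `String#force_encoding` exists; the temporary file and its contents are invented so the example is self-contained.

```ruby
require "tempfile"

# Write a small UTF-8 sample so the sketch does not depend on a real file.
sample = Tempfile.new("sample")
sample.binmode
sample.write("na\u00EFve caf\u00E9\n")
sample.close

# Read once, as raw bytes. A binary-mode read is tagged ASCII-8BIT.
data = File.open(sample.path, "rb") { |f| f.read }
raw_encoding = data.encoding
raw_length   = data.length             # counts bytes at this point

# Later, having decided what the bytes are, relabel the same buffer.
# force_encoding changes only the tag; the bytes are not converted or re-read.
text = data.force_encoding(Encoding::UTF_8)
is_valid    = text.valid_encoding?     # check that the guess was right
char_length = text.length              # now counts characters

sample.unlink
```

One read, one class, and a wrong guess costs only another `force_encoding` call rather than a rewind.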
As someone who has to work with Unicode in PHP, I’d say it’s important
to separate the types. If you want to display something to a user you
have to know what it is, but when you’re reading a file you don’t
care, unless you know what’s in it.
The problem here is not unification. The problem here is that PHP is
stupid. It is generally recognised that Ruby’s API decisions are much
smarter than most other languages, and this is a good example of where
this would happen.
A Unicode String could be a subclass of the byte array with some
niceties for dealing with multibyte characters. Just a thought.
Unnecessary and overcomplex.
-austin