Ruby 1.8 vs 1.9

Peter_Pincus · November 25, 2010, 4:02pm

Phillip G. wrote in post #963815:

The IEEE standard, however, does not define how mathematics work.
Mathematics does that. In math, x_0/0 is undefined. It is not
infinity…

What psychological anomaly causes creationists to keep saying that there
are no transitional fossils even after having been shown transitional
fossils? We might pass it off as mere cult indoctrination or
brainwashing, but the problem is a more general one.

We also see it happening here in Mr. Gawlowski who, after being given
mathematical facts about infinity, simply repeats his uninformed
opinion.

“The Dunning-Kruger effect is a cognitive bias in which an unskilled
person makes poor decisions and reaches erroneous conclusions, but
their incompetence denies them the metacognitive ability to realize
their mistakes.” (Dunning–Kruger effect - Wikipedia)

Here is my initial response to Mr. Gawlowski. Let’s see if he ignores
it again (as a creationist ignores transitional fossils).

It is perfectly reasonable, mathematically, to assign infinity to
1/0. To geometers and topologists, infinity is just another
point. Look up the one-point compactification of R^n. If we join
infinity to the real line, we get a circle, topologically. Joining
infinity to the real plane gives a sphere, called the Riemann
sphere. These are rigorous definitions with useful results.

I’m glad that IEEE floating point has infinity included, otherwise I
would run into needless error handling. It’s not an error to reach
one pole of a sphere (the other pole being zero).

Infinity is there for good reason; its presence was well-considered
by the quite knowledgeable IEEE designers.
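
For readers following along in Ruby, here is a minimal sketch of the behaviour under discussion. The method name `divide_report` is made up for illustration. It assumes a Ruby version that defines `Float::INFINITY`. Float division follows the IEEE rules, while Integer division by zero raises an exception instead.

```ruby
# Hypothetical helper: collect what Ruby does for each kind of division by zero.
def divide_report
  {
    float_pos: 1.0 / 0,    # positive infinity
    float_neg: -1.0 / 0,   # negative infinity
    float_nan: 0.0 / 0,    # NaN (not a number)
    integer: begin
      1 / 0                # Integer division does not follow IEEE 754
    rescue ZeroDivisionError => e
      e.class
    end
  }
end

p divide_report
```

Which of these results counts as "an error" is exactly the point being argued in this thread; the sketch only shows what the interpreter returns.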

Peter_Pincus · November 25, 2010, 4:08pm

On Nov 25, 2010, at 8:48 AM, Philip R. wrote:

UTF-8, UTF-16, and UTF-32 are encodings of Unicode code points. They
are all capable of representing all code points. Nothing in this
discussion is a subset of anything else.

This is all really interesting but I don’t understand what you mean by “code
points” - is what you have said expressed diagrammatically somewhere?

Do these explanations help?

http://blog.grayproductions.net/articles/what_is_a_character_encoding

http://blog.grayproductions.net/articles/the_unicode_character_set_and_encodings

James Edward G. II

Peter_Pincus · November 25, 2010, 4:18pm

Dear Yuri,

maybe being a bit more friendly and respectful would help this
discussion.

On 25.11.10 16:02, Yuri T. wrote:

Peter_Pincus · November 25, 2010, 4:01pm

On Thu, Nov 25, 2010 at 3:48 PM, Philip R. wrote:

You are confusing us.

UTF-8, UTF-16, and UTF-32 are encodings of Unicode code points. They
are all capable of representing all code points. Nothing in this
discussion is a subset of anything else.

This is all really interesting but I don’t understand what you mean by “code
points” - is what you have said expressed diagrammatically somewhere?

A “code point” is basically a unique identifier of a special symbol.

http://www.unicode.org/
http://www.unicode.org/charts/About.html
http://www.unicode.org/charts/
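
A small Ruby sketch may make "code point" concrete. The helper name `code_points` is hypothetical, and the characters are written as `\u` escapes so the example does not depend on the encoding of the source file.

```ruby
# Hypothetical helper: pair each character with its Unicode code point.
def code_points(text)
  text.each_char.map { |ch| [ch, format("U+%04X", ch.ord)] }
end

# "\u00E9" is e with acute accent, "\u20AC" is the euro sign.
code_points("a\u00E9\u20AC").each { |ch, cp| puts "#{ch}  #{cp}" }
```

The code point is the number on the right; how that number is turned into bytes is a separate question, which is what UTF-8, UTF-16 and UTF-32 answer.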

HTH

robert

Peter_Pincus · November 25, 2010, 6:06pm

Robert K. wrote:

Since UTF-8 is a subset of UTF-16, which in turn is a subset of
UTF-32,

I tried to find a more precise statement about this but did not really
succeed. I thought all UTF-x were just different encoding forms of
the same universe of code points.

Yes this is correct. Many people don’t get the difference between a
charset and the corresponding encoding.

Unicode is a charset with not just one encoding but many. So we are
talking about the same characters and different mappings of these
characters to bits and bytes. Such a mapping is a simple table which you
could write down on a sheet of paper, and the side with the characters
will always be the same for UTF-8, UTF-16 and UTF-32.

The encodings UTF-8, UTF-16 and UTF-32 were built for different
purposes. The number after UTF says nothing about the maximum length;
in the first place it says something about the shortest length (and
often about the usual length, if you use the encoding in the situation
it was built for).

So if people coming from the ASCII world use UTF-8, many encodings
(mappings of a character to a sequence of bits and bytes) of characters
will fit inside one byte.

UTF-32 is a bit different: in this case 32 bits is the fixed size of
each encoded character.
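
The difference in lengths can be checked from Ruby 1.9 onwards. This is only a sketch: `encoded_sizes` is a made-up helper, and the big-endian variants (`UTF-16BE`, `UTF-32BE`) are chosen so that no byte order mark is counted.

```ruby
# Hypothetical helper: bytes needed for one character in each encoding form.
def encoded_sizes(char)
  %w[UTF-8 UTF-16BE UTF-32BE].map { |enc| [enc, char.encode(enc).bytesize] }
end

# ASCII letter, e-acute, euro sign, and a character outside the 16-bit range.
["a", "\u00E9", "\u20AC", "\u{1F600}"].each do |char|
  puts format("U+%04X  %s", char.ord, encoded_sizes(char).inspect)
end
```

UTF-8 grows from one to four bytes, UTF-16 uses two or four, and UTF-32 always uses four.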

and Unicode is future-proofed

Oh, so then ISO committee actually has a time machine? Wow!

Read that as “has much encoding space left”; none of us knows how you
could fill the whole space. But humans tend to be wrong.

Regards
Oli

Peter_Pincus · November 25, 2010, 6:25pm

Phillip G. wrote:

I’m quite aware that IEEE 754 defines the result of x_0/0 as infinity.

The point is that you can’t guarantee that you have a 0 with floating
point arithmetic. You have to read every value as “as close as possible
to the meant value for this machine and this number size”.

The machines are not perfect; they can’t work in a mathematically
correct way in many situations.

And you have to deal with this situation - all people doing numeric
things know that. Doing numeric calculations with a computer means
calculating something as closely as possible in the given environment
and requirements.

So in this sense, dividing a number by zero on a real computer means
dividing something close to the number I mean by something close to
zero.

And in fact it’s not a bad idea to define that if you have something
which is very close to zero (because you don’t know if it’s exactly
zero), you should treat it as very close to zero and not as zero itself.

In a perfect world with perfect computers which have infinite registers,
infinite storage and infinitely fast CPUs, computers would know that zero
is equal to zero. But in this world a computer only knows that zero is
close to zero.
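
A short Ruby sketch of this point, using a sum that is zero on paper but not in double precision. The method name `residue` is invented for the example.

```ruby
# Hypothetical helper: the rounding residue of an expression that is
# exactly zero in real arithmetic.
def residue
  (0.1 + 0.2) - 0.3
end

puts residue                    # tiny, but not zero
puts residue == 0.0             # false
puts((1.0 / residue).finite?)   # true: a huge number, not Infinity
```

So a "zero" that comes out of an earlier calculation may really be a very small number, and dividing by it gives a very large finite result rather than infinity.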

Regards
Oli

Peter_Pincus · November 25, 2010, 5:40pm

On Thu, Nov 25, 2010 at 4:02 PM, Yuri T. wrote:

“The Dunning-Kruger effect is a cognitive bias in which an unskilled
person makes poor decisions and reaches erroneous conclusions, but
their incompetence denies them the metacognitive ability to realize
their mistakes.” (Dunning–Kruger effect - Wikipedia)

Your insult aside:

I’m quite aware that IEEE 754 defines the result of x_0/0 as infinity.
That is not, however, correct in a mathematical sense.
IEEE 754, for example, also defines the result of the square root of
-1 as an error. However, the result of, say, the square root of -x (for
positive x) is sqrt(x) * j. Those are called complex numbers, BTW.
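
In Ruby both views can be seen side by side. This is a sketch with made-up method names. Note that raising `Math::DomainError` is Ruby's choice; IEEE 754 itself specifies a NaN result together with an invalid-operation signal.

```ruby
# Real-valued square root: Ruby rejects a negative argument.
def real_sqrt_error
  Math.sqrt(-1.0)
rescue Math::DomainError => e
  e.class
end

# Complex arithmetic: the imaginary unit squared is -1.
def imaginary_unit_squared
  Complex(0, 1)**2
end

p real_sqrt_error
p imaginary_unit_squared
```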

Anyway:
Thanks, James, for correcting me on UTF-8, etc.

–
Phillip G.

Though the folk I have met,
(Ah, how soon!) they forget
When I’ve moved on to some other place,
There may be one or two,
When I’ve played and passed through,
Who’ll remember my song or my face.

Peter_Pincus · November 25, 2010, 7:30pm

James Edward G. II wrote:

On Nov 24, 2010, at 8:40 PM, Jörg W Mittag wrote:

The only two Unicode encodings that are fixed-width are the obsolete
UCS-2 (which can only encode the lower 65536 codepoints) and UTF-32.
And even UTF-32 would have the complications of “combining characters.”

… and zero-width characters and different representations of the same
character and …

But that is a whole different can of worms.

jwm

Peter_Pincus · November 25, 2010, 6:31pm

On Thu, Nov 25, 2010 at 6:05 PM, Oliver Schad wrote:

implementation of Unicode though, I am not 100% sure.

Since UTF-8 is a subset of UTF-16, which in turn is a subset of
UTF-32,

I tried to find a more precise statement about this but did not really
succeed. I thought all UTF-x were just different encoding forms of
the same universe of code points.

Yes this is correct. Many people don’t get the difference between a
charset and the corresponding encoding.

Btw, this happens all the time: for example, people often do not grasp
the difference between “point in time” and “representation of a point
in time in a particular time zone and locale”. This becomes an issue
if you want to calculate with timestamps at or near the DST change
time.
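
As a rough Ruby illustration of that distinction (the helper name and the sample timestamp are arbitrary, and a fixed `+09:00` offset stands in for a real time zone):

```ruby
# Hypothetical helper: one point in time, shown in two representations.
def same_instant_views(epoch_seconds)
  utc   = Time.at(epoch_seconds).utc
  tokyo = Time.at(epoch_seconds).getlocal("+09:00")
  [utc, tokyo]
end

utc, tokyo = same_instant_views(1_290_700_000)
puts utc
puts tokyo
puts utc == tokyo   # true: same instant, different wall-clock text
```

Comparisons and arithmetic work on the instant; only the printed form depends on the offset.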

build for).
More precisely the number indicates the “encoding unit” (see my quote
in an earlier posting). One could think up an encoding with encoding
unit of 1 octet (8 bits, 1 byte) where the shortest length would be 2
octets. Example

1st octet: number of octets to follow
2nd and subsequent octets: encoded character

The shortest length would be 2 octets, but the length would increase
by 1 octet so the encoding unit is 1 octet.

Cheers

robert

Peter_Pincus · November 25, 2010, 8:04pm

On Thu, Nov 25, 2010 at 6:25 PM, Oliver Schad wrote:

Phillip G. wrote:

I’m quite aware that IEEE 754 defines the result of x_0/0 as infinity.

The point is that you can’t guarantee that you have a 0 with floating
point arithmetic.

You can. In mathematics. The problem is, as you pointed out, that a 32
bit (or 64 bit, or n bit where n is finite) CPU isn’t able to represent
floating point numbers accurately enough.

However, 0 = 0.0 (no matter how much Yuri moves the goal posts).

You run into this issue once you leave the defined space for IEEE
Floats (~10^-44 for negative floats), then you enter very wonky
areas.

But a flat 0.0 is only non-zero for computers. But not in maths. Ergo,
from a mathematical standpoint, the IEEE standard is broken.

Every value you have to read as “as close as possible
to the meant value for this machine and this number size”.

IOW: It’s a limit (and an approximate one at that, but it’s “good
enough” for pretty much all purposes).

The machines are not perfect; they can’t work in a mathematically
correct way in many situations.

Only in floats, and with integers that are larger than the total
address space. But then we have the problem of the required CPU time
to consider.

And you have to deal with this situation - all people doing numeric
things know that. Doing numeric calculations with a computer means to
calculate something as near as possible in the given environment and
requirements.

Indeed. The problem is if the desired accuracy is much more exact than
the IEEE float defines.

So in this sense, dividing a number by zero on a real computer means
dividing something close to the number I mean by something close to
zero.

Not quite. Integer division is behaving properly, Float isn’t, even
with only one significant digit.

And in fact it’s not a bad idea to define if you have something which is
very close to zero (because you don’t know if it’s exactly zero), you
should treat it as very close to zero and not zero itself.

Well, infinity isn’t close to zero, either.

–
Phillip G.


Peter_Pincus · November 25, 2010, 7:43pm

Phillip G. wrote in post #963922:

I’m quite aware that IEEE 754 defines the result of x_0/0 as
infinity. That is not, however, correct in a mathematical sense.

Yes, it is. The creationist analogy continues, I see. Even when given
facts which refute his position a second and third time, Mr. Gawlowski
continues to ignore them while simply repeating his opinion.

He twice ignored my suggestion to look up one-point
compactifications. He twice ignored my explanation of joining oo to a
line to make a circle, and of joining oo to a plane to make a
sphere. He also ignored the same information found in the link
provided by Adam.

It’s not coincidence that the IEEE operations on oo match the rules
for the real projective line.

The only exception is the result of oo + oo, however in computing it
is convenient for that to be oo rather than undefined. Of course
there’s the +/- distinction for oo (also convenient in computing)
which is eliminated by identifying +oo with -oo, as one would expect
for a circle. There are perhaps other technicalities but generally
IEEE operations are modeled after the real projective line. This is
useful.
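
A quick Ruby check of the cases mentioned here. This is a sketch; the constant `INF` and the method `infinity_facts` are names invented for the example.

```ruby
INF = Float::INFINITY

# Hypothetical helper: a few IEEE results involving infinity.
def infinity_facts
  {
    sum: INF + INF,             # Infinity, where the projective line leaves it undefined
    difference: INF - INF,      # NaN
    signed: INF == -INF,        # false: +oo and -oo are kept distinct
    reciprocal: 1.0 / INF,      # 0.0
    negative_zero: 1.0 / -0.0   # -Infinity
  }
end

p infinity_facts
```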

Peter_Pincus · November 25, 2010, 9:19pm

Phillip, regarding defining 1/0 you said,

It cannot be infinity. It does, quite literally, not compute. There’s
no room for interpretation, it’s a fact of (mathematical) life that
something divided by nothing has an undefined result. It doesn’t
matter if it’s 0, 0.0, or -0.0. Undefined is undefined.

Nonsense. These claims are roundly refuted by

IEEE floating point operations on oo and -oo match those precisely.
IEEE models the extended real number line.

I’m quite aware that IEEE 754 defines the result of x_0/0 as
infinity. That is not, however, correct in a mathematical sense.

Nonsense. Infinity defined this way has solid mathematical meaning and
is established on a firm foundation, described in the link above.

The IEEE standard, however, does not define how mathematics work.
Mathematics does that. In math, x_0/0 is undefined. It is not
infinity…

Right, IEEE does not define how mathematics works. IEEE took the
mathematical definition and properties of infinity and incorporated it
into the standard. Clearly, you were unaware of this and repeatedly
ignored the information offered to you about it.

Quote Wikipedia:
“Unlike most mathematical models of the intuitive concept of ‘number’,
this structure allows division by zero [snip formula], for nonzero a.
This structure, however, is not a field, and division does not retain
its original algebraic meaning in it.”

Emphasis mine.

That sentence. You evidently do not understand what it means. It does
not mean what you think it means.

Your argument is also called “moving the goal posts”. But even if
we consider it: non-algebraic systems are not something 99% of all
non-professional-mathematicians engage in (so, we can toss in a “no
true Scotsman” fallacy into the bargain).

Nonsense. Every person who has obtained a result of +oo or -oo from a
floating point calculation has engaged in it. A result of +oo or -oo
is often a meaningful answer and not an error. And even when it is an
error, it gives us information on what went wrong (and which direction
it went wrong in). It’s entertainingly ironic that you attribute
“moving the goalposts” and the no true Scotsman fallacy to the wrong
person in this conversation.

Thanks for another great demonstration of the Dunning-Kruger effect.

Peter_Pincus · November 25, 2010, 8:10pm

On Thu, Nov 25, 2010 at 7:43 PM, Yuri T. wrote:

line to make a circle, and of joining oo to a plane to make a
there’s the +/- distinction for oo (also convenient in computing)
which is eliminated by identifying +oo with -oo, as one would expect
for a circle. There are perhaps other technicalities but generally
IEEE operations are modeled after the real projective line. This is
useful.

Quote Wikipedia:
“Unlike most mathematical models of the intuitive concept of ‘number’,
this structure allows division by zero [snip formula], for nonzero a.
This structure, however, is not a field, and division does not retain
its original algebraic meaning in it.”

Emphasis mine. Your argument is also called “moving the goal posts”.
But even if we consider it: non-algebraic systems are not something
99% of all non-professional-mathematicians engage in (so, we can toss
in a “no true Scotsman” fallacy into the bargain).

–
Phillip G.


Peter_Pincus · November 25, 2010, 10:19pm

This discussion has little to do with Ruby at this point. Maybe you
folks could take it offline, please?

Happy Thanksgiving, everybody!

Jos

Peter_Pincus · November 25, 2010, 10:14pm

On Thu, Nov 25, 2010 at 9:19 PM, Yuri T. wrote:

Phillip, regarding defining 1/0 you said,

It cannot be infinity. It does, quite literally, not compute. There’s
no room for interpretation, it’s a fact of (mathematical) life that
something divided by nothing has an undefined result. It doesn’t
matter if it’s 0, 0.0, or -0.0. Undefined is undefined.

Nonsense. These claims are roundly refuted by

Actually, they aren’t. I said “for x_0/0 the result is undefined”, and
the Extended Real number line has the caveat that x_0 must be != 0.

I’m quite aware that IEEE 754 defines the result of x_0/0 as
infinity. That is not, however, correct in a mathematical sense.

Nonsense. Infinity defined this way has solid mathematical meaning and
is established on a firm foundation, described in the link above.

A firm foundation that is not used in algebraic math.

The IEEE standard, however, does not define how mathematics work.
Mathematics does that. In math, x_0/0 is undefined. It is not
infinity…

Right, IEEE does not define how mathematics works. IEEE took the
mathematical definition and properties of infinity and incorporated it
into the standard. Clearly, you were unaware of this and repeatedly
ignored the information offered to you about it.

It took a definition and a set of properties. If we are splitting
hairs, let’s do it properly, at least.

Quote Wikipedia:
“Unlike most mathematical models of the intuitive concept of ‘number’,
this structure allows division by zero [snip formula], for nonzero a.
This structure, however, is not a field, and division does not retain
its original algebraic meaning in it.”

Emphasis mine.

That sentence. You evidently do not understand what it means. It does
not mean what you think it means.

You do know what algebra is, yes?

“moving the goalposts” and the no true Scotsman fallacy to the wrong
person in this conversation.

Pal, in algebraic maths, division by zero is undefined. End of story.
We are talking about algebraic math here (or we can extend this to
include complex numbers, which IEEE 754 doesn’t deal with, either),
and not special areas of maths that aren’t used outside of research
papers. Not to mention that I established the set of Irrational
numbers as the upper bound quite early on.

And your argument “if you use floats on a computer you use a
non-algebraic system, therefore you use a non-algebraic system when
using a computer” is circular.

The result of x_0/0.0 = infinity is as meaningful as “0/0 = NaN”. Any
feedback by a computer system is meaningful (by definition), and can
be used to act on this output:

result = "Error: Division by zero" if (a / 0.0).infinite?

Done.

Thanks for another great demonstration of the Dunning-Kruger effect.

Ah, the irony.

–
Phillip G.


Peter_Pincus · November 25, 2010, 11:28pm

James Edward G. II wrote:

UTF-8, UTF-16, and UTF-32 are encodings of Unicode code points. They are all
capable of representing all code points. Nothing in this discussion is a subset
of anything else.

To add to this, Unicode 3 uses the codespace from 0 to 0x10FFFF (not
0xFFFFFFFF), so it does cover all the Oriental characters (unlike
Unicode 2 as implemented in earlier Java versions, which only covers
0…0xFFFF). It even has codepoints for Klingon and Elvish!

UTF-8 requires four bytes to encode a 21-bit number (enough to encode
0x10FFFF), though if you extend the pattern (as many implementations
do) it has a 31-bit gamut.
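
The four-byte pattern can be printed from Ruby. A sketch only; `utf8_bits` is a made-up helper name.

```ruby
# Hypothetical helper: the UTF-8 octets of a code point, as bit strings.
def utf8_bits(code_point)
  [code_point].pack("U").bytes.map { |b| format("%08b", b) }
end

puts utf8_bits(0x10FFFF).join(" ")
```

The lead byte `11110xxx` carries 3 payload bits and each of the three `10xxxxxx` continuation bytes carries 6, which gives the 21 bits mentioned above.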

UTF-16 encodes the additional codespace using surrogate pairs, which is
a pair of 16-bit numbers each carrying a 10-bit payload. Because it’s
still a variable length encoding, it’s just as painful to work with as
UTF-8, but less space-efficient.
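
The surrogate arithmetic is short enough to write out. This sketch uses a made-up method name, `surrogate_pair`, and compares its result with what Ruby's own UTF-16 encoder produces.

```ruby
# Hypothetical helper: split a supplementary code point into a surrogate pair.
def surrogate_pair(code_point)
  unless (0x10000..0x10FFFF).cover?(code_point)
    raise ArgumentError, "not a supplementary code point"
  end
  offset = code_point - 0x10000      # 20-bit value
  high = 0xD800 + (offset >> 10)     # top 10 bits
  low  = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
  [high, low]
end

pair = surrogate_pair(0x1F600)
puts pair.map { |u| format("%04X", u) }.join(" ")
# Cross-check against the built-in encoder (16-bit big-endian units).
p pair == [0x1F600].pack("U").encode("UTF-16BE").unpack("n*")
```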

Both UTF-8 and UTF-16 encodings allow you to look at any location in a
string and step forward or back to the nearest character boundary - a
very important property that was missing from Shift-JIS and other
earlier encodings.
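
For UTF-8 the stepping-back part can be sketched as follows. `char_start` is a hypothetical helper working on an array of byte values; it relies only on the fact that continuation bytes have the form `10xxxxxx`.

```ruby
# Hypothetical helper: move a byte index back to the first byte of the
# UTF-8 character that contains it.
def char_start(bytes, index)
  index -= 1 while index > 0 && (bytes[index] & 0xC0) == 0x80
  index
end

bytes = "a\u20ACb".bytes.to_a   # 61 | E2 82 AC | 62
p (0...bytes.size).map { |i| char_start(bytes, i) }
```

Every index inside the three-byte euro sign maps back to the position of its lead byte, without scanning from the start of the string.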

If you go back to 2003 in the archives, you’ll see I engaged in a long
and somewhat heated discussion about this subject with Matz and others
back then. I’m glad we finally have a Ruby version that can at least do
this stuff properly, even though I think it’s over-complicated.

Clifford H…

Peter_Pincus · November 25, 2010, 11:41pm

Robert K. wrote:

Btw, this happens all the time: for example, people often do not grasp
the difference between “point in time” and “representation of a point
in time in a particular time zone and locale”.

… on a particular relativistic trajectory. Seriously though,
time dilation effects are accounted for in every GPS unit, because
the relative motion of the satellites gives each one its own timeline
which affects the respective position fixes.

Clifford H.

Peter_Pincus · November 26, 2010, 1:43am

On Wednesday, November 24, 2010 08:40:22 pm Jörg W Mittag wrote:

David M. wrote:

Java at least did this sanely – UTF16 is at least a fixed width. If
you’re going to force a single encoding, why wouldn’t you use
fixed-width strings?

Actually, it’s not.

Whoops, my mistake. I guess now I’m confused as to why they went with
UTF-16 - I always assumed it simply truncated things which can’t be
represented in 16 bits.

You can produce corrupt strings and slice into a half-character in
Java just as you can in Ruby 1.8.

Wait, how?

I mean, yes, you can deliberately build strings out of corrupt data,
but if you actually work with complete strings and string
concatenation, and you aren’t doing crazy JNI stuff, and you aren’t
digging into the actual bits of the string, I don’t see how you can
create a truncated string.
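
One way it can happen, shown in Ruby rather than Java: any slice that counts storage units instead of characters can cut a character in half. The sketch below uses `String#byteslice`, and `half_character` is a made-up name. Ruby 1.8 indexed strings by byte, and Java's `substring` counts 16-bit units, so a surrogate pair can be split in a comparable way (not shown here).

```ruby
# Hypothetical helper: keep only the first octet of a two-octet character.
def half_character
  "\u00E9".byteslice(0, 1)
end

p half_character.bytesize          # 1
p half_character.valid_encoding?   # false: the character was cut in half
```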

The whole point of having multiple encodings in the first place is that
other encodings make much more sense when you’re not in the US.

There’s also a lot of legacy data, even within the US. On IBM systems,
the standard encoding, even for greenfield systems that are being
written right now, is still pretty much EBCDIC all the way.

I’m really curious why anyone would go with an IBM mainframe for a
greenfield system, let alone pick EBCDIC when ASCII is fully supported.

And now there’s a push for a One Encoding To Rule Them All in Ruby 2.
That’s literally insane! (One definition of insanity is repeating
behavior and expecting a different outcome.)

Wait, what?

I’ve been out of the loop for awhile, so it’s likely that I missed
this, but where are these plans?

Peter_Pincus · November 26, 2010, 12:52pm

On Fri, Nov 26, 2010 at 1:42 AM, David M. wrote:

I’m really curious why anyone would go with an IBM mainframe for a greenfield
system, let alone pick EBCDIC when ASCII is fully supported.

Because that’s how the other applications written on the mainframe the
company bought 20, 30, 40 years ago expect their data, and the same
code still runs.

Legacy systems like that have so much money invested in them, with
code poorly understood (not necessarily because it’s bad code, but
because the original author has retired 20 years ago), and are so
mission critical, that a replacement in a more current design is out
of the question.

Want perpetual job security? Learn COBOL.

–
Phillip G.


Peter_Pincus · November 26, 2010, 1:40am

On Nov 25, 2010, at 4:25 PM, Clifford H. wrote:

Both UTF-8 and UTF-16 encodings allow you to look at any location in a string
and step forward or back to the nearest character boundary - a very important
property that was missing from Shift-JIS and other earlier encodings.

This also provides a kind of simple checksum for validating the
encoding. I love that feature.
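
In Ruby 1.9 this check is available as `String#valid_encoding?`. A small sketch, with `valid_utf8?` as a made-up wrapper that takes raw byte values:

```ruby
# Hypothetical helper: do these octets form well-formed UTF-8?
def valid_utf8?(octets)
  octets.pack("C*").force_encoding("UTF-8").valid_encoding?
end

p valid_utf8?([0x63, 0xC3, 0xA9])   # true:  "c" plus a well-formed two-byte character
p valid_utf8?([0xC3, 0x28])         # false: lead byte not followed by a continuation byte
p valid_utf8?([0xA9])               # false: continuation byte with no lead byte
```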

James Edward G. II