Forum: Ruby Ruby 1.8 vs 1.9

Peter Pincus (Guest)
on 2010-11-23 16:51
(Received via mailing list)
Hi,

How much longer will Ruby 1.8(.7) be maintained? Is it advisable to
dive into 1.9(.2)? What are the immediate advantages of using 1.9
over 1.8?

Thanks,
..
Pete Pincus
Chuck Remes (cremes)
on 2010-11-23 22:26
(Received via mailing list)
On Nov 23, 2010, at 9:43 AM, Peter Pincus wrote:

> Hi,
>
> how much longer will Ruby 1.8(.7) be maintained ? Is it advisable to
> dive into 1.9(.2) ? What are the immediate advantages of using 1.9
> over 1.8 ?

I believe the guys at EngineYard are in charge of backporting fixes to
the 1.8.7 branch. I also heard there was a 1.8.8 coming at some point to
be the final release in the 1.8 series.

I use 1.9.2p0 daily and find it to be extremely stable and fast. I would
say the biggest reason to use it is to get a performance boost. Most of
your code from 1.8 will "just work."  My code sees a 2-5x speedup on
1.9.2 versus 1.8.7.

Why not try it on your code and see for yourself? With tools like rvm
(for unix) and pik (for windows) it's a breeze to have multiple rubies
installed simultaneously.

cr
Ryan Davis (Guest)
on 2010-11-23 23:24
(Received via mailing list)
On Nov 23, 2010, at 13:25 , Chuck Remes wrote:

> I use 1.9.2p0 daily and find it to be extremely stable and fast. I would
> say the biggest reason to use it is to get a performance boost. Most of
> your code from 1.8 will "just work."  My code sees a 2-5x speedup on
> 1.9.2 versus 1.8.7.

That's really variable and depends on what you're doing.

All of my text processing code needed reworking, and text processing is
(was?) noticeably slower in ruby 1.9 than it is in 1.8.
Philip Rhoades (Guest)
on 2010-11-24 00:45
(Received via mailing list)
On 2010-11-24 09:23, Ryan Davis wrote:
> All of my text processing code needed reworking, and text processing
> is (was?) noticeably slower in ruby 1.9 than it is in 1.8.


Who do I talk to to get 1.9 RPMs produced for Fedora?

Thanks,

Phil.
--
Philip Rhoades

GPO Box 3411
Sydney NSW  2001
Australia
E-mail:  phil@pricom.com.au
Ryan Davis (Guest)
on 2010-11-24 01:14
(Received via mailing list)
On Nov 23, 2010, at 15:44 , Philip Rhoades wrote:

> Who do I talk to get 1.9 RPMs produced for Fedora?

Beats me.
Phillip Gawlowski (Guest)
on 2010-11-24 01:30
(Received via mailing list)
On Wed, Nov 24, 2010 at 12:44 AM, Philip Rhoades <phil@pricom.com.au>
wrote:
>
> Who do I talk to get 1.9 RPMs produced for Fedora?

Just a guess: The Ruby (or Programming/Script language) maintainers of
the Fedora project.

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.
Aaron Patterson (Guest)
on 2010-11-24 01:58
(Received via mailing list)
On Wed, Nov 24, 2010 at 09:13:04AM +0900, Ryan Davis wrote:
>
> On Nov 23, 2010, at 15:44 , Philip Rhoades wrote:
>
> > Who do I talk to get 1.9 RPMs produced for Fedora?
>
> Beats me.

My uncle Carl is really good at IT.
Chuck Remes (cremes)
on 2010-11-24 05:44
(Received via mailing list)
On Nov 23, 2010, at 4:23 PM, Ryan Davis wrote:

>
> On Nov 23, 2010, at 13:25 , Chuck Remes wrote:
>
>> I use 1.9.2p0 daily and find it to be extremely stable and fast. I would
>> say the biggest reason to use it is to get a performance boost. Most of
>> your code from 1.8 will "just work."  My code sees a 2-5x speedup on
>> 1.9.2 versus 1.8.7.
>
> That's really variable and depends on what you're doing.
>
> All of my text processing code needed reworking, and text processing is
> (was?) noticeably slower in ruby 1.9 than it is in 1.8.

Definitely true. That's why I was careful to say "My code sees a 2-5x
speedup..." because I have seen a few instances where 1.9 is a tad
pokier. But clearly 1.9 is the future so sticking with 1.8 seems like a
bad long-term bet.

cr
Sayth Renshaw (flebber)
on 2010-11-24 11:35
(Received via mailing list)
On Nov 24, 10:44am, Philip Rhoades <p...@pricom.com.au> wrote:
>
> --
> Philip Rhoades
>
> GPO Box 3411
> Sydney NSW   2001
> Australia
> E-mail: p...@pricom.com.au

The Fedora wiki is a start; they have Ruby packages in the
repositories.

http://fedoraproject.org/wiki/Features/Ruby_1.9.1

Or build it yourself, which is simple on Linux. Here is a guide for Fedora:

http://biztech.sheprador.com/?p=81
Brian Candler (candlerb)
on 2010-11-24 12:14
Chuck Remes wrote in post #963430:
> I use 1.9.2p0 daily and find it to be extremely stable and fast. I would
> say the biggest reason to use it is to get a performance boost. Most of
> your code from 1.8 will "just work."  My code sees a 2-5x speedup on
> 1.9.2 versus 1.8.7.

And just to give some balance: the biggest reason not to use 1.9 is
because of the incredible complexity which has been added to the String
class, and the ability it gives you to make programs which crash under
unexpected circumstances.

For example, an expression like

   s1 = s2 + s3

where s2 and s3 are both Strings will always work and do the obvious
thing in 1.8, but in 1.9 it may raise an exception. Whether it does
depends not only on the encodings of s2 and s3 at that point, but also
their contents (properties "empty?" and "ascii_only?")

The encodings of strings you read may also be affected by the locale set
from the environment, unless you explicitly code against that. This
means the same program with the same data may work on your machine, but
crash on someone else's.

https://github.com/candlerb/string19/blob/master/string19.rb
https://github.com/candlerb/string19/blob/master/soapbox.rb
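
A minimal sketch of the data-dependent behaviour described above, assuming a reasonably recent Ruby. The helper name concat_result is made up for illustration; only String#+, String#encode and Encoding::CompatibilityError come from Ruby itself.

```ruby
# encoding: utf-8

# Hypothetical helper: returns the encoding name of a + b, or the
# exception class name if Ruby refuses to concatenate the two strings.
def concat_result(a, b)
  (a + b).encoding.name
rescue Encoding::CompatibilityError => e
  e.class.name
end

utf8_plain   = "abc".encode("UTF-8")             # ASCII-only, tagged UTF-8
utf8_accent  = "caf\u00e9".encode("UTF-8")       # contains a non-ASCII character
latin_plain  = "xyz".encode("ISO-8859-1")        # ASCII-only, tagged ISO-8859-1
latin_accent = "caf\u00e9".encode("ISO-8859-1")  # contains a non-ASCII character

# Same two encodings in every call; only the contents differ.
puts concat_result(utf8_accent, latin_plain)   # one side ASCII-only: accepted
puts concat_result(utf8_plain,  latin_accent)  # one side ASCII-only: accepted
puts concat_result(utf8_accent, latin_accent)  # both non-ASCII: rejected
```

The point of the sketch is that the encodings alone do not decide the outcome; a test suite that only ever feeds ASCII data through this path would never see the third case.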
Oliver Schad (Guest)
on 2010-11-24 13:31
(Received via mailing list)
Brian Candler wrote:

> And just to give some balance: the biggest reason not to use 1.9 is
> because of the incredible complexity which has been added to the
> String class, and the ability it gives you to make programs which
> crash under unexpected circumstances.

Sounds great. ;-) Can somebody else confirm this?

Regards
Oli
Michael Fellinger (Guest)
on 2010-11-24 13:34
(Received via mailing list)
On Wed, Nov 24, 2010 at 8:14 PM, Brian Candler <b.candler@pobox.com>
wrote:
>
> from the environment, unless you explicitly code against that. This
> means the same program with the same data may work on your machine, but
> crash on someone else's.

And that's why I use and love 1.9.
The obvious thing isn't so obvious if you actually care about
encodings, and if you are mindful about what comes from where, it's
actually helpful to find otherwise hidden issues.
I hear nobody complain that 1 / 0 raises but 1.0 / 0.0 gives Infinity,
which I find pretty counter-intuitive, and makes me check for .nan?
and .infinite? (which also fails if I call it on Fixnum instead of
Float).

> https://github.com/candlerb/string19/blob/master/string19.rb
> https://github.com/candlerb/string19/blob/master/soapbox.rb

Many valid complaints there, but nothing that would make me long for
the everything-is-a-string-of-bytes approach of 1.8, which made
working with encodings very brittle.
I can see how this is just annoying to someone who has only dealt with
BINARY/ASCII/UTF-8 all their lives, but please consider that most of
the world actually still uses other encodings as well.
I also want to thank you for writing string19.rb, which is a very
helpful resource for me and others, along with the series from JEG II.
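
For anyone who has not run into the integer/float difference mentioned above, here is a small sketch. The strict_divide helper is hypothetical, one possible way to get the same failure for both numeric types; it is not something Ruby provides.

```ruby
# Integer division by zero raises; float division by zero does not.
integer_outcome = begin
  1 / 0
rescue ZeroDivisionError => e
  e.class.name
end

float_outcome = 1.0 / 0.0

puts integer_outcome            # the exception class name
puts float_outcome.infinite?    # non-nil for an infinite Float

# Hypothetical guard: refuse a zero divisor regardless of numeric type.
# Numeric#zero? is true for 0, 0.0 and -0.0 alike.
def strict_divide(a, b)
  raise ZeroDivisionError, "divided by 0" if b.zero?
  a / b
end

puts strict_divide(6, 3)
```

With a guard like this the caller handles one exception instead of checking nan? and infinite? after the fact.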
Michael Fellinger (Guest)
on 2010-11-24 14:00
(Received via mailing list)
On Wed, Nov 24, 2010 at 9:25 PM, Oliver Schad
<spam.entfernen.und.bring.gefaelligst.ein.bier.mit@oschad.de> wrote:
> Brian Candler wrote:
>
>> And just to give some balance: the biggest reason not to use 1.9 is
>> because of the incredible complexity which has been added to the
>> String class, and the ability it gives you to make programs which
>> crash under unexpected circumstances.
>
> Sounds great. ;-) Can somebody else confirm this?

iota ~ % echo ʘ | LC_ALL=ja_JP.UTF8 ruby -pe '$_[1,0] = "ʘ"'
ʘʘ
iota ~ % echo ʘ | LC_ALL=C ruby -pe '$_[1,0] = "ʘ"'
-e:1: invalid multibyte char (US-ASCII)
-e:1: invalid multibyte char (US-ASCII)
Oliver Schad (Guest)
on 2010-11-24 15:26
(Received via mailing list)
Michael Fellinger wrote:

>
> iota ~ % echo ʘ | LC_ALL=ja_JP.UTF8 ruby -pe '$_[1,0] = "ʘ"'
> ʘʘ
> iota ~ % echo ʘ | LC_ALL=C ruby -pe '$_[1,0] = "ʘ"'
> -e:1: invalid multibyte char (US-ASCII)
> -e:1: invalid multibyte char (US-ASCII)

So working with strings in ruby v1.9 is not supported, right?

Regards
Oli
Brian Candler (candlerb)
on 2010-11-24 16:15
Michael Fellinger wrote in post #963539:
>> from the environment, unless you explicitly code against that. This
>> means the same program with the same data may work on your machine, but
>> crash on someone else's.
>
> And that's why I use and love 1.9.
> The obvious thing isn't so obvious if you actually care about
> encodings, and if you are mindful about what comes from where, it's
> actually helpful to find otherwise hidden issues.

Y'know, I wouldn't mind so much if it *always* raised an exception.

For example, say I have s1 tagged UTF-8 and s2 tagged ISO-8859-1. If
"s1+s2" always raised an exception, it would be easy to find, and easy
to fix.

However the 'compatibility' rules mean that this is data-sensitive. In
many cases s1+s2 will work, if either s1 contains non-ASCII characters
but s2 doesn't, or vice-versa. It's really hard to get test coverage of
all the possible cases - rcov won't help you - or you just cross your
fingers and hope.

You also need test coverage for cases where the input data is invalid
for the given encoding. In fact s1+s2 won't raise an exception in that
case, nor will s1[i], but s1 =~ /./ will.
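
A rough sketch of the invalid-input case, for illustration only. The probe helper is made up; the regexp outcome is recorded rather than assumed, since it is the part that differs from the other operations. String#scrub only exists from Ruby 2.1 on, so it was not available when this was written.

```ruby
# encoding: utf-8

# Hypothetical helper: reports what each operation does with a string
# whose bytes are not valid in its tagged encoding.
def probe(bad)
  {
    valid:    bad.valid_encoding?,
    concat:   (bad + "def").bytesize,   # concatenation goes through
    index:    bad[3].bytes,             # so does character indexing
    regexp:   begin
                (bad =~ /./) ? "matched" : "no match"
              rescue ArgumentError => e
                "ArgumentError: #{e.message}"
              end,
    scrubbed: bad.scrub("?")            # replaces invalid bytes (Ruby 2.1+)
  }
end

bad = "abc\xFF".dup.force_encoding("UTF-8")  # 0xFF never occurs in valid UTF-8
probe(bad).each { |name, value| puts "#{name}: #{value.inspect}" }
```

Checking valid_encoding? (or scrubbing) once at the point where data enters the program is one way to avoid discovering the problem only at the first regexp match.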

> I hear nobody complain that 1 / 0 raises but 1.0 / 0.0 gives Infinity,

Well, IEEE floating point is a well-established standard that has been
around for donkeys years, so I think it's reasonable to follow it.

And yes, if I see code like "c = a / b", I do think to myself "what if b
is zero?" It's easy to decide if it's expected, and whether I need to do
something other than the default behaviour. Then I move onto the next
line.

For "s3 = s1 + s2" in 1.9 I need to think to myself: "what if s1 has a
different encoding to s2, and s1 is not empty or s2 is not empty and
s1's encoding is not ASCII-compatible or s2's encoding is not
ASCII-compatible or s1 contains non-ASCII characters or s2 contains
non-ASCII characters? And what does that give as the encoding for s3 in
all those possible cases?" And then I have to carry the possible
encodings for s3 forward to the next point where it is used.
Phillip Gawlowski (Guest)
on 2010-11-24 16:47
(Received via mailing list)
On Wed, Nov 24, 2010 at 4:15 PM, Brian Candler <b.candler@pobox.com>
wrote:

> For example, say I have s1 tagged UTF-8 and s2 tagged ISO-8859-1. If
> "s1+s2" always raised an exception, it would be easy to find, and easy
> to fix.
>
> However the 'compatibility' rules mean that this is data-sensitive. In
> many cases s1+s2 will work, if either s1 contains non-ASCII characters
> but s2 doesn't, or vice-versa. It's really hard to get test coverage of
> all the possible cases - rcov won't help you - or you just cross your
> fingers and hope.

Convert your strings to UTF-8 at all times, and you are done. You have
to check for data integrity anyway, so you can do that in one go.

>> I hear nobody complain that 1 / 0 raises but 1.0 / 0.0 gives Infinity,
>
> Well, IEEE floating point is a well-established standard that has been
> around for donkeys years, so I think it's reasonable to follow it.

Every natural number is an element of the set of rational numbers. For
all intents and purposes, 0 == 0.0 in mathematics (unless you limit
the set of numbers you are working on to natural numbers only, and
let's just ignore irrational numbers for now). And since the 0 is
around for a bit longer than the IEEE, and the rules of math are
taught in elementary school (including "you must not and cannot divide
by zero"), Ruby exhibits inconsistent behavior for pretty much anyone
who has a little education in maths. The IEEE standards deal with
representing floating point numbers in an inherently integer-based
numerical system, but they don't supersede the rules of maths.

Ruby's behavior of returning *infinity* is the proverbial icing on the
cake, since dividing something large by something infinitely small
results in something large (so, x / 0.000000...[ad infinitum]...1 = x
; a trick used in integrals, too).

Thus, you have to exercise due diligence in this area if you want to
keep your results in the sphere of what's possible and sane.

> And yes, if I see code like "c = a / b", I do think to myself "what if b
> is zero?" It's easy to decide if it's expected, and whether I need to do
> something other than the default behaviour. Then I move onto the next
> line.

It's easy? Take a look at integrals, and infinitesimal[0] numbers.
Infinitesimal are at the same time zero and *not* zero.

> For "s3 = s1 + s2" in 1.9 I need to think to myself: "what if s1 has a
> different encoding to s2, and s1 is not empty or s2 is not empty and
> s1's encoding is not ASCII-compatible or s2's encoding is not
> ASCII-compatible or s1 contains non-ASCII characters or s2 contains
> non-ASCII characters? And what does that give as the encoding for s3 in
> all those possible cases?" And then I have to carry the possible
> encodings for s3 forward to the next point where it is used.

Then, as I suggested above, enforce a standard encoding in your code.
Convert everything into UTF-8, and you are pretty much done.
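
One way the "convert everything to UTF-8" advice could look in practice, as a sketch. The to_utf8 helper and its argument names are assumptions for illustration; it relies only on String#force_encoding, String#valid_encoding? and String#encode.

```ruby
# encoding: utf-8

# Hypothetical boundary helper: tag raw bytes with the encoding they are
# declared to be in, check them, then transcode to UTF-8.
def to_utf8(raw, source_encoding)
  tagged = raw.dup.force_encoding(source_encoding)
  unless tagged.valid_encoding?
    raise ArgumentError, "input is not valid #{source_encoding}"
  end
  tagged.encode("UTF-8")
end

latin1_bytes = "caf\xE9".b                     # "café" as ISO-8859-1 bytes
text = to_utf8(latin1_bytes, "ISO-8859-1")
puts text.encoding
puts text.bytesize                             # é takes two bytes in UTF-8
```

The explicit valid_encoding? check matters because transcoding a string that is already tagged UTF-8 to UTF-8 is a no-op and does not validate the bytes.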

[0] http://en.wikipedia.org/wiki/Infinitesimal
--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.
James Edward Gray II (Guest)
on 2010-11-24 17:07
(Received via mailing list)
On Nov 24, 2010, at 9:47 AM, Phillip Gawlowski wrote:

>> fingers and hope.
>
> Convert your strings to UTF-8 at all times, and you are done. You have
> to check for data integrity anyway, so you can do that in one go.

Thank you for being the voice of reason.

I've fought against Brian enough in the past over this issue, that I try
to stay out of it these days.  However, his arguments always strike me
as wanting to unlearn what we have learned about encodings.

We can't go back.  Different encodings exist.  At least Ruby 1.9 allows
us to work with them.

James Edward Gray II
Brian Candler (candlerb)
on 2010-11-24 17:09
Phillip Gawlowski wrote in post #963602:
> Convert your strings to UTF-8 at all times, and you are done.

But that basically is my point. In order to make your program
comprehensible, you have to add extra incantations so that strings are
tagged as UTF-8 everywhere (e.g. when opening files).

However this in turn adds *nothing* to your program or its logic, apart
from preventing Ruby from raising exceptions.
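
For reference, the kind of "incantation" meant here, sketched with a temporary file. The read_both helper is hypothetical; the "r:EXTERNAL:INTERNAL" mode string is the standard way to state the file's encoding and the encoding you want back.

```ruby
# encoding: utf-8
require "tempfile"

# Hypothetical helper: writes the given bytes to a temporary file, then
# reads them back twice, once with whatever the environment dictates and
# once with both encodings stated explicitly.
def read_both(bytes)
  Tempfile.create("encoding-demo") do |file|
    file.binmode
    file.write(bytes)
    file.flush

    from_locale = File.read(file.path)
    explicit    = File.open(file.path, "r:ISO-8859-1:UTF-8") { |io| io.read }
    [from_locale, explicit]
  end
end

from_locale, explicit = read_both("caf\xE9\n".b)
puts from_locale.encoding   # follows Encoding.default_external, i.e. the locale
puts explicit.encoding      # always UTF-8, whatever the locale says
```

The first read is the one that can differ between machines; the second gives the same result everywhere, at the cost of the extra mode string.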

>> Well, IEEE floating point is a well-established standard that has been
>> around for donkeys years, so I think it's reasonable to follow it.
>
> Every natural number is an element of the set of rational numbers. For
> all intents and purposes, 0 == 0.0 in mathematics (unless you limit
> the set of numbers you are working on to natural numbers only, and
> let's just ignore irrational numbers for now). And since the 0 is
> around for a bit longer than the IEEE, and the rules of math are
> taught in elementary school (including "you must not and cannot divide
> by zero"), Ruby exhibits inconsistent behavior for pretty much anyone
> who has a little education in maths.

Maths and computation are not the same thing. Is there anything in the
above which applies only to Ruby and not to floating point computation
in any other mainstream programming language?

Yes, there are gotchas in floating point computation, as explained at
http://docs.sun.com/source/806-3568/ncg_goldberg.html
These are (or should be) well understood by programmers who feel they
need to use floating point numbers.

If you don't like IEEE floating point, Ruby also offers BigDecimal and
Rational.
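
A short illustration using Rational, which needs no require on current Rubies (BigDecimal lives in a separate library and is left out of this sketch):

```ruby
# Binary floating point cannot represent 0.1 or 0.2 exactly.
float_sum    = 0.1 + 0.2
rational_sum = Rational(1, 10) + Rational(2, 10)

puts float_sum == 0.3                   # false
puts rational_sum == Rational(3, 10)    # true

# Unlike Float, Rational refuses an integer zero divisor outright.
outcome = begin
  Rational(1, 3) / 0
rescue ZeroDivisionError => e
  e.class.name
end
puts outcome
```

So the exact types also give the "raise instead of Infinity" behaviour some people in this thread are asking for.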

If Ruby were to implement floating point following some different set of
rules other than IEEE, that would be (IMO) horrendous. The point of a
standard is that you only have to learn the gotchas once.
Chuck Remes (cremes)
on 2010-11-24 17:56
(Received via mailing list)
[snipped lots of arguments about string encodings that may or may not be
relevant to the OP]

So... I am wondering if the original poster (Peter Pincus) has tried his
code under 1.9 yet.

Peter?

cr
David Masover (Guest)
on 2010-11-24 19:13
(Received via mailing list)
On Wednesday, November 24, 2010 10:09:13 am Brian Candler wrote:
> If you don't like IEEE floating point, Ruby also offers BigDecimal and
> Rational.

And if you don't like Ruby's strings, there's nothing stopping you from
rolling your own. There's certainly nothing stopping you from using binary
mode (whether it claims to be ASCII or not) for all strings.
Phillip Gawlowski (Guest)
on 2010-11-24 19:21
(Received via mailing list)
On Wed, Nov 24, 2010 at 5:09 PM, Brian Candler <b.candler@pobox.com>
wrote:
> Phillip Gawlowski wrote in post #963602:
>> Convert your strings to UTF-8 at all times, and you are done.
>
> But that basically is my point. In order to make your program
> comprehensible, you have to add extra incantations so that strings are
> tagged as UTF-8 everywhere (e.g. when opening files).
>
> However this in turn adds *nothing* to your program or its logic, apart
> from preventing Ruby from raising exceptions.

s/apart from preventing Ruby from raising exceptions/but ensures
correctness of data across different systems/;


> Maths and computation are not the same thing. Is there anything in the
> above which applies only to Ruby and not to floating point computation
> in any other mainstream programming language?

You conveniently left out that Ruby thinks dividing by 0.0 results in
infinity.
That's not just wrong, but absurd to the extreme. So, we have to
safeguard against this. Just like having to safeguard against, say,
proper string encoding. If *anyone* is to blame, it's the ANSI and the
IT industry for having a) an extremely US-centric view of the world,
and b) being too damn shortsighted to create an international, capable
standard 30 years ago.

Further, you can't do any computations without proper maths. In Ruby,
you can't do computations since it cannot divide by zero properly, or
at least *consistently*.

> Yes, there are gotchas in floating point computation, as explained at
> http://docs.sun.com/source/806-3568/ncg_goldberg.html
> These are (or should be) well understood by programmers who feel they
> need to use floating point numbers.
>
> If you don't like IEEE floating point, Ruby also offers BigDecimal and
> Rational.

Works really well with irrational numbers, that are neither large
decimals, nor can they be expressed as a fraction x/x_0.

In a nutshell, Ruby cannot deal with floating points at all, and the
IEEE standard is a means to *represent* floating point numbers in
bits. It does *not* supersede natural laws, much less rules that are
in effect for hundreds of years.

And once the accuracy that the IEEE float represents isn't good enough
anymore (which happens once you have to simulate a particle system),
you move away from scalar CPUs, and move to vector CPUs / APUs (like
the MMX and SSE instruction sets for desktops, or a GPGPU via CUDA).

> If Ruby were to implement floating point following some different set of
> rules other than IEEE, that would be (IMO) horrendous. The point of a
> standard is that you only have to learn the gotchas once.

Um, no. A standard is a means to avoid misunderstandings, and have a
well-defined system dealing with what the standard defines. You know,
like exchange text data in a standard that can cover as many of the
world's glyphs as possible.

And there is always room for improvement, otherwise I wonder why
engineers need Maple and mathematicians Mathematica.

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.
David Masover (Guest)
on 2010-11-24 19:51
(Received via mailing list)
On Wednesday, November 24, 2010 05:14:15 am Brian Candler wrote:
> For example, an expression like
>
>    s1 = s2 + s3
>
> where s2 and s3 are both Strings will always work and do the obvious
> thing in 1.8, but in 1.9 it may raise an exception. Whether it does
> depends not only on the encodings of s2 and s3 at that point, but also
> their contents (properties "empty?" and "ascii_only?")

In 1.8, if those strings aren't in the same encoding, it will blindly
concatenate them as binary values, which may result in a corrupt and
nonsensical string.

It seems to me that the obvious thing is to raise an error when there's an
error, instead of silently corrupting your data.
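
To make "blindly concatenate them as binary values" concrete, here is a sketch that imitates the 1.8 behaviour on a modern Ruby by tagging everything as binary first. The glue_bytes helper is made up for the example.

```ruby
# encoding: utf-8

# Hypothetical helper: concatenate two strings as raw bytes, the way 1.8
# did, ignoring what encoding each one was in.
def glue_bytes(a, b)
  a.b + b.b
end

utf8_text   = "caf\u00e9"                        # bytes 63 61 66 C3 A9
latin1_text = "caf\u00e9".encode("ISO-8859-1")   # bytes 63 61 66 E9

blob = glue_bytes(utf8_text, latin1_text)        # always "works"

as_utf8   = blob.dup.force_encoding("UTF-8")
as_latin1 = blob.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts as_utf8.valid_encoding?   # false: the trailing E9 is not valid UTF-8
puts as_latin1                 # the first half comes out as mojibake
```

Neither reading of the result gives back two copies of the same word, which is the silent corruption being described.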

> This
> means the same program with the same data may work on your machine, but
> crash on someone else's.

Better, again, than working on my machine, but corrupting on someone else's.
At least if it crashes, hopefully there's a bug report and even a fix
_before_ it corrupts someone's data, not after.

> https://github.com/candlerb/string19/blob/master/string19.rb
> https://github.com/candlerb/string19/blob/master/soapbox.rb

From your soapbox.rb:

* Whether or not you can reason about whether your program works, you will
  want to test it. 'Unit testing' is generally done by running the code with
  some representative inputs, and checking if the output is what you expect.

  Again, with 1.8 and the simple line above, this was easy. Give it any two
  strings and you will have sufficient test coverage.

Nope. All that proves is that you can get a string back. It says nothing
about whether the resultant string makes sense.

More relevantly:

* It solves a non-problem: how to write a program which can juggle multiple
  string segments all in different encodings simultaneously.  How many
  programs do you write like that? And if you do, can't you just have
  a wrapper object which holds the string and its encoding?

Let's see... Pretty much every program, ever, particularly web apps. The
end-user submits something in the encoding of their choice. I may have to
convert it to store it in a database, at the very least. It may make more
sense to store it as whatever encoding it is, in which case, the simple act
of displaying two comments on a website involves exactly this sort of
concatenation.

Or maybe I pull from multiple web services. Something as simple and common
as a "trackback" would again involve concatenating multiple strings from
potentially different encodings.

* It's pretty much obsolete, given that the whole world is moving to UTF-8
  anyway.  All a programming language needs is to let you handle UTF-8 and
  binary data, and for non-UTF-8 data you can transcode at the boundary.
  For stateful encodings you have to do this anyway.

Java at least did this sanely -- UTF16 is at least a fixed width. If you're
going to force a single encoding, why wouldn't you use fixed-width strings?

Oh, that's right -- UTF16 wastes half your RAM when dealing with mostly
ASCII characters. So UTF-8 makes the most sense... in the US.

The whole point of having multiple encodings in the first place is that
other encodings make much more sense when you're not in the US.
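
The size trade-off can be checked directly. This is only a sketch; the sizes helper and the sample strings are made up for illustration. Note that characters outside the Basic Multilingual Plane take four bytes in UTF-16 (a surrogate pair), so the fixed-width property only holds for a subset of Unicode.

```ruby
# encoding: utf-8

# Hypothetical helper: byte size of the same text in several encodings.
def sizes(text, encodings)
  encodings.map { |name| [name, text.encode(name).bytesize] }.to_h
end

english  = "hello"
japanese = "\u65e5\u672c\u8a9e"   # three kanji
emoji    = "\u{1F600}"            # one character outside the BMP

p sizes(english,  %w[UTF-8 UTF-16LE])
p sizes(japanese, %w[UTF-8 UTF-16LE Shift_JIS])
p sizes(emoji,    %w[UTF-8 UTF-16LE])
```

For the ASCII sample UTF-16 doubles the size; for the Japanese sample UTF-8 is the larger one, which is the regional argument being made here.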

* It's ill-conceived. Knowing the encoding is sufficient to pick characters
  out of a string, but other operations (such as collation) depend on the
  locale.  And in any case, the encoding and/or locale information is often
  carried out-of-band (think: HTTP; MIME E-mail; ASN1 tags), or within the
  string content (think: <?xml charset?>)

How does any of this help me once I've read the string?

* It's too stateful. If someone passes you a string, and you need to make
  it compatible with some other string (e.g. to concatenate it), then you
  need to force it's encoding.

You only need to do this if the string was in the wrong encoding in the
first place. If I pass you a UTF-16 string, it's not polite at all (whether
you dup it first or not) to just stick your fingers in your ears, go "la la
la", and pretend it's UTF-8 so you can concatenate it. The resultant string
will be neither, and I can't imagine what it'd be useful for.

You do seem to have some legitimate complaints, but they are somewhat
undermined by the fact that you seem to want to pretend Unicode doesn't
exist. As you noted:

"However I am quite possibly alone in my opinion.  Whenever this pops up on
ruby-talk, and I speak out against it, there are two or three others who
speak out equally vociferously in favour.  They tell me I am doing the
community a disservice by warning people away from 1.9."

Warning people away from 1.9 entirely, and from character encoding in
particular, because of the problems you've pointed out, does seem incredibly
counterproductive. It'd make a lot more sense to try to fix the real
problems you've identified -- if it really is "buggy as hell", I imagine the
ruby-core people could use your help.
Josh Cheek (josh-cheek)
on 2010-11-24 20:03
(Received via mailing list)
On Wed, Nov 24, 2010 at 12:20 PM, Phillip Gawlowski <
cmdjackryan@googlemail.com> wrote:

>
> > Maths and computation are not the same thing. Is there anything in the
> > above which applies only to Ruby and not to floating point computation
> > in any other mainstream programming language?
>
> You conveniently left out that Ruby thinks dividing by 0.0 results in
> infinity.
> That's not just wrong, but absurd to the extreme.


Its wrongness is an interpretation (I would also prefer that it just break,
but I can certainly see why some would say it should be infinity). And it
doesn't apply only to Ruby:

Java:
public class Infinity {
  public static void main(String[] args) {
    System.out.println(1.0/0.0); // prints "Infinity"
  }
}

JavaScript:
document.write(1.0/0.0) // prints "Infinity"

C:
#include <stdio.h>
int main( ) {
  printf( "%f\n" , 1.0/0.0 ); // prints "inf"
  return 0;
}
Phillip Gawlowski (Guest)
on 2010-11-24 20:16
(Received via mailing list)
On Wed, Nov 24, 2010 at 8:02 PM, Josh Cheek <josh.cheek@gmail.com>
wrote:
>
> Its wrongness is an interpretation (I would also prefer that it just break,
> but I can certainly see why some would say it should be infinity). And it
> doesn't apply only to Ruby:

It cannot be infinity. It does, quite literally not compute. There's
no room for interpretation, it's a fact of (mathematical) life that
something divided by nothing has an undefined result. It doesn't
matter if it's 0, 0.0, or -0.0. Undefined is undefined.

That other languages have the same issue makes matters worse, not
better (but at least it is consistent, so there's that).

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.
Josh Cheek (josh-cheek)
on 2010-11-24 20:35
(Received via mailing list)
On Wed, Nov 24, 2010 at 1:16 PM, Phillip Gawlowski <
cmdjackryan@googlemail.com> wrote:

> matter if it's 0, 0.0, or -0.0. Undefined is undefined.
>
>
From my Calculus book (goo.gl/D7PoI)

"by observing from the table of values and the graph of y = 1/x² in Figure
1, that the values of 1/x² can be made arbitrarily large by taking x close
enough to 0. Thus the values of f(x) do not approach a number, so
lim_(x->0) 1/x² does not exist. To indicate this kind of behaviour we use
the notation lim_(x->0) 1/x² = ∞"

Since floats define infinity, regardless of its not being a number, it is
not "absurd to the extreme" to result in that value when doing floating
point math.



> That other languages have the same issue makes matters worse, not
> better (but at least it is consistent, so there's that).
>
>
The question was "Is there anything in the above which applies only to Ruby
and not to floating point computation in any other mainstream programming
language?" The answer isn't "other languages have the same issue", it's
"no".
Yuri Tzara (yuritzara)
on 2010-11-24 21:22
Phillip Gawlowski wrote in post #963658:
>
> It cannot be infinity. It does, quite literally not compute. There's
> no room for interpretation, it's a fact of (mathematical) life that
> something divided by nothing has an undefined result. It doesn't
> matter if it's 0, 0.0, or -0.0. Undefined is undefined.

It is perfectly reasonable, mathematically, to assign infinity to 1/0.
To geometers and topologists, infinity is just another point. Look up
the one-point compactification of R^n. If we join infinity to the real
line, we get a circle, topologically. Joining infinity to the real plane
gives a sphere, called the Riemann sphere. These are rigorous
definitions with useful results.

I'm glad that IEEE floating point has infinity included, otherwise I
would run into needless error handling. It's not an error to reach one
pole of a sphere (the other pole being zero).

Infinity is there for good reason; its presence was well-considered by
the quite knowledgeable IEEE designers.
David Masover (Guest)
on 2010-11-25 00:17
(Received via mailing list)
On Wednesday, November 24, 2010 01:35:12 pm Josh Cheek wrote:
> >
> taking x close enough to 0. Thus the values of f(x) do not approach a
> number, so lim_(x->0) 1/x² does not exist. To indicate this kind of
> behaviour we use the notation lim_(x->0) 1/x² = ∞"

Specifically, the _limit_ is denoted as infinity, which is not a real number.

> Since floats define infinity, regardless of its not being a number, it is
> not "absurd to the extreme" to result in that value when doing floating
> point math.

Ah, but it is, for two reasons:

First, floats represent real numbers. Having exceptions to that, like NaN or
Infinity, is pointless and confusing -- it would be like making nil an
integer. And having float math produce something which isn't a float doesn't
really make sense.

Second, 1/0 is just undefined, not infinity. It's the _limit_ of 1/x as x
goes to 0 which is infinity. This only has meaning in the context of limits,
because limits are just describing behavior -- all the limit says is that as
x gets arbitrarily close to 0, 1/x gets arbitrarily large, but you still
can't _actually_ divide x by 0.

They didn't teach me that in Calculus, they're teaching me that in proofs.

> > That other languages have the same issue makes matters worse, not
> > better (but at least it is consistent, so there's that).
>
> The question was "Is there anything in the above which applies only to Ruby
> and not to floating point computation in any other mainstream
> programming language?" the answer isn't "other languages have the same
> issue", it's "no".

I don't know that there's anything in the above that applies only to
Ruby. However, Ruby does a number of things differently, and arguably
better, than other languages -- for example, Ruby's integer types
transmute into Bignum rather than overflowing.
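
A minimal sketch of that integer behaviour, for anyone who wants to try
it in irb. The method name factorial is just an illustration, and note
that since Ruby 2.4 Fixnum and Bignum are both reported as Integer:

```ruby
# Integers in Ruby grow as needed instead of wrapping around.
def factorial(n)
  (1..n).inject(1) { |product, k| product * k }
end

puts factorial(20)                   # still fits in a 64-bit machine word
puts factorial(25)                   # too large for 64 bits, yet exact
puts factorial(25) / factorial(24)   # prints 25
```

The last line only comes out as exactly 25 because no precision was
lost on the way to the large value.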
Adam Ms. (e148759)
on 2010-11-25 02:01
Phillip Gawlowski wrote in post #963658:

> It cannot be infinity. It does, quite literally not compute. There's
> no room for interpretation, it's a fact of (mathematical) life that
> something divided by nothing has an undefined result. It doesn't
> matter if it's 0, 0.0, or -0.0. Undefined is undefined.
>
> That other languages have the same issue makes matters worse, not
> better (but at least it is consistent, so there's that).
>
> --
> Phillip Gawlowski

This is not even wrong.

From the definitive source:
http://en.wikipedia.org/wiki/Division_by_zero

The IEEE floating-point standard, supported by almost all modern
floating-point units, specifies that every floating point arithmetic
operation, including division by zero, has a well-defined result. The
standard supports signed zero, as well as infinity and NaN (not a
number). There are two zeroes, +0 (positive zero) and −0 (negative zero)
and this removes any ambiguity when dividing. In IEEE 754 arithmetic, a
÷ +0 is positive infinity when a is positive, negative infinity when a
is negative, and NaN when a = ±0. The infinity signs change when
dividing by −0 instead.
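
The rules quoted above are easy to check from Ruby. This is only a
sketch; the constant and method names are mine and purely illustrative:

```ruby
# Float division follows IEEE 754; Integer division by zero raises.
POS_INF      = 1.0 / 0.0    # positive infinity
NEG_INF      = -1.0 / 0.0   # negative infinity
FLIPPED      = 1.0 / -0.0   # dividing by negative zero flips the sign
NOT_A_NUMBER = 0.0 / 0.0    # NaN

def integer_division_raises?
  1 / 0
  false
rescue ZeroDivisionError
  true
end

puts POS_INF, NEG_INF, FLIPPED
puts NOT_A_NUMBER.nan?
puts integer_division_raises?
```

So the two camps in this thread are both describing real Ruby
behaviour: Integer division treats 1/0 as an error, Float division
returns an IEEE 754 special value.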
"Jörg W Mittag" <JoergWMittag+Ruby@GoogleMail.Com> (Guest)
on 2010-11-25 03:40
(Received via mailing list)
David Masover wrote:
> Java at least did this sanely -- UTF16 is at least a fixed width. If you're
> going to force a single encoding, why wouldn't you use fixed-width strings?

Actually, it's not. It's simply mathematically impossible, given that
there are more than 65536 Unicode codepoints. AFAIK, you need (at the
moment) at least 21 Bits to represent all Unicode codepoints. UTF-16
is *not* fixed-width, it encodes every Unicode codepoint as either one
or two UTF-16 "characters", just like UTF-8 encodes every Unicode
codepoint as 1, 2, 3 or 4 octets.

The only two Unicode encodings that are fixed-width are the obsolete
UCS-2 (which can only encode the lower 65536 codepoints) and UTF-32.

You can produce corrupt strings and slice into a half-character in
Java just as you can in Ruby 1.8.
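
A small Ruby 1.9+ sketch of the variable width, assuming the
UTF-16BE transcoder that ships with Ruby. The helper name
utf16_code_units is made up for this example:

```ruby
# Count 16-bit code units needed for a string in UTF-16.
def utf16_code_units(str)
  str.encode("UTF-16BE").bytesize / 2
end

ascii_a = "a"           # U+0061, inside the Basic Multilingual Plane
g_clef  = "\u{1D11E}"   # U+1D11E MUSICAL SYMBOL G CLEF, outside it

puts utf16_code_units(ascii_a)            # one 16-bit unit
puts utf16_code_units(g_clef)             # two 16-bit units (surrogate pair)
puts g_clef.encode("UTF-8").bytesize      # four octets in UTF-8
puts g_clef.encode("UTF-32BE").bytesize   # four octets in UTF-32
```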

> Oh, that's right -- UTF16 wastes half your RAM when dealing with mostly ASCII
> characters. So UTF-8 makes the most sense... in the US.

Of course, that problem is even more pronounced with UTF-32.

German text blows up about 5%-10% when encoded in UTF-8 instead of
ISO8859-15. Arabic, Persian, Indian, Asian text (which is, after all,
much more than European) is much worse. (E.g. Chinese blows up *at
least* 50% when encoding UTF-8 instead of Big5 or GB2312.) Given that
the current tendency is that devices actually get *smaller*, bandwidth
gets *lower* and latency gets *higher*, that's simply not a price
everybody is willing to pay.
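
To measure this on your own data, something like the following sketch
works in Ruby 1.9+. The sample phrase ("Grüße aus Köln", written with
\u escapes) and the helper name encoded_size are my own illustration; a
short, umlaut-heavy phrase like this overstates the growth compared to
ordinary German prose:

```ruby
# "Grüße aus Köln", escaped so the source file encoding does not matter.
GERMAN = "Gr\u00FC\u00DFe aus K\u00F6ln"

# Size in octets of the text after transcoding to the given encoding.
def encoded_size(text, encoding)
  text.encode(encoding).bytesize
end

latin = encoded_size(GERMAN, "ISO-8859-15")
utf8  = encoded_size(GERMAN, "UTF-8")
puts "ISO-8859-15: #{latin} bytes, UTF-8: #{utf8} bytes"
```

Each of the three non-ASCII letters costs one octet in ISO-8859-15 and
two in UTF-8, which is where the difference comes from.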

> The whole point of having multiple encodings in the first place is that other
> encodings make much more sense when you're not in the US.

There's also a lot of legacy data, even within the US. On IBM systems,
the standard encoding, even for greenfield systems that are being
written right now, is still pretty much EBCDIC all the way.

There simply does not exist a single encoding which would be
appropriate for every case, not even the majority of cases. In fact,
I'm not even sure that there is even a single encoding which is
appropriate for a significant minority of cases.

We tried that One Encoding To Rule Them All in Java, and it was a
failure. We tried it again with a different encoding in Java 5, and it
was a failure. We tried it in .NET, and it was a failure. The Python
community is currently in the process of realizing it was a failure. 5
years of work on PHP 6 were completely destroyed because of this. (At
least they realized it *before* releasing it into the wild.)

And now there's a push for a One Encoding To Rule Them All in Ruby 2.
That's *literally* insane! (One definition of insanity is repeating
behavior and expecting a different outcome.)

jwm
James Edward Gray II (Guest)
on 2010-11-25 04:39
(Received via mailing list)
On Nov 24, 2010, at 8:40 PM, Jörg W Mittag wrote:

> The only two Unicode encodings that are fixed-width are the obsolete
> UCS-2 (which can only encode the lower 65536 codepoints) and UTF-32.

And even UTF-32 would have the complications of "combining characters."
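
A quick sketch of what that means in practice. It assumes Ruby 2.2 or
later for String#unicode_normalize; the constant and helper names are
only for illustration:

```ruby
COMPOSED   = "\u00E9"    # e-acute as a single code point
DECOMPOSED = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

# Number of 32-bit units the string needs in UTF-32.
def utf32_units(str)
  str.encode("UTF-32BE").bytesize / 4
end

puts utf32_units(COMPOSED)     # 1
puts utf32_units(DECOMPOSED)   # 2
puts COMPOSED == DECOMPOSED    # false
puts COMPOSED == DECOMPOSED.unicode_normalize(:nfc)   # true
```

Both strings display as the same character, so a fixed width per code
point is still not a fixed width per displayed character.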

James Edward Gray II
Robert Klemme (Guest)
on 2010-11-25 10:46
(Received via mailing list)
On Wed, Nov 24, 2010 at 5:09 PM, Brian Candler <b.candler@pobox.com>
wrote:
> Phillip Gawlowski wrote in post #963602:
>> Convert your strings to UTF-8 at all times, and you are done.

This may be true for the western world but I believe I remember one of
our Japanese friends state that Unicode does not cover all Asian
character sets completely; it could have been a remark about Java's
implementation of Unicode though, I am not 100% sure.

> But that basically is my point. In order to make your program
> comprehensible, you have to add extra incantations so that strings are
> tagged as UTF-8 everywhere (e.g. when opening files).
>
> However this in turn adds *nothing* to your program or its logic, apart
> from preventing Ruby from raising exceptions.

Checking input and ensuring that data reaches the program in proper
ways is generally good practice for robust software.  IMHO dealing
explicitly with encodings falls into the same area as checking whether
an integer entered by a user is strictly positive or a string is not
empty.

And I don't think you have to do it for one off scripts or when
working in your local environment only.  So there is no effort
involved.

Brian, it seems you want to avoid the complex matter of i18n - by
ignoring it.  But if you work in a situation where multiple encodings
are mixed you will be forced to deal with it - sooner or later.  With
1.9 you get proper feedback while 1.8 may simply stop working at some
point - and you may not even notice it quickly enough to avoid damage.

Kind regards

robert
Phillip Gawlowski (Guest)
on 2010-11-25 11:10
(Received via mailing list)
On Thu, Nov 25, 2010 at 2:05 AM, Adam Ms. <e148759@bsnow.net> wrote:
>
> This is not even wrong.
>
> From the definitive source:
> http://en.wikipedia.org/wiki/Division_by_zero

For certain values of "definitive", anyway.

> The IEEE floating-point standard, supported by almost all modern
> floating-point units, specifies that every floating point arithmetic
> operation, including division by zero, has a well-defined result. The
> standard supports signed zero, as well as infinity and NaN (not a
> number). There are two zeroes, +0 (positive zero) and -0 (negative zero)
> and this removes any ambiguity when dividing. In IEEE 754 arithmetic, a
> ÷ +0 is positive infinity when a is positive, negative infinity when a
> is negative, and NaN when a = ±0. The infinity signs change when
> dividing by -0 instead.

Yes, the IEEE 754 standard defines it that way.

The IEEE standard, however, does *not* define how mathematics works.
Mathematics does that. In math, x_0/0 is *undefined*. It is not
infinity (David kindly explained the difference between limits and
numbers), it is not negative infinity, it is undefined. Division by
zero *cannot* happen. If it could, we would be able to build, for
example, perpetual motion machines.

So, from a purely mathematical standpoint, the IEEE 754 standard is
wrong by treating the result of division by 0.0 any different than
dividing by 0 (since floats are only different in their nature to
*computers* representing everything in binary [which cannot represent
floating point numbers at all, much less any given irrational
number]).


--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.
Phillip Gawlowski (Guest)
on 2010-11-25 11:13
(Received via mailing list)
On Thu, Nov 25, 2010 at 10:45 AM, Robert Klemme
<shortcutter@googlemail.com> wrote:
>
> This may be true for the western world but I believe I remember one of
> our Japanese friends state that Unicode does not cover all Asian
> character sets completely; it could have been a remark about Java's
> implementation of Unicode though, I am not 100% sure.

Since UTF-8 is a subset of UTF-16, which in turn is a subset of
UTF-32, and Unicode is future-proofed (at least, ISO learned from the
mess created in the 1950s to 1960s) so that new glyphs won't ever
collide with existing glyphs, my point still stands. ;)

--
Phillip Gawlowski

Robert Klemme (Guest)
on 2010-11-25 12:56
(Received via mailing list)
On Thu, Nov 25, 2010 at 11:12 AM, Phillip Gawlowski
<cmdjackryan@googlemail.com> wrote:
> On Thu, Nov 25, 2010 at 10:45 AM, Robert Klemme
> <shortcutter@googlemail.com> wrote:
>>
>> This may be true for the western world but I believe I remember one of
>> our Japanese friends state that Unicode does not cover all Asian
>> character sets completely; it could have been a remark about Java's
>> implementation of Unicode though, I am not 100% sure.
>
> Since UTF-8 is a subset of UTF-16, which in turn is a subset of
> UTF-32,

I tried to find a more precise statement about this but did not really
succeed.  I thought all UTF-x were just different encoding forms of
the same universe of code points.

> and Unicode is future-proofed

Oh, so then ISO committee actually has a time machine?  Wow! ;-)

> (at least, ISO learned from the
> mess created in the 1950s to 1960s) so that new glyphs won't ever
> collide with existing glyphs, my point still stands. ;)

Well, I support your point anyway.  That was just meant as a caveat so
people are watchful (and test rather than believe). :-)  But as I
think about it, it more likely was a statement about Java's
implementation (because a char has only 16 bits which is not
sufficient for all Unicode code points).

Kind regards

robert
Phillip Gawlowski (Guest)
on 2010-11-25 13:42
(Received via mailing list)
On Thu, Nov 25, 2010 at 12:56 PM, Robert Klemme
<shortcutter@googlemail.com> wrote:
>
>> Since UTF-8 is a subset of UTF-16, which in turn is a subset of
>> UTF-32,
>
> I tried to find more precise statement about this but did not really
> succeed. I thought all UTF-x were just different encoding forms of
> the same universe of code points.

It's an implicit feature, rather than an explicit one:
Western languages get the first 8 bits for encoding. Glyphs going
beyond the Latin alphabet get the next 8 bits. If that isn't enough, an
additional 16 bits are used for encoding purposes.

Thus, UTF-8 is a subset of UTF-16 is a subset of UTF-16. Thus, also,
the future-proofing, in case even more glyphs are needed.


>> (at least, ISO learned from the
>> mess created in the 1950s to 1960s) so that new glyphs won't ever
>> collide with existing glyphs, my point still stands. ;)
>
> Well, I support your point anyway. That was just meant as a caveat so
> people are watchful (and test rather than believe). :-) But as I
> think about it it more likely was a statement about Java's
> implementation (because a char has only 16 bits which is not
> sufficient for all Unicode code points).

Of course, test your assumptions. But first, you need an assumption to
start from. ;)

--
Phillip Gawlowski

Robert Klemme (Guest)
on 2010-11-25 14:30
(Received via mailing list)
On Thu, Nov 25, 2010 at 1:37 PM, Phillip Gawlowski
<cmdjackryan@googlemail.com> wrote:
> It's an implicit feature, rather than an explicit one:
> Western languages get the first 8 bits for encoding. Glyphs going
> beyond the Latin alphabet get the next 8 bits. If that isn't enough, an
> additional 16 bits are used for encoding purposes.

What bits are you talking about here, bits of code points or bits in
the encoding?  It seems you are talking about bits of code points.
However, how these are put into any UTF-x encoding is a different
story and also because UTF-8 knows multibyte sequences it's not
immediately clear whether UTF-8 can only hold a subset of what UTF-16
can hold.

> Thus, UTF-8 is a subset of UTF-16 is a subset of UTF-16. Thus, also,
> the future-proofing, in case even more glyphs are needed.

Quoting from http://tools.ietf.org/html/rfc3629#section-3

Char. number range  |        UTF-8 octet sequence
   (hexadecimal)    |              (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

So we have for code point encoding

7 bits
6 + 5 = 11 bits
2 * 6 + 4 = 16 bits
3 * 6 + 3 = 21 bits

This makes 2164864 (0x210880) possible code points in UTF-8.  And the
pattern can be extended.

Looking at http://tools.ietf.org/html/rfc2781#section-2.1 we see that
UTF-16 (at least this version) supports code points up to 0x10FFFF.
This is less than what UTF-8 can hold theoretically.

Coincidentally 0x10FFFF has 21 bits which is what fits into UTF-8.
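
The arithmetic can be replayed in a few lines of Ruby. This is only a
sketch with names of my own choosing. Keep in mind that the sum counts
every bit pattern the four layouts allow, including overlong forms and
values above 0x10FFFF that RFC 3629 forbids, so it is an upper bound on
bit patterns rather than a count of distinct code points:

```ruby
# Payload bits (the "x" positions) for 1-, 2-, 3- and 4-octet UTF-8
# sequences, read off the RFC 3629 table.
PAYLOAD_BITS = [7, 5 + 6, 4 + 2 * 6, 3 + 3 * 6]

# Number of bit patterns if every "x" could vary freely.
def utf8_bit_patterns
  PAYLOAD_BITS.inject(0) { |sum, bits| sum + 2**bits }
end

puts PAYLOAD_BITS.inspect          # [7, 11, 16, 21]
puts utf8_bit_patterns             # 2164864
puts 0x10FFFF + 1                  # code points 0..0x10FFFF: 1114112
puts 0x10FFFF.to_s(2).length       # 21 bits
```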

I stay unconvinced that UTF-8 can handle a subset of code points of
the set UTF-16 can handle.

I also stay unconvinced that UTF-8 encodings are a subset of UTF-16
encodings.  This cannot be true because in UTF-8 the encoding unit is
one octet, while in UTF-16 it's two octets.  As a practical example
the sequence "a" will have length 1 octet in UTF-8 (because it happens
to be an ASCII character) and length 2 octets in UTF-16.

"All standard UCS encoding forms except UTF-8 have an encoding unit
larger than one octet, [...]"
http://tools.ietf.org/html/rfc3629#section-1

> Of course, test your assumptions. But first, you need an assumption to
> start from. ;)

:-)

Cheers

robert
James Edward Gray II (Guest)
on 2010-11-25 14:55
(Received via mailing list)
On Nov 25, 2010, at 6:37 AM, Phillip Gawlowski wrote:

> Thus, UTF-8 is a subset of UTF-16 is a subset of UTF-16. Thus, also,
> the future-proofing, in case even more glyphs are needed.

You are confusing us.

UTF-8, UTF-16, and UTF-32 are encodings of Unicode code points.  They
are all capable of representing all code points.  Nothing in this
discussion is a subset of anything else.
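
A short Ruby 1.9+ sketch of that point. The sample string and the
helper name codepoints_via are mine, chosen only for illustration:

```ruby
# U+0061, U+20AC (euro sign), U+1D11E (musical G clef)
SAMPLE = "a\u20AC\u{1D11E}"

# Transcode the string, then read the code points back out.
def codepoints_via(encoding, str)
  str.encode(encoding).codepoints
end

%w[UTF-8 UTF-16BE UTF-32BE].each do |enc|
  bytes = SAMPLE.encode(enc).bytesize
  puts "#{enc}: #{bytes} bytes, code points #{codepoints_via(enc, SAMPLE).inspect}"
end
```

The byte counts differ per encoding, while the list of code points is
the same every time.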

James Edward Gray II
James Edward Gray II (Guest)
on 2010-11-25 15:08
(Received via mailing list)
On Nov 25, 2010, at 5:56 AM, Robert Klemme wrote:

> But as I think about it it more likely was a statement about Java's
> implementation (because a char has only 16 bits which is not
> sufficient for all Unicode code points).

I believe you are referring to the complaints the Asian cultures
sometimes raise against Unicode.  If so, I'll try to recap the issues,
as I understand them.

First, Unicode is a bit larger than their native encodings.  Typically
they get everything they need into two bytes where Unicode requires more
for their languages.

The Unicode team also made some controversial decisions that affected
the Asian languages, like Han Unification
(http://en.wikipedia.org/wiki/Han_unification).

Finally, they have a lot of legacy data in their native encodings and
perfect conversion is sometimes tricky due to some context sensitive
issues.

I think the Asian cultures have warmed a bit to Unicode over time (my
opinion only), but it's important to remember that adopting it involved
more challenges for them.

James Edward Gray II
Robert Klemme (Guest)
on 2010-11-25 15:24
(Received via mailing list)
On Thu, Nov 25, 2010 at 3:07 PM, James Edward Gray II
<james@graysoftinc.com> wrote:
> The Unicode team also made some controversial decisions that affected
> the Asian languages, like Han Unification
> (http://en.wikipedia.org/wiki/Han_unification).
>
> Finally, they have a lot of legacy data in their native encodings and
> perfect conversion is sometimes tricky due to some context sensitive
> issues.

James, thanks for the summary.  It is much appreciated.

> I think the Asian cultures have warmed a bit to Unicode over time (my
> opinion only), but it's important to remember that adopting it involved
> more challenges for them.

I believe that is in part due to our western ignorance.  If we would
deal with encodings properly we would probably feel a similar pain -
at least it would cause more pain for us.  I have frequently seen i18n
aspects being ignored (my pet peeve is time zones).  Usually this
breaks your neck as soon as people from other cultures start using
your application - or such simple things happen as a change of a
database server's timezone which then differs from the application
server's. :-)

Kind regards

robert
Philip Rhoades (Guest)
on 2010-11-25 15:49
(Received via mailing list)
James,


On 2010-11-26 00:55, James Edward Gray II wrote:
> On Nov 25, 2010, at 6:37 AM, Phillip Gawlowski wrote:
>
>> Thus, UTF-8 is a subset of UTF-16 is a subset of UTF-16. Thus,
>> also, the future-proofing, in case even more glyphs are needed.
>
> You are confusing us.
>
> UTF-8, UTF-16, and UTF-32 are encodings of Unicode code points.  They
> are all capable of representing all code points.  Nothing in this
> discussion is a subset of anything else.


This is all really interesting but I don't understand what you mean by
"code points" - is what you have said expressed diagrammatically
somewhere?

Thanks,

Phil.
--
Philip Rhoades

GPO Box 3411
Sydney NSW  2001
Australia
E-mail:  phil@pricom.com.au
Robert Klemme (Guest)
on 2010-11-25 16:01
(Received via mailing list)
On Thu, Nov 25, 2010 at 3:48 PM, Philip Rhoades <phil@pricom.com.au>
wrote:
>> You are confusing us.
>>
>> UTF-8, UTF-16, and UTF-32 are encodings of Unicode code points. They
>> are all capable of representing all code points. Nothing in this
>> discussion is a subset of anything else.
>
>
> This is all really interesting but I don't understand what you mean by "code
> points" - is what you have said expressed diagrammatically somewhere?

A "code point" is basically a unique identifier of a special symbol.

http://en.wikipedia.org/wiki/Unicode

http://www.unicode.org/
http://www.unicode.org/charts/About.html
http://www.unicode.org/charts/

HTH

robert
Yuri Tzara (yuritzara)
on 2010-11-25 16:02
Phillip Gawlowski wrote in post #963815:
> The IEEE standard, however, does *not* define how mathematics work.
> Mathematics does that. In math, x_0/0 is *undefined*. It is not
> infinity...

What psychological anomaly causes creationists to keep saying that there
are no transitional fossils even after having been shown transitional
fossils? We might pass it off as mere cult indoctrination or
brainwashing, but the problem is a more general one.

We also see it happening here in Mr. Gawlowski who, after being given
mathematical facts about infinity, simply repeats his uninformed
opinion.

"The Dunning-Kruger effect is a cognitive bias in which an unskilled
person makes poor decisions and reaches erroneous conclusions, but
their incompetence denies them the metacognitive ability to realize
their mistakes." (http://en.wikipedia.org/wiki/Dunning-Kruger_effect)

Here is my initial response to Mr. Gawlowski. Let's see if he ignores
it again (as a creationist ignores transitional fossils).

> It is perfectly reasonable, mathematically, to assign infinity to
> 1/0.  To geometers and topologists, infinity is just another
> point. Look up the one-point compactification of R^n. If we join
> infinity to the real line, we get a circle, topologically. Joining
> infinity to the real plane gives a sphere, called the Riemann
> sphere. These are rigorous definitions with useful results.
>
> I'm glad that IEEE floating point has infinity included, otherwise I
> would run into needless error handling. It's not an error to reach
> one pole of a sphere (the other pole being zero).
>
> Infinity is there for good reason; its presence was well-considered
> by the quite knowledgeable IEEE designers.
James Edward Gray II (Guest)
on 2010-11-25 16:08
(Received via mailing list)
On Nov 25, 2010, at 8:48 AM, Philip Rhoades wrote:

>>
>> UTF-8, UTF-16, and UTF-32 are encodings of Unicode code points.  They
>> are all capable of representing all code points.  Nothing in this
>> discussion is a subset of anything else.
>
>
> This is all really interesting but I don't understand what you mean by "code
> points" - is what you have said expressed diagrammatically somewhere?

Do these explanations help?

  http://blog.grayproductions.net/articles/what_is_a...

  http://blog.grayproductions.net/articles/the_unico...

James Edward Gray II
Manuel Kiessling (Guest)
on 2010-11-25 16:18
(Received via mailing list)
Dear Yuri,

maybe being a bit more friendly and respecting would help this
discussion.


On 25.11.10 16:02, Yuri Tzara wrote:
Phillip Gawlowski (Guest)
on 2010-11-25 17:40
(Received via mailing list)
On Thu, Nov 25, 2010 at 4:02 PM, Yuri Tzara <yuri.tzara@gmail.com>
wrote:
>
> "The Dunning-Kruger effect is a cognitive bias in which an unskilled
> person makes poor decisions and reaches erroneous conclusions, but
> their incompetence denies them the metacognitive ability to realize
> their mistakes." (http://en.wikipedia.org/wiki/Dunning-Kruger_effect)

Your insult aside:
http://www.wolframalpha.com/input/?i=1/x

I'm quite aware that IEEE 754 defines the result of x_0/0 as infinity.
That is not, however, correct *in a mathematical sense*.
IEEE 754, for example, also defines the result of the square root of
-1 as an error. However, the result of, say, the square root of -x (for
positive x) is sqrt(x) * j. That's called complex numbers, BTW.
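
In Ruby terms, a rough sketch (Ruby 1.9.2 or later). The names
real_sqrt_fails? and J are mine, with J standing in for the imaginary
unit. For x = 4 the square root of -4 is 2j, since (2j)**2 is -4:

```ruby
# Math.sqrt only works on the reals and raises for negative input.
def real_sqrt_fails?(x)
  Math.sqrt(x)
  false
rescue Math::DomainError
  true
end

J = Complex(0, 1)   # the imaginary unit, written j in engineering notation

puts real_sqrt_fails?(-1.0)   # true
puts J * J                    # -1+0i
puts((2 * J)**2 == -4)        # true
```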


Anyway:
Thanks, James, for correcting me on UTF-8, etc.

--
Phillip Gawlowski

Oliver Schad (Guest)
on 2010-11-25 18:06
(Received via mailing list)
Robert Klemme wrote:

>> Since UTF-8 is a subset of UTF-16, which in turn is a subset of
>> UTF-32,
>
> I tried to find more precise statement about this but did not really
> succeed.  I thought all UTF-x were just different encoding forms of
> the same universe of code points.

Yes this is correct. Many people don't get the difference between a
charset and the corresponding encoding.

Unicode is a charset not with one encoding but with many encodings. So
we talk about the same characters and different mappings of these
characters to bits and bytes. This mapping is a simple table which you
can write down on a sheet of paper, and the side with the characters
will always be the same with UTF-8, UTF-16 and UTF-32.

The encodings UTF-8, UTF-16 and UTF-32 were built for different
purposes. The number after UTF says nothing about the maximum length;
in the first place it says something about the shortest length (and
often about the usual length if you use the encoding in the situation
it was built for).

So if people coming from the ASCII world use UTF-8, many encodings
(mappings of a character to a sequence of bits and bytes) of characters
will fit inside one byte.

UTF-32 is a bit different: in this case each encoded character has a
fixed size of 32 bits.

>> and Unicode is future-proofed
>
> Oh, so then ISO committee actually has a time machine?  Wow! ;-)

Read it as: it has much encoding space left and none of us knows how
you could fill the whole space. But humans tend to be wrong.

Regards
Oli
Oliver Schad (Guest)
on 2010-11-25 18:25
(Received via mailing list)
Phillip Gawlowski wrote:

> I'm quite aware that IEEE 754 defines the result of x_0/0 as infinity.

The point is that you can't guarantee that you have a 0 with floating
point arithmetic. You have to read every value as "as close as possible
to the intended value for this machine and this number size".

The machines are not perfect; they can't work in a mathematically
correct way in many situations.

And you have to deal with this situation - all people doing numeric
things know that. Doing numeric calculations with a computer means
calculating something as closely as possible in the given environment
and requirements.

So in this sense, dividing a number by zero on a real computer means
dividing something which is close to the number I mean by something
which is close to zero.

And in fact it's not a bad idea to define that if you have something
which is very close to zero (because you don't know if it's exactly
zero), you should treat it as very close to zero and not as zero itself.

In a perfect world with perfect computers which have infinite registers,
infinite storage and infinitely fast CPUs, computers would know that
zero is equal to zero. But in this world computers only know that zero
is close to zero.
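
A small Ruby illustration of a value that ought to be zero but is not.
The constant name RESIDUE is just for this sketch:

```ruby
# Mathematically this is exactly zero; in binary floating point it is not.
RESIDUE = (0.1 + 0.2) - 0.3

puts RESIDUE         # a tiny non-zero value
puts RESIDUE.zero?   # false
puts 1.0 / RESIDUE   # very large, but finite
puts 1.0 / 0.0       # Infinity
```

Dividing by the rounding residue gives a huge finite number, and
dividing by a true floating point zero gives Infinity, which is the
natural continuation of that trend.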

Regards
Oli
Robert Klemme (Guest)
on 2010-11-25 18:31
(Received via mailing list)
On Thu, Nov 25, 2010 at 6:05 PM, Oliver Schad
<spam.entfernen.und.bring.gefaelligst.ein.bier.mit@oschad.de> wrote:
>>>> implementation of Unicode though, I am not 100% sure.
>>>
>>> Since UTF-8 is a subset of UTF-16, which in turn is a subset of
>>> UTF-32,
>>
>> I tried to find more precise statement about this but did not really
>> succeed. I thought all UTF-x were just different encoding forms of
>> the same universe of code points.
>
> Yes this is correct. Many people don't get the difference between a
> charset and the corresponding encoding.

Btw, this happens all the time: for example, people often do not grasp
the difference between "point in time" and "representation of a point
in time in a particular time zone and locale".  This becomes an issue
if you want to calculate with timestamps at or near the DST change
time. :-)

> built for).
More precisely the number indicates the "encoding unit" (see my quote
in an earlier posting).  One could think up an encoding with encoding
unit of 1 octet (8 bits, 1 byte) where the shortest length would be 2
octets.  Example

1st octet: number of octets to follow
2nd and subsequent octets: encoded character

The shortest length would be 2 octets, but the length would increase
by 1 octet so the encoding unit is 1 octet.

Cheers

robert
"Jörg W Mittag" <JoergWMittag+Ruby@GoogleMail.Com> (Guest)
on 2010-11-25 19:30
(Received via mailing list)
James Edward Gray II wrote:
> On Nov 24, 2010, at 8:40 PM, Jörg W Mittag wrote:
>> The only two Unicode encodings that are fixed-width are the obsolete
>> UCS-2 (which can only encode the lower 65536 codepoints) and UTF-32.
> And even UTF-32 would have the complications of "combining characters."

.. and zero-width characters and different representations of the same
character and ...

But that is a whole different can of worms.

jwm
Yuri Tzara (yuritzara)
on 2010-11-25 19:43
Phillip Gawlowski wrote in post #963922:
> I'm quite aware that IEEE 754 defines the result of x_0/0 as
> infinity. That is not, however, correct *in a mathematical sense*.

Yes, it is. The creationist analogy continues, I see. Even when given
facts which refute his position a second and third time, Mr. Gawlowski
continues to ignore them while simply repeating his opinion.

He twice ignored my suggestion to look up one-point
compactifications. He twice ignored my explanation of joining oo to a
line to make a circle, and of joining oo to a plane to make a
sphere. He also ignored the same information found in the link
provided by Adam.

It's not coincidence that the IEEE operations on oo match the rules
for the real projective line.

http://en.wikipedia.org/wiki/Real_projective_line#...

The only exception is the result of oo + oo, however in computing it
is convenient for that to be oo rather than undefined. Of course
there's the +/- distinction for oo (also convenient in computing)
which is eliminated by identifying +oo with -oo, as one would expect
for a circle. There are perhaps other technicalities but generally
IEEE operations are modeled after the real projective line. This is
useful.

http://en.wikipedia.org/wiki/One-point_compactification
http://en.wikipedia.org/wiki/Riemann_sphere
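
The IEEE behaviour described above can be checked from Ruby 1.9.2 or
later, where Float::INFINITY is available. A minimal sketch; the
constant name INF is mine:

```ruby
INF = Float::INFINITY

puts 1.0 / INF              # 0.0
puts INF + INF              # Infinity (undefined on the projective line)
puts((INF - INF).nan?)      # true
puts((0.0 * INF).nan?)      # true
puts INF == -INF            # false: IEEE keeps +oo and -oo apart
```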
Phillip Gawlowski (Guest)
on 2010-11-25 20:04
(Received via mailing list)
On Thu, Nov 25, 2010 at 6:25 PM, Oliver Schad
<spam.entfernen.und.bring.gefaelligst.ein.bier.mit@oschad.de> wrote:
> Phillip Gawlowski wrote:
>
>> I'm quite aware that IEEE 754 defines the result of x_0/0 as infinity.
>
> The point is that you can't guarantee that you have a 0 with floating
> point arithmetic.

You can. In mathematics. The problem is, as you pointed out, that a 32
bit (or 64 bit, or n bit where n is finite) CPU isn't able to represent
floating point numbers accurately enough.

However, 0 = 0.0 (no matter how much Yuri moves the goal posts).

You run into this issue once you leave the defined space for IEEE
Floats (~10^-44 for negative floats), *then* you enter very wonky
areas.

But a flat 0.0 is only non-zero for computers. But not in maths. Ergo,
from a mathematical standpoint, the IEEE standard is broken.

> Every value you have to read as "as close as possible
> to the meant value for this machine and this number size".

IOW: It's a limit (and an approximate one at that, but it's "good
enough" for pretty much all purposes).

> The machines are not perfect; they can't work in a mathematically
> correct way in many situations.

Only in floats, and with integers that are larger than the total
address space. But then we have the problem of the required CPU time
to consider.

> And you have to deal with this situation - all people doing numeric
> things know that. Doing numeric calculations with a computer means to
> calculate something as near as possible in the given environment and
> requirements.

Indeed. The problem is if the desired accuracy is much more exact than
the IEEE float defines.

> So in this sense, dividing a number by zero on a real computer means
> dividing something which is close to the number I mean by something
> which is close to zero.

Not quite. Integer division is behaving properly, Float isn't, even
with only one significant digit.

> And in fact it's not a bad idea to define if you have something which is
> very close to zero (because you don't know if it's exactly zero), you
> should treat it as very close to zero and not zero itself.

Well, infinity isn't close to zero, either. ;)

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.
Phillip Gawlowski (Guest)
on 2010-11-25 20:10
(Received via mailing list)
On Thu, Nov 25, 2010 at 7:43 PM, Yuri Tzara <yuri.tzara@gmail.com> wrote:
> line to make a circle, and of joining oo to a plane to make a
> there's the +/- distinction for oo (also convenient in computing)
> which is eliminated by identifying +oo with -oo, as one would expect
> for a circle. There are perhaps other technicalities but generally
> IEEE operations are modeled after the real projective line. This is
> useful.

Quote Wikipedia:
"Unlike most mathematical models of the intuitive concept of 'number',
this structure allows division by zero [snip formula], for nonzero a.
This structure, however, is not a field, and *division does not retain
its original algebraic meaning in it*."

Emphasis mine. Your argument is also called "moving the goal posts".
But even if we consider it: non-algebraic systems are not something
99% of all non-professional-mathematicians engage in (so, we can toss
in a "no true Scotsman" fallacy into the bargain).


--
Phillip Gawlowski

Yuri Tzara (yuritzara)
on 2010-11-25 21:19
Phillip, regarding defining 1/0 you said,

> It cannot be infinity. It does, quite literally not compute. There's
> no room for interpretation, it's a fact of (mathematical) life that
> something divided by nothing has an undefined result. It doesn't
> matter if it's 0, 0.0, or -0.0. Undefined is undefined.

Nonsense. These claims are roundly refuted by

http://en.wikipedia.org/wiki/Extended_real_number_...

IEEE floating point operations on oo and -oo match those *precisely*.
IEEE models the extended real number line.
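
A quick, hedged way to look at this from Ruby (Float::INFINITY is available from 1.9.2 on; the variable names are just for illustration). Note that the forms the extended real line leaves undefined, such as oo - oo and 0 * oo, come back as NaN:

```ruby
inf = Float::INFINITY

absorbs_addition = (inf + 1 == inf)   # oo + a = oo
below_everything = (-inf < -1e308)    # -oo is less than every finite float
reciprocal       = 1.0 / inf          # a / oo = 0
difference       = inf - inf          # undefined form, reported as NaN
product          = 0.0 * inf          # undefined form, reported as NaN

puts absorbs_addition    # true
puts below_everything    # true
puts reciprocal          # 0.0
puts difference.nan?     # true
puts product.nan?        # true
```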

> I'm quite aware that IEEE 754 defines the result of x_0/0 as
> infinity.  That is not, however, correct *in a mathematical sense*.

Nonsense. Infinity defined this way has solid mathematical meaning and
is established on a firm foundation, described in the link above.

> The IEEE standard, however, does *not* define how mathematics work.
> Mathematics does that. In math, x_0/0 is *undefined*. It is not
> infinity...

Right, IEEE does not define how mathematics works. IEEE took the
mathematical definition and properties of infinity and incorporated it
into the standard. Clearly, you were unaware of this and repeatedly
ignored the information offered to you about it.

> Quote Wikipedia:
> "Unlike most mathematical models of the intuitive concept of 'number',
> this structure allows division by zero [snip formula], for nonzero a.
> This structure, however, is not a field, and *division does not retain
> its original algebraic meaning in it*."
>
> Emphasis mine.

That sentence. You evidently do not understand what it means. It does
not mean what you think it means.

> Your argument is also called "moving the goal posts".  But even if
> we consider it: non-algebraic systems are not something 99% of all
> non-professional-mathematicians engage in (so, we can toss in a "no
> true Scotsman" fallacy into the bargain).

Nonsense. Every person who has obtained a result of +oo or -oo from a
floating point calculation has engaged in it. A result of +oo or -oo
is often a meaningful answer and not an error. And even when it is an
error, it gives us information on what went wrong (and which direction
it went wrong in). It's entertainingly ironic that you attribute
"moving the goalposts" and the no true Scotsman fallacy to the wrong
person in this conversation.

Thanks for another great demonstration of the Dunning-Kruger effect.
Phillip Gawlowski (Guest)
on 2010-11-25 22:14
(Received via mailing list)
On Thu, Nov 25, 2010 at 9:19 PM, Yuri Tzara <yuri.tzara@gmail.com> wrote:
> Phillip, regarding defining 1/0 you said,
>
>> It cannot be infinity. It does, quite literally not compute. There's
>> no room for interpretation, it's a fact of (mathematical) life that
>> something divided by nothing has an undefined result. It doesn't
>> matter if it's 0, 0.0, or -0.0. Undefined is undefined.
>
> Nonsense. These claims are roundly refuted by

Actually, they aren't. I said "for x_0/0 the result is undefined", and
the Extended Real number line has the caveat that x_0 must be != 0.

>> I'm quite aware that IEEE 754 defines the result of x_0/0 as
>> infinity. That is not, however, correct *in a mathematical sense*.
>
> Nonsense. Infinity defined this way has solid mathematical meaning and
> is established on a firm foundation, described in the link above.

A firm foundation that is not used in algebraic math.

>> The IEEE standard, however, does *not* define how mathematics work.
>> Mathematics does that. In math, x_0/0 is *undefined*. It is not
>> infinity...
>
> Right, IEEE does not define how mathematics works. IEEE took the
> mathematical definition and properties of infinity and incorporated it
> into the standard. Clearly, you were unaware of this and repeatedly
> ignored the information offered to you about it.

It took *a* definition and *a* set of properties. If we are splitting
hairs, let's do it properly, at least.

>> Quote Wikipedia:
>> "Unlike most mathematical models of the intuitive concept of 'number',
>> this structure allows division by zero [snip formula], for nonzero a.
>> This structure, however, is not a field, and *division does not retain
>> its original algebraic meaning in it*."
>>
>> Emphasis mine.
>
> That sentence. You evidently do not understand what it means. It does
> not mean what you think it means.

You do know what algebra is, yes?

> "moving the goalposts" and the no true Scotsman fallacy to the wrong
> person in this conversation.

Pal, in algebraic maths, division by zero is undefined. End of story.
We are talking about algebraic math here (or we can extend this to
include complex numbers, which IEEE 754 doesn't deal with, either),
and not special areas of maths that aren't used outside of research
papers. Not to mention that I established the set of Irrational
numbers as the upper bound quite early on.

And your argument "if you use floats on a computer you use a
non-algebraic system, therefore you use a non-algebraic system when
using a computer" is circular.

The result of x_0/0.0 = infinity is as meaningful as "0/0 = NaN". Any
feedback by a computer system is meaningful (by definition), and can
be used to act on this output:

result = "Error: Division by zero" if (a / 0.0).infinite?

Done.
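
For what it's worth, a hedged sketch of acting on such output in Ruby. `safe_divide` is a hypothetical helper, not part of any library; it relies only on Float#nan? and Float#infinite?. An infinite result can also come from overflow, so the message says so:

```ruby
# Hypothetical helper: divide as floats and report non-finite results.
def safe_divide(a, b)
  q = a.to_f / b
  return "Error: result is not a number" if q.nan?
  return "Error: infinite result (division by zero or overflow)" if q.infinite?
  q
end

puts safe_divide(6, 3)   # 2.0
puts safe_divide(1, 0)   # Error: infinite result (division by zero or overflow)
puts safe_divide(0, 0)   # Error: result is not a number
```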

> Thanks for another great demonstration of the Dunning-Kruger effect.

Ah, the irony.

--
Phillip Gawlowski

Jos Backus (Guest)
on 2010-11-25 22:19
(Received via mailing list)
This discussion has little to do with Ruby at this point. Maybe you
folks could take it offline, please?

Happy Thanksgiving, everybody!

Jos
Clifford Heath (Guest)
on 2010-11-25 23:28
(Received via mailing list)
James Edward Gray II wrote:
> UTF-8, UTF-16, and UTF-32 are encodings of Unicode code points.  They are all
> capable of representing all code points.  Nothing in this discussion is a subset
> of anything else.

To add to this, Unicode 3 uses the codespace from 0 to 0x10FFFF (not
0xFFFFFFFF), so it does cover all the Oriental characters (unlike
Unicode 2 as implemented in earlier Java versions, which only covers
0..0xFFFF). It even has codepoints for Klingon and Elvish!

UTF-8 requires four bytes to encode a 21-bit number (enough to encode
0x10FFFF) though if you extend the pattern (as many implementations do)
it has a 31-bit gamut.

UTF-16 encodes the additional codespace using surrogate pairs, which is
a pair of 16-bit numbers each carrying a 10-bit payload. Because it's
still a variable length encoding, it's just as painful to work with as
UTF-8, but less space-efficient.
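
A small Ruby 1.9+ sketch of both points, using U+1D11E (MUSICAL SYMBOL G CLEF) as an arbitrary example of a code point above 0xFFFF. The variable names are made up for the illustration:

```ruby
# U+1D11E lies outside the 16-bit range.
clef = "\u{1D11E}"

utf8_bytes  = clef.encode("UTF-8").bytes.to_a
utf16_units = clef.encode("UTF-16BE").unpack("n*")

puts utf8_bytes.length                               # 4
puts utf16_units.map { |u| "%04X" % u }.join(" ")    # D834 DD1E

# Each surrogate carries 10 payload bits; recombine them by hand.
high, low = utf16_units
codepoint = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
puts "%X" % codepoint                                # 1D11E
```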

Both UTF-8 and UTF-16 encodings allow you to look at any location in a
string and step forward or back to the nearest character boundary - a
very important property that was missing from Shift-JIS and other
earlier encodings.
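
For UTF-8 this resynchronisation is easy to sketch, because continuation bytes always have the bit pattern 10xxxxxx. `char_start` below is a hypothetical helper written for this illustration, not a core method:

```ruby
# Hypothetical helper: move a byte offset back to the first byte of the
# UTF-8 character that contains it. Continuation bytes look like 10xxxxxx.
def char_start(str, byte_index)
  i = byte_index
  i -= 1 while i > 0 && (str.getbyte(i) & 0xC0) == 0x80
  i
end

s = "a\u00E9\u20ACb"   # "a", e-acute (2 bytes), euro sign (3 bytes), "b"

puts char_start(s, 2)  # 1, the start of the 2-byte character
puts char_start(s, 5)  # 3, the start of the 3-byte character
puts char_start(s, 6)  # 6, already on a boundary
```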

If you go back to 2003 in the archives, you'll see I engaged in a long
and somewhat heated discussion about this subject with Matz and others
back then. I'm glad we finally have a Ruby version that can at least do
this stuff properly, even though I think it's over-complicated.

Clifford Heath.
Clifford Heath (Guest)
on 2010-11-25 23:41
(Received via mailing list)
Robert Klemme wrote:
> Btw, this happens all the time: for example, people often do not grasp
> the difference between "point in time" and "representation of a point
> in time in a particular time zone and locale".

... on a particular relativistic trajectory ;-) Seriously though,
time dilation effects are accounted for in every GPS unit, because
the relative motion of the satellites gives each one its own timeline
which affects the respective position fixes.

Clifford Heath
James Edward Gray II (Guest)
on 2010-11-26 01:40
(Received via mailing list)
On Nov 25, 2010, at 4:25 PM, Clifford Heath wrote:

> Both UTF-8 and UTF-16 encodings allow you to look at any location in a string
> and step forward or back to the nearest character boundary - a very important
> property that was missing from Shift-JIS and other earlier encodings.

This also provides a kind of simple checksum for validating the
encoding.  I love that feature.

James Edward Gray II
David Masover (Guest)
on 2010-11-26 01:43
(Received via mailing list)
On Wednesday, November 24, 2010 08:40:22 pm Jörg W Mittag wrote:
> David Masover wrote:
> > Java at least did this sanely -- UTF16 is at least a fixed width. If
> > you're going to force a single encoding, why wouldn't you use
> > fixed-width strings?
>
> Actually, it's not.

Whoops, my mistake. I guess now I'm confused as to why they went with
UTF-16 -- I always assumed it simply truncated things which can't be
represented in 16 bits.

> You can produce corrupt strings and slice into a half-character in
> Java just as you can in Ruby 1.8.

Wait, how?

I mean, yes, you can deliberately build strings out of corrupt data, but
if you actually work with complete strings and string concatenation, and
you aren't doing crazy JNI stuff, and you aren't digging into the actual
bits of the string, I don't see how you can create a truncated string.

> > The whole point of having multiple encodings in the first place is that
> > other encodings make much more sense when you're not in the US.
>
> There's also a lot of legacy data, even within the US. On IBM systems,
> the standard encoding, even for greenfield systems that are being
> written right now, is still pretty much EBCDIC all the way.

I'm really curious why anyone would go with an IBM mainframe for a
greenfield system, let alone pick EBCDIC when ASCII is fully supported.

> And now there's a push for a One Encoding To Rule Them All in Ruby 2.
> That's *literally* insane! (One definition of insanity is repeating
> behavior and expecting a different outcome.)

Wait, what?

I've been out of the loop for awhile, so it's likely that I missed this,
but where are these plans?
Phillip Gawlowski (Guest)
on 2010-11-26 12:52
(Received via mailing list)
On Fri, Nov 26, 2010 at 1:42 AM, David Masover <ninja@slaphack.com> wrote:
>
> I'm really curious why anyone would go with an IBM mainframe for a greenfield
> system, let alone pick EBCDIC when ASCII is fully supported.

Because that's how the other applications written on the mainframe the
company bought 20, 30, 40 years ago expect their data, and the same
code *still runs*.

Legacy systems like that have so much money invested in them, with
code poorly understood (not necessarily because it's *bad* code, but
because the original author has retired 20 years ago), and are so
mission critical, that a replacement in a more current design is out
of the question.

Want perpetual job security? Learn COBOL.

--
Phillip Gawlowski

Yuri Tzara (yuritzara)
on 2010-11-26 16:32
The big picture is that IEEE floating point is solidly grounded in
mathematics regarding infinity. Phillip wants to convince us that this
is not the case. He wants us to believe that the design of floating
point regarding infinity is wrong and that he knows better. He is
mistaken. That is all you need to know. Details follow.

The most direct refutation of his claims comes from the actual reason
why infinity was included in floating point:

http://docs.sun.com/source/806-3568/ncg_goldberg.html#918

Infinity prevents wildly incorrect results. It also removes the need
to check certain special cases.
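
A hedged sketch of both effects, using the function x/(x^2 + 1) as an illustration. The method names are invented for this example. Rewriting the expression as 1/(x + 1/x) avoids the overflow for large x, and thanks to infinity arithmetic it needs no special case for x = 0:

```ruby
# Straightforward form: x * x overflows to Infinity for large x.
def naive(x)
  x / (x * x + 1.0)
end

# Rewritten form: relies on 1.0 / 0.0 being Infinity when x is zero.
def rewritten(x)
  1.0 / (x + 1.0 / x)
end

puts rewritten(0.0)     # 0.0, via 1.0 / Infinity, no check for x == 0
puts naive(1e200)       # 0.0, because x * x overflowed to Infinity
puts rewritten(1e200)   # a tiny positive number close to 1/x
```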

Now it happens that floating point is backed by a mathematical model:
the extended real line. Phillip tells us that the extended real line
is only useful for the 1% of programmers who are mathematicians. He is
wrong. It is used every time infinity prevents an incorrect result or
simplifies a calculation. The mathematics behind floating point design
is slightly more than elementary, but that does not mean every
programmer is required to have full knowledge of it.

What follows is an examination of Phillip's descent into absurdity,
apparently caused by a compelling need to justify the mantras he learned
in high school. If you are interested in the psychological phenomenon
of cognitive dissonance, or if you still think that Phillip is being
coherent, then keep reading.

This conversation began when Phillip said about 1/0,

> It cannot be infinity. It does, quite literally not compute. There's
> no room for interpretation, it's a fact of (mathematical) life that
> something divided by nothing has an undefined result. It doesn't
> matter if it's 0, 0.0, or -0.0. Undefined is undefined.
>
> That other languages have the same issue makes matters worse, not
> better (but at least it is consistent, so there's that).

It's clear here that Phillip is unaware that IEEE floating point was
designed to approximate the affinely extended real numbers, which has
a rigorous definition of infinity along with operations upon it.

http://mathworld.wolfram.com/AffinelyExtendedRealN...

Floating point infinity obeys all the rules laid out there. Also
notice the last paragraph in that link.

> The IEEE standard, however, does *not* define how mathematics work.
> Mathematics does that. In math, x_0/0 is *undefined*. It is not
> infinity (David kindly explained the difference between limits and
> numbers), it is not negative infinity, it is undefined. Division by
> zero *cannot* happen...
>
> So, from a purely mathematical standpoint, the IEEE 754 standard is
> wrong by treating the result of division by 0.0 any different than
> dividing by 0...

Here Phillip further confirms that he is unaware that IEEE used the
mathematical definition of the extended reals. He thinks infinity was
defined on the whim of the IEEE designers. No, mathematics told them
how it worked.

This conversation only continues because Phillip is trying desperately
to cover up his ignorance rather than simply acknowledging it and moving
on.

I was polite when I corrected him the first time. However, when he
ignored this correction along with a similar one by Adam, obstinately
repeating his mistaken belief instead, directness became necessary.
For whatever reason he is compelled to "fake" expertise in
this area despite being repeatedly exposed for doing so. To wit:

> > Infinity defined this way has solid mathematical meaning and is
> > established on a firm foundation, described in the link above.
>
> A firm foundation that is not used in algebraic math.

This sentence is not even meaningful. What is "algebraic math"? That
phrase makes no sense, especially to a mathematician. The extended
reals is of course an algebraic structure with algebraic properties,
so whatever "algebraic math" means here must apply to the extended
reals.

> > Right, IEEE does not define how mathematics works. IEEE took the
> > mathematical definition and properties of infinity and
> > incorporated it into the standard. Clearly, you were unaware of
> > this and repeatedly ignored the information offered to you about
> > it.
>
> It took *a* definition and *a* set of properties. If we are
> splitting hairs, let's do it properly, at least.

The affinely extended reals is the two-point compactification of the
real line. The real projective line is the one-point compactification
of the real line. These compactifications are _unique_. In a desperate
display of backpedaling, Phillip only succeeds in confirming his
ignorance of the topic about which he claims expertise.

> Pal, in algebraic maths, division by zero is undefined. End of
> story.  We are talking about algebraic math here...

More nonsensical "algebraic math" terminology. What is this? Do you
mean algebraic numbers? No, you can't mean that, since floating point
is used for approximating transcendentals as well. Again, the extended
reals is an algebraic structure with algebraic properties. Your
introduction of the term "algebraic math" is just more backpedaling
done in a manifestly incompetent way. In trying to move the goalposts,
the goalposts fell on your head. As if that wasn't bad enough, you
absurdly claim that I was moving goalposts.

For more on what happened to Phillip, see the Dunning-Kruger
effect. Don't let it happen to you.

http://en.wikipedia.org/wiki/Dunning-Kruger_effect
David Masover (Guest)
on 2010-11-27 09:05
(Received via mailing list)
On Friday, November 26, 2010 05:51:38 am Phillip Gawlowski wrote:
> On Fri, Nov 26, 2010 at 1:42 AM, David Masover <ninja@slaphack.com> wrote:
> > I'm really curious why anyone would go with an IBM mainframe for a
> > greenfield system, let alone pick EBCDIC when ASCII is fully supported.
>
> Because that's how the other applications written on the mainframe the
> company bought 20, 30, 40 years ago expect their data, and the same
> code *still runs*.

In other words, not _quite_ greenfield, or at least, a somewhat
different sense of greenfield.

But I guess that explains why you're on a mainframe at all. Someone put
their data there 20, 30, 40 years ago, and you need to get at that data,
right?

> Legacy systems like that have so much money invested in them, with
> code poorly understood (not necessarily because it's *bad* code, but
> because the original author has retired 20 years ago),

Which implies bad code, bad documentation, or both. Yes, having the
original author available tends to make things easier, but I'm not sure
I'd know what to do with the code I wrote 1 year ago, let alone 20,
unless I document the hell out of it.

> Want perpetual job security? Learn COBOL.

I considered that...

It'd have to be job security plus a large enough paycheck I could either
work very part-time, or retire in under a decade. Neither of these seems
likely, so I'd rather work with something that gives me job
satisfaction, which is why I'm doing Ruby.
Phillip Gawlowski (Guest)
on 2010-11-27 18:42
(Received via mailing list)
On Sat, Nov 27, 2010 at 9:04 AM, David Masover <ninja@slaphack.com> wrote:
> sense of greenfield.
You don't expect anyone to throw their older mainframes away, do you? ;)

> But I guess that explains why you're on a mainframe at all. Someone put their
> data there 20, 30, 40 years ago, and you need to get at that data, right?

Oh, don't discard mainframes. For a corporation the size of SAP (or
needing SAP software), a mainframe is still the ideal hardware to
manage the enormous databases collected over the years.

And mainframes with vector CPUs are ideal for all sorts of simulations
engineers have to do (like aerodynamics), or weather research.

>> Legacy systems like that have so much money invested in them, with
>> code poorly understood (not necessarily because it's *bad* code, but
>> because the original author has retired 20 years ago),
>
> Which implies bad code, bad documentation, or both. Yes, having the original
> author available tends to make things easier, but I'm not sure I'd know what
> to do with the code I wrote 1 year ago, let alone 20, unless I document the
> hell out of it.

It gets worse 20 years down the line: The techniques used and the state
of the art then are forgotten now. For example, nobody uses GOTO any
more (or should use it, anyway), and error handling is done with
exceptions these days instead of error codes. And TDD didn't even
*exist* as a technique.

Together with a very, very conservative attitude, changes are
difficult to deal with, if they can be implemented at all.

Assuming the source code still exists, anyway.

--
Phillip Gawlowski

David Masover (Guest)
on 2010-11-27 19:51
(Received via mailing list)
On Saturday, November 27, 2010 11:41:59 am Phillip Gawlowski wrote:
> On Sat, Nov 27, 2010 at 9:04 AM, David Masover <ninja@slaphack.com> wrote:
> > On Friday, November 26, 2010 05:51:38 am Phillip Gawlowski wrote:
> >> On Fri, Nov 26, 2010 at 1:42 AM, David Masover <ninja@slaphack.com> wrote:
>
> You don't expect anyone to throw their older mainframes away, do you? ;)

I suppose I expected people to be developing modern Linux apps that just
happen to compile on that hardware.

> > But I guess that explains why you're on a mainframe at all. Someone put
> > their data there 20, 30, 40 years ago, and you need to get at that data,
> > right?
>
> Oh, don't discard mainframes. For a corporation the size of SAP (or
> needing SAP software), a mainframe is still the ideal hardware to
> manage the enormous databases collected over the years.

Well, now that it's been collected, sure -- migrations are painful.

But then, corporations the size of Google tend to store their
information distributed on cheap PC hardware.

> And mainframes with vector CPUs are ideal for all sorts of simulations
> engineers have to do (like aerodynamics), or weather research.

When you say "ideal", do you mean they actually beat out the cluster of
commodity hardware I could buy for the same price?

> the art then are forgotten now, for example (nobody uses GOTO, or
> should use it, anyway) any more, and error handling is done with
> exceptions these days, instead of error codes, for example. And TDD
> didn't even *exist* as a technique.
>
> Together with a very, very conservative attitude, changes are
> difficult to deal with, if they can be implemented at all.
>
> Assuming the source code still exists, anyway.

All three of which suggest to me that in many cases, an actual
greenfield project would be worth it. IIRC, there was a change to the
California minimum wage that would take 6 months to implement and 9
months to revert because it was written in COBOL -- but could the same
team really write a new payroll system in 15 months? Maybe, but
doubtful.

But it's still absurdly wasteful. A rewrite would pay for itself with
only a few minor changes that'd be trivial in a sane system, but major
year-long projects with the legacy system.

So, yeah, job security. I'd just hate my job.
Phillip Gawlowski (Guest)
on 2010-11-27 21:48
(Received via mailing list)
On Sat, Nov 27, 2010 at 7:50 PM, David Masover <ninja@slaphack.com> wrote:
>
> I suppose I expected people to be developing modern Linux apps that just
> happen to compile on that hardware.

Linux is usually not the OS the vendor supports. Keep in mind, a day
of lost productivity on this kind of systems means losses in the
millions of dollars area.

> But then, corporations the size of Google tend to store their information
> distributed on cheap PC hardware.

If they were incorporated when there was such a thing as "cheap PC
hardware". Google is a young corporation, even in IT. And they need
loads of custom code to make their search engine and datacenters
perform and scale, too.

>> And mainframes with vector CPUs are ideal for all sorts of simulations
>> engineers have to do (like aerodynamics), or weather research.
>
> When you say "ideal", do you mean they actually beat out the cluster of
> commodity hardware I could buy for the same price?

Sure, if you can shell out for about 14 000 Xeon CPUs and 7 000 Tesla
GPGPUs (Source: http://en.wikipedia.org/wiki/Tianhe-I ).

> All three of which suggest to me that in many cases, an actual greenfield
> project would be worth it. IIRC, there was a change to the California minimum
> wage that would take 6 months to implement and 9 months to revert because it
> was written in COBOL -- but could the same team really write a new payroll
> system in 15 months? Maybe, but doubtful.

So, you'd bet a corporation the size of Exxon Mobil, Johnson &
Johnson, General Electric and similar, just because you *think* it is
easier to do changes 40 years later in an unproven, unused, upstart
language?

The clocks in the sort of shops that still run mainframes tick very
differently from what you or I are used to.

> But it's still absurdly wasteful. A rewrite would pay for itself with only a
> few minor changes that'd be trivial in a sane system, but major year-long
> projects with the legacy system.

If the rewrite would pay for itself in the short term, then why hasn't
it been done?

--
Phillip Gawlowski

Brian Candler (candlerb)
on 2010-11-27 22:35
Robert Klemme wrote in post #963807:
>> But that basically is my point. In order to make your program
>> comprehensible, you have to add extra incantations so that strings are
>> tagged as UTF-8 everywhere (e.g. when opening files).
>>
>> However this in turn adds *nothing* to your program or its logic, apart
>> from preventing Ruby from raising exceptions.
>
> Checking input and ensuring that data reaches the program in proper
> ways is generally good practice for robust software.

But that's not what Ruby does!

If you do
  s1 = File.open("foo","r:UTF-8").gets
it does *not* check that the data is UTF-8. It just adds a tag saying
that it is.
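
A hedged way to see this for yourself, using a temporary file in place of "foo". The file contents are an assumption made up for the demonstration; String#valid_encoding? is the explicit check that the read itself does not perform:

```ruby
require "tempfile"

line = nil
Tempfile.create("foo") do |f|
  f.binmode
  f.write("caf\xE9\n".b)    # "cafe" with e-acute in ISO-8859-1, not valid UTF-8
  f.flush
  # Reading with an external encoding only tags the string.
  line = File.open(f.path, "r:UTF-8") { |io| io.gets }
end

puts line.encoding          # UTF-8
puts line.valid_encoding?   # false
```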

Then later, when you get s2 from somewhere else, and have a line like s3
= s1 + s2, it *might* raise an exception if the encodings are different.
Or it might not, depending on the actual content of the strings at that
time.

Say s2 is a string read from a template. It may work just fine, as long
as s2 contains only ASCII characters. But later, when you decide to
translate the program and add some non-ASCII characters into the
template, it may blow up.
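
A minimal sketch of that scenario; the template strings are invented for the example. Concatenation succeeds while one side is ASCII-only and raises once both sides carry non-ASCII content in different encodings:

```ruby
s1 = "r\u00E9sum\u00E9"                           # UTF-8, non-ASCII content
ascii_template  = "Name: ".encode("ISO-8859-1")     # ASCII-only content
accent_template = "Pr\u00E9nom: ".encode("ISO-8859-1")

ok = ascii_template + s1         # works: one side is ASCII-only
puts ok.encoding                 # UTF-8

raised = nil
begin
  accent_template + s1           # same code path, different content
rescue Encoding::CompatibilityError => e
  raised = e.class
end
puts raised                      # Encoding::CompatibilityError
```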

If it blew up on the invalid data, I'd accept that. If it blew up
whenever two strings of different encodings meet, I'd accept that.
But to have your program work through sheer chance, only to blow up some
time later when it encounters a different input stream - no, that sucks.

In that case, I would much rather the program didn't crash, but at least
carried on working (even in the garbage-in-garbage-out sense).

> Brian, it seems you want to avoid the complex matter of i18n - by
> ignoring it.  But if you work in a situation where multiple encodings
> are mixed you will be forced to deal with it - sooner or later.

But you're never going to want to combine two strings of different
encodings without transcoding them to a common encoding, as that
wouldn't make sense.

So either:

1. Your program deals with the same encoding from input through to
output, in which case there's nothing to do

2. You transcode at the edges into and out of your desired common
encoding

Neither approach requires each individual string to carry its encoding
along with it.
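
Option 2 can be sketched in Ruby 1.9 terms roughly as follows. This assumes, purely for illustration, that both the input and the output side speak ISO-8859-1 while the program works in UTF-8:

```ruby
# Assumed input: raw bytes known (out of band) to be ISO-8859-1.
raw_input = "caf\xE9".b

# Edge in: tag with the real source encoding, then transcode to UTF-8.
text = raw_input.force_encoding("ISO-8859-1").encode("UTF-8")
raise "bad input" unless text.valid_encoding?

# Core logic only ever sees one encoding.
result = text + "!"

# Edge out: transcode back to what the consumer expects.
raw_output = result.encode("ISO-8859-1")

puts text.encoding         # UTF-8
puts raw_output.encoding   # ISO-8859-1
```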
Garance A Drosehn (Guest)
on 2010-11-28 00:06
(Received via mailing list)
On Wed, Nov 24, 2010 at 11:07 AM, James Edward Gray II
<james@graysoftinc.com> wrote:
> On Nov 24, 2010, at 9:47 AM, Phillip Gawlowski wrote:
>>
>> Convert your strings to UTF-8 at all times, and you are done. You have
>> to check for data integrity anyway, so you can do that in one go.
>
> Thank you for being the voice of reason.
>
> I've fought against Brian enough in the past over this issue, that I try to stay
> out of it these days. However, his arguments always strike me as wanting to
> unlearn what we have learned about encodings.
>
> We can't go back. Different encodings exist. At least Ruby 1.9 allows us to work
> with them.


My experience with 1.9 so far is that some of my ruby scripts have
become much faster. I have other scripts which have needed to deal
with a much wider range of characters than "standard ascii". I got
those string-related scripts working fine in 1.8. They all seem to
break in 1.9.

In my own opinion, the problem isn't 1.9; it's that I wrote these
string-handling scripts in ruby before ruby really supported all the
characters I had to deal with. I look forward to getting my scripts
switched over to 1.9, but there's no question that *getting* to 1.9 is
going to require a bunch of work from me. That's just the way it is.
Not the fault of ruby 1.9, but it's still some work to fix the
scripts.
David Masover (Guest)
on 2010-11-28 01:57
(Received via mailing list)
On Saturday, November 27, 2010 02:47:12 pm Phillip Gawlowski wrote:
> On Sat, Nov 27, 2010 at 7:50 PM, David Masover <ninja@slaphack.com> wrote:
> > I suppose I expected people to be developing modern Linux apps that just
> > happen to compile on that hardware.
>
> Linux is usually not the OS the vendor supports. Keep in mind, a day
> of lost productivity on this kind of systems means losses in the
> millions of dollars area.

In other words, you need someone who will support it, and maybe someone
who'll accept that kind of risk. None of the Linux vendors are solid
enough? Or is it that they don't support mainframes?

> >> And mainframes with vector CPUs are ideal for all sorts of simulations
> >> engineers have to do (like aerodynamics), or weather research.
> >
> > When you say "ideal", do you mean they actually beat out the cluster of
> > commodity hardware I could buy for the same price?
>
> Sure, if you can shell out for about 14 000 Xeon CPUs and 7 000 Tesla
> GPGPUs (Source: http://en.wikipedia.org/wiki/Tianhe-I ).

From that page:

"Both the original Tianhe-1 and Tianhe-1A use a Linux-based operating
system... Each blade is composed of two compute nodes, with each compute
node containing two Xeon X5670 6-core processors and one Nvidia M2050
GPU processor."

I'm not really seeing a difference in terms of hardware.

> > All three of which suggest to me that in many cases, an actual greenfield
> > project would be worth it. IIRC, there was a change to the California
> > minimum wage that would take 6 months to implement and 9 months to
> > revert because it was written in COBOL -- but could the same team really
> > write a new payroll system in 15 months? Maybe, but doubtful.
>
> So, you'd bet the corporation

Nope, which is why I said "doubtful."

> just because you *think* it is
> easier to do changes 40 years later in an unproven, unused, upstart
> language?

Sorry, "unproven, unused, upstart"? Which language are you talking
about?

> > But it's still absurdly wasteful. A rewrite would pay for itself with
> > only a few minor changes that'd be trivial in a sane system, but major
> > year-long projects with the legacy system.
>
> If the rewrite would pay for itself in the short term, then why hasn't
> it been done?

The problem is that it doesn't. What happens is that those "few minor
changes" get written off as "too expensive", so they don't happen. Every
now and then, it's actually worth the expense to make a "drastic" change
anyway, but at that point, again, 15 months versus a greenfield rewrite
-- the 15 months wins.

So it very likely does pay off in the long run -- being flexible makes
good business sense, and sooner or later, you're going to have to push
another of those 15-month changes. But it doesn't pay off in the short
run, and it's hard to predict how long it will be until it does pay off.
The best you can do is say that it's very likely to pay off someday, but
modern CEOs get rewarded in the short term, then take their pensions and
let the next guy clean up the mess, so there isn't nearly enough
incentive for long-term thinking.

And I'm not sure I could make a solid case that it'd pay for itself
eventually. I certainly couldn't do so without looking at the individual
situation. Still wasteful, but maybe not worth fixing.

Also, think about the argument you're using here. Why hasn't it been
done? I can think of a few reasons, some saner than others, but
sometimes the answer to "Why hasn't it been done?" is "Everybody was
wrong." Example: "If it was possible to give people gigabytes of email
storage for free, why hasn't it been done?" Then Gmail did, and the
question became "Clearly it's possible to give people gigabytes of email
storage for free. Why isn't Hotmail doing it?"
Phillip Gawlowski (Guest)
on 2010-11-28 15:01
(Received via mailing list)
On Sun, Nov 28, 2010 at 1:56 AM, David Masover <ninja@slaphack.com>
wrote:
>
> In other words, you need someone who will support it, and maybe someone who'll
> accept that kind of risk. None of the Linux vendors are solid enough? Or is it
> that they don't support mainframes?

Both, and the Linux variant you use has to be certified by the
hardware vendor, too. Essentially, a throwback to the UNIX
workstations of yore: if you run something uncertified, you don't get
the support you paid for in the first place.

> "Both the original Tianhe-1 and Tianhe-1A use a Linux-based operating
> system... Each blade is composed of two compute nodes, with each compute node
> containing two Xeon X5670 6-core processors and one Nvidia M2050 GPU
> processor."
>
> I'm not really seeing a difference in terms of hardware.

We are probably talking at cross purposes here:
You *can* build a vector CPU cluster out of commodity hardware, but it
involves a) a lot of hardware and b) a lot of customization work to
get the nodes to play well with each other (handling concurrency, and
avoiding bottlenecks that lead to a holdup in several nodes of your
cluster).

> Sorry, "unproven, unused, upstart"? Which language are you talking about?

Anything that isn't C, Ada or COBOL. Or even older. This is a very,
very conservative mindset, where not even Java has a chance.

> So it very likely does pay off in the long run -- being flexible makes good
> business sense, and sooner or later, you're going to have to push another of
> those 15-month changes. But it doesn't pay off in the short run, and it's hard
> to predict how long it will be until it does pay off. The best you can do is
> say that it's very likely to pay off someday, but modern CEOs get rewarded in
> the short term, then take their pensions and let the next guy clean up the
> mess, so there isn't nearly enough incentive for long-term thinking.

Don't forget the engineering challenge. Doing the Great Rewrite of
software that has been in use for 20 years (or even longer) isn't
something that is done on a whim, or because this new-fangled "agile
movement" is something the programmers like.

Unless there is a very solid business case (something on the level of
"if we don't do this, we will go bankrupt in 10 days" or similarly
drastic), there is no incentive to fix what ain't broke (for certain
values of "ain't broke", anyway).

> Also, think about the argument you're using here. Why hasn't it been done? I
> can think of a few reasons, some saner than others, but sometimes the answer
> to "Why hasn't it been done?" is "Everybody was wrong." Example: "If it was
> possible to give people gigabytes of email storage for free, why hasn't it
> been done?" Then Gmail did, and the question became "Clearly it's possible to
> give people gigabytes of email storage for free. Why isn't Hotmail doing it?"

Google has a big incentive, and a big benefit going for it:
a) Google wants your data, so they can sell you more and better ads.
b) The per MB cost of hard drives came down *significantly* in the
last 10 years. For my external 1TB HD I paid about 50 bucks, and for
my internal 500GB 2.5" HD I paid about 50 bucks. For that kind of
money, you couldn't buy a 500 GB HD 5 years ago.

Without cheap storage, free email accounts with Gigabytes of storage
are pretty much impossible.

CUDA and GPGPUs have become available only in the last few years, and
only because GPUs have become insanely powerful and insanely cheap at
the same time.

If you were building the architecture that requires mainframes today,
I doubt anyone would buy a Cray without some very serious
considerations (power consumption, ease of maintenance, etc) in favor
of the Cray.

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.
David Masover (Guest)
on 2010-11-28 17:36
(Received via mailing list)
On Sunday, November 28, 2010 08:00:18 am Phillip Gawlowski wrote:
> On Sun, Nov 28, 2010 at 1:56 AM, David Masover <ninja@slaphack.com> wrote:
> > In other words, you need someone who will support it, and maybe someone
> > who'll accept that kind of risk. None of the Linux vendors are solid
> > enough? Or is it that they don't support mainframes?
>
> Both, and the Linux variant you use has to be certified by the
> hardware vendor, too. Essentially, a throwback to the UNIX
> workstations of yore: if you run something uncertified, you don't get
> the support you paid for in the first place.

Must be some specific legacy systems, because IBM does seem to be
supporting, or at least advertising, Linux on System Z.

> get them to play well with each other (like concurrency, and avoiding
> bottlenecks that leads to a hold up in several nodes of you cluster).

Probably. You originally called this a "Mainframe", and that's what
confused me -- it definitely seems to be more a cluster than a
mainframe, in terms of hardware and software.

> > Sorry, "unproven, unused, upstart"? Which language are you talking about?
>
> Anything that isn't C, ADA or COBOL. Or even older.

Lisp, then?

> This is a very,
> very conservative mindset, where not even Java has a chance.

If age is the only consideration, Java is only older than Ruby by a few
months, depending how you count.

I'm not having a problem with it being a conservative mindset, but it
seems irrationally so. Building a mission-critical system which is not
allowed to fail out of a language like C, where an errant pointer can
corrupt data in an entirely different part of the program (let alone
expose vulnerabilities), seems much riskier than the alternatives.

About the strongest argument I can see in favor of something like C over
something like Lisp for a greenfield project is that it's what everyone
knows, it's what the schools are teaching, etc. Of course, the entire
reason the schools are teaching COBOL is that the industry demands it.

> Don't forget the engineering challenge. Doing the Great Rewrite for
> software that's 20 years in use (or even longer), isn't something that
> is done on a whim, or because this new-fangled "agile movement" is
> something the programmers like.

I'm not disputing that.

> Unless there is a very solid business case (something on the level of
> "if we don't do this, we will go bankrupt in 10 days" or similarly
> drastic), there is no incentive to fix what ain't broke (for certain
> values of "ain't broke", anyway).

This is what I'm disputing. This kind of thinking is what allows
companies like IBM to be completely blindsided by companies like
Microsoft.

> Google has a big incentive, and a big benefit going for it:

Which doesn't change my core point. After all:

> a) Google wants your data, so they can sell you more and better ads.

What's Microsoft's incentive for running Hotmail at all? I have to
imagine it's a similar business model.

> b) The per MB cost of hard drives came down *significantly* in the
> last 10 years.

Yes, but Google was the first to offer this. And while it makes sense in
hindsight, when it first came out, people were astonished. No one
immediately said "Oh, this makes business sense." They were too busy
rushing to figure out how they could use this for their personal backup,
since gigabytes of online storage for free was unprecedented.

Then, relatively quickly, everyone else did the same thing, because
people were leaving Hotmail for Gmail for the storage alone, and no one
wanted to be the "10 MB free" service when everyone else was offering
over a hundred times as much.

I'm certainly not saying people should do things just because they're
cool, or because programmers like them. Clearly, there has to be a
business reason. But the fact that no one's doing it isn't a reason to
assume it's a bad idea.
Phillip Gawlowski (Guest)
on 2010-11-28 18:19
(Received via mailing list)
On Sun, Nov 28, 2010 at 5:33 PM, David Masover <ninja@slaphack.com>
wrote:
>
> Must be some specific legacy systems, because IBM does seem to be supporting,
> or at least advertising, Linux on System Z.

Oh, they do. But it's this specific Linux, and you get locked into it.
Compile the kernel yourself, and you lose support.

And, of course, IBM does that to keep their customers locked in. While
Linux is open source, it's another angle for IBM to stay in the game.
Not all that successful, considering that mainframes are pretty much a
dying breed, but it keeps this whole sector on life support.

> Probably. You originally called this a "Mainframe", and that's what confused
> me -- it definitely seems to be more a cluster than a mainframe, in terms of
> hardware and software.

Oh, it is. You can't build a proper mainframe out of off-the-shelf
components, but a mainframe is a cluster of CPUs and memory, anyway,
so you can "mimic" the architecture.

>> > Sorry, "unproven, unused, upstart"? Which language are you talking about?
>>
>> Anything that isn't C, ADA or COBOL. Or even older.
>
> Lisp, then?

If there's commercial support, then, yes. The environment Lisp comes
from is the AI research at MIT, which was done on mainframes, way back
when.

>> This is a very,
>> very conservative mindset, where not even Java has a chance.
>
> If age is the only consideration, Java is only older than Ruby by a few
> months, depending how you count.

It isn't. Usage on mainframes is a component, too. And perceived
stability and roadmap safety (a clear upgrade path is desired quite a
bit, I wager).

And, well, Java and Ruby are young languages, all told. Mainframes
have existed since the 1940s at the very least, and that's the
perspective that enabled "Nobody ever got fired for buying IBM
[mainframes]".

> I'm not having a problem with it being a conservative mindset, but it seems
> irrationally so. Building a mission-critical system which is not allowed to
> fail out of a language like C, where an errant pointer can corrupt data in an
> entirely different part of the program (let alone expose vulnerabilities),
> seems much riskier than the alternatives.

That is a problem of coding standards and practices. Another reason
why change in these sorts of systems is difficult to achieve. Now
imagine a language like Ruby that comes with things like reflection,
duck typing, and dynamic typing.

> About the strongest argument I can see in favor of something like C over
> something like Lisp for a greenfield project is that it's what everyone knows,
> it's what the schools are teaching, etc. Of course, the entire reason the
> schools are teaching COBOL is that the industry demands it.

A vicious cycle, indeed. Mind, for system-level stuff C is still the
go-to language, but not for anything that sits above that. At least,
IMO.

>> Unless there is a very solid business case (something on the level of
>> "if we don't do this, we will go bankrupt in 10 days" or similarly
>> drastic), there is no incentive to fix what ain't broke (for certain
>> values of "ain't broke", anyway).
>
> This is what I'm disputing. This kind of thinking is what allows companies
> like IBM to be completely blindsided by companies like Microsoft.

Assuming that the corporation is actually an IT shop. Procter &
Gamble, or ThyssenKrupp aren't. For them, IT is supporting the actual
business, and is much more of a cost center than a way to stay
competitive.

Or do you care if the steel beams you buy by the ton, or the cleaner
you buy are produced by a company that does its ERP on a mainframe or
a beowulf cluster?

>> Google has a big incentive, and a big benefit going for it:
>
> Which doesn't change my core point. After all:
>
>> a) Google wants your data, so they can sell you more and better ads.
>
> What's Microsoft's incentive for running Hotmail at all? I have to imagine
> it's a similar business model.

Since MS doesn't seem to have a clue, either...

Historically, MS bought Hotmail because everybody else had started
offering free email accounts, and not just ISPs.

And Hotmail still smells of "me, too"-ism.

>> b) The per MB cost of hard drives came down *significantly* in the
>> last 10 years.
>
> Yes, but Google was the first to offer this. And while it makes sense in
> hindsight, when it first came out, people were astonished. No one immediately
> said "Oh, this makes business sense." They were too busy rushing to figure out
> how they could use this for their personal backup, since gigabytes of online
> storage for free was unprecedented.

Absolutely. And Google managed to give possible AdWords customers
another reason to use AdSense: "Look, there's a million affluent,
tech-savvy people using our mail service, which allows us to mine the
data and to show your ads that much more effectively!"

> Then, relatively quickly, everyone else did the same thing, because people
> were leaving Hotmail for Gmail for the storage alone, and no one wanted to be
> the "10 mb free" service when everyone else was offering over a hundred times
> as much.

That, and Google was the cool kid on the block back then. Which counts
for quite a bit, too. And the market of freemail offerings was rather
stale, until GMail shook it up, and got lots of mind share really
fast.

But most people stuck with their AOL mail addresses, since they didn't
care about storage, but cared about stuff working. The technorati
quickly switched (I'm guilty as charged), but aunts, and granddads
kept their AOL, EarthLink, or Yahoo! accounts.

> I'm certainly not saying people should do things just because they're cool, or
> because programmers like them. Clearly, there has to be a business reason. But
> the fact that no one's doing it isn't a reason to assume it's a bad idea.

Of course. But if a whole sector, a whole user base, says "Thanks, but
no thanks", it has its reasons, too. Cost is one, and the human nature
of liking stability and disliking change plays into it, as well.

--
Phillip Gawlowski

Robert Klemme (Guest)
on 2010-11-28 18:20
(Received via mailing list)
On 26.11.2010 01:42, David Masover wrote:
> 16 bits.
The JLS is a bit difficult to read IMHO.  Characters are 16 bit and a
single character covers the range of code points 0000 to FFFF.

http://java.sun.com/docs/books/jls/third_edition/h...

Characters with code points greater than FFFF are called "supplementary
characters" and while UTF-16 provides encodings for them as well, these
need two code units (four bytes).  They write "The Java programming
language represents text in sequences of 16-bit code units, using the
UTF-16 encoding.":

http://java.sun.com/docs/books/jls/third_edition/h...

IMHO this is not very precise: calculations based on char cannot
directly represent the supplementary characters.  Such calculations
use just a subset of UTF-16.  If you want to work with supplementary
characters things get really awful.  Then you need methods like this one:

http://download.oracle.com/javase/6/docs/api/java/...)

And if you stuff this sequence into a String, all of a sudden
String.length() no longer returns the length in characters, which is
in line with what the Javadoc states:

http://download.oracle.com/javase/6/docs/api/java/...)

Unfortunately the majority of programs I have seen never take this into
account and use String.length() as "length in characters".  This awful
mixture becomes apparent in the Javadoc of class Character, which
explicitly states that there are two ways to deal with characters:

1. type char (no supplementary supported)
2. type int (with supplementary)

http://download.oracle.com/javase/6/docs/api/java/...
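
The gap between "length in 16-bit code units" and "length in characters"
can also be observed from Ruby 1.9 or later. This is only a rough sketch
under assumptions: the helper name utf16_code_units is made up for
illustration, and it approximates what Java's String.length() would
report by counting the bytes of a UTF-16BE encoding and dividing by two.

```ruby
# Count the 16-bit code units a string occupies once encoded as UTF-16.
# (Hypothetical helper for illustration, not part of Ruby's standard library.)
def utf16_code_units(str)
  str.encode("UTF-16BE").bytesize / 2
end

bmp  = "a"           # U+0061, inside the Basic Multilingual Plane
supp = "\u{13553}"   # a supplementary code point above U+FFFF

puts bmp.length               # number of code points
puts utf16_code_units(bmp)    # number of 16-bit units (same as above)
puts supp.length              # still a single code point
puts utf16_code_units(supp)   # two 16-bit units: a surrogate pair
```

For the supplementary character the two counts differ, which is exactly
the case where code that treats the unit count as a character count goes
wrong.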

>> You can produce corrupt strings and slice into a half-character in
>> Java just as you can in Ruby 1.8.
>
> Wait, how?

You can convert a code point above FFFF via Character.toChars() (which
returns a char[] of length 2) and truncate it to 1.  But the resulting
sequence isn't actually invalid, since all values in the range 0000 to
FFFF are valid characters.  This isn't really robust.  Even though the
docs say that the longest matching sequence is to be considered during
decoding, there is no reliable way to determine whether d80d dd53
represents a single character (code point 013553) or two separate
characters (code points d80d and dd53).
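
The relationship between code point 013553 and the pair d80d dd53 comes
from the UTF-16 surrogate arithmetic described in RFC 2781, section 2.1.
A minimal Ruby sketch of that arithmetic follows; the method name
to_surrogate_pair is an assumption chosen for illustration and is not a
Ruby or Java API.

```ruby
# Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16
# surrogate pair, following the arithmetic of RFC 2781 section 2.1.
# (Hypothetical helper for illustration.)
def to_surrogate_pair(code_point)
  unless (0x10000..0x10FFFF).cover?(code_point)
    raise ArgumentError, "not a supplementary code point"
  end
  offset = code_point - 0x10000        # 20 significant bits remain
  high   = 0xD800 + (offset >> 10)     # top 10 bits
  low    = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
  [high, low]
end

pair = to_surrogate_pair(0x13553)
puts pair.map { |unit| unit.to_s(16) }.join(" ")   # prints "d80d dd53"

# Truncating the pair to a single unit leaves only the high surrogate.
truncated = pair.first(1)
puts truncated.map { |unit| unit.to_s(16) }.join(" ")   # prints "d80d"
```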

If you like you can play around a bit with this:
https://gist.github.com/719100

> I mean, yes, you can deliberately build strings out of corrupt data, but if
> you actually work with complete strings and string concatenation, and you
> aren't doing crazy JNI stuff, and you aren't digging into the actual bits of
> the string, I don't see how you can create a truncated string.

Well, you can (see above) but unfortunately it is still valid.  It just
happens to represent a different sequence.

Kind regards

  robert
David Masover (Guest)
on 2010-11-28 21:20
(Received via mailing list)
On Sunday, November 28, 2010 11:19:06 am Phillip Gawlowski wrote:
> > Probably. You originally called this a "Mainframe", and that's what
> > confused me -- it definitely seems to be more a cluster than a
> > mainframe, in terms of hardware and software.
>
> Oh, it is. You can't build a proper mainframe out of off the shelf
> components, but a mainframe is a cluster of CPUs and memory, anyway,
> so you can "mimic" the architecture.

When I hear "mainframe", I think of a combination of hardware and
software (z/OS) which you actually can't get anywhere else, short of an
emulator (like Hercules).

> >> This is a very,
> >> very conservative mindset, where not even Java has a chance.
> >
> > If age is the only consideration, Java is only older than Ruby by a few
> > months, depending how you count.
>
> It isn't. Usage on mainframes is a component, too.

IBM does seem to be aggressively promoting not just Linux on mainframes,
but a Unix subsystem and support for things like Java.

> And perceived
> stability and roadmap safety (a clear upgrade path is desired quite a
> bit, I wager).

Is there "roadmap safety" in C, though?

> And, well, Java and Ruby are young languages, all told. Mainframes
> exist since the 1940s at the very least, and that's the perspective
> that enabled "Nobody ever got fired for buying IBM [mainframes]".

Right, that's why I mentioned Lisp. Java and Ruby are old enough that
I'd argue the time to be adopting them is now, but I can see someone
with a mainframe several times older wanting to wait and see.

> > I'm not having a problem with it being a conservative mindset, but it
> > seems irrationally so. Building a mission-critical system which is not
> > allowed to fail out of a language like C, where an errant pointer can
> > corrupt data in an entirely different part of the program (let alone
> > expose vulnerabilities), seems much riskier than the alternatives.
>
> That is a problem of coding standards and practices.

There's a limit to what you can do with that, though.

> Another reason
> why change in these sorts of systems is difficult to achieve. Now
> imagine a language like Ruby that comes with things like reflection,
> duck typing, and dynamic typing.

In practice, it doesn't seem like any of these are as much of a problem
as the static-typing people fear. Am I wrong?

Given the same level of test coverage, a bug that escapes through a Ruby
test suite (particularly unit tests) might lead to something like an
"undefined method" exception from a nil -- relatively easy to track
down. In Java, it might lead to NullPointerExceptions and the like. In
C, it could lead to _anything_, including silently corrupting other
parts of the program.
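
As a rough illustration of the Ruby case, here is a small sketch. The
find_user method and the users hash are invented for this example; the
point is only that calling a method on an unexpected nil raises a
NoMethodError whose backtrace points at the failing call.

```ruby
# A lookup that silently returns nil when the key is missing.
# (Hypothetical method, made up for this example.)
def find_user(users, id)
  users[id]
end

users = { 1 => "alice" }

begin
  find_user(users, 2).upcase   # nil has no #upcase method
rescue NoMethodError => e
  puts "#{e.class}: #{e.message}"
  puts e.backtrace.first       # file and line where the call failed
end
```

The exact wording of the message varies between Ruby versions, but the
exception class and the backtrace are what make the bug easy to locate.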

Technically, it's _possible_ Ruby could do anything to any other part of
the program via things like reflection -- but a rule against that is
trivial to enforce. People generally don't monkey-patch core stuff, and
monkey-patching is easy to avoid, easy to catch, and relatively easy to
do safely in one place while avoiding it throughout the rest of your
program.

Contrast to C -- it's not like you can avoid pointers, arrays, pointer
arithmetic, etc. And Ruby at least has encapsulation and namespacing --
I really wouldn't want to manage a large project in C.

> > About the strongest argument I can see in favor of something like C over
> > something like Lisp for a greenfield project is that it's what everyone
> > knows, it's what the schools are teaching, etc. Of course, the entire
> > reason the schools are teaching COBOL is that the industry demands it.
>
> A vicious cycle, indeed.

I have to wonder if it would be worth it for any of these companies to
start demanding Lisp. Ah, well.

> Mind, for system level stuff C is still the
> goto language, but not for anything that sits above that. At least,
> IMO.

For greenfield system-level stuff, I'd be seriously considering
something like Google's Go. But my opinion probably isn't worth much
here, as I don't really do system-level stuff if I can avoid it (which
is almost always). If I had to, I'd pass as much off to userland as I
could get away with.

>
> Or do you care if the steel beams you buy by the ton, or the cleaner
> you buy are produced by a company that does its ERP on a mainframe or
> a beowulf cluster?

Not particularly, but I do care if someone else can sell me those beams
cheaper. Even just as a cost center, it matters how much it costs.

And who knows? Maybe someone else just implemented a feature that
actually does matter to me. Maybe they anticipate when their customers
need more steel and make them an offer then, or maybe they provide
better and tighter estimates as to when it'll be ready and how long
it'll take to ship -- maybe it's an emergency, I need several tons RIGHT
NOW, and someone else manages their inventory just a bit better, so they
can get it to me days earlier.

Granted, it's a slower industry, so maybe spending years (or decades!)
on changes like the above makes sense. Maybe no one is offering or
asking for the features I've suggested -- I honestly don't know. But
this is why it can matter that one organization can implement a change
in a few weeks, even a few months, while another would take years and
will likely just give up.

> But most people stuck with their AOL mail addresses, since they didn't
> care about storage, but cared about stuff working. The technorati
> quickly switched (I'm guilty as charged), but aunts, and granddads
> kept their AOL, EarthLink, or Yahoo! accounts.

Most of them, for a while.

But even granddads have grandkids emailing them photos, so there goes
that 10 megs. Now they have to delete stuff, possibly download it and
then delete it. A grandkid hears them complaining and suggests switching
to Gmail.
Phillip Gawlowski (Guest)
on 2010-11-28 23:32
(Received via mailing list)
On Sun, Nov 28, 2010 at 9:19 PM, David Masover <ninja@slaphack.com>
wrote:
>
> Is there "roadmap safety" in C, though?

Since it is, technically, a standardized language, with defined
behavior in all cases (as if), it is.

Though, considering C++0x was supposed to be finished two years ago...

>> That is a problem of coding standards and practices.
>
> There's a limit to what you can do with that, though.

One of cost. Nobody wants to spend the amount of money that NASA
spends on the source for the Space Shuttle, but that code is
guaranteed bug free. Not sure which language is used, though, but I
guess it's Ada.


> In practice, it doesn't seem like any of these are as much of a problem as the
> static-typing people fear. Am I wrong?

Nope. But perceived risk outweighs actual risk. See also: US policy
since 2001 vis a vis terrorism.

> throughout the rest of your program.

You know that, I know that, but the CTO of Johnson and Johnson
doesn't, and probably doesn't care. Together with the usual
bureaucratic infighting and processes to change *anything*, you'll be
SOL most of the time. Alas.

> Contrast to C -- it's not like you can avoid pointers, arrays, pointer
> arithmetic, etc. And Ruby at least has encapsulation and namespacing -- I
> really wouldn't want to manage a large project in C.

Neither would I. But then again, there's a lot of knowledge for
managing large C code bases. Just look at the Linux kernel, or Windows
NT.

> and make them an offer then, or maybe they provide better and tighter
> estimates as to when it'll be ready and how long it'll take to ship -- maybe
> it's an emergency, I need several tons RIGHT NOW, and someone else manages
> their inventory just a bit better, so they can get it to me days earlier.

Production, these days, is Just In Time. To stay with our steel
example: Long before the local county got around to nodding your
project through so that you can begin building, you already know what
components you need, and when (since *you* want to be under budget,
and on time, too), so you order 100 beams of several kinds of steel,
and your aggregates, and bribe the local customs people, long before
you actually need the hardware.

There's (possibly) prototyping, testing (few 100MW turbines can be
built in series, because demands change with every application), and
nobody keeps items like steel beams (or even cars!) in storage
anymore. ;)

Similar with just about anything that is bought in large quantities
and / or with loads of lead time (like the 787, or A380).

In a nutshell: being a day early, or even a month, doesn't pay off
enough to make it worthwhile to restructure the whole company's
production processes, just because J. Junior Developer found a way to
shave a couple of seconds off of the DB query to send off ordering
iron ore. ;)

> Granted, it's a slower industry, so maybe spending years (or decades!) on
> changes like the above makes sense. Maybe no one is offering or asking for the
> features I've suggested -- I honestly don't know. But this is why it can
> matter than one organization can implement a change in a few weeks, even a few
> months, while another would take years and will likely just give up.

Since it takes *years* to build a modern production facility, it *is*
a slower industry, all around. IT is special in that it iterates through
hardware, software, and techniques much faster than the rest of the
world.

And an anecdote:
A large-ish steel works corp introduced a PLC system to monitor their
furnaces down to the centidegree Celsius, and the "recipe" down to the
gram. After a week, they deactivated the stuff, since the steel
produced wasn't up to spec, and the veteran cookers created much
better steel, and cheaper.

> Most of them, for awhile.
>
> But even granddads have grandkids emailing them photos, so there goes that 10
> megs. Now they have to delete stuff, possibly download it and then delete it.
> A grandkid hears them complaining and suggests switching to Gmail.

Except that AOhoo! upgraded their storage. And you'd be surprised
how... stubborn non-techies can be. One reason why I don't do family
support anymore. :P

--
Phillip Gawlowski

Robert Klemme (Guest)
on 2010-11-29 09:12
(Received via mailing list)
On Sun, Nov 28, 2010 at 6:20 PM, Robert Klemme
<shortcutter@googlemail.com> wrote:
> On 26.11.2010 01:42, David Masover wrote:
>>
>> On Wednesday, November 24, 2010 08:40:22 pm Jörg W Mittag wrote:

>> I mean, yes, you can deliberately build strings out of corrupt data, but
>> if
>> you actually work with complete strings and string concatenation, and you
>> aren't doing crazy JNI stuff, and you aren't digging into the actual bits
>> of
>> the string, I don't see how you can create a truncated string.
>
> Well, you can (see above) but unfortunately it is still valid. It just
> happens to represent a different sequence.

After reading http://tools.ietf.org/html/rfc2781#section-2.2 I am no
longer sure whether the last statement still holds.  It seems the
presented algorithm can only work reliably if certain code points are
unused.  And indeed, checking with
http://www.unicode.org/charts/charindex.html shows that D800 and DC00
are reserved.  Interestingly enough, Java's
Character.isDefined() returns true for D800 and DC00:

https://gist.github.com/719100
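
To make the point about reserved values concrete, here is a rough Ruby
sketch of a decoder in the spirit of RFC 2781, section 2.2. It is not
Java's implementation; the method name decode_utf16_units is invented
for illustration, and raising on an unpaired surrogate is one possible
way to treat what the RFC calls a sequence in error.

```ruby
# Decode an array of 16-bit code units into code points, in the spirit
# of RFC 2781 section 2.2. Unpaired surrogates are reported as errors.
# (Hypothetical helper for illustration.)
def decode_utf16_units(units)
  code_points = []
  i = 0
  while i < units.length
    unit = units[i]
    if (0xD800..0xDBFF).cover?(unit)
      # A high surrogate must be followed by a low surrogate.
      low = units[i + 1]
      unless low && (0xDC00..0xDFFF).cover?(low)
        raise ArgumentError, format("unpaired high surrogate %04x", unit)
      end
      code_points.push(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
      i += 2
    elsif (0xDC00..0xDFFF).cover?(unit)
      # A low surrogate with no preceding high surrogate is an error.
      raise ArgumentError, format("unpaired low surrogate %04x", unit)
    else
      code_points.push(unit)
      i += 1
    end
  end
  code_points
end

p decode_utf16_units([0xD80D, 0xDD53]).map { |cp| cp.to_s(16) }
```

Because D800 to DFFF never stand for characters on their own, a pair
such as d80d dd53 has only one reading, and a truncated pair can be
detected rather than mistaken for a different character.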

Cheers

robert
David Masover (Guest)
on 2010-11-29 10:49
(Received via mailing list)
On Sunday, November 28, 2010 04:29:34 pm Phillip Gawlowski wrote:
> On Sun, Nov 28, 2010 at 9:19 PM, David Masover <ninja@slaphack.com> wrote:
> > In practice, it doesn't seem like any of these are as much of a problem
> > as the static-typing people fear. Am I wrong?
>
> Nope. But perceived risk outweighs actual risk. See also: US policy
> since 2001 vis a vis terrorism.

Sounds like we don't actually disagree.

> > monkey-patching is easy to avoid, easy to catch, and relatively easy to
> > do safely in one place, and avoid throughout the rest of your program.
>
> You know that, I know that, but the CTO of Johnson and Johnson
> doesn't,

Then why the fsck is he CTO of anything?

> and probably doesn't care.

This is the part I don't get.
How do you get to be CTO by not caring about technology?

> Together with the usual
> bureaucratic infighting and processes to change *anything*, you'll be
> SOL most of the time. Alas.

Which is, again, a point I'd hope the free market would resolve. If there's a
way to build a relatively large corporation without bureaucracy and process
crippling actual progress, you'd think that'd be a competitive advantage.

> > Contrast to C -- it's not like you can avoid pointers, arrays, pointer
> > arithmetic, etc. And Ruby at least has encapsulation and namespacing -- I
> > really wouldn't want to manage a large project in C.
>
> Neither would I. But then again, there's a lot of knowledge for
> managing large C code bases. Just look at the Linux kernel, or Windows
> NT.

In each case, there wasn't really a better option, and likely still isn't.

Still, I don't know about Windows, but on Linux, there seems to be a push to
keep the kernel as small as it can be without losing speed or functionality.
There were all sorts of interesting ideas in filesystems, but now we have
fuse, so there's no need for ftpfs in the kernel. Once upon a time, there was
a static HTTP server in the kernel, but even a full apache in userspace is
fast enough.

And the reason is clear: Something blows up in a C program, it can affect
anything else in that program, or any memory it's connected to. Something
blows up in the kernel, it can affect _anything_.

I'm also not sure how much of that knowledge really translates. After all, if
an organization is choosing C because it's the "safe" choice, what are the
chances they'll use Git, or open development, or any of the other ways the
Linux kernel is managed?

> Production, these days, is Just In Time. To stay with our steel
> example: Long before the local county got around to nodding your
> project through so that you can begin building, you already know what
> components you need, and when (since you want to be under budget,
> and on time, too), so you order 100 beams of several kinds of steel,

So what happens if they cancel your project?

> In a nutshell: being a day early, or even a month, doesn't pay off
> enough to make it worthwhile to restructure the whole company's
> production processes, just because J. Junior Developer found a way to
> shave a couple of seconds off of the DB query to send off ordering
> iron ore. ;)

Shaving a couple seconds off is beside the point. The question is whether
there's some fundamental way in which the process can be improved -- something
which can be automated which actually costs a large amount of time, or some
minor shift in process, or small amount of knowledge...

Another contrived example: Suppose financial records were kept as text fields
and balanced by hand. The computer still helps, because you have all the data
in one place, easily backed up, multiple people can be looking at the same
data simultaneously, and every record is available to everyone who needs it
instantly.

But as soon as you want to analyze any sort of financial trend, as soon as you
want to mine that data in any meaningful way, you have a huge problem. The
query running slowly because it's text is probably minor enough. The problem
is that your data is mangled -- it's got points where there should be commas,
commas where there should be points, typo after typo, plus a few "creative"
entries like "a hundred dollars." None of these were issues before -- the
system did work, and had no bugs. But clearly, you want to at least start
validating new data entered, even if you don't change how it's stored or
processed just yet.

In a modern system, adding a validation is a one-liner. Some places, that
could take a week to go through the process. Some places, it could be pushed
to production the same day. (And some places arguably don't have enough
process, and could see that one-liner in production thirty seconds after
someone thought of it.)
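
To make the "one-liner" concrete, here is a minimal plain-Ruby sketch.
The accepted format (digits, then an optional point and exactly two
decimals) and the names AMOUNT_FORMAT and valid_amount? are assumptions
for illustration, not anything from a real system:

```ruby
# Hypothetical format for newly entered amounts: "100" or "100.00".
# Commas, stray text and spelled-out numbers are rejected.
AMOUNT_FORMAT = /\A\d+(\.\d{2})?\z/

# The validation itself is the one-liner.
def valid_amount?(text)
  !(text =~ AMOUNT_FORMAT).nil?
end
```

With that in place, "100.00" passes, while "100,00", "1,000.00" and
"a hundred dollars" are all turned away at entry time, without touching
how existing records are stored.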

To retrofit that onto an ancient COBOL app could take a lot more work.

I don't know enough about steel to say whether it's relevant here, but I have
to imagine that even here, there are opportunities to dramatically improve
things. Given an opportunity to make the change, in choosing whether to
rewrite or not, I'd have to consider that this isn't likely to be the last
change anyone ever makes.

The depressing thing is that in a modern corporation, this sort of discussion
would be killed reflexively by that conservative-yet-short-term mentality. A
rewrite may or may not be a sound investment down the road, but if it costs
money and doesn't pay off pretty immediately, it's not worth the risk at
pretty much any level of the company. Not so much because it might not pay off
ever, but more because investors will see you've cost the company money (if
only in the short term) and want you gone.

> And an anecdote:
> A large-ish steel works corp introduced a PLC system to monitor their
> furnaces down to the centidegree Celsius, and the "recipe" down to the
> gram. After a week, they deactivated the stuff, since the steel
> produced wasn't up to spec, and the veteran cookers created much
> better steel, and cheaper.

Cool.

I can only wonder how well that works when the veteran cookers retire. Does
that knowledge translate?

I've definitely learned something about steel today, though. Interesting
stuff. Also good to know what I want to avoid...

> > Most of them, for awhile.
> >
> > But even granddads have grandkids emailing them photos, so there goes
> > that 10 megs. Now they have to delete stuff, possibly download it and
> > then delete it. A grandkid hears them complaining and suggests switching
> > to Gmail.
>
> Except that AOhoo! upgraded their storage.

Which is kind of my point. Why did they upgrade? While it's true that it's
relatively cheap, and they may also be monetizing their customers' data, I
have to imagine at least part of the reason is that they were feeling the
pressure from Gmail.

> And you'd be surprised
> how... stubborn non-techies can be.

Not terribly. I'm more surprised how stubborn techies can be.
Brian Candler (candlerb)
on 2010-11-29 11:53
>> And you'd be surprised
>> how... stubborn non-techies can be.
>
> Not terribly. I'm more surprised how stubborn techies can be.

IME the main problems are:

* Operational. You have a whole workforce trained up to use
mainframe-based system A; getting them all to change to working with new
system B can be expensive. This is in addition to their "business as
usual" work.

* Change resistance. If system B makes even minor aspects of life for
some of the users more difficult than it was before, those users will
complain very loudly.

* Functional. System A embodies in its code a whole load of knowledge
about business processes, some of which is probably obsolete, but much
is still current. It's probably either not documented, or there are
errors and omissions in the documentation. Re-implementing A as B needs
to reverse-engineer the behaviour *and* decide which is current and
which is obsolete, or else re-specify it from scratch.

And to be honest, over time new System B is likely to become as
undocumented and hard to maintain as System A was, unless you have a
highly skilled and strongly directed development team.

So, unless System B delivers some killer feature which could not instead
be implemented as new system C alongside existing system A, it's hard to
make a business case for reimplementing A as B.

The market ensures that IBM prices their mainframe solutions just at the
level where the potential cost saving of moving away from A is
outweighed by the development and rollout cost of B, for most users
(i.e. those who have not migrated away already).
David Masover (Guest)
on 2010-11-29 23:32
(Received via mailing list)
On Monday, November 29, 2010 04:53:48 am Brian Candler wrote:
> usual" work.
This is what I was talking about, mostly. I'm not even talking about stuff
like switching to Linux or Dvorak, but I'm constantly surprised by techies who
use IE because it's there and they can't be bothered to change, or C++ because
it's what they know and they don't want to learn a new language -- yet they're
perfectly willing to learn a new framework, which is a lot more work.

> * Functional. System A embodies in its code a whole load of knowledge
> about business processes, some of which is probably obsolete, but much
> is still current. It's probably either not documented, or there are
> errors and omissions in the documentation. Re-implementing A as B needs
> to reverse-engineer the behaviour *and* decide which is current and
> which is obsolete, or else re-specify it from scratch.

This is probably the largest legitimate reason not to rewrite. In fact, if
it's just a bad design on otherwise good technology, an iterative approach is
slow, torturous, but safe.

> And to be honest, over time new System B is likely to become as
> undocumented and hard to maintain as System A was, unless you have a
> highly skilled and strongly directed development team.

Well, technologies _do_ improve. I'd much rather have an undocumented and hard
to maintain Ruby script than C program any day, let alone COBOL.
Phillip Gawlowski (Guest)
on 2010-11-30 09:31
(Received via mailing list)
On Mon, Nov 29, 2010 at 10:38 AM, David Masover <ninja@slaphack.com>
wrote:
> Then why the fsck is he CTO of anything?
>
>> and probably doesn't care.
>
> This is the part I don't get.
> How do you get to be CTO by not caring about technology?

Because C-level execs working for any of the S&P 500 don't deal with
minutiae, and details. They set *policy*. Whether or not to even look
into the cloud services, if and how to centralize IT support, etc.

The CTO supports the CEO, and you'd hardly expect the CEO to be
well-versed with a tiny customer, either, would you?

Oh, and he's the fall guy in case the database gets deleted. :P

>> Together with the usual
>> bureaucratic infighting and processes to change *anything*, you'll be
>> SOL most of the time. Alas.
>
> Which is, again, a point I'd hope the free market would resolve. If there's a
> way to build a relatively large corporation without bureaucracy and process
> crippling actual progress, you'd think that'd be a competitive advantage.

There isn't. The bureaucratic overhead is a result of keeping a) a
distributed workforce on the same page, and b) to provide consistent
results, and c) to keep the business running even if the original
first five employees have long since quit.

It's why McD and BK can scale, but a Michelin star restaurant can't.

> And the reason is clear: Something blows up in a C program, it can affect
> anything else in that program, or any memory it's connected to. Something
> blows up in the kernel, it can affect _anything_.
>
> I'm also not sure how much of that knowledge really translates. After all, if
> an organization is choosing C because it's the "safe" choice, what are the
> chances they'll use Git, or open development, or any of the other ways the
> Linux kernel is managed?

None to zero. But C is older than Linux or Git, too. It's been around
for quite a few years now, and is well understood.

>> Production, these days, is Just In Time. To stay with our steel
>> example: Long before the local county got around to nodding your
>> project through so that you can begin building, you already know what
>> components you need, and when (since you want to be under budget,
>> and on time, too), so you order 100 beams of several kinds of steel,
>
> So what happens if they cancel your project?

At that late a stage, a project doesn't get canceled anymore. It can
be postponed, or paused, but it rarely gets canceled.

You don't order a power plant or a skyscraper on a whim, but because
it is something that is *necessary*.

And the postponing (or cancelling, as rarely as it happens), has
extreme repercussions. But that's why there's breach of contract fees
and such included, to cover the work already done.


> Shaving a couple seconds off is beside the point. The question is whether
> there's some fundamental way in which the process can be improved -- something
> which can be automated which actually costs a large amount of time, or some
> minor shift in process, or small amount of knowledge...

That assumes that anything *can* be optimized. Considering the
accounting standards and practices that are needed, the ISO
certification for ISO 900x, etc., there is little in the way of
optimizing the actual processes of selling goods. Keep in mind that
IT isn't the lifeblood of any non-IT corporation, but a means to an
end.

> commas where there should be points, typo after typo, plus a few "creative"
>
> To retrofit that onto an ancient COBOL app could take a lot more work.

Why do you think the Waterfall Process was invented? Or IT processes
in the first place? To discover and deliver the features required.

That's also why new software is generally preferred to changing existing
software: It's easier to implement changes that way, and to plug into
the ERP systems that already exist.

> I don't know enough about steel to say whether it's relevant here, but I have
> to imagine that even here, there are opportunities to dramatically improve
> things. Given an opportunity to make the change, in choosing whether to
> rewrite or not, I'd have to consider that this isn't likely to be the last
> change anyone ever makes.

If a steel cooker goes down, it takes 24 to 48 hours to get it going
again. It takes about a week for the ore to smelt, and to produce
iron. Adding in carbon to create steel makes this process take even
longer.

So, what'd be the point of improving a detail, when it doesn't speed
up the whole process *significantly*?

> The depressing thing is that in a modern corporation, this sort of discussion
> would be killed reflexively by that conservative-yet-short-term mentality. A
> rewrite may or may not be a sound investment down the road, but if it costs
> money and doesn't pay off pretty immediately, it's not worth the risk at
> pretty much any level of the company. Not so much because it might not pay off
> ever, but more because investors will see you've cost the company money (if
> only in the short term) and want you gone.

Agreed.


> I can only wonder how well that works when the veteran cookers retire. Does
> that knowledge translate?

Yup. Their subordinates acquire the knowledge. That's how trades are
taught in Europe (in general): In a master-apprentice system, where an
accomplished tradesman teaches their apprentice what they know (it used
to be that a freshly minted "Geselle", as we call non-Masters,
non-apprentices in Germany, went on a long walk through Europe, to
acquire new skills and refine existing ones, before settling down and
taking on their own apprentices; that's how the French style of Cathedral
building came to England, for example).

> I've definitely learned something about steel today, though. Interesting
> stuff. Also good to know what I want to avoid...

Just take what I say with a grain of salt. The closest I got to an
iron smelter was being 500 yards away from one when it had to do an
emergency shutdown because the power cable broke.

> Which is kind of my point. Why did they upgrade? While it's true that it's
> relatively cheap, and they may also be monetizing their customer's data, I
> have to imagine at least part of the reason is that they were feeling the
> pressure from Gmail.

Absolutely! GMail did lots of good at reinvigorating a stagnant market.

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.
Brian Candler (candlerb)
on 2010-11-30 13:36
David Masover wrote in post #964917:
>> And to be honest, over time new System B is likely to become as
>> undocumented and hard to maintain as System A was, unless you have a
>> highly skilled and strongly directed development team.
>
> Well, technologies _do_ improve. I'd much rather have an undocumented and
> hard to maintain Ruby script than C program any day, let alone COBOL.

But rewriting COBOL in Perl may be a bad idea :-) Even rewriting it in
Ruby may be a bad idea if the programmers concerned don't have a lot of
experience of Ruby.

(There is some truly awful legacy Ruby code that I have to look at from
time to time. It is jammed full of @@class variables, has no tests, and
the logic is a tortuous maze. We're aiming to retire the whole system)
David Masover (Guest)
on 2010-12-01 02:40
(Received via mailing list)
On Tuesday, November 30, 2010 02:31:29 am Phillip Gawlowski wrote:
> into the cloud services, if and how to centralize IT support, etc.
To do that effectively would require some understanding of these, however. In
particular, "cloud" has several meanings, some of which might make perfect
sense, and some of which might be dropped on the floor.

> The CTO supports the CEO, and you hardly expect the CEO to be
> well-versed with a tiny customer, either, would you?

I'd expect the CEO to know and care at least about management, and hopefully
marketing and the company itself.

> Oh, and he's the fall guy in case the database gets deleted. :P

Ideally, the person who actually caused the database to get deleted would be
responsible -- though management should also bear some responsibility.

> distributed workforce on the same page,
Yet Google seems to manage with less than half the, erm, org-chart-depth that
Microsoft has. Clearly, there's massive room for improvement.

> b) to provide consistent
> results,

This almost makes sense.

> c) to keep the business running even if the original
> first five employees have long since quit.

This really doesn't. How does _bureaucracy_ ensure that more than, say, the
apprenticeship you described in the steel industry?

>
> You don't order a power plant or a skyscraper on a whim, but because
> it is something that is *necessary*.

Nothing's stopping you from switching contractors, or switching to a different
approach entirely -- there's more than one way to get power.

> And the postponing (or cancelling, as rarely as it happens), has
> extreme repercussions. But that's why there's breach of contract fees
> and such included, to cover the work already done.

Then what's the point of the "final approval" that you're waiting for?

> end.
That seems to be true almost by definition, but major improvements in IT do
affect non-IT companies. Shipping companies and airlines benefit from improved
ways to find routes, track packages or flights, and adapt quickly to changing
conditions (like weather). Supermarkets and retail outlets benefit from
improved ways to manage inventory -- to track it, anticipate spikes and
problems, and react to things like a late shipment.

It may be that all the important problems in these areas are solved, but
again, it seems risky to assume that.

> > But as soon as you want to analyze any sort of financial trend, as soon
> > as you want to mine that data in any meaningful way, you have a huge
> > problem....
>
> Why do you think the Waterfall Process was invented? Or IT processes
> in the first place? To discover and deliver the features required.

The point of this example is that you don't necessarily know up front what the
"requirements" are. It's not required that you be able to perform such
analysis, and it might not have been feasible when the original program was
written, but it's certainly valuable today.
Phillip Gawlowski (Guest)
on 2010-12-01 15:46
(Received via mailing list)
On Wed, Dec 1, 2010 at 2:40 AM, David Masover <ninja@slaphack.com>
wrote:
>
> To do that effectively would require some understanding of these, however. In
> particular, "cloud" has several meanings, some of which might make perfect
> sense, and some of which might be dropped on the floor.

Ideally, yes. However, I contend that on the C-level with the size of
corporations we are talking about, issues become very abstract, and
management reads the abstract of reports, much like the Joint Chiefs
of Staff don't deal with the After Action Reports of a platoon, but
with the state of the whole theater of engagement. Those require
different skills and a different thinking (more big picture vs
details).

>> The CTO supports the CEO, and you hardly expect the CEO to be
>> well-versed with a tiny customer, either, would you?
>
> I'd expect the CEO to know and care at least about management, and hopefully
> marketing and the company itself.

Absolutely. But it's the big picture, rather than the performance of a
single salesperson / developer.

>> >> Together with the usual
>> There isn't. The bureaucratic overhead is a result of keeping a) a
>> distributed workforce on the same page,
>
> Yet Google seems to manage with less than half the, erm, org-chart-depth that
> Microsoft has. Clearly, there's massive room for improvement.

That's neither here nor there. You can debate endlessly how much
bureaucracy is needed, but it is quite clear that some bureaucracy is
needed.

>> c) to keep the business running even if the original
>> first five employees have long since quit.
>
> This really doesn't. How does _bureaucracy_ ensure that more than, say, the
> apprenticeship you described in the steel industry?

Because it enforces a uniformity of process. An apprenticeship in a
trade teaches said *trade*, but not management skills, or sales, or a
host of other things that are necessary to run a company.

> Nothing's stopping you from switching contractors, or switching to a different
> approach entirely -- there's more than one way to get power.

That has been decided before the first contractor is hired. Consider
the cost of even planning a power plant or a skyscraper. That's a lot
of sunk cost if you switch horses mid race.

>> And the postponing (or cancelling, as rarely as it happens), has
>> extreme repercussions. But that's why there's breach of contract fees
>> and such included, to cover the work already done.
>
> Then what's the point of the "final approval" that you're waiting for?

When the plans for the product are agreed upon. A project can fail
before that, but not after the bid for contracts has ended, at least
not easily, and certainly not cheaply.

> That seems to be true almost by definition, but major improvements in IT do
> affect non-IT companies. Shipping companies and airlines benefit from improved
> ways to find routes, track packages or flights, and adapt quickly to changing
> conditions (like weather). Supermarkets and retail outlets benefit from
> improved ways to manage inventory -- to track it, anticipate spikes and
> problems, and react to things like a late shipment.

Absolutely. The microprocessor revolution *was* a revolution, and
VisiCalc changed the way the game was played, just as DTP reduced the
cost, and made it easier, to create marketing brochures, for example.

> It may be that all the important problems in these areas are solved, but
> again, it seems risky to assume that.

At this point, change is more incremental and evolutionary, than
revolutionary. Of course, something can change that, but it's not
something I'd bet on (or against) to happen, and I certainly wouldn't
base a business plan off of that.

> The point of this example is that you don't necessarily know up front what the
> "requirements" are. It's not required that you be able to perform such
> analysis, and it might not have been feasible when the original program was
> written, but it's certainly valuable today.

Most large IT projects fail because requirements are unclear, or
change. XP, agile, and scrum want to change that, but that doesn't
necessarily happen.

You have to keep in mind that the problem domain is well understood by
the stakeholders, and well enough to formulate processes, and to run a
business. Ancillaries might change (like accounting standards, or if,
how, and when payroll taxes are collected, and so on), but in general,
these sorts of problems can be anticipated, and planned for. It's
not rare that, say, GM would switch health insurance providers, or have
to deal with different health insurance providers, so you keep your
software flexible enough to manage these changes. Production costs can
change, as can the parts required to produce, but, to put it bluntly,
only the variables that you plug into the equations to know how many X
you need to produce Y change.

Determining what a production run would cost, and what you need to
deal with isn't that difficult (it comes down to multiplying matrices
with each other, more or less), and thus you know how many units you
can sell at which price.
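
As a rough illustration of that matrix arithmetic, here is a plain-Ruby
sketch. All figures and names (PARTS_PER_UNIT, COST_PER_PART, RUN) are
invented for the example, and a real costing model would have far more
rows and columns:

```ruby
# Multiply two matrices given as arrays of rows (plain Ruby, no libraries).
def multiply(a, b)
  columns = b.transpose
  a.map do |row|
    columns.map do |col|
      row.zip(col).inject(0) { |sum, (x, y)| sum + x * y }
    end
  end
end

# Rows: products; columns: how many of each part one unit needs.
PARTS_PER_UNIT = [[2, 1],
                  [0, 3]]

# Cost of each part, as a column vector.
COST_PER_PART = [[5],
                 [4]]

# Planned run: 10 units of the first product, 20 of the second.
RUN = [[10, 20]]

UNIT_COST = multiply(PARTS_PER_UNIT, COST_PER_PART) # cost to build one of each
RUN_COST  = multiply(RUN, UNIT_COST)                # total cost of the run
```

When a supplier changes a price, only COST_PER_PART changes; the
multiplication itself stays the same, which is the point about only
the variables changing.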

Computers make this easier, but they didn't change the essentials of
business economics.


Similar with an airline that has to deal with weather events: Weather
always existed, and pilots deal with it. Routing is, oddly enough,
less important: Air traffic travels in pre-determined traffic lanes,
to minimize the risk, and to minimize the required personnel to
control this traffic.

And while these can change (IIRC, New York is reworking its approach
and departure vectors for the NY airports), it takes a long time to
arrive at, and then implement, these changes. There's plenty of time to
adjust, and it's not, in a sense, business critical for United
Airlines from which direction the planes reach their destination. What is
important is that more planes can be put into the same space without
increasing the risk, but that's not something new, either (TWA and
PanAm were locked in a furious war over trans-Atlantic routes after
WW2, for example).

Where things get absolutely tricky, is when you have to integrate
business processes of different corporations, after a merger or an
acquisition, but that's something that takes years, too (like Google's
acquisition of FeedBurner or YouTube, or AOL buying TimeWarner, or
Thyssen and Krupp merging).

--
Phillip Gawlowski

David Masover (Guest)
on 2010-12-01 23:42
(Received via mailing list)
On Wednesday, December 01, 2010 08:44:28 am Phillip Gawlowski wrote:
> with the state of the whole theater of engagement. Those require
> different skills and a different thinking (more big picture vs
> details).

It's a good analogy -- I would hope the Joint Chiefs would:

 - Be actual generals with actual experience as colonels, majors, captains,
etc -- so they have some concept of what's actually going on at every level.

 - Keep abreast of improvements in existing technology -- if your rifles can
suddenly shoot twice as far, that changes things considerably.

 - Keep abreast of entirely new directions -- UAVs could, again, change
strategies considerably.

If I were to present them with a better weapon, better form of armor, etc, I'd
hope they wouldn't just decide not to care. Maybe I'd have to talk to a
subordinate first, but you'd hope bureaucracy wouldn't prevent new tech from
getting on the battlefield at all.

> needed.
My original point was that I'd hope there's a way to build a relatively large
corporation without bureaucracy and process _crippling_ actual progress. I'm
not claiming that all bureaucracy or process cripples progress or is
unnecessary, or that none of it carries any overhead, but certainly there's a
difference between a mostly self-organizing, motivated workforce and one with
two managers for every three developers.

Having a manager at all helps things, and I'm much more effective with someone
to report to setting direction. Having so many managers that Office Space's
five bosses becomes a reality is much more of a hindrance than having no
managers at all.

> >> c) to keep the business running even if the original
> >> first five employees have long since quit.
> >
> > This really doesn't. How does _bureaucracy_ ensure that more than, say,
> > the apprenticeship you described in the steel industry?
>
> Because it enforces a uniformity of process.

Good or bad, I'm still not seeing the connection to c above.

> An apprenticeship in a
> trade teaches said *trade*, but not management skills, or sales, or a
> host of other things that are necessary to run a company.

Wait, why wouldn't an apprenticeship work in management, sales, or any of the
other fields necessary to have a company? It seems to me that this would be a
_better_ way to teach someone to be an effective CEO than to add so much
process as to automate the position away.

> > It may be that all the important problems in these areas are solved, but
> > again, it seems risky to assume that.
>
> At this point, change is more incremental and evolutionary, than
> revolutionary. Of course, something can change that, but it's not
> something I'd bet on (or against) to happen, and I certainly wouldn't
> base a business plan off of that.

But if the cost of incremental change is sufficiently high, you're not
evolving, you're stagnating.

And maybe I'm naive, but I keep seeing small changes that have a large impact,
especially at scale. They don't happen all the time, but they happen often
enough that if you plan to be around for the next 50 years, you're going to
see a few of them.

> Similar with an airline that has to deal with weather events: Weather
> always existed, and pilots deal with it. Routing is, oddly enough,
> less important: Air traffic travels in pre-determined traffic lanes,
> to minimize the risk, and to minimize the required personnel to
> control this traffic.

Given the same lanes, however, it would be useful to be able to divert either
planes or passengers around a weather problem. If there's a thunderstorm in
your hub, do you delay every flight through there until it's gone, or do you
redirect people to a different hub?

Weather always existed, but that kind of thing wasn't always practical. (I'm
not sure if it is now.)
Phillip Gawlowski (Guest)
on 2010-12-02 00:12
(Received via mailing list)
On Wed, Dec 1, 2010 at 11:42 PM, David Masover <ninja@slaphack.com>
wrote:
>
> If I were to present them with a better weapon, better form of armor, etc, I'd
> hope they wouldn't just decide not to care. Maybe I'd have to talk to a
> subordinate first, but you'd hope bureaucracy wouldn't prevent new tech from
> getting on the battlefield at all.

Since the Chiefs would put out a request for bids, and do try outs,
technological improvement isn't all that important to the Staff
itself. As I said, it deals with the big picture.

Subordinate specialists (like purchasing, and Training & Doctrine
Command) take care of the details.

What the chiefs get to do, though, is point the military (or a
corporation, if our JCOS were C-Level executives) into a future
direction.

More or less: What will be future markets, and how to exploit them.
What'll be possible technologies that could be used? Stuff more like
that rather than what dynamic typing is.

>> >> c) to keep the business running even if the original
>> >> first five employees have long since quit.
>> >
>> > This really doesn't. How does _bureaucracy_ ensure that more than, say,
>> > the apprenticeship you described in the steel industry?
>>
>> Because it enforces a uniformity of process.
>
> Good or bad, I'm still not seeing the connection to c above.

To rephrase yet again: The process a bureaucracy imposes (I include
"Process Diagrams" in bureaucracy) enables people who aren't
intimately familiar with the minutiae of the corporation to make
decisions, and to route events the correct way.

Ideally.

>> An apprenticeship in a
>> trade teaches said *trade*, but not management skills, or sales, or a
>> host of other things that are necessary to run a company.
>
> Wait, why wouldn't an apprenticeship work in management, sales, or any of the
> other fields necessary to have a company? It seems to me that this would be a
> _better_ way to teach someone to be an effective CEO than to add so much
> process as to automate the position away.

An apprenticeship in management teaches management, yes. But that
comes at the cost of not having a trade. Thanks to division of labor,
however, the focus of training is somewhere else. A sales trainee
can't be expected to learn about, say, welding, electronics,
programming, or steel cooking, just to be able to write a sales
report, agreed?

> But if the cost of incremental change is sufficiently high, you're not
> evolving, you're stagnating.

And a corporation that can make these changes will knock you out of the
market, yes.

> And maybe I'm naive, but I keep seeing small changes that have a large impact,
> especially at scale. They don't happen all the time, but they happen often
> enough that if you plan to be around for the next 50 years, you're going to
> see a few of them.

Oh, these changes certainly are there. Question is if it is worth it
to upgrade your datawarehousing to the next Oracle version, just
because it is a little better at doing OLAP.

I'll elaborate below.

> Given the same lanes, however, it would be useful to be able to divert either
> planes or passengers around a weather problem. If there's a thunderstorm in
> your hub, do you delay every flight through there until it's gone, or do you
> redirect people to a different hub?

Yes, you delay the flights, except in a few, special circumstances.

It's a question of risk. A thunderstorm over an airport is risky for
a) ground personnel who are exposed on flat tarmac, b) plane
electronics (a spike of 116V can fry the electronics enough that the
plane has to be towed into a hangar to be rebooted; the usual power
supply is 115V), c) a lightning strike on a plane in the last phase of
descent kills 70 to 500 people, same with take off, d) not every
airport is equipped with ILS, which enables planes to land even if the
pilots don't see anything, e) diversion isn't always an option due to
fuel reserves on a plane, and traffic jams (if you have ever flown via
O'Hare, ATL, or FFM, you know what I mean). And, since airports are
quite a ways from each other, shuttling passengers from one hub to
another is highly impractical.


What is being done is a set of trade-offs: Is the benefit of action A
high enough to offset risk R, and what is the opportunity cost compared
to action B, with risk Q?

> Weather always existed, but that kind of thing wasn't always practical. (I'm
> not sure if it is now.)

Fortunately, planes travel too high to have to care about clouds and
their weather effects (once they have passed through the cloud levels,
anyway). :)

Though, I suggest taking this off-list by now. I'm sure we have been
boring everybody to death for a long, long time now. :)

--
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.