Ruby 1.8 vs 1.9

Dobai-Pataky_BSSSSl · November 23, 2010, 4:51pm

Hi,

How much longer will Ruby 1.8(.7) be maintained? Is it advisable to
dive into 1.9(.2)? What are the immediate advantages of using 1.9
over 1.8?

Thanks,
Pete Pincus

Peter_Pincus · November 23, 2010, 10:26pm

On Nov 23, 2010, at 9:43 AM, Peter Pincus wrote:

Hi,

How much longer will Ruby 1.8(.7) be maintained? Is it advisable to
dive into 1.9(.2)? What are the immediate advantages of using 1.9
over 1.8?

I believe the guys at EngineYard are in charge of backporting fixes to
the 1.8.7 branch. I also heard there was a 1.8.8 coming at some point to
be the final release in the 1.8 series.

I use 1.9.2p0 daily and find it to be extremely stable and fast. I would
say the biggest reason to use it is to get a performance boost. Most of
your code from 1.8 will “just work.” My code sees a 2-5x speedup on
1.9.2 versus 1.8.7.

Why not try it on your code and see for yourself? With tools like rvm
(for unix) and pik (for windows) it’s a breeze to have multiple rubies
installed simultaneously.
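
For anyone who wants to follow that advice, here is a minimal timing sketch to run under each interpreter. It is not from the original post: `time_it` is a hypothetical helper and the workload is a stand-in, so replace the blocks with whatever your own code spends its time on.

```ruby
# Hypothetical helper: time a block with wall-clock time and report
# which Ruby ran it. Uses only Time.now so it runs on old and new Rubies.
def time_it(label)
  started = Time.now
  result = yield
  elapsed = Time.now - started
  puts format("%-20s %.3fs (ruby %s)", label, elapsed, RUBY_VERSION)
  result
end

# Stand-in workload: build, sort and join a list of strings.
words = time_it("string building") { (1..200_000).map { |i| "item#{i}" } }
time_it("sort + join") { words.sort.join(",").length }
```

Run the same file under each installed Ruby and compare the printed times. A single run is noisy, so repeat it a few times before drawing conclusions.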

cr

Peter_Pincus · November 23, 2010, 11:24pm

On Nov 23, 2010, at 13:25 , Chuck R. wrote:

I use 1.9.2p0 daily and find it to be extremely stable and fast. I would say the
biggest reason to use it is to get a performance boost. Most of your code from 1.8
will “just work.” My code sees a 2-5x speedup on 1.9.2 versus 1.8.7.

That’s really variable and depends on what you’re doing.

All of my text processing code needed reworking, and text processing is
(was?) noticeably slower in ruby 1.9 than it is in 1.8.

Peter_Pincus · November 24, 2010, 12:45am

On 2010-11-24 09:23, Ryan D. wrote:

All of my text processing code needed reworking, and text processing
is (was?) noticeably slower in ruby 1.9 than it is in 1.8.

Who do I talk to to get 1.9 RPMs produced for Fedora?

Thanks,

Phil.

Philip R.

GPO Box 3411
Sydney NSW 2001
Australia
E-mail: [email protected]

Peter_Pincus · November 24, 2010, 1:30am

On Wed, Nov 24, 2010 at 12:44 AM, Philip R. [email protected]
wrote:

Who do I talk to to get 1.9 RPMs produced for Fedora?

Just a guess: The Ruby (or Programming/Script language) maintainers of
the Fedora project.

–
Phillip G.

Though the folk I have met,
(Ah, how soon!) they forget
When I’ve moved on to some other place,
There may be one or two,
When I’ve played and passed through,
Who’ll remember my song or my face.

Peter_Pincus · November 24, 2010, 1:14am

On Nov 23, 2010, at 15:44 , Philip R. wrote:

Who do I talk to to get 1.9 RPMs produced for Fedora?

Beats me.

Peter_Pincus · November 24, 2010, 5:44am

On Nov 23, 2010, at 4:23 PM, Ryan D. wrote:

On Nov 23, 2010, at 13:25 , Chuck R. wrote:

I use 1.9.2p0 daily and find it to be extremely stable and fast. I would say
the biggest reason to use it is to get a performance boost. Most of your code from
1.8 will “just work.” My code sees a 2-5x speedup on 1.9.2 versus 1.8.7.

That’s really variable and depends on what you’re doing.

All of my text processing code needed reworking, and text processing is (was?)
noticeably slower in ruby 1.9 than it is in 1.8.

Definitely true. That’s why I was careful to say “My code sees a 2-5x
speedup…” because I have seen a few instances where 1.9 is a tad
pokier. But clearly 1.9 is the future so sticking with 1.8 seems like a
bad long-term bet.

cr

Peter_Pincus · November 24, 2010, 1:58am

On Wed, Nov 24, 2010 at 09:13:04AM +0900, Ryan D. wrote:

On Nov 23, 2010, at 15:44 , Philip R. wrote:

Who do I talk to to get 1.9 RPMs produced for Fedora?

Beats me.

My uncle Carl is really good at IT.

Peter_Pincus · November 24, 2010, 11:35am

On Nov 24, 10:44am, Philip R. [email protected] wrote:

The wiki is a start. For Fedora, there are Ruby packages in the
repositories.

http://fedoraproject.org/wiki/Features/Ruby_1.9.1

Or build it yourself, which is simple on Linux. Here is a guide for Fedora

Peter_Pincus · November 24, 2010, 1:31pm

Brian C. wrote:

And just to give some balance: the biggest reason not to use 1.9 is
because of the incredible complexity which has been added to the
String class, and the ability it gives you to make programs which
crash under unexpected circumstances.

Sounds great. Can somebody else confirm this?

Regards
Oli

Peter_Pincus · November 24, 2010, 12:14pm

Chuck R. wrote in post #963430:

I use 1.9.2p0 daily and find it to be extremely stable and fast. I would
say the biggest reason to use it is to get a performance boost. Most of
your code from 1.8 will “just work.” My code sees a 2-5x speedup on
1.9.2 versus 1.8.7.

And just to give some balance: the biggest reason not to use 1.9 is
because of the incredible complexity which has been added to the String
class, and the ability it gives you to make programs which crash under
unexpected circumstances.

For example, an expression like

s1 = s2 + s3

where s2 and s3 are both Strings will always work and do the obvious
thing in 1.8, but in 1.9 it may raise an exception. Whether it does
depends not only on the encodings of s2 and s3 at that point, but also
their contents (properties “empty?” and “ascii_only?”)

The encodings of strings you read may also be affected by the locale set
from the environment, unless you explicitly code against that. This
means the same program with the same data may work on your machine, but
crash on someone else’s.
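
A minimal sketch of the data-dependent behaviour described above, for Ruby 1.9 or later. It is not taken from string19.rb: `try_concat` is a hypothetical helper and the sample strings are made up for illustration.

```ruby
# Hypothetical helper: report the encoding of a + b, or the name of the
# exception class if Ruby refuses to concatenate.
def try_concat(a, b)
  (a + b).encoding.name
rescue Encoding::CompatibilityError => e
  e.class.name
end

utf8         = "caf\u00e9"                                 # UTF-8, non-ASCII content
latin1       = "caf\xE9".dup.force_encoding("ISO-8859-1")  # ISO-8859-1, non-ASCII content
latin1_ascii = "cafe".dup.force_encoding("ISO-8859-1")     # ISO-8859-1, ASCII-only content
utf16_empty  = "".dup.force_encoding("UTF-16LE")           # different encoding, but empty

puts try_concat(utf8, latin1_ascii)  # UTF-8
puts try_concat(utf8, utf16_empty)   # UTF-8
puts try_concat(utf8, latin1)        # Encoding::CompatibilityError
```

The three calls use the same `+` and differ only in the contents and tags of the right-hand string.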

https://github.com/candlerb/string19/blob/master/string19.rb

#!/usr/bin/env ruby19
# encoding: UTF-8
# This document is Copyright (C) Brian Candler 2009 and released under a
# Creative Commons Attribution-NonCommercial 3.0 Unported License.

############# CONTENTS ###################

# -1. PREAMBLE
#  0. INTRODUCTION
#  1. ENCODINGS
#  2. PROPERTIES OF ENCODINGS
#  3. STRING, FILE AND REGEXP ENCODINGS
#  4. VALID AND FIXED ENCODINGS
#  5. COMPATIBLE OBJECTS
#  6. STRING CONCATENATION
#  7. THE BINARY / ASCII-8BIT ENCODING
#  8. SINGLE CHARACTERS
#  9. EQUALITY AND COLLATION
# 10. HASH AND EQL?
# 11. UPPER AND LOWER CASE

https://github.com/candlerb/string19/blob/master/soapbox.rb

=begin rant

More discussion, and examples of problems, at
* http://www.ruby-forum.com/topic/173380
* http://www.ruby-forum.com/topic/179303
* http://www.ruby-forum.com/topic/192218
* http://www.ruby-forum.com/topic/216873

For me, I absolutely hate all this encoding stuff in ruby 1.9, and I'll try
to explain why here.

* As a programmer, the most important thing for me is to be able to reason
  about the code I write.  Reasoning tells me whether the code I write is
  likely to run, terminate, and give the result I want.
  
  In ruby 1.8, if I write an expression like "s3 = s1 + s2", where s1 and s2
  are strings, this is easy because it's a one-dimensional space.
  
               s3     =     s1   +   s2

Peter_Pincus · November 24, 2010, 2:00pm

On Wed, Nov 24, 2010 at 9:25 PM, Oliver Schad
[email protected] wrote:

Brian C. wrote:

And just to give some balance: the biggest reason not to use 1.9 is
because of the incredible complexity which has been added to the
String class, and the ability it gives you to make programs which
crash under unexpected circumstances.

Sounds great. Can somebody else confirm this?

iota ~ % echo ʘ | LC_ALL=ja_JP.UTF8 ruby -pe '$_[1,0] = "ʘ"'
ʘʘ
iota ~ % echo ʘ | LC_ALL=C ruby -pe '$_[1,0] = "ʘ"'
-e:1: invalid multibyte char (US-ASCII)
-e:1: invalid multibyte char (US-ASCII)

Peter_Pincus · November 24, 2010, 1:34pm

On Wed, Nov 24, 2010 at 8:14 PM, Brian C. [email protected]
wrote:

from the environment, unless you explicitly code against that. This
means the same program with the same data may work on your machine, but
crash on someone else’s.

And that’s why I use and love 1.9.
The obvious thing isn’t so obvious if you actually care about
encodings, and if you are mindful about what comes from where, it’s
actually helpful to find otherwise hidden issues.
I hear nobody complain that 1 / 0 raises but 1.0 / 0.0 gives Infinity,
which I find pretty counter-intuitive, and makes me check for .nan?
and .infinite? (which also fails if I call it on Fixnum instead of
Float).
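
A small sketch of the integer versus float behaviour mentioned here. `describe_division` is a hypothetical helper, not from the post. It only calls `nan?` and `infinite?` on values known to be Floats, so it does not depend on whether integers respond to those predicates in a given Ruby version.

```ruby
# Hypothetical helper: divide and describe what came back.
def describe_division(a, b)
  result = a / b
  return "nan"      if result.is_a?(Float) && result.nan?
  return "infinite" if result.is_a?(Float) && result.infinite?
  result.to_s
rescue ZeroDivisionError
  "raised ZeroDivisionError"
end

puts describe_division(1, 0)      # raised ZeroDivisionError
puts describe_division(1.0, 0.0)  # infinite
puts describe_division(0.0, 0.0)  # nan
puts describe_division(6, 3)      # 2
```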

string19/string19.rb at master · candlerb/string19 · GitHub
string19/soapbox.rb at master · candlerb/string19 · GitHub

Many valid complaints there, but nothing that would make me long for
the everything-is-a-string-of-bytes approach of 1.8, which made
working with encodings very brittle.
I can see how this is just annoying to someone who has only dealt with
BINARY/ASCII/UTF-8 all their lives, but please consider that most of
the world actually still uses other encodings as well.
I also want to thank you for writing string19.rb, which is a very
helpful resource for me and others, along with the series from JEG II.

Peter_Pincus · November 24, 2010, 4:15pm

Michael F. wrote in post #963539:

from the environment, unless you explicitly code against that. This
means the same program with the same data may work on your machine, but
crash on someone else’s.

And that’s why I use and love 1.9.
The obvious thing isn’t so obvious if you actually care about
encodings, and if you are mindful about what comes from where, it’s
actually helpful to find otherwise hidden issues.

Y’know, I wouldn’t mind so much if it always raised an exception.

For example, say I have s1 tagged UTF-8 and s2 tagged ISO-8859-1. If
“s1+s2” always raised an exception, it would be easy to find, and easy
to fix.

However the ‘compatibility’ rules mean that this is data-sensitive. In
many cases s1+s2 will work, if either s1 contains non-ASCII characters
but s2 doesn’t, or vice-versa. It’s really hard to get test coverage of
all the possible cases - rcov won’t help you - or you just cross your
fingers and hope.

You also need test coverage for cases where the input data is invalid
for the given encoding. In fact s1+s2 won’t raise an exception in that
case, nor will s1[i], but s1 =~ /./ will.
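
A sketch of the invalid-data case, assuming a string tagged UTF-8 that contains a byte which can never occur in UTF-8. `outcome` is a hypothetical helper that just reports whether a block raised.

```ruby
# Hypothetical helper: run a block and report "ok" or the exception class.
def outcome
  yield
  "ok"
rescue ArgumentError, Encoding::CompatibilityError => e
  e.class.name
end

broken = "abc\xFF".dup.force_encoding("UTF-8")  # \xFF is never valid in UTF-8

puts broken.valid_encoding?       # false
puts outcome { broken + "def" }   # ok
puts outcome { broken[3] }        # ok
puts outcome { broken =~ /./ }    # ArgumentError
```

Concatenation and indexing pass the bad byte along silently. The failure only shows up later, at the first operation that has to interpret the bytes as characters.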

I hear nobody complain that 1 / 0 raises but 1.0 / 0.0 gives Infinity,

Well, IEEE floating point is a well-established standard that has been
around for donkey’s years, so I think it’s reasonable to follow it.

And yes, if I see code like “c = a / b”, I do think to myself “what if b
is zero?” It’s easy to decide if it’s expected, and whether I need to do
something other than the default behaviour. Then I move onto the next
line.

For “s3 = s1 + s2” in 1.9 I need to think to myself: “what if s1 has a
different encoding to s2, and s1 is not empty or s2 is not empty and
s1’s encoding is not ASCII-compatible or s2’s encoding is not
ASCII-compatible or s1 contains non-ASCII characters or s2 contains
non-ASCII characters? And what does that give as the encoding for s3 in
all those possible cases?” And then I have to carry the possible
encodings for s3 forward to the next point where it is used.

Peter_Pincus · November 24, 2010, 3:26pm

Michael F. wrote:

iota ~ % echo ? | LC_ALL=ja_JP.UTF8 ruby -pe '$_[1,0] = "?"'
??
iota ~ % echo ? | LC_ALL=C ruby -pe '$_[1,0] = "?"'
-e:1: invalid multibyte char (US-ASCII)
-e:1: invalid multibyte char (US-ASCII)

So working with strings in ruby v1.9 is not supported, right?

Regards
Oli

Peter_Pincus · November 24, 2010, 5:07pm

On Nov 24, 2010, at 9:47 AM, Phillip G. wrote:

fingers and hope.

Convert your strings to UTF-8 at all times, and you are done. You have
to check for data integrity anyway, so you can do that in one go.

Thank you for being the voice of reason.

I’ve fought against Brian enough in the past over this issue, that I try
to stay out of it these days. However, his arguments always strike me
as wanting to unlearn what we have learned about encodings.

We can’t go back. Different encodings exist. At least Ruby 1.9 allows
us to work with them.

James Edward G. II

Peter_Pincus · November 24, 2010, 4:47pm

On Wed, Nov 24, 2010 at 4:15 PM, Brian C. [email protected]
wrote:

For example, say I have s1 tagged UTF-8 and s2 tagged ISO-8859-1. If
“s1+s2” always raised an exception, it would be easy to find, and easy
to fix.

However the ‘compatibility’ rules mean that this is data-sensitive. In
many cases s1+s2 will work, if either s1 contains non-ASCII characters
but s2 doesn’t, or vice-versa. It’s really hard to get test coverage of
all the possible cases - rcov won’t help you - or you just cross your
fingers and hope.

Convert your strings to UTF-8 at all times, and you are done. You have
to check for data integrity anyway, so you can do that in one go.

I hear nobody complain that 1 / 0 raises but 1.0 / 0.0 gives Infinity,

Well, IEEE floating point is a well-established standard that has been
around for donkey’s years, so I think it’s reasonable to follow it.

Every natural number is an element of the set of rational numbers. For
all intents and purposes, 0 == 0.0 in mathematics (unless you limit
the set of numbers you are working on to natural numbers only, and
let’s just ignore irrational numbers for now). And since 0 has been
around for a bit longer than the IEEE, and the rules of math are
taught in elementary school (including “you must not and cannot divide
by zero”), Ruby exhibits inconsistent behavior for pretty much anyone
who has a little education in maths. The IEEE standards deal with
representing floating point numbers in an inherently integer-based
numerical system, but they don’t supersede the rules of maths.

Ruby’s behavior of returning infinity is the proverbial icing on the
cake, since dividing something large by something infinitely small
results in something large (so, x / 0.000000…[ad infinitum]…1 = x
; a trick used in integrals, too).

Thus, you have to exercise due diligence in this area if you want to
keep your results in the sphere of what’s possible and sane.

And yes, if I see code like “c = a / b”, I do think to myself “what if b
is zero?” It’s easy to decide if it’s expected, and whether I need to do
something other than the default behaviour. Then I move onto the next
line.

It’s easy? Take a look at integrals, and infinitesimal[0] numbers.
Infinitesimals are at the same time zero and not zero.

For “s3 = s1 + s2” in 1.9 I need to think to myself: “what if s1 has a
different encoding to s2, and s1 is not empty or s2 is not empty and
s1’s encoding is not ASCII-compatible or s2’s encoding is not
ASCII-compatible or s1 contains non-ASCII characters or s2 contains
non-ASCII characters? And what does that give as the encoding for s3 in
all those possible cases?” And then I have to carry the possible
encodings for s3 forward to the next point where it is used.

Then, as I suggested above, enforce a standard encoding in your code.
Convert everything into UTF-8, and you are pretty much done.
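
One way to read that advice, as a sketch only: normalize at the point where data enters the program. `to_utf8` is a hypothetical helper, and it assumes you already know which encoding the incoming bytes are in.

```ruby
# Hypothetical helper: tag raw bytes with their known source encoding,
# check they are valid in it, then transcode to UTF-8.
def to_utf8(raw, source_encoding)
  tagged = raw.dup.force_encoding(source_encoding)
  raise ArgumentError, "input is not valid #{source_encoding}" unless tagged.valid_encoding?
  tagged.encode("UTF-8")
end

latin1_bytes = "caf\xE9"                     # bytes as they might arrive from a legacy source
name = to_utf8(latin1_bytes, "ISO-8859-1")

puts name.encoding              # UTF-8
puts name == "caf\u00e9"        # true
puts((name + " \u2615").encoding)  # UTF-8
```

The integrity check and the conversion happen in the same place, so everything past this boundary can assume valid UTF-8.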

[0] Infinitesimal - Wikipedia

Phillip G.

Peter_Pincus · November 24, 2010, 5:56pm

[snipped lots of arguments about string encodings that may or may not be
relevant to the OP]

So… I am wondering if the original poster (Peter Pincus) has tried his
code under 1.9 yet.

Peter?

cr

Peter_Pincus · November 24, 2010, 7:13pm

On Wednesday, November 24, 2010 10:09:13 am Brian C. wrote:

If you don’t like IEEE floating point, Ruby also offers BigDecimal and
Rational.

And if you don’t like Ruby’s strings, there’s nothing stopping you from
rolling your own. There’s certainly nothing stopping you from using
binary mode (whether it claims to be ASCII or not) for all strings.
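
A sketch of what "binary for all strings" could look like in practice. `as_bytes` is a hypothetical helper, not from the post. Once every string carries the same binary tag, concatenation is plain byte concatenation and lengths count bytes.

```ruby
# Hypothetical helper: return a copy of the string tagged as raw bytes.
def as_bytes(str)
  str.dup.force_encoding(Encoding::BINARY)
end

utf8_bytes   = as_bytes("caf\u00e9")  # five bytes: c, a, f, 0xC3, 0xA9
latin1_bytes = as_bytes("caf\xE9")    # four bytes: c, a, f, 0xE9

joined = utf8_bytes + latin1_bytes
puts joined.encoding == Encoding::BINARY  # true
puts joined.length                        # 9
```

The trade-off is that the strings no longer know anything about characters, so `length`, indexing and regular expressions all work on bytes.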

Peter_Pincus · November 24, 2010, 5:09pm

Phillip G. wrote in post #963602:

Convert your strings to UTF-8 at all times, and you are done.

But that basically is my point. In order to make your program
comprehensible, you have to add extra incantations so that strings are
tagged as UTF-8 everywhere (e.g. when opening files).
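
For illustration, this is the kind of incantation being referred to when opening files. The sketch writes a scratch file so it is self-contained. `read_latin1_as_utf8` is a hypothetical helper, and the choice of ISO-8859-1 as the file's encoding is an assumption made for the example.

```ruby
require "tempfile"

# Hypothetical helper. The mode string "r:ISO-8859-1:UTF-8" means: treat
# the file as ISO-8859-1 on disk and hand back UTF-8 strings.
def read_latin1_as_utf8(path)
  File.open(path, "r:ISO-8859-1:UTF-8") { |f| f.read }
end

Tempfile.create("latin1") do |f|
  f.binmode                 # write the bytes exactly as given
  f.write("caf\xE9\n")
  f.flush

  text = read_latin1_as_utf8(f.path)
  puts text.encoding            # UTF-8
  puts text == "caf\u00e9\n"    # true
end
```

Without the explicit mode string, the tag on the string read back depends on the default external encoding, which is normally derived from the locale.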

However this in turn adds nothing to your program or its logic, apart
from preventing Ruby from raising exceptions.

Well, IEEE floating point is a well-established standard that has been
around for donkey’s years, so I think it’s reasonable to follow it.

Every natural number is an element of the set of rational numbers. For
all intents and purposes, 0 == 0.0 in mathematics (unless you limit
the set of numbers you are working on to natural numbers only, and
let’s just ignore irrational numbers for now). And since 0 has been
around for a bit longer than the IEEE, and the rules of math are
taught in elementary school (including “you must not and cannot divide
by zero”), Ruby exhibits inconsistent behavior for pretty much anyone
who has a little education in maths.

Maths and computation are not the same thing. Is there anything in the
above which applies only to Ruby and not to floating point computation
in any other mainstream programming language?

Yes, there are gotchas in floating point computation, as explained at
http://docs.sun.com/source/806-3568/ncg_goldberg.html
These are (or should be) well understood by programmers who feel they
need to use floating point numbers.

If you don’t like IEEE floating point, Ruby also offers BigDecimal and
Rational.
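
A short sketch of the difference, using only `Rational` so that nothing needs to be required. `BigDecimal` needs `require "bigdecimal"` and is left out here. `exact_ratio` is a hypothetical helper added for illustration.

```ruby
# Binary floating point: 0.1 and 0.2 are not exactly representable.
puts 0.1 + 0.2 == 0.3                                      # false

# Rational: exact fractions, so the same sum compares equal.
puts Rational(1, 10) + Rational(2, 10) == Rational(3, 10)  # true

# Hypothetical helper: a zero denominator raises, unlike 1.0 / 0.0.
def exact_ratio(a, b)
  Rational(a, b)
rescue ZeroDivisionError
  nil
end

puts exact_ratio(1, 3)          # 1/3
puts exact_ratio(1, 0).inspect  # nil
```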

If Ruby were to implement floating point following some different set of
rules other than IEEE, that would be (IMO) horrendous. The point of a
standard is that you only have to learn the gotchas once.