Bizarre Range behavior

Hi,

In message “Re: Bizarre Range behavior”
on Thu, 6 Aug 2009 00:35:41 +0900, Rob B.
[email protected] writes:

|I’d actually prefer less magic here. (And I hope that your second
|example with "2"..."20" wouldn’t include "20" since the range excludes
|its end.)

My bad for the error in the example. It wouldn’t include “20” in the
actual code.

|The surprising aspect is that the Range#to_a can give an array that
|has just the #begin. String#upto perhaps needs more magic to know that
|#succ will eventually work ("9".upto("10") as well as
|"cat".upto("bird"))

We have to define the “magic” behavior first, but I love magic if we
can make a specific definition.

          matz.

On Wed, Aug 5, 2009 at 12:12 PM, Yukihiro M. [email protected]
wrote:

> actual code.
>
> |The surprising aspect is that the Range#to_a can give an array that
> |has just the #begin. String#upto perhaps needs more magic to know that
> |#succ will eventually work ("9".upto("10") as well as
> |"cat".upto("bird"))
>
> We have to define the “magic” behavior first, but I love magic if we
> can make a specific definition.

This particular magic I don’t like, since it would treat strings as
strings sometimes and numbers otherwise. I’m in agreement with David
A. Black here.

If anything should be changed here, IMHO, it would be addressing Rob’s
surprise (which I share) that

("100".."11") produces ["100"] instead of ["100", "101", "102", "103",
"104", "105", "106", "107", "108", "109"]

i.e. I’d expect it to act as if it used the succ and comparison methods.
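Rick's expectation can be sketched as a loop that sends only #<=> and #succ to the values themselves. The helper name naive_to_a is hypothetical, and this is not how Ruby's own Range or String#upto is implemented; it only shows what a purely message-based rule would produce for the cases in this thread.

```ruby
# Hypothetical helper (not Ruby's Range#to_a): collect range elements using
# only two messages sent to the values: #<=> decides when to stop, and
# #succ produces the next candidate.
def naive_to_a(first, last, exclusive = false)
  result = []
  current = first
  loop do
    cmp = current <=> last
    break if cmp.nil? || cmp > 0    # past the end, or not comparable
    break if exclusive && cmp.zero? # exclusive ranges omit the end value
    result << current
    break if cmp.zero?              # inclusive ranges stop at the end value
    current = current.succ
  end
  result
end

p naive_to_a("100", "11")  # "100" through "109"; "110" sorts after "11"
p naive_to_a("2", "10")    # empty, because "2" sorts after "10"
```

A purely message-based loop has no way to know when #succ will never catch up with the end: naive_to_a("9", "a") would run forever, since every successor of "9" is a digit string and therefore sorts before "a". That is one reason real implementations add extra stopping rules.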


Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale

Hi,

In message “Re: Bizarre Range behavior”
on Wed, 5 Aug 2009 23:29:08 +0900, “David A. Black”
[email protected] writes:

|If you make this change, how would you then accomplish the old
|version? In other words, if you wanted:
|
| "2".."19"
|
|to obey ASCII/character code logic, would there still be a way?

The point is the current behavior is not really obeying
ASCII/character code logic. It’s a half-cooked magic (comparison done
by dictionary order, but increment done by numerical order), so if we
can come up with a better logic, we can override it, I think. The
current logic is so weird that I believe no one uses it in real
code.
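The mismatch matz describes can be seen with core String methods alone: #<=> compares character by character, while #succ carries a run of digits the way a decimal number would. The output comments reflect current Ruby behaviour for these particular calls.

```ruby
# Comparison is lexicographic (dictionary order), character by character.
p("2" <=> "19")      # => 1, so "2" sorts after "19"
p("9" <=> "10")      # => 1, so "9" sorts after "10"

# Increment treats a run of digits like a decimal number.
p "9".succ           # => "10"
p "19".succ          # => "20"

# So the successor of "9" compares as smaller than "9" itself.
p("9".succ <=> "9")  # => -1
```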

Since no one is using this logic, it doesn’t matter if we change it,
or keep it if compatibility matters most. I prefer moving forward.

          matz.

On Wed, Aug 5, 2009 at 12:21 PM, Yukihiro M. [email protected]
wrote:

> |to obey ASCII/character code logic, would there still be a way?
>
> The point is the current behavior is not really obeying
> ASCII/character code logic. It’s a half-cooked magic (comparison done
> by dictionary order, but increment done by numerical order), so if we
> can come up with a better logic, we can override it, I think. The
> current logic is so weird that I believe no one uses it in real
> code.
>
> Since no one is using this logic, it doesn’t matter if we change it,
> or keep it if compatibility matters most. I prefer moving forward.

Well my vote is to cook it to use the methods of the endpoints rather
than doing something special IF the endpoints are strings and both
happen to be numerical. That seems as half cooked to me as the
current situation.

And if you follow the “if they’re numerical” path what do you do about
things like

("0x32".."0xFE").to_a
("032".."0100").to_a
("032".."0x32").to_a

I say let strings be strings and numbers be numbers!


Rick DeNatale


Hi,

In message “Re: Bizarre Range behavior”
on Thu, 6 Aug 2009 01:44:56 +0900, James C.
[email protected] writes:

|Quite. I think consistency is important, and this is currently broken in
|1.9:
|
|$ irb1.8
|>> ('!'..']').to_a
|=> ["!", "\"", "#", "$", "%", "&", "'", "(", ")", "*", "+", ",", "-", ".",
|"/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
|>> '9'.succ
|=> "10"
|
|$ irb1.9
|>> ('!'..']').to_a
|=> ["!", "\"", "#", "$", "%", "&", "'", "(", ")", "*", "+", ",", "-", ".",
|"/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";", "<", "=",
|">", "?", "@", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L",
|"M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "[",
|"\\", "]"]
|>> '9'.succ
|=> "10"

|There is obviously room for debate over the above examples: the 1.9 one
|could be viewed as more complete even though it doesn’t apply #succ
|consistently. A lot of these odd examples are edge cases that are unlikely
|to come up in production code, but personally I find the rules used by 1.8
|easier to understand. If I define my own class, Range will use its #succ and
|#<=> methods to carry out its logic and I expect the same for built-in
|classes.

I admit 1.8 behavior in above example is simpler to implement, but 1.9
behavior follows our (or more precisely “my”) unwritten expectation of
character iteration. Simple #upto (a la 1.8) can act weirdly,
since #succ doesn’t know about the edge values. I want to make #upto
work smart. I know it’s magical, but #upto (and Range#each) already
contains magic. So I consider the weird behavior (unexpected even for
the OP) to be a result of too little magic. I am not sure whether the
new behavior should be back-ported to 1.8.
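The 1.9 output in James's example (every character from "!" up to "]") is what you get by stepping through character codes instead of calling #succ. A minimal sketch of that idea for single ASCII characters follows; char_code_range is a hypothetical name, and this is not the actual 1.9 source.

```ruby
# Hypothetical sketch: iterate single ASCII characters by code point,
# ignoring #succ entirely (so "9" is followed by ":", not "10").
def char_code_range(first, last, exclusive = false)
  unless [first, last].all? { |s| s.length == 1 && s.ascii_only? }
    raise ArgumentError, "expected single ASCII characters"
  end

  stop = exclusive ? last.ord - 1 : last.ord
  (first.ord..stop).map(&:chr)
end

chars = char_code_range("!", "]")
p chars.size     # => 61
p chars.last(3)  # => ["[", "\\", "]"]
```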

          matz.

Hi,

In message “Re: Bizarre Range behavior”
on Thu, 6 Aug 2009 01:30:57 +0900, Rick DeNatale
[email protected] writes:

|Well my vote is to cook it to use the methods of the endpoints rather
|than doing something special IF the endpoints are strings and both
|happen to be numerical. That seems as half cooked to me as the
|current situation.

Well, since it’s magic, we need to draw the line somewhere according to
the trade-offs.

|And if you follow the “if they’re numerical” path what do you do about
|things like
|
|("0x32".."0xFE").to_a
|("032".."0100").to_a
|("032".."0x32").to_a

Since #succ does not treat 0x32 etc. specially, I don’t think we need
to consider them as numerical.
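Matz's point that #succ does not treat "0x32" and similar strings specially can be checked directly: #succ works on the trailing alphanumerics, carrying digits in base ten and letters through the alphabet, with no notion of a hexadecimal prefix.

```ruby
# #succ has no idea that "0x" means hexadecimal.
p "0x32".succ  # => "0x33"
p "0x39".succ  # => "0x40" (decimal-style carry, not "0x3A")
p "0xFE".succ  # => "0xFF"
p "0xFF".succ  # => "0xFG" (letters simply continue to the next letter)
```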

I am not going to add magic anywhere it does not already exist.

          matz.

On Sun, Aug 16, 2009 at 6:20 PM, Yukihiro M. [email protected]
wrote:

> I admit 1.8 behavior in above example is simpler to implement, but 1.9
> behavior follows our (or more precisely “my”) unwritten expectation of
> character iteration.

Matz, FWIW, and I don’t think I’m alone, my expectations are based on
Ruby being object oriented meaning, for example, that things related
to iteration/enumeration should be understandable based on messages
being sent to the elements.

Any magic which can’t be explained in these terms ‘astonishes’ me.


Rick DeNatale


2009/8/5 Rick DeNatale [email protected]

> ("0x32".."0xFE").to_a
> ("032".."0100").to_a
> ("032".."0x32").to_a
>
> I say let strings be strings and numbers be numbers!

Quite. I think consistency is important, and this is currently broken in
1.9:

$ irb1.8
>> ('!'..']').to_a
=> ["!", "\"", "#", "$", "%", "&", "'", "(", ")", "*", "+", ",", "-", ".",
"/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
>> '9'.succ
=> "10"

$ irb1.9
>> ('!'..']').to_a
=> ["!", "\"", "#", "$", "%", "&", "'", "(", ")", "*", "+", ",", "-", ".",
"/", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", ":", ";", "<", "=",
">", "?", "@", "A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L",
"M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "[",
"\\", "]"]
>> '9'.succ
=> "10"

In 1.8, String#succ is consistently used for iteration; when '9'.succ
yields '10', it spots that the endpoint is no longer reachable and ends the
iteration. 1.9 seems to embody more special cases, as '9' is followed by
':' in the above range even though '9'.succ is still '10'.

I think treating the alphanumerics as special is useful, e.g. I like that
'9'.succ is '10' and 'z'.succ is 'aa' rather than using dumb charcode-based
sequencing, but it is only useful if it is applied consistently and
predictably.
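The alphanumeric handling James likes is easy to check: #succ increments the rightmost alphanumeric and carries leftwards, adding a character when it runs out of room. The comments reflect current Ruby behaviour for these calls.

```ruby
p "9".succ     # => "10"
p "z".succ     # => "aa"
p "az".succ    # => "ba"
p "a9".succ    # => "b0"
p "zz99".succ  # => "aaa00"
```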

There is obviously room for debate over the above examples: the 1.9 one
could be viewed as more complete even though it doesn’t apply #succ
consistently. A lot of these odd examples are edge cases that are unlikely
to come up in production code, but personally I find the rules used by 1.8
easier to understand. If I define my own class, Range will use its #succ and
#<=> methods to carry out its logic and I expect the same for built-in
classes.

On Wed, Aug 5, 2009 at 6:12 PM, Yukihiro M. [email protected]
wrote:

> We have to define the “magic” behavior first, but I love magic if we
> can make a specific definition.

– So do I, said the wizard, and disappeared into thin air.
But I grant you that he was just the victim of yet another “I just
implemented the specs” bug. ;)

R.

+1 for that. and -1 for the change.

It just makes the developer look too stupid. Can’t we let the developers
understand the difference between a string and an integer ?

I vote against. If people want numeric ranges, it’s their job to use

On Sun, Aug 16, 2009 at 5:20 PM, Yukihiro M.
[email protected] wrote:

> Since #succ does not treat 0x32 etc. specially, I don’t think we need
> to consider them as numerical.
>
> I am not going to add magic anywhere it does not already exist.
>
> matz.

I prefer predictable behaviour. If I test it with "1".."10", then I will be
very surprised when it breaks on "2".."10". So surprised that I probably
won’t even look there for many hours. (Well, now that I’ve read this thread
I would, but this bug is so esoteric that I would probably need to pull out
the debugger otherwise.)

There is surely no one who uses this now, since its behaviour is
inconsistent. So what do we gain from changing it? Perhaps someone can use
it. What do we gain from leaving it? An obscure nuance known to a few.

One of the nice things about this language is that there are not many of
these quirks. I think the pride of knowing something that can’t be easily
discovered or tested for is the reason others object to the change. But in
my mind, that contradicts the principles that make Ruby so beautiful.

Piyush R. wrote:

> +1 for that. and -1 for the change.
>
> It just makes the developer look too stupid. Can’t we let the developers
> understand the difference between a string and an integer ?

If it was 20 years ago, I’d understand this sentiment. What I don’t
understand is why programming languages seem to insist on using
semantics that don’t adapt to the natural ways that humans interact or
think. It’s almost as if people prefer to fight against the inevitable
evolution of programming languages.

In this case, your only argument for not introducing this “magic” is
because people need to understand the difference between a string and an
integer, why is that so critical in this case? There’s no ambiguity in
"2".."11".

There are a lot of constructs in ruby that make it much easier to use
and understand from a natural language point of view, one of the big
strengths of ruby, and this in turn makes it more accessible to people
who are interested in programming and not getting bogged down in the
minutiae of why “2” is greater than “11”.

I vote against. If people want numeric ranges, it’s their job to use

Yukihiro M. wrote:

> I repeat. If String#upto had not contained magic (inherited from
> Perl) for a long time, I wouldn’t add any magic that is difficult to
> explain using the messaging model. But in reality, it does, and we
> cannot remove it. So, I’d rather move forward to avoid astonishing
> mere mortals who see:
>
> ("9".."11").to_a
>
> where String#succ generates "9", "10", "11".

That sounds nice and consistent, but why limit ourselves to numbers?
("y".."aa").to_a
should then generate ["y", "z", "aa"] since that’s what String#succ
generates. Sounds like this could snowball into a nightmare of
incompatible changes.

Hi,

In message “Re: Bizarre Range behavior”
on Mon, 17 Aug 2009 21:38:11 +0900, Rick DeNatale
[email protected] writes:
|
|On Sun, Aug 16, 2009 at 6:20 PM, Yukihiro M. [email protected] wrote:
|> I admit 1.8 behavior in above example is simpler to implement, but 1.9
|> behavior follows our (or more precisely “my”) unwritten expectation of
|> character iteration.
|
|Matz, FWIW, and I don’t think I’m alone, my expectations are based on
|Ruby being object oriented meaning, for example, that things related
|to iteration/enumeration should be understandable based on messages
|being sent to the elements.
|
|Any magic which can’t be explained in these terms ‘astonishes’ me.

I repeat. If String#upto had not contained magic (inherited from
Perl) for a long time, I wouldn’t add any magic that is difficult to
explain using the messaging model. But in reality, it does, and we
cannot remove it. So, I’d rather move forward to avoid astonishing
mere mortals who see:

("9".."11").to_a

where String#succ generates "9", "10", "11".
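One way to read this proposal: when both endpoints consist only of ASCII digits, iterate over the numbers they spell and convert each back to a string. digit_aware_upto is a hypothetical helper, not a real method; padding to the width of the first endpoint is an assumption about how leading zeros should behave, and any other input is simply handed to String#upto.

```ruby
# Hypothetical sketch: treat all-digit endpoints as decimal numbers.
ALL_DIGITS = /\A[0-9]+\z/

def digit_aware_upto(first, last)
  if first.match?(ALL_DIGITS) && last.match?(ALL_DIGITS)
    width = first.length # assumption: keep the first endpoint's leading zeros
    (first.to_i..last.to_i).map { |n| n.to_s.rjust(width, "0") }
  else
    first.upto(last).to_a # everything else: defer to String#upto
  end
end

p digit_aware_upto("9", "11")   # => ["9", "10", "11"]
p digit_aware_upto("07", "11")  # => ["07", "08", "09", "10", "11"]
p digit_aware_upto("y", "aa")   # => [] ("y" sorts after "aa")
```

Note that ("y".."aa") still comes out empty here: the special case covers digit strings only, which is exactly the inconsistency Daniel raises.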

And I promise I would not add any new method with Perl-like magic
behavior in the future.

          matz.

Hi,

In message “Re: Bizarre Range behavior”
on Tue, 18 Aug 2009 21:31:38 +0900, Daniel DeLorme
[email protected] writes:

|That sounds nice and consistent, but why limit ourselves to numbers?
| ("y".."aa").to_a
|should then generate ["y", "z", "aa"] since that’s what String#succ
|generates. Sounds like this could snowball into a nightmare of
|incompatible changes.

Indeed. The issue is that it’s not easily distinguishable whether
a sequence using #succ from a string reaches another, or not.
If there’s an affordable scheme to implement, I’d love to check it in.

          matz.

2009/8/18 Yukihiro M. [email protected]

> |incompatible changes.
>
> Indeed. The issue is that it’s not easily distinguishable whether
> a sequence using #succ from a string reaches another, or not.
> If there’s an affordable scheme to implement, I’d love to check it in.

For user-defined objects, the <=> method is used to determine whether the
end of a range has been reached or exceeded, and it seems like 1.8 does this
for strings (and numbers, of course). I prefer 1.8’s consistent application
of this rule.

2009/8/18 James C. [email protected]

> |That sounds nice and consistent, but why limit ourselves to numbers?
> For user-defined objects, the <=> method is used to determine whether the
> end of a range has been reached or exceeded, and it seems like 1.8 does this
> for strings (and numbers, of course). I prefer 1.8’s consistent application
> of this rule.

Actually I need to qualify this: the rule seems to be that if
(a <=> b) == 1 and a.length >= b.length, then b is not reachable using a
succ sequence from a.
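The qualified rule can be written as a predicate. succ_cannot_reach? is a hypothetical name; it encodes only the condition James states (a sorts after b and is at least as long) and is not taken from Ruby's source.

```ruby
# Hypothetical predicate: if a sorts after b and is at least as long,
# no chain of #succ calls starting at a can produce b.
def succ_cannot_reach?(a, b)
  (a <=> b) == 1 && a.length >= b.length
end

p succ_cannot_reach?("20", "19")  # => true
p succ_cannot_reach?("2", "10")   # => false: "2".succ eventually yields "10"
p succ_cannot_reach?("y", "aa")   # => false: "y", "z", "aa"
```

Unlike a plain (a <=> b) == 1 test, this does not rule out pairs such as "2"/"10" or "y"/"aa", whose end values really are reachable with #succ.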

On Monday 17 August 2009 08:46:25 am Scott B. wrote:

> Piyush R. wrote:
>
>> +1 for that. and -1 for the change.
>>
>> It just makes the developer look too stupid. Can’t we let the developers
>> understand the difference between a string and an integer ?
>
> If it was 20 years ago, I’d understand this sentiment. What I don’t
> understand is why programming languages seem to insist on using
> semantics that don’t adapt to the natural ways that humans interact or
> think.

Because the semantics with which humans interact and think are ambiguous,
often illogical, and often rely on intuition.

We can’t give our languages intuition, but the more we try to do so, and the
more magic we introduce, the less predictable things get.

> There are a lot of constructs in ruby that make it much easier to use
> and understand from a natural language point of view, one of the big
> strengths of ruby, and this in turn makes it more accessible to people
> who are interested in programming and not getting bogged down in the
> minutiae of why “2” is greater than “11”.

Programming inevitably leads to at least understanding these minutiae. I use
Ruby, and I love it for that natural-language expressiveness, and also just
for the conciseness, even where I know it’s less efficient:

(2..11).map(&:to_s)

But there’s a case to be made that at a certain point, you need to understand
what’s going on. A simple example: What’s the difference between a string and
a symbol? Someone who uses strings where they should use symbols is making
their program needlessly inefficient and verbose; someone who does the
opposite is introducing a rather serious memory leak and potential DoS
vulnerability.

You could make the case that we should just use strings, and find ways to
make them really efficient. But hey, at least the semantics of symbols are
adequately covered by strings – the semantics of numbers really aren’t.

Put another way: Currently, we’re allowed to do:

puts 'Ho! ' * 3 + 'Merry Christmas!'

Now, suppose we start making + and * smart, so that '2' * '3' = '6'. Now what
does '2' * 3 do? Is it '6', or 6, or '222'? It certainly seems feasible a
newbie would get stuck here – for example, what if they feel like adding 000
as a delimiter – '0' * 80 instead of '-' * 80 to make a horizontal line – did
they get eighty zeros, or the product of 0 * 80 = 0?

Or suppose they added a space into their number accidentally – is '2 ' * 80
equal to '160' or '2 2 2 2 …'? Maybe it’s just me, but '2 ' seems like a
much more probable mistake (and a harder one to catch) than saying '2' when
you mean 2.
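For reference, this is how current Ruby resolves the cases David lists: String#* takes an Integer repeat count, so nothing is converted to a number, and a String argument is rejected outright.

```ruby
repeated = "2" * 3   # String#* repeats the receiver: "222"
zeros    = "0" * 80  # eighty "0" characters, not the number 0
spaced   = "2 " * 3  # the stray space is repeated too: "2 2 2 "
product  = 2 * 3     # Integer#* multiplies: 6

error = begin
  "2" * "3"          # String#* will not treat "3" as a number
rescue TypeError => e
  e
end

p repeated, zeros.length, spaced, product, error.class
```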

By making the easy stuff ridiculously easy (and assuming users are idiots), it
adds enough ambiguity to drive users crazy later on.

Maybe I’m overreacting, and this would be fine for ranges, but I think “magic”
only makes sense when it’s very well understood and predictable. ‘puts’
calling #to_s on everything, and ‘p’ calling #inspect on everything, makes
sense. Range calling #to_i sometimes just seems like it’s asking for trouble.

Yukihiro M. wrote:

> Indeed. The issue is that it’s not easily distinguishable whether
> a sequence using #succ from a string reaches another, or not.
> If there’s an affordable scheme to implement, I’d love to check it in.

While investigating this I came across:

>> "\0019".succ
=> "\00110"

I expected the result to be "\0020"; is that a bug?
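The observed result is consistent with how #succ treats non-alphanumeric characters: "\0019" is two characters, the control character "\001" and the digit "9". Only the digit is alphanumeric, so the carry from "9" to "0" inserts a new "1" beside it instead of incrementing "\001". The same rule lets a carry skip over punctuation when another alphanumeric exists further left. The comments reflect current Ruby behaviour.

```ruby
s = "\0019"           # "\001" (octal escape) followed by "9"
p s.length            # => 2
p s.succ == "\00110"  # => true: a "1" is inserted before the carried digit
p s.succ == "\0020"   # => false: "\001" itself is never incremented
p "1.9.9".succ        # => "2.0.0": the carry skips "." to reach a digit
```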

2009/8/19 David M. [email protected]

>> semantics that don’t adapt to the natural ways that humans interact or
>> and understand from a natural language point of view, one of the big
>> strengths of ruby, and this in turn makes it more accessible to people
>> who are interested in programming and not getting bogged down in the
>> minutiae of why “2” is greater than “11”.
>
> Programming inevitably leads to at least understanding these minutiae. I use
> Ruby, and I love it for that natural-language expressiveness, and also just
> for the conciseness, even where I know it’s less efficient:

I second this. “Magic” (for want of a better word) is only useful when it
gives you a faster way to achieve the same result. To anyone with moderate
or above programming experience, the difference between strings and numbers
is important and I for one would be annoyed at finding strings being
magically handled as numbers when that isn’t what I wanted – especially if
it were happening to user-supplied data.

This isn’t an implementation detail that ought to be hidden from the user to
make things easier (like dynamic typing, or automatic garbage collection):
strings and numbers are conceptually different types of data that support
different operations and different semantics. I think trying to do too much
automatic type conversion is likely to end up producing a lot of the
problems that exist with number/string/boolean comparison in PHP and (to a
lesser extent) JavaScript.

David mentions concatenation vs addition – what about splitting? I can
split “1234” into “12” and “34” and I have two perfectly valid strings; if I
split the number 1234 into 12 and 34 I’ve not done something meaningful. In
a number the digits have meaning based on their position within the number,
which itself depends on the base used to represent the number. A string is
just a sequence of glyphs, which have no intrinsic meaning at a technical
level.

Ruby’s design is said to follow the principle of least surprise; to me this
means that consistency and correctness should be maintained. Blurring the
boundaries between strings and numbers is a frequent cause of bugs for
beginners in some other languages, and I think Ruby does well to enforce
some separation between them to guide people in the right direction.