Dirty ranges

I’m a new Ruby user (currently at page 68 of Programming Ruby!) and
having found something weird, I wonder if either a) “you have already
found a bug, report it” or b) “yeah, yeah, we all know that this is
a bit weird, but it is not a problem in practice”.

It seems that you can do destructive operations on the minimum element
of a range, but not on the maximum element (well, you can, but it
does not have any effect):

irb(main):001:0> rng="a".."z"
=> "a".."z"
irb(main):002:0> rng.min[0]="b"
=> "b"
irb(main):003:0> rng.max[0]="y"
=> "y"
irb(main):004:0> rng
=> "b".."z"

This just doesn’t seem right.

Actually this was the second thing I found that doesn’t seem right.
The first was that the first element is shared when you convert
a range into an array (again, the last one is different):

irb(main):005:0> arr=rng.to_a
=> ["b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n",
"o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
irb(main):006:0> rng.min[0]="c"
=> "c"
irb(main):007:0> rng.max[0]="x"
=> "x"
irb(main):008:0> arr
=> ["c", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n",
"o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]

This is really dirty, but at least this was to be expected from the
specifications (“All you need to be able to make ranges is ‘succ’ and
‘<=>’” – there is no talk of a deep copy as a requirement.)

Comments?

Dirk van Deun

On Monday, 20 February 2006 14:53, Dirk van Deun wrote:


: This is really dirty, but at least this was to be expected from the
: specifications (“All you need to be able to make ranges is ‘succ’ and
: ‘<=>’” – there is no talk of a deep copy as a requirement.)

Range code seems to enumerate the members and determine the maximum by
generating all successors less than or equal to (or, for an exclusive
range, less than) the Range endpoint. For these examples to “work”, it
would also have to check whether the last generated element is equal to
the endpoint, and return the endpoint object if it was.
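The mechanism described above can be modelled in a few lines. This is a minimal sketch, not Ruby's actual Range implementation: the hypothetical helper `enumerated_max` generates successors with `#succ` and returns the last object it produced, while the equally hypothetical `endpoint_max` applies the suggested check and hands back the endpoint object itself when the values are equal. Newer Ruby versions may compute `Range#max` differently, so treat this only as an illustration of the behaviour discussed in this thread.

```ruby
# Hypothetical model (not Ruby's own Range code) of a maximum computed by
# enumeration. Assumes first <= last and that repeated #succ reaches last.
def enumerated_max(first, last)
  current = first
  loop do
    candidate = current.succ          # #succ returns a newly created object
    break if (candidate <=> last) > 0 # one more step would pass last
    current = candidate
  end
  # When first < last, this is a freshly generated object equal to last,
  # not the last object itself.
  current
end

# The suggested fix: if the last generated value equals the endpoint,
# return the endpoint object so that both ends behave alike.
def endpoint_max(first, last)
  result = enumerated_max(first, last)
  result == last ? last : result
end

first = "a"
last = "e"
computed = enumerated_max(first, last)
puts computed                               # e
puts computed.equal?(last)                  # false
puts endpoint_max(first, last).equal?(last) # true
```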

That said, I think this situation is similar to the one with String keys
in a Hash. Where immutable objects would otherwise be enforced, Ruby
gives you the responsibility if you choose to change them. Maybe this
behaviour should instead be documented, as it is for the Hash class, if
it isn’t already; I can’t imagine a case where it could only be worked
around with a noticeable kludge.
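For comparison with the Hash analogy, here is a short sketch based on Hash's documented behaviour: an unfrozen String key is stored as a frozen copy, while other mutable keys (an Array here) are kept by reference and need `Hash#rehash` after being changed. A Range, by contrast, simply keeps references to the endpoint objects it was given. The variable names are illustrative only.

```ruby
# String keys: Hash stores a frozen copy of an unfrozen String key, so
# later changes to the original string do not affect the table.
key = String.new("abc")
h = {}
h[key] = 1
stored = h.keys.first
puts stored.equal?(key) # false
puts stored.frozen?     # true
key << "d"
puts h["abc"]           # 1

# Other mutable keys (here an Array) are stored by reference; after such a
# key is mutated, the table must be rebuilt with #rehash before lookups by
# the changed key are reliable again.
list = ["a", "b"]
h2 = { list => 100 }
list[0] = "z"
h2.rehash
puts h2[list]           # 100

# Range endpoints: stored by reference, with no copying at all.
first = String.new("a")
r = first.."z"
puts r.begin.equal?(first) # true
```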

David V.

Dirk van Deun writes:

: irb(main):004:0> rng
: => "b".."z"

You can modify the endpoints if you reference them using #begin and
#end:

irb(main):001:0> r = 'a'..'z'
=> "a".."z"
irb(main):002:0> r.begin << '!'
=> "a!"
irb(main):003:0> r.end << '!'
=> "z!"
irb(main):004:0> r
=> "a!".."z!"

#begin and #end are direct accessors to the endpoint objects, whereas
#max is calculated, taking into account end-closedness. It should not
be surprising, then, that #max and #end return different objects.

But I suggest not modifying range endpoints at all; Ranges themselves
are immutable, so changing their value indirectly like that is kinda
going against the grain. And of course, if you ever modify your #end
while iterating over the range, it’s nasal demons.

On Tuesday, 21 February 2006 04:38, George O. wrote:

: But I suggest not modifying range endpoints at all; Ranges themselves
: are immutable, so changing their value indirectly like that is kinda
: going against the grain. And of course, if you ever modify your #end
: while iterating over the range, it’s nasal demons.

Pffft. Doesn’t even flinch.

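ruby <<EOF
# Grow the end point in the middle of the iteration
rng1 = ("a".."g")
rng1.each do |char|
  rng1.end[0] = "j" if char == "d"
  puts char
end
# Shrink the end point in the middle of the iteration
rng2 = ("a".."j")
rng2.each do |char|
  rng2.end[0] = "d" if char == "g"
  puts char
end
EOF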

Outputs:

a
b
c
d
e
f
g
a
b
c
d
e
f
g
h
i
j

Of course, I have absolutely NO idea at all why, and don’t particularly
feel like reading Ruby core source.

David V.

Hrmmm… the second example that Dirk gives is pretty ugly. I should
think that you could simply call clone on the beginning value of the
range and get a more desirable result. I don’t know if a deep copy
would be required, and if it were, it’s only a
Marshal.load(Marshal.dump(obj)) away, right?
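A minimal sketch of both copying approaches, assuming String elements: `dup` gives every element of the array its own object, and `Marshal.load(Marshal.dump(obj))` produces a deep copy of a nested structure, provided everything in it can be marshaled. The variable names are illustrative only.

```ruby
rng = "a".."e"

# Shallow copies: every element of the array is its own String object,
# so mutating one of them cannot reach back into the range's endpoints.
arr = rng.map(&:dup)
arr[0][0] = "z"
p arr       # ["z", "b", "c", "d", "e"]
p rng.begin # "a"

# Deep copy: Marshal round-trips the whole object graph, so nested
# Strings are copied too (objects such as Procs or IOs cannot be marshaled).
nested = [["a", "b"], ["c"]]
copy = Marshal.load(Marshal.dump(nested))
copy[0][0] << "!"
p nested    # [["a", "b"], ["c"]]
p copy      # [["a!", "b"], ["c"]]
```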

David V. wrote:

: That said, I think this situation is similar to the one with String
: keys in a hash. Where immutable objects would be enforced otherwise,
: Ruby gives you the responsibility for if you choose to change them.
: Maybe this behaviour should instead be documented as with the Hash
: class, if it isn’t already; I can’t imagine where this could only be
: worked around with a noticeable kludge.

Having said that, I can’t see where modifying range members like this is
actually needed. IMHO it’s a bad idea to do so.

Kind regards

robert

On Tuesday, 21 February 2006 13:23, Dirk van Deun wrote:

: => "z"
: irb(main):004:0> rng
: => "z".."z"

Commandment of not causing obscure bugs: thou shalt not clobber
shared data without due reason.

I can’t imagine why I’d model any functionality with in-place
modification of something two levels deep in a data structure I have as
input. The whole approach is bound to cause problems sooner or later if
you really don’t know what you’re doing; in this specific case it’s just
a bit more visible, because it’s obvious that the resulting Range object
makes little sense.

: The begin/end versus min/max I obviously didn’t know, but then again,
: if max is an ad-hoc constructed object, shouldn’t min be too, if
: only for symmetry?

How’d you do it? The only requirement Range places on its beginning
point is comparability and generating successors (if enumerating the
members).

No cloneability is mentioned anywhere, even if it would happen to help in
the case of Strings. For example, the problem can’t appear with Fixnums,
and you can’t dup or clone those: IIRC, o.object_id != o.dup.object_id
must hold true for the operation to be correct, and the same goes for
clone.

A lose / lose situation, basically, but I prefer the currently used
option, which puts things in my hands.

David V.

Hi,

I followed the range discussion. Before now, I had honestly thought:
c’mon, it’s not that much of a problem!
Then I saw:

irb(main):001:0> rng="a".."z"
=> "a".."z"
irb(main):002:0> arr=rng.to_a
=> ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
"n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
irb(main):003:0> arr[0][0]="z"
=> "z"
irb(main):004:0> rng
=> "z".."z"

OUCH!!!
I agree 1000000% with this statement:

Something like this may be more likely to be a real problem.

Oh yes.

Merc.

David V. writes:


: Of course, I have absolutely NO idea at all why, and don’t particularly feel
: like reading Ruby core source.

You’re right; modifying #end is okay. Not documented, though. Also
note that modifying #begin can break a loop. Whether or not this is
counterintuitive depends on the person, I guess.

g@crash:~$ irb
irb(main):001:0> r = 'a'..'z'
=> "a".."z"
irb(main):002:0> r.each{|s| s << '!'; puts s}
a!
=> "a!".."z"

This is another face of Dirk/Tony’s concern. Perhaps the first
element should be cloned for symmetry and safety. But how: #dup,
#clone, something else…? You’d also be adding a new requirement
that non-immediate range elements must be copyable. Hmmm…

To the people who remarked (quite sensibly, of course) that you just
shouldn’t tinker with the endpoints of a range: it works the other way
too:

irb(main):001:0> rng="a".."z"
=> "a".."z"
irb(main):002:0> arr=rng.to_a
=> ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
"n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
irb(main):003:0> arr[0][0]="z"
=> "z"
irb(main):004:0> rng
=> "z".."z"

Something like this may be more likely to be a real problem.

The begin/end versus min/max I obviously didn’t know, but then again,
if max is an ad-hoc constructed object, shouldn’t min be too, if
only for symmetry?

Hacker-Dirk likes Ruby, but Computer-Scientist-Dirk tends to be wary
of systems with irregularities like these…

Dirk van Deun

: This is another face of Dirk/Tony’s concern. Perhaps the first
: element should be cloned for symmetry and safety. But how: #dup,
: #clone, something else…? You’d also be adding a new requirement
: that non-immediate range elements must be copyable. Hmmm…

The requirement could be weakened a bit, because you do not need to
clone range elements immediately. The begin and end could stay
uncloned, and cloning could be delayed until a min is asked for, so
that the min would be an ad hoc calculated value like the max.

Methods like to_a and each would then need to use min and max, not
begin and end, but they probably already do. “Safe” methods,
like ===, could use begin and end in their implementation, so that
ranges of non-copyable elements would still be possible and
useful. (But only to be used in “safe” circumstances.)
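A minimal sketch of this proposal, using a hypothetical wrapper class `CopyingRange` that is not part of Ruby: `begin` and `end` return the stored objects, `min` and `each` hand out copies, and `===` compares against the stored endpoints without copying anything. The copy rule (duplicate only objects that are not frozen, so that Integers, Symbols and frozen Strings pass through unchanged) is an assumption made for this illustration, not something specified in the thread.

```ruby
# Hypothetical range-like class (not part of Ruby) sketching lazy copying:
# begin/end expose the stored objects, min and each hand out copies.
class CopyingRange
  include Enumerable

  attr_reader :begin, :end

  def initialize(first, last)
    @begin = first
    @end = last
  end

  # A "safe" method: it only compares, so it uses the stored endpoints
  # directly and also works for elements that cannot be copied.
  def ===(value)
    low = @begin <=> value
    high = value <=> @end
    !low.nil? && !high.nil? && low <= 0 && high <= 0
  end

  # The minimum as an ad hoc calculated value, like the maximum.
  def min
    copy(@begin)
  end

  # Traversal starts from min; later elements come from #succ, which
  # returns new objects, so a mutable begin object is never handed out.
  def each
    return enum_for(:each) unless block_given?

    current = min
    while (current <=> @end) <= 0
      yield current
      break if current == @end # do not call succ past the end
      current = current.succ
    end
    self
  end

  private

  # Assumed copy rule: frozen objects cannot be changed anyway, so they
  # are shared; everything else is duplicated.
  def copy(obj)
    obj.frozen? ? obj : obj.dup
  end
end

first = String.new("a")
r = CopyingRange.new(first, String.new("e"))
arr = r.to_a
arr[0][0] = "z"
p arr                    # ["z", "b", "c", "d", "e"]
p r.begin                # "a"
puts(r === "c")          # true
puts r.min.equal?(first) # false
```

Because only `min` and `each` copy, a range of objects that cannot be copied could still be used with `===`, in line with the "safe" methods described above.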

Of course, the weakened solution would not prevent the following
from happening, but this is really “asking for it”:

irb(main):001:0> a="a"
=> "a"
irb(main):002:0> z="z"
=> "z"
irb(main):003:0> rng=a..z
=> "a".."z"
irb(main):004:0> a[0]="b"
=> "b"
irb(main):005:0> rng
=> "b".."z"

Accidents that happen indirectly via innocuous-looking to_a and each
calls would be prevented.
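From the caller's side, this "asking for it" case can be sidestepped by building the range from copies of the endpoint objects, so that later changes to the original variables do not show through. A minimal sketch, assuming String endpoints; the variable names are illustrative only.

```ruby
a = String.new("a")
z = String.new("z")

shared = a..z            # stores references to a and z themselves
detached = a.dup..z.dup  # stores its own copies

a[0] = "b"
p shared.begin   # "b": the change shows through
p detached.begin # "a": unaffected
```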

Dirk van Deun

Dirk van Deun writes:

: This is another face of Dirk/Tony’s concern. Perhaps the first
: element should be cloned for symmetry and safety. But how: #dup,
: #clone, something else…? You’d also be adding a new requirement
: that non-immediate range elements must be copyable. Hmmm…

: The requirement could be weakened a bit, because you do not need to
: clone range elements immediately. The begin and end could stay
: uncloned; and cloning could be delayed until a min is asked for; so
: that the min would be an ad hoc calculated value like the max.

That’s how I figured you would do it. One of the most common
operations to do on a Range, though, is to traverse it, so it’d be a
limited-use range if you use non-copyable objects. Perhaps I
should’ve said “requirement for traversal” – you could still include?
and friends.