assert "foo"[3] != "foo"[3,1]...revisited

timr · August 23, 2010, 10:51am

Consider the following irb session:

"foo"[0]
=> "f"

"foo"[1]
=> "o"

"foo"[2]
=> "o"

"foo"[3]
=> nil

"foo"[4]
=> nil

"foo"[0,3]
=> "foo"

"foo"[1,3]
=> "oo"

"foo"[2,3]
=> "o"
# Note the following weird case!!!

"foo"[3,3]
=> "" # This should be nil in my mind: "foo"[3] is nil, and taking three characters is still nil.

"foo"[4,3]
=> nil

So, to summarize: when indexing a single position at or beyond the
length of a string, Ruby returns nil. But when taking a slice, Ruby
returns an empty string “” if the start index equals the length of the
string, and nil only for start indices beyond that.

I don’t like that this passes:
assert "foo"[3] != "foo"[3,1]

Matz, you are so smart, but this does not follow the principle of
least surprise!!! (Isn’t this sort of thing strange?)

I would appreciate it if anyone can explain how this might make
sense…but please only try if you really believe it is a defensible
behavior for the language.
Thanks,
Tim

timr · August 23, 2010, 11:04am

Hi,

In message “Re: assert “foo”[3] != “foo”[3,1]…revisited”
on Mon, 23 Aug 2010 17:51:02 +0900, timr writes:

|I don’t like that this passes
|assert “foo”[3] != “foo”[3,1]
|
|Matz, you are so smart, but this does not follow the principle of
|least surprise!!! (Isn’t this sort of thing strange?)

“foo”[3,1] is “” since when index is within the string, the sought
length will be rounded to fit in the size. And 3 (which equals to the
length of the string) is considered as touching the end of the string,
so the result length is zero.
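
A small sketch of the rule described above may help; the variable name s is only for illustration. The start index may be anywhere from 0 up to and including the string's length, and the requested length is then clipped to whatever remains:

```ruby
s = "foo"

# Start inside the string: the requested length is clipped to what is left.
p s[1, 10]   # => "oo"
p s[2, 10]   # => "o"

# Start exactly at the end (index == s.length): still a valid position,
# but nothing is left to take, so the result is an empty string.
p s[3, 10]   # => ""

# Start past the end: not a valid position at all.
p s[4, 10]   # => nil

# The single-index form asks for the character AT the index, and there is
# no character at index 3, hence nil.
p s[3]       # => nil
```

One way to read this: the two-argument form treats indices as positions between characters (0 through 3 for "foo"), while the single-index form treats them as the characters themselves (0 through 2).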

And a tip for you; never mention PoLS again to persuade me. It’s no
use. If you have real trouble besides misunderstanding, let me know.

          matz.

timr · August 23, 2010, 1:16pm

2010/8/23 Yukihiro M.

…In message “Re: assert “foo”[3] != “foo”[3,1]…revisited”
on Mon, 23 Aug 2010 17:51:02 +0900, timr writes:

|I don’t like that this passes
|assert “foo”[3] != “foo”[3,1]
|…

“foo”[3,1] is “” since when index is within the string, the sought
length will be rounded to fit in the size. And 3 (which equals to the
length of the string) is considered as touching the end of the string,
so the result length is zero.
…
matz.

I’m grateful for the reply, and for the question that prompted it, as
that is something that has bothered/intrigued me for some time, and I
had wondered whether it was something to do with string (or array)
processing near the end of the string. I think it’s worth adding
something like Matz’s wording to
class String - RDoc Documentation
class Array - RDoc Documentation
because reasons for why things work a particular way do help people
(well, at least me) remember behaviour.

I first noticed this when I was doing some complicated string
processing, and was using (or trying to use) something like str[i, 1]
being nil to indicate that the end of the string had been reached. (I
could perhaps have used regular expressions, but I didn’t - and
sometimes still don’t - trust my understanding of them, and I was very
wary of an apparent regular expression match or non-match being not
quite what I’d intended it to be.) I then found that arrays worked the
same way.
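
To make the pitfall concrete, here is a minimal sketch of that kind of loop (the variable names are made up for illustration, and this is not the original code). Because "" is truthy in Ruby, waiting for str[i, 1] to become nil runs one step past the last character:

```ruby
str = "foo"

# Pitfall: str[i, 1] is "" when i == str.length, so the loop only stops
# at i == str.length + 1 and collects a trailing empty string.
seen = []
i = 0
while (ch = str[i, 1])
  seen << ch
  i += 1
end
p seen    # => ["f", "o", "o", ""]

# The single-index form returns nil at i == str.length (and one-character
# strings before that on Ruby 1.9 and later), so it stops where expected.
chars = []
i = 0
while (ch = str[i])
  chars << ch
  i += 1
end
p chars   # => ["f", "o", "o"]
```

Testing i < str.length directly is another way to avoid relying on the return value at all.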

At the time I wondered about asking whether it might be of more general
use if String and Array also had a variation of slice having the
behaviour I wanted for what I was doing (that is working like slice,
except when the index is just past the end of the string or array when
it would return nil), but didn’t pursue it, partly from assuming that
since [] / slice were fairly fundamental parts of String and Array it
was very likely this had been debated before, even if I couldn’t find
the debate.

I’m tempted to suggest that now, partly because I would find it useful
(not sure what to call it, but it should start with “slice”) - yes, I
know I could write something to do that for my own use! - and partly
because if you (well I) have two similar but a little differently named
methods it might help you (well me) remember that there is a behaviour
difference to what - choosing my words very carefully - I am expecting.
(And being well aware that I didn’t invent the Ruby computer language.)

*** off the topic of the thread, but on the topic alluded to by my last
comments, and prompted by Matz’s penultimate sentence: I’m rather
distrustful of slogans - I think there is a danger that they start being
used as a substitute for really thinking about things, so I was pleased
when the use of a certain phrase in Ruby discussions became discouraged
(quite a long time ago now). I don’t use Python, but I do sometimes look
at the discussion groups, and I get the impression that “there should
only be one obvious way to do it” sometimes (frequently???) gets
misused, not least by the omission of “obvious”. It also illustrates
nicely how slogans can get corrupted: looking here
http://www.python.org/dev/peps/pep-0020
I find that (a) the original (?) version is more complex, and (b) more
than somewhat self-deprecating, which I like, because it at least hints
at the possibility that different people can quite reasonably take
different views on things.
  There should be one – and preferably only one – obvious way to do it.
  Although that way may not be obvious at first unless you’re Dutch.

timr · August 23, 2010, 2:40pm

Hi,

In message “Re: assert “foo”[3] != “foo”[3,1]…revisited”
on Mon, 23 Aug 2010 20:10:23 +0900, Colin B. writes:

Yep, description proposal is welcome.

          matz.

timr · August 24, 2010, 3:55am

Hi again,
Thanks for responding to my question–it is something akin to getting
a return letter from the President. And sorry for the previous
reference to PoLS. Though I understand the current function of
String#[] vs. String#[,] and can predict their output, I don’t concede
that anything is gained by having a divergence in behavior at the end
of a string. In fact, without the divergent behavior at the end of a
string, the documentation could simply read:

Element Reference—If passed a single Fixnum, returns the code of the
character at that position. If passed two Fixnum objects, returns a
substring starting at the offset given by the first, and a length
given by the second. If given a range, a substring containing
characters at offsets given by the range is returned. In all three
cases, if an offset is negative, it is counted from the end of str.
Returns nil if the initial offset falls outside the string, the length
is negative, or the beginning of the range is greater than the end.

(This is in fact how the documentation currently reads. If there were
no edge cases to account for, one could leave it implied that the
String#[] and String#[,] methods are predictably related with
String#[] being equivalent to String#[,] with an implied second
argument of 1.)

However, as element referencing is currently implemented, the
documentation may benefit from highlighting the edge-case at the end
of strings. See below for an attempt…

Please note edge-case when the index position is the same as the
length of the string:
"foo"[0] == "foo"[0,1]
"foo"[1] == "foo"[1,1]
"foo"[2] == "foo"[2,1]
"foo"[3] != "foo"[3,1] # These are different (nil on the left and "" on the right). Just memorize this edge case.

“foo”[3,1] = “d” #=> “food”
“foo”[3] = “d” #=> IndexError. #And memorize also for good measure.

And here is a proposal for the updated change to the documentation:

Element Reference—If passed a single Fixnum, returns the code of the
character at that position. At the end of the string, single parameter
element referencing will return nil. If passed two Fixnum objects,
returns a substring starting at the offset given by the first, and a
length given by the second. At the end of the string, two-argument
element referencing returns an empty string (""). If given a range, a
substring containing characters at offsets given by the range is
returned. In all three cases, if an offset is negative, it is counted
from the end of str. Returns nil if the initial offset falls outside
the string, the length is negative, or the beginning of the range is
greater than the end.

Of course, simplicity/elegance is in the mind of the beholder. But, I
hope that you might consider the option of eliminating the divergent
behavior of String#[] and String#[,] when the index == str.length.
Thanks,
Tim

timr · August 24, 2010, 1:00pm

On Tue, Aug 24, 2010 at 2:55 AM, timr wrote:

On Aug 23, 5:40 am, Yukihiro M. wrote:

on 23 Aug 2010 20:10:23 +0900, Colin B. wrote:
Of course, simplicity/elegance is in the mind of the beholder. But,
I hope that you might consider the option of eliminating the divergent
behavior of String#[] and String[,] when the index == str.length.

At the end of this post I’ve put examples of what I had in mind for
slice methods which behave that way.

The following is a try at modifying the documentation for Array. (I’m
using Array because that doesn’t have the complication of the change in
behaviour of string[index] from 1.8 to 1.9, and because the current
documentation for Array does have the special cases, albeit I think it
could perhaps be more precise. Adaptation to String should be
straightforward.) As much as possible the try uses the existing
documentation with minimal changes, and I’ve included Matz’s explanation
with what I hope are appropriate changes for array.

Comments (not intended to be included in the documentation) are
/* comment */.
(Apologies in advance if the formatting is weird: Gmail sometimes
deletes leading spaces (and others?) when it thinks it knows better than
me.)

array[index] → obj or nil
array[start, length] → an_array or nil
array[range] → an_array or nil
array.slice(index) → obj or nil
array.slice(start, length) → an_array or nil
array.slice(range) → an_array or nil

Element Reference--Returns the element at index, or returns a subarray
starting at start and continuing for length elements, or returns a
subarray specified by range. Negative indices count backward from the
end of the array (-1 is the last element).
/* start a new line to highlight that an out of range index does not
always return nil */
Returns nil if the start (or starting index) is out of range:
/* suggested additional documentation */
/* new line */ unless there is a length and Integer(start) == length;
/* new line */ or the argument is a range and Integer(range.begin) ==
length.
/* new line */ For these special cases (see the table of examples) the
return value is an empty array [].
/* new line */ The reason for this special behaviour is that when the
start is within the array the sought length is rounded /* or use
“truncated”? */ to fit in the size. In the special-case examples, 5
(which is the length of the array) is considered as touching the end of
the array, so the returned value is a subarray with length zero.

/*back to current documentation */
a = [ "a", "b", "c", "d", "e" ]   # a.length == 5
a[2] + a[0] + a[1]    #=> "cab"
a[6]                  #=> nil
a[1, 2]               #=> [ "b", "c" ]
a[1..3]               #=> [ "b", "c", "d" ]
a[4..7]               #=> [ "e" ]
a[6..10]              #=> nil
a[-3, 3]              #=> [ "c", "d", "e" ]

/* suggested additional documentation */
The following table shows the special cases behaviour
when the start position is just past the end of the array.

index/
start   a[index]   a[start, 2]     a[start..7]

  3     "d"        [ "d", "e" ]    [ "d", "e" ]
  4     "e"        [ "e" ]         [ "e" ]
  5     nil        []              []             # ← special cases
  6     nil        nil             nil
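
The rows of the table can be reproduced with a few lines of Ruby. This is only a sketch for checking the values; the variable names are illustrative:

```ruby
a = ["a", "b", "c", "d", "e"]   # a.length == 5

# One row per start position: the index form, the start/length form,
# and the range form.
rows = (3..6).map do |start|
  [start, a[start], a[start, 2], a[start..7]]
end

# Print each row with tab-separated, inspected values.
rows.each { |row| puts row.map(&:inspect).join("\t") }
```

Only the row for start == 5 (the array's length) mixes nil with empty arrays; one position further along, all three forms return nil.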

*** *** example additional slice methods which return nil
*** *** if the start position is outside the array, even if
*** *** the start position is only just after the end of the array

module Array_String_at_slice
  # Intended for Array and String: behave like #[], #slice and #slice!
  # except when the arguments are not just an index,
  # that is the arguments are a range or an index and a length,
  # and Integer(range.begin) or Integer(index) == array_string.length,
  # when #[], #slice and #slice! return an empty array [] or string "",
  # but #at_slice and #at_slice! return nil.
  # In other words, if #at(index) or #at(range.begin) would return nil
  # then #at_slice(index, arg), #at_slice!(index, arg),
  # #at_slice(range) and #at_slice!(range) also return nil.
  # The method names are intended to convey that the slice behaviour
  # is similar to #at. Using the name #slice_at was considered,
  # but rejected (wrongly?) as possibly being capable of being assumed
  # to be a synonym for #slice.

  def at_slice( *args )
    unless Numeric === (ii = args[0]) then ii = ii.begin end
    if ii >= self.size then nil else slice( *args ) end
  end

  def at_slice!( *args )
    unless Numeric === (ii = args[0]) then ii = ii.begin end
    if ii >= self.size then nil else slice!( *args ) end
  end
end

class Array
  include Array_String_at_slice
end
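
For anyone who would rather not reopen core classes, the same idea can be sketched as a standalone helper. The method below is hypothetical (it is not part of Ruby or of the module above) and assumes the start is an Integer or a Range with an Integer begin:

```ruby
# Hypothetical helper: behaves like seq[*args], but returns nil whenever
# the start position is at or past the end of seq (an Array or a String).
def at_slice(seq, *args)
  first = args[0]
  start = first.is_a?(Range) ? first.begin : first
  return nil if start >= seq.size
  seq[*args]
end

a = ["a", "b", "c", "d", "e"]

p a[5, 2]                 # => []   (built-in behaviour)
p at_slice(a, 5, 2)       # => nil
p at_slice(a, 5..7)       # => nil
p at_slice(a, 4, 2)       # => ["e"]
p at_slice(a, -2, 2)      # => ["d", "e"]
p at_slice("foo", 3, 1)   # => nil
p at_slice("foo", 2, 1)   # => "o"
```

Negative starts fall through to the normal [] behaviour, since they are never greater than or equal to the size.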

timr · August 24, 2010, 1:21pm

Correcting my own post to remove confusion between length as in
array.size and length as in slice(index, length)

Element Reference--Returns the element at index, or returns a subarray
starting at start and continuing for length elements, or returns a
subarray specified by range. Negative indices count backward from the
end of the array (-1 is the last element).
/* new line to highlight that an out of range index might not return
nil */
Returns nil if the start (or starting index) is out of range:
/* suggested additional documentation */
/* new line */ unless there is a length and Integer(start) ==
array.size;
/* new line */ or the argument is a range and Integer(range.begin) ==
array.size.
/* new line */ For these special cases (see the table of examples) the
return value is an empty array [].
/* new line */ The reason for this special behaviour is that when the
start is within the array the sought length is rounded /* or use
“truncated”? */ to fit in the size. In the special-case examples, 5
(which is the size of the array) is considered as touching the end of
the array, so the returned value is a subarray with size zero.