String#[] behaviour

DNNX · December 18, 2007, 2:41pm

'asd'[0..10] returns 'asd' while 'asd'[-10..-1] returns nil.

As far as I understand, such behaviour completely satisfies ruby
documentation (class String - RDoc Documentation), but
it seems inconsistent to me.

Any thoughts?

Thanks

DNNX · December 18, 2007, 4:13pm

On Dec 18, 7:36 am, DNNX [email protected] wrote:

'asd'[0..10] returns 'asd' while 'asd'[-10..-1] returns nil.

As far as I understand, such behaviour completely satisfies ruby
documentation (class String - RDoc Documentation), but
it seems inconsistent to me.

Any thoughts?

Thanks

From the docs for String#[] at that link:

“Returns nil if the initial offset falls outside the string…”
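For illustration, here is a small sketch of how that rule plays out on the example from the first message. It is plain Ruby and assumes the inclusive `..` range form; the comments show what each line prints.

```ruby
s = 'asd'

p s[0..10]    # => "asd"  start 0 is inside the string, the end is simply cut off
p s[-3..-1]   # => "asd"  start -3 is the first character
p s[-10..-1]  # => nil    start -10 falls before the beginning of the string
p s[3..10]    # => ""     start == length still counts as inside
p s[4..10]    # => nil    start is past the end of the string
```

So only the start offset decides between a string and nil. An oversized end is silently trimmed.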

DNNX · December 18, 2007, 4:55pm

2007/12/18, DNNX [email protected]:

'asd'[0..10] returns 'asd' while 'asd'[-10..-1] returns nil.

As far as I understand, such behaviour completely satisfies ruby
documentation (class String - RDoc Documentation), but
it seems inconsistent to me.

Any thoughts?

On one hand you are right. On the other hand, begin and end indexes
are asymmetric anyway: you know that the starting index is always 0
but the ending index can have arbitrary values. I cannot say I have
come across this so far, so for me personally this is a non-issue. On
a larger scale it is probably a minor issue. Let’s hear what others
say.

Kind regards

robert

DNNX · December 18, 2007, 5:27pm

On Dec 18, 9:06 am, yermej [email protected] wrote:

Thanks

From the docs for String#[] at that link:

“Returns nil if the initial offset falls outside the string…”

Nevermind what I said. Some days I don’t read so good and stuff.

DNNX · December 18, 2007, 5:36pm

On 18 Dec, 17:54, Robert K. [email protected] wrote:

On one hand you are right. On the other hand, begin and end indexes
are asymmetric anyway: you know that the starting index is always 0
but the ending index can have arbitrary values. …

Hm… On the other hand, end and begin indexes are asymmetric anyway:
you know that the ending index is always -1 but the starting index can
have arbitrary values.

Isn’t this a symmetry?

Best regards,
Viktar

DNNX · December 18, 2007, 9:13pm

On 18/12/2007, DNNX [email protected] wrote:

The asymmetry is that you can chop off “at most 10 characters from
the start” with 0..10 but not “at most 10 characters from the end”
with -10..-1, because the start, which has to be inside the string, is
the bound you cannot be sure of. You cannot swap the bounds either,
because then you get an empty string.
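As a workaround sketch (not something proposed in the thread): with the existing String#[] you can still take “at most n characters from the end” by clamping n to the string length first. The helper name last_chars is made up for this example, and n is assumed to be non-negative.

```ruby
# Hypothetical helper: return at most n trailing characters of str.
# Assumes n >= 0.
def last_chars(str, n)
  k = [n, str.length].min  # never ask for more characters than exist
  str[-k, k]               # start k characters from the end, take k of them
end

p last_chars('asd', 10)  # => "asd"
p last_chars('asd', 2)   # => "sd"
p last_chars('asd', 0)   # => ""
```

The start-and-length form str[-k, k] is used here because k is already known to fit inside the string, so the start offset can never fall outside it.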

So the symmetric rule for range indexing would be something like this:

a) both ends of the range have the same sign → the one with the lower
absolute value must be inside the string. In other words, the range
must intersect with 0..string.length. This is the only case that can
produce a range completely outside of the string (when the condition
is not met).

b) they have different signs and the start is non-negative → simple.
They give either a range inside the string or a range where the start
is higher than the end (a..-b => a..length-b), so it can always return
a string, sometimes an empty one.

c) the start is negative and the end is positive → ideally you get
something inside the string, but you can get a range that has both
start and end outside of the string, each on a different side. Either
way it makes sense: it contains part of the string, or the start is
higher than the end after evaluating (-a..b => length-a..b).
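One possible reading of rules a) to c), sketched in Ruby. The method name symmetric_slice is invented for this example, and “inside the string” is interpreted loosely, so that a start equal to length (and its mirror image, an end just before index 0) still yields an empty string, as String#[] does today. Treat it as an illustration of the idea, not as a proposed implementation.

```ruby
# Hypothetical sketch of a "symmetric" range lookup on a string.
def symmetric_slice(str, range)
  len   = str.length
  first = range.begin
  last  = range.end

  # Translate negative indexes into offsets from the start.
  lo = first < 0 ? first + len : first
  hi = last  < 0 ? last  + len : last
  hi -= 1 if range.exclude_end?

  if first >= 0 && last >= 0
    return nil if lo > len   # rule a), both non-negative: the start must be inside
  elsif first < 0 && last < 0
    return nil if hi < -1    # rule a), both negative: the end must be inside
  end

  # Rules b) and c): intersect with the string and always return a string.
  lo = [lo, 0].max
  hi = [hi, len - 1].min
  lo > hi ? '' : str[lo..hi]
end

p symmetric_slice('asd', 0..10)    # => "asd"
p symmetric_slice('asd', -10..-1)  # => "asd"
p symmetric_slice('asd', 4..6)     # => nil
p symmetric_slice('asd', -7..-5)   # => nil
p symmetric_slice('asd', -10..10)  # => "asd"
```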

Thanks

Michal

DNNX · December 19, 2007, 12:46am

On Dec 18, 10:23 am, DNNX [email protected] wrote:

Isn’t this a symmetry?

Best regards,
Viktar

No, because “-1” is a special value…it’s got magic in it. It can
magically mean 5, or 10, or even 12 (because it’s magic). “-1” is
just sugar for #length, and #length is always a side-effect of a
container, whereas ‘0’ is a constant entry point.

Regards,
Jordan

DNNX · December 19, 2007, 11:21am

On 19/12/2007, MonkeeSage [email protected] wrote:

No, because “-1” is a special value…it’s got magic in it. It can
magically mean 5, or 10, or even 12 (because it’s magic). “-1” is
just sugar for #length, and #length is always a side-effect of a
container, whereas ‘0’ is a constant entry point.

-1 is as constant as 0. And because of its magic when used as
container index it always means the end. And really length - 1, not
just length. And at that place there is always the last object unless
the container is empty. The same way as the first object is at 0.

Thanks

Michal

DNNX · December 19, 2007, 9:16am

On 19 Dec, 01:42, MonkeeSage [email protected] wrote:

have arbitrary values.

Regards,
Jordan

-1 is more special and magic than 0? Hm… 0 also can magically mean
-6, -11, or even -13 (because it’s magic too).

-1 is sugar for #length? I am not sure I understand correctly; I have
never heard such an interpretation of -1 before. Why #length and not
#length-1? Why is 0 not sugar for -#length? What do you mean by saying
that -1 is sugar for something?

0 is a constant entry point? Great, -1 is a constant exit point.

Anyway, whether or not there is any symmetry, I still believe that
returning 'asd' in one case and nil in the other is not consistent
(please see my example in the first message).

Regards,
Viktar

DNNX · December 19, 2007, 12:49pm

Michal S. wrote:

On 18/12/2007, DNNX [email protected] wrote:

The asymmetry is that you can chop off “at most 10 characters from
the start” with 0..10 but not “at most 10 characters from the end”
with -10..-1, because the start, which has to be inside the string, is
the bound you cannot be sure of. You cannot swap the bounds either,
because then you get an empty string.

So the symmetric rule for range indexing would be something like this:
…skipped…

So we need to be clear about range indexing: I think it is perfectly
legitimate to return the intersection of an array/string with the range,
instead of nil, when the start is negative.
This can be done with a one-line patch at range.c:615 (as in trunk): just
assume beg = 0 instead of goto out_of_range.
That would at least give us more Perl-compatible behavior =) i.e. just as
'abc'[0..6] is 'abc' now, 'abc'[-6..-1] would be 'abc' as well.

One problem I see with this assumption: 'abc'[4..6] and 'abc'[-6..-4] will
return '' instead of nil.
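To make the idea concrete, here is a Ruby-level sketch of what “assume beg = 0” would mean for a string lookup. It is not the actual C change in range.c; the method name clamped_slice is invented for this example, and the positive side is left as String#[] handles it today.

```ruby
# Hypothetical sketch: like str[range], but a start that is still
# negative after translation is moved to 0 instead of giving nil.
def clamped_slice(str, range)
  len   = str.length
  first = range.begin
  last  = range.end

  # Translate negative indexes into offsets from the start.
  first += len if first < 0
  last  += len if last < 0
  last  -= 1 if range.exclude_end?

  first = 0 if first < 0      # the proposed change: clamp instead of out_of_range
  return nil if first > len   # unchanged: a start past the end is still nil

  count = last - first + 1
  count = 0 if count < 0      # start after end gives an empty string
  str[first, count]
end

p clamped_slice('abc', 0..6)    # => "abc"
p clamped_slice('abc', -6..-1)  # => "abc"
p clamped_slice('abc', -6..-4)  # => ""
```

The last line shows the side effect mentioned above: a range that lies entirely before the string now gives an empty string rather than nil.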

DNNX · December 19, 2007, 1:03pm

On 19/12/2007, Pasha N. [email protected] wrote:

…skipped…
return '' instead of nil.

You can still test that the lower bound is inside the string. It’s just
that with negative ranges the lower bound is the second number, not
the first.

Thanks

Michal

DNNX · December 19, 2007, 4:01pm

Jordan Callicoat wrote:

Taking your example, “'asd'[-10..-1]”, this means
'asd'[-7..2] when you de-sugar it. Now in the other case,
“'asd'[0..10]”, once you reach #length-1, you can stop and return
0..#length-1. But with 'asd'[-7..2], what are you supposed to do when
the start index is less than the first index (0)? Well, you could skip
ahead to the first index, sure, but it makes just as much sense (if
not more) to return nil/empty string. Same goes for cases such as
'asd'[-2..-3] (i.e., 'asd'[1..0]), where the start index is greater
than the end index.

IMHO, the main goal of such a construct (some_string[-10..-1]) is to
return the last 10 chars of some_string. In that case, returning
'asd' for 'asd'[-10..-1] seems as logical as returning 'asd' for
'asd'[0..10] (as implemented now).

Right now (1.8.6) we have:

'asd'[0..10] => 'asd'
'asd'[2..1] => ''
'asd'[-1..-2] => ''
-BUT-
'asd'[-10..-1] => nil

I think that, by the “Principle of Least Astonishment” (c), we can unify
those cases, i.e. return either 'asd' or nil in both cases 1) and 4). All
we need is to adjust the start index of the range to 0, if it is negative,
right after de-sugaring.

DNNX · December 19, 2007, 3:40pm

On Dec 19, 2:12 am, DNNX [email protected] wrote:

but the ending index can have arbitrary values. …
No, because “-1” is a special value…it’s got magic in it. It can
-1 is sugar for #length? Not sure I understand correctly. Never heard

Regards,
Viktar

I think you (and Michal) missed my point. And yes, I should have said
#length-1. The point is, since there is no such thing as a negative
index (0 is the first index) and “-1” (or -anynumber) is just
sugar (i.e., just a more convenient syntax for writing
#length-whatever), what you’re asking is for ranges such as [-7..2]
and [1..0] to be meaningful. Taking your example, “'asd'[-10..-1]”,
this means 'asd'[-7..2] when you de-sugar it. Now in the other case,
“'asd'[0..10]”, once you reach #length-1, you can stop and return
0..#length-1. But with 'asd'[-7..2], what are you supposed to do when
the start index is less than the first index (0)? Well, you could skip
ahead to the first index, sure, but it makes just as much sense (if
not more) to return nil/empty string. Same goes for cases such as
'asd'[-2..-3] (i.e., 'asd'[1..0]), where the start index is greater
than the end index.
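The de-sugaring step described here can be written out as a short sketch. The helper name desugar is invented for this example; it only adds the length to negative bounds and does no range checking at all.

```ruby
# Hypothetical helper: rewrite negative bounds as offsets from the start.
def desugar(range, length)
  first = range.begin
  last  = range.end
  first += length if first < 0
  last  += length if last < 0
  range.exclude_end? ? (first...last) : (first..last)
end

p desugar(-10..-1, 3)  # => -7..2
p desugar(-2..-3, 3)   # => 1..0
p desugar(0..10, 3)    # => 0..10
```

The first result still has a negative start and the second has a start greater than its end. Those are the two cases the lookup then has to decide how to handle.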

Regards,
Jordan

DNNX · December 20, 2007, 11:30am

On Dec 19, 9:01 am, Pasha N. [email protected] wrote:

'asd'[-10..-1] => nil

I think that, by the “Principle of Least Astonishment” (c), we can unify
those cases, i.e. return either 'asd' or nil in both cases 1) and 4). All
we need is to adjust the start index of the range to 0, if it is negative,
right after de-sugaring.

In thinking about it, I guess that does make some sense. Unless one
were to assume that negative starting indexes were more likely to be
programmer errors than larger-than-#length-1 end indexes (does anyone
claim this?), it seems to me that setting negative indexes to 0 would
be consistent with setting larger-than-#length-1 indexes to #length-1.
Maybe you should start an RCR for this.

Regards,
Jordan
