String/array slices

luislavena · March 30, 2011, 7:34pm

Hello,

I know that this has been covered a bit here:
Ruby string slice/[] w/ range, weird end behavior - Ruby - Ruby-Forum but I’m still not certain that I
understand.

s = "foo"

s[3] is nil, like I would expect.

s[3,0] is "", instead of nil.
s[4,0] is finally nil.

I don’t understand how I’m indexing ‘3’ in the context of [3,0] and
getting anything but nil.
Since s[3] is already nil in the first place.

Same for arrays:

a = [:one, :two]

a[2] is nil
a[2,0] is an empty array ??
a[3,0] is finally nil though.
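
For anyone who wants to reproduce these observations, here is a minimal sketch. It assumes Ruby 1.9 or later, and the hash names are only illustrative.

```ruby
# Assumes Ruby 1.9 or later.
s = "foo"
string_results = {
  "s[3]"   => s[3],     # one past the last character
  "s[3,0]" => s[3, 0],  # zero-length slice starting at the end
  "s[4,0]" => s[4, 0]   # start is beyond the end
}

a = [:one, :two]
array_results = {
  "a[2]"   => a[2],     # one past the last element
  "a[2,0]" => a[2, 0],  # zero-length slice starting at the end
  "a[3,0]" => a[3, 0]   # start is beyond the end
}

string_results.each { |expr, value| puts "#{expr} => #{value.inspect}" }
array_results.each  { |expr, value| puts "#{expr} => #{value.inspect}" }
```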

I understand that in the docs these are special cases and they’re not
preventing me from working or anything like that. I am just curious to
understand the why/how about them working this way.

Thank you!

ptyler · March 30, 2011, 8:07pm

Patrick Tyler wrote in post #990031:

Hello,

I know that this has been covered a bit here:
Ruby string slice/[] w/ range, weird end behavior - Ruby - Ruby-Forum but I’m still not certain that I
understand.

s = "foo"

s[3] is nil, like I would expect.

s[3,0] is "", instead of nil.

That behaviour is contrary to the description in the 1.9.2 docs here:

http://www.ruby-doc.org/core/classes/Array.html

which say:

 Returns nil if the index (or starting index) are out of range.

Because 3 is out of range, s[3,0] should return nil according to the
docs. Even the special cases examples in the docs contradict what the
docs say.

As that other thread mentioned, it probably has something to do with the
underlying C implementation. In C/C++ strings are null terminated:

foo\0

If you are stepping through a string in C, the only way you know that
you’ve come to the end of the string is when you hit \0.

The ruby string behaviour probably is related to that \0 character, i.e.
the index 3 actually refers to the \0 character, and ruby treats that
position as quasi inbounds.

ptyler · March 30, 2011, 8:23pm

I might add, that feature is clearly a mistake in the ruby language
because C suffers from no such problems.

ptyler · March 30, 2011, 10:32pm

On Mar 30, 2011, at 2:24 PM, 7stud – wrote:

I might add, that feature is clearly a mistake in the ruby language
because C suffers from no such problems.

It is quite clear that Ruby’s string model is not at all like C’s so why
should a particular string feature for Ruby be judged according to C’s
semantics?
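
One concrete way to see the difference: a Ruby String carries its own length and may contain NUL characters, so its indexing does not depend on a terminator. A small sketch, assuming Ruby 1.9 or later, with illustrative variable names:

```ruby
# A Ruby String tracks its length explicitly, so "\0" is ordinary data.
s = "foo\0bar"
length_with_nul = s.length  # the NUL is counted like any other character
nul_char        = s[3]      # the NUL itself, returned as a one-character string
after_nul       = s[4, 3]   # slicing continues past the NUL
end_slice       = s[7, 0]   # the empty slice at the end sits at index 7

puts length_with_nul
puts nul_char.inspect
puts after_nul
puts end_slice.inspect
```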

Gary W.

ptyler · March 30, 2011, 11:16pm

Gary,

Do you have a different way of explaining why ruby goes past the last
possible index in the situations I asked about above? You mention that
Ruby’s model is not at all like C’s, so maybe you can help clear this up
please?

Thanks to you both!

ptyler · March 30, 2011, 9:42pm

Yep, I see. That’s nutty. I wish that ruby would stop at the null, not
include it.

Thanks!

ptyler · March 30, 2011, 11:23pm

On Mar 30, 2011, at 2:08 PM, 7stud – wrote:

s[3,0] is "", instead of nil.

That behaviour is contrary to the description in the 1.9.2 docs here:

class Array - RDoc Documentation

The docs certainly could be more clear but the actual behavior is
self-consistent and useful.
Note: I’m assuming 1.9.X version of String.

It helps to consider the numbering in the following way:

  -4  -3  -2  -1    ← numbering for single argument indexing
   0   1   2   3
 +---+---+---+---+
 | a | b | c | d |
 +---+---+---+---+
 0   1   2   3   4  ← numbering for two argument indexing or start of range
-4  -3  -2  -1

The common (and understandable) mistake is to assume that the semantics
of the single argument index are the same as the semantics of the
first argument in the two argument scenario (or range). They are not
the same thing in practice and the documentation doesn’t reflect this.
The error though is definitely in the documentation and not in the
implementation:

single argument: the index represents a single character position
within the string. The result is either the single character string
found at the index or nil because there is no character at the given
index.

s = ""
s[0] # nil because no character at that position

s = "abcd"
s[0] # "a"
s[-4] # "a"
s[-5] # nil, no characters before the first one

two integer arguments: the arguments identify a portion of the string to
extract or to replace. In particular, zero-width portions of the string
can also be identified so that text can be inserted before or after
existing characters including at the front or end of the string. In this
case, the first argument does not identify a character position but
instead identifies the space between characters as shown in the diagram
above. The second argument is the length, which can be 0.

s = "abcd" # each example below assumes s is reset to "abcd"

To insert text before ‘a’: s[0,0] = "X" # "Xabcd"
To insert text after ‘d’: s[4,0] = "Z" # "abcdZ"
To replace first two characters: s[0,2] = "AB" # "ABcd"
To replace last two characters: s[-2,2] = "CD" # "abCD"
To replace middle two characters: s[1...3] = "XX" # "aXXd"
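
These assignments can be checked with a short sketch. The helper name `edit` is hypothetical; it only applies a block to a fresh copy of "abcd" so that each example starts from the same string.

```ruby
# Hypothetical helper: run an edit against a fresh copy of "abcd".
def edit
  s = "abcd".dup
  yield s
  s
end

before_a   = edit { |s| s[0, 0]  = "X"  }  # insert at the left edge
after_d    = edit { |s| s[4, 0]  = "Z"  }  # insert at the right edge
first_two  = edit { |s| s[0, 2]  = "AB" }  # replace two characters from edge 0
last_two   = edit { |s| s[-2, 2] = "CD" }  # a negative start counts from the right
middle_two = edit { |s| s[1...3] = "XX" }  # from edge 1 up to, not including, 3

puts before_a, after_d, first_two, last_two, middle_two
```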

The behavior of a range is pretty interesting. The starting point is the
same as the first argument when two arguments are provided (as described
above) but the end point of the range can be the ‘character position’ as
with single indexing or the “edge position” as with two integer
arguments. The difference is determined by whether the double-dot range
or triple-dot range is used:

s = "abcd"
s[1..1] # "b"
s[1..1] = "X" # "aXcd"

s[1...1] # ""
s[1...1] = "X" # "aXbcd", the range specifies a zero-width portion of the string

s[1..3] # "bcd"
s[1..3] = "X" # "aX", positions 1, 2, and 3 are replaced.

s[1...3] # "bc"
s[1...3] = "X" # "aXd", positions 1, 2, but not quite 3 are replaced.

If you go back through these examples and insist on using the single
index semantics for the double or range indexing examples you’ll just
get confused. You’ve got to use the alternate numbering I show in the
ascii diagram to model the actual behavior.
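
One way to see the two numberings side by side is to probe every index with both forms. This is only a sketch, assuming Ruby 1.9 or later; the names `chars_at` and `edges_at` are made up for illustration.

```ruby
s = "abcd"

# Single-argument indexing: only character positions 0..3 exist.
chars_at = (0..5).map { |i| s[i] }

# Two-argument indexing with length 0: edge positions 0..4 exist,
# which is one more than the number of characters.
edges_at = (0..5).map { |i| s[i, 0] }

(0..5).each do |i|
  puts "i=#{i}  s[i]=#{chars_at[i].inspect}  s[i,0]=#{edges_at[i].inspect}"
end
```

The first index where the single-argument form gives nil is 4, while the two-argument form only gives nil from 5 onward.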

Gary W.

ptyler · March 30, 2011, 11:37pm

It’s not only a good thing that Ruby works this way, it’s necessary.

The s[n, 0] defines a place just before or after a character, and
often before one and after another.

So:

t = 'hi'
t[0,0] = '('
t[3,0] = ')'
t
=> "(hi)"

In your adjusted version this doesn’t work. It’s rather interesting
that the space after the last character of a string is not nil but
a 0-length string. This makes it possible to see beforehand if the
assignment would work. Otherwise, one would just have to wait for a
(possible) IndexError exception. It even makes sense intuitively if
you think about it a minute.
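
A minimal sketch of that idea: the read form returns nil exactly where the write form would raise. The method name `insert_at_edge` is hypothetical.

```ruby
# Hypothetical helper: insert text at edge n only if that edge exists.
def insert_at_edge(str, n, text)
  return false if str[n, 0].nil?  # nil means n is not a valid edge
  str[n, 0] = text
  true
end

t = "hi".dup
ok_left   = insert_at_edge(t, 0, "(")  # edge before the first character
ok_right  = insert_at_edge(t, 3, ")")  # t is now "(hi", so edge 3 is its end
ok_beyond = insert_at_edge(t, 9, "!")  # no such edge; t is left untouched

# Without the check, an assignment past the last edge raises IndexError.
raised = begin
  "hi".dup[3, 0] = ")"
  false
rescue IndexError
  true
end

puts t, ok_left, ok_right, ok_beyond, raised
```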

Ruby, as it happens, is designed very well.

Oh, and C doesn’t really have strings. I love C but it is about the
last place I would ever look for inspiration on string handling…


ptyler · March 31, 2011, 2:45am

Ross Harvey wrote in post #990066:

It’s not only a good thing that Ruby works this way, it’s necessary.
It even makes sense intuitively if
you think about it a minute.

Yes and thanks to you too. I’m onboard now and agree, it’s a really
neat design!

ptyler · March 30, 2011, 11:43pm

Wow Gary, I really appreciate the time you took to type that up. I have
never considered thinking of it in the manner that you presented it.
Thanks a lot, I now understand it completely.

ptyler · March 31, 2011, 2:55am

Gary W. wrote in post #990058:

On Mar 30, 2011, at 2:24 PM, 7stud – wrote:

I might add, that feature is clearly a mistake in the ruby language
because C suffers from no such problems.

It is quite clear that Ruby’s string model is not at all like C’s so why
should a particular string feature for Ruby be judged according to C’s
semantics?

…because ruby is written in C??

ptyler · March 31, 2011, 7:50am

On Thu, Mar 31, 2011 at 09:55:22AM +0900, 7stud – wrote:

Gary W. wrote in post #990058:

It is quite clear that Ruby’s string model is not at all like C’s so
why should a particular string feature for Ruby be judged according
to C’s semantics?

…because ruby is written in C??

Should Ruby’s string model be like the Intel x86 instruction set’s
string model, then? It’s turtles all the way down!

ptyler · March 31, 2011, 9:17am

On Thu, Mar 31, 2011 at 2:03 AM, 7stud – wrote:

Gary W. wrote in post #990065:

In particular, zero-width portions of the string
can also be identified so that text can be inserted before or after
existing characters including at the front or end of the string.

Nice explanation. Now if ruby had self-documenting docs, like php, we
could add your post to the docs, and they would be much improved.

+1 on getting this explanation into the docs.

ptyler · August 16, 2011, 7:42pm

Ross Harvey wrote in post #990066:

The s[n, 0] defines a place just before or after a character, and
often before one and after another.

So:

t = 'hi'
t[0,0] = '('
t[3,0] = ')'
t
=> "(hi)"

I’m new to Ruby and found this issue in one of the Ruby Koans tests
(test_slicing_arrays).

As an outsider, using two different notions of an index (the element and
the space between elements) is very unexpected, especially for
syntax that is so closely tied (a[0] vs. a[0,0]). Further, the
documentation makes no effort to differentiate between the two notions
(other than using ‘index’ and ‘start’).

If the only justification for this behavior is the example given above,
would it not be clearer to encourage str.insert and str.concat
(similarly, ary.insert and ary.concat)?
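
For comparison, the same kind of edit written with the methods named here might look like the sketch below. `String#insert`, `String#concat`, `Array#insert` and `Array#concat` are core methods; the variable names are illustrative.

```ruby
t = "hi".dup
t.insert(0, "(")  # insert before the character at index 0
t.concat(")")     # append at the end

a = [:one, :two]
a.insert(0, :zero)  # insert before the element at index 0
a.concat([:three])  # append the elements of another array

puts t
puts a.inspect
```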

ptyler · March 31, 2011, 3:03am

Gary W. wrote in post #990065:

In particular, zero-width portions of the string
can also be identified so that text can be inserted before or after
existing characters including at the front or end of the string.

Nice explanation. Now if ruby had self-documenting docs, like php, we
could add your post to the docs, and they would be much improved.