Ruby string slice/[] w/ range, weird end behavior


#1

First the docs:

…If passed two Fixnum objects, returns a substring starting at the
offset given by the first, and a length given by the second. If given
a range, a substring containing characters at offsets given by the
range is returned… Returns nil if the initial offset falls outside
the string, the length is negative, or the beginning of the range is
greater than the end.

Now from irb (1.8):

"foo"[2..2]
=> "o"

"foo"[3..3]
=> "" # ???

"foo"[4..4]
=> nil

"foo"[2,1]
=> "o"

"foo"[3,1]
=> "" # ???

"foo"[4,1]
=> nil

"foo"[2]
=> 111 # (the 'o' char)

"foo"[3]
=> nil # This makes sense to me, but seems inconsistent wrt the above

Seems to me like the null terminator of the string is somehow getting
muddled into all of this.

Is there any meaning/purpose behind this behavior?

Thanks,
Gary
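For comparison, here is a quick check of the range and (start, length) forms from the question. It assumes a Ruby 1.9 or later interpreter. For these particular calls the results match the 1.8 irb session above.

```ruby
# The range and (start, length) calls from the question, run on a modern Ruby.
p "foo"[2..2]  # "o"
p "foo"[3..3]  # "" (offset 3 is the end-of-string boundary)
p "foo"[4..4]  # nil (offset 4 lies past that boundary)
p "foo"[2, 1]  # "o"
p "foo"[3, 1]  # ""
p "foo"[4, 1]  # nil
```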


#2

On 9 May 2009, at 00:26, Gary Y. wrote:

> => "" # ???
>
> Is there any meaning/purpose behind this behavior?

String indices start at zero, so:

"foo"[0..0] => 'f'
"foo"[1..1] => 'o'
"foo"[2..2] => 'o'
"foo"[3..3] => nil

Ellie

Eleanor McHugh
Games With Brains
http://slides.games-with-brains.net

raise ArgumentError unless @reality.responds_to? :reason


#3

On Saturday 09 May 2009 01:32:06, Eleanor McHugh wrote:

> "foo"[3..3] => nil

You sure? I get "", same as the OP. If it returned nil, I don’t think
the OP would have asked his question.


#4

Element Reference—If passed a single Fixnum, returns the code of the
character at that position. If passed two Fixnum objects, returns a
substring starting at the offset given by the first, and a length
given by the second. If given a range, a substring containing
characters at offsets given by the range is returned. In all three
cases, if an offset is negative, it is counted from the end of str.
Returns nil if the initial offset falls outside the string, the length
is negative, or the beginning of the range is greater than the end.

Passing two Fixnums returns a substring.

On Fri, May 8, 2009 at 4:39 PM, Sebastian H.


#5

On 9 May 2009, at 00:39, Sebastian H. wrote:

> On Saturday 09 May 2009 01:32:06, Eleanor McHugh wrote:
>
>> "foo"[3..3] => nil
>
> You sure? I get "", same as the op. If it would return nil, I don’t
> think the op would have asked his question.

Sorry, typo on my part (I’m not having a good week for these, it
seems). The point I was trying to make was that

"foo"[3..3] => ""

is a thoroughly valid range, but I guess that would have been clearer
with a fuller explanation. Consider

"foo"[2..3] => "o"

Index [3] is actually the end of the string, so whilst the range
accesses the string after any characters in it, it’s still accessing
the string in a valid range.
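One way to read the range form is as shorthand for a (start, length) pair, which is consistent with "foo"[3..3] and "foo"[3,1] both returning "" in the question. The sketch below assumes Ruby 2.3+ (for Integer#negative?) and uses a hypothetical helper, range_to_start_length, that is not part of Ruby. It is written to mirror how String#[] appears to normalise an integer range, and the loop checks that reading against String#[] itself.

```ruby
# Hypothetical helper (not part of Ruby): convert an integer Range into the
# [start, length] pair that String#[] would use for a string of `size` chars.
# Returns nil when the range starts before the first boundary.
def range_to_start_length(range, size)
  start = range.begin
  stop  = range.end
  start += size if start.negative?     # negative offsets count from the end
  return nil if start.negative?        # still before offset 0
  stop += size if stop.negative?
  stop += 1 unless range.exclude_end?  # an inclusive end covers one more char
  [start, [stop - start, 0].max]       # a reversed range yields length 0
end

p range_to_start_length(3..3, 3)  # [3, 1]
p "foo"[3..3] == "foo"[3, 1]      # true

# Compare against String#[] for small inclusive and exclusive ranges.
["", "foo"].each do |s|
  (-5..5).each do |a|
    (-5..5).each do |b|
      [false, true].each do |exclusive|
        r = Range.new(a, b, exclusive)
        pair = range_to_start_length(r, s.length)
        expected = pair ? s[*pair] : nil
        raise "mismatch for #{s.inspect}[#{r.inspect}]" unless s[r] == expected
      end
    end
  end
end
```

Once a range is reduced to a (start, length) pair, the end-of-string cases follow the same rule as the two-argument form.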

Ellie


#6

Here’s an even simpler case:

""[0]
=> nil

""[0..0]
=> ""

""[0..1]
=> ""

""[1..1]
=> nil

Why differentiate between returning “” and nil?
Why isn’t this explained in the docs?


#7

On May 8, 4:47 pm, Eleanor McHugh removed_email_address@domain.invalid
wrote:

> "foo"[3..3] => ""
>
> index [3] is actually the end of the string so whilst the range
> accesses the string after any characters in it, it’s still accessing
> the string in a valid range.

Maybe in the native C. But why should that be exposed?
And why shouldn’t index[3] return "" if you are correct?

After all:

a=[]; "asd".each_byte{|x| a << x}; a
=> [97, 115, 100]

If I ask for a substring entirely out of bounds, I should consistently
get either nil or "", not sometimes one and sometimes the other.


#8

On 9 May 2009, at 00:44, removed_email_address@domain.invalid wrote:

> Element Reference—If passed a single Fixnum, returns the code of the
> character at that position. If passed two Fixnum objects, returns a
> substring starting at the offset given by the first, and a length
> given by the second. If given a range, a substring containing
> characters at offsets given by the range is returned. In all three
> cases, if an offset is negative, it is counted from the end of str.
> Returns nil if the initial offset falls outside the string, the length
> is negative, or the beginning of the range is greater than the end.
>
> two fixed numbers returns a substring.

The same result would occur with

"foo"[3,1] => ""
"foo"[4,1] => nil

Ellie


#9

On 9 May 2009, at 01:19, Gary Y. wrote:

> On May 8, 4:47 pm, Eleanor McHugh removed_email_address@domain.invalid
> wrote:
>
>> "foo"[3..3] => ""
>>
>> index [3] is actually the end of the string so whilst the range
>> accesses the string after any characters in it, it’s still accessing
>> the string in a valid range.
>
> Maybe in the native C. But why should that be exposed?

It’s not the C implementation; it’s the conceptual model of what a
string is, i.e. an array of characters addressable by index and range.

> And why shouldn’t index[3] return "" if you are correct?

Because in this case the question you’re asking isn’t “What substring
occupies the given segment of the string?” but “Which character is
stored at the given index in the string?” If no character is stored
there (as is the case for "foo"[3]), then nil is the only meaningful answer.

"foo"[3] => nil
nil.to_s => ""

> After all:
>
> a=[]; "asd".each_byte{|x| a << x}; a
> => [97, 115, 100]
>
> If I ask for a substring entirely out of bounds, I should consistently
> be returned nil or "", not one of the two.

And the substring "foo"[3..3] is in bounds because conceptually you’re
dealing with:

  f   o   o
0   1   2   3

so [3..3] is the slice at the end of the string, which contains no
characters.
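The boundary picture can be checked directly. This sketch (modern Ruby assumed) lists the n + 1 boundaries of a three-character string. It then shows that splitting at any of them, including the one after the last character, gives a prefix and suffix that rejoin to the original, while one position past the last boundary the slice is nil.

```ruby
s = "foo"

# A string of n characters has n + 1 boundaries: 0, 1, ..., n.
boundaries = (0..s.length).to_a
p boundaries  # [0, 1, 2, 3]

# Splitting at any boundary gives a prefix and a suffix that rejoin to s.
# At the last boundary (3) the suffix is the empty string, not nil.
boundaries.each do |i|
  prefix = s[0, i]
  suffix = s[i..-1]
  p [i, prefix, suffix]
end

# Offset 4 is not a boundary of "foo", so the slice is nil.
p s[s.length + 1..-1]  # nil
```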

And yes, I know this is probably about as clear as mud - my ability to
write English seems to be inversely proportional to the difficulty of
the code I’m working on at any given time, and currently I’m buried in
research so the code is very hairy indeed :(

Whilst it’s not directly relevant, being a different language and all,
I recommend “Chapter 3: String Processing” in Programming with Unicon
(http://unicon.sourceforge.net/ubooks.html) as it’s the same basic model.

Ellie


#10

Are replies to this group always like this?

Someone blindly posts the doc without reading that I already posted
the doc or reading my example (like an implicit RTFM).
Other folks automatically assume I’m a newb and respond in five
seconds without fully reading my post or thinking about their own post.

Do I need to be a kool kid or have a secret code word for this noise
to go away?

I’ve found the more niche Ruby groups to have much more signal… but
this didn’t seem to fit into a niche.


#11

On 9 May 2009, at 01:23, Gary Y. wrote:

> Are replies to this group always like this?
>
> someone blindly posts the doc without reading that I already posted
> the doc or reading my example (like an implicit RTFM)
> other folks automatically assume I’m a newb and respond in five
> seconds without fully reading my post or thinking about their own post

Yes. We all have our moron moments. After all, the String.[]
documentation clearly states:

  Element Reference---If passed a single +Fixnum+, returns a
  substring of one character at that position.

which is precisely what I’ve just tried to explain in my other message
and confirms that the behaviour you’re querying is completely
consistent with the conceptual model of a string of characters.

A Ruby string is not a C char[], and the index points are the interstices
between its characters, not the addresses of those characters.
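Treating index points as interstices also explains assignment: a zero-length slice at an interstice behaves as an insertion point. A small sketch, assuming modern Ruby (the string is dup'ed so it can be modified even where string literals are frozen):

```ruby
s = "foo".dup

# Offset 3 (== s.length) is the interstice after the last character,
# so replacing the zero-length slice there appends text.
s[3, 0] = "d"
puts s  # food

# Offset 0 is the interstice before the first character.
s[0, 0] = "the "
puts s  # the food

# One past the last interstice there is nothing to insert into.
begin
  s[s.length + 1, 0] = "x"
rescue IndexError => e
  puts "IndexError: #{e.message}"
end
```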

> do i need to be a kool kid or have a secret code word for this noise
> to go away?
>
> i’ve found the more niche ruby groups to be much more signal… but
> this didn’t seem to fit into a niche

Ellie


#12

On Sat, May 9, 2009 at 3:12 AM, Eleanor McHugh
removed_email_address@domain.invalid wrote:

> On 9 May 2009, at 01:23, Gary Y. wrote:
>
>> Are replies to this group always like this?
>>
>> someone blindly posts the doc without reading that I already posted
>> the doc or reading my example (like an implicit RTFM)
>> other folks automatically assume I’m a newb and respond in five
>> seconds without fully reading my post or thinking about their own post
>
> Yes. We all have our moron moments. After all, the String.[] documentation

Speak for yourself, I am always a moron (luckily I have no idea what
that means).

> clearly states:
>
>   Element Reference---If passed a single +Fixnum+, returns a
>   substring of one character at that position.

Does it?
Well I guess so, for Ruby1.8.* ;)
OP will be pleased with Ruby1.9 I guess.
Cheers
Robert
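For reference, the 1.9 change under discussion: a single Integer index returns a one-character String, and the character code that 1.8 returned is available separately. A short sketch assuming Ruby 1.9 or later:

```ruby
s = "foo"

p s[2]          # "o" (Ruby 1.8 returned 111 here)
p s[2].ord      # 111, the character code
p s.getbyte(2)  # 111, the raw byte at offset 2
p s[3]          # nil: no character is stored at offset 3
p s[3, 1]       # "": an empty substring at the end-of-string boundary
```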


#13

On Sat, May 9, 2009 at 1:41 PM, Eleanor McHugh
removed_email_address@domain.invalid wrote:

> I pulled that straight from ri in my 1.9.1 install…
>
> From Ruby 1.9.1
>
> Element Reference---If passed a single +Fixnum+, returns a
> substring of one character at that position.

Oh yes, that is what it does. I cannot read, sorry (but I just proved
my statement above ;)).


#14

On 9 May 2009, at 09:22, Robert D. wrote:

> On Sat, May 9, 2009 at 3:12 AM, Eleanor McHugh
> removed_email_address@domain.invalid wrote:
>
>> Yes. We all have our moron moments. After all, the String.[]
>> documentation
>
> Speak for yourself, I am always a moron (luckily I have no idea what
> that means).

I know exactly what you mean - let those who never write a bug throw
the first rant :)

>> clearly states:
>>
>> Element Reference---If passed a single +Fixnum+, returns a
>> substring of one character at that position.
>
> Does it?
> Well I guess so, for Ruby1.8.* ;)

I pulled that straight from ri in my 1.9.1 install…

  From Ruby 1.9.1

  Element Reference---If passed a single +Fixnum+, returns a
  substring of one character at that position.

Ellie


#15

Robert K. wrote:

> On 09.05.2009 02:19, Gary Y. wrote:
>
>> On May 8, 4:47 pm, Eleanor McHugh removed_email_address@domain.invalid
>> wrote:
>>
>>> "foo"[3..3] => ""
>>>
>>> index [3] is actually the end of the string so whilst the range
>>> accesses the string after any characters in it, it’s still accessing
>>> the string in a valid range.
>>
>> Maybe in the native C. But why should that be exposed?
>> And why shouldn’t index[3] return "" if you are correct?
>
> If we change perspective a bit, the behavior seems pretty naturally to
> me: if you execute this:

It doesn’t to me. I’ll throw in with the OP: that is stupid behavior,
and whoever wrote the docs had no idea how the end of a string is
handled in Ruby. Typical crappy Ruby documentation.


#16

On 09.05.2009 02:19, Gary Y. wrote:

> On May 8, 4:47 pm, Eleanor McHugh removed_email_address@domain.invalid
> wrote:
>
>> "foo"[3..3] => ""
>>
>> index [3] is actually the end of the string so whilst the range
>> accesses the string after any characters in it, it’s still accessing
>> the string in a valid range.
>
> Maybe in the native C. But why should that be exposed?
> And why shouldn’t index[3] return "" if you are correct?

If we change perspective a bit, the behavior seems pretty natural to
me: if you execute this:

s = "foo"
l = s.length
(l + 2).times do |i|
  p i, s[i, l - i], s[i, 1 + l - i]
end

you get this:

0
"foo"
"foo"
1
"oo"
"oo"
2
"o"
"o"
3
""
""
4
nil
nil

In this context, returning the empty string for s[3,0] seems OK,
especially if you consider that s[a,b] is truncated at the end of the
string if a + b > s.length.
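The two rules above (a start offset equal to the length is accepted, and the length is truncated at the end of the string) can be written out as a small model, together with the negative-offset and negative-length rules from the docs. The model_slice method below is a hypothetical illustration, not how String#[] is implemented. It assumes Ruby 2.3+ for Integer#negative?, and the loop checks it against String#[] for a grid of arguments.

```ruby
# Hypothetical re-implementation of str[start, length] for illustration only;
# it is not how Ruby implements String#[] internally.
def model_slice(str, start, length)
  return nil if length.negative?            # negative length => nil
  start += str.length if start.negative?    # negative offsets count from the end
  return nil if start.negative? || start > str.length  # start == length is allowed
  length = [length, str.length - start].min # truncate at the end of the string
  str.each_char.drop(start).take(length).join
end

# Compare the model with String#[] over a small grid of arguments.
["", "foo"].each do |s|
  (-5..5).each do |start|
    (-1..5).each do |length|
      expected = s[start, length]
      actual = model_slice(s, start, length)
      raise "mismatch for #{s.inspect}[#{start}, #{length}]" unless actual == expected
    end
  end
end
puts "model agrees with String#[] on the grid"
```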

Kind regards

robert


#17

The Ruby Array docs say:
a = [ "a", "b", "c", "d", "e" ]

special cases

a[5] #=> nil
a[5, 1] #=> []
a[5..10] #=> []
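The same boundary rule shows up with one more index. A quick check, assuming a modern Ruby, that adds index 6 (one past the boundary) to the documented special cases:

```ruby
a = ["a", "b", "c", "d", "e"]

p a[5]      # nil: no element is stored at index 5
p a[5, 1]   # []:  index 5 is the boundary after the last element
p a[5..10]  # []
p a[6, 1]   # nil: index 6 lies beyond that boundary
p a[6..10]  # nil
```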

As one would expect, slice behavior for an array and a string is
consistent, even if not consistently documented.
I wish the docs had also expressed the language designer’s
intent, rather than just enumerating special cases.

An abstraction of half-steps (brings me back to my days of
computational fluid dynamics research – pressure/density on whole
steps, velocity/flow on half steps) somewhat explains the slice
behavior – n elements have n+1 fenceposts.

Though this explains the behavior, it doesn’t explain why it is good
behavior. In fact, I find it highly annoying because you need to
compare against two possible values (nil and empty?, unless I
monkey-patch nil.empty? #=> true) to see whether you have valid
elements, and because an empty array/string evaluates successfully
with <=> rather than raising a nil-pointer exception.
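If you only care whether a slice has any elements, one workaround is to normalise both outcomes to nil. A minimal sketch: slice_or_nil is a hypothetical helper (not part of Ruby or any library) that works for Strings and Arrays.

```ruby
# Hypothetical helper, not part of Ruby: collapse "empty slice" and
# "no slice" into a single nil result. Works for Strings and Arrays.
def slice_or_nil(seq, *args)
  part = seq[*args]
  part.nil? || part.empty? ? nil : part
end

p slice_or_nil("foo", 2, 1)      # "o"
p slice_or_nil("foo", 3, 1)      # nil (String#[] alone gives "")
p slice_or_nil("foo", 4, 1)      # nil
p slice_or_nil([1, 2, 3], 3, 1)  # nil (Array#[] alone gives [])
```

The trade-off is that it throws away exactly the distinction this thread is about: a caller can no longer tell a slice that starts at the end boundary from one that starts past it.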

The software engineer in me thinks [-1] is similarly dangerous because
it doesn’t catch an off-by-one bug, but I do find the negative indices
elegant enough to more than offset that drawback.

I would love to see someone cogently defend the behavior of slice
here. I think returning nil is safer from a software engineering
perspective. Do you have a good use case where something can be done
elegantly using the special cases documented for Array?