Difference between << and += for Strings and Arrays. Bug?

hostep · November 24, 2009, 3:46pm

Hi
I have been programming with Ruby for about 1 year and am loving it. I
cam across something that I can’t figure out why it works the way it
does:

Eg:
a=[1,2]
b=a
b << 3

The value of b is now [1,2,3], but the value of a is also [1,2,3]???
And yet:

a=[1,2]
b=a
b += [3]

The value of b is still [1,2,3] but a remains [1,2]. This just doesnt
make sense to. The same happens with Strings. Why would the append
operator (<<) change the original variable, but other operators dont
seem to. PS What I really want to know is why the original variable is
changed at all?

hostep · November 24, 2009, 4:31pm

On Tue, Nov 24, 2009 at 9:46 AM, Pieter H. [email protected]
wrote:

The value of b is now [1,2,3], but the value of a is also [1,2,3]???
changed at all?
This is another example of the classic confusion between variables and
objects.

variables only reference objects, the aren’t objects themselves

a = [1, 2]
a.object_id # => 2148166040
b = a
b.object_id # => 2148166040
b << 3
b.object_id # => 2148166040
a # => [1, 2, 3]
b # => [1, 2, 3]

b += [4]
a # => [1, 2, 3]
b # => [1, 2, 3, 4]
a.object_id # => 2148166040
b.object_id # => 2148162660

a<< whatever

sends the << message to the object referenced by a, which changes the
object.

This change will be visible no matter which variable referencing the
object

The assignment operator changes the binding of the variable on the
left hand side so that it references the object which is the result of
evaluating the right hand side.

And x += y

is the same as

  x = x + y

So in the

b += [3]
case

The variable b is changed to reference a new object which is the
result of b + [3]

HTH.

–
Rick DeNatale

Blog: http://talklikeaduck.denhaven2.com/
Twitter: http://twitter.com/RickDeNatale
WWR: http://www.workingwithrails.com/person/9021-rick-denatale
LinkedIn: http://www.linkedin.com/in/rickdenatale

hostep · November 24, 2009, 4:33pm

On Tue, Nov 24, 2009 at 9:46 AM, Pieter H. [email protected]
wrote:

The value of b is now [1,2,3], but the value of a is also [1,2,3]???
class Array - RDoc Documentation
“Append—Pushes the given object on to the end of this array.”

And yet:

a=[1,2]
b=a
b += [3]

The value of b is still [1,2,3] but a remains [1,2]. This just doesnt
make sense to. The same happens with Strings. Why would the append
operator (<<) change the original variable, but other operators dont
seem to.

http://ruby-doc.org/core/classes/Array.html#M002209
“Concatenation—Returns a new array built by concatenating the two
arrays together to produce a third array.”

PS What I really want to know is why the original variable is
changed at all?

After b=a, the variables ‘a’ and ‘b’ both reference the same object.
+= creates a new array (the + part) and changes b to reference that
new array (the = part). With << the object referenced by b is
directly modified, it appears that a is also changed because a
references the same object as b. It’s not really that b was changed
by <<, the object referenced by b is what changed.

hostep · November 24, 2009, 4:34pm

2009/11/24 Pieter H. [email protected]:

Hi
Hi,

I have been programming with Ruby for about 1 year and am loving it. I
cam across something that I can’t figure out why it works the way it
does:

Eg:
a=[1,2]
b=a
b << 3

The value of b is now [1,2,3], but the value of a is also [1,2,3]???
b=a means ‘the variable b points to the same object as the variable
a’. So calling a method on b which changes the underlying object means
that the underlying object is changed, regardless of whether you call
it a or b. The << method changes the array. As the docs describe it:
“Append—Pushes the given object on to the end of this array.”
(class Array - RDoc Documentation).

And yet:

a=[1,2]
b=a
b += [3]

The value of b is still [1,2,3] but a remains [1,2]. This just doesnt
make sense to.
‘b += [3]’ is syntactic sugar, or shorthand, for ‘b = b + [3]’. i.e.
This calls the ‘+’ method on b ‘under the hood’. ‘+’ does not change
the array, but returns a new array. From the docs again:
“Concatenation—Returns a new array built by concatenating the two
arrays together to produce a third array.”
(class Array - RDoc Documentation). So here you’re
getting a completely separate object and saying that the variable b
should point to this new object. The old object pointed to by b (and
also by a) is unaffected.

The same happens with Strings.
As per the docs for Strings on the equivalent methods.

Why would the append
operator (<<) change the original variable, but other operators dont
seem to. PS What I really want to know is why the original variable is
changed at all?
I hope my explanations make some kind of sense and help answer these
questions?

Cheers,

Tom

hostep · November 24, 2009, 5:11pm

Hi Guys

Thanks for the quick responses and insights. While I understand the
issue now I must confess that it perturbs my ‘least surprise’
expectation. Why was it decided that ‘<<’ should reference the original
object but ‘+’ should create a new object. When one works with a
variable one shouldn’t need to worry about what other stuff is being
modified due to a unintentional reference? What other operators are
doing this? I noticed that the ‘.pop’ array method was also modifying
the referenced array

a = [1,2]
b = a
b.pop

results in a and be being [1]

I worked around with

a = [1,2]
b = []
b += a
b.pop

Now b == [1] and a == [1,2], but its not as sleek. Is there maybe
another way?

hostep · November 24, 2009, 5:16pm

2009/11/24 Pieter H. [email protected]:

I worked around with

a = [1,2]
b = []
b += a
b.pop

Now b == [1] and a == [1,2], but its not as sleek. Is there maybe
another way?

Depending on your needs, try

a = [1, 2]
b = a.dup
b.pop

or

a = [1, 2]
a.last

Cheers,

Tom

hostep · November 24, 2009, 5:37pm

On Tue, Nov 24, 2009 at 11:11 AM, Pieter H. [email protected]
wrote:

Thanks for the quick responses and insights. While I understand the
issue now I must confess that it perturbs my ‘least surprise’
expectation. Why was it decided that ‘<<’ should reference the original
object but ‘+’ should create a new object.

Consider that b+=a is just a nice shorthand for b = b + a (or b =
b.+(a)).

a = [1, 2]
b = [3, 4]
c = b + a

Would it be desirable for the :+ message to modify the object
represented by b?

hostep · November 24, 2009, 6:18pm

Pieter H. wrote:

Hi Guys

Thanks for the quick responses and insights. While I understand the
issue now I must confess that it perturbs my ‘least surprise’
expectation.

According to Matz, POLS is only meant to apply within Ruby, not with
respect to other languages.

Why was it decided that ‘<<’ should reference the original
object but ‘+’ should create a new object.

That’s the way those two operators work in most classes where they’re
both defined. It’s not that surprising.

When one works with a
variable one shouldn’t need to worry about what other stuff is being
modified due to a unintentional reference?

Actually, keeping this straight is a fundamental part of programming in
Ruby. You can always use dup or clone to be safe.

What other operators are
doing this?

Ask the code, not us.

Best,

Marnen Laibow-Koser
http://www.marnen.org
[email protected]

hostep · November 24, 2009, 7:06pm

Pieter H. wrote:

Why was it decided that ‘<<’ should reference the original
object but ‘+’ should create a new object.

Perhaps because of the way C++ uses << with streams:

cout << “Hello world!”;

When one works with a
variable one shouldn’t need to worry about what other stuff is being
modified due to a unintentional reference?

That’s what mutable objects are all about. Yes, it certainly can be a
source of bugs if you don’t take care, but mutating state expected in
OOP.

If you don’t like this, you could always try Erlang instead…

What other operators are
doing this? I noticed that the ‘.pop’ array method was also modifying
the referenced array

Lots of operators modify the object. Look at ‘ri String’ and ‘ri Array’
and then check out each of the methods. Examples: String#replace,
String#upcase!, String#chomp!

Sometimes they come in pairs which modify in-place or create a new
object for you. For example, str.upcase is really just a shortcut for
str.dup.upcase!

hostep · November 25, 2009, 10:32am

2009/11/24 Pieter H. [email protected]:

Thanks for the quick responses and insights. While I understand the
issue now I must confess that it perturbs my ‘least surprise’
expectation. Why was it decided that ‘<<’ should reference the original
object but ‘+’ should create a new object.

<< does not always modify the receiver

irb(main):001:0> n = 1
=> 1
irb(main):002:0> m = n << 5
=> 32
irb(main):003:0> n.equal? m
=> false
irb(main):004:0>

The semantic of “appending” something to an object was indeed
inherited from C++ as Brian explained.

When one works with a
variable one shouldn’t need to worry about what other stuff is being
modified due to a unintentional reference?

OO is all about modifying state of objects. The state change is a
side effect of a method call - and this is intentional. That way you
can model how real world things behave. A bank account does not
change as the effect of you depositing 10 bucks. If it would, you
would get a new bank account number after every deposit or withdrawal.
For obvious reasons this is a bad idea. Objects work the same way:
you change their state but you do not get a new one every time. There
are of course special cases (such as Fixnum and Bignum in Ruby) where
objects are immutable. This does make sense as well (just think about
keys in a Hash, you would have to rehash all the time if key state
would change).

This means in turn that it is your task as a programmer in an OO
language to think about what classes and objects you need, what state
transitions of an object you want to reasonably allow and which not.
You create your application by picking deliberate choices. For some
classes someone else did that for you: that’s all the built in stuff
and the standard library code - actually, every library’s code that
you are using. In those cases you need to adjust your usage of that
code to the semantics and conventions determined by that code.

I worked around with

a = [1,2]
b = []
b += a
b.pop

Now b == [1] and a == [1,2], but its not as sleek. Is there maybe
another way?

Why do you want to have two copies of your Array? #push and #pop are
intended to allow usage of an Array as stack (AKA LIFO). If you want
to use an array as LIFO there is no point in creating two copies
during every push and pop operation because you want only a single
stack and you want everybody who sees the stack have the same idea of
it. Otherwise you would be able to pop the same top element off the
stack multiple times. That does not make sense.

If you only need to know what the last element is, you can use #last or
a[-1].

Kind regards

robert

hostep · November 26, 2009, 10:17am

Pieter H. wrote:

So I wrote the folder function:

def ancestors

For me the natural way to do this would be recursively, but an iterative
solution is of course fine, and probably more efficient anyway.

As for your problem of not modifying the ‘groups’ array you are looking
at, I’d suggest:

scanfolders = []
scanfolders.concat(ancestors)
...
scanfolders.concat(scanitem.groups)

This creates a new array (scanfolders), then mutates that array by
tacking on the contents of some other array, without modifying that
other array. The first two lines could also be scanfolders =
ancestors.dup of course.

Not that there’s anything really wrong with scanfolders += foo. It’s
just that if you’re going to be mutating this array (with pop), then you
might as well go for mutation across the board. It’s also more efficient
if this array becomes large, as there’s less array copying and object
creation going on.

scanfolders = [] #set up a stack to iterate through,
#looking for grandparents etc
scanfolders += ancestors #the stack starts with the current ancestors
if !scanfolders.nil? then

I observe that this condition is always true, since scanfolders is
never nil, so can be removed.

Other ways you can simplify your code:

while scanfolders.length > 0 do # while there are items on the stack

one option: until scanfolders.empty?

while scanfolders.length > 0 do # while there are items on the stack
  scanitem = scanfolders.pop # get the last one and reduce the stack
  if scanitem then

If your scanfolders never contains any nil items (and I don’t see why it
should), then all three lines could shrink to

while scanitem = scanfolders.pop

    if !scanitem.groups.nil? then #if this item has parents
                                        #add them to the stack

if scanitem.groups

     scanfolders += scanitem.groups
     scanfolders.uniq!
     ancestors += scanitem.groups #and record this item as an
                                        #ancestor
     ancestors.uniq!

The problem I see here is that you might already have scanned a folder
and popped it from scanfolders, then add it back again, therefore
scanning it extra times unnecessarily. I can’t convince myself whether
there is a possibility of ending up in an infinite loop.

One solution is to keep an explicit mark of folders you have traversed
(e.g. a hash of folder => true)

But since everything you’ve added to scanfolders has been added to
ancestors already, it may be sufficient to do:

scanfolders += scanitem.groups
scanfolders -= ancestors
ancestors += scanitem.groups

Or something like that. Actually I’m not sure if there’s a mutating
version of -=

Regards,

Brian.

hostep · November 26, 2009, 8:39am

Hi Robert (and everyone else) - thanks for the well reasoned responses.
I’ll get the hang of it I’m sure

Why do you want to have two copies of your Array?

I have a setup where I have Folder objects. A folder can have many other
children (members) as sub folders, but it can also have many other
folders as parents (groups it belongs to). I get this done via a
crosslink table. When creating a new parent-child relationship I need to
make sure that the child is not somehow an ancestor or the parent I am
trying to subordinate it to (as I need to avoid circular reference)

So I wrote the folder function:

def ancestors
ancestors = self.groups #all the immediate parents are obviously
ancestors
scanfolders = [] #set up a stack to iterate through,
#looking for grandparents etc
scanfolders += ancestors #the stack starts with the current ancestors
if !scanfolders.nil? then
while scanfolders.length > 0 do # while there are items on the stack
scanitem = scanfolders.pop # get the last one and reduce the stack
if scanitem then
if !scanitem.groups.nil? then #if this item has parents
#add them to the stack
scanfolders += scanitem.groups
scanfolders.uniq!
ancestors += scanitem.groups #and record this item as an
#ancestor
ancestors.uniq!
end
end
end
end
return ancestors
end

So - to answer the question - I need to arrays that are initially the
same (direct parents), But the one will eventually contain all ancestors
and the other will be empty after iterating through all ancestors and
testing them for further ancestors.

I should just replace

scanfolders = []
scanfolders += ancestors

with

scanfolders = ancestors.dup

Regards

Pieter

hostep · November 26, 2009, 11:17am

2009/11/26 Robert K. [email protected]:

As far as I can see you only need an inclusion check not the complete
list of ancestors. A simple iterative solution with a BFS could do
the job for you

Here’s a more modularized version:

require ‘set’

def ancestor_bfs
visited = Set.new
queue = [self]

until queue.empty?
n = queue.shift

if visited.add? n
yield n
queue.concat(n.ancestors)
end
end

false
end

def ancestor?(candidate)
ancestor_bfs {|n| return true if candidate == n}
false
end

or even without a method but still fast exit when the node is found:

folder.to_enum(:ancestor_bfs).any? {|n| candidate == n}

fully modularized

def bfs(next_meth, start = self)
visited = Set.new
queue = [start]

until queue.empty?
n = queue.shift

if visited.add? n
yield n
queue.concat(n.send(next_meth))
end
end

false
end

def bfs_ancestors(&b)
bfs(:ancestors, &b)
end

Now you can do

class Integer
def n; [self * 2, self * 2 + 1] end
end

irb(main):057:0> bfs(:n, 3) {|x| p x; break if x > 20}
3
6
7
12
13
14
15
24
=> nil
irb(main):058:0>

Cheers

robert

hostep · November 26, 2009, 10:28am

2009/11/26 Pieter H. [email protected]:

trying to subordinate it to (as I need to avoid circular reference)
while scanfolders.length > 0 do # while there are items on the stack
end
end
end
return ancestors
end

So - to answer the question - I need to arrays that are initially the
same (direct parents), But the one will eventually contain all ancestors
and the other will be empty after iterating through all ancestors and
testing them for further ancestors.

As far as I can see you only need an inclusion check not the complete
list of ancestors. A simple iterative solution with a BFS could do
the job for you

require ‘set’

def ancestor?(candidate)
visited = Set.new
queue = [self]

until queue.empty?
n = queue.shift

if visited.add? n
  return true if candidate == n
  queue.concat(n.ancestors)
end

end

false
end

Note: I prefer a BFS over a DFS in these cases because the stack depth
is far more limited than the memory:

10:26:50 ~$ ruby19 -e ‘def r(x) p x; r(x+1) end; r 0’ | tail -3
-e:1:in r': stack level too deep (SystemStackError) from -e:1:in r’
from -e:1:in r' from -e:1:in r’
from -e:1:in r' from -e:1:in r’
from -e:1:in r' from -e:1:in r’
from -e:1:in r' ... 8175 levels... from -e:1:in r’
from -e:1:in r' from -e:1:in r’
from -e:1:in <main>' 8184 8185 8186 10:27:04 ~$ ruby -e 'def r(x) p x; r(x+1) end; r 0' | tail -3 -e:1:in inspect’: stack level too deep (SystemStackError)
from -e:1:in p' from -e:1:in r’
from -e:1:in `r’
from -e:1
12623
12624
12625
10:27:08 ~$

Kind regards

robert