POLS and string-handling

Hi,

I have programmed in various languages previously, but am new to Ruby.

So far am very impressed with it; but there is one behaviour I find
quite alarming, which stems from the fact that Ruby treats strings as
objects rather than as primitives.

For instance:

a) Number

myNum1 = 5
myNum2 = myNum1

myNum2 = 3

=> myNum1 = 5
=> myNum2 = 3

which is what I would expect. However,

b) String

myString3 = "Fred Nerk"
myString4 = myString3

myString4[0,4] = "Bert"

=> myString3 = "Bert Nerk"
=> myString4 = "Bert Nerk"

myString3 has been “corrupted”, presumably because setting myString4 to
it actually set myString4’s pointer, not its value, in the standard OO
fashion.

But

c) String literal

myString1 = "Fred Bloggs"
myString2 = myString1

myString2[0,4] = "Bert"

puts "Fred Bloggs = " + "Fred Bloggs"
puts "myString2 = " + myString2

(Output)

=> Fred Bloggs = Fred Bloggs
=> myString2 = Bert Bloggs

Has the “Fred Bloggs” literal not been corrupted? Or did puts just use
another (uncorrupted) instance of it?

So let’s make the first string a constant. That produces the expected
behaviour in this case:

d) String constant 1

MyString5 = "Fred Potts"
MyString5 = "John Potts"

=> warning: already initialized constant MyString5

but not in this one: the constant gets “corrupted”.

e) String constant 2

MyString6 = "Fred Winterbotham"
myString7 = MyString6

myString7[0,4] = "Bert"

=> MyString6 = "Bert Winterbotham"
=> myString7 = "Bert Winterbotham"

So how to get around this? The following appears to do it:

f) String constant 3

MyString8 = "Fred Shufflebotham"
myString9 = MyString8.clone

myString9[0,4] = "Bert"

=> MyString8 = "Fred Shufflebotham"
=> myString9 = "Bert Shufflebotham"

but doesn’t it cause a memory leak?

Sorry if this question is too elementary.

On Sat, Dec 22, 2012 at 5:15 PM, Paul M. [email protected]
wrote:

a) Number

myString3 has been “corrupted”, presumably because setting myString4 to
it actually set myString4’s pointer, not its value, in the standard OO
fashion.

In Ruby parlance, it’s a reference; but essentially correct.

puts "Fred Bloggs = " + "Fred Bloggs"
puts "myString2 = " + myString2

(Output)

=> Fred Bloggs = Fred Bloggs
=> myString2 = Bert Bloggs

Has the “Fred Bloggs” literal not been corrupted? Or did puts just use
another (uncorrupted) instance of it?

The literal in your source code isn’t changed. Conceptually at least,
every time "Fred Bloggs" in quotes is evaluated, you’re referring to a
new, different object.

=> myString7 = "Bert Winterbotham"

Yes. Note that a “constant” in Ruby is something that will trigger a
warning if it’s reassigned, but the reassignment still goes through,
and the object it refers to can be mutated with no warning at all. A
reference to a constant works the same as a reference to a non-constant
object.
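
A minimal sketch of the point above, using the hypothetical constant names GREETING and FROZEN_GREETING (neither is from the thread). It shows that mutating through a second reference changes what the constant shows. The freeze call is an addition here, not something suggested in this reply; it is one way to make such a mutation fail loudly.

```ruby
# GREETING and FROZEN_GREETING are hypothetical names for illustration only.
GREETING = "Fred Winterbotham"

alias_ref = GREETING          # binds a second name to the same object
alias_ref[0, 4] = "Bert"      # mutates the shared object; no "already
                              # initialized constant" warning is involved

puts GREETING                 # the constant now shows the mutated value
puts GREETING.equal?(alias_ref)

# Assumption beyond the thread: freezing the object makes mutation raise.
FROZEN_GREETING = "Fred Winterbotham".freeze

begin
  FROZEN_GREETING[0, 4] = "Bert"
rescue RuntimeError => e      # FrozenError is a RuntimeError subclass in Ruby 2.5+
  puts "mutation refused: #{e.class}"
end

puts FROZEN_GREETING          # still the original text
```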

=> MyString8 = "Fred Shufflebotham"
=> myString9 = "Bert Shufflebotham"

but doesn’t it cause a memory leak?

Well, the memory used by those objects will stay allocated as long as
the garbage collector can reach them somehow, i.e. there are variables
that point to them or they belong to a collection like an array or hash.

On Sat, Dec 22, 2012 at 5:15 PM, Paul M. [email protected]
wrote:

So far am very impressed with it; but there is one behaviour I find
quite alarming, which stems from the fact that Ruby treats strings as
objects rather than as primitives.

Better get used to it. Ruby treats everything as an object, or
perhaps even more literally, a reference.

On Sat, Dec 22, 2012 at 3:15 PM, Paul M. [email protected]
wrote:

myString3 has been “corrupted”, presumably because setting myString4 to
it actually set myString4’s pointer, not its value, in the standard OO
fashion.

There’s a subtle point here:

myString3 = "Fred Nerk"

= is a built-in piece of syntax to bind a variable to an object.
Variables are not themselves objects; they are transparent references
to objects. The only time a variable cannot be transparently replaced
by the object it refers to is when it’s on the left hand side of an
equal sign. So here, "Fred Nerk" uses the string literal to create the
string object #<String:0x01234567 "Fred Nerk"> [that is, an object
with type String, object_id 0x01234567 and value "Fred Nerk"] on the
heap, and binds the variable myString3 to it.

myString4 = myString3

Here, myString3 is transparently replaced by the object it refers to,
#<String:0x01234567 "Fred Nerk">, and myString4 is bound to the same
object.

myString4[0,4] = "Bert"

This is the subtle bit. Despite the syntactic sugar, this is not an
= sign. There is no variable binding = involved here; it is just Ruby
syntax sugar that gets rewritten to myString4.[]=(0, 4, "Bert"). That
is, it calls the “[]=” method on the string object, passing it values
(0, 4, "Bert"). Again, since myString4 is not on the left hand side of
an =, it gets transparently replaced by #<String:0x01234567 "Fred
Nerk">, which then gets sent the message []= with arguments (0, 4,
"Bert"), and obligingly updates its value. So our object is now
#<String:0x01234567 "Bert Nerk"> (note that the object hasn’t changed,
just its value).

=> myString3 = "Bert Nerk"
=> myString4 = "Bert Nerk"

Again, these are both transparently replaced by the object they refer
to, now #<String:0x01234567 "Bert Nerk">.
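
The walkthrough above can be sketched as a short script. The variable names a and b are made up for this illustration. It contrasts the []= method call, which mutates the shared object, with a plain assignment, which only rebinds one variable.

```ruby
a = "Fred Nerk"
b = a                       # b is bound to the same object as a

b.[]=(0, 4, "Bert")         # explicit form of b[0, 4] = "Bert": a method call
puts a                      # Bert Nerk
puts a.equal?(b)            # true: still one object

b = "Someone Else"          # a real assignment: b is rebound, a is untouched
puts a                      # Bert Nerk
puts a.equal?(b)            # false: two different objects now
```

Only the last assignment involves variable binding; the line before it is a message sent to whatever object b refers to.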

But

c) String literal

myString1 = "Fred Bloggs"

Creates #<String:0x98765432 "Fred Bloggs"> on the heap, binds myString1
to it.

myString2 = myString1

Binds myString2 to #<String:0x98765432 "Fred Bloggs">

myString2[0,4] = "Bert"

Sends []= with arguments (0, 4, "Bert") to #<String:0x98765432 "Fred
Bloggs">, which updates itself to #<String:0x98765432 "Bert Bloggs">

puts "Fred Bloggs = " + "Fred Bloggs"

Creates two new string objects, #<String:0x00001111 "Fred Bloggs = ">
and #<String:0x00001112 "Fred Bloggs">, and passes the second one as an
argument to the + method of the first, which returns yet another
string object, #<String:0x00001113 "Fred Bloggs = Fred Bloggs">, which
it passes to “puts”, which prints it out.

puts "myString2 = " + myString2

Creates one new string object, #<String:0x00001114 "myString2 = ">,
and calls its + method with #<String:0x98765432 "Bert Bloggs"> (the
transparent replacement for myString2) as an argument. This creates
yet another string object, #<String:0x00001115 "myString2 = Bert
Bloggs">, which gets passed to puts and printed out.

[Note that all the string objects that got created but never had
variables bound to them are temporary objects that the garbage
collector will take care of at some point]

So how to get around this? The following appears to do it:

f) String constant 3

MyString8 = "Fred Shufflebotham"
myString9 = MyString8.clone

Clone creates a new string object, and sets its value equal to that of
the first one. = then binds myString9 to this new object.

myString9[0,4] = "Bert"

The []= message, with arguments (0, 4, "Bert"), is sent to the new
object.

=> MyString8 = "Fred Shufflebotham"

MyString8 is still bound to the first object, which never got sent a
message.

=> myString9 = "Bert Shufflebotham"

myString9 is still bound to the new object, which did get sent the
[]= message and updated its value.

but doesn’t it cause a memory leak?

No, the garbage collector takes care of it.

martin

x = "hello"
y = "hello"

puts x.object_id
puts y.object_id
puts "hello".object_id

--output:--
2151871380
2151871280 #Not the same as the previous id
2151871220

Quote marks are a String object constructor in ruby.

x[0] = "Y"
puts x
puts y

--output:--
Yello
hello

Paul M. wrote in post #1089966:

Hi,

I have programmed in various languages previously, but am new to Ruby.

So far am very impressed with it; but there is one behaviour I find
quite alarming, which stems from the fact that Ruby treats strings as
objects rather than as primitives.

Hi,

Well, you called strings “primitives”; did you use JavaScript? :)

Anyway, in Python strings are indeed immutable, but not so in Ruby.
That’s why you got all those results.

As for why strings are mutable in Ruby (with all the
consequences), I will let somebody else explain it.

Regards,

Bill

Paul M. wrote in post #1089966:

Ruby treats strings as
objects rather than as primitives.

Check this out:

result = 9.5426.round 3
puts result

--output:--
9.543

Thanks for all the replies. I notice also that I can force changing of
the value (as opposed to the reference) by substituting a trivial
expression for the right-hand side of the assignment, e.g.

g) Expression

myStringA = "Fred Shufflebotham"
myStringB = myStringA + ""
myStringB[0,4] = "Bert"

=> myStringA = "Fred Shufflebotham"
=> myStringB = "Bert Shufflebotham"

But of course it’s an utter kludge. Is there really no more elegant
way?

On Dec 24, 2012, at 12:55 AM, Paul M. [email protected]
wrote:

=> myStringA = "Fred Shufflebotham"
=> myStringB = "Bert Shufflebotham"

But of course it’s an utter kludge. Is there really no more elegant
way?

Sure, either use a method that doesn’t mutate the string but returns a
new one instead, like #sub and #gsub, e.g.:

myStringA.sub("Fred", "Bert")
myStringA.sub(/.{4}/, "Bert")

Or properly clone the string before mutating it:

myStringB = myStringA.clone
myStringB[0,4] = "Bert"
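
A small sketch putting both options side by side. The variable names original, renamed and copy are chosen for this illustration only. In both cases the original string is left untouched.

```ruby
original = "Fred Shufflebotham"

# Option 1: String#sub returns a new string and leaves its receiver alone.
renamed = original.sub("Fred", "Bert")

# Option 2: copy first, then mutate the copy in place.
copy = original.clone
copy[0, 4] = "Bert"

puts original   # Fred Shufflebotham
puts renamed    # Bert Shufflebotham
puts copy       # Bert Shufflebotham
```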

Regards,
Florian

Paul M. wrote in post #1090050:

Thanks for all the replies. I notice also that I can force changing of
the value (as opposed to the reference) by substituting a trivial
expression for the right-hand side of the assignment

Incorrect.

x = "hello"
y = x + ""

puts x.object_id
puts y.object_id

--output:--
2152313980
2152313940

The + operator up there is the name of a String method in ruby:

x = "hello"
y = x.+("")

puts x.object_id
puts y.object_id

--output:--
2152313980
2152313940

You are going to have to get used to the fact that:

  1. Strings are mutable in ruby.
  2. Some methods in the String class mutate their “receiver” (i.e. the
    object the method is called on), and other methods in the String class
    return a new String object. If you are not sure what a method returns,
    then check the docs.

Writing something like the following to create a new String object:

y = x + ""

works, but it is code obfuscation. Ruby methods usually
have names that are descriptive and alert the reader to what they do; use
them.
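
To illustrate point 2, here is a short sketch contrasting a few String methods that return a new object with ones that mutate the receiver. The variable names are made up for this example; the methods shown (+, <<, upcase, upcase!) are standard String methods.

```ruby
x = "hello"
alias_of_x = x          # second reference to the same object

y = x + " world"        # String#+ returns a new object
puts x                  # hello
puts y.equal?(x)        # false

x << " world"           # String#<< appends to the receiver in place
puts alias_of_x         # hello world

shout = x.upcase        # returns a new string; x is unchanged
puts x                  # hello world

x.upcase!               # the bang variant modifies the receiver
puts alias_of_x         # HELLO WORLD
```

By convention a trailing ! often marks the mutating variant, but the convention is not universal (<< mutates without one), so the docs remain the reliable check.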

On Sun, Dec 23, 2012 at 7:40 AM, 7stud – [email protected] wrote:

Quote marks are a String object constructor in ruby.

Maybe a bit more illustrative: executing the same string literal
results in multiple different instances:

irb(main):001:0> 4.times { puts "foo".object_id }
73580460
73580430
73580410
73580370
=> 4

Kind regards

robert

On Mon, Dec 24, 2012 at 12:55 AM, Paul M. [email protected]
wrote:

=> myStringA = "Fred Shufflebotham"
=> myStringB = "Bert Shufflebotham"

But of course it’s an utter kludge. Is there really no more elegant
way?

my_string_a = "Fred Shufflebotham"
my_string_b = my_string_a.dup

Note that the Ruby naming convention for local variables and method
names is snake_case, not camelCase.
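
A brief sketch of the dup suggestion, plus one difference from clone that the thread does not mention: clone carries over the frozen state of the original, while dup returns an unfrozen copy. The variable frozen is a made-up name for this illustration.

```ruby
my_string_a = "Fred Shufflebotham"
my_string_b = my_string_a.dup      # independent copy
my_string_b[0, 4] = "Bert"

puts my_string_a                   # Fred Shufflebotham
puts my_string_b                   # Bert Shufflebotham

# Not from the thread: dup and clone differ for frozen objects.
frozen = "Fred".freeze
puts frozen.dup.frozen?            # false: the copy can be mutated
puts frozen.clone.frozen?          # true: clone keeps the frozen state
```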

Paul, what you should take away from this discussion (I’ll try to
summarize what others have said already):

  • All variables hold references to objects.*
  • Assignment copies an object reference and stores it in a variable.
  • String literals are really object constructors, i.e. they create a
    new object whenever evaluated. (Don’t worry, behind the scenes this is
    made efficient.)
  • There are immutable classes (most numeric classes, nil,
    TrueClass…) and mutable classes (all others including String).
  • Arithmetic operators return a reference to a new instance in order
    to make math work properly (a + b + a would return wrong results if
    the first + changed state of a and returned a reference to the mutated
    a).

* Note that this is not completely true in terms of the
  implementation of MRI, but it is true from the perspective of the
  language user.
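
The point about operators returning new instances can be sketched with strings. This example is illustrative only and its variable names are made up. It compares String#+, which returns a new object, with the mutating String#<<, to show why a + b + a would go wrong if + changed its receiver.

```ruby
a = "x"
b = "y"

puts a + b + a        # xyx
puts a                # x  (a is unchanged, so its second use is still "x")

# Counter-example with the mutating String#<< in place of +:
c = "x"
d = "y"
result = c << d << c  # c becomes "xy", and is then appended to itself
puts result           # xyxy
puts c                # xyxy
```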

Kind regards

robert

Wow, everybody has been so kind and helpful to a newbie. I shall save
all this stuff off. Meantime, thank you all and Merry Christmas!

Robert K. wrote in post #1090092:

  • All variables hold references to objects.*

And this is a huge breath of fresh air compared to, say, Perl, where
arrays and arrayrefs are two different types of value, similarly hashes
and hashrefs, and a whole bunch of other special cases.

In ruby, all values are references to objects. Even integers.

a = -3
=> -3

a.to_s
=> "-3"

a.abs
=> 3

So consistently:

  • everything is pass-by-value
  • every value is a reference to an object

But as you have discovered, many objects are mutable, including strings.
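
A minimal sketch of what those two rules mean for method arguments. The method names rebind and mutate! are hypothetical, made up for this example. The reference is copied into the parameter, so rebinding the parameter is invisible to the caller, while sending a mutating message to the object is not.

```ruby
# Hypothetical helper: rebinds only its own local parameter.
def rebind(str)
  str = "something new"
  str
end

# Hypothetical helper: sends a mutating message to the caller's object.
def mutate!(str)
  str << " (changed)"
end

name = "Fred"

rebind(name)
puts name        # Fred

mutate!(name)
puts name        # Fred (changed)
```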