Forum: Ruby POLS and string-handling

Paul Magnussen (majjick)
on 2012-12-23 00:15
Hi,

I have programmed in various languages previously, but am new to Ruby.

So far am very impressed with it; but there is one behaviour I find
quite alarming, which stems from the fact that Ruby treats strings as
objects rather than as primitives.

For instance:

# a) Number

myNum1 = 5
myNum2 = myNum1

myNum2 = 3

=> myNum1 = 5
=> myNum2 = 3

which is what I would expect.  However,

# b) String

myString3 = "Fred Nerk"
myString4 = myString3

myString4[0,4] = "Bert"

=> myString3 = "Bert Nerk"
=> myString4 = "Bert Nerk"

myString3 has been "corrupted", presumably because setting myString4 to
it actually set myString4's pointer, not its value, in the standard OO
fashion.

But

# c) String literal

myString1 = "Fred Bloggs"
myString2 = myString1

myString2[0,4] = "Bert"

puts "Fred Bloggs = " + "Fred Bloggs"
puts "myString2 = " + myString2

(Output)

=> Fred Bloggs = Fred Bloggs
=> myString2 = Bert Bloggs

Has the "Fred Bloggs" literal not been corrupted?  Or did puts just use
another (uncorrupted) instance of it?

So let's make the first string a constant.  That produces the expected
behaviour in this case:

# d) String constant 1

MyString5 = "Fred Potts"
MyString5 = "John Potts"

=> warning: already initialized constant MyString5

but not in this one:  the constant gets "corrupted".

# e) String constant 2

MyString6 = "Fred Winterbotham"
myString7 = MyString6

myString7[0,4] = "Bert"

=> MyString6 = "Bert Winterbotham"
=> myString7 = "Bert Winterbotham"

So how to get around this?  The following appears to do it:

# f) String constant 3

MyString8 = "Fred Shufflebotham"
myString9 = MyString8.clone

myString9[0,4] = "Bert"

=> MyString8 = "Fred Shufflebotham"
=> myString9 = "Bert Shufflebotham"

but doesn't it cause a memory leak?

Sorry if this question is too elementary.
Eric Christopherson (echristopherson)
on 2012-12-23 02:50
(Received via mailing list)
On Sat, Dec 22, 2012 at 5:15 PM, Paul Magnussen <lists@ruby-forum.com>
wrote:
> # a) Number
>
> myString3 has been "corrupted", presumably because setting myString4 to
> it actually set myString4's pointer, not its value, in the standard OO
> fashion.

In Ruby parlance, it's a reference; but essentially correct.
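
One way to check that two variables refer to the same object is `equal?` (object identity) or `object_id`. A minimal sketch follows; the variable names are only illustrative, and `.dup` is used so the snippet also runs in setups where string literals are frozen:

```ruby
# Two variables, one String object.
my_string3 = "Fred Nerk".dup   # dup gives a mutable copy of the literal
my_string4 = my_string3        # copies the reference, not the characters

same_object = my_string3.equal?(my_string4)   # identity check
puts same_object                               # true
puts my_string3.object_id == my_string4.object_id

my_string4[0, 4] = "Bert"      # mutates the one shared object
puts my_string3                # Bert Nerk
```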

> puts "Fred Bloggs = " + "Fred Bloggs"
> puts "myString2 = " + myString2
>
> (Output)
>
> => Fred Bloggs = Fred Bloggs
> => myString2 = Bert Bloggs
>
> Has the "Fred Bloggs" literal not been corrupted?  Or did puts just use
> another (uncorrupted) instance of it?

The literal itself can't be changed. Conceptually at least, every time
"Fred Bloggs" appears in quotes, you're referring to a different object.

>
> => myString7 = "Bert Winterbotham"
Yes. Note that a "constant" in Ruby is something that will trigger a
warning if it's reassigned, but it's still possible to reassign it.
Mutating the object it refers to, as here, triggers no warning at all.
A reference to a constant works the same as a reference to a
non-constant object.
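
A short sketch of that behaviour, using made-up constant names (`GREETING`, `FROZEN_GREETING`). Mutating the object a constant refers to produces no warning. If the object itself should reject changes, `freeze` (a standard Ruby method not mentioned above) is one option:

```ruby
GREETING = "Fred Winterbotham".dup   # hypothetical constant for illustration
alias_ref = GREETING                 # second reference to the same object

alias_ref[0, 4] = "Bert"             # mutation through the other reference
puts GREETING                        # Bert Winterbotham

FROZEN_GREETING = "Fred Potts".freeze   # freeze the object itself
mutation_error = nil
begin
  FROZEN_GREETING[0, 4] = "John"     # attempt to mutate a frozen String
rescue => e                          # exact error class depends on Ruby version
  mutation_error = e
end
puts mutation_error.class
puts FROZEN_GREETING                 # Fred Potts
```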

> => MyString8 = "Fred Shufflebotham"
> => myString9 = "Bert Shufflebotham"
>
> but doesn't it cause a memory leak?

Well, the memory used by those objects will stay allocated as long as
the garbage collector can reach them somehow, i.e. there are variables
that point to them or they belong to a collection like an array or hash.
tamouse mailing lists (Guest)
on 2012-12-23 04:27
(Received via mailing list)
On Sat, Dec 22, 2012 at 5:15 PM, Paul Magnussen <lists@ruby-forum.com>
wrote:
> So far am very impressed with it; but there is one behaviour I find
> quite alarming, which stems from the fact that Ruby treats strings as
> objects rather than as primitives.

Better get used to it. Ruby treats *everything* as an object, or
perhaps even more literally, a reference.
Martin DeMello (martin_d)
on 2012-12-23 04:38
(Received via mailing list)
On Sat, Dec 22, 2012 at 3:15 PM, Paul Magnussen <lists@ruby-forum.com>
wrote:
> myString3 has been "corrupted", presumably because setting myString4 to
> it actually set myString4's pointer, not its value, in the standard OO
> fashion.

There's a subtle point here:

> myString3 = "Fred Nerk"

= is a built in piece of syntax to bind a variable to an object.
Variables are not themselves objects, they are transparent references
to objects. The only time a variable cannot be transparently replaced
by the object it refers to is when it's on the left hand side of an
equal sign. So here, "Fred Nerk" uses the string literal to create the
string object #<String:0x01234567 "Fred Nerk"> [that is, an object
with type String, object_id 0x01234567 and value "Fred Nerk"] on the
heap, and binds the variable myString3 to it.

> myString4 = myString3

Here, myString3 is transparently replaced by the object it refers to,
#<String:0x01234567 "Fred Nerk">, and myString4 is bound to the same
object

> myString4[0,4] = "Bert"

This is the subtle bit. Despite the syntactic sugar, this is *not* an
= sign. There is no variable binding = involved here, it is just ruby
syntax sugar that gets rewritten to myString4.[]=(0, 4, "Bert"). That
is, it calls the "[]=" method on the string object, passing it values
(0, 4, "Bert"). Again, since myString4 is not on the left hand side of
an =, it gets transparently replaced by #<String:0x01234567 "Fred
Nerk">, which then gets sent the message []= with arguments (0, 4,
"Bert"), and obligingly updates its value. So our object is now
#<String:0x01234567 "Bert Nerk"> (note that the object hasn't changed,
just its value).
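
To make the rewriting concrete, here is a small sketch (the variable names are illustrative) that calls `[]=` three ways: with the sugar, with explicit method-call syntax, and via `send`. The `object_id` should stay the same throughout, since only the object's contents change:

```ruby
s = "Fred Nerk".dup
id_before = s.object_id

s[0, 4] = "Bert"              # sugared form
puts s                        # Bert Nerk

s.[]=(0, 4, "Fred")           # explicit method call, same method
puts s                        # Fred Nerk

s.send(:[]=, 0, 4, "Bert")    # sending the message by name
puts s                        # Bert Nerk

puts s.object_id == id_before # true: still the same object
```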

> => myString3 = "Bert Nerk"
> => myString4 = "Bert Nerk"

Again, these are both transparently replaced by the object they refer
to, now #<String:0x01234567 "Bert Nerk">

> But
>
> # c) String literal
>
> myString1 = "Fred Bloggs"

Creates #<String:0x98765432 "Fred Bloggs"> on the heap, binds myString1
to it.

> myString2 = myString1

Binds myString2 to #<String:0x98765432 "Fred Bloggs">

> myString2[0,4] = "Bert"

Sends []=, (0, 4, "Bert") to #<String:0x98765432 "Fred Bloggs">, which
updates itself to  #<String:0x98765432 "Bert Bloggs">

> puts "Fred Bloggs = " + "Fred Bloggs"

Creates two *new* string objects, #<String:0x00001111 "Fred Bloggs = ">
and #<String:0x00001112 "Fred Bloggs"> and passes the second one as an
argument to the + method of the first, which returns yet another
string object, #<String:0x00001113 "Fred Bloggs = Fred Bloggs"> which
it passes to "puts" which prints it out.

> puts "myString2 = " + myString2

Creates *one* new string object, #<String:0x00001114 "myString2 = ">,
and calls its + method with #<String:0x98765432 "Bert Bloggs"> (the
transparent replacement for myString2) as an argument. This creates
yet another string object, #<String:0x00001115 "myString2 = Bert
Bloggs">, which gets passed to puts and printed out.

[Note that all the string objects that got created but never had
variables bound to them are temporary objects that the garbage
collector will take care of at some point]

> So how to get around this?  The following appears to do it:
>
> # f) String constant 3
>
> MyString8 = "Fred Shufflebotham"
> myString9 = MyString8.clone

Clone creates a new string object, and sets its value equal to that of
the first one. = then binds myString9 to this new object.

> myString9[0,4] = "Bert"

The []=(0, 4, "Bert") message is sent to the new object.

> => MyString8 = "Fred Shufflebotham"

MyString8 is still bound to the first object, which never got sent a
message.

> => myString9 = "Bert Shufflebotham"

myString9 is still bound to the new object, which *did* get sent the
[]= message and updated its value

> but doesn't it cause a memory leak?

No, the garbage collector takes care of it.

martin
Admin Tensor (tensor)
on 2012-12-23 05:04
Paul Magnussen wrote in post #1089966:
> Hi,
>
> I have programmed in various languages previously, but am new to Ruby.
>
> So far am very impressed with it; but there is one behaviour I find
> quite alarming, which stems from the fact that Ruby treats strings as
> objects rather than as primitives.

Hi,

Well, you called strings "primitives"; did you use JavaScript? :)

Anyway, in Python strings are indeed immutable, but not so in Ruby.
That's why you got the results you did.

Regarding why in Ruby the strings are mutable (and with all the
consequences), I will let somebody else explain it.

Regards,

Bill
7stud -- (7stud)
on 2012-12-23 07:40
x = "hello"
y = "hello"

puts x.object_id
puts y.object_id
puts "hello".object_id

--output:--
2151871380
2151871280   #Not the same as the previous id
2151871220

Quote marks are a String object constructor in ruby.


x[0] = "Y"
puts x
puts y

--output:--
Yello
hello
7stud -- (7stud)
on 2012-12-23 08:00
Paul Magnussen wrote in post #1089966:
>
> Ruby treats strings as
> objects rather than as primitives.
>

Check this out:

result = 9.5426.round 3
puts result

--output:--
9.543
Paul Magnussen (majjick)
on 2012-12-24 00:54
Thanks for all the replies.  I notice also that I can force changing of
the value (as opposed to the reference) by substituting a trivial
expression for the right-hand side of the assignment, e.g.

# g) Expression

myStringA = "Fred Shufflebotham"
myStringB = myStringA + ""
myStringB[0,4] = "Bert"

=> myStringA = "Fred Shufflebotham"
=> myStringB = "Bert Shufflebotham"

But of course it's an utter kludge.  Is there really no more elegant
way?
Florian Gilcher (skade)
on 2012-12-24 01:17
(Received via mailing list)
On Dec 24, 2012, at 12:55 AM, Paul Magnussen <lists@ruby-forum.com>
wrote:

> => myStringA = "Fred Shufflebotham"
> => myStringB = "Bert Shufflebotham"
>
> But of course it's an utter kludge.  Is there really no more elegant
> way?

Sure, either use a Method that doesn't mutate the string but returns a
new one instead, like #sub and #gsub. e.g.:

  myStringA.sub("Fred", "Bert")
  myStringA.sub(/.{4}/, "Bert")

Or properly clone the string before mutating it:

  myStringB = myStringA.clone
  myStringB[0,4] = "Bert"
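
A quick sketch of both options side by side (snake_case variable names, otherwise arbitrary). `sub` leaves its receiver alone and returns a new string, whereas `sub!` is the in-place variant, shown at the end for contrast:

```ruby
my_string_a = "Fred Shufflebotham".dup

# Option 1: a non-mutating method returns a new String
my_string_b = my_string_a.sub("Fred", "Bert")
puts my_string_a   # Fred Shufflebotham
puts my_string_b   # Bert Shufflebotham

# Option 2: copy first, then mutate the copy
my_string_c = my_string_a.clone
my_string_c[0, 4] = "Bert"
puts my_string_a   # Fred Shufflebotham
puts my_string_c   # Bert Shufflebotham

# For contrast: sub! changes the receiver in place
original_after_sub = my_string_a.dup   # snapshot before the in-place change
my_string_a.sub!("Fred", "John")
puts my_string_a   # John Shufflebotham
```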

Regards,
Florian
7stud -- (7stud)
on 2012-12-24 11:19
Paul Magnussen wrote in post #1090050:
> Thanks for all the replies.  I notice also that I can force changing of
> the value (as opposed to the reference) by substituting a trivial
> expression for the right-hand side of the assignment
>

Incorrect.

x = "hello"
y = x + ""

puts x.object_id
puts y.object_id

--output:--
2152313980
2152313940


The + operator up there is the name of a String method in ruby:

x = "hello"
y = x.+("")

puts x.object_id
puts y.object_id

--output:--
2152313980
2152313940

You are going to have to get used to the fact that:

1) Strings are mutable in ruby.
2) Some methods in the String class mutate their "receiver" (i.e. the
object the method is called on), and other methods in the String class
return a new String object.  If you are not sure what a method returns,
then check the docs:

http://www.ruby-doc.org/core-1.9.3/String.html#method-i-2B

Writing something like the following to create a new String object:

y = x + ""

works, but it is code obfuscation.  Ruby methods usually
have names that are descriptive and alert the reader what they do--use
them.
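
As an illustration of point 2, here is a small sketch contrasting a method that returns a new String (`upcase`) with two that mutate the receiver (`<<` and `upcase!`). The trailing `!` is only a naming convention, not something the language enforces, and `<<` shows that not every mutating method carries it:

```ruby
x = "hello".dup

y = x.upcase             # returns a new String; x is untouched
puts x                   # hello
puts y                   # HELLO
new_object = !x.equal?(y)
puts new_object          # true

z = x << " world"        # << appends in place and returns the receiver
puts x                   # hello world
same_object = x.equal?(z)
puts same_object         # true

x.upcase!                # bang variant mutates the receiver
puts x                   # HELLO WORLD
```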
Robert Klemme (robert_k78)
on 2012-12-24 13:02
(Received via mailing list)
On Sun, Dec 23, 2012 at 7:40 AM, 7stud -- <lists@ruby-forum.com> wrote:
> Quote marks are a String object constructor in ruby.
Maybe a bit more illustrative: executing the _same_ string literal
results in multiple different instances:

irb(main):001:0> 4.times { puts "foo".object_id }
73580460
73580430
73580410
73580370
=> 4

Kind regards

robert
Robert Klemme (robert_k78)
on 2012-12-24 13:13
(Received via mailing list)
On Mon, Dec 24, 2012 at 12:55 AM, Paul Magnussen <lists@ruby-forum.com>
wrote:
> => myStringA = "Fred Shufflebotham"
> => myStringB = "Bert Shufflebotham"
>
> But of course it's an utter kludge.  Is there really no more elegant
> way?

my_string_a = "Fred Shufflebotham"
my_string_b = my_string_a.dup

Note that the Ruby naming convention for local variables and method
names is snake_case, not CamelCase.

Paul, what you should take away from this discussion (I'll try to
summarize what others have said already):

 - All variables hold _references_ to objects.*
 - Assignment copies an object reference and stores it in a variable.
 - String literals are really object constructors, i.e. they create a
new object whenever evaluated. (Don't worry, behind the scenes this is
made efficient.)
 - There are immutable classes (most numeric classes, nil,
TrueClass...) and mutable classes (all others including String).
 - Arithmetic operators return a reference to a new instance in order
to make math work properly (a + b + a would return wrong results if
the first + changed state of a and returned a reference to the mutated
a).

* Note that this is not completely true in terms of the
_implementation_ of MRI but it is true from the perspective of the
_language user_.
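
The points about assignment and operators can be sketched as follows (the variable names are arbitrary). `+=` is shorthand for `x = x + ...`, so it rebinds the variable to the result of `+`, while `<<` mutates the object that both variables share:

```ruby
a = 5
b = a
b += 1            # same as b = b + 1: b is rebound to a different Integer
puts a            # 5
puts b            # 6

s = "Fred".dup
t = s
t += " Nerk"      # + returns a new String; t is rebound, s is untouched
puts s            # Fred
puts t            # Fred Nerk
s_after_plus = s.dup   # snapshot of s before the in-place change below

u = s
u << " Nerk"      # << mutates the object shared by s and u
puts s            # Fred Nerk
puts u.equal?(s)  # true
```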

Kind regards

robert
Brian Candler (candlerb)
on 2012-12-24 14:01
Robert Klemme wrote in post #1090092:
>  - All variables hold _references_ to objects.*

And this is a huge breath of fresh air compared to, say, Perl, where
arrays and arrayrefs are two different types of value, similarly hashes
and hashrefs, and a whole bunch of other special cases.

In ruby, *all* values are references to objects. Even integers.

>> a = -3
=> -3
>> a.to_s
=> "-3"
>> a.abs
=> 3

So consistently:

- everything is pass-by-value
- every value is a reference to an object

But as you have discovered, many objects are mutable, including strings.
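
A sketch of what "pass-by-value, where every value is a reference" means in practice. The two helper methods are hypothetical, named only for this illustration: one mutates the object it was handed, the other merely rebinds its own parameter:

```ruby
# Hypothetical helpers for illustration only.
def mutate!(str)
  str[0, 4] = "Bert"     # changes the object the caller also sees
end

def rebind(str)
  str = "Bert Nerk"      # only rebinds the method's local parameter
  str
end

name = "Fred Nerk".dup

rebind(name)
after_rebind = name.dup  # snapshot: caller's object is unchanged
puts name                # Fred Nerk

mutate!(name)
puts name                # Bert Nerk
```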
Paul Magnussen (majjick)
on 2012-12-24 15:43
Wow, everybody has been so kind and helpful to a newbie.  I shall save
all this stuff off.  Meantime, thank you all and Merry Christmas!