String doesn't auto dup on modification

On 22.01.2009 15:11, RK Sentinel wrote:

As and when I discover such bugs in my code, I start adding dup(), and
yes, sometimes these lines bomb when another datatype is passed (I had
asked about this in a thread recently: respond_to?(:dup) was returning
true, but the dup call itself was failing).

Anyway, I realize it's more my incompetence, and I must be careful with
destructive methods, but I just thought maybe there's some other way to
do this, so that I am not leaving it to my memory.

I would not call this “incompetence”: we live and learn. Basically this
is a typical trade-off issue: you trade efficiency (no copy) for safety
(no aliasing). As is often the case with trade-offs, there is no clear
100% rule that tells you exactly what is always correct. Instead you
have to think about it (even more so when creating something like a
library) and then deliberately decide which way you go.

Kind regards

robert

On Wed, Jan 21, 2009 at 8:15 PM, RK Sentinel [email protected]
wrote:

But I would like to know how others handle this generally in large apps.

  • Do you keep freezing Strings you make in your classes to avoid
    accidental change?

  • Do you habitually dup() your string?

I try to, and I try to get rid of all references to the original
string as soon as possible.
This is because incremental GC works so well nowadays and allows for
some clean code.
Freezing a string seems like a good idea sometimes, but if that means
holding on to the object longer than needed this might not be such a
good idea after all.

R.

Robert D. wrote:

aware of freeze().
This is because incremental GC works so well nowadays and allows for
some clean code.
Freezing a string seems like a good idea sometimes, but if that means
holding on to the object longer than needed this might not be such a
good idea after all.

R.

Robert, for those of us who are considerably more clueless, what is
“incremental GC”?

Thanks,

t.

Tom C., MS MA, LMHC - Private practice Psychotherapist
Bellingham, Washington, U.S.A: (360) 920-1226
<< [email protected] >> (email)
<< TomCloyd.com >> (website)
<< sleightmind.wordpress.com >> (mental health weblog)
  • Do you habitually dup() your string?

I try to, and I try to get rid of all references to the original
string as soon as possible.
This is because incremental GC works so well nowadays and allows for
some clean code.
Freezing a string seems like a good idea sometimes, but if that means
holding on to the object longer than needed this might not be such a
good idea after all.

R.

Interesting point. So freezing a string prevents collection as long as
there are referrers (obvious), but duping it helps release the original
one. You still have a new string in memory, though, so on balance you
are using the same amount of memory.

Is there a write-up on Ruby's GC? My knowledge of GC is Java-based,
and it is 5 years old (based on the Inside the VM book and various
other articles on sun.com). Is Ruby's GC “generational”? In that case,
IIRC, an older object would have moved to an older generation and
would be less likely to be collected.

Any links on Ruby's GC would be appreciated.

On 22.01.2009 12:23, Brian C. wrote:

RK Sentinel wrote:

I guess I’ll just have to remember not to modify a string I take from
another class.

Or as was said earlier on: simply don’t use destructive string methods
unless you really have to. Pretend that all strings are frozen.

You can enforce this easily enough in your unit tests: e.g. you could
pass in strings that really are frozen, and check that your code still
works :-)
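A minimal sketch of that testing idea, using a hypothetical `greeting` method (and a deliberately broken `greeting!` counterpart) rather than any code from the thread. Calling a destructive method on a frozen string raises `FrozenError` in Ruby 2.5+ (a `RuntimeError` subclass; older versions raise `RuntimeError` directly), so rescuing `RuntimeError` catches it either way.

```ruby
# Hypothetical method under test: uses only non-destructive String methods.
def greeting(name)
  "Hello, " + name.strip.capitalize
end

# Hypothetical counterexample: mutates its argument in place.
def greeting!(name)
  name.strip!
  name.capitalize!
  "Hello, " + name
end

frozen = "  fred ".freeze

puts greeting(frozen)   # prints "Hello, Fred": frozen input is only read

begin
  greeting!(frozen)     # strip! on a frozen string raises
rescue RuntimeError => e
  puts "mutation detected: #{e.class}"
end
```

If the frozen-input test passes, the method at least does not modify the strings it receives.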

Well, this is only half of the story: it does not save you from outside
code changing the instance under your hands. Aliasing works both ways,
i.e. you can screw up the receiver but also the caller. :-)
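To illustrate the other direction (the caller changing a string the receiver has kept), here is a small sketch with two hypothetical classes, `Label`, which keeps the caller's object, and `SafeLabel`, which dups on the way in; neither comes from the thread.

```ruby
# Hypothetical class that keeps a reference to the string it was given.
class Label
  attr_reader :text

  def initialize(text)
    @text = text          # no dup: Label and the caller share one object
  end
end

# Hypothetical variant that takes a private copy on the way in.
class SafeLabel
  attr_reader :text

  def initialize(text)
    @text = text.dup      # private copy: later caller edits are not seen
  end
end

s = "fred".dup            # mutable string owned by the caller
label = Label.new(s)
safe  = SafeLabel.new(s)

s << " flintstone"        # the caller mutates its own string afterwards

p label.text  # => "fred flintstone"  (aliased)
p safe.text   # => "fred"
```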

Cheers

robert

RK Sentinel wrote:

  • Do you keep freezing Strings you make in your classes to avoid
    accidental change?

  • Do you habitually dup() your string?

One possibility is copy-on-write.

```ruby
require 'delegate'

class CopyOnWriteString < DelegateClass(String)
  DESTRUCTIVE_METHODS =
    String.public_instance_methods(false).grep(/!/).map(&:to_sym) +
    [
      :[]=,
      :<<,
      :concat,
      :initialize_copy,
      :replace,
      :setbyte,
      # ... and probably others ...
    ]

  DESTRUCTIVE_METHODS.each { |m|
    define_method(m) { |*args, &block|
      # Point the delegate at a private copy, then apply the change to it.
      __setobj__(__getobj__.dup)
      __getobj__.send(m, *args, &block)
    }
  }
end

class Person
  def initialize(name)
    @name = name
  end

  def name
    CopyOnWriteString.new(@name)
  end
end

person = Person.new("fred")
name = person.name

p name         #=> "fred"
p person.name  #=> "fred"

name << " flintstone"
p name         #=> "fred flintstone"
p person.name  #=> "fred"
```

(I’ve used some 1.8.7+ only features.)
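The class above dups on every destructive call, so a series of edits on the same proxy pays for a full copy each time. A minimal sketch of a variant that copies only on the first write, assuming the same DelegateClass approach; the name `CopyOnceString` and its method list (deliberately incomplete, like the original's) are illustrative assumptions.

```ruby
require 'delegate'

# Sketch: copy-on-write proxy that copies the wrapped String at most once.
class CopyOnceString < DelegateClass(String)
  WRITERS = String.public_instance_methods(false).grep(/!\z/) +
            [:[]=, :<<, :concat, :insert, :prepend, :replace, :setbyte]

  def initialize(str)
    super(str)
    @copied = false
  end

  WRITERS.each do |m|
    define_method(m) do |*args, &block|
      unless @copied
        __setobj__(__getobj__.dup)   # first write: switch to a private copy
        @copied = true
      end
      __getobj__.send(m, *args, &block)
    end
  end
end

name = "fred".dup
cow = CopyOnceString.new(name)
cow << " flintstone"   # first write: copies, then appends to the copy
cow << "!"             # later writes reuse the same copy
p cow.__getobj__       # => "fred flintstone!"
p name                 # => "fred"
```

The extra flag is exactly the kind of complexity Brian warns about below, so whether it pays off is a question for measurement.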

Brian C. wrote:

The overhead of a single copy is small, and in any case this is probably
what is needed here (e.g. if the user makes some edits but clicks
‘cancel’ instead of ‘save’ then you may want to keep the old string
untouched)

You could try deferring the dup until the first time you call a
destructive method on the string, but the complexity overhead is
unlikely to be worth it.

Yes, I've got set_buffer doing a dup (if it's a string).

At the same time, get_buffer also does a dup, since the Field is often
created blank (I did mention that set_buffer is an optional method
for editing a default value, if present).

It's a real TextField or Field, so you would be typing away in the
field. Each character you type is inserted (or removed if it's Del or
Backspace), exactly as I am typing away in this edit box.

The CopyOnWriteString is impressive; it shows what can be done with
Ruby.

RK Sentinel wrote:

Thanks for all the helpful replies. It's my first venture: a widget
library.

Here’s an example: Sometimes a string is passed to a class, say, using
its set_buffer method (which is an optional method).

set_buffer just assigns it to @buffer. But deep within the class this
variable is being edited using insert() or slice!() (and this IS
necessary) since the widget is an editing widget.

Thanks, so I guess the API is something like this:

```ruby
edit_field.buffer = "foo"
# ... some time later, after the user has clicked OK ...
f.write(edit_field.buffer)
```

Now, if there is a compelling reason for this object to perform
“in-place” editing on the buffer then by all means do, and document
this, but it will lead to the aliasing problems you describe.

It may be simpler and safer just to use non-destructive methods inside
your class.

```ruby
# destructive
@buffer.slice!(x, y)

# non-destructive alternative (removes the same x..x+y range; assumes x >= 0)
@buffer = @buffer[0, x] + @buffer[(x + y)..-1].to_s

# destructive
@buffer.insert(pos, text)

# non-destructive alternative
@buffer = @buffer[0, pos] + text + @buffer[pos..-1]
```

In effect, this is doing a ‘dup’ each time. It has to; since Ruby
doesn’t do reference-counting it has no idea whether any other object in
the system is holding a reference to the original object or not.
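A small sketch of the difference, using a plain local variable in place of `@buffer` (the names and strings are illustrative only): the non-destructive version rebinds the variable to a new String, while `insert` mutates the one object that every alias points to.

```ruby
original = "hello world".dup
buffer = original            # buffer aliases the caller's string

# Non-destructive insert: builds a new String and rebinds buffer.
pos = 5
buffer = buffer[0, pos] + "," + buffer[pos..-1]

p buffer    # => "hello, world"
p original  # => "hello world"   (untouched)

# Destructive insert through an alias: the caller sees the change too.
alias_buf = original
alias_buf.insert(5, ",")
p original  # => "hello, world"
```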

The only problem with this is if @buffer is a multi-megabyte object and
you don’t want to keep copying it. In this case, doing a single dup
up-front would allow you to use the destructive methods safely.

```ruby
class Editor
  def buffer
    @buffer
  end

  def buffer=(x)
    @buffer = x.dup   # private copy, so in-place edits can't leak to the caller
  end
end
```

The overhead of a single copy is small, and in any case this is probably
what is needed here (e.g. if the user makes some edits but clicks
‘cancel’ instead of ‘save’ then you may want to keep the old string
untouched)
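A minimal sketch of that single up-front copy combined with a cancel action, assuming a hypothetical `EditBuffer` class whose `type`, `delete` and `cancel` methods stand in for the widget's real key handling (none of these names come from the thread).

```ruby
# Hypothetical editing buffer: one dup on the way in, then in-place edits.
class EditBuffer
  attr_reader :buffer

  def buffer=(str)
    @original = str            # remembered so cancel can restore it
    @buffer = str.dup          # single up-front copy; all edits go here
  end

  def type(pos, text)
    @buffer.insert(pos, text)  # destructive, but only on the private copy
  end

  def delete(pos, len)
    @buffer.slice!(pos, len)
  end

  def cancel
    @buffer = @original.dup    # throw the edits away
  end
end

doc = "foo".dup
ed = EditBuffer.new
ed.buffer = doc
ed.type(3, "bar")
ed.delete(0, 1)
p ed.buffer  # => "oobar"
ed.cancel
p ed.buffer  # => "foo"
p doc        # => "foo"   (the caller's string was never touched)
```

This assumes the caller does not mutate its own string while editing is in progress; if it might, `@original` would need its own dup as well.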

You could try deferring the dup until the first time you call a
destructive method on the string, but the complexity overhead is
unlikely to be worth it.

Brian C. wrote:

You could try deferring the dup until the first time you call a
destructive method on the string, but the complexity overhead is
unlikely to be worth it.

Careful. Intuition is worse than useless here. The only way to know is
to measure the particular case in question.

```ruby
# Assumes CopyOnWriteString from the earlier post is already defined.
class Person
  def initialize(name)
    @name = name
  end

  def name_cow
    CopyOnWriteString.new(@name)
  end

  def name_dup
    @name.dup
  end
end

require 'benchmark'

n = 10_000
sizes = [100, 1000, 10_000, 100_000]
objects = sizes.inject(Hash.new) { |acc, size|
  acc.merge!(size => Person.new("x" * size))
}

sizes.each { |size|
  object = objects[size]
  puts "-" * 40
  puts "iterations: #{n} size: #{size}"
  Benchmark.bm { |x|
    x.report("cow w/o change") {
      n.times { object.name_cow }
    }
    x.report("dup w/o change") {
      n.times { object.name_dup }
    }
    x.report("cow w/ change") {
      n.times { object.name_cow << "y" }
    }
    x.report("dup w/ change") {
      n.times { object.name_dup << "y" }
    }
  }
}
```

```
ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
----------------------------------------
iterations: 10000 size: 100
                    user     system      total        real
cow w/o change  0.031000   0.000000   0.031000 (  0.031000)
dup w/o change  0.032000   0.000000   0.032000 (  0.031000)
cow w/ change   0.171000   0.000000   0.171000 (  0.172000)
dup w/ change   0.047000   0.000000   0.047000 (  0.047000)
----------------------------------------
iterations: 10000 size: 1000
                    user     system      total        real
cow w/o change  0.032000   0.000000   0.032000 (  0.031000)
dup w/o change  0.046000   0.000000   0.046000 (  0.047000)
cow w/ change   0.172000   0.000000   0.172000 (  0.172000)
dup w/ change   0.063000   0.000000   0.063000 (  0.062000)
----------------------------------------
iterations: 10000 size: 10000
                    user     system      total        real
cow w/o change  0.031000   0.000000   0.031000 (  0.032000)
dup w/o change  0.109000   0.000000   0.109000 (  0.109000)
cow w/ change   0.282000   0.000000   0.282000 (  0.281000)
dup w/ change   0.156000   0.000000   0.156000 (  0.156000)
----------------------------------------
iterations: 10000 size: 100000
                    user     system      total        real
cow w/o change  0.031000   0.000000   0.031000 (  0.032000)
dup w/o change  0.672000   0.000000   0.672000 (  0.672000)
cow w/ change   1.406000   0.000000   1.406000 (  1.406000)
dup w/ change   1.219000   0.000000   1.219000 (  1.219000)
```

Destructive methods are less common in real code, and especially so when
the string comes from an attr_reader method. It is likely that the case
to optimize is the non-destructive call (the first of each quadruplet
above). But we have to profile the specific situation.

Basically yes.
But one has to be careful, as we somehow have the instinct not to
create lots of short-lived objects.

To Tom: sorry for the sloppy abbreviation, but it means incremental
Garbage Collector, applying different strategies of collection
depending on object age.
If you have 45 minutes to spare, there was a most interesting talk at
RubyConf 2008 by Glenn Vanderburg; have a look by all means:
http://rubyconf2008.confreaks.com/how-ruby-can-be-fast.html

Cheers
Robert

String.public_instance_methods(false).grep(/!/).map(&:to_sym) +

Is that a new feature, making the to_sym method act as a block? Which
to_sym method?

DESTRUCTIVE_METHODS.each { |m|
define_method(m) { |*args, &block|
__setobj__(__getobj__.dup)
__getobj__.send(m, *args, &block)

WTF? What’s this __[gs]etobj__ about?

(I’ve used some 1.8.7+ only features.)

Regards, simon ... TIA

Simon K. wrote:

String.public_instance_methods(false).grep(/!/).map(&:to_sym) +

Is that a new feature?

Symbol#to_proc has been part of Ruby core since 1.8.7. It had previously
been defined by ActiveSupport and, I think, Facets.

make the to_sym method act as block. Which to_sym method?

The to_sym method of each item in the array. Symbol#to_proc is defined
somewhat like:
lambda {|x, *args| x.send(self, *args)}
So foo.map(&:to_sym) behaves like foo.map {|x| x.send(:to_sym)}
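A rough stand-in for that behaviour, written as a hypothetical helper `symbol_proc` (not the real core implementation), shows the three spellings producing the same result:

```ruby
# Rough stand-in for Symbol#to_proc: the returned lambda sends the symbol
# as a method name to its first argument (hypothetical helper).
def symbol_proc(sym)
  lambda { |obj, *args| obj.send(sym, *args) }
end

names = ["upcase!", "chomp!"]

p names.map(&:to_sym)               # => [:upcase!, :chomp!]
p names.map(&symbol_proc(:to_sym))  # => [:upcase!, :chomp!]
p names.map { |x| x.send(:to_sym) } # => [:upcase!, :chomp!]
```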

DESTRUCTIVE_METHODS.each { |m|
define_method(m) { |*args, &block|
__setobj__(__getobj__.dup)
__getobj__.send(m, *args, &block)

WTF? What’s this __[gs]etobj__ about?

Those are methods defined by DelegateClass(). They are used to get and
set the object that is delegated to. In this case they cause the proxy
object to delegate to a copy of the original string instead of the
original string itself when a destructive method is called.
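A small sketch of the two methods in isolation; the empty `Proxy` subclass is a hypothetical name used only for illustration. `__getobj__` returns the current target, and `__setobj__` swaps it, after which forwarded calls go to the new object.

```ruby
require 'delegate'

# A plain DelegateClass proxy with no overrides: every String method is
# forwarded to whatever object __getobj__ currently returns.
class Proxy < DelegateClass(String)
end

s = "fred".dup
proxy = Proxy.new(s)

p proxy.upcase                # => "FRED"  (forwarded to s)
p proxy.__getobj__.equal?(s)  # => true

proxy.__setobj__(s.dup)       # now delegate to a private copy
proxy << " flintstone"        # forwarded to the copy, not to s

p proxy.__getobj__            # => "fred flintstone"
p s                           # => "fred"
```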

HTH,
Sebastian

Symbol#to_proc is new in ruby core since 1.8.7+. It had been previously
defined by ActiveSupport and, I think, facets.

WTF? What’s this __[gs]etobj__ about?

Those are methods defined by DelegateClass(). […]

Thanks, I never saw DelegateClass before, now I understand.

Regards, simon

Robert D. wrote:

Cheers
Robert

Robert,

Thanks! I felt bad about the question, because after reading on into a
couple of the following posts, I figured it out and realized I DID know
what the abbreviation meant. The “incremental” threw me off a bit.
Thanks for the link. I'd love to go check it out, and will likely do so
later today.

Thanks again for your help, as always. I'm endlessly grateful for the
helpfulness of the folk on this list.

t.
