Forum: Ruby String doesn't auto dup on modification

R. Kumar (sentinel)
on 2009-01-21 20:17
I'm writing my first largeish app. One issue that gets me frequently is
this:

I define a string in one class. Some other class references it, and
modifies it. I (somehow) expected that when another referer modifies the
reference, ruby would automatically dup() the string.

Anyway, through trial and error, I start dup()'ing strings myself. I am
aware of freeze().

But would like to know how others handle this generally in large apps.

- Do you keep freezing Strings you make in your classes to avoid
accidental change?

- Do you habitually dup() your string?

Is there some clean way of handling this that I am missing?
R. Kumar (sentinel)
on 2009-01-21 20:21
> - Do you habitually dup() your string ?
>
> Is there some clean way of handling this that I am missing.
To continue:

In some critical places in my app I had done:

def set_value str
  @buffer = str.dup
end

def get_value
  @buffer.dup
end

Is this the norm? Do you do this generally, to avoid accidental changes?
Stefan Lang (Guest)
on 2009-01-21 22:11
(Received via mailing list)
2009/1/21 RK Sentinel <sentinel.2001@gmx.com>:
> But would like to know how others handle this generally in large apps.
>
> - Do you keep freezing Strings you make in your classes to avoid
> accidental change
>
> - Do you habitually dup() your string ?
>
> Is there some clean way of handling this that I am missing.

This is a well known "problem" with all languages that
have mutable strings. The solution is simple:

* Use destructive string methods only after profiling has shown
  that string manipulation is the bottleneck.

* Don't mutate a string after passing it across encapsulation
  boundaries.

Freezing certain strings can be beneficial in the same way
assertions are, habitually duping strings is a bad practice, IMO.

Stefan
Brian Candler (candlerb)
on 2009-01-21 22:59
RK Sentinel wrote:
> Anyway, through trial and error, I start dup()'ing strings myself. I am
> aware of freeze().
>
> But would like to know how others handle this generally in large apps.
>
> - Do you keep freezing Strings you make in your classes to avoid
> accidental change
>
> - Do you habitually dup() your string ?

Generally, no.

Of course there is no contract to enforce this, but in many cases it
would be considered bad manners to modify an object which is passed in
as an argument.

If you only read the object, then it doesn't matter. If you need a
modified version, create a new object. Usually this doesn't require
'dup'.

  def foo(a_string)
    a_string << "/foo"            # bad
    a_string = "#{a_string}/foo"  # good
    a_string = a_string + "/foo"  # good
  end

  DEFAULT_OPT = {:foo => "bar"}

  def bar(opt = {})
    opt[:foo] ||= "bar"           # bad
    opt = DEFAULT_OPT.merge(opt)  # good
  end

If you are paranoid, you can freeze DEFAULT_OPT and all its keys and
values.

Sometimes you will see frozen strings as an optimisation to reduce the
amount of garbage objects created:

  ...
  foo["bar"]        # creates a new "bar" string every time round

  BAR = "bar".freeze
  ...
  foo[BAR]          # always uses the same object

This probably won't make any noticeable difference except in the most
innermost of loops.
Tom Cloyd (Guest)
on 2009-01-21 23:10
(Received via mailing list)
Stefan Lang wrote:
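> * Don't mutate a string after passing it across encapsulation
>   boundaries.
>
> Freezing certain strings can be beneficial in the same way
> assertions are, habitually duping strings is a bad practice, IMO.
>
> Stefan
>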
If this is an utterly dumb question, just ignore it. However, I AM
perplexed by this response. Here's why:

I thought it was OK for an object to receive input, and output a
modified version of same. If they don't get to do that, their use seems
rather limited. In my current app, I create a log object, and various
classes write to it. I don't create new objects every time I want to add
a log entry. Why would I do that? Makes no sense to me. I might want to
do exactly the same thing to a string. You seem to be saying this is bad
form. I can see that there are cases where you want the string NOT to be
modified, but you seem to be saying that to modify the original string at
all is bad.

It makes perfect sense to me to pass an object (string, in this case)
across an encapsulation boundary specifically to modify it.

What am I missing here?

Thanks, if you can help me out!

Tom

--

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tom Cloyd, MS MA, LMHC - Private practice Psychotherapist
Bellingham, Washington, U.S.A: (360) 920-1226
<< tc@tomcloyd.com >> (email)
<< TomCloyd.com >> (website)
<< sleightmind.wordpress.com >> (mental health weblog)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
R. Kumar (sentinel)
on 2009-01-22 06:07
Tom Cloyd wrote:
> Stefan Lang wrote:
>> Freezing certain strings can be beneficial in the same way
>> assertions are, habitually duping strings is a bad practice, IMO.
>
> If this is an utterly dumb question, just ignore it. However, I AM
> perplexed by this response. Here's why:
I agree with you.

I have objects that get a string, process/clean it for printing/display.
(That is the whole purpose of centralizing data and behaviour into
classes.)

 Remembering I must not modify it is a big mental overhead and results
in strange things that I spend a lot of time tracking, till I find out
--- oh no the string got mutated over there. Now I must start dup()'ing
it -- okay, where all should I dup it ?

To the previous poster - yes, one does not have to use dup. One can
create a new string by changing the method from, say, gsub! to just
gsub() and taking the return value. I include such situations when I say
dup().
Robert Klemme (Guest)
on 2009-01-22 08:30
(Received via mailing list)
On 21.01.2009 22:57, Brian Candler wrote:
>
> Generally, no.

Same here.

> Of course there is no contract to enforce this, but in many cases it
> would be considered bad manners to modify an object which is passed in
> as an argument.

Depends: for example, if you have a method that is supposed to dump
something to a stream (IO and friends) which only uses << you can as
well use String there.

> If you only read the object, then it doesn't matter.

That may be true for methods but if you need to store a String as
instance variable then I tend to dup it if the application is larger.
You can even automate conditional dup'ing by doing something like this

class Object
   def dupf
     frozen? ? self : dup
   end
end

and then

class MyClass
   def initialize(name)
     @name = name.dupf
   end
end

Kind regards

  robert
Brian Candler (candlerb)
on 2009-01-22 09:17
Tom Cloyd wrote:
> I thought it was OK for an object to receive input, and output a
> modified version of same.

Do you mean "return the same object reference, after the object has been
modified", or "return a new object, which is a modified copy of the
original"?

> If they don't get to do that, their use seems
> rather limited. In my current app, I create a log object, and various
> classes write to it. I don't create new objects every time I want to add
> a log entry. Why would I do that? Makes no sense to me.

I'd consider a logger object as a sort of stream. You're just telling
the logger to "do" something every time you send it a message; you're
not really telling it to change into a different sort of logger. (Of
course, if the logger is logging to an underlying string buffer, then
changing that buffer is a desired side effect of logging, but the logger
itself is still the same)

> I might want to
> do exactly the same thing to a string. You seem to be saying this is bad
> form. I can see that there are cases where you want the string NOT to be
> modified, but you seem to be saying that to modify the original string at
> all is bad.

No, I'm not saying this. Sometimes it's useful to modify the string
passed in:

  def cleanup!(str)
    str.strip!
    str.replace("default") if str.empty?
  end

However I'd say this is not the usual case. More likely I'd write

  def cleanup(str)
    str = str.strip                  # strip returns a new string
    str.empty? ? "default" : str
  end

> It makes perfect sense to me to pass an object (string, in this case)
> across an encapsulation boundary specifically to modify it.

Yes, in some cases it does, and it's up to you to agree the 'contract'
in your documentation that that's what you'll do. I'm not saying it's
forbidden.

But this seems to be contrary to your original question, where you were
saying you were defensively dup'ing strings, on both input and output,
to avoid cases where they get mutated later by the caller or some other
object invoked by the caller.

I'm saying to avoid this problem, the caller would not pass a string to
object X, and *then* mutate it (e.g. by passing it to another object Y
which mutates it). And in practice, I find this is not normally a
problem, because normally objects do not mutate strings which are passed
into them as arguments.

This is not a hard and fast rule. It's just how things work out for me
in practice. It depends on your coding style, and whether you're coding
for yourself or coding libraries to be used by other people too.

Regards,

Brian.
R. Kumar (sentinel)
on 2009-01-22 10:28
Robert Klemme wrote:
> [...] instance variable then I tend to dup it if the application is larger.
> You can even automate conditional dup'ing by doing something like this
>
> class Object
>    def dupf
>      frozen? ? self : dup
>    end
> end
>
> and then
>
> class MyClass
>    def initialize(name)
>      @name = name.dupf
>    end
> end
>
> Kind regards
>
>   robert

thanks, this looks very helpful.

In response to the previous post by Brian: yes, it's a library for use by
others.
Stefan Rusterholz (apeiros)
on 2009-01-22 10:40
Robert Klemme wrote:
> class MyClass
>    def initialize(name)
>      @name = name.dupf
>    end
> end

I'd vote against this. It looks to me like a great way to confuse users
and complicate interfaces. I'd rather go towards transparency.
Generally I'd not mutate arguments, only the receiver. If there's a
valid case to mutate an argument, it should be documented and evident.
The user then has to provide a duplicate if he still needs the original.

Just my 0.02€

Regards
Stefan Rusterholz
Brian Candler (candlerb)
on 2009-01-22 10:52
RK Sentinel wrote:
>> class Object
>>    def dupf
>>      frozen? ? self : dup
>>    end
>> end
>>
>> and then
>>
>> class MyClass
>>    def initialize(name)
>>      @name = name.dupf
>>    end
>> end
>>
>> Kind regards
>>
>>   robert
>
> thanks, this looks very helpful.

Beware: this may not solve your problem. It will work if the passed-in
object is itself a String, but not if it's some other object which
contains Strings.

Try:

a = ["hello", "world"]
b = a.dupf
b[0] << "XXX"
p a

However, deep-copy is an even less frequently seen solution.
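
For plain data, one common way to get a deep copy is a round trip through Marshal. This is only a sketch: the helper name deep_copy is made up for illustration, and Marshal cannot serialize procs, IO objects and the like, so it is not a general answer.

```ruby
# Hypothetical helper (not from the thread): deep copy via a Marshal round trip.
# Suitable for plain data such as strings, arrays and hashes.
def deep_copy(obj)
  Marshal.load(Marshal.dump(obj))
end

a = ["hello", "world"]
b = a.dup          # shallow: b[0] and a[0] are the same String object
c = deep_copy(a)   # deep: c[0] is an independent String

c[0] << "XXX"
p a                # a is untouched by the change made through c

b[0] << "YYY"
p a                # a[0] has changed, because the shallow copy shares elements
```

The shallow copy protects the array itself (adding or removing elements) but not the strings inside it.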

> In response to the prev post by Brian, yes its a library for use by
> others.

It's hard to provide concrete guidance without knowing what this library
does, but it sounds like it builds some data structure which includes
the string passed in.

I believe that a typical library is behaving correctly if it just stores
a reference to the string.

If a problem arises because the caller is later mutating that string
object, this could be considered to be a bug in the *caller*. The caller
can fix this problem by dup'ing the string at the point when they pass
it in, or dup'ing it later before changing it.

Again, this is not a hard-and-fast rule. Sometimes defensive dup'ing is
reasonable. For example, Ruby's Hash object has a special-case for
string keys: if you pass in an unfrozen string as a hash key, then the
string is dup'd and frozen before being used as the key.

This may (a) lead to surprising behaviour, and (b) doesn't solve the
general problem (*). However strings are very commonly used as hash
keys, and they are usually short, so it seems a reasonable thing to do.
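
The special case for string keys can be observed directly. A small sketch, using String.new so that the key is certainly an unfrozen string:

```ruby
key = String.new("abc")   # an unfrozen, mutable string
h = {}
h[key] = 1

stored = h.keys.first
p stored.frozen?          # true: the hash holds a frozen string
p stored.equal?(key)      # false: it is a copy, not the caller's object

key << "def"              # the caller mutates its own string afterwards
p h["abc"]                # 1: lookup by the original contents still works
```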

Regards,

Brian.

(*) In the case of a hash with string keys, if you mutated one of those
keys then it could end up being on the wrong hash chain, and become
impossible to retrieve it from the hash using hash["key"]

So it's a question of which of two behaviours is least undesirable:
objects disappearing from the hash because you forgot to call #rehash,
or hashes working "right" with string keys but "wrong" with other keys.

irb(main):001:0> k = [1]
=> [1]
irb(main):002:0> h = {k => 1, [2] => 2}
=> {[1]=>1, [2]=>2}
irb(main):003:0> h[[1]]
=> 1
irb(main):004:0> k << 2
=> [1, 2]
irb(main):005:0> h
=> {[1, 2]=>1, [2]=>2}
irb(main):006:0> h[[1,2]]
=> nil                            <<< WHOOPS!
irb(main):007:0> h.rehash
=> {[1, 2]=>1, [2]=>2}
irb(main):008:0> h[[1,2]]
=> 1
R. Kumar (sentinel)
on 2009-01-22 11:35
>
> Beware: this may not solve your problem. It will work if the passed-in
> object is itself a String, but not if it's some other object which
> contains Strings.

I guess I'll just have to remember not to modify a string I take from
another class. I don't feel too happy or reassured about that since it's
been already causing me some grief.

Maybe I am just used to a bad practice coming from another language
whereas for others it's second nature.
Brian Candler (candlerb)
on 2009-01-22 12:24
RK Sentinel wrote:
> I guess I'll just have to remember not to modify a string I take from
> another class.

Or as was said earlier on: simply don't use destructive string methods
unless you really have to. Pretend that all strings are frozen.

You can enforce this easily enough in your unit tests: e.g. you could
pass in strings that really are frozen, and check that your code still
works :-)
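
A rough sketch of that idea, with two made-up methods: clean_label leaves its argument alone, while clean_label! is a deliberately buggy variant that edits it in place. The method names and the "&Delete" input are only assumptions for illustration.

```ruby
# Hypothetical method under test: returns a cleaned-up copy.
def clean_label(str)
  str.strip.sub("&", "")
end

# Deliberately buggy variant: mutates the caller's string.
def clean_label!(str)
  str.strip!
  str.sub!("&", "")
  str
end

input = " &Delete ".freeze

p clean_label(input)       # fine: nothing destructive touches the frozen input

begin
  clean_label!(input)
rescue RuntimeError => e   # FrozenError on Ruby 2.5+, which is a RuntimeError
  puts "caught #{e.class}"
end
```

Feeding frozen arguments to a test suite turns a silent aliasing bug into an exception at the exact line that mutates.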
Albert Schlef (alby)
on 2009-01-22 12:41
Stefan Lang wrote:
> 2009/1/21 RK Sentinel <sentinel.2001@gmx.com>:
> > I define a string in one class. Some other class references it, and
> > modifies it. I (somehow) expected that when another referer modifies the
> > reference, ruby would automatically dup() the string.
>
> This is a well known "problem" with all languages that
> have mutable strings. The solution is simple:

If I understand the OP correctly, his problem has nothing to do with
strings.

He says he modifies some object ...and to his great surprise the object
indeed gets modified.
R. Kumar (sentinel)
on 2009-01-22 13:08
Brian Candler wrote:
> RK Sentinel wrote:
>> I guess I'll just have to remember not to modify a string I take from
>> another class.
>
> Or as was said earlier on: simply don't use destructive string methods
> unless you really have to. Pretend that all strings are frozen.
>

Yes, I did mention too somewhere to use gsub! etc with caution. Fine in
local variables, strings that are not exposed.

> He says he modifies some object ...and to his great surprise the object
> indeed gets modified.
:-) Nice way of putting it. But this happened across objects, and I was
wondering whether there was some other solution to the problem which did
not require my having to remember not to modify a string taken from
elsewhere.
Robert Klemme (Guest)
on 2009-01-22 14:28
(Received via mailing list)
2009/1/22 Stefan Rusterholz <apeiros@gmx.net>:
> Generally I'd not mutate arguments, only the receiver. If there's a
> valid case to mutate an argument, it should be documented and evident.
> The user then has to provide a duplicate if he still needs the original.

Not sure what you mean because the code makes sure that the argument
is *not* mutated.

Or did you mean "caller" when you wrote "I" above?  There are
different policies to handle that as this thread has shown. Generally
I tend to follow the line that the caller is the owner of the object
unless otherwise stated. This also means that he is free to change it
at any time and consequently it is the responsibility of the receiver
to take whatever measures if he needs to make sure he keeps track of
the unmodified state.

IMHO this is a reasonable policy because if ownership passing were
the default you would always have to copy as the caller if you need to
continue to work with the object - even if receivers just use the
object temporarily and do not store references for longer. This would be
a waste of resources.

In practice this is usually not an issue for me so IMHO the discussion
has a strong "theoretical" aspect. OTOH it's good to reason about
these things from time to time. Helps clarify some things and avoid
mistakes.

Cheers

robert
Robert Klemme (Guest)
on 2009-01-22 14:30
(Received via mailing list)
2009/1/22 Albert Schlef <albertschlef@gmail.com>:
> If I understand the OP correctly, his problem has nothing to do with
> strings.
Sorry, that's plain wrong. He explicitly mentions Strings in several
places. The object that is mutated *is* a String.

> He says he modifies some object ...and to his great surprise the object
> indeed gets modified.

:-)

Cheers

robert
F. Senault (Guest)
on 2009-01-22 14:55
(Received via mailing list)
(Ooops, sorry if this message makes it through in duplicate with a
"strange" from, but I mixed up my "identities" in my newsreader...)

On Thu, 22 Jan 2009 00:05:58 -0500, RK Sentinel wrote:

> Tom Cloyd wrote:

>> If this is an utterly dumb question, just ignore it. However, I AM
>> perplexed by this response. Here's why:
> i agree with you.
>
> I have objects that get a string, process/clean it for printing/display.
> (That is the whole purpose of centralizing data and behaviour into
> classes.)

You could also design some "clever" accessors, like this:

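class Class
  # Defines a reader and a writer for each name. Both hand out copies,
  # so callers never share the stored string.
  def attr_duplicator(*members)
    members.each do |m|
      class_eval(<<-EOM)
        def #{m}
          @#{m}.dup
        end
        def #{m}=(v)
          @#{m} = v.dup
        end
      EOM
    end
    nil
  end
end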

(Could even bail if the writer is passed something else than a string.)

A long-winded example:

>> class Blah ; attr_duplicator :blix ; end
=> nil
>> t = Blah.new()
=> #<Blah:0x000000016fce98>

>> z = "abc"
=> "abc"
>> z.__id__
=> 12031220
>> t.blix = z
=> "abc"
>> t.instance_variable_get("@blix").__id__
=> 12015560

>> t.blix.__id__
=> 11609640
>> t.blix.__id__
=> 11604360

>> t.blix << "aaa"
=> "abcaaa"
>> t.blix
=> "abc"

>> a = t.blix
=> "abc"
>> a << "aaa"
=> "abcaaa"
>> t.blix
=> "abc"

As usual, not sure about the performance impact, though, especially if
you manipulate very big string objects...

Fred
R. Kumar (sentinel)
on 2009-01-22 15:12
Thanks for all the helpful replies. It's my first venture: a widget
library.

Here's an example: Sometimes a string is passed to a class, say, using
its set_buffer method (which is an _optional_ method).

set_buffer just assigns it to @buffer. But deep within the class this
variable is being edited using insert() or slice!() (and this IS
necessary) since the widget is an editing widget. So it was not so clear
when I added the set_buffer method that this would happen.

As and when i discover such bugs in my code, I start adding dup(), and
yes sometimes these lines bomb when another datatype is passed (I had
asked this in a  thread recently: respond_to? dup was passing, but the
dup was failing).

Yesterday, I had written an Action class which can be used to create a
menuitem or a button. The string used in the Action constructor can be
like "&Delete".
Button and Menuitem remove the "&" using a slice!(). And I was wondering
why the second usage was failing to find the "&".
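
A minimal sketch of how that failure can arise. The classes below are hypothetical stand-ins for the ones described (the attribute names label, text and mnemonic_index are assumptions), with Button using slice!() and MenuItem using a non-destructive call:

```ruby
# Hypothetical stand-ins, not the real widget library.
class Action
  attr_reader :label
  def initialize(label)
    @label = label
  end
end

class Button
  attr_reader :text
  def initialize(action)
    @text = action.label   # same String object the Action holds
    @text.slice!("&")      # destructive: action.label changes too
  end
end

class MenuItem
  attr_reader :text, :mnemonic_index
  def initialize(action)
    label = action.label
    @mnemonic_index = label.index("&")   # nil once a Button has removed it
    @text = label.delete("&")            # non-destructive: returns a new string
  end
end

action = Action.new(String.new("&Delete"))
item_before = MenuItem.new(action)
Button.new(action)
item_after = MenuItem.new(action)

p item_before.mnemonic_index   # 0
p item_after.mnemonic_index    # nil
p action.label                 # "Delete"
```

Only Button misbehaves here; switching it to label.delete("&") or label.sub("&", "") leaves the Action's string intact for every later user.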

Anyway, I realize it's more my incompetence, and I must be careful with
destructive methods, but I just thought maybe there's some other way to
do this, so I am not leaving it to my memory.

Thanks again for the helpful replies. The ruby community really rocks
when it comes to helping others.
Stefan Lang (Guest)
on 2009-01-22 16:15
(Received via mailing list)
2009/1/21 Tom Cloyd <tomcloyd@comcast.net>:
> [...]
> It makes perfect sense to me to pass an object (string, in this case) across
> an encapsulation boundary specifically to modify it.
>
> What am I missing here?

There's nothing wrong with it if the purpose of the method
is to manipulate the string and it's documented clearly.

Every rule has exceptions :-)

Stefan
Robert Klemme (Guest)
on 2009-01-22 22:30
(Received via mailing list)
On 22.01.2009 15:11, RK Sentinel wrote:
> As and when i discover such bugs in my code, I start adding dup(), and
> yes sometimes these lines bomb when another datatype is passed (I had
> asked this in a  thread recently: respond_to? dup was passing, but the
> dup was failing).

> Anyway, i realize its more my incompetence, and i must be careful with
> destructive methods, but I just thought maybe there's some other way to
> do this, so i am not leaving it to my memory.

I would not call this "incompetence": we live and learn.  Basically this
is a typical trade off issue: you trade efficiency (no copy) for safety
(no aliasing).  As often with trade offs there is no clear 100% rule
which exactly tells you what is *always* correct.  Instead you have to
think about it - when creating something like a library even more so -
and then deliberately decide which way you go.

Kind regards

  robert
Robert Klemme (Guest)
on 2009-01-22 22:31
(Received via mailing list)
On 22.01.2009 12:23, Brian Candler wrote:
> RK Sentinel wrote:
>> I guess I'll just have to remember not to modify a string I take from
>> another class.
>
> Or as was said earlier on: simply don't use destructive string methods
> unless you really have to. Pretend that all strings are frozen.
>
> You can enforce this easily enough in your unit tests: e.g. you could
> pass in strings that really are frozen, and check that your code still
> works :-)

Well, this is only half of the story: it does not save you from outside
code changing the instance under your hands.  Aliasing works both ways,
i.e. you can screw up the receiver but also the caller. :-)
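
The other direction can be sketched like this. Both holder classes are invented for illustration; neither ever calls a destructive method, yet one of them still sees its state change:

```ruby
# Hypothetical receiver that keeps the caller's object.
class SharingHolder
  attr_reader :name
  def initialize(name)
    @name = name        # stores a reference to the caller's string
  end
end

# Hypothetical receiver that keeps a private copy.
class CopyingHolder
  attr_reader :name
  def initialize(name)
    @name = name.dup    # stores its own copy
  end
end

name = String.new("fred")
sharing = SharingHolder.new(name)
copying = CopyingHolder.new(name)

name << " flintstone"   # the caller keeps working with its string

p sharing.name          # "fred flintstone"
p copying.name          # "fred"
```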

Cheers

  robert
Robert Dober (Guest)
on 2009-01-23 01:41
(Received via mailing list)
On Wed, Jan 21, 2009 at 8:15 PM, RK Sentinel <sentinel.2001@gmx.com>
wrote:
> But would like to know how others handle this generally in large apps.
>
> - Do you keep freezing Strings you make in your classes to avoid
> accidental change
>
> - Do you habitually dup() your string ?
I try to, and I try to get rid of all references to the original
string as soon as possible.
This is because incremental GC works so well nowadays and allows for
some clean code.
Freezing a string seems like a good idea sometimes, but if that means
holding on to the object longer than needed this might not be such a
good idea after all.

R.
Tom Cloyd (Guest)
on 2009-01-23 01:53
(Received via mailing list)
Robert Dober wrote:
> I try to, and I try to get rid of all references to the original
> string as soon as possible.
> This is because incremental GC works so well nowadays and allows for
> some clean code.
> Freezing a string seems like a good idea sometimes, but if that means
> holding on to the object longer than needed this might not be such a
> good idea after all.
>
> R.
>
Robert, for those of us who are considerably more clueless, what is
"incremental GC"?

Thanks,

t.

R. Kumar (sentinel)
on 2009-01-23 06:19
Robert Dober wrote:
>> - Do you habitually dup() your string ?
> I try to, and I try to get rid of all references to the original
> string as soon as possible.
> This is because incremental GC works so well nowadays and allows for
> some clean code.
> Freezing a string seems like a good idea sometimes, but if that means
> holding on to the object longer than needed this might not be such a
> good idea after all.
>
> R.

Interesting point. So freezing a string prevents collection as long as
there are referers (obvious), but duping it helps release the original
one, but you still have a new string in memory. So net you still are
taking the same memory.

Is there a writeup on Ruby's GC? My knowledge of GC is Java-based,
and it is 5 years old (based on the Inside the VM book and
various other articles on sun.com). Is Ruby's GC "generational", in
which, iirc, an older object would have moved to an older generation and
be less likely to be collected?

Any links to ruby's GC would be appreciated.
Mike Gold (mikegold)
on 2009-01-23 06:26
RK Sentinel wrote:
>
> - Do you keep freezing Strings you make in your classes to avoid
> accidental change
>
> - Do you habitually dup() your string ?
>

One possibility is copy-on-write.

require 'delegate'

class CopyOnWriteString < DelegateClass(String)
  DESTRUCTIVE_METHODS =
    String.public_instance_methods(false).grep(/!/).map(&:to_sym) +
    [
      :[]=,
      :<<,
      :concat,
      :initialize_copy,
      :replace,
      :setbyte,
      # ... and probably others ...
    ]

  DESTRUCTIVE_METHODS.each { |m|
    define_method(m) { |*args, &block|
      __setobj__(__getobj__.dup)
      __getobj__.send(m, *args, &block)
    }
  }
end

class Person
  def initialize(name)
    @name = name
  end
  def name
    CopyOnWriteString.new(@name)
  end
end

person = Person.new("fred")
name = person.name

p name        #=> "fred"
p person.name #=> "fred"

name << " flintstone"
p name        #=> "fred flintstone"
p person.name #=> "fred"

(I've used some 1.8.7+ only features.)
Brian Candler (candlerb)
on 2009-01-23 11:37
RK Sentinel wrote:
> Thanks for all the helpful replies. Its my first venture: a widget
> library.
>
> Here's an example: Sometimes a string is passed to a class, say, using
> its set_buffer method (which is an _optional_ method).
>
> set_buffer just assigns it to @buffer. But deep within the class this
> variable is being edited using insert() or slice!() (and this IS
> necessary) since the widget is an editing widget.

Thanks, so I guess the API is something like this:

  edit_field.buffer = "foo"

  ... some time later, after user has clicked OK ...

  f.write(edit_field.buffer)

Now, if there is a compelling reason for this object to perform
"in-place" editing on the buffer then by all means do, and document
this, but it will lead to the aliasing problems you describe.

It may be simpler and safer just to use non-destructive methods inside
your class.

  # destructive: removes y characters starting at x
  @buffer.slice!(x,y)
  # non-destructive alternative: rebuild the string without that span
  @buffer = @buffer[0,x] + @buffer[x+y..-1].to_s

  # destructive
  @buffer.insert(pos, text)
  # non-destructive alternative
  @buffer = @buffer[0,pos] + text + @buffer[pos..-1]

In effect, this is doing a 'dup' each time. It has to; since Ruby
doesn't do reference-counting it has no idea whether any other object in
the system is holding a reference to the original object or not.

The only problem with this is if @buffer is a multi-megabyte object and
you don't want to keep copying it. In this case, doing a single dup
up-front would allow you to use the destructive methods safely.

class Editor
  def buffer
    @buffer
  end
  def buffer=(x)
    @buffer = x.dup
  end
end

The overhead of a single copy is small, and in any case this is probably
what is needed here (e.g. if the user makes some edits but clicks
'cancel' instead of 'save' then you may want to keep the old string
untouched)

You could try deferring the dup until the first time you call a
destructive method on the string, but the complexity overhead is
unlikely to be worth it.
R. Kumar (sentinel)
on 2009-01-23 12:26
Brian Candler wrote:

> The overhead of a single copy is small, and in any case this is probably
> what is needed here (e.g. if the user makes some edits but clicks
> 'cancel' instead of 'save' then you may want to keep the old string
> untouched)
>
> You could try deferring the dup until the first time you call a
> destructive method on the string, but the complexity overhead is
> unlikely to be worth it.

Yes, I've got the set_buffer doing a dup (if it's a string).

At the same time, the get_buffer also does a dup, since often the Field
is created blank (i did mention that set_buffer is an optional method
for editing a default value, if present).

It's a real TextField or Field. So you would be typing away in the field.
Each character you type is inserted in (or removed if its del or BS) -
exactly as I am typing away in this editbox.

The CopyOnWriteString is impressive, shows what all can be done with
Ruby.
Mike Gold (mikegold)
on 2009-01-23 15:29
Brian Candler wrote:
>
> You could try deferring the dup until the first time you call a
> destructive method on the string, but the complexity overhead is
> unlikely to be worth it.

Careful.  Intuition is worse than useless here.  The only way to know is
to measure the particular case in question.

class Person
  def initialize(name)
    @name = name
  end
  def name_cow
    CopyOnWriteString.new(@name)
  end
  def name_dup
    @name.dup
  end
end

require 'benchmark'

n = 10_000
sizes = [100, 1000, 10_000, 100_000]
objects = sizes.inject(Hash.new) { |acc, size|
  acc.merge!(size => Person.new("x"*size))
}

sizes.each { |size|
  object = objects[size]
  puts "-"*40
  puts "iterations: #{n} size: #{size}"
  Benchmark.bm { |x|
    x.report("cow w/o change") {
      n.times { object.name_cow }
    }
    x.report("dup w/o change") {
      n.times { object.name_dup }
    }
    x.report("cow w/  change") {
      n.times { object.name_cow << "y" }
    }
    x.report("dup w/  change") {
      n.times { object.name_dup << "y" }
    }
  }
}

ruby 1.8.7 (2008-08-11 patchlevel 72) [i386-cygwin]
----------------------------------------
iterations: 10000 size: 100
      user     system      total        real
cow w/o change  0.031000   0.000000   0.031000 (  0.031000)
dup w/o change  0.032000   0.000000   0.032000 (  0.031000)
cow w/  change  0.171000   0.000000   0.171000 (  0.172000)
dup w/  change  0.047000   0.000000   0.047000 (  0.047000)
----------------------------------------
iterations: 10000 size: 1000
      user     system      total        real
cow w/o change  0.032000   0.000000   0.032000 (  0.031000)
dup w/o change  0.046000   0.000000   0.046000 (  0.047000)
cow w/  change  0.172000   0.000000   0.172000 (  0.172000)
dup w/  change  0.063000   0.000000   0.063000 (  0.062000)
----------------------------------------
iterations: 10000 size: 10000
      user     system      total        real
cow w/o change  0.031000   0.000000   0.031000 (  0.032000)
dup w/o change  0.109000   0.000000   0.109000 (  0.109000)
cow w/  change  0.282000   0.000000   0.282000 (  0.281000)
dup w/  change  0.156000   0.000000   0.156000 (  0.156000)
----------------------------------------
iterations: 10000 size: 100000
      user     system      total        real
cow w/o change  0.031000   0.000000   0.031000 (  0.032000)
dup w/o change  0.672000   0.000000   0.672000 (  0.672000)
cow w/  change  1.406000   0.000000   1.406000 (  1.406000)
dup w/  change  1.219000   0.000000   1.219000 (  1.219000)

Destructive methods are less common in real code, and especially so when
the string comes from an attr_reader method.  The case worth optimizing
is therefore likely the non-destructive call (the first row of each
group of four above).  But we have to profile the specific situation.
Robert Dober (Guest)
on 2009-01-23 15:44
(Received via mailing list)
Basically yes.
But one has to be careful, as we somehow have the instinct not to
create lots of short-lived objects.

To Tom, sorry for the sloppy abbreviation, but it means incremental
Garbage Collector, applying different strategies of collection
depending on object age.
If you have 45 minutes to spare, there was a most interesting talk at
RubyConf 2008 by Glenn Vanderburg, have a look by all means:
http://rubyconf2008.confreaks.com/how-ruby-can-be-fast.html

Cheers
Robert
Tom Cloyd (Guest)
on 2009-01-23 19:29
(Received via mailing list)
Robert Dober wrote:
>
> Cheers
> Robert
>
>
Robert,

Thanks! I felt bad about the question, because reading on into a couple
of the posts following, I figured it out, and realized I DID know what
the abbreviation meant. The "incremental" threw me off a bit. Thanks for
the link. I'd love to check it out, and will likely do so later today.

Thanks again for your help, as always. I'm endlessly grateful for the
helpfulness of the folk on this list.

t.

--

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tom Cloyd, MS MA, LMHC - Private practice Psychotherapist
Bellingham, Washington, U.S.A: (360) 920-1226
<< tc@tomcloyd.com >> (email)
<< TomCloyd.com >> (website)
<< sleightmind.wordpress.com >> (mental health weblog)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Simon Krahnke (Guest)
on 2009-01-27 23:05
(Received via mailing list)
* Mike Gold <mike.gold.4433@gmail.com> (2009-01-23) schrieb:

>     String.public_instance_methods(false).grep(/!/).map(&:to_sym) +

Is that a new feature? make the to_sym method act as block. Which to_sym
method?

>   DESTRUCTIVE_METHODS.each { |m|
>     define_method(m) { |*args, &block|
>       __setobj__(__getobj__.dup)
>       __getobj__.send(m, *args, &block)

WTF? What's this __[gs]etobj__ about?

> (I've used some 1.8.7+ only features.)

mfg,                   simon .... tia
Sebastian Hungerecker (Guest)
on 2009-01-28 00:30
(Received via mailing list)
Simon Krahnke wrote:
> * Mike Gold <mike.gold.4433@gmail.com> (2009-01-23) schrieb:
> >     String.public_instance_methods(false).grep(/!/).map(&:to_sym) +
>
> Is that a new feature?

Symbol#to_proc has been part of Ruby core since 1.8.7. Before that it
was defined by ActiveSupport and, I think, facets.

> make the to_sym method act as block. Which to_sym method?

The to_sym method of each item in the array. Symbol#to_proc is defined
somewhat like:
lambda {|x, *args| x.send(self, *args)}
So foo.map(&:to_sym) behaves like foo.map {|x| x.send(:to_sym)}
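
To make that equivalence concrete, here is a small sketch. The sample
array is made up for illustration; any objects that respond to to_sym
would do.

```ruby
# Hypothetical sample data, chosen only for illustration.
names = ["upcase!", "chomp!", "strip!"]

# The & prefix calls Symbol#to_proc and passes the result as the block.
with_symbol = names.map(&:to_sym)

# The explicit block that the shorthand behaves like.
with_block = names.map { |x| x.send(:to_sym) }

# Symbol#to_proc can also be called directly to get a proc object.
to_sym_proc = :to_sym.to_proc
with_proc = names.map { |x| to_sym_proc.call(x) }

puts with_symbol.inspect
puts(with_symbol == with_block && with_block == with_proc)
```

All three forms send :to_sym to each element, so the three arrays
compare equal.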


> >   DESTRUCTIVE_METHODS.each { |m|
> >     define_method(m) { |*args, &block|
> >       __setobj__(__getobj__.dup)
> >       __getobj__.send(m, *args, &block)
>
> WTF? What's this __[gs]etobj__ about?

Those are methods defined by DelegateClass(). They are used to get and
set the object that is delegated to. In this case they cause the
proxy-object to delegate to a copy of the original string instead of
the original string itself when a destructive method is called.
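
For anyone else who has not met DelegateClass, here is a minimal sketch
of how such a proxy might be put together. It is a reconstruction from
the quoted fragments, not Mike's exact code: the extra non-bang methods
in DESTRUCTIVE_METHODS and the @copied flag (so the dup happens only
once) are my own assumptions.

```ruby
require 'delegate'

# Sketch of a copy-on-write proxy: reads go to the original string,
# and the first destructive call swaps in a private copy.
class CopyOnWriteString < DelegateClass(String)
  # Bang methods as in the quoted code, plus a few common non-bang
  # mutators. This extra list is an assumption and is not exhaustive.
  DESTRUCTIVE_METHODS =
    String.public_instance_methods(false).grep(/!/).map(&:to_sym) +
    [:<<, :concat, :replace, :insert, :clear, :[]=]

  def initialize(str)
    super(str)       # Delegator#initialize stores str via __setobj__
    @copied = false  # assumption: remember whether we already dup'ed
  end

  DESTRUCTIVE_METHODS.each { |m|
    define_method(m) { |*args, &block|
      unless @copied
        __setobj__(__getobj__.dup)  # delegate to a copy from now on
        @copied = true
      end
      __getobj__.send(m, *args, &block)
    }
  }
end

original = "Alice"
proxy = CopyOnWriteString.new(original)

proxy.length        # non-destructive: forwarded to the original, no copy
proxy << " Smith"   # destructive: the proxy dups first, then appends

puts original       # prints "Alice"
puts proxy.to_s     # prints "Alice Smith"
```

Note that the destructive methods return whatever the underlying String
method returns, so proxy << "x" hands back the copied String rather
than the proxy itself.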

HTH,
Sebastian
Simon Krahnke (Guest)
on 2009-01-28 18:35
(Received via mailing list)
* Sebastian Hungerecker <sepp2k@googlemail.com> (00:27) schrieb:

> Symbol#to_proc has been part of Ruby core since 1.8.7. Before that it
> was defined by ActiveSupport and, I think, facets.

>> WTF? What's this __[gs]etobj__ about?
>
> Those are methods defined by DelegateClass(). [...]

Thanks, I never saw DelegateClass before, now I understand.

mfg,               simon .... l