Forum: Ruby Adventures in Optimization... or why CONST frozen is Good

John C. (Guest)
on 2008-12-02 02:00
(Received via mailing list)
...or when a language design level optimization is a pessimization.

Ruby allows destructive string operations. String instance methods
with a "Bang!" at the end.

Consider this code.

a = ['froot']
b=a.first
c = {"d"=>b}

Now a[0], b, and c["d"] refer to _exactly_ the same string instance.

> a[0].object_id
=> -605300798
> b.object_id
=> -605300798
> c["d"].object_id
=> -605300798

So if I do a destructive operation on any of them, all are clobbered.

a.last.sub!(/oo/,"ui")
=> "fruit"
irb(main):009:0> b
=> "fruit"
irb(main):010:0> c
=> {"d"=>"fruit"}
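The same aliasing as a standalone script, as a minimal sketch. The variable names follow the session above; the `dup` step and the `same`, `copy` and `untouched` names are additions for illustration, not part of the original session. It shows that a destructive op on a copy leaves the shared instance alone, while a destructive op through any one reference is visible through all of them.

```ruby
# Three references to one String instance, as in the irb session above.
# String.new is used so the string is mutable even where literals are frozen.
a = [String.new("froot")]
b = a.first
c = { "d" => b }

# equal? tests object identity, not just equal contents.
same = a[0].equal?(b) && b.equal?(c["d"])

# Mutating a copy does not touch the shared instance...
copy = b.dup
copy.sub!(/oo/, "ui")
untouched = b.dup   # snapshot of the shared value at this point

# ...while mutating through one reference changes what all of them see.
a.last.sub!(/oo/, "ui")

puts same        # true
puts copy        # fruit
puts untouched   # froot
puts b           # fruit
puts c["d"]      # fruit
```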

Traditionally destructive ops have been allowed in languages such as
Lisp etc. as an optimization. You don't have to "new" a new object
instance if you don't want to.

The other day I was optimizing my code, when I decided to hunt
unnecessary object allocation.

I used my MemoryProfiler snippet to find that Strings were by far the
most common object I was generating.

http://rubyforge.org/snippet/detail.php?type=snipp...

So I extended that to find _which_ was the most common string I was
generating.

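    def MemoryProfile::string_duplicates
       Dir.chdir "/tmp"
       ObjectSpace::garbage_collect
       sleep 10 # Give the GC thread a chance

       # Count how many live String instances share each value.
       tally = Hash.new(0)
       ObjectSpace.each_object do |obj|
          next if obj.class != String
          tally[obj]+=1
       end

       # LOG_FILE is not defined in this snippet; presumably it comes
       # from the MemoryProfiler snippet linked above.
       open( LOG_FILE, 'a') do |outf|
          outf.puts '='*70
          outf.puts "\nString Duplicates report for #{$0}\n\n\n"
          # Report only values that occur more than once, rarest first.
          tally.keys.find_all{|s| tally[s] > 1}.sort_by{|s| tally[s]}.each do |s|
             outf.puts "#{s}\t#{tally[s]}"
          end
       end
    end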


The answer, by a long shot, was "U".

Somewhere in my code I had the line
   symbols_needed[symbol_name] = 'U'

I could replace that with the symbol :U, but other places that had
Good Reasons for using strings would break.

Now I have a class CONSTANT...
   UNDEFINED = 'U'.freeze

and
   symbols_needed[symbol_name] = UNDEFINED

Of course, if anywhere I apply a destructive op to one of those
thousands of references, my code will die.

But at least the "freeze" will cause a loud and messy death, not a
subtle and hidden bug.
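A minimal sketch of that "loud death", assuming a plain Hash for symbols_needed and made-up symbol names (foo, bar, baz are hypothetical). The exception class raised on a frozen string depends on the Ruby version, so the sketch rescues StandardError rather than naming one.

```ruby
UNDEFINED = 'U'.freeze   # one shared, immutable instance

symbols_needed = {}
%w[foo bar baz].each { |name| symbols_needed[name] = UNDEFINED }   # hypothetical names

# Every value is the very same object; assigning it allocates no new Strings.
all_shared = symbols_needed.values.all? { |v| v.equal?(UNDEFINED) }

# A destructive op through any of the references fails immediately
# instead of silently changing every entry.
error = begin
  symbols_needed["foo"] << "x"
  nil
rescue StandardError => e   # FrozenError on recent Rubies, RuntimeError or TypeError on older ones
  e
end

puts all_shared       # true
puts error.class      # depends on the Ruby version
puts UNDEFINED        # U
```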

So as I said at the start, the optimization to allow the occasional
destructive op to a string... can be a pessimization in every case where
you assign a string literal.

a= "froot"
=> "froot"
irb(main):002:0> a.object_id
=> -605331808
irb(main):003:0> a= "froot"
=> "froot"
irb(main):004:0> a.object_id
=> -605352198
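The same effect at scale, as a rough sketch (the names literals and constants are made up for illustration). Each evaluation of a string literal normally allocates a fresh String, while a constant always yields the one instance. If frozen string literals are enabled, for example through the frozen_string_literal magic comment in later Ruby versions, the literal count drops to 1 as well.

```ruby
UNDEFINED = 'U'.freeze

# The literal in the block is evaluated once per iteration; the results are
# kept in an Array so the objects stay alive while they are counted.
literals  = Array.new(1000) { 'U' }
# The constant is looked up each time but always yields the same object.
constants = Array.new(1000) { UNDEFINED }

literal_instances  = literals.map { |s| s.object_id }.uniq.size
constant_instances = constants.map { |s| s.object_id }.uniq.size

puts literal_instances    # normally 1000; 1 if frozen string literals are enabled
puts constant_instances   # 1
```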


John C.                             Phone : (64)(3) 358 6639
Tait Electronics                        Fax   : (64)(3) 359 4632
PO Box 1645 Christchurch                Email : removed_email_address@domain.invalid
New Zealand
Ara H. (Guest)
on 2008-12-02 02:03
(Received via mailing list)
On Dec 1, 2008, at 4:53 PM, John C. wrote:

> [...]

thanks john.   interesting.

a @ http://codeforpeople.com/
Brian C. (Guest)
on 2008-12-02 15:57
> Now I have a class CONSTANT...
>    UNDEFINED = 'U'.freeze
>
> and
>    symbols_needed[symbol_name] = UNDEFINED

Yes, that's the "right" solution with Ruby today, and you'll see this
done in a lot of Ruby libraries. (Perhaps it would be nice if there were
some syntax to define an inline frozen string literal.)

I don't consider this any sort of "optimisation" though. It's
fundamental to the nature of Ruby that there is only one kind of value,
which is a reference to an object. An assignment always copies only the
reference.

This is a breath of fresh air when compared to, say, Perl. Is this value
a scalar? Is it a scalar number or string, or a reference to an Array or
a Hash, or a typeglob, or a filehandle, or ...?

However, you could argue that string literals should have been immutable
(like Symbol). The language would end up being somewhat different to
use:

    a = "hello"           # maybe Symbol or StringLiteral
    b = String.new(a)     # mutable String
    b << " world"

You'd also have to have a load of rules to work out. Should a.dup return
the same Symbol, or a new mutable String? Should (a + "world") return a
new Symbol, or a new mutable String?

From this point of view, just having String keeps things simple, even if
it does end up creating a load of garbage objects. In those cases where
this matters, your approach (of profiling and zapping) is a good one.
Robert K. (Guest)
on 2008-12-02 17:39
(Received via mailing list)
2008/12/2 Brian C. <removed_email_address@domain.invalid>:
> I don't consider this any sort of "optimisation" though. It's
> fundamental to the nature of Ruby that there is only one kind of value,
> which is a reference to an object. An assignment always copies only the
> reference.

Completely agree, this comes as no surprise. Actually, this is an
obvious design decision, if you want to use the same value to denote a
particular state then just use one object.

Another, probably more subtle issue is this:

irb(main):001:0> s="foo"
=> "foo"
irb(main):002:0> h={s=>1}
=> {"foo"=>1}
irb(main):003:0> s.equal? h.keys.first
=> false
irb(main):004:0> [s.object_id, h.keys.first.object_id]
=> [1073539250, 1073539280]
irb(main):005:0> s.freeze
=> "foo"
irb(main):006:0> h={s=>1}
=> {"foo"=>1}
irb(main):007:0> s.equal? h.keys.first
=> true
irb(main):008:0> [s.object_id, h.keys.first.object_id]
=> [1073539250, 1073539250]

In other words, there is a hidden dup going on if the Hash key is a
String which is not frozen.

Kind regards

robert
F. Senault (Guest)
on 2008-12-02 17:45
(Received via mailing list)
On 2 December at 16:33, Robert K. wrote:

> In other words, there is a hidden dup going on if the Hash key is a
> String which is not frozen.

For some values of "hidden":

16:41 grappa:~> qri 'Hash#[]='
--------------------------------------------------------------- Hash#[]=
     hsh[key] = value        => value
     hsh.store(key, value)   => value
------------------------------------------------------------------------
     Element Assignment---Associates the value given by value with the
     key given by key. key should not have its value changed while it
     is in use as a key (a String passed as a key will be duplicated
     and frozen).
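A small sketch of what that documented dup-and-freeze buys you (variable names are made up for illustration): mutating the original string after using it as a key cannot corrupt the Hash, because the Hash holds its own frozen copy.

```ruby
s = String.new("foo")   # an unfrozen String
h = {}
h[s] = 1                # Hash#[]= duplicates and freezes an unfrozen String key

stored_key = h.keys.first
copied = !stored_key.equal?(s)   # the Hash holds its own copy...
frozen = stored_key.frozen?      # ...and that copy is frozen

s << "bar"                       # mutate the original after insertion

puts copied       # true
puts frozen       # true
puts h["foo"]     # 1
puts h.key?(s)    # false (s is now "foobar")
```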

Fred