Adventures in Optimization... or why CONST frozen is Good


#1

…or when a language design level optimization is a pessimization.

Ruby allows destructive string operations: the String instance methods
with a “Bang!” at the end, such as sub!.

Consider this code.

a = ['froot']
b = a.first
c = {"d" => b}

Now a[0], b, c["d"] refer to exactly the same string instance.

a[0].object_id
=> -605300798
b.object_id
=> -605300798
c["d"].object_id
=> -605300798

So if I do a destructive operation on any of them, all are clobbered.

a.last.sub!(/oo/, "ui")
=> "fruit"
irb(main):009:0> b
=> "fruit"
irb(main):010:0> c
=> {"d"=>"fruit"}

Traditionally, destructive ops have been allowed in languages such as
Lisp as an optimization: you don’t have to “new” a new object
instance if you don’t want to.
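
To make the trade-off concrete, here is a minimal sketch (the variable names are made up for illustration) contrasting sub, which allocates a new String, with sub!, which rewrites the receiver in place and is therefore visible through every alias.

```ruby
original = String.new('froot')   # String.new guarantees a mutable string
alias_ref = original             # second reference to the same object

# Non-destructive: sub returns a new String and leaves the receiver alone.
copy = original.sub(/oo/, 'ui')
puts copy                        # fruit
puts original                    # froot
puts copy.equal?(original)       # false

# Destructive: sub! edits the receiver, so the alias sees the change too.
original.sub!(/oo/, 'ui')
puts alias_ref                   # fruit
puts alias_ref.equal?(original)  # true
```

The destructive form saves one allocation, at the price that every holder of the reference observes the edit.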

The other day I was optimizing my code, when I decided to hunt
unnecessary object allocation.

I used my MemoryProfiler snippet to find that Strings were by far the
most common objects I was generating.

http://rubyforge.org/snippet/detail.php?type=snippet&id=70

So I extended that to find which was the most common string I was
generating.

def MemoryProfile::string_duplicates
   Dir.chdir "/tmp"
   ObjectSpace::garbage_collect
   sleep 10 # Give the GC thread a chance

   # Count how many live String instances share each value
   tally = Hash.new(0)
   ObjectSpace.each_object do |obj|
      next if obj.class != String
      tally[obj] += 1
   end

   open( LOG_FILE, 'a') do |outf|
      outf.puts '='*70
      outf.puts "\n\nString Duplicates report for #{$0}\n\n"
      # Report only values held by more than one instance, rarest first
      tally.keys.find_all{|s| tally[s] > 1}.sort_by{|s| tally[s]}.each do |s|
         outf.puts "#{s}\t#{tally[s]}"
      end
   end
end

The answer, by a long shot, was “U”.

Somewhere in my code I had the line
symbols_needed[symbol_name] = 'U'

I could replace that with the symbol :U, but other places that had
Good Reasons for using strings would break.

Now I have a class CONSTANT…
UNDEFINED = 'U'.freeze

and
symbols_needed[symbol_name] = UNDEFINED

Of course, if anywhere I apply a destructive op to one of those
thousands of references, my code will die.

But at least the “freeze” will cause a loud and messy death, not a
subtle and hidden bug.
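
A minimal sketch of that loud death. Which exception class is raised depends on the Ruby version (recent versions raise FrozenError, older ones a RuntimeError or TypeError), so the sketch rescues StandardError. The hash keys are invented for illustration.

```ruby
UNDEFINED = 'U'.freeze

symbols_needed = {}
%w[foo bar baz].each { |name| symbols_needed[name] = UNDEFINED }

# Every entry shares the single frozen instance.
puts symbols_needed.values.all? { |v| v.equal?(UNDEFINED) }  # true

error = nil
begin
  symbols_needed['foo'] << 'X'   # destructive append on the shared string
rescue StandardError => e
  error = e
  puts "#{e.class}: #{e.message}"
end

puts UNDEFINED                   # U (the shared value is untouched)
```

The failure happens at the line that attempts the mutation, which is far easier to track down than a value that silently changed under thousands of references.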

So as I said at the start, the optimization to allow the occasional
destructive op to a string… can be a pessimization in every case where
you assign a string literal.

irb(main):001:0> a = "froot"
=> "froot"
irb(main):002:0> a.object_id
=> -605331808
irb(main):003:0> a = "froot"
=> "froot"
irb(main):004:0> a.object_id
=> -605352198
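
The same effect can be measured outside irb. The sketch below assumes default interpreter settings (with a frozen_string_literal magic comment or the equivalent command-line switch, literals are shared and the first count changes). The method and constant names are hypothetical.

```ruby
SHARED = 'U'.freeze

# Returns how many distinct objects the block produces over n calls.
# The array keeps every result alive, so object ids cannot be recycled.
def distinct_objects(n)
  Array.new(n) { yield }.map(&:object_id).uniq.size
end

literal_count  = distinct_objects(1000) { 'U' }     # literal evaluated on each call
constant_count = distinct_objects(1000) { SHARED }  # one object, many references

puts "literal:  #{literal_count} distinct object(s)"
puts "constant: #{constant_count} distinct object(s)"
```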

John C. Phone : (64)(3) 358 6639
Tait Electronics Fax : (64)(3) 359 4632
PO Box 1645 Christchurch Email : removed_email_address@domain.invalid
New Zealand


#2

On Dec 1, 2008, at 4:53 PM, John C. wrote:


thanks john. interesting.

a @ http://codeforpeople.com/


#3

Now I have a class CONSTANT…
UNDEFINED = 'U'.freeze

and
symbols_needed[symbol_name] = UNDEFINED

Yes, that’s the “right” solution with Ruby today, and you’ll see this
done in a lot of Ruby libraries. (Perhaps it would be nice if there were
some syntax to define an inline frozen string literal.)
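
For readers on newer interpreters: Ruby versions released after this thread did gain such syntax. A sketch, assuming Ruby 2.3 or later, where the unary minus operator (String#-@) returns a frozen string. The same versions also accept a "# frozen_string_literal: true" magic comment on the first line of a file, which freezes every literal in that file. The hash and its keys are made up for illustration.

```ruby
symbols_needed = {}

# -'U' yields a frozen string without needing a named constant.
symbols_needed[:foo] = -'U'
symbols_needed[:bar] = -'U'

puts symbols_needed[:foo].frozen?   # true
puts symbols_needed[:bar].frozen?   # true
```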

I don’t consider this any sort of “optimisation” though. It’s
fundamental to the nature of Ruby that there is only one kind of value,
which is a reference to an object. An assignment always copies only the
reference.

This is a breath of fresh air when compared to, say, Perl. Is this value
a scalar? Is it a scalar number or string, or a reference to an Array or
a Hash, or a typeglob, or a filehandle, or …?

However, you could argue that string literals should have been immutable
(like Symbol). The language would end up being somewhat different to
use:

a = "hello"           # maybe Symbol or StringLiteral
b = String.new(a)     # mutable String
b << " world"

You’d also have to have a load of rules to work out. Should a.dup return
the same Symbol, or a new mutable String? Should (a + "world") return a
new Symbol, or a new mutable String?

From this point of view, just having String keeps things simple, even if
it does end up creating a load of garbage objects. In those cases where
this matters, your approach (of profiling and zapping) is a good one.


#4

On 2 December at 16:33, Robert K. wrote:

In other words, there is a hidden dup going on if the Hash key is a
String which is not frozen.

For some values of hidden:

16:41 grappa:~> qri 'Hash#[]='
--------------------------------------------------------------- Hash#[]=
hsh[key] = value => value
hsh.store(key, value) => value

 Element Assignment---Associates the value given by value with the
 key given by key. key should not have its value changed while it
 is in use as a key (a String passed as a key will be duplicated
 and frozen).
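
A short sketch of what that documentation describes (variable names are illustrative): the key stored in the Hash is a frozen copy, while the caller's string stays mutable.

```ruby
key = String.new('foo')     # mutable string
h = {}
h[key] = 1
stored = h.keys.first

puts stored.equal?(key)     # false
puts stored.frozen?         # true
puts key.frozen?            # false

# Mutating the original afterwards does not disturb the lookup.
key << 'bar'
puts h['foo']               # 1
puts h.key?('foobar')       # false
```

Freezing the string before using it as a key avoids the extra copy, since the Hash can then store the caller's object directly.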

Fred


#5

2008/12/2 Brian C. removed_email_address@domain.invalid:

I don’t consider this any sort of “optimisation” though. It’s
fundamental to the nature of Ruby that there is only one kind of value,
which is a reference to an object. An assignment always copies only the
reference.

Completely agree, this comes as no surprise. Actually, this is an
obvious design decision, if you want to use the same value to denote a
particular state then just use one object.

Another, probably more subtle issue is this:

irb(main):001:0> s="foo"
=> "foo"
irb(main):002:0> h={s=>1}
=> {"foo"=>1}
irb(main):003:0> s.equal? h.keys.first
=> false
irb(main):004:0> [s.object_id, h.keys.first.object_id]
=> [1073539250, 1073539280]
irb(main):005:0> s.freeze
=> "foo"
irb(main):006:0> h={s=>1}
=> {"foo"=>1}
irb(main):007:0> s.equal? h.keys.first
=> true
irb(main):008:0> [s.object_id, h.keys.first.object_id]
=> [1073539250, 1073539250]

In other words, there is a hidden dup going on if the Hash key is a
String which is not frozen.

Kind regards

robert