Some oddball optimization tricks

Here are two idioms which you may find useful, or at least curious, when
trying to optimize the heck out of a Ruby script…

require 'pp'

# Similar to the "autosequence" facility in SQL.
# Useful for replacing a complex key with a very simple POD proxy
# object.
file_sequence = 0
file_index = Hash.new{|hash,key| file_sequence+=1;hash[key] = file_sequence}

p file_index["foo"]
p file_index["foo"]
p file_index["bah"]
p file_index["foo"]
p file_index["bah"]
pp file_index

# Similar to "to_sym", but can cope with spaces and weird characters…
# So if tom1, tom2, tom3… go out of scope they can be garbage
# collected…
# so if you end up just holding tomtom, you have only one copy of "tom"
# and you can test for equality with .object_id == .object_id!

one_true_string = Hash.new{|hash,string| hash[string] = string}

tom1 = "tom"
tom2 = "tom"
tom3 = "#{tom1}"
tom4 = tom2.clone
tom5 = "t"+"o"+"m"

toms = [tom1, tom2, tom3, tom4, tom5]

pp toms
pp toms.collect{|tom| tom.object_id }
tomtom = toms.collect{|tom| one_true_string[tom] }
pp tomtom
pp tomtom.collect{|tom| tom.object_id }

Running the above results in…
1
1
2
1
2
{"foo"=>1, "bah"=>2}
["tom", "tom", "tom", "tom", "tom"]
[-609503848, -609503908, -609503888, -609503918, -609503978]
["tom", "tom", "tom", "tom", "tom"]
[-609503848, -609503848, -609503848, -609503848, -609503848]
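
To illustrate the "complex key" use of the first idiom, here is a minimal sketch. The helper name make_sequence_index and the sample array keys are hypothetical, chosen only for this example; the counter is kept in a local variable captured by the default block, as in the script above.

```ruby
# Hypothetical helper: builds a hash that hands out the next integer
# the first time a key is seen, and the same integer on later lookups.
def make_sequence_index
  sequence = 0
  Hash.new { |hash, key| hash[key] = (sequence += 1) }
end

file_index = make_sequence_index

# Arrays stand in for "complex" keys here; any object with sensible
# hash and eql? methods should work the same way.
a = file_index[["src/main.rb", 1024]]
b = file_index[["src/util.rb", 2048]]
c = file_index[["src/main.rb", 1024]]

p [a, b, c]   # prints [1, 2, 1]
```

The small integers can then be stored or compared in place of the original keys, which is the point of the proxy.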


John C.
Tait Electronics
PO Box 1645 Christchurch
New Zealand
Phone : (64)(3) 358 6639
Fax : (64)(3) 359 4632

On Wednesday, August 25, 2010 06:51:58 pm John C. wrote:

> p file_index["foo"]
> p file_index["bah"]
> pp file_index

I guess I can think of a few rare instances this might be useful, but
I've never had the keys be the bottleneck, or re-used keys enough in a
script for this to matter. Still, interesting.

> Similar to "to_sym", but can cope with spaces and weird characters…

…what?

irb> 'foo bar'.to_sym
=> :"foo bar"

irb> "a@$bc☃!d\t".to_sym
=> :"a@$bc☃!d\t"

Worst thing that happens is 1.8 turns my beautiful Unicode into ugly hex
escapes when pretty-printing, since it's all just a binary string to
1.8. But if I print it straight out with puts, I get the original
Unicode stuff back, and 1.9.1 handles this gracefully.

I mean, I've got a friggin' SNOWMAN in there. Just what characters have
you discovered that you can't make a symbol out of?

And, as it suggests, you can use :"foo" or :'foo' as shorthand for the
above.
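
A short sketch of the point about symbols, using only core String#to_sym and Object#equal?. The sample strings are arbitrary.

```ruby
# Symbols can hold spaces and punctuation; quoting is only needed
# when writing them as literals.
spaced = "foo bar".to_sym
p spaced                      # prints :"foo bar"

# Two symbols with the same name are the same object.
p spaced.equal?(:"foo bar")   # prints true
p "tom".to_sym.equal?(("t" + "o" + "m").to_sym)   # prints true
```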

> So if tom1, tom2, tom3… go out of scope they can be garbage
> collected…

Same is true of any string you call to_sym on.

> you can test for equality with .object_id == .object_id!

I don't know if String is smart enough to do that (I'd hope so), but
Symbol certainly would be.

So what does this solve over just using symbols, other than the fact
that you could manually prune the one_true_string hash?

2010/8/26 John C.:

> p file_index["foo"]
> p file_index["foo"]
> p file_index["bah"]
> p file_index["foo"]
> p file_index["bah"]
> pp file_index

I prefer

irb(main):001:0> file_index = Hash.new {|h,k| h[k] = h.size}
=> {}
irb(main):002:0> file_index["foo"]
=> 0
irb(main):003:0> file_index["foo"]
=> 0
irb(main):004:0> file_index["bah"]
=> 1
irb(main):005:0> file_index["foo"]
=> 0
irb(main):006:0> file_index["bah"]
=> 1
irb(main):007:0> file_index
=> {"foo"=>0, "bah"=>1}
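
One design difference between the two variants is worth a sketch: the size-based version is zero-based and derives each id from the current number of entries, so ids can repeat if keys are ever deleted, whereas a separate counter never reuses a value. The helper names make_size_index and make_counter_index are hypothetical, introduced only for this comparison.

```ruby
# Hypothetical helpers wrapping the two variants from the thread.
def make_size_index
  Hash.new { |h, k| h[k] = h.size }
end

def make_counter_index
  counter = -1
  Hash.new { |h, k| h[k] = (counter += 1) }
end

by_size = make_size_index
by_size["foo"]            # 0
by_size["bah"]            # 1
by_size.delete("foo")
p by_size["baz"]          # prints 1, the id "bah" already holds

by_counter = make_counter_index
by_counter["foo"]         # 0
by_counter["bah"]         # 1
by_counter.delete("foo")
p by_counter["baz"]       # prints 2, no collision
```

If the index only ever grows, the two behave the same apart from the starting value.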

> Similar to "to_sym", but can cope with spaces and weird characters…
> So if tom1, tom2, tom3… go out of scope they can be garbage collected…
> so if you end up just holding tomtom, you have only one copy of "tom" and
> you can test for equality with .object_id == .object_id!

We have #equal? for that.
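
A minimal illustration of the difference between content equality and object identity. The strings are built with + so that each is a distinct object regardless of how the interpreter treats literals.

```ruby
a = "to" + "m"
b = "t" + "om"

p a == b                        # prints true: same characters
p a.equal?(b)                   # prints false: two separate objects
p a.equal?(a)                   # prints true
p a.object_id == b.object_id    # prints false: the long-hand version of equal?
```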

> one_true_string = Hash.new{|hash,string| hash[string] = string}

Note that this wastes a bit of memory, because Hash#[]= duplicates and
freezes an unfrozen String key, so the stored key and the stored value
end up as two separate objects:

irb(main):009:0> one_true_string = Hash.new{|hash,string| hash[string] = string}
=> {}
irb(main):010:0> one_true_string["a"]
=> "a"
irb(main):011:0> one_true_string.each {|k,v| puts k.equal? v}
false
=> {"a"=>"a"}

Better freeze the string:

irb(main):012:0> one_true_string = Hash.new{|hash,string| hash[string.freeze] = string}
=> {}
irb(main):013:0> one_true_string["a"]
=> "a"
irb(main):014:0> one_true_string.each {|k,v| puts k.equal? v}
true
=> {"a"=>"a"}
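
One thing to be aware of with string.freeze is that it freezes the caller's own string as a side effect. A possible variation, sketched here under the assumption that handing back a frozen copy is acceptable, is to freeze a duplicate and use it as both key and value. The helper name make_interner is hypothetical.

```ruby
# Hypothetical helper: interns strings without freezing the caller's object.
def make_interner
  Hash.new do |hash, string|
    canonical = string.dup.freeze   # frozen copy, so Hash#[]= stores it as is
    hash[canonical] = canonical
  end
end

interner  = make_interner
original  = "to" + "m"
canonical = interner[original]

p original.frozen?                          # prints false
p canonical.frozen?                         # prints true
p interner["t" + "om"].equal?(canonical)    # prints true
p interner.all? { |k, v| k.equal?(v) }      # prints true
```

The trade-off is that every string handed back is frozen, so callers that want to modify one need to dup it first.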

Cheers

robert