Association Field Find Caching Technique

Hi everyone,

I often search my association collections for specific field values.
I could call foo.bars.find_all_by_fieldname(fieldvalue) each time, but
why hit the database again when the full collection is already in
memory? I wrote this module to extend associations so that they search
the already-loaded collection for field/value matches.

module MyAssociationExtentions

  def field_find(field, value, opts = {})
    # Throw away every cached lookup when a reload is requested.
    @field_cache = nil if opts['reload']
    ((@field_cache ||= {field => {}})[field] ||= {})[value] ||=
      self.is_a?(Enumerable) ? (self.select { |task| task.send(field) == value }) : (self if self.send(field) == value)
  end

end

Here it is again, split across multiple lines so it is easier to read
on the forum:

module MyAssociationExtentions
  def field_find(field, value, opts = {})
    # Throw away every cached lookup when a reload is requested.
    @field_cache = nil if opts['reload']
    @field_cache = {} unless @field_cache
    # Start each field with an empty hash. Seeding it with value => []
    # would make the ||= below keep the (truthy) empty array.
    @field_cache[field] = {} unless @field_cache[field]
    @field_cache[field][value] ||=
      self.is_a?(Enumerable) ? (self.select { |task| task.send(field) == value }) : (self if self.send(field) == value)
  end
end

Questions:

  1. I use a multi-dimensional hash to store each potential field/value
    lookup. Is this too memory intensive?
  2. Does this even theoretically improve performance vs. the database,
    or is it a waste of time?
  3. Is there a better way to write that line (all those annoying checks
    to see if the hash is already there)?
  4. Could I push this into memcache to lower the memory usage by
    sharing it across mongrels?
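
On question 3, one possible way to drop the repeated "is the hash
there yet" checks is a Hash with a default block. This is only a
minimal sketch under assumptions: it runs against a plain Array of
Struct objects rather than a real association, and FieldFindCache and
Bar are hypothetical names that are not part of the module above.

```ruby
# Hypothetical variant of field_find; works on any Enumerable.
module FieldFindCache
  def field_find(field, value, opts = {})
    # Drop all memoized results when the caller asks for a reload.
    @field_cache = nil if opts['reload']
    # The default block creates the per-field hash on first access.
    @field_cache ||= Hash.new { |cache, f| cache[f] = {} }
    by_value = @field_cache[field]
    # fetch only runs the block on a cache miss, so an empty result
    # is stored and reused as well.
    by_value.fetch(value) do
      by_value[value] = select { |record| record.send(field) == value }
    end
  end
end

Bar = Struct.new(:name, :some_field) # hypothetical record type

bars = [Bar.new("a", 1), Bar.new("b", 2), Bar.new("c", 1)]
bars.extend(FieldFindCache)

first = bars.field_find(:some_field, 1) # scans the array
again = bars.field_find(:some_field, 1) # served from the cache
```

The second call returns the very same Array object as the first, which
is an easy way to confirm the lookup was cached.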

Thanks

Steve,

Have you tried benchmarking your solution? That should answer the
question of whether it has any performance gains. Generally speaking,
anything stored in physical memory is accessible much faster than
anything that requires an I/O operation.

Also, in your solution, when do you invalidate the @field_cache and
re-read the field values from the database?
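
To make the invalidation concern concrete, here is a rough sketch of
how a cached lookup goes stale when the collection changes, and of an
explicit way to clear it. It assumes a plain Array of Struct objects
instead of a real association; GroupedFieldFind, clear_field_cache and
Bar are hypothetical names used only for illustration, and group_by is
swapped in for the select used in the original module.

```ruby
# Hypothetical index-based lookup with an explicit invalidation hook.
module GroupedFieldFind
  def field_find(field, value)
    @field_index ||= {}
    # group_by indexes every value of the field in a single pass.
    @field_index[field] ||= group_by { |record| record.send(field) }
    @field_index[field].fetch(value, [])
  end

  # Call this whenever the underlying collection changes.
  def clear_field_cache
    @field_index = nil
  end
end

Bar = Struct.new(:name, :some_field) # hypothetical record type

bars = [Bar.new("a", 1), Bar.new("b", 2)]
bars.extend(GroupedFieldFind)

before = bars.field_find(:some_field, 1)
bars << Bar.new("c", 1)                 # collection changes
stale = bars.field_find(:some_field, 1) # still the old answer
bars.clear_field_cache
fresh = bars.field_find(:some_field, 1) # index rebuilt, sees "c"
```

Without the clear_field_cache call, the new record never shows up in
the lookup, which is exactly the hole the question points at.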

Regards,
-daya

On Apr 21, 12:14 pm, Steve M. wrote:

Hi Daya,

I don't know how I missed require 'benchmark', but I did some testing
with it, and the in-memory lookup is much faster for finding by field.

The first time it runs, it performs about the same as a find_by because
the collection has not been loaded yet. Once the collection is in
memory, it is lightning fast. I have added a reload flag that skips the
@field_cache in case the data is changing.

Once the collection is loaded, finders should not hit the DB anymore;
the queries are too expensive. Let me know if you see any holes in
this. -Steve

Here are some benchmarks

setup
foo = Foo.find(:first)

Hitting the DB on each lookup

Benchmark.bm { |x| x.report { 5000.times { foo.bars.find_all_by_some_field(1) } } }
      user     system      total        real
 14.260000   3.030000  17.290000 ( 18.280076)

Without Field Caching

Benchmark.bm { |x| x.report { 5000.times { foo.bars.field_finder("some_field", 1, true) } } }
      user     system      total        real
  0.210000   0.070000   0.280000 (  0.269146)

With Field Caching

Benchmark.bm { |x| x.report { 5000.times { foo.bars.field_finder("some_field", 1, false) } } }
      user     system      total        real
  0.110000   0.040000   0.150000 (  0.155943)
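
For anyone who wants to try the in-memory half of this comparison
without a database, something along these lines should work. It is only
a sketch under assumptions: Bar is a hypothetical Struct standing in
for the model, the collection is a plain Array, and the cache is a bare
Hash rather than the module above, so the timings will not match the
numbers posted here.

```ruby
require 'benchmark'

Bar = Struct.new(:id, :some_field) # hypothetical record type

# 1000 in-memory records; some_field cycles through 0..9.
bars = Array.new(1_000) { |i| Bar.new(i, i % 10) }

cache = {}
uncached = nil
cached = nil

Benchmark.bm(10) do |x|
  # Scan the whole collection on every lookup.
  x.report("select") do
    5000.times { uncached = bars.select { |bar| bar.some_field == 1 } }
  end
  # Scan once, then reuse the stored result.
  x.report("memoized") do
    5000.times do
      cached = (cache[1] ||= bars.select { |bar| bar.some_field == 1 })
    end
  end
end
```

Both reports return the same records; only the amount of repeated
scanning differs.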