Forum: Ruby on Rails
How best to handle non-serializable session data?

Wes G. (Guest)
on 2006-04-13 00:13
I have a piece of data that needs to persist across requests but is not
serializable.  It's a Rubyful Soup parse tree; it's very expensive to
instantiate, and I need it for a while in my app.

By default, then, it can't be stored in the session, since the default
session storage mechanism is PStore, which has to marshal the session
data to disk.

One option I have is to change the session storage mechanism to
in-memory only.

Otherwise, if I want to use the regular session storage (pstore) and
still have this object available across requests, then it seems like I
would have to implement my own little in-memory cache.
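
A minimal sketch of what such a "little in-memory cache" could look like,
assuming a plain Hash that lives for the lifetime of the Ruby process and
is keyed by session id.  The names ParseTreeStore, fetch and clear are
made up for illustration, and the sketch is not thread-safe:

```ruby
# Hypothetical process-local store: a Hash held in a module constant, so
# its contents survive between requests handled by the same Ruby process.
module ParseTreeStore
  STORE = {}

  # Returns the object cached for this session, building it with the
  # given block only if this process has not built it yet.
  def self.fetch(session_id)
    STORE[session_id] ||= yield
  end

  # Drops the cached object for this session, if any.
  def self.clear(session_id)
    STORE.delete(session_id)
  end
end

builds = 0
# Object.new stands in for the expensive, non-serializable parse tree.
tree = ParseTreeStore.fetch("abc123") { builds += 1; Object.new }
same = ParseTreeStore.fetch("abc123") { builds += 1; Object.new }
```

The second fetch returns the very same object without running the block
again, and nothing is ever marshalled, so the object does not need to be
serializable.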

Has anyone got any better ideas than this?

Thanks,
Wes
Wes G. (Guest)
on 2008-01-18 20:43
Someone responded to me off list about this issue, so I thought I would
update this thread with what I ended up having to do.

You can't use a solution like memcached to store this type of data,
since it is not serializable.

The best solution that I could come up with was an in-memory cache.  The
problem with this, of course, is that Rails apps run in separate Ruby
processes, so implementing an in-memory cache immediately implies that
each process has its own cache (this is why Rails is into
"shared-nothing").  Obviously, if you want the state of these stored
objects to be represented consistently in the app, then you have to
figure out how to manage the caches, which may or may not be present
across all running Ruby processes.

If each Ruby process has its own cache, then there is a significant
probability that you may hit process A on one request, establish a cache
entry, then hit process B on another request, and have to regenerate the
same cache entry, etc.  So this approach may only make sense when the
number of requests that need a given cache entry is greater than the
number of Ruby processes on your back end.  Then, in the worst case, you
only do your expensive cache-entry generation N times (where N is the
number of Ruby processes) to service some X (where X > N) number of
requests.  In the best case, the same process gets hit on every request
and you reap the maximum benefit from one in-process cache (1 cache-entry
generation for X requests).
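
To make the arithmetic concrete, here is a small simulation of the worst
case.  It assumes requests are dispatched round-robin across the
processes, which a real dispatcher may not do, and count_generations is a
made-up helper name:

```ruby
# Each simulated process keeps its own Hash.  Requests are dispatched
# round-robin (an assumption for illustration only).
def count_generations(process_count, request_count)
  caches = Array.new(process_count) { {} }
  generations = 0

  request_count.times do |i|
    cache = caches[i % process_count]
    unless cache.key?(:parse_tree)
      # This process has not seen the entry yet, so it pays the cost.
      cache[:parse_tree] = :expensive_object
      generations += 1
    end
  end

  generations
end

worst_case   = count_generations(4, 20)  # N = 4 processes, X = 20 requests
too_few_hits = count_generations(4, 3)   # X < N: every request regenerates
best_case    = count_generations(1, 20)  # same process hit every time
```

With 4 processes and 20 requests the entry is generated 4 times; with
only 3 requests it is generated on every one of them, so the cache buys
nothing; with a single process it is generated once.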

Here's what I ended up with:

I created a custom Cache object, shown below:

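# A Hash that also records, per key, when the entry was last refreshed.
# The timestamp lives in the same Hash under "<key>_last_refresh_time".
class WesCache::Cache < Hash
  REFRESH_TIME_KEY_PART = "_last_refresh_time"

  # True when the local entry is older than the given (global) time.
  def needs_refreshing?(key, time_to_refresh)
    self.refresh_time(key) < time_to_refresh
  end

  # Last local refresh time for key; set to now on first access.
  def refresh_time(key)
    self.set_refresh_time(key, Time.now) unless self.has_key?("#{key}#{REFRESH_TIME_KEY_PART}")
    self["#{key}#{REFRESH_TIME_KEY_PART}"]
  end

  def set_refresh_time(key, refresh_time)
    self["#{key}#{REFRESH_TIME_KEY_PART}"] = refresh_time
  end

  # Removes both the entry and its refresh timestamp.
  def delete(key)
    super("#{key}#{REFRESH_TIME_KEY_PART}")
    super
  end
end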

The key values embed the session id somewhere, so each in-process
cache may be holding data related to any number of sessions.  The cache
also holds, alongside each entry (and so implicitly on a per-session
basis), a key that records the last time that local cache value was
refreshed.  In my case "refreshed" means deleted: my objects are
read-only, so they either exist or they don't, and "refreshing" doesn't
mean update, it means removal.
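
As an illustration of that key layout, here is a sketch using a plain
Hash.  The "session_id:name" format and the cache_key helper are
assumptions; the only thing fixed above is that the session id is
embedded in the key and that the timestamp key is the entry key plus
"_last_refresh_time":

```ruby
REFRESH_TIME_KEY_PART = "_last_refresh_time"

# Hypothetical key format that embeds the session id in the cache key.
def cache_key(session_id, name)
  "#{session_id}:#{name}"
end

cache = {}
key_a = cache_key("session-a", "parse_tree")
key_b = cache_key("session-b", "parse_tree")

# One process-local Hash holds entries for several sessions at once.
cache[key_a] = :tree_for_a
cache[key_b] = :tree_for_b

# The companion timestamp is stored in the same Hash under a derived key.
cache["#{key_a}#{REFRESH_TIME_KEY_PART}"] = Time.now

# "Refreshing" session A's entry means removing the value and its timestamp.
cache.delete(key_a)
cache.delete("#{key_a}#{REFRESH_TIME_KEY_PART}")
```

After the two deletes only session B's entry is left, so the next
request for session A that lands on this process has to regenerate it.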

Then there is a concept of "global last refresh time" which is managed
_globally_ for the application.  The unified value of the last refresh
time for a given _cache and key within it_ is stored in a memcache.  So,
to summarize, each Ruby process has its own "smart hash" cache that can
keep track of when the last refresh was done for _a given key_ for
itself.  Then, it can compare the local refresh time against the global
refresh time and know to remove its local entry (and thus cause it to be
regenerated by the caller).
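
The comparison can be sketched in isolation like this.  Plain Hashes
stand in for both the process-local cache and the memcache entry, and
fetch_entry with its "now:" parameter is a made-up helper, so treat it as
a simplified illustration rather than the code I ended up with:

```ruby
# local_cache / local_times belong to one process; global_times plays the
# role of the refresh-time Hash shared by all processes via memcache.
def fetch_entry(local_cache, local_times, global_times, key, now: Time.now)
  global_time = global_times[key]
  local_time  = local_times[key]

  # The local copy is stale if it was built before the last global refresh.
  if global_time && local_time && local_time < global_time
    local_cache.delete(key)
  end

  unless local_cache.key?(key)
    local_cache[key] = yield       # regenerate the expensive object
    local_times[key] = now
  end

  local_cache[key]
end

local, local_times, global_times = {}, {}, {}
builds = 0
t = Time.at(1_000)

fetch_entry(local, local_times, global_times, "k", now: t)     { builds += 1; :tree }
fetch_entry(local, local_times, global_times, "k", now: t + 1) { builds += 1; :tree }
builds_before_invalidation = builds

global_times["k"] = t + 2          # some other process invalidated the key
fetch_entry(local, local_times, global_times, "k", now: t + 3) { builds += 1; :tree }
fetch_entry(local, local_times, global_times, "k", now: t + 4) { builds += 1; :tree }
```

The entry is built once, reused until the global refresh time moves past
the local one, rebuilt once after that, and then reused again.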

The object that makes use of all of this is the CacheManager - in
retrospect, I might have moved all of this logic into the Cache object
itself.  I might refactor this in the future.

Here's the CacheManager:

require 'wes_cache/cache'

#The refresh times are stored in the _memcache_ cache which is referred to through "Cache".
#DO NOT CONFUSE our caches (which are WesCaches) with the memcache.
#The parse_tree_cache and the list_data_cache are WesCaches (local to the Ruby process).
#The parse_tree_refresh_times and the list_data_refresh_times are memcaches ("global" to all Ruby processes).
class CacheManager
  @@logger = RAILS_DEFAULT_LOGGER

  #Local process based caches (effectively hashes)
  @@parse_tree_cache = WesCache::Cache.new
  @@list_data_cache = WesCache::Cache.new

  #"Global" memcaches (for access by any process)
  Cache.put("parse_tree_refresh_times", Hash.new) if Cache.get("parse_tree_refresh_times").nil?
  Cache.put("list_data_refresh_times", Hash.new) if Cache.get("list_data_refresh_times").nil?

  def self.parse_tree_cache(key)
    get_cache("parse_tree_refresh_times", @@parse_tree_cache, key)
  end

  def self.remove_from_parse_tree_cache(key)
    remove_from_cache("parse_tree_refresh_times", @@parse_tree_cache, key)
  end

  def self.list_data_cache(key)
    get_cache("list_data_refresh_times", @@list_data_cache, key)
  end

  def self.remove_from_list_data_cache(key)
    remove_from_cache("list_data_refresh_times", @@list_data_cache, key)
  end

  def self.remove_from_cache(refresh_times_cache_name, cache, key)
    #Record the removal globally so that other processes drop their copies too
    refresh_times_cache = Cache.get(refresh_times_cache_name)
    refresh_times_cache[key] = Time.now
    Cache.put(refresh_times_cache_name, refresh_times_cache)
    cache.delete(key)
  end

  def self.get_cache(refresh_times_cache_name, cache, key)
    refresh_times_cache = Cache.get(refresh_times_cache_name)
    #With no global refresh recorded for this key, fall back to the epoch so
    #that an existing local entry is never treated as stale
    last_global_refresh_time = refresh_times_cache[key] || Time.at(0)
    @@logger.debug("\tLast global refresh time is: #{last_global_refresh_time}")
    @@logger.debug("\tLast time this cache was refreshed: #{cache.refresh_time(key)}")
    if cache.needs_refreshing?(key, last_global_refresh_time)
      @@logger.info("Need to refresh local cache entry for key #{key}")
      #delete also clears the stored refresh time, so set it afterwards
      cache.delete(key)
      cache.set_refresh_time(key, Time.now)
    else
      @@logger.info("Don't need to refresh local cache entry for key #{key}")
    end

    cache
  end

  #A bare "private" does not apply to methods defined with "def self.",
  #so hide the helpers explicitly
  private_class_method :remove_from_cache, :get_cache
end

I realize that all of this may be confusing.  If anyone finds it useful,
I'm happy to answer any questions.

Wes