On Sat, 12 Aug 2006, Francis C. wrote:
> Kirk, how did you do this? Are you storing immutable objects and
> turning older versions into garbage somehow? If you have no indexing,
> then how do you enforce a consistent view across processes without
> incurring disk i/o (at least having to read the inode)? Or are you
> managing an arena that is backed by shared memory and you convert the
> key values deterministically into offsets?
>
> Sorry for all the questions, but I’ve just had to fight with this
> problem too many times and it would be great if you’ve come up with a
> better mousetrap.
I don’t think it is necessarily a better mousetrap. In this case, I’m
just willing to make the filesystem do some of the work for me instead
of keeping a separate index in memory.
I’m using a modified version of Ara Howard’s lockfile.rb to handle
locking between processes. It works over NFS, and I have modified it so
that it also works on Windows by automatically falling back to a
Windows-safe locking scheme.
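
For anyone who hasn’t used it, the stock lockfile.rb block form looks
roughly like this (just a sketch; the filenames here are made up, and
my Windows fallback changes aren’t shown):

  require 'fileutils'
  require 'lockfile'

  FileUtils.mkdir_p('store')
  data = Marshal.dump('some value')

  # Hold an exclusive lock while writing the shared file; the lock
  # is released automatically when the block exits.
  Lockfile.new('store/entry.lock', :retries => 10) do
    File.open('store/entry', 'wb') { |f| f.write(data) }
  end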
I create a hash (Digest::SHA512 based) out of the key, and use that hash
as the filename for the data. There is also a second file written that
contains some metadata (the structure on disk is actually a linked list,
to more easily support some of the LRU aspects and element expiration).
There are a couple of metadata files that also maintain some information
on the data store as a whole.
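
The key-to-filename step is about as simple as it sounds; something
like this (a sketch only, and the .meta suffix is made up for
illustration):

  require 'digest/sha2'

  key = 'user:1234'
  digest = Digest::SHA512.hexdigest(key)

  data_file = digest            # the serialized value lives here
  meta_file = digest + '.meta'  # per-entry metadata (links, timestamps)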
The code automatically breaks the file storage into a tree of
directories to help avoid any filesystem problems with having too many
files in a single directory. The size of this tree is tunable, so a data
store that may have a million entries can have more nodes (directories)
than one that is only expected to have 10000.
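
A common way to build that kind of fan-out is to peel leading
characters off the digest for each directory level; the sketch below
captures the general idea (the depth constant stands in for the
tunable):

  require 'digest/sha2'

  DEPTH = 2  # stand-in for the tunable; deeper trees for bigger stores

  def path_for(root, key, depth = DEPTH)
    digest = Digest::SHA512.hexdigest(key)
    # Use the first `depth` pairs of hex characters as directory
    # names, e.g. "ab/cd/<full digest>" for a depth of 2.
    dirs = (0...depth).map { |i| digest[i * 2, 2] }
    File.join(root, *(dirs + [digest]))
  end

  path_for('store', 'user:1234')
  # => "store/xx/yy/<full 128-character hex digest>"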
So, retrieving an element from the data store is reduced to hashing the
key, finding the file, and reading it. It’s disk i/o, but less than
PStore generates most of the time.
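
In other words, a read boils down to something like this (again just a
sketch, with the locking and LRU bookkeeping left out):

  require 'digest/sha2'

  def fetch(root, key, depth = 2)
    digest = Digest::SHA512.hexdigest(key)
    dirs = (0...depth).map { |i| digest[i * 2, 2] }
    path = File.join(root, *(dirs + [digest]))
    File.exist?(path) ? File.read(path) : nil
  end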
Since my need was for something that could do this while also being
treated as an LRU cache, there is some extra work mixed in there to
maintain the metadata and expire elements from the cache (it can also
expire them based on age), and that’s really where most of the
performance hit comes from.
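
Just to illustrate the age-based side: expiry can be as simple as
comparing timestamps against a cutoff. The real code walks the
linked-list metadata rather than stat()ing every file the way this toy
version does:

  require 'find'

  # Illustrative only: sweep everything under root and delete files
  # older than max_age seconds, judged by mtime.
  def expire_by_age(root, max_age)
    cutoff = Time.now - max_age
    Find.find(root) do |path|
      next unless File.file?(path)
      File.delete(path) if File.mtime(path) < cutoff
    end
  end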
I’m working on creating a second version that strips out all of the LRU
cache code to provide a simple, fast data persistence implementation
without any of that extra overhead, and I should soon be able to report
on just how much of a performance hit that overhead brings with it.