Fast portable storage for queues

I’ve tested out a couple of ways of storing a queue structure and
wondered if others had some better ideas. I tried a berkeleydb based
queue using its native queue structure, which gives the best
performance so far, but it’s for unix only. I also have a file based
queue where every message is just stored in a sequentially numbered
file. Almost as fast as berkeleydb and works anywhere. Just for
giggles I tried sqlite, and then immediately deleted it: 6-8 tps with
sqlite compared to around 1000 with berkeleydb/flat files.
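
For illustration, the file queue is roughly this (a simplified sketch,
not the actual stompserver code; the directory name and helper names
are made up):

require 'fileutils'

FileUtils.mkdir_p 'queue'

# enqueue: write the message to the next sequentially numbered file
def enqueue(msg)
  n = Dir.glob('queue/*').map { |f| File.basename(f).to_i }.max || 0
  File.open('queue/%09d' % (n + 1), 'wb') { |f| f << msg }
end

# dequeue: read and delete the lowest-numbered file (FIFO order)
def dequeue
  file = Dir.glob('queue/*').sort.first
  return nil unless file
  msg = File.read(file)
  File.delete(file)
  msg
end

enqueue 'hello'
p dequeue   # => "hello"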

Another alternative I was thinking of is a single file storage of some
type, maybe fixed length or even variable length records. Something
that’s portable. But I’m thinking that could get complicated fairly
quickly.

Any other ideas?

snacktime wrote:

Another alternative I was thinking of is a single file storage of some
type, maybe fixed length or even variable length records. Something
that’s portable. But I’m thinking that could get complicated fairly
quickly.

Any other ideas?

http://raa.ruby-lang.org/project/rq/

(That’s sqlite again, though.)

What are your requirements?

  • portable to which platforms?

  • who are the consumers and producers?

I thought BDB did work on windows… or is it just the queueing
mechanism that is not portable?

What are your requirements?

What I’d really like is something that’s portable to pretty much
everything, so that I can standardize on one queue storage if
possible. The file queue is actually fairly good, but I’d like to
cut down on the IO if possible by just having everything in one or two
files. Plus I just wanted to see if I missed something obvious, it’s
happened before :)

  • portable to which platforms?

unix and windows at a minimum.

  • who are the consumers and producers?

It’s for stompserver.

I thought BDB did work on windows… or is it just the queueing
mechanism that is not portable?

BDB does, but as far as I know the Ruby extension for it only works on
unix.

On 10/22/06, snacktime [email protected] wrote:

Another alternative I was thinking of is a single file storage of some
type, maybe fixed length or even variable length records. Something
that’s portable. But I’m thinking that could get complicated fairly
quickly.

Any other ideas?

Kirk H. was involved in a thread here a couple of months back in which
he made the case that it’s perfectly fast and perfectly scalable to
just use the filesystem for moderate requirements. His scheme used
filenames that were actually hashes of the contents, and as I recall he
had a mechanism for automatically generating a directory hierarchy
based on the content hashes to keep the inode sizes under control.

Having implemented all kinds of crazy single-file storage systems for
high-scalability message-queueing systems, I initially thought Kirk’s
idea was just loopy. But sometimes the simplest solutions are the best.
I remember trying it out with some quick tests and it worked great.

On 10/23/06, [email protected] [email protected] wrote:

The speed of the approach is tied very directly to the speed of the IO
system, but it’s relatively fast and very portable so long as your file
naming scheme is itself portable.

Kirk H.

Not a queue, but I use memcache for such needs, where I must offload
data storage to some external engine, because Ruby doesn’t play nice
with ever-growing “in memory” data structures.

Of course it’s not suitable for many things, but it works for me and
it’s quite fast. Any cons?

On Mon, 23 Oct 2006, Francis C. wrote:

he made the case that it’s perfectly fast and perfectly scalable to just use
the filesystem for moderate requirements. His scheme used filenames that
were actually hashes of the contents, and as I recall he had a mechanism for
automatically generating a directory hierarchy based on the content hashes
to keep the inode sizes under control.

Francis remembers right. I did this for, essentially, a replacement for
pstore that didn’t need to read an index into memory for each
transaction, so it’d have flat memory usage and speed regardless of the
size or number of elements in the store. It works pretty well.

In my case I create a hash of the storage key and use that as part of
the filename, and then I had the code make subdirectories based on the
starting letters of the hash in order to avoid having too many files in
any one directory. My default was to take a chunk of two characters
from the beginning of the hash as subdirectory name, twice.

So a hash starting with ‘abd45cc’ would have its file in ab/d4/
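
A rough sketch of that scheme (not the actual code; the root directory
and helper name here are made up):

require 'digest/md5'
require 'fileutils'

ROOT = '/tmp/store'   # hypothetical storage root

# two characters from the start of the hash, twice, as subdirectories
def path_for(key)
  hash = Digest::MD5.hexdigest(key)               # e.g. "abd45cc..."
  dir  = File.join(ROOT, hash[0, 2], hash[2, 2])  # e.g. /tmp/store/ab/d4
  FileUtils.mkdir_p(dir)
  File.join(dir, hash)
end

File.open(path_for('some key'), 'wb') { |f| f << 'some value' }
p File.read(path_for('some key'))   # => "some value"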

The speed of the approach is tied very directly to the speed of the IO
system, but it’s relatively fast and very portable so long as your file
naming scheme is itself portable.

Kirk H.

Hey Chris,

Thanks for the continued interest and work on stompserver, I am on my
way home and will be helping out soon. I will give a big thumbs up to
using the file system. It is actually where I was planning on going,
but used Madeleine because it was faster to get started.

Another approach that I am planning on investigating is using mmap
(based on a discussion and an idea I had while talking with Daniel B.
at RubyConf) to provide a circular queue space – this should be very
efficient and simple enough to implement – until the allocated queue
is full. Then special logic and handling, or an expensive resize, would
be required.
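
A hedged sketch of that circular-queue layout, using plain File I/O in
place of mmap (slot size, capacity, and all names here are
assumptions):

class CircularFileQueue
  SLOT  = 512    # fixed slot size in bytes
  SLOTS = 1024   # capacity; beyond this the "full" case must be handled

  def initialize(path)
    @io   = File.open(path, File::RDWR | File::CREAT)
    @head = 0   # next slot to read
    @tail = 0   # next slot to write
  end

  def push(msg)
    raise 'queue full - resize or block' if @tail - @head >= SLOTS
    @io.seek((@tail % SLOTS) * SLOT)            # wrap around the space
    @io.write(msg.ljust(SLOT, "\0")[0, SLOT])   # pad/truncate to one slot
    @tail += 1
  end

  def pop
    return nil if @head == @tail
    @io.seek((@head % SLOTS) * SLOT)
    msg = @io.read(SLOT).sub(/\0+\z/, '')       # strip the padding
    @head += 1
    msg
  end
end

q = CircularFileQueue.new('/tmp/cq.dat')
q.push 'hello'
p q.pop   # => "hello"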

pth

On Mon, 23 Oct 2006, hemant wrote:

Of course…its not suitable for many things, but it works for me and
its quite fast. Any cons?

memcache is something of a different beast, but that aside, the con to
memcache is that one must have memcache installed. There is no barrier
to usage for some sort of simple to-disk persistence mechanism because
they can be trivially written using only the libraries in the standard
Ruby distribution.

Memcache also doesn’t help if one wants the data to survive machine
reboots, or be something that a person can transfer to another
computer, archive, edit, or whatever.

On the flipside, though, if neither of those is a concern, memcache is
going to give better speed.

Kirk H.

On 10/23/06, Francis C. [email protected] wrote:

mmap and circular space: I have an implementation of that. It’s fast as
bloody hell, but it’s also (unfortunately) a compiled Ruby extension.
Let me know if you want to play with it when you get home. If you like
it, maybe you can convert it to pure Ruby without losing too much
performance. One of the keys to performance is fixed-length
record-buffers, which means there’s no record-table. All lookups are
O(1).

Send it across Francis, I would be delighted to check it.

On 10/23/06, Patrick H. [email protected] wrote:

Another approach that I am planning on investigating is using mmap
(based on a discussion and an idea I had while talking with Daniel B.
at RubyConf) to provide a circular queue space – this should be very
efficient and simple enough to implement – until the allocated queue
is full. Then special logic and handling, or an expensive resize, would
be required.

pth

mmap and circular space: I have an implementation of that. It’s fast as
bloody hell, but it’s also (unfortunately) a compiled Ruby extension.
Let me know if you want to play with it when you get home. If you like
it, maybe you can convert it to pure Ruby without losing too much
performance. One of the keys to performance is fixed-length
record-buffers, which means there’s no record-table. All lookups are
O(1).
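
In other words, the fixed-length trick reduces lookup to plain offset
arithmetic (a sketch of the idea, not the compiled extension itself;
RECORD_SIZE is an assumption):

RECORD_SIZE = 1024   # every record padded to a fixed size

# O(1): record n starts at a computable byte offset, no record-table
def record_offset(n)
  n * RECORD_SIZE
end

# e.g. with an open file handle io:
#   io.seek(record_offset(42))
#   data = io.read(RECORD_SIZE)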

On Tue, 24 Oct 2006, hemant wrote:

Another approach that I am planning on investigating is using mmap
mmap and circular space: I have an implementation of that. It’s fast as
bloody hell, but it’s also (unfortunately) a compiled Ruby extension. Let
me know if you want to play with it when you get home. If you like it,
maybe you can convert it to pure Ruby without losing too much performance.
One of the keys to performance is fixed-length record-buffers, which means
there’s no record-table. All lookups are O(1).

Send it across Francis, I would be delighted to check it.

hi hemant-

can you elaborate on your requirements? eg, what kind of objects are
being put into the queue? is it fifo, lifo, priority based, etc? how
big are the objects? how many of them? do they timeout or grow stale?

cheers.

-a

[email protected] wrote:

So a hash starting with ‘abd45cc’ would have its file in ab/d4/

The speed of the approach is tied very directly to the speed of the IO
system, but it’s relatively fast and very portable so long as your file
naming scheme is itself portable.

Kirk,

Did you get a feeling for what the break-even point was? In other words,
how many files do you need before the depth 2 scheme beats a single flat
dir?

Playing around with this idea in fsdb, with 100_000 files, access times
seem to get worse as depth varies from 0 to 2. Creation time was better
at depth==1 than in either of the other cases.

I’m not at all confident of benchmarking this though, since it’s likely
to be so sensitive to many things. FWIW, this is on linux 2.6.15, ext3
with no special params, 1.7GHz centrino, 40Gb Fujitsu laptop drive,
running at nice -n -20.

On Mon, 23 Oct 2006, snacktime wrote:

Another alternative I was thinking of is a single file storage of some
type, maybe fixed length or even variable length records. Something
that’s portable. But I’m thinking that could get complicated fairly
quickly.

Any other ideas?

why not:

harp:~ > cat a.rb
require 'dbm'

n = 10000
db = DBM.open 'db'
at_exit{ db.close }

kvs = Array.new(n){ [rand(n).to_s, rand(n).to_s] }

a = Time.now.to_f; kvs.each{|k,v| db[k] = v}; b = Time.now.to_f
elapsed = b - a

puts("elapsed : #{ elapsed }")
puts("tps : #{ n / elapsed }")

harp:~ > ruby a.rb
elapsed : 0.220124006271362
tps : 45428.9387576941

that’s damn fast: it runs at nearly the exact same speed on my windows
box too. btw, dbm will in fact be backed by bdb on any newish *nix
dist, and ndbm for the windows one-click installer - either way though,
you get it in the standard lib and, even if you decide to marshal
objects it’s still extremely fast, easy to use, and disk based.

it should be very robust, esp in the case of a bdb-backed impl since
bdb is ACID.
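
marshaling through dbm really is just a couple of lines (a sketch; the
key and values here are made up):

require 'dbm'

db = DBM.open 'db'
db['msg:1'] = Marshal.dump({ :destination => '/queue/foo', :body => 'hello' })
p Marshal.load(db['msg:1'])   # => {:destination=>"/queue/foo", :body=>"hello"}
db.close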

regards.

-a

On Tue, 24 Oct 2006, Joel VanderWerf wrote:

Playing around with this idea in fsdb, with 100_000 files, access times seem
to get worse as depth varies from 0 to 2. Creation time was better at
depth==1 than in either of the other cases.

Not really. The IO performance on my test machine is so bad that it
didn’t seem to make a significant difference.

You have me curious now, though, so I may go test it on a box with
better IO and see what I can find out.

Kirk H.

that’s damn fast: it runs at nearly the exact same speed on my windows box too.
btw, dbm will in fact be backed by bdb on any newish *nix dist, and ndbm for
the windows one-click installer - either way though, you get it in the standard
lib and, even if you decide to marshal objects it’s still extremely fast, easy
to use, and disk based.

I didn’t realize ruby dbm was supported on windows. I’ll have to
throw a dbm storage together and see how it does.

Chris

On Tue, 24 Oct 2006 [email protected] wrote:

Playing around with this idea in fsdb, with 100_000 files, access times
seem to get worse as depth varies from 0 to 2. Creation time was better at
depth==1 than in either of the other cases.

Not really. The IO performance on my test machine is so bad that it didn’t
seem to make a significant difference.

You have me curious now, though, so I may go test it on a box with better IO
and see what I can find out.

doesn’t ext3 use btree under the hood? if so you’d basically just be
adding a method call per level since the lookup of 100_000 is still
quite fast - on the other hand any piece of code that puts 100_000
files in one directory is evil!

i guess what i’m driving at is that what you may be comparing is method
call overhead vs big-O operation and not IO speed. i had this happen to
me once before looking for the fastest way to search a huge in-memory
table. i tried gperf, std:: crap, sqlite in mem, bdb in mem, etc. in
the end c’s built-in bsearch, including sort, was way faster for even
10 or 20 searches. the reason was that, after the initial hit of
sorting, the function call overhead for bsearch, which just flips
pointers all over the place, was far less than that of gperf, which
made quite a few function calls. anyhow, it was a surprise to me that
an O(log n) solution ran much faster in the real world than an O(1) one
with pretty dang big data sets…

food for thought - be curious to hear what you find out.

;)

-a

On Oct 23, 2006, at 4:36 AM, [email protected] wrote:

In my case I create a hash of the storage key and use that as part
of the filename, and then I had the code make subdirectories based
on the starting letters of the hash in order to avoid having too
many files in any one directory. My default was to take a chunk of
two characters from the beginning of the hash as subdirectory name,
twice.

So a hash starting with ‘abd45cc’ would have its file in ab/d4/

This sounds like it would make a neat open source project. Any plans
to share the code?

James Edward G. II

James Edward G. II wrote:

This sounds like it would make a neat open source project. Any plans to
share the code?

James Edward G. II

I’m guessing it is something like this:

--- flat.rb ---

# Fast, flat storage based on Kirk H.'s technique.

require 'fsdb'
require 'digest/md5'

class FlatDB < FSDB::Database

  def initialize(path, depth = 2)
    # MD5 hexdigests are 32 chars, so at most 16 two-char components
    raise ArgumentError, "Invalid depth #{depth} > 16" if depth > 16

    @path_from_key = {}
    @path_pat = Regexp.new("^" + "(..)" * depth)
    @depth = depth

    super(path)
  end

  def path_from_key(key)
    path = @path_from_key[key]
    unless path
      if @depth > 0
        path_components = Digest::MD5.hexdigest(key).scan(@path_pat).first
        path_components << key
        path = path_components.join("/")
      else
        path = key
      end
      @path_from_key[key] = path
    end
    path
  end

  def browse(key)
    super path_from_key(key)
  end

  def edit(key)
    super path_from_key(key)
  end

  def replace(key)
    super path_from_key(key)
  end

  def delete(key, load=true)
    path = path_from_key(key)
    @path_from_key.delete key  ## should probably purge this hash periodically
    super path, load
  end

  # don't bother with #insert, #fetch, #[], #[]= since they are
  # inherently less efficient

end

db = FlatDB.new('/tmp/fsdb/flat', 2)

db.replace 'foo.txt' do
  "this is the foo text"
end

db.browse 'foo.txt' do |x|
  p x
end

# key names can have '/' in them, in which case they reference deeper
# subdirs
db.replace 'sub/foo.txt' do
  "this is the subdir's foo text"
end

db.browse 'sub/foo.txt' do |x|
  p x
end

require 'benchmark'

Benchmark.bm(10) do |bm|
  nfiles = 100

  [0, 1, 2].each do |depth|
    db = FlatDB.new("/tmp/fsdb/depth_#{depth}", depth)

    puts "\ndepth=#{depth}"

    bm.report "create" do
      nfiles.times do |i|
        db.replace i.to_s do
          i  # this will be marshaled
        end
      end
    end

    bm.report "access" do
      nfiles.times do |i|
        db.browse i.to_s do |j|
          raise unless i == j
        end
      end
    end
  end
end

__END__

results for nfiles=100_000, linux 2.6.15, ext3 with no special params,
1.7GHz centrino, 40Gb Fujitsu laptop drive, running at nice -n -20:

"this is the foo text"
"this is the subdir's foo text"
                 user      system       total         real

depth=0
create       72.680000 1772.030000 1844.710000 (1853.686824)
access       55.780000   13.090000   68.870000 (  97.170382)

depth=1
create      125.170000   24.250000  149.420000 ( 329.576419)
access      143.210000   12.040000  155.250000 ( 759.768371)

depth=2
create      263.900000   32.570000  296.470000 (1950.482468)
access      195.200000   17.250000  212.450000 (1562.214063)

du -sk depth_0
804236   depth_0

du -sk depth_1
804832   depth_1

du -sk depth_2
1006408  depth_2