TupleSpace performance (TupleBag really)

mfaine · February 7, 2007, 6:23pm

We’ve been struggling with this problem for months. We use TupleSpace to
implement a distributed processing framework and periodically if the
number
of objects
in the TupleBag gets large (not ridiculous but 50,000 or so) the
TupleSpace
begins to take 100pct cpu and is effectively neutered and must be
restarted.
We’ve been trying to come up
with a better implementation of TupleBag but have not had much luck so
far.
Has anyone else done this?

Mark

mfaine · February 8, 2007, 2:24am

On Feb 7, 2007, at 09:21, Mark Alexander F. wrote:

with a better implementation of TupleBag but have not had much luck
so far.
Has anyone else done this?

I’ve heard a few people talk of it (mainly, TupleSpace with a
persistence (database)), but to my knowledge nobody’s done it.

mfaine · February 9, 2007, 10:30am

Ryan D. wrote:

restarted.
We’ve been trying to come up
with a better implementation of TupleBag but have not had much luck so
far.
Has anyone else done this?

What’s happening is that you’ve populated a hash enough that you’ve
saturated the bins. You’re now hitting a lot of hash collisions.
Do Ruby’s hashes not extend themselves once they hit a certain
saturation? Is there a hard limit at play here? Or is it simply a
matter of the hash function not distributing keys well?

mfaine · February 9, 2007, 11:50am

On Fri, Feb 09, 2007 at 09:50:34AM +0900, Ryan D. wrote:

What’s happening is that you’ve populated a hash enough that you’ve
saturated the bins. You’re now hitting a lot of hash collisions. You
can either improve the hash method on the tuples (possibly… really
depends on what you’re storing), or switch the underlying data
structure (maybe a balanced tree will suit you better at large tuple
populations).

Are you sure? In Ruby, the number of bins is increased (exponentially)
when
there are more than 5 entries per bin…

Actually, in a TupleBag tuples are indexed by their size and stored in
an
array. Finding a tuple matching a given template in that array is O(n)

##
# Finds a live tuple that matches +template+.

def find(template)
  @hash.fetch(template.size, []).find do |tuple|
    tuple.alive? && template.match(tuple)
  end
end

If you want to make TupleSpace perform better, you’ll have to make that
operation (finding a live tuple matching an arbitrary template) faster.

[Of course, the TupleSpace behaves much better when it gets the read
request
before the tuple, since it just has to scan the list of templates
waiting
for a tuple.]

mfaine · February 9, 2007, 1:51am

On Feb 7, 2007, at 9:21 AM, Mark Alexander F. wrote:

with a better implementation of TupleBag but have not had much luck
so far.
Has anyone else done this?

What’s happening is that you’ve populated a hash enough that you’ve
saturated the bins. You’re now hitting a lot of hash collisions. You
can either improve the hash method on the tuples (possibly… really
depends on what you’re storing), or switch the underlying data
structure (maybe a balanced tree will suit you better at large tuple
populations).

Or, if you’re using patterns ala Gelernter, you might be able to pre-
partition your spaces into multiple bags based on activities or
specialists (or job or task type). Sounds like you’ve probably
already tried that tho.

mfaine · February 9, 2007, 11:58am

On Fri, Feb 09, 2007 at 06:29:48PM +0900, Alex Y. wrote:

Ryan D. wrote:

What’s happening is that you’ve populated a hash enough that you’ve
saturated the bins. You’re now hitting a lot of hash collisions.
Do Ruby’s hashes not extend themselves once they hit a certain
saturation? Is there a hard limit at play here?

The number of bins is doubled when there are more than 5 entries per bin
(avg.).

Or is it simply a matter of the hash function not distributing keys well?

It’s a matter of the tuples being indexed by their sizes, and performing
O(n)
matching over all the tuples of a given size to find one matching a
given
template (see my other message).

mfaine · February 10, 2007, 2:49am

On 9-Feb-07, at 7:04 PM, [email protected] wrote:

On Thu, 8 Feb 2007, Eric H. wrote:

I’ve heard a few people talk of it (mainly, TupleSpace with a
persistence (database)), but to my knowledge nobody’s done it.

i have. if people like it, i’ll release next week.

What version/implementation of sqlite are you using?

Cheers,
Bob

Bob H. – blogs at <http://www.recursive.ca/
hutch/>
Recursive Design Inc. – http://www.recursive.ca/
Raconteur – http://www.raconteur.info/
xampl for Ruby – http://rubyforge.org/projects/xampl/

mfaine · February 10, 2007, 3:11am

On Sat, 10 Feb 2007, Bob H. wrote:

i have. if people like it, i’ll release next week.

What version/implementation of sqlite are you using?

1.3.1 - oops.

but, i’ll tweak so it works with both the older and newer (2.0.1)
versions…

sorry bout that.

what are you running?

-a

mfaine · February 10, 2007, 1:05am

On Thu, 8 Feb 2007, Eric H. wrote:

I’ve heard a few people talk of it (mainly, TupleSpace with a persistence
(database)), but to my knowledge nobody’s done it.

i have. if people like it, i’ll release next week.

it’s a hack
it’s persistent

jib:~ > cat a.rb
require ‘sqlitespace’

rss = Rinda::SQLiteSpace.new
any_pair = [nil, nil]
rss.read_all(any_pair).sort.each{|t| p t} # note this prints FIRST!
rss.write [Time.now, rand]

jib:~ > ruby a.rb

jib:~ > ruby a.rb
[Fri Feb 09 16:53:37 MST 2007, 0.388129945611581]

jib:~ > ruby a.rb
[Fri Feb 09 16:53:37 MST 2007, 0.388129945611581]
[Fri Feb 09 16:53:52 MST 2007, 0.250816953601316]

jib:~ > ruby a.rb
[Fri Feb 09 16:53:37 MST 2007, 0.388129945611581]
[Fri Feb 09 16:53:52 MST 2007, 0.250816953601316]
[Fri Feb 09 16:53:54 MST 2007, 0.400566845666617]
it’s pretty fast (and uses little cpu)

jib:~ > cat b.rb
require ‘sqlitespace’

rss = Rinda::SQLiteSpace.new

rss.transaction{ 42424.times{ write [Time.now, rand] } }

jib:~ > time ruby b.rb

real 0m19.063s
user 0m18.110s
sys 0m0.560s
it’s a hack
obviously it requires sqlite to be installed, but that’s it
it’s a hack

here’s the code:

harp:~ > cat sqlitespace.rb

require ‘monitor’
require ‘thread’
require ‘drb/drb’
require ‘rinda/rinda’
require ‘rinda/tuplespace’
require ‘fileutils’
require ‘base64’

begin
require ‘rubygems’
rescue LoadError
42
ensure
require ‘sqlite’
end

module Rinda
class SQLiteSpace < ::Rinda::TupleSpace
def initialize(timeout=60, prefix=home_dir)
super()
ensure
setup prefix
end

 %w[ prefix dir bag read_waiter take_waiter notify_waiter ].each{|a|

attr a}

 def setup(prefix=home_dir)
   @prefix = prefix
   FileUtils.mkdir_p @prefix
   @dir = File.join @prefix, 'sqlitespace'
   FileUtils.mkdir_p @dir

   @bag = SQLiteBag.new File.join(@dir, 'bag')
   @read_waiter = SQLiteBag.new File.join(@dir, 'read_waiter')
   @take_waiter = SQLiteBag.new File.join(@dir, 'take_watier')
   @notify_waiter = SQLiteBag.new File.join(@dir, 'notify_watier')
 end

 def transaction *a, &b
   @bag.transaction{
     @read_waiter.transaction{
       @take_waiter.transaction{
         @notify_waiter.transaction{
           instance_eval &b
         }
       }
     }
   }
 end


 def home_dir
   ENV['HOME'] || ENV['USERPROFILE']  # *nix || winblows
 end

end

class SQLiteBag
class DB
SCHEMA = <<-SQL
create table bag(
id integer primary key,
size integer,
blobsize integer,
blob blob
)
SQL

   %w[ path db in_transaction ].each{|a| attr a}

   def initialize path
     @path = path
     @db = ::SQLite::Database.new(path) rescue

::SQLite::Database.new(path,0)
@in_transaction = false
setup
end

   def setup
     begin
       @db.execute(SCHEMA){}
     rescue ::SQLite::SQLException => e
       42
     end
   end

 #
 # for older rubys
 #
   begin
     Base64
     def encode64 blob
       Base64.encode64 blob
     end

     def decode64 blob
       Base64.decode64 blob
     end
   rescue NameError
   end

 #
 # handles DRbUndumped feature of Rinda::TupleEntry class
 #
   def encode obj
     data = [obj.class]
     obj.instance_variables.each do |ivar|
       value = obj.instance_variable_get ivar
       data << [ivar, value]
     end
     blob = encode64(Marshal.dump(data))
   end
 #
 # handles DRbUndumped feature of Rinda::TupleEntry class
 #
   def decode blob
     data = Marshal.load(decode64(blob))
     klass = data.shift
     obj = klass.allocate
     data.each{|ivar, value| obj.instance_variable_set ivar, value}
     obj
   end

   def execute *a, &b
     @db.execute *a, &b
   end

   def transaction *a, &b
     raise 'nested transaction' if @in_transaction
     begin
       @db.execute 'begin transaction'
       @in_transaction = true
       yield
     ensure
       @db.execute 'end transaction'
       @in_transaction = false
     end
   end

   def template_entry(ary)
     if ::Rinda::TemplateEntry === ary
       ary
     else
       ::Rinda::TemplateEntry.new ary
     end
   end

   def push(ary)
     size = ary.size
     blob = encode ary
     blobsize = blob.size
     @db.execute "insert into bag values(null, #{ size }, #{

blob.size }, ‘#{ blob }’)"
end

   def delete(ary)
     size = ary.size
     blob = encode ary
     blobsize = blob.size
     @db.execute "delete from bag where size=#{ size } and

blobsize=#{ blobsize } and blob=’#{ blob }’"
end

   def find_all(template, limit=nil, &block)
     template= Rinda::TemplateEntry.new(Array.new(template)) if

Numeric === template
size = template.size
all = []
@db.execute “select blob from bag where size=#{ size } #{ limit
? (‘limit=%d’ % limit) : ‘’ }” do |tuple|
blob = tuple[‘blob’]
t = decode blob
if t.alive? && template.match(t)
block ? block[t] : (all << t)
end
end
block ? nil : all
end

   def find(template, limit=1, &block)
     find_all(template, limit).first
   end

   def find_all_template(tuple, limit=nil, &block)
     size = tuple.size
     all = []
     @db.execute "select blob from bag where size=#{ size } #{ limit

? (‘limit=%d’ % limit) : ‘’ }" do |tuple|
blob = tuple[‘blob’]
t = decode blob
if t.alive? && t.match(tuple)
block ? block[t] : (all << t)
end
end
block ? nil : all
end

   def delete_unless_alive
     deleted = []
     @db.transaction do
       find_all do |tuple|
         unless tuple.alive?
           delete tuple
           deleted << tuple
         end
       end
     end
     deleted
   end
 end

 %w[ path db ].each{|a| attr a}

 def initialize path
   @path = path
   @db = DB.new path
   @hash = {}
 end

 def transaction *a, &b
   @db.transaction *a, &b
 end

 %w( push delete find_all find find_all_template delete ).each do

|m|
module_eval <<-code
def #{ m } *a, &b
@db.#{ m } *a, &b
end
code
end
end
end

enjoy.

-a

we can deny everything, except that we have the possibility of being
better.
simply reflect on that.

the dalai lama

-a

mfaine · February 10, 2007, 4:02pm

On 9-Feb-07, at 10:01 PM, [email protected] wrote:

I’ve heard a few people talk of it (mainly, TupleSpace with a
persistence (database)), but to my knowledge nobody’s done it.
i have. if people like it, i’ll release next week.

What version/implementation of sqlite are you using?

give this one a whirl…

Thanks.

Okay I admit that I never quite worked out what was going on with the
sqlite packages in ruby. There are two it seems in the same project:
sqlite and sqlite3. And I think there is another project out there
someplace.

Trying to install sqlite as a gem installs version 2.0.1 but the most
recent version, if I understand rubyforge correctly, is 2.2.3; and
2.0.1 is marked a beta. Even though gem install tries to install
2.0.1 there is a gem on rubyforge for 2.2.3.

Worse, 2.0.1 won’t compile on my machine. I havn’t looked into it yet
but it reeks of missing libraries.

Sqlite3 does appear to install. I know I’ve had it working before so
this doesn’t surprise me too much (this is also the reason I never
worked out what the different projects are).

Oh well. Something to do on a Saturday morning. And the delayed
gratification in seeing your ‘hack’ isn’t all it’s cracked up to be,
I think I prefer instant gratification – but gratification there
will be

Cheers,
Bob

-a

we can deny everything, except that we have the possibility of
being better.
simply reflect on that.

the dalai lama
<sqlitespace.rb>

Bob H. – blogs at <http://www.recursive.ca/
hutch/>
Recursive Design Inc. – http://www.recursive.ca/
Raconteur – http://www.raconteur.info/
xampl for Ruby – http://rubyforge.org/projects/xampl/

mfaine · February 10, 2007, 4:01am

On Sat, 10 Feb 2007, Bob H. wrote:

i have. if people like it, i’ll release next week.

What version/implementation of sqlite are you using?

give this one a whirl…

-a

mfaine · February 10, 2007, 6:53pm

On Sat, 10 Feb 2007, Mauricio F. wrote:

It’s pretty fast considering it’s persistent, but it is of course slower
than the std. TupleSpace…

yes. i think it could be made quite a bit faster though. it’s also
broken
wrst to callbacks now, so i’m working on that first!

… and it won’t scale any better, so it wouldn’t solve the OP’s problem:

not currently, but i think it easily could - it’s just a matter of
having a
smarter schema in the db.

PS: some warnings removed…

thanks - in next version…

cheers.

-a

mfaine · February 12, 2007, 8:59pm

On Feb 7, 2007, at 9:21 AM, Mark Alexander F. wrote:

We’ve been trying to come up with a better implementation of TupleBag
but have not had much luck so far. Has anyone else done this?

This is probably as good a time as any to mention to the wider Ruby
community that I’ve been working on a new implementation of a tuple
space in Ruby, completely independent of the standard
implementation. My implementation currently searches a list to do
template matching, but it’s on my to-do list to research and
implement more scalable data structures and algorithms.

The most general form of template matching is actually quite
challenging to implement, and to achieve scalability, I’ll have to
impose some restrictions. The current DRb-based Rinda is extremely
flexible in what values can be in tuples, but in my implementation,
I’ll most likely require that all tuple components be scalars–
numbers, strings, etc. You can use YAML to store more complex
structures (opaquely to the tuple space), but arbitrary user objects,
regular expressions, and class names (e.g., take [String, 5,
Integer]) won’t be allowed directly as tuple components. This is a
trade-off, but one that I think most applications using a tuple space
can live with. This restriction makes the problem more amendable to
techniques developed in the fields of databases, data analysis (e.g.,
clustering algorithms to partition the space), machine learning, etc.

I have a working, though unreleased, distributed tuple-space
implementation right now and written in Ruby. It doesn’t use DRb at
all; it uses a custom wire protocol over SSL (with validation of
client and server certificates). It supports a good many features
beyond the standard tuple space; namely, multiple tuple space
regions, local and global scopes (two-level hierarchy of tuple
spaces), private one-to-one and group communication, region
privileges, and a “monitor” operation (the read_all operation returns
all available matching tuples; the monitor operation returns all
available and future matching tuples). Additionally, my
implementation allows processes to pass file descriptors over the
local tuple space (which, by definition, is confined to a given host,
and so communication takes place over a Unix domain socket).

More information on my implementation is available in my talk slides at

CAIDA Resource Catalog
young_wide0611_ark/

The design of the tuple space is being driven by the needs of CAIDA’s
next-generation distributed network measurement infrastructure named
Archipelago.

I hope to release the code under the GPL, with the client library
under the LGPL. There’s no clear time table for code release. It’s
also going to be months down the line before I can work on
implementing scalable matching in earnest, since other work has
higher priority.

–Young

mfaine · February 10, 2007, 10:13am

On Sat, Feb 10, 2007 at 09:04:51AM +0900, [email protected] wrote:

On Thu, 8 Feb 2007, Eric H. wrote:

I’ve heard a few people talk of it (mainly, TupleSpace with a persistence
(database)), but to my knowledge nobody’s done it.

i have. if people like it, i’ll release next week.

it’s a hack

it’s persistent
[…]
jib:~ > time ruby b.rb

real 0m19.063s
user 0m18.110s
sys 0m0.560s

It’s pretty fast considering it’s persistent, but it is of course slower
than
the std. TupleSpace…

$ cat tuplespace.rb
#!/usr/bin/env ruby

require ‘rinda/tuplespace’

ts = Rinda::TupleSpace.new

require ‘benchmark’
p Benchmark.measure {
42424.times{ ts.write [Time.now, rand] }
}

#require ‘rubygems’
#gem “sqlite-ruby”
require ‘sqlitespace’
$VERBOSE = false
puts “Using SQLite #{SQLite::Version::STRING}”
ts2 = Rinda::SQLiteSpace.new

p Benchmark.measure {
ts2.transaction{ 42424.times{ write [Time.now, rand] } }
}

$ time ruby tuplespace.rb
#<Benchmark::Tms:0xa795b878 @total=4.62, @cutime=0.0, @label="",
@stime=0.24, @real=4.64543700218201, @utime=4.38, @cstime=0.0>
/home/batsman/usr//lib/ruby/site_ruby/1.8/sqlite/database.rb:243:
warning: *' interpreted as argument prefix /home/batsman/usr//lib/ruby/site_ruby/1.8/sqlite/statement.rb:103: warning:’ interpreted as argument prefix
(eval):2: warning: *' interpreted as argument prefix (eval):2: warning:’ interpreted as argument prefix
(eval):2: warning: *' interpreted as argument prefix (eval):2: warning:’ interpreted as argument prefix
(eval):2: warning: *' interpreted as argument prefix (eval):2: warning:’ interpreted as argument prefix
(eval):1: warning: method redefined; discarding old delete
Using SQLite 2.2.2
#<Benchmark::Tms:0xa7717d8c @total=74.8, @cutime=0.0, @label="",
@stime=1.79, @real=76.8895101547241, @utime=73.01, @cstime=0.0>

real 1m21.836s
user 1m17.625s
sys 0m2.060s

(it seems slower w/o .transaction)

… and it won’t scale any better, so it wouldn’t solve the OP’s
problem:

        block ? block[t] : (all << t)
      end
    end
    block ? nil : all
  end

[…]

    end
    block ? nil : all
  end

PS: some warnings removed…

— sqlitespace.rb.orig 2007-02-10 09:33:20.000000000 +0100
+++ sqlitespace.rb 2007-02-10 10:09:50.000000000 +0100
@@ -43,7 +43,7 @@
@read_waiter.transaction{
@take_waiter.transaction{
@notify_waiter.transaction{

```
          instance_eval &b
```

          instance_eval(&b)
        }
      }
    }

@@ -79,7 +79,7 @@
def setup
begin
@db.execute(SCHEMA){}

```
    rescue ::SQLite::SQLException => e
```

    rescue (SQLite::SQLException rescue

::SQLite::Exceptions::SQLException) # tested with sqlite-ruby 2.2.3
42
end
end
@@ -88,7 +88,7 @@
# for older rubys
#
begin

```
    Base64
```

    x = Base64
    def encode64 blob
      Base64.encode64 blob
    end

@@ -122,7 +122,7 @@
end

    def execute *a, &b

```
    @db.execute *a, &b
```

    @db.execute(*a, &b)
  end

  def transaction *a, &b

@@ -213,7 +213,7 @@
end

  def transaction *a, &b

```
  @db.transaction *a, &b
```

  @db.transaction(*a, &b)
end

%w( push delete find_all find find_all_template delete ).each do

|m|