A memcached-like server in Ruby - feasible?

I’d like to see you take the approach of extending memcached with ruby.
Leverage what memcached is already doing and already good at.

Based on my limited understanding (please correct me if I’m wrong):
memcached works by accepting a request for a certain ‘key’ and then
returns all objects that match that key. Right?

For example, you can make memcached store 1000 user records and then
ask for all of them, but you can’t ask for them with a ‘query’ that
limits the set of users.

The way that memcached would answer your query for 1000 users is to
go to each node and fetch and return to you all the users stored in
that node (this node has 400 of them, this node has 200 of them),
combine them all together and return them…

So what you want to do is to be able to define some arbitrary ruby
that gets executed at each node to trim down the set of users so that
the entire set doesn’t need to be returned.

So alter memcached to accept a ‘query’ in the form of arbitrary ruby
(or perhaps a pre-defined ruby) that a peer-daemon is to execute over
the set of results a particular memcached node contains.
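The idea could be sketched as a tiny per-node filter. Everything here is invented for illustration (`NodeFilter`, the string-eval ‘query’ protocol, and the user records); a real peer-daemon would sandbox the eval or accept only pre-defined named queries instead of arbitrary ruby:

```ruby
# Minimal sketch: each node holds a slice of the data-set and applies
# a client-supplied filter locally, so only matching records cross the
# wire. NodeFilter and the string-eval "query" are hypothetical, not
# part of memcached.
class NodeFilter
  def initialize(store)
    @store = store # this node's share of the records, e.g. { key => user_hash }
  end

  # query is a string of ruby evaluated against each record -- in a
  # real daemon you would sandbox this, or only allow a fixed set of
  # named queries
  def query(ruby_source)
    filter = eval("lambda { |user| #{ruby_source} }")
    @store.values.select { |user| filter.call(user) }
  end
end

node = NodeFilter.new(
  1 => { :name => "alice", :age => 31 },
  2 => { :name => "bob",   :age => 17 }
)
adults = node.query("user[:age] >= 18")
# => [{ :name => "alice", :age => 31 }]
```

The client would send the same query string to every node and merge the (already trimmed) result sets, instead of merging everything and filtering afterwards.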

In my understanding, this is sort of the way CouchDB is supposed to
work.
(http://theexciter.com/articles/couchdb-views-in-ruby-instead-of-javascript)

do you follow?

why don’t you try Gemstone or other object-oriented databases?

besides, memcached isn’t THAT much faster than a database; it’s
faster because it can store objects in memory, but if you need
queries it loses all its advantages.

greets

What I would try is using a slave to replicate just the tables you
need (actually the indexes if that were possible) and memcached to
keep copies of all those objects. I’ve been using memcached for years
and I can swear by it. But keeping indexes in memcached is not easy
or reliable to do, and mysql would do a better job. So then you would
query the slave DB for the conditions you need but only to return the
ids. And then you would ask memcached for those objects. I’ve been
doing something similar in my CMS and it has worked great for me. Here
is an article that might explain better where I’m coming from [1]. And
if mysql clusters make you feel a little dizzy, simple slave
replication and mysql-proxy [2] might help out too.
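The ids-from-the-slave, objects-from-memcached pattern might look roughly like this. The DB and cache are stubbed with plain hashes so the sketch is self-contained; in real code step 1 would be an SQL query against the slave returning only ids (e.g. `connection.select_values`) and step 2 a multi-get against memcached (e.g. `CACHE.get_multi` with the memcache-client gem):

```ruby
# Sketch of the pattern above: query the (slave) DB only for ids,
# then bulk-fetch the full objects from the cache by key.
def fetch_users(db, cache, min_age)
  # step 1: the index lookup stays in mysql, returning ids only
  ids = db.select { |row| row[:age] >= min_age }.map { |row| row[:id] }

  # step 2: fetch the cached objects; in real code this is one
  # multi-get round trip, not N single gets
  keys = ids.map { |id| "user:#{id}" }
  keys.map { |k| cache[k] }.compact
end

db    = [{ :id => 1, :age => 31 }, { :id => 2, :age => 17 }]
cache = { "user:1" => { :id => 1, :name => "alice" } }
adults = fetch_users(db, cache, 18)
# => [{ :id => 1, :name => "alice" }]
```

The `compact` papers over cache misses; a production version would fall back to the DB for missing keys and repopulate the cache.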

Hope it helps,

Adrian M.

[1]
http://blog.methodmissing.com/2007/4/24/partially-bypass-activerecord-instantiation-when-using-memcached/
[2] http://forge.mysql.com/wiki/MySQL_Proxy

“Tom M.hinski” [email protected] writes:

On 10/28/07, Yohanes S. [email protected] wrote:

The other thing you can play with is using sqlite as the local (one
per app server) cache engine.

Thanks, but if I’m already caching at the local process level, I might
as well cache to in-memory Ruby objects; the entire data-set isn’t
that huge for a high-end server RAM capacity: about 500 MB all in all.

Caching to in-memory ruby objects does not automatically confer the
smartness you were describing. The sqlite is there for the smartness.

I think Ara T Howard in the other thread was quite spot-on in
summarising your need.

YS.

Tom M.hinski wrote:

mysql can either do this with a readonly slave or it cannot be done
at the requests-per-second we want to serve. As we need at least 50
reqs/sec, we’re considering a caching layer between the database and
the Rails servers. That way, we will need fewer MySQL servers, output
requests faster (as the layer would hold the data in an already
processed state), and save much of the replication / clustering
overhead.

-Tom

MapReduce and Starfish?

Tom M.hinski wrote:

I might have impressed you with a somewhat inflated view of how large
our data-set is :)

We have about 100K objects at ~5KB per object, so all in all the
total weight of our dataset is no more than 500MB. We might grow to
maybe twice that in the next 2 years. But that’s it.

So it’s very feasible to keep the entire data-set in RAM for a
reasonable cost.

I was just thinking … Erlang has an in-RAM database capability called
“Mnesia”. Perhaps it could be ported to Ruby or one could write an
ActiveRecord connector to a Mnesia database.

Good point. Unfortunately, MySQL 5 doesn’t appear to be able to take
hints. We’ve analyzed our queries and there are some strategies we
could definitely improve with manual hinting, but alas we’d need to
switch to an RDBMS that supports those.

I wonder if you could trick PostgreSQL into putting its database on a
RAM disk. :) Seriously, though, if you’re on Linux, you could
probably tweak PostgreSQL and the Linux page cache to get the whole
database in RAM while still having it safely stored on hard drives. I
suppose you could also do that for MySQL, but PostgreSQL is simply a
better RDBMS.

Yohanes S. wrote:

memcached by itself seems insufficient for our needs.

The other thing you can play with is using sqlite as the local (one
per app server) cache engine.

With in-memory tables :)

Regards,

Michael

Marcin R. wrote:

why don’t you try Gemstone or other object-oriented databases?

besides, memcached isn’t THAT much faster than a database; it’s
faster because it can store objects in memory, but if you need
queries it loses all its advantages.

greets

here’s a simple object-oriented DB i wrote ages ago; just throw out
the state machine stuff and have phun. searching is done in ruby and
is really easy. it took me one day, if i remember correctly, so that
should be an indication of the time you’d need to make a simple ODB

# module responsible for handling request information
#
# Information status:
#   - :waiting  (waiting for server response)
#   - :progress (server reported progress)
#   - :results  (server returned results)
#   - :error    (as above, but the server returned an error)
#   - :timeout  (request timed out)
#   - :collect  (to be garbage collected; right now for debugging
#     purposes)
module Informations
  # default time to live of a message (used when expire is set to :ttl)
  attr_accessor :default_ttl

  # default timeout - time between ANY actions sent by the server
  attr_accessor :default_timeout

  # use garbage collecting?
  attr_accessor :gc

  def init(ttl = 30, timeout = 30, gc = true)
    @gc = gc
    @default_ttl = ttl
    @default_timeout = timeout
    @informations = {}
  end

  # creates new information about a request; id is the request id,
  # hash should contain additional information (it'll be merged in)
  def new_info(id, hash)
    # hash = hash.dup
    # hash.delete(:data)
    info = {}
    info[:id] = id
    info[:status] = :waiting
    info[:timeout] = @default_timeout
    info[:last_action] = info[:start] = Time.now
    info[:expire] = :new
    info[:ttl] = @default_ttl
    info.merge! hash

    @informations[id] = info
  end

  # information state machine
  # checks the message status and takes care of checking state
  # transitions; if a transition is wrong it's ignored (no exception
  # is raised!!) and the info is returned unchanged
  def change_status(info, state)
    case info[:status]
    when :waiting, :progress
      if [:progress, :results, :error, :timeout].include? state
        info[:status] = state
        info[:stop] = Time.now unless state == :progress
        info[:last_action] = Time.now
      end
    when :results, :error, :timeout
      if state == :collect
        info[:status] = state
        info[:last_action] = Time.now
      end
    end
    info
  end

  # checks if a message timed out
  def timeout?(info)
    change_status(info, :timeout) if [:waiting, :progress].include?(info[:status]) &&
      (Time.now > info[:last_action] + info[:timeout])
  end

  # finds the information with id
  # takes care of marking the msg as timed out / to be collected
  # returns the info, or nil if the id is unknown
  def find(id)
    return nil unless @informations[id]
    timeout?(@informations[id])

    info = @informations[id].dup
    # return nil if info[:state] == :collect # don't return expired infos

    if info[:expire] == :first
      @gc ? change_status(@informations[id], :collect) : @informations.delete(id)
    end
    # a :ttl entry expires once its ttl has elapsed since the last action
    if (info[:expire] == :ttl) && (Time.now > info[:last_action] + info[:ttl])
      @gc ? change_status(@informations[id], :collect) : @informations.delete(id)
    end

    # info[:last_action] = Time.now # preventing expire?
    info
  end

  # finds all messages matching the criteria block, or checks whether
  # the :server_id and :name provided in the hash match; the block
  # should return true if the information should be returned
  #
  # Examples:
  #
  #   find_all(:name => "actions", :server_id => "121516136171356151")
  #
  #   find_all { |i| i[:last_action] > Time.now - 60 }
  #   # returns all informations whose state changed a minute or less ago
  #
  #   find_all { |i| i[:status] == :error }
  #   # returns all messages that returned errors
  #
  #   gc! if find_all { |i| i[:status] == :collect }.size > 1000
  #   # clears old messages when there are more than 1000 of them
  def find_all(hash = {})
    res = []
    @informations.each_pair do |k, v|
      if block_given?
        res << find(k) if yield(v.dup)
      else
        catch(:no_match) do
          # add more criteria here!!
          [:server_id, :name].each do |x|
            throw(:no_match) if hash[x] && hash[x] != v[x]
          end
          res << find(k)
        end
      end
    end
    res
  end

  # clears all messages marked for collection
  def gc!
    @informations.each_pair do |k, v|
      @informations.delete(k) if v[:status] == :collect
    end
  end
end