Accessing ruby objects across VMs

piyush_r · April 21, 2009, 12:14am

I have one Ruby instance running for a website. I do heavy polling
against in-memory data, and I need to run a few more instances. All my
data is stored in a MySQL DB. Right now, whenever an entry is made to
this DB (which contains only a single table), I cache the data in
memory, and that serves my purpose. Now that traffic is increasing, I
need to start more Ruby instances. Is there any way to store this data
in a single in-memory location and access it from all the other VMs?

A few options come to mind:

1. Storing in memcache
2. Storing in localmemcache

The problem with both of these options is that I need to store heavily
nested objects. With memcache I would have to use something like
Marshal dump and load, which is a little slow for my liking.

The other option could be to keep separate copies in all VMs and
refresh them with an HTTP GET call to every VM whenever the data
changes in the DB.
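
To put a rough number on the Marshal cost mentioned above, here is a minimal sketch. The nested hash is a made-up stand-in for the cached rows, and the timing will vary with the real object shape and hardware.

```ruby
# Hypothetical nested structure standing in for the cached table rows.
data = {
  "users" => (1..1000).map do |i|
    { "id" => i, "tags" => ["a", "b"], "profile" => { "name" => "user#{i}" } }
  end
}

blob = Marshal.dump(data)   # the byte string that would be stored in memcache
copy = Marshal.load(blob)   # what every read from memcache would have to do

# Time 100 deserializations with a monotonic clock.
start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
100.times { Marshal.load(blob) }
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start

puts "blob size: #{blob.bytesize} bytes, 100 loads: #{elapsed.round(4)}s"
```

Each load builds a fresh deep copy, so the cost is paid again on every read.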

Now the question is: is there a way to access the data in a shared
manner between all the VMs? Or am I missing something very obvious
here? Or is it not doable?

Piyush R.

piyush_r · April 21, 2009, 2:12am

Thanks Joel.
DRb seems a better option, but I think Marshal load and dump seems
easier, even though there is a performance degradation.
Optimization is premature, but I have to pull it off because there is
limited hardware available to me for this project.
Thanks
Piyush

piyush_r · April 21, 2009, 2:00am

Piyush R. wrote:

Storing in localmemcache
The problem with both of these options is that I need to store heavily
nested objects. If I do that in memcache I may have to do using something
like Marshal dump and load which is a little slow for my liking.
Other option could be to keep separate copies in all VMs and refresh it
using a HTTP GET call to all VMs whenever the data changes in the DB.

Now the question is, is there a way to access in a shared manner between
all the VMs? Or am I missing something very obvious here? Or is it not
doable?

AFAIK there’s no way for two VMs to share live, writable object storage.

Looks like the choice is between deserializing on every read (opts 1 and
2) vs. deserializing on every write (your “other option”). So if reads
outnumber writes, as they often do, the other option might be better.

You could use drb (which does marshal over sockets) instead of http for
this. Probably faster, and certainly easier to use. Use the
drbunix:/path/to/socket protocol instead of tcp, and it will be even
faster (maybe 50%, in my experience).
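
A minimal sketch of that idea, assuming each VM keeps its own copy and the writer pushes fresh data to it over a UNIX socket. `LocalCache`, its methods, and the socket path are hypothetical names made up for illustration; server and client are shown in one process only to keep the example runnable.

```ruby
require 'drb/drb'
require 'drb/unix'
require 'tmpdir'

# Hypothetical per-VM cache; each Ruby process would hold one of these.
class LocalCache
  def initialize
    @data = {}
    @lock = Mutex.new
  end

  # Called remotely by whichever process wrote to the DB.
  def refresh(new_data)
    @lock.synchronize { @data = new_data }
    true
  end

  # Local reads return the live object, with no deserialization.
  def fetch(key)
    @lock.synchronize { @data[key] }
  end
end

cache = LocalCache.new
socket_path = File.join(Dir.mktmpdir, "cache.sock") # assumed location
uri = "drbunix:#{socket_path}"

# In each web-serving VM: expose the cache on a UNIX domain socket.
DRb.start_service(uri, cache)

# In the writer, after a DB insert: push the new data to each VM's URI.
remote = DRbObject.new_with_uri(uri)
remote.refresh({ "row1" => { "nested" => [1, 2, 3] } })

value = cache.fetch("row1")
DRb.stop_service
```

With separate processes, the argument to `refresh` is marshalled once per write, while `fetch` stays a plain in-process lookup.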

YMMV, and standard cautions against premature optimization apply…

piyush_r · April 21, 2009, 2:55am

Piyush R. wrote:

Thanks Joel.
DRB seems a better option but I think Marshal load dump seems easier even
though there is a performance degradation.

Marshal.load and Marshal.dump are exactly what DRb uses.

piyush_r · April 21, 2009, 9:29am

2009/4/21 Piyush R.:

DRB seems a better option but I think Marshal load dump seems easier even
though there is a performance degradation.

I am not sure what you mean by this. As Joel pointed out, DRb does use
Marshal. You just need to decide which objects are remotely accessible
and which should be sent over the wire.

If you have a heavily concurrent application with frequent updates, you
must watch out for all sorts of consistency issues. It may turn out
that it is better to use the database for maintaining integrity and do
per-process caching of relevant data from the DB.
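
One way per-process caching could look, as a rough sketch: the database stays the source of truth, and each process reloads its copy after a time-to-live. `TableCache`, the 60-second TTL, and the loader block (standing in for a real MySQL query) are all assumptions for illustration.

```ruby
# Per-process cache that reloads from a loader block once its TTL expires.
class TableCache
  def initialize(ttl_seconds, &loader)
    @ttl = ttl_seconds
    @loader = loader   # stands in for the real MySQL query
    @lock = Mutex.new
    @loaded_at = nil
    @rows = nil
  end

  # Returns the cached rows, reloading them if they are missing or stale.
  def rows
    @lock.synchronize do
      now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
      if @loaded_at.nil? || now - @loaded_at >= @ttl
        @rows = @loader.call
        @loaded_at = now
      end
      @rows
    end
  end
end

load_count = 0
cache = TableCache.new(60) do
  load_count += 1
  [{ "id" => 1, "payload" => { "nested" => true } }] # fake rows
end

first = cache.rows    # triggers the loader
second = cache.rows   # served from memory, loader not called again
```

The trade-off is staleness: a process may serve data up to one TTL old, which is the price of avoiding cross-process coordination.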

Kind regards

robert