On 2013-01-14, at 6:33 AM, Eliezer C. [email protected]
wrote:
robert
Thanks Robert,
Hi,
Here are a bunch of questions to ask yourself that cover off things that
I’ve found helpful to know.
You might think about why you need to move from your combination of
TokyoCabinet and Redis. In particular, why not Redis? It’s a little
clearer why you’d move from TokyoCabinet. I’ve used it happily for years
myself, so I can imagine a bunch of reasons.
The scale is in a couple of directions:
- multiple physical nodes that the main DB is stored on.
Is this for reliability or performance? This sounds like a solution not
a requirement.
How big do you think it’ll be?
What kind of request rate are you thinking?
Do you care about latency? (you should) Throughput and latency are
pretty much independent variables when it comes to databases.
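To make the throughput/latency point concrete, here is a rough sketch in Python. The numbers are made up, and `throughput_per_s` is just an illustrative helper based on Little's law (requests in flight = throughput x latency), not anything from a particular database:

```python
def throughput_per_s(in_flight, latency_ms):
    """Little's law rearranged: requests/second = requests in flight / latency."""
    return in_flight * 1000 / latency_ms

# Hypothetical system A: each request is fast, but only one runs at a time.
fast_serial = throughput_per_s(in_flight=1, latency_ms=10)

# Hypothetical system B: each request is 10x slower, but 100 run concurrently.
slow_parallel = throughput_per_s(in_flight=100, latency_ms=100)

print(fast_serial, slow_parallel)
```

System B has ten times the latency and ten times the throughput, which is why you need to ask about each one separately.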
- master and secondary updates/replication
Again, this sounds like a solution not a requirement. What’s the issue
that makes you say this?
What is your read/write ratio? What is your write rate? Are you updating
or writing new data, and what’s the ratio of update to write? Do you
need secondary indexes? How many, what kind?
If you write to the master and then replicate, there’ll be a window
during which the various nodes return different results. Can your
application tolerate this, or do you need some kind of stronger
consistency constraint?
Are your updates/writes exposing you to consistency issues? (i.e. do you
need transactions?) If you update (or even write) multiple records, it’s
possible that the updates arrive in an essentially random order to the
replicas, and possibly in a different order to the different replicas.
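A toy illustration of the ordering problem, in Python. This assumes a naive "last write to arrive wins" replica with no versioning or timestamps. The `apply_updates` helper and the `balance` key are hypothetical, and real systems differ in how they resolve this:

```python
def apply_updates(updates):
    """Apply (key, value) writes in arrival order; the last one to arrive wins."""
    state = {}
    for key, value in updates:
        state[key] = value
    return state

# Two writes to the same record, issued by the master in this order.
writes = [("balance", 100), ("balance", 250)]

replica_a = apply_updates(writes)                  # arrives in order
replica_b = apply_updates(list(reversed(writes)))  # arrives reordered

print(replica_a, replica_b, replica_a == replica_b)
```

Both replicas received every update, yet they disagree about the final value. If that matters to your application, you need ordering guarantees, versioning, or transactions from whatever you choose.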
For now one machine will host the DB while it gets updates from a couple of sources,
such as humans and auto-testing tools.
This will be a dedicated DB machine, while there are other servers which get
updates from the master DB when needed.
The problem is that the updates are live and should be replicated with the
smallest delay possible.
What does “when needed” mean given that the updates should be “as soon
as possible”? I’m thinking that this master/slave setup you’re thinking
of is lifted from how you’d do it with TokyoCabinet or Redis. Things
like Cassandra or Riak or HBase don’t do it that way.
Are you ever going to have to scan your whole database? How often?
Cheers,
Bob