Tlv\btree data structure + Berkeley DB

aris · August 18, 2012, 9:44pm

I am using the SquidGuard blacklist for content filtering.
SquidGuard uses Berkeley DB to store domains, URLs and regexes.
Right now I am using Redis + MySQL: Redis as the in-memory cache and
MySQL to store the data.

SquidGuard's lookups are really fast, many times faster than MySQL, and
I want to try storing the domains in a Berkeley DB file as persistent
storage.
I have a domain blacklist file with one domain per line that I want to
store in the DB file.
I have tried to read about how Berkeley DB works, but I don't really
understand yet how it is used to store domains.

The original file is 17+ MB and I want to benefit from the DB for fast
lookups.
In MySQL the size of the DB + index is about 100 MB.
A Berkeley DB of the same data, as built by SquidGuard, is about
50-60 MB.

I want to benchmark Berkeley DB against MySQL or other DBs.

So:

Any basic suggestions on how to organize a TLV domains DB?
How do I organize the domains in an “ordered key-value” DB such as
Berkeley DB?
What are good ways to benchmark key lookups in a DB?
Are there other DBs you can recommend for the task?
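
To make the “ordered key-value” question concrete, here is a minimal
sketch of one possible key layout: reverse the labels of each domain so
that a domain and all of its subdomains sort next to each other. This is
an assumption for illustration only, not necessarily how SquidGuard lays
out its keys. The names domain_key and SortedDomainIndex are
hypothetical, and a sorted Ruby array with a binary search stands in for
the ordered store (no Berkeley DB calls are used here).

```ruby
# Reverse the labels so that subdomains sort right after their parent:
# "ads.example.com" -> "com.example.ads"
def domain_key(domain)
  domain.downcase.chomp(".").split(".").reverse.join(".")
end

# Stand-in for an ordered key-value store: a sorted array of keys.
class SortedDomainIndex
  def initialize(domains)
    @keys = domains.map { |d| domain_key(d) }.uniq.sort
  end

  # Exact match via binary search over the sorted keys.
  def exact?(domain)
    key?(domain_key(domain))
  end

  # True if the domain itself or any of its parent domains is listed.
  # Prefixes are built label by label, so "notexample.com" does not
  # match an entry for "example.com".
  def blocked?(domain)
    labels = domain_key(domain).split(".")
    (1..labels.size).any? { |n| key?(labels.first(n).join(".")) }
  end

  private

  # Array#bsearch in find-any mode: the block returns 0 on a match,
  # a positive number when the wanted key sorts after the element.
  def key?(key)
    !@keys.bsearch { |k| key <=> k }.nil?
  end
end

index = SortedDomainIndex.new(["example.com", "ads.tracker.net"])
puts index.exact?("example.com")        # listed as-is
puts index.blocked?("www.example.com")  # parent "example.com" is listed
puts index.blocked?("notexample.com")   # different domain, not listed
```

With keys stored this way, each parent-domain check is a plain key
lookup, so the same idea should carry over to any store that keeps keys
in sorted order.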

The API I want is “add(domain)”, “exist(domain)” and “remove(domain)”.
I am looking for code snippets and examples of using Berkeley DB from
Ruby with ruby-bdb (0.2.6.5).
I have seen the example in the GitHub repo, but more real-world usage
examples are what I am looking for.
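
As a rough sketch of the interface I have in mind, with a crude timing
of key lookups: the DomainStore class below is hypothetical, and a plain
Ruby Hash stands in for the Berkeley DB handle, so this does not show
the ruby-bdb API itself. It assumes that whatever backend is plugged in
responds to []=, [] and delete. The method is named exist? to follow
Ruby naming conventions.

```ruby
require "benchmark"

# Hypothetical wrapper exposing add / exist? / remove over any backend
# that supports []=, [] and delete (a Hash is used here as a stand-in).
class DomainStore
  def initialize(backend = {})
    @db = backend
  end

  def add(domain)
    @db[normalize(domain)] = "1"   # value is a placeholder; only the key matters
  end

  def exist?(domain)
    !@db[normalize(domain)].nil?
  end

  def remove(domain)
    @db.delete(normalize(domain))
  end

  private

  # Lowercase and drop a trailing dot so lookups are consistent.
  def normalize(domain)
    domain.downcase.chomp(".")
  end
end

# Crude lookup benchmark with made-up domain names.
store = DomainStore.new
domains = Array.new(10_000) { |i| "site#{i}.example" }
domains.each { |d| store.add(d) }

lookups = 100_000
seconds = Benchmark.realtime do
  lookups.times { |i| store.exist?(domains[i % domains.size]) }
end
puts format("%d lookups in %.3fs", lookups, seconds)
```

Running the same loop against each backend (MySQL, Redis, Berkeley DB)
with the real blacklist, and with a mix of present and missing domains,
would give comparable numbers.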

Thanks,
Eliezer