DHT upstream module + nginx patches

Hello,

We have been doing some nginx development for internal use. Tommie
already sent the statistics module to the list yesterday, separately,
because it was very self-contained. However, we have additional
changes that are not in a state suitable for inclusion upstream, but
we still want to publish the code in the hope that it may be useful to
someone, and to elicit feedback from interested people.

I am attaching two things: a module (spdht) which implements DHT-based
routing of requests to multiple upstream servers, and a patchset for
nginx itself (against 0.7.61) that is needed, in part, in order to
use the module.

In both cases the code is unpolished as a release, and we realize it
is not directly applicable to the typical nginx user. Even so, we
would rather release it than not, and at least interested people may
look at the code. Some parts may be suitable for selective inclusion.

The spdht module routes requests based on a hash of the URL being
requested. It needs some configuration in nginx itself (an example
nginx.conf is included in the tarball). In addition, the DHT ring is
configured through DNS. For a simple case with only two hosts (for
brevity), DNS is configured like this:

; DHT cluster options - replication level for each collection,
; and hash algorithm
config._service-name._http TXT "slaves=stuff:2 otherstuff:1" "hash=sha1"

; SRV records for the service
_service-name._http SRV 1000 1000 80 host1
_service-name._http SRV 1000 1000 80 host2

; TXT records containing DHT tokens
tokens.80.host1 TXT "FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF"
tokens.80.host2 TXT "7FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF"

This puts host1 and host2 into the ring, each responsible for half of
the keyspace.

For a given path P, the set of hosts responsible for that path is
calculated by hashing the path N times for N levels of redundancy
(note that the config TXT record specifies a slave count; this is
actually a misnomer, since no slave role exists; 1 slave -> 2
copies of a file).

In case of duplicate hosts, hashing continues (with some limit) until
N unique hosts have been found.
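The lookup described above might be sketched as follows. This is a
hypothetical Python illustration of the scheme, not the module's actual
code; the token values and host names follow the two-host DNS example
earlier, and `max_rounds` stands in for the unspecified rehash limit.

```python
import hashlib

def ring_owner(tokens, h):
    """tokens: list of (token, host) sorted by token ascending.
    The owner is the host whose token is the smallest one >= h."""
    for token, host in tokens:
        if h <= token:
            return host
    return tokens[0][1]  # wrap around the ring

def responsible_hosts(path, tokens, n_copies, max_rounds=32):
    """Hash the path repeatedly, collecting unique owners until
    n_copies hosts are found (or the round limit is hit)."""
    digest = hashlib.sha1(path.encode()).digest()
    hosts, rounds = [], 0
    while len(hosts) < n_copies and rounds < max_rounds:
        h = int.from_bytes(digest, "big")
        owner = ring_owner(tokens, h)
        if owner not in hosts:          # skip duplicate hosts
            hosts.append(owner)
        digest = hashlib.sha1(digest).digest()  # re-hash for next level
        rounds += 1
    return hosts

# Tokens from the DNS example: host1 owns the upper half of the
# 160-bit keyspace, host2 the lower half.
tokens = [(int("7" + "F" * 39, 16), "host2"),
          (int("F" * 40, 16), "host1")]
```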

Now, in terms of the patches to nginx itself, a short summary of the
approximate feature set is:

  • Add support for SHA1 in the caching module.
  • Support multi-threaded (one thread per disk) traversal of the
    cache during cache manager start-up.
  • Some tempfile allocation fixes, avoiding an infinite loop in
    certain failure modes (e.g. a broken disk).
  • Additional statistics (as submitted separately, but included here
    too).
  • Support failing quickly when workers are exhausted (e.g. due to
    broken disks or overload) rather than causing slow modes of
    failure (max_active_workers).
  • A posix_fdatasync()/posix_fadvise() hack to avoid buffer cache
    thrashing when pulling data into the cache (a synchronous call -
    it will block unrelated requests in the same worker).
  • The cache module uses a prefix instead of a postfix directory
    structure.
  • When allocating temp files, pass a prefix on to the tempfile so
    that the tempfile ends up in the same directory as the final
    file. This allows the prefix tree to be a symlink farm pointing
    to distinct drives, without breaking atomic rename() semantics.
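The last point can be illustrated with a hypothetical cache layout (all
paths here are made up for the example): prefix directories in the
cache root are symlinks onto distinct drives, and because the tempfile
is created under the same prefix as the final file, the final rename(2)
stays within one filesystem and remains atomic.

```shell
# Hypothetical layout: disk0/disk1 stand in for separate drives,
# cache/ is the nginx cache root with per-prefix symlinks.
mkdir -p disk0 disk1 cache
ln -sfn ../disk0 cache/0
ln -sfn ../disk1 cache/1
# Tempfile and final file share the prefix "0/", hence the same
# underlying disk, so the mv is a same-filesystem atomic rename:
touch cache/0/abc123.tmp
mv cache/0/abc123.tmp cache/0/abc123
ls disk0   # the final file physically lives on disk0
```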

On 9 October 2009 at 10:46, Peter Schüller [email protected] wrote:

[…]

I am attaching two things; a module (spdht) which implements DHT based
routing of requests to multiple upstream servers, and a patchset for
nginx itself (against 0.7.61) that are needed, in part, in order to
use the module.

Thank you for sharing! At least, I am interested.

[…] In addition the DHT ring is configured through DNS. […]

In a project of mine in academia this turned out to be a very good
approach - provided you cache the DNS responses in the configuration
reader, though.

 * Add support for SHA1 in the caching module. […]

There is no need to use cryptographic hashes in such an application.
They are slower than those I will mention below, and you don't really
need the property that the output bits reveal nothing about the input
bits. It is sufficient that every output bit toggles with a
probability of ~50%, i.e. near-perfect dispersion.

For faster hashing try Murmurhash64, FNV1A and friends:
[1] http://murmurhash.googlepages.com/
[2] FNV Hash
[3] A Hash Function for Hash Table Lookup (see the very bottom
of that page)

BTW, by replacing MD5, SHA1, etc. with one of those, you can speed up
a lot of key/value storages out there.

Thank you for sharing! At least, I am interested.

We're glad :)

 * Add support for SHA1 in the caching module. […]

There is no need to use cryptographic hashes in such an application.
They are slower than those I will mention below, and you don't really
need the property that the output bits reveal nothing about the input
bits. It is sufficient that every output bit toggles with a
probability of ~50%, i.e. near-perfect dispersion.

The only reason we moved to SHA1 was consistency with other (internal)
systems. Performance (of the hashing algorithm) is not an issue in our
use case.

I did not mean to imply that we felt MD5 was insufficient for the
purposes of the caching module.

I am interested, but I don't know what scenarios this module is
suitable for. Can you say more about it?

2009/10/9 Peter Schüller [email protected]:

Hello,

Sorry about the delay in responding.

I am interested, but I don't know what scenarios this module is
suitable for. Can you say more about it?

The intended use case of the upstream module, in combination with some
of the other changes, is the ability to use a cluster of nginx servers
for large amounts of file caching with redundancy and fail-over.

So, for example, you may have large amounts of data, far too much to
reasonably fit on a single server, to which you want to provide
high-throughput access. This means spreading the data around over
multiple machines, for three reasons:

(1) The data will not fit on a single server.
(2) A single server (even with many disks) may not be capable of
serving a sufficiently high request rate.
(3) You want files to be accessible when individual servers fail.

In addition you want to be able to dynamically add and remove hosts
from the system, to scale according to performance demands and/or disk
space demands and/or redundancy demands.

This is essentially what you can do with the DHT upstream module, the
patches to nginx itself, and the caching module in nginx. The main
components are the reading of configuration via DNS (making it
practical to maintain configuration in an authoritative fashion), the
actual routing (i.e., for a given request /path/to/some/resource,
producing the set of hosts which, according to the DHT hash ring,
should have a copy of the file), and the failover logic that is
capable of marking hosts as down.

The routing part allows re-routing a request to other members of the
same hash ring, but also supports forwarding the request to a parent
ring if the resource must be obtained further upstream (has not yet
been cached).

The use case assumes that at some point up the tree of hash rings,
there is some kind of authoritative storage (i.e., does not just cache
something upstream, but serves concrete files).
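Simplified to a single candidate per ring, that forwarding logic might
look like the following. This is a hypothetical Python sketch, not the
module's code: it tries only one owner per ring rather than all ring
members, and the host/ring names and the `down` set are made up for
the example.

```python
import hashlib

def ring_lookup(tokens, path):
    """Single-copy lookup: the owner is the host whose token is the
    smallest one >= the SHA-1 of the path (wrapping around)."""
    h = int.from_bytes(hashlib.sha1(path.encode()).digest(), "big")
    for token, host in sorted(tokens):
        if h <= token:
            return host
    return sorted(tokens)[0][1]

def route(path, local_ring, parent_ring, down):
    """Try the local ring first; if its responsible host is marked
    down, forward to the parent ring (which ultimately leads to
    authoritative storage). `down` is a set of failed hosts."""
    for label, ring in (("local", local_ring), ("parent", parent_ring)):
        host = ring_lookup(ring, path)
        if host not in down:
            return label, host
    raise RuntimeError("no live upstream for %s" % path)
```

With a two-host local ring and a parent ring, marking both local hosts
as down makes the request fall through to the parent ring.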

Hello

  • Support multi-threaded (one thread per disk) traversal of the
    cache during cache manager start-up.

Does this patch solve the nginx multi-threading problem, or can it
only be used for your DHT upstream module?

thanks

 * Support multi-threaded (one thread per disk) traversal of the
cache during cache manager start-up.

Does this patch solve the nginx multi-threading problem, or can it
only be used for your DHT upstream module?

The patches enable only as much multi-threading as is necessary for
the traversal; they do not make multi-threading in general work in
nginx.