Subject: keeping hash database in sync or very light-weight redundant database
To: None <netbsd-help@netbsd.org>
From: Jeremy C. Reed <reed@reedmedia.net>
List: netbsd-help
Date: 02/21/2007 13:02:35
I want to keep a nearly identical database on five or more (maybe up to 25)
different servers.
The database is currently in hash(3) format. It has around 3000 entries
(around 600KB of disk space), but it will grow to perhaps more than 100,000
entries.
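A rough back-of-the-envelope estimate of the growth figures above, assuming the file size scales linearly with the entry count (hash(3) bucket and page overhead may not scale exactly this way):

```latex
\[
\frac{600\ \text{KB}}{3000\ \text{entries}} \approx 200\ \text{bytes/entry},
\qquad
100{,}000\ \text{entries} \times 200\ \text{bytes/entry} \approx 20\ \text{MB}
\]
```

At that size, copying the whole file to 24 peers every minute would mean each server sending roughly 24 × 20 MB ≈ 480 MB per minute, which is one reason to ship only the changes.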
Each server reads the database once per minute. (The data is also loaded
by another program that may use it thousands of times per minute.)
Any suggestions on how I can easily share this data?
Note that every data entry has an expiration time.
Some ideas I have:
1) On every system, keep a log of the local additions to and deletions from
the database. Every minute, have each system copy its log to each of the
other servers, which then replay it to apply the additions and deletions.
But systems that are unavailable (even temporarily) will miss additions and
get out of sync. Missed deletions matter less, since the expiration time
removes those entries automatically (unless an entry was deleted manually
before it expired).
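One way to keep idea 1 from losing additions while a system is down is to give every log entry a per-origin sequence number and have each node remember the last number it applied from each peer; after an outage it simply asks for everything newer. This is a minimal sketch under that assumption: all names are hypothetical, and a plain dict stands in for the hash(3) database.

```python
"""Sketch of idea 1 with per-origin sequence numbers so a node that was
down can catch up. Names are hypothetical; a dict stands in for hash(3)."""
import time
from dataclasses import dataclass, field


@dataclass
class LogEntry:
    origin: str           # node that made the change
    seq: int              # strictly increasing per origin
    op: str               # "add" or "delete"
    key: str
    value: str = ""
    expires: float = 0.0  # absolute Unix time; ignored for "delete"


@dataclass
class Node:
    name: str
    db: dict = field(default_factory=dict)       # key -> (value, expires)
    log: list = field(default_factory=list)      # this node's own changes only
    applied: dict = field(default_factory=dict)  # origin -> last seq applied
    next_seq: int = 1

    def local_change(self, op, key, value="", expires=0.0):
        """Record a local add/delete in the log and apply it locally."""
        entry = LogEntry(self.name, self.next_seq, op, key, value, expires)
        self.next_seq += 1
        self.log.append(entry)
        self._apply(entry)

    def _apply(self, entry):
        if entry.op == "add":
            self.db[entry.key] = (entry.value, entry.expires)
        else:
            self.db.pop(entry.key, None)

    def entries_since(self, seq):
        """What a peer asks for: every local entry newer than `seq`."""
        return [e for e in self.log if e.seq > seq]

    def pull_from(self, peer):
        """Catch up from one peer; safe to repeat after an outage."""
        last = self.applied.get(peer.name, 0)
        for entry in peer.entries_since(last):
            self._apply(entry)
            self.applied[peer.name] = entry.seq

    def purge_expired(self, now=None):
        """Drop entries whose expiration time has passed."""
        now = time.time() if now is None else now
        for key in [k for k, (_, exp) in self.db.items() if exp <= now]:
            del self.db[key]
```

Each node would call `pull_from` for every peer once a minute; a peer that was unreachable is just retried next time, starting from the remembered sequence number. The log could be trimmed of entries older than the longest expiration time, since those changes no longer matter. This does not by itself resolve an add on one node racing a delete on another (the problem in idea 2).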
2) Somehow merge the databases from all available systems on each system.
But if one system adds an entry while another deletes the same entry, the
merged results won't be consistent.
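A common technique for the add-versus-delete conflict in idea 2 is a last-writer-wins merge with tombstones: a deletion is stored as a marked record rather than removed, and every node picks the same winner by comparing a timestamp plus a node name as tie-breaker. This is a swapped-in technique, not something hash(3) provides; it assumes roughly synchronized clocks (e.g. via NTP), and all names are hypothetical.

```python
"""Sketch: last-writer-wins merge with tombstones, so an add on one node
and a delete on another converge to the same result everywhere.
Assumes roughly synchronized clocks; names are hypothetical."""
from typing import Dict, NamedTuple


class Record(NamedTuple):
    mtime: float    # when the change was made
    node: str       # tie-breaker so every node picks the same winner
    deleted: bool   # True = tombstone ("this key was deleted")
    value: str
    expires: float  # absolute Unix time; the record can be dropped after this


def newer(a: Record, b: Record) -> Record:
    """Pick the winning version deterministically."""
    return a if (a.mtime, a.node) >= (b.mtime, b.node) else b


def merge(local: Dict[str, Record], remote: Dict[str, Record]) -> Dict[str, Record]:
    """Combine two databases; the result is the same in either order."""
    merged = dict(local)
    for key, rec in remote.items():
        merged[key] = newer(merged[key], rec) if key in merged else rec
    return merged


def live_view(db: Dict[str, Record], now: float) -> Dict[str, str]:
    """What readers should see: entries that are neither deleted nor expired."""
    return {k: r.value for k, r in db.items() if not r.deleted and r.expires > now}


def gc(db: Dict[str, Record], now: float) -> Dict[str, Record]:
    """Drop records, including tombstones, whose expiration time has passed."""
    return {k: r for k, r in db.items() if r.expires > now}
```

Giving a tombstone the same expiration time as the entry it deletes means it lives exactly as long as any stale copy could, so it can be garbage-collected safely once that time passes. Here the expiration times already in the data do most of the work.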
3) On every local addition or deletion, also send the details in a UDP
broadcast. Run a listener on every system that verifies the received data
and then applies the addition or deletion to its own database. (I can use a
packet filter to make sure nothing else can submit to it, or I can do this
over TCP through an SSH or SSL tunnel.) But again, the data will get out of
sync on systems that are unavailable.
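For the "verifies received data" step in idea 3, one option is to prefix each datagram with an HMAC-SHA256 tag computed with a key shared by all the servers, so the listener can reject anything not produced by a key holder. This is a minimal sketch: the shared key, the JSON message layout, and the port number are all assumptions, and HMAC authenticates the message but does not encrypt it.

```python
"""Sketch of idea 3's message format: JSON body plus an HMAC-SHA256 tag.
The key, message layout and port are hypothetical."""
import hashlib
import hmac
import json
import socket

SHARED_KEY = b"replace-with-a-real-secret"  # hypothetical; distribute out of band
TAG_LEN = 32                                # SHA-256 digest size in bytes


def encode(change: dict, key: bytes = SHARED_KEY) -> bytes:
    """Serialize a change and prepend its authentication tag."""
    body = json.dumps(change, sort_keys=True).encode()
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return tag + body


def decode(datagram: bytes, key: bytes = SHARED_KEY):
    """Return the change dict, or None if the tag does not verify."""
    tag, body = datagram[:TAG_LEN], datagram[TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):  # constant-time comparison
        return None
    return json.loads(body)


def broadcast(change: dict, port: int = 5999) -> None:  # port is arbitrary
    """Send one signed change to the local broadcast address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(encode(change), ("<broadcast>", port))
```

A valid datagram can still be captured and replayed later, so the message would probably also need a per-origin sequence number or timestamp that the listener checks against the last one it accepted from that sender.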
4) Maybe I need to use a more advanced version of Berkeley DB (or move
away from it). I see that db4 has Distributed Transactions and replication
groups. I even found example db4 code for a network-based master and
clients, with election priorities, where clients can become masters. I
don't know anything about this, but a system that uses elections to choose
the master database server seems like another way to do this.
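To illustrate what an election in idea 4 involves, here is a toy version: among the reachable nodes, prefer the most up-to-date log, then the highest priority, then the name as a final tie-breaker, with priority 0 meaning "may never become master". That ordering follows the general idea described in Berkeley DB's replication documentation, but this is not its protocol; the majority check and all names are assumptions made for the sketch.

```python
"""Toy master election among reachable nodes. Not Berkeley DB's actual
protocol; names and the majority rule are hypothetical."""
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    last_seq: int  # how far this node's copy of the change log goes
    priority: int  # 0 = may never become master


def elect(reachable: List[Candidate], total_sites: int) -> Optional[Candidate]:
    """Return the winning candidate, or None if no master can be chosen."""
    # Require a strict majority so a split network cannot elect two masters.
    if len(reachable) <= total_sites // 2:
        return None
    eligible = [c for c in reachable if c.priority > 0]
    if not eligible:
        return None
    # Most up-to-date log first, then priority, then name.
    return max(eligible, key=lambda c: (c.last_seq, c.priority, c.name))
```

The majority rule is what keeps a network split from producing two masters, but it also means a minority of isolated servers has no master at all, which sits awkwardly with the requirement below that every system be self-contained.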
I do not want a central database server. Every system must be
self-contained and cannot expect other servers to be available.
I do not want to use a heavy SQL server.
I'd prefer not to use DNS to store my data. (But if you can convince me
that it would be easiest, let me know.)
Jeremy C. Reed