Subject: troubleshoot hash(3) database issues
To: None <tech-userlevel@netbsd.org>
From: Jeremy C. Reed <reed@reedmedia.net>
List: tech-userlevel
Date: 08/30/2007 07:29:06
I am using the latest spamd from OpenBSD on multiple NetBSD/i386 3.1 systems.
It uses hash(3).
I continually get corrupted data. Sometimes when spamd reads in the
database, it says the data size is too large. For spamd the size should
always be 20, but dbd.size is often in the thousands instead. I added
debugging to spamd to tell me how big dbd.size was.
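A check along the following lines is enough to report the size. This is only a sketch and not the actual spamd change: struct dbt_like, EXPECTED_SIZE, and check_record_size are made-up names, and the struct merely stands in for the DBT from <db.h> so the example compiles on its own.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the DBT from <db.h>; hypothetical, for illustration only. */
struct dbt_like {
	void	*data;
	size_t	 size;
};

#define EXPECTED_SIZE 20	/* the record size spamd expects, as stated above */

/*
 * Return 0 if the record size is the expected one; otherwise log the
 * size that was seen and return -1.  "where" names the call site.
 */
static int
check_record_size(const struct dbt_like *dbd, const char *where)
{
	if (dbd->size != EXPECTED_SIZE) {
		fprintf(stderr, "%s: bogus dbd.size %zu (expected %d)\n",
		    where, dbd->size, EXPECTED_SIZE);
		return -1;
	}
	return 0;
}
```

Calling it right after every get or seq call, with a distinct "where" string each time, would at least show which read path first sees the bad size.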
The strange thing is that I can't reproduce it or see it when I list the
database with the spamdb utility or with db(1) (modified to add my -s
switch, which shows the size, as seen in my other email).
Any ideas on how I can debug this or troubleshoot my spamd db corruption?
I think the db file itself is fine, but the in-memory usage of it is
corrupted.
I compared src/lib/libc/db/hash/ on NetBSD and OpenBSD and saw some
differences. One is that in hash_buf.c, OpenBSD uses malloc followed by a
memset to 0xff, while NetBSD uses calloc (zero fill). For example:
         /* Allocate a new one */
-        if ((bp = calloc(1, sizeof(BUFHEAD))) == NULL)
+        if ((bp = (BUFHEAD *)malloc(sizeof(BUFHEAD))) == NULL)
                 return (NULL);
-        if ((bp->page = calloc(1, (size_t)hashp->BSIZE)) == NULL) {
+        memset(bp, 0xff, sizeof(BUFHEAD));
+        if ((bp->page = (char *)malloc(hashp->BSIZE)) == NULL) {
                 free(bp);
                 return (NULL);
         }
+        memset(bp->page, 0xff, hashp->BSIZE);
I don't understand the purpose of using 0xff instead of 0. It also adds a
second memset, for the page.
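One possible reading, which is only a guess and is not confirmed by either source tree, is that 0xff acts as a poison pattern: a field that is read before it has been set then shows up as an obviously wrong value instead of a plausible 0. The sketch below illustrates that difference; alloc_filled and read_u16 are hypothetical helpers, not functions from libc's hash code.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate n bytes and fill them with the given byte. */
static void *
alloc_filled(size_t n, int fill)
{
	void *p = malloc(n);

	if (p != NULL)
		memset(p, fill, n);
	return p;
}

/* Read a 16-bit value at a byte offset, without assuming alignment. */
static uint16_t
read_u16(const void *page, size_t off)
{
	uint16_t v;

	memcpy(&v, (const char *)page + off, sizeof(v));
	return v;
}
```

With a zero fill, read_u16() on a never-initialized page returns 0, which can pass for a valid count or offset. With a 0xff fill it returns 0xffff, which is much harder to mistake for real data.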
And OpenBSD does:
         if (do_free) {
-                if (bp->page)
+                if (bp->page) {
+                        (void)memset(bp->page, 0, hashp->BSIZE);
                         free(bp->page);
+                }
But I don't understand the value of that.
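Zeroing a buffer just before freeing it is commonly done so that stale contents do not linger in heap memory that a later allocation may reuse; whether that is the motivation here is an assumption. A minimal sketch of the pattern follows, with scrub and scrub_and_free as hypothetical names. It writes through a volatile pointer because a compiler is allowed to drop a plain memset whose result is never read before free.

```c
#include <stdlib.h>

/* Overwrite n bytes with zeros in a way the compiler should not elide. */
static void
scrub(void *p, size_t n)
{
	volatile unsigned char *vp = p;

	while (n-- > 0)
		*vp++ = 0;
}

/* Hypothetical helper: clear a buffer, then release it.  NULL is a no-op. */
static void
scrub_and_free(void *p, size_t n)
{
	if (p == NULL)
		return;
	scrub(p, n);
	free(p);
}
```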
OpenBSD also has a change described as "Avoid overwriting the cursor page
when the cursor page becomes the LRU page":
         bp = LRU;
+
+        /* It is bad to overwrite the page under the cursor. */
+        if (bp == hashp->cpage) {
+                BUF_REMOVE(bp);
+                MRU_INSERT(bp);
+                bp = LRU;
+        }
+
         /*
          * If LRU buffer is pinned, the buffer pool is too small. We need to
          * allocate more buffers.
I don't know if that is related to my corruption. (I may update my libc to
test with these ideas, but since I can't reproduce my problems manually, I
have to wait.)
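To make the cursor-page change easier to reason about, here is a toy model of the same idea: a circular list with a sentinel, where the victim for reuse is taken from the LRU end unless it is the page under the cursor, in which case that page is moved to the MRU end first. The struct layout and the function names (pool_init, buf_remove, mru_insert, pick_victim) are mine and only loosely mirror the macros in the diff; this is not the libc code.

```c
#include <stddef.h>

struct buf {
	struct buf *prev, *next;
	int id;
};

struct pool {
	struct buf head;	/* sentinel: head.next is MRU, head.prev is LRU */
	struct buf *cpage;	/* page under the cursor, or NULL */
};

static void
pool_init(struct pool *p)
{
	p->head.prev = p->head.next = &p->head;
	p->cpage = NULL;
}

/* Unlink b from whatever list it is on. */
static void
buf_remove(struct buf *b)
{
	b->prev->next = b->next;
	b->next->prev = b->prev;
}

/* Link b in at the most-recently-used end. */
static void
mru_insert(struct pool *p, struct buf *b)
{
	b->next = p->head.next;
	b->prev = &p->head;
	p->head.next->prev = b;
	p->head.next = b;
}

/*
 * Pick the buffer to reuse.  If the LRU buffer is the cursor page, move
 * it to the MRU end and take the next-oldest one instead.  As in the
 * diff, a pool holding only the cursor page still returns that page.
 */
static struct buf *
pick_victim(struct pool *p)
{
	struct buf *bp = p->head.prev;

	if (bp == &p->head)
		return NULL;	/* empty pool */
	if (bp == p->cpage) {
		buf_remove(bp);
		mru_insert(p, bp);
		bp = p->head.prev;
	}
	return bp;
}
```

Without the cpage test, the page a sequential scan is currently positioned on could be recycled for another page, and the next read through the cursor would then see unrelated bytes. That would at least be consistent with a wildly wrong size, though it is unverified as the cause here.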
Any ideas on how I can troubleshoot?
Jeremy C. Reed