Subject: New hash algorithm - FNV - for use in the kernel
To: None <tech-perform@netbsd.org>
From: Luke Mewburn <lukem@wasabisystems.com>
List: tech-kern
Date: 11/27/2001 15:52:18
--bp/iNruPH9dso1Pn
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
I recently added <sys/fnv_hash.h> (from FreeBSD), which provides an
implementation of the Fowler / Noll / Vo (FNV) hash, as described at:
http://www.isthe.com/chongo/tech/comp/fnv/
I've done some initial investigation and testing of the effectiveness
of this hash algorithm, and I'm getting similar improvements that
Peter Wemm (of FreeBSD) found when he switched parts of FreeBSD over
to using the FNV hash.
There is a potential issue however; FNV uses multiply (*) and xor (^),
whereas a lot of the existing hashes on strings and buffers just use
addition (+). Whilst FNV has much better distribution than the latter,
it will be slower on platforms without hardware multiply. The question
is as to whether the reduction in hash collisions (and the subsequent
following of collision list chains) is effective on these systems.
Using the nfs client node (nfsnodehash) as an example, here's some
results I achieved by:
- applying the attached diff to sys/nfs/nfs_node.c
- running the attached benchmark
- changing the value of nfs_hash_fnv to 1 to enable the
use of fnv_32_buf() in nfs_hash()
- re-running the attached benchmark
Results
-------
1) vmstat -h nfsnodehash:
total used util num average maximum
hash table buckets buckets % items chain chain
----------------------------------------------------------------------
with existing nfs_hash():
nfsnodehash 65536 1000 1.53 34688 34 105
using fnv_32_buf()
nfsnodehash 65536 27054 41.28 34867 1 6
There's much better distribution with the fnv_32 code.
2) kernel profiling, and examining the hits on nfs_nget() and nfs_hash():
% cumulative self self total
time seconds seconds calls ns/call ns/call name
--------------------------------------------------------------
with existing nfs_hash():
0.60 194.78 1.48 83303 17.77 21.47 nfs_nget
0.01 228.36 0.02 83303 0.24 0.24 nfs_hash
using fnv_32_buf()
0.03 213.41 0.06 83305 0.72 2.35 nfs_nget
0.02 213.87 0.05 83305 0.60 0.60 nfs_hash
nfs_hash() is more expensive in the second case, but nfs_nget is
much more efficient because of the reduced hash collisions.
3) The actual benchmark
This is an 'ls -Rf' of an nfs mountpoint which has
approximately 330,000 files. The actual time to run
this took approximately the same amount of time in
both cases.
It would be interesting to see the test results from other people.
Here's the patch, and the benchmark. You'll need a -current vmstat(1).
I'm going to cut over to using fnv_32_buf() as the guts of nfs_hash()
in a couple of days unless there's some results that show that this
will be a bad idea on the `no hardware multiply' platforms.
Luke.
--
Luke Mewburn <lukem@wasabisystems.com> http://www.wasabisystems.com
Luke Mewburn <lukem@netbsd.org> http://www.netbsd.org
Wasabi Systems - NetBSD hackers for hire
NetBSD - the world's most portable UNIX-like operating system
--bp/iNruPH9dso1Pn
Content-Type: text/plain; charset=us-ascii
Content-Description: nfs_node.c.diff
Content-Disposition: attachment; filename=diff
Index: nfs/nfs_node.c
===================================================================
RCS file: /cvsroot/syssrc/sys/nfs/nfs_node.c,v
retrieving revision 1.47
diff -p -p -u -r1.47 nfs_node.c
--- nfs/nfs_node.c 2001/11/10 10:59:09 1.47
+++ nfs/nfs_node.c 2001/11/27 04:49:37
@@ -53,6 +53,7 @@ __KERNEL_RCSID(0, "$NetBSD: nfs_node.c,v
#include <sys/malloc.h>
#include <sys/pool.h>
#include <sys/lock.h>
+#include <sys/fnv_hash.h>
#include <nfs/rpcv2.h>
#include <nfs/nfsproto.h>
@@ -143,6 +144,8 @@ nfs_nhdone()
pool_destroy(&nfs_vattr_pool);
}
+int nfs_hash_fnv = 0;
+
/*
* Compute an entry in the NFS hash table structure
*/
@@ -155,6 +158,9 @@ nfs_hash(fhp, fhsize)
u_long fhsum;
int i;
+ if (nfs_hash_fnv)
+ return fnv_32_buf(fhp->fh_bytes, fhsize, FNV1_32_INIT);
+
fhpp = &fhp->fh_bytes[0];
fhsum = 0;
for (i = 0; i < fhsize; i++)
--bp/iNruPH9dso1Pn
Content-Type: text/plain; charset=us-ascii
Content-Description: nfsnodebench
Content-Disposition: attachment; filename=nfsnodebench
#!/bin/sh
MOUNTPOINT=`mktemp -d /tmp/nfsXXXXXX` || exit 1
DATE=`date +'%y%m%d-%H%M%S'`
trap "umount $MOUNTPOINT; rmdir $MOUNTPOINT; exit 0" EXIT INT QUIT PIPE
LISTDIR=$MOUNTPOINT
(
mount -o rdonly argo:/z $MOUNTPOINT || exit 1
echo ""
echo "nfs_fnv_hash value:"
echo 'print nfs_hash_fnv' | gdb -q /netbsd
echo ""
echo ""
echo "nfs_node hash stats:"
vmstat -h nfsnodehash
echo ""
echo "starting kernel profile:"
kgmon -br
echo ""
echo "ls -Rf $LISTDIR"
echo "begin: "`date`
ls -Rf $LISTDIR >/dev/null
echo "end: "`date`
echo ""
echo "stopping kernel profile:"
kgmon -h
echo ""
echo "nfs_node hash stats:"
vmstat -h nfsnodehash
echo ""
echo "generating $DATE.gmon"
kgmon -p
mv gmon.out $DATE.gmon
echo ""
echo "generating $DATE.profile from $DATE.gmon"
gprof /netbsd.gdb $DATE.gmon > $DATE.profile
echo ""
) | tee $DATE.out
exit 0
--bp/iNruPH9dso1Pn--