Subject: libkvm problems on alpha, but not sparc or i386 (netbsd-1-6)
To: NetBSD/alpha Discussion List <port-alpha@NetBSD.ORG>
From: Greg A. Woods <woods@weird.com>
List: port-alpha
Date: 09/29/2004 19:37:40
I've been having some problems with various users of libkvm
w.r.t. getting proper proc, vnode, and mount data, etc. from the live
kernel, and I'm unsure if I've done something weird to bugger up my
source tree, or if this is a known bug on alpha, or what.
For example, fstat has long been giving me the classic "proc size mismatch"
even with a fresh new build, with a fresh new libkvm, running a fresh
new kernel all built at the same time from the same source tree (using a
safe, normal, build.sh style build).
fstat: proc size mismatch (1496 total, 1256 chunks)
The same source tree has given me no problems on i386 or sparc so I'm
assuming it has something to do with LP64 issues.....
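
As a rough illustration of what that message seems to be complaining about (a sketch only, not the actual libkvm code): assuming the first number is the byte count handed back by the kernel and the second is the record size the userland program was compiled with, the check amounts to asking whether one divides evenly into the other. The helper names below are hypothetical.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 when a buffer of `total` bytes divides
 * evenly into records of `chunk` bytes, and 0 otherwise.  A zero record
 * size is treated as a mismatch.
 */
int
sizes_agree(size_t total, size_t chunk)
{
	if (chunk == 0)
		return 0;
	return (total % chunk) == 0;
}

/*
 * Hypothetical helper: bytes left over after the last whole record.
 * A non-zero result means the two sides disagree about the record size.
 */
size_t
leftover_bytes(size_t total, size_t chunk)
{
	if (chunk == 0)
		return total;
	return total % chunk;
}
```

With the numbers from the message, sizes_agree(1496, 1256) is 0 and leftover_bytes(1496, 1256) is 240, so whatever produced the 1496 bytes was not using 1256-byte records.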
I did some searches of open PRs but didn't find anything that seemed
related or relevant.
So after scratching my head for way too long I decided to try patching
up my local tree with revs 1.58 and 1.59 of fstat.c (to use
kvm_getprocs2()).
As expected the proc list was now readable, but why wasn't it before?
However, now the only file descriptors fstat can give any reliable
information about are internet and unix sockets (and perhaps partly
pipes).
# ./fstat
USER CMD PID FD MOUNT INUM MODE SZ|DV R/W
root ksh 3053 wd - - ?(fffffc0 -
root ksh 3053 0* pipe 0xfffffc000d7d7068 <- 0xffffffffffffffff r
root ksh 3053 1 - - none -
root ksh 3053 2 - - none -
root ksh 3053 4 - - none -
root fstat 3052 wd - - ?(fffffc0 -
root fstat 3052 0 - - none -
root fstat 3052 1* pipe 0xfffffc000d7d73b0 -> 0xffffffffffffffff w
root fstat 3052 2 - - none -
root fstat 3052 3 - - none -
root fstat 3052 4 - - none -
root fstat 3052 5 - - none -
root fstat 3052 6 - - none -
root fstat 3052 7 - - ?(fffffc0 -
root fstat 3052 8 - - none -
root xterm 1045 wd - - ?(fffffc0 -
root xterm 1045 0* internet stream tcp fffffc0014ae3ca0 204.92.254.24:514 <-> 204.92.254.3:929
root xterm 1045 1* internet stream tcp fffffc0014ae3ca0 204.92.254.24:514 <-> 204.92.254.3:929
root xterm 1045 2* pipe 0xfffffc000d7d68c0 -> 0xffffffffffffffff w
root xterm 1045 3 - - none -
root xterm 1045 4 - - none -
root xterm 1045 5* internet stream tcp fffffc0014ae3ab8 204.92.254.24:65454 <-> 204.92.254.3:6000
root rshd 1043 wd - - ?(fffffc0 -
root rshd 1043 3* internet stream tcp fffffc0014ae38d0 204.92.254.24:1016 <-> 204.92.254.3:928
root rshd 1043 4* pipe 0xfffffc000d7d6f50 <- 0xffffffffffffffff rn
[[ .... ]]
root inetd 262 wd - - ?(fffffc0 -
root inetd 262 0 - - none -
root inetd 262 1 - - none -
root inetd 262 2 - - none -
root inetd 262 3* unix dgram fffffe000032a480 <-> fffffe0000288580
root inetd 262 4* internet stream tcp fffffc0011f82f40 *:21
root inetd 262 5* internet stream tcp fffffc0011f83128 *:23
root inetd 262 6* internet stream tcp fffffc0011f83310 *:514
root inetd 262 7* internet stream tcp fffffc0011f834f8 *:513
root inetd 262 8* internet stream tcp fffffc0011f836e0 *:79
root inetd 262 9* internet stream tcp fffffc0011f838c8 *:113
root inetd 262 10* internet stream tcp fffffc0011f83ab0 *:17
root inetd 262 11* internet dgram udp fffffc000fb010e0 *:518
root inetd 262 12* internet stream tcp fffffc0011f83c98 *:7
root inetd 262 13* internet stream tcp fffffc0014ae2008 *:9
root inetd 262 14* internet stream tcp fffffc0014ae21f0 *:13
root inetd 262 15* internet stream tcp fffffc0014ae23d8 *:37
root inetd 262 16* internet dgram udp fffffc000fb013b0 *:7
root inetd 262 17* internet dgram udp fffffc000fb01440 *:9
root inetd 262 18* internet dgram udp fffffc000fb014d0 *:13
root inetd 262 19* internet dgram udp fffffc000fb01560 *:37
Similarly systat can't read the mount table for reasons I can't quite
figure out. After adding some better error checking to the code the
best I could get were some new error messages and a whole bunch of
garbage in the second half of the bufcache display.
vmstat has similar breakage as well:
# vmstat -H
total used util num average maximum
hash table buckets buckets % items chain chain
bufhash 16384 499 3.05 535 1.07 3
vmstat: kptr 37: hash chain corrupted: kvm_read: Bad address
pstat seems able to print open files, though not vnodes:
# pstat -T
146/13196 files
pstat: vnode size mismatch
# pstat -v
pstat: vnode size mismatch
# pstat -f
146/13196 open files
LOC TYPE FLG CNT MSG DATA OFFSET
fffffc000d247638 inode WA 1 0 fffffc001492e948 0
fffffc000d2476c8 inode WA 1 0 fffffc001492eac0 6290
fffffc000d246048 inode RW 3 0 fffffc001503cf38 1152
fffffc000d2473b0 inode RW 3 0 fffffc000f50b638 0
fffffc000d247830 socket RW 1 0 fffffc0013a19850 0
[[ .... ]]
I'm guessing pretty much everything that still uses kvm_read() is
busted.
It's almost as if some commonly used data type is a different width
inside the kernel and out, or maybe /dev/kmem is busted, or maybe
something's wrong with the nlist reader, or....
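
To make the first of those guesses concrete, here is a minimal sketch of how a width disagreement would show up. The two structs are hypothetical and are not taken from any kernel header; fixed-width types stand in for what "long" and a pointer would be under an ILP32 and an LP64 compilation of the same declaration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record as laid out when long and pointers are 32 bits. */
struct rec_ilp32 {
	int32_t  pid;
	uint32_t next;		/* stands in for a 32-bit pointer */
	int32_t  flags;		/* stands in for a 32-bit long */
};

/* The same hypothetical record when long and pointers are 64 bits. */
struct rec_lp64 {
	int32_t  pid;
	uint64_t next;		/* stands in for a 64-bit pointer */
	int64_t  flags;		/* stands in for a 64-bit long */
};

/* Accessors so the two layouts can be compared at run time. */
size_t rec_ilp32_size(void) { return sizeof(struct rec_ilp32); }
size_t rec_lp64_size(void) { return sizeof(struct rec_lp64); }
size_t rec_ilp32_flags_off(void) { return offsetof(struct rec_ilp32, flags); }
size_t rec_lp64_flags_off(void) { return offsetof(struct rec_lp64, flags); }
```

If the writer of a buffer used one layout and the reader the other, both the total size and the offset of every field after the first would differ, which would be consistent with size mismatches and with pointers read from the wrong offset producing "Bad address" errors.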
I haven't made any local changes to any of the kernel data structures in
question, nor any of the type definitions, nor as far as I can tell to
anything else that could be related, and since as I say all works well
on i386 and sparc from the same source tree I'm at a bit of a loss.
Any hints or clues or suggestions about further tests I could do would
be much appreciated. Debugging some system-level stuff is a bit of a
nightmare as-is....
--
Greg A. Woods
+1 416 218-0098 VE3TCP RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com> Secrets of the Weird <woods@weird.com>