Subject: Re: Need advice how to debug VM problem on port-hp700.
To: Matt Thomas <matt@3am-software.com>
From: Jochen Kunz <jkunz@unixag-kl.fh-kl.de>
List: port-hp700
Date: 10/29/2003 10:13:22
On 2003.10.29 01:05 Matt Thomas wrote:
> It looks like you are getting a NULL pointer dereference.
Ahaaa!
> Why not drop into DDB there and look at the stack?
Because the last lines of output are:
kernel: DTLB miss trap, code=0
kernel: DTLB miss trap, code=0
--db_more--
So it looks like the kernel is already in ddb(4), but I can't get to the
ddb(4) prompt. Regardless of what I type, or when I send a serial break, it
just spits out a few more lines followed by the "--db_more--" "prompt".
Hmmm.
[single step session with ddb(4)]
There it is. There is one line in sys/nfs/nfs_boot.c:
if (error && nfs_boot_rfc951) {
and it crashes on the assembler instruction that loads the value of the
int nfs_boot_rfc951. If I understand the assembler instruction correctly,
it tries to load the word from address 0x168. (0x168!?) This int
variable is defined in nfs_boot.c together with the int
nfs_boot_bootparam. So I ran:
$ nm netbsd | egrep 'nfs_boot_bootparam|nfs_boot_rfc951|my_debug'
0060b48c D my_debug_nfs_boot
0060b490 D my_debug_nfs_bootdhcp
0000016c ? nfs_boot_bootparam
00000168 ? nfs_boot_rfc951
my_debug_nfs_boot is an int defined at the top of sys/nfs/nfs_boot.c to
control my debug printfs. When I remove my debug #define at the top
of the file (i.e. #define it to an empty macro) I get:
$ nm netbsd | egrep 'nfs_boot_bootparam|nfs_boot_rfc951|my_debug|curlwp'
0060bb04 D curlwp
0060b498 D my_debug_nfs_bootdhcp
0060b48c D my_debug_test_2
0060b494 D nfs_boot_bootparam
0060b490 D nfs_boot_rfc951
I use the same debug code in the same way in other files, with no problems
there. Am I overlooking some C side effect, or do we have a toolchain problem?
With the debug code removed I don't get a DTLB miss trap. Instead it hangs
in cpu_switchto when writing NULL to curlwp (the third assembler instruction).
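
To compare two nm listings like the ones above mechanically, something
like the following C sketch could flag the odd entries. It is only an
illustration: the names parse_nm_line and sym_is_suspect, and the
LOW_ADDR_LIMIT threshold of 0x1000, are assumptions made up for this
example and are not part of nm or the NetBSD tree.

```c
#include <stdio.h>
#include <string.h>

/* One parsed line of nm output: "<hex address> <type letter> <name>". */
struct nm_sym {
    unsigned long addr;
    char type;
    char name[128];
};

/* Parse a single nm line; returns 1 on success, 0 if the line does not match. */
static int parse_nm_line(const char *line, struct nm_sym *out)
{
    return sscanf(line, "%lx %c %127s",
                  &out->addr, &out->type, out->name) == 3;
}

/*
 * Heuristic only (an assumption, not a toolchain rule): treat a symbol as
 * suspect if nm could not classify its section ('?') or if its address
 * lies below LOW_ADDR_LIMIT, where no kernel data is expected to live.
 */
#define LOW_ADDR_LIMIT 0x1000UL

static int sym_is_suspect(const struct nm_sym *s)
{
    return s->type == '?' || s->addr < LOW_ADDR_LIMIT;
}
```

Fed the line "00000168 ? nfs_boot_rfc951" this would mark the symbol as
suspect, while "0060b490 D nfs_boot_rfc951" from the second listing would
pass. It only points at the strange symbols; it does not say why the
linker placed them there.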
--
tschüß,
Jochen
Homepage: http://www.unixag-kl.fh-kl.de/~jkunz/