Subject: Re: Boot Floppy Image Available
To: Curt Sampson <cjs@portal.ca>
From: Andrew Gallatin <gallatin@cs.duke.edu>
List: port-alpha
Date: 12/08/1997 17:32:32
Curt Sampson writes:
> On Mon, 8 Dec 1997, Andrew Gallatin wrote:
>
> > Well, I've finally had time to play with it, and the problem appears
> > to be isolated to the kernel on that floppy. Neither the generic
> > kernel from the last 1.3-ALPHA snapshot (NetBSD 1.3_ALPHA (GENERIC)
> > #2: Sat Nov 15 17:04:50 PST 1997) nor a kernel built from this
> > morning's source tarballs exhibits the problem.
>
> Did you build a GENERIC or an INSTALL kernel from this morning's
> source tarballs?
It was a GENERIC kernel that I slimmed down to match my hardware.
It seems that building & booting the INSTALL kernel exhibits the same
problem -- the interrupt dispatch code jumps into never-never land
immediately after bringing up de1.
...
de0 at pci1 dev 0 function 0
de0: interrupting at kn20aa irq 16
de0: DEC 21040 [10Mb/s] pass 2.3
de0: address 08:00:2b:e7:e6:d6
...
de1 at pci0 dev 9 function 0
de1: interrupting at kn20aa irq 12
de1: DEC DE500-XA 21140 [10-100Mb/s] pass 1.1
de1: address 00:00:f8:00:99:ba
Looking at kn20aa_pci_intr[] in a coredump, I see:
For de0:
(gdb) p kn20aa_pci_intr[16]->intr_q->tqh_first
$35 = (struct alpha_shared_intrhand *) 0xfffffe004a597e80
(gdb) p *kn20aa_pci_intr[16]->intr_q->tqh_first
$36 = {ih_q = {tqe_next = 0x0, tqe_prev = 0xfffffe004a59be00},
ih_fn = 0xfffffc0000425728 <tulip_intr_normal>, ih_arg = 0xfffffe004a59f000,
ih_level = 2}
For de1:
(gdb) p kn20aa_pci_intr[12]->intr_q->tqh_first
$37 = (struct alpha_shared_intrhand *) 0xfffffe004a597580
(gdb) p *kn20aa_pci_intr[12]->intr_q->tqh_first
$38 = {ih_q = {tqe_next = 0xfffffc000077f318, tqe_prev = 0xfffffc000077f470},
ih_fn = 0xfffffc000077f628 <cia_configuration+792>,
ih_arg = 0xfffffe0000000001, ih_level = 2}
The ih_fn and most of ih_q are obviously bogus for de1.
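For anyone following along, here is a rough sketch of why a trashed
ih_fn is fatal in a shared-interrupt dispatch loop. This is NOT the
actual NetBSD/alpha code: the sketch_* names and the simplified
singly-linked queue are my own stand-ins, and only the ih_fn, ih_arg
and ih_level field names are taken from the gdb output above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a shared interrupt handler record. */
struct sketch_intrhand {
    struct sketch_intrhand *next;  /* plays the role of ih_q.tqe_next */
    int (*ih_fn)(void *);          /* handler to call */
    void *ih_arg;                  /* argument passed to the handler */
    int ih_level;
};

/* Hypothetical stand-in for one IRQ's handler queue. */
struct sketch_intr {
    struct sketch_intrhand *first; /* plays the role of intr_q.tqh_first */
};

/*
 * Walk the queue for one IRQ and call every handler on it.
 * Returns the number of handlers that claimed the interrupt.
 */
static int sketch_dispatch(struct sketch_intr *irq)
{
    int claimed = 0;

    assert(irq != NULL);
    for (struct sketch_intrhand *ih = irq->first; ih != NULL; ih = ih->next) {
        /*
         * Nothing validates ih_fn here. If the record has been
         * overwritten, this indirect call goes wherever the stale
         * bits point, and a bogus next pointer keeps the walk going.
         */
        if (ih->ih_fn(ih->ih_arg))
            claimed++;
    }
    return claimed;
}

/* Trivial handler for the sketch: counts calls and claims the interrupt. */
static int sketch_handler(void *arg)
{
    int *count = arg;

    (*count)++;
    return 1;
}
```

With a record like the de1 one above, the very first interrupt on
irq 12 would call through the bad ih_fn, which would match a crash
right after de1 comes up.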
> We've sometimes seen problems with kernels that don't have DIAGNOSTIC
> or other options turned on. If the INSTALL kernel still fails, the
> best thing to do is to start adding back the things in GENERIC but
> not INSTALL, one by one, until we figure out which option it is
> that causes things to break.
>
I've tried removing DIAGNOSTIC, adding the MEMORY_DISK options, and
changing the load address back to 0xfffffc0000300000 in my 'GENERIC'
kernel, and it still works. I'll try some other options later. Is it
possible that it's just a size problem? The install kernel is
quite large:
   text    data     bss     dec     hex filename
1330392   95184  357984 1783560  1b3708 /netbsd (my normal, slim kernel)
1624768 2217456  881200 4723424  4812e0 /netbsd.install (the failing kernel)
1330848 2192400  358048 3881296  3b3950 /netbsd.testing (the still-working kernel)
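To put numbers on the size question, here is a throwaway sketch that
totals each row and prints the per-segment difference between two
kernels. The figures are the ones from the size output above; the
struct and function names are made up for the example.

```c
#include <assert.h>
#include <stdio.h>

/* One row of size(1) output. */
struct seg_sizes {
    const char *name;
    long text, data, bss;
};

/* Figures copied from the size output quoted in this message. */
static const struct seg_sizes slim_kernel =
    { "/netbsd", 1330392, 95184, 357984 };
static const struct seg_sizes install_kernel =
    { "/netbsd.install", 1624768, 2217456, 881200 };
static const struct seg_sizes testing_kernel =
    { "/netbsd.testing", 1330848, 2192400, 358048 };

/* Sum of the three segments, i.e. the "dec" column. */
static long total(const struct seg_sizes *s)
{
    assert(s != NULL);
    return s->text + s->data + s->bss;
}

/* Print how much larger (or smaller) each segment of a is than b. */
static void print_delta(const struct seg_sizes *a, const struct seg_sizes *b)
{
    assert(a != NULL && b != NULL);
    printf("%s vs %s: text %+ld, data %+ld, bss %+ld, total %+ld\n",
           a->name, b->name,
           a->text - b->text, a->data - b->data, a->bss - b->bss,
           total(a) - total(b));
}
```

Calling print_delta(&install_kernel, &testing_kernel) shows the
failing kernel is 842128 bytes larger overall, with only 25056 bytes
of that in data; the rest is text (293920) and bss (523152). That
doesn't prove or rule out a size problem, it just shows where the
extra bytes are.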
And I've found another problem -- if I boot NetBSD twice on this
machine with the tga driver built in, the second boot always fails
with a panic in the tga code:
tga0 at pci0 dev 11 function 0panic: tga_bt485_init: already have private struct
halted.
If I comment out the test at line 165 in tga_bt485.c, this panic goes
away. I don't care about this, since I'm running with a serial
console (hence the hackish "fix"), but thought I should bring it up.
Power-cycling the box also clears it, so it looks like some state is
surviving a warm reboot. Perhaps there's a missing bzero someplace?
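To illustrate what I mean by a missing bzero, here is a userland
sketch of the failure pattern I'm guessing at. It is NOT the code in
tga_bt485.c: the sketch_* names, the struct layout and the 64-byte
allocation are invented, and I'm only assuming that the init routine
refuses to run when a private pointer is already set.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical device config holding a pointer to RAMDAC-private state. */
struct sketch_devconfig {
    void *ramdac_private;
};

/*
 * Hypothetical init routine. Returns 0 on success, or -1 where the
 * kernel message above suggests the real driver panics.
 */
static int sketch_ramdac_init(struct sketch_devconfig *dc, int alloc)
{
    assert(dc != NULL);
    if (alloc) {
        if (dc->ramdac_private != NULL)
            return -1;                 /* "already have private struct" */
        dc->ramdac_private = malloc(64);   /* size is arbitrary here */
        if (dc->ramdac_private == NULL)
            return -1;
    }
    return 0;
}
```

If the config struct lived in memory that isn't cleared on a warm
reboot, the pointer would still be non-NULL on the second boot and
the check would trip; a memset(dc, 0, sizeof *dc) (the bzero
equivalent) before calling the init routine would avoid that.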
Drew