Port-mips archive


Re: NetBSD/sgimips 10.0 RELEASE on SGI Challenge S: INSTALL32_IP2x works, GENERIC32_IP2x kernel panics on boot after install



FYI: I finally got it to boot, with the ELF kernel no less.

I made many changes while trying to get remote-gdb working (mostly to work around toolchain issues I hit building a toolchain with gdb on macOS 14.5), but the key change seems to be the one below; I kept hitting the same problem right up until I made it. In the end, I didn’t even need to use kgdb.

When trying to get remote-gdb working with `boot --d` (after many small changes to get the KGDB config to build), I noticed that kgdb_connect(0) in mach_init() was causing a similar panic to what I had been seeing all along with the ELF kernel.

I finally realized that the “PANIC: Unexpected exception” string here was coming from the ARCS PROM, not the NetBSD kernel itself, which made me wonder whether we were supposed to be running on PROM vectors at this point in the init process.
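
For anyone reading the PROM dumps quoted further down, here is a rough sketch of how the Cause register values in them break down. It assumes the standard R4000-style CP0 Cause layout (ExcCode in bits 6..2, IP in bits 15..8, CE in bits 29..28); the function names are made up for illustration and do not come from the NetBSD tree.

```c
#include <stdint.h>

/*
 * Field extractors for a 32-bit MIPS CP0 Cause register value.
 * Assumes the R4000-style layout; names are illustrative only.
 */

/* Exception code, bits 6..2 (2 = TLBL: TLB miss on load or instruction fetch). */
unsigned cause_exccode(uint32_t cause)
{
    return (cause >> 2) & 0x1f;
}

/* Pending-interrupt bits, bits 15..8. */
unsigned cause_ip(uint32_t cause)
{
    return (cause >> 8) & 0xff;
}

/* Coprocessor number for coprocessor-unusable exceptions, bits 29..28. */
unsigned cause_ce(uint32_t cause)
{
    return (cause >> 28) & 0x3;
}
```

With these, 0x30008008 (the IP22 dumps) gives ExcCode 2, CE 3 and IP bits 0x80, and 0x8008 (the O2 dump) gives ExcCode 2, CE 0 and IP bits 0x80. That matches what the PROM prints as EXC=RMISS with CE=3 or CE=0, assuming the PROM's "IP8" is its own 1-based name for bit 15. ExcCode 2 is also what the 9.3 ECOFF kernel reports as cause=0x8, "TLB miss (load or instr. fetch)".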

Checking machdep.c for evbmips (mipssim and loongson), I saw that the call to mips_vector_init() appears to be made much earlier on those platforms, just ahead of uvm_md_init().  So I tried moving the sgimips call up to the same spot, and voilà: I was able to boot all the way to multiuser.

> Tue May 21 20:54:49 PDT 2024
> 
> NetBSD/sgimips (hoth.astro.net) (console)
> 
> login: root
> Password:
> May 21 20:54:55 hoth login: ROOT LOGIN (root) on tty console
> Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
>     2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013,
>     2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023,
>     2024
>     The NetBSD Foundation, Inc.  All rights reserved.
> Copyright (c) 1982, 1986, 1989, 1991, 1993
>     The Regents of the University of California.  All rights reserved.
> 
> NetBSD 10.0_STABLE (INSTALL32_IP2x) #6: Tue May 21 22:30:10 PDT 2024
> 
> Welcome to NetBSD!
> 
> You have new mail.
> Terminal type is vt100.                                                 
> We recommend creating a non-root account and using su(1) for root access.
> hoth# 


My next step will be to try rebuilding an unmodified GENERIC32_IP2x kernel to confirm that it’s still fine.

Here is the change.  Perhaps someone can confirm this on their own system?

> diff --git a/sys/arch/sgimips/sgimips/machdep.c b/sys/arch/sgimips/sgimips/machdep.c
> index 0a4bbd329..9ba1914ae 100644
> --- a/sys/arch/sgimips/sgimips/machdep.c
> +++ b/sys/arch/sgimips/sgimips/machdep.c
> @@ -286,6 +286,13 @@ mach_init(int argc, int32_t argv32[], uintptr_t magic, int32_t bip32)
>  
>         cpu_setmodel("%s", arcbios_system_identifier);
>  
> +       /*
> +        * Copy exception-dispatch code down to exception vector.
> +        * Initialize locore-function vector.
> +        * Clear out the I and D caches.
> +        */
> +       mips_vector_init(NULL, false);
> +
>         uvm_md_init();
>  
>         /* set up bootinfo structures */
> @@ -661,13 +668,6 @@ mach_init(int argc, int32_t argv32[], uintptr_t magic, int32_t bip32)
>          */
>         arcbios_tree_walk(sgimips_count_cpus, NULL);
>  
> -       /*
> -        * Copy exception-dispatch code down to exception vector.
> -        * Initialize locore-function vector.
> -        * Clear out the I and D caches.
> -        */
> -       mips_vector_init(NULL, false);
> -
>         /*
>          * Initialize error message buffer (at end of core).
>          */
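
To picture why the ordering could matter, here is a toy model in plain C. It is not kernel code, and every name in it is hypothetical. It only assumes that until mips_vector_init() runs, any exception is still serviced by whatever the PROM left installed, and that some early init step may take an exception.

```c
#include <stdbool.h>

/* Who ends up servicing an exception in this toy model. */
enum handler { HANDLED_BY_PROM, HANDLED_BY_KERNEL };

/* Stand-in for whatever the exception vector currently points at. */
static enum handler (*toy_vector)(void);

static enum handler toy_prom_handler(void)   { return HANDLED_BY_PROM; }
static enum handler toy_kernel_handler(void) { return HANDLED_BY_KERNEL; }

/* Hypothetical analogue of mips_vector_init(): install the kernel handler. */
static void toy_vector_init(void)
{
    toy_vector = toy_kernel_handler;
}

/* Hypothetical early-init step that happens to take an exception. */
static enum handler toy_faulting_step(void)
{
    return toy_vector();
}

/*
 * Model of the start of mach_init(): control arrives with the PROM handler
 * still installed.  Returns who serviced the exception raised by the early
 * step, depending on whether the vectors were installed before or after it.
 */
enum handler toy_mach_init(bool vectors_first)
{
    enum handler seen;

    toy_vector = toy_prom_handler;
    if (vectors_first)
        toy_vector_init();
    seen = toy_faulting_step();
    if (!vectors_first)
        toy_vector_init();
    return seen;
}
```

In this model, toy_mach_init(false) reports the PROM handler (the analogue of “PANIC: Unexpected exception”), while toy_mach_init(true) reports the kernel handler. Whether the real early exception is benign once the kernel's own handlers are in place is exactly the part I can't confirm.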

I don’t know the implementation well enough to say why the original ordering would work on some configurations and not others, but my guess is that it has to do with differences in MMU implementation between the CPUs.

Thanks,

Tim McIntosh


> On May 21, 2024, at 4:24 PM, Tim McIntosh <tmcintos%eskimo.com@localhost> wrote:
> 
>> On May 21, 2024, at 12:26 AM, <iwama%t3.rim.or.jp@localhost> <iwama%t3.rim.or.jp@localhost> wrote:
>> I had a similar problem on NetBSD-9.3.
>> A similar problem occurred on O2 with NetBSD-9.4.
> 
> Thanks for this information!
> 
>> I get around this problem by putting the kernel in IRIX's / partition and starting it using sash.
> 
> Interesting. It looks like I still have sash in the volume header, but I no longer have any IRIX partitions on this machine, unfortunately.
> 
>> I suspect that there is a problem with the bootloader.
> 
> 
> I rolled back to the 5.0.2 boot loader and it appears to exhibit the same problems with the problematic kernels. E.g.
> 
>>>> boot
>> 
>> NetBSD/sgimips 5.0.2 Bootstrap, Revision 1.5
>> (builds%b7.netbsd.org@localhost, Sat Feb  6 21:26:53 UTC 2010)
>> 
>> devopen: scsi(0)disk(3)rdisk(0)partition(0) type scsi file netbsd.gdb
>> 3446288+113248 [243344+233661]=0x3d9ca4
>> 
>> Exception: <vector=UTLB Miss>
>> Status register: 0x2<IPL=8,MODE=KERNEL,EXL>
>> Cause register: 0x30008008<CE=3,IP8,EXC=RMISS>
>> Exception PC: 0x0, Exception RA: 0x881eadb4
>> exception, bad address: 0x0
>> Saved user regs in hex (&gpda 0xa8740e48, &_regs 0xa8741048):
>> arg: a8740000 0 5f 10
>> tmp: a8740000 883c0000 388c 1 88400be4 8832c15c 883bb960 1fae
>> sve: a8740000 0 0 0 0 0 0 0
>> t8 a8740000 t9 0 at 0 v0 0 v1 0 k1 883b0000
>> gp a8740000 fp 0 sp 0 ra 0
>> 
>> PANIC: Unexpected exception
>> 
>> [Press reset or ENTER to restart.]
> 
> 
> It is able to boot the old kernel that I still had on the local disk, which turned out to be a 5.1.2 kernel that I built myself (I must have done this to work around the problem that I had booting the 5.1.2 install image back in 2012):
> 
>> NetBSD 5.1.2 (GENERIC32_IP2x) #0: Mon May  7 04:43:09 PDT 2012
>>       root%hoth.astro.net@localhost:/usr/src/sys/arch/sgimips/compile/GENERIC32_IP2x
> 
> Of course, that kernel now fails to mount the updated root file system, presumably due to incompatible FFS changes since 5.1.2.
> 
> But if this is indeed a boot loader issue, it seems that it must be a longstanding bug that is sensitive to something about the kernel image that has changed since 5.1.2.
> 
> Thanks again,
> 
> Tim McIntosh
> 
> 
>> On May 21, 2024, at 12:26 AM, <iwama%t3.rim.or.jp@localhost> <iwama%t3.rim.or.jp@localhost> wrote:
>> 
>> Hi all,
>> 
>> A similar problem occurred on O2 with NetBSD-9.4.
>> 
>>> hinv
>>                 System: IP32
>>              Processor: 400 Mhz R12000, with FPU
>>   Primary I-cache size: 32 Kbytes
>>   Primary D-cache size: 32 Kbytes
>>   Secondary cache size: 2 Mbytes
>>            Memory size: 448 Mbytes
>>               Graphics: CRM, Rev C
>>                  Audio: A3 version 1
>>              SCSI Disk: scsi(0)disk(2)
>>             SCSI CDROM: scsi(0)cdrom(4)
>>              SCSI Disk: scsi(1)disk(2)
>>              SCSI Disk: scsi(1)disk(3)
>>> version
>> 
>> 
>> PROM Monitor (BE)
>> Tue Oct 22 10:58:00 PDT 2002 
>> VERSION 4.18
>> O2 R5K/R7K/R10K/R12K
>> IRIX 6.5.x IP32prom IP32PROM-v4
>> 
>>> boot -f scsi(1)disk(2)rdisk(0)partition(8)boot scsi(1)disk(2)rdisk(0)partition(0)netbsd
>> 54464+1792 entry: 0x80002000
>> 
>> NetBSD/sgimips 9.4 Bootstrap, Revision 1.5 (Sat Apr 20 13:32:22 UTC 2024)
>> 
>> devopen: scsi(1)disk(2)rdisk(0)partition(0) type scsi file netbsd
>> 7738672+131920 [396528+388532]=0x8419b0
>> 
>> Exception: <vector=Normal>
>> Status register: 0x2<IPL=8,MODE=KERNEL>
>> Cause register: 0x8008<CE=0,IP8,EXC=RMISS>
>> Exception PC: 0x0, Exception RA: 0x8048a508
>> Read TLB miss exception, bad address: 0x0
>> Saved user regs in hex (&gpda 0x81061838, &_regs 0x81061a38):
>> arg: 81070000 46 49 2
>> tmp: 81070000 305e 5be6 80842bfc 80842bfc 8083dd5c 80842bfc 4
>> sve: 81070000 0 0 0 0 0 0 0
>> t8 81070000 t9 0 at 0 v0 0 v1 0 k1 807c0000
>> gp 81070000 fp 0 sp 0 ra 0
>> 
>> PANIC: Unexpected exception
>> 
>> [Press reset or ENTER to restart.]
>> -- 
>> Yoshihiko Iwama
>> 
>> -----Original Message-----
>> From: iwama%t3.rim.or.jp@localhost <iwama%t3.rim.or.jp@localhost> 
>> Sent: Tuesday, May 21, 2024 3:42 PM
>> To: 'port-mips%NetBSD.org@localhost' <port-mips%NetBSD.org@localhost>
>> Subject: RE: NetBSD/sgimips 10.0 RELEASE on SGI Challenge S: INSTALL32_IP2x works, GENERIC32_IP2x kernel panics on boot after install
>> 
>> Hi all,
>> 
>> I had a similar problem on NetBSD-9.3.
>> 
>>>> hinv
>>                 System: IP22
>>              Processor: 200 Mhz R4400, with FPU
>>   Primary I-cache size: 16 Kbytes
>>   Primary D-cache size: 16 Kbytes
>>   Secondary cache size: 1024 Kbytes
>>            Memory size: 256 Mbytes
>>               Graphics: Indy 24-bit
>>              SCSI Disk: scsi(0)disk(1)
>>              SCSI Disk: scsi(0)disk(2)
>>              SCSI Disk: scsi(0)disk(3)
>>             SCSI CDROM: scsi(0)cdrom(6)
>>                  Audio: Iris Audio Processor: version A2 revision 4.1.0
>>>> boot
>> 
>> NetBSD/sgimips 9.3 Bootstrap, Revision 1.5 (Thu Aug  4 15:30:37 UTC 2022)
>> 
>> devopen: scsi(0)disk(3)rdisk(0)partition(0) type scsi file netbsd.ecoff
>> 4929840+127440=0x4b3930
>> [   1.0000000] [ Kernel symbol table invalid! ]
>> [   1.0000000] phys segment: 0xe000 @ 0x8002000
>> [   1.0000000] adding 0xe000 @ 0x8002000 to freelist 0
>> [   1.0000000] phys segment: 0x730000 @ 0x8010000
>> [   1.0000000] adding 0x58000 @ 0x8010000 to freelist 0
>> [   1.0000000] adding 0x22c000 @ 0x8514000 to freelist 0
>> [   1.0000000] phys segment: 0xf800000 @ 0x8800000
>> [   1.0000000] adding 0xf800000 @ 0x8800000 to freelist 0
>> [   1.0000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
>> [   1.0000000]     2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
>> [   1.0000000]     2018, 2019, 2020, 2021, 2022
>> [   1.0000000]     The NetBSD Foundation, Inc.  All rights reserved.
>> [   1.0000000] Copyright (c) 1982, 1986, 1989, 1991, 1993
>> [   1.0000000]     The Regents of the University of California.  All rights reserved.
>> 
>> [   1.0000000] NetBSD 9.3 (GENERIC32_IP2x) #0: Thu Aug  4 15:30:37 UTC 2022
>> [   1.0000000] mkrepro%mkrepro.NetBSD.org@localhost:/usr/src/sys/arch/sgimips/compile/GENERIC32_IP2x
>> [   1.0000000] total memory = 256 MB
>> [   1.0000000] (768 KB reserved for ARCS)
>> [   1.0000000] avail memory = 245 MB
>> [   1.0000000] WARNING: module error: built-in module sequencer can't find builtin dependency `midi'
>> [   1.0000000] WARNING: module error: built-in module sequencer prerequisite midi failed, error 2
>> [   1.0000000] mainbus0 (root): SGI-IP22 [SGI, 6909c4ea], 1 processor
>> [   1.0000000] cpu0 at mainbus0: MIPS R4400 CPU (0x460) Rev. 6.0 with MIPS R4010 FPC Rev. 0.0
>> [   1.0000000] cpu0: 48 TLB entries, 16MB max page size
>> [   1.0000000] cpu0: 16KB/16B direct-mapped L1 instruction cache
>> [   1.0000000] cpu0: 16KB/16B direct-mapped write-back L1 data cache
>> [   1.0000000] cpu0: 1024KB/128B direct-mapped write-back L2 unified cache
>> [   1.0000000] int0 at mainbus0 addr 0x1fbd9880
>> [   1.0000000] int0: bus 100MHz, CPU 200MHz
>> [   1.0000000] imc0 at mainbus0 addr 0x1fa00000: revision 3
>> [   1.0000000] gio0 at imc0
>> [   1.0000000] newport0 at gio0: SGI NG1 (board revision 1, cmap revision 5, xmap revision 5, vc2 revision 0), depth 24
>> [   1.0000000] wsdisplay0 at newport0 kbdmux 1
>> [   1.0000000] hpc0 at gio0: SGI HPC3 (onboard)
>> [   1.0000000] zsc0 at hpc0 offset 0x59830
>> [   1.0000000] zstty0 at zsc0 channel 1 (console i/o)
>> [   1.0000000] zstty1 at zsc0 channel 0
>> [   1.0000000] pckbc0 at hpc0 offset 0x59840
>> [   1.0000000] pckbd0 at pckbc0 (kbd slot)
>> [   1.0000000] wskbd0 at pckbd0 mux 1
>> [   1.0000000] pms0 at pckbc0 (aux slot)
>> [   1.0000000] wsmouse0 at pms0 mux 0
>> [   1.0000000] sq0 at hpc0 offset 0x54000: SGI Seeq 80c03
>> [   1.0000000] sq0: Ethernet address 08:00:69:09:c4:ea
>> [   1.0000000] wdsc0 at hpc0 offset 0x44000: WD33C93B (20.0 MHz clock, BURST DMA, SCSI ID 0)
>> [   1.0000000] wdsc0: microcode revision 0x0d, Fast SCSI
>> [   1.0000000] scsibus0 at wdsc0: 8 targets, 8 luns per target
>> [   1.0000000] haltwo0 at hpc0 offset 0x58000: HAL2 revision 4.1.0
>> [   1.0000000] audio0 at haltwo0: playback
>> [   1.0000000] audio0: slinear_be:16 2ch 48000Hz, blk 4096 bytes (21.3ms) for playback
>> [   1.0000000] spkr0 at audio0: PC Speaker (synthesized)
>> [   1.0000000] wsbell at spkr0 not configured
>> [   1.0000000] pi1ppc0 at hpc0 offset 0x59800
>> [   1.0000000] pi1ppc0: capabilities=0x8<PS2>
>> [   1.0000000] ppbus0 at pi1ppc0
>> [   1.0000000] ppbus0: No IEEE1284 device found.
>> [   1.0000000] lpt0 at ppbus0: port mode = 0x1<COMPATIBLE>
>> [   1.0000000] button0 at hpc0 offset 0x59850
>> [   1.0000000] dsclock0 at mainbus0 addr 0x1fbe0000
>> [   1.0000000] ioc0 at mainbus0 addr 0x1fbd9800: rev 0, machine Indy (Guinness), board rev 0
>> [   1.0000030] scsibus0: waiting 2 seconds for devices to settle...
>> [   1.0687213] sysctl: log 0x27bdffd8 root mismatch (0x8850c150)
>> [   1.2677407] pid 0(system): trap: cpu0, TLB miss (load or instr. fetch) in kernel mode
>> [   1.2677407] status=0x4ff03, cause=0x8, epc=0x88428100, vaddr=0
>> [   1.2677407] tf=0x88003c40 ksp=0x88003ce0 ra=0x88335f28 ppl=0x8850c81c
>> [   1.2677407] kernel: TLB miss (load or instr. fetch) trap
>> Stopped in pid 0.1 (system) at  88428100:       lbu     t0,0(a0)
>> db> q
>> panic: utlbmod: 0: no pte
>> [   1.2677407] cpu0: Begin traceback...
>> [   1.2677407] pid -2013251120 not found
>> [   1.2677407] cpu0: End traceback...
>> [   1.2677407] kernel: breakpoint trap
>> Stopped in pid 0.1 (system) at  88075e30:       jr      ra
>>              bdslot: nop
>> db>
>> 
>> 
>>>> boot
>> 
>> NetBSD/sgimips 9.3 Bootstrap, Revision 1.5 (Thu Aug  4 15:30:37 UTC 2022)
>> 
>> devopen: scsi(0)disk(3)rdisk(0)partition(0) type scsi file netbsd
>> 4929840+127440 [285568+277831]=0x55ca10
>> 
>> Exception: <vector=UTLB Miss>
>> Status register: 0x2<IPL=8,MODE=KERNEL,EXL>
>> Cause register: 0x30008008<CE=3,IP8,EXC=RMISS>
>> Exception PC: 0x0, Exception RA: 0x882f12cc
>> exception, bad address: 0x0
>> Local I/O interrupt register 1: 0x80 <VR/GIO2>
>> Local I/O interrupt register 2: 0x4 <>
>> Saved user regs in hex (&gpda 0xa8740e48, &_regs 0xa8741048):
>> arg: a8740000 5f 0 2
>> tmp: a8740000 27c0 41ef 88578cc8 88578cc8 88575028 88578cc8 4
>> sve: a8740000 0 0 0 0 0 0 0
>> t8 a8740000 t9 0 at 0 v0 0 v1 0 k1 88510000
>> gp a8740000 fp 0 sp 0 ra 0
>> 
>> PANIC: Unexpected exception
>> 
>> [Press reset or ENTER to restart.]
>> 
>> I suspect that there is a problem with the bootloader.
>> I get around this problem by putting the kernel in IRIX's / partition and starting it using sash.
>> -- 
>> Yoshihiko Iwama
>> 
>> -----Original Message-----
>> From: port-mips-owner%NetBSD.org@localhost <port-mips-owner%NetBSD.org@localhost> On Behalf Of Tim McIntosh
>> Sent: Tuesday, May 21, 2024 2:21 PM
>> To: Michael <macallan1888%gmail.com@localhost>
>> Cc: port-mips%NetBSD.org@localhost
>> Subject: Re: NetBSD/sgimips 10.0 RELEASE on SGI Challenge S: INSTALL32_IP2x works, GENERIC32_IP2x kernel panics on boot after install
>> 
>>> On May 20, 2024, at 6:36 PM, Tim McIntosh <tmcintos%eskimo.com@localhost> wrote:
>>> 
>>>> What I would do in your position is:
>>>> - setup netboot - IIRC it's really just bootp/dhcp, load kernel via tftp, mount root via nfs.
>>>> - build an INSTALL32_IP2x kernel, try to netboot it
>>>> - INSTALL* is just GENERIC* with a bunch of 'no *' statements to strip
>>>> it down. Enable a few of them, try again. If my theory is correct it
>>>> should start crashing when the kernel image hits a certain size
>>>> - you probably don't want all the stuff disabled in INSTALL32_IP2x, if
>>>> you get to an image that's small enough to work and contains what you
>>>> need, I'd run with that for the time being.
>> 
>> I don’t have a viable netboot environment currently, but what I was able to do to transfer files to the local disk was boot the installation image and then use FTP to transfer files via a temporary Python-based FTP server running on my development machine. Tedious, but it works.
>> 
>> So I built a modified INSTALL32_IP2x kernel with the following changes:
>> 
>>> --- a/sys/arch/sgimips/conf/INSTALL32_IP2x
>>> +++ b/sys/arch/sgimips/conf/INSTALL32_IP2x
>>> @@ -8,11 +8,11 @@ include       "arch/sgimips/conf/GENERIC32_IP2x"
>>>  makeoptions    COPTS="-Os -mmemcpy"
>>>  
>>>  # Enable the hooks used for initializing the root memory-disk.
>>> -options         MEMORY_DISK_HOOKS
>>> -options         MEMORY_DISK_IS_ROOT     # force root on memory disk
>>> -options         MEMORY_DISK_SERVER=0    # no userspace memory disk support
>>> -options         MEMORY_DISK_ROOT_SIZE=6600 # size of memory disk in blocks (3300k)
>>> -options         MEMORY_DISK_RBFLAGS=RB_SINGLE   # boot in single-user mode
>>> +#options         MEMORY_DISK_HOOKS
>>> +#options         MEMORY_DISK_IS_ROOT     # force root on memory disk
>>> +#options         MEMORY_DISK_SERVER=0    # no userspace memory disk support
>>> +#options         MEMORY_DISK_ROOT_SIZE=6600 # size of memory disk in blocks (3300k)
>>> +#options         MEMORY_DISK_RBFLAGS=RB_SINGLE   # boot in single-user mode
>>>  
>>>  # shrink kernel since ARC BIOS seems to have 8MB limit
>>>  options        FFS_NO_SNAPSHOT
>> 
>> I tried loading the ELF debug version and it appears to panic in the same way:
>> 
>>>>> boot
>>> 
>>> NetBSD/sgimips 10.0 Bootstrap, Revision 1.5 (Thu Mar 28 08:33:33 UTC 2024)
>>> 
>>> devopen: scsi(0)disk(3)rdisk(0)partition(0) type scsi file netbsd.gdb
>>> 3446288+113248 [243344+233661]=0x3d9e04
>>> 
>>> Exception: <vector=UTLB Miss>
>>> Status register: 0x2<IPL=8,MODE=KERNEL,EXL>
>>> Cause register: 0x30008008<CE=3,IP8,EXC=RMISS>
>>> Exception PC: 0x0, Exception RA: 0x881eadb4
>>> exception, bad address: 0x0
>>> Saved user regs in hex (&gpda 0xa8740e48, &_regs 0xa8741048):
>>> arg: a8740000 0 5f 10
>>> tmp: a8740000 883c0000 388c 1 88400d44 8832c15c 883bb960 1fae
>>> sve: a8740000 0 0 0 0 0 0 0
>>> t8 a8740000 t9 0 at 0 v0 0 v1 0 k1 883b0000
>>> gp a8740000 fp 0 sp 0 ra 0
>>> 
>>> PANIC: Unexpected exception
>>> 
>>> [Press reset or ENTER to restart.]
>> 
>> I’ll have to try the ECOFF variant and attempt debugging when I get some more time to investigate this.
>> 
>> Thanks,
>> Tim


