Port-mips archive


Re: NetBSD/sgimips 10.0 RELEASE on SGI Challenge S: INSTALL32_IP2x works, GENERIC32_IP2x kernel panics on boot after install



> On May 20, 2024, at 3:51 PM, Michael <macallan1888%gmail.com@localhost> wrote:
> 
> I don't have a CDROM drive on my Indy, I always installed everything by
> netbooting it.

I don’t have a CDROM attached, either.  I was originally trying to do an in-place upgrade, having downloaded the installation kernel image to the local disk, but I must have accidentally deleted that image before rebooting.  I wanted to avoid netbooting, since my home network is rather complicated and I’ve reworked the DHCP architecture since the last time I did anything with netbooting, probably 10+ years ago.  So I pulled out the ZuluSCSI RP2040 that I had lying around and loaded the ISO image onto that.

>>> NetBSD/sgimips 10.0 Bootstrap, Revision 1.5 (Thu Mar 28 08:33:33 UTC 2024)
>>> devopen: scsi(0)disk(3)rdisk(0)partition(0) type scsi file
>>> netbsd-GENERIC32_IP2x 5004432+126464 [270656+264073]=0x567868
>>> Starting at 0x88069000
>>> 
>>> nsym 0x1 ssym 0x88544a90 esym 0x885c7868
>>> 
>>> Exception: <vector=UTLB Miss>
>>> Status register: 0x2<IPL=8,MODE=KERNEL,EXL>
>>> Cause register: 0x30008008<CE=3,IP8,EXC=RMISS>
>>> Exception PC: 0x0, Exception RA: 0x882e3c0c
>>> exception, bad address: 0x0
>>> Saved user regs in hex (&gpda 0xa8740e48, &_regs 0xa8741048):
>>> arg: a8740000 5f 0 2
>>> tmp: a8740000 88510000 2764 41ff 885870dc 885870dc 8848c01c 8853b200
>>> sve: a8740000 0 0 0 0 0 0 0
>>> t8 a8740000 t9 0 at 0 v0 0 v1 0 k1 88520000
>>> gp a8740000 fp 0 sp 0 ra 0
>>> 
>>> PANIC: Unexpected exception
>>> 
>>> [Press reset or ENTER to restart.]
> 
> If I read this correctly something jumped to NULL in early kernel
> startup. Shouldn't be too difficult to find.

Yeah, that’s what I suspected; I noticed that the ecoff kernel log output also has vaddr=0:

>> [   1.0000030] pid 0(system): trap: cpu0, TLB miss (load or instr. fetch) in kernel mode
>> [   1.0000030] status=0xf003, cause=0x8, epc=0x882b1c3c, vaddr=0
>> [   1.0000030] tf=0x8804bbe8 ksp=0x8804bc88 ra=0x882b1f64 ppl=0
>> [   1.0000030] kernel: TLB miss (load or instr. fetch) trap

I didn’t have any development environment set up to investigate, though.

>> Since the installation image works, but booting the installed kernel
>> fails, I’m guessing there’s something wrong with the GENERIC32_IP2x
>> config?
> 
> These kernels should differ only in the amount of drivers /
> non-essential kernel subsystems built into them, the crash happens
> *way* early, before any of those should make a difference. I wonder if
> the kernel image got too fat and we hit some hidden firmware limit...

That sounds like a good theory, which I had not considered.  This seems like something that could explain both panics I’ve observed with the NetBSD 10 kernel variants, and even possibly the problem that I observed back in 2012 with NetBSD 5.1.2.
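
As a rough way to put a number on "too fat", the figures in the bootloader output quoted above can be added up. The Python sketch below only rearranges values copied from that output. Reading the two numbers before the brackets as loaded sections, the bracketed pair as symbol data, and the value after "=" as the total loaded footprint is an assumption, and the variable names are made up for illustration.

```python
# Values copied verbatim from the bootloader output quoted earlier.
loaded_sizes = [5004432, 126464]   # the two sizes printed before the brackets
symbol_sizes = [270656, 264073]    # the two sizes printed inside the brackets
end_offset = 0x567868              # the value printed after "="
ssym = 0x88544A90                  # start of symbols, per the "nsym" line
esym = 0x885C7868                  # end of symbols, per the "nsym" line
entry = 0x88069000                 # "Starting at" address

# Assumption: end_offset is the distance from the load base to esym.
load_base = esym - end_offset

# Under that assumption the symbol area should start right after the
# loaded sections; this prints True for the numbers above.
symbols_follow_sections = (load_base + sum(loaded_sizes) == ssym)

print("load base:        ", hex(load_base))
print("entry offset:     ", hex(entry - load_base))
print("symbols follow:   ", symbols_follow_sections)
print("symbol area bytes:", esym - ssym, "vs listed", sum(symbol_sizes))
print("footprint MiB:    ", round(end_offset / (1024 * 1024), 2))
```

The symbol area comes out slightly larger than the two bracketed sizes added together, presumably because of headers or padding. If the size theory holds, the footprint figure is the number to compare between a kernel that boots and one that does not.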

If there is such a limit, and it can be identified/confirmed, I would advocate for putting a check in the kernel build process so that it fails if this limit is exceeded.
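
A minimal sketch of what such a check might look like, written in Python rather than in whatever the NetBSD build would actually use. No firmware limit has been identified, so ASSUMED_LIMIT_BYTES is a pure placeholder. The function names are hypothetical, and using the on-disk file size as a stand-in for the loaded footprint is also an assumption.

```python
import os

# HYPOTHETICAL placeholder: the real limit, if one exists, is unknown.
ASSUMED_LIMIT_BYTES = 5 * 1024 * 1024


def kernel_within_limit(path, limit_bytes=ASSUMED_LIMIT_BYTES):
    """Return (ok, size): whether the file at `path` fits under `limit_bytes`.

    Assumption: on-disk size approximates what the firmware has to load.
    """
    size = os.path.getsize(path)
    return size <= limit_bytes, size


def main(argv):
    """Return a shell-style exit status: 0 if the kernel fits, 1 otherwise."""
    if len(argv) != 2:
        print("usage: check_kernel_size.py <kernel-image>")
        return 2
    ok, size = kernel_within_limit(argv[1])
    if not ok:
        print(f"{argv[1]}: {size} bytes exceeds assumed limit "
              f"{ASSUMED_LIMIT_BYTES}")
        return 1
    print(f"{argv[1]}: {size} bytes, within assumed limit")
    return 0
```

A build step could call main(sys.argv) and abort on a non-zero status, once a real limit has been measured and substituted for the placeholder.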

>> Does someone have the ability to build and post a standalone kernel
>> equivalent to the INSTALL32_IP2x config? I’m hoping that would allow
>> me to boot the system.
> 
> As said above, I'll dust off my R5k Indy, last time I checked it worked
> just fine. That was a few years ago though, and it will take me a while.

That would be awesome, thanks.  It would be good to see if the behavior is different.  I also have an R5K Indy in the closet, but it’s not easily accessible and never had NetBSD on it.

> What I would do in your position is:
> - setup netboot - IIRC it's really just bootp/dhcp, load kernel via tftp, mount root via nfs.
> - build an INSTALL32_IP2x kernel, try to netboot it
> - INSTALL* is just GENERIC* with a bunch of 'no *' statements to strip
>  it down. Enable a few of them, try again. If my theory is correct it
>  should start crashing when the kernel image hits a certain size
> - you probably don't want all the stuff disabled in INSTALL32_IP2x, if
>  you get to an image that's small enough to work and contains what you
>  need, I'd run with that for the time being.
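
The "enable a few, try again" procedure above can be sketched as a binary search. This is an illustration only: it assumes the failure depends purely on size, so that once re-enabling the first n options breaks the boot, re-enabling more never fixes it. The function name and the `boots` callback (build a kernel with the given options re-enabled, netboot it, report success) are hypothetical.

```python
from typing import Callable, Sequence


def largest_booting_prefix(
    options: Sequence[str],
    boots: Callable[[Sequence[str]], bool],
) -> int:
    """Return the largest n such that re-enabling options[:n] still boots.

    Assumes options[:0] (the plain INSTALL config) boots, and that booting
    is monotonic in n, which only holds if size is the sole cause.
    """
    lo, hi = 0, len(options)      # invariant: options[:lo] is known to boot
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the range always shrinks
        if boots(options[:mid]):
            lo = mid              # still boots: try re-enabling more
        else:
            hi = mid - 1          # crashes: the threshold is below mid
    return lo
```

With a few dozen 'no' statements this needs only five or six test boots instead of one per statement. If the results turn out not to be monotonic, that in itself would argue against a simple size limit.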

Makes sense.  I started trying to get a build environment set up earlier today, and finally got the toolchain built after working around this problem:

toolchain/58271: external/gpl3/gcc/dist/gcc/system.h: Early inclusion of "safe-type.h" causes toolchain build failure on macOS 14.5
<https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=58271>

I confirmed that my tftp server is still configured; I’ll have to see what I can do about bootpd/dhcp.

In hindsight, I shouldn’t have attempted this upgrade, because I didn’t really have time to spend a day or more on it right now, but I had forgotten that I ran into trouble last time I tried it.

It would be good to get this sorted out, though, especially if it leads to someone making a change in NetBSD that would avoid this problem in the future.

Thanks for your help,

Tim


