Subject: Re: 4.99.17 still panics on TS7250
To: None <ali@df.lth.se>
From: Chris Gilbert <chris@dokein.co.uk>
List: port-arm
Date: 04/12/2007 00:30:54
On Wed, 11 Apr 2007 01:20:05 +0100
Chris Gilbert <chris@dokein.co.uk> wrote:
> Anders Lindgren wrote:
> > Ok, noticing there have been a lot of updates in sys/kern etc during
> > easter, I cvs up'd tonight and rebuilt a complete distribution, put a
> > copy of the TS7200 epe0 kernel in DESTDIR, MAKEDEV'd its /dev etc and
> > TFTP-booted the corresponding netbsd-epe0.bin image. Still *boom* with
> > an unmodified kernel.
> >
> > Noticed the following interesting tidbits:
> >
> > With default TS7200 kernel, at:
> >
> > ---8<---
> > nfs_boot: my_addr=192.168.1.12
> > nfs_boot: my_mask=255.255.255.0
> > nfs_boot: gateway=192.168.1.1
> > root on 192.168.1.6:/export/tsarm
> > /etc/rc.conf is not configured. Multiuser boot aborted.
> > Enter pathname of shell or RETURN for /bin/sh:
> > ---8<---
> >
> > If I press return or type /bin/sh, I get an immediate "locking against
> > myself" panic as described earlier.
> >
> > If I type "/bin/ksh" instead... it works.
> >
> > With an "options LOCKDEBUG" kernel, I don't seem to get a kernel panic
> > at all; at least I can configure rc and customize some /etc files with
> > vi, create a user and set passwords, set time with ntpdate, and boot all
> > the way to multi-user and run "find /" on the entire fs without problems
> > -- seems to work ok so far. Without it, I get the mutex error panic
> > pretty much instantly on attempt to start multiuser boot.
> >
> > The lock the kernel is crashing on (via sys_read ... pipe_read) is:
> >
> > COMMON 0x00000000c0516d8c 0x118 kern_synch.o
> > 0x00000000c0516d8c sched_mutex
> >
> > ..which seems like a pretty bad thing to happen. :) I'm going to see
> > what happens if I boot a stock 3.1 release build instead later this week.
> >
> > Any help on how to proceed from here greatly appreciated.
>
> My best guess is that we've messed up locking on arm somewhere. I'll try
> to get time to fully boot an arm box with -current and see if I can
> repro this.
>
> It's odd that LOCKDEBUG makes it go away, which suggests a timing issue.
> The only arm code I can find that uses LOCKDEBUG is the pmap code, and
> this seems unrelated to that. Still, it's worth a shot: add a
> #define LOCKDEBUG to pmap.c and see if the problems go away.
>
> cpuswitch.S does make calls to sched_lock and unlock, but I'm not sure
> if this is the same mutex or not.
>
> Might also be worth asking on tech-kern, see if anyone else has seen this.

I've just sync'd and built a fresh cats kernel, and I'm not seeing any
problems (though my userland is probably quite old and needs updating).

Can you try a non-LOCKDEBUG kernel with:

option ARM_LOCK_CAS_DEBUG

It'll enable some event counters, which you'll be able to see in ddb with
"show event". No idea if it'll help, but it might provide a bit more
information.

Are you able to try with a local/USB disk rather than NFS, to see if the
problem has something to do with NFS?

Thanks,
Chris