Subject: Re: 1.5S vs sparc/MP
To: None <pk@cs.few.eur.nl>
From: Simon J. Gerraty <sjg@quick.com.au>
List: tech-smp
Date: 03/07/2001 01:17:12
> I fixed a few things last week-end that might help a bit against those
> watchdogs. Also, the kernel lock is now acquired when a process enters
> kernel mode (where it matters, I hope). So, modulo cache flush issues,
> a MP kernel should run again without violating the locking protocol.
I've had mixed results today. No watchdog resets, but without any of
the printfs in my semaphore routines, I again get a "lockmgr: no
context" panic and the machine locks up solid; I cannot get to ddb. I'm
wondering if the semaphores are helping at all...
Matt Green booted a kernel with the semaphore stuff on his SS20 with
dual SuperSPARCs (which doesn't need them, so it is a good test to see if
I broke anything) and it seemed to work OK. So I've uploaded the
actual kernel that panics and locks up on my machine to
ftp.netbsd.org:~sjg/tmp/netbsd.mp to see what it does on his; feel
free to try it ;-)
If I enable the printfs for, say, smp_cache_flush only, everything runs
fine until we eventually hang after getting to:
IPsec: Initialized Security Association Processing.
root on sd0a dumps on sd0b
root file system type: ffs
...
{0}sema_init(0xf02b1534, 0, semcflush)
{0}sema_signal(0xf02b1534,1) == 1
{0}sema_wait(0xf02b1534) == 0
{1}sema_signal(0xf02b1534,1) == 1
{0}sema_wait(0xf02b1534) == 0
{0}sema_clear(0xf02b1534) count==0, sleepers==0
[BREAK]
Stopped at cpu_Debugger+0x4: jmpl [%o7 + 0x8], %g0
db{0}> ps
PID PPID PGRP UID S FLAGS COMMAND WAIT
5 0 0 0 3 0xa0204 aiodoned semvseg
4 0 0 0 3 0xa0204 ioflush syncer
3 0 0 0 3 0x20204 reaper reaper
2 0 0 0 3 0xa0204 pagedaemon pgdaemo
1 0 1 0 3 0x84004 init vmmaplk
0 -1 0 0 3 0xa0204 swapper schedul
Does this suggest that the reaper has the kernel lock? Doesn't look
like he'd ever hold it when he goes to sleep.
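One way to read the FLAGS column of the ps output above is to test a single bit per process. The sketch below is only illustrative: the mask value 0x80000 is an assumption inferred from the dump itself (it is the one bit that separates the reaper's 0x20204 from the 0xa0204 of the other kernel threads), and the names BIGLOCK_BIT, has_biglock and count_biglock are hypothetical, not taken from the kernel sources.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: bit value inferred from the ps output, not from sys/proc.h. */
#define BIGLOCK_BIT 0x80000UL

/* Return nonzero if the (assumed) big-lock bit is set in a FLAGS value. */
int has_biglock(unsigned long flags)
{
    return (flags & BIGLOCK_BIT) != 0;
}

/* Count how many entries of a FLAGS array have the (assumed) bit set. */
size_t count_biglock(const unsigned long *flags, size_t n)
{
    size_t i, hits = 0;

    assert(flags != NULL);
    for (i = 0; i < n; i++) {
        if (has_biglock(flags[i]))
            hits++;
    }
    return hits;
}
```

Fed the six FLAGS values from the listing, this marks every process except the reaper (0x20204).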
All the others have P_BIGLOCK set though. Interesting that aiodoned is
shown sleeping on semvsegment, which is cache_semaphore when used for
smp_vcache_flush_segment(), yet the semaphore was last used by
smp_cache_flush() and is clear (see below). So not sure why aiodoned
is sleeping on semvsegment still. Because he couldn't be woken up due
to P_BIGLOCK? Hmm, should sema_signal() be doing anything with the
kernel lock? Should probably have the sleeper decrement sleepers when
he wakes up, rather than when sema_signal() calls wakeup_one(); I'll try
that shortly.
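A minimal user-space sketch of that last idea, where the sleeper itself decrements sleepers after it wakes, might look like the following. It is a model under stated assumptions, not the kernel code discussed here: it uses POSIX threads in place of ltsleep()/wakeup_one(), the struct layout is invented for illustration, and the return values (new count from sema_signal, remaining count from sema_wait) are only guessed from the trace output above.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* ASSUMPTION: hypothetical layout, not the kernel's cache_semaphore. */
struct sema {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    int             count;    /* units currently available */
    int             sleepers; /* threads blocked in sema_wait() */
    const char     *name;     /* wait message, e.g. "semcflush" */
};

void sema_init(struct sema *s, int count, const char *name)
{
    assert(s != NULL);
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->cv, NULL);
    s->count = count;
    s->sleepers = 0;
    s->name = name;
}

/* Take one unit, sleeping if none is available. Returns the remaining count. */
int sema_wait(struct sema *s)
{
    int left;

    pthread_mutex_lock(&s->lock);
    while (s->count == 0) {
        s->sleepers++;
        pthread_cond_wait(&s->cv, &s->lock);
        s->sleepers--;  /* the sleeper does its own bookkeeping on wakeup */
    }
    left = --s->count;
    pthread_mutex_unlock(&s->lock);
    return left;
}

/* Add n units and wake sleepers. Returns the new count. */
int sema_signal(struct sema *s, int n)
{
    int now;

    pthread_mutex_lock(&s->lock);
    s->count += n;
    now = s->count;
    if (s->sleepers > 0) {
        /* Note: sleepers is deliberately left untouched here. */
        if (n == 1)
            pthread_cond_signal(&s->cv);
        else
            pthread_cond_broadcast(&s->cv);
    }
    pthread_mutex_unlock(&s->lock);
    return now;
}
```

With this arrangement the sleepers field never drops to zero while a thread is still blocked, so a debugger dump of the structure reflects who is really asleep even if a wakeup is lost or delayed.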
db{0}> x/x cache_semaphore
cache_semaphore: 0
db{0}>
cache_semaphore+0x4: f02371c0
db{0}>
cache_semaphore+0x8: 0
db{0}>
cache_semaphore+0xc: 0
db{0}>
cache_semaphore+0x10: 0
db{0}>
cachestats: 88
db{0}> x/s f02371c0
openboot_special4m.194+0x498: semcflush
db{0}> trace
zsc_intr_hard(0x8, 0xf0600ed0, 0xf0254800, 0xfe000000, 0x809c4000, 0xa00) at zsc_intr_hard+0x68
zshard(0x0, 0xf01a514c, 0x0, 0xf00, 0xf0002000, 0xf00) at zshard+0x40
sparc_interrupt44c(0x1e9000e5, 0xf0293c00, 0xfe000004, 0x0, 0xf0002000, 0xf0002000) at sparc_interrupt44c+0x120
mi_switch(0xf605d588, 0x80, 0xf606b220, 0xf605d588, 0xf0257a40, 0x3) at mi_switch+0x1cc
ltsleep(0x0, 0x28, 0xf02122c8, 0x64, 0x0, 0xf0269800) at ltsleep+0x2b0
sched_sync(0xf0258400, 0xf0258400, 0xf0254400, 0xf0212000, 0xf0259800, 0xf0263400) at sched_sync+0x210
proc_trampoline(0x0, 0x0, 0x0, 0x0, 0x0, 0x0) at proc_trampoline+0x8
db{0}>
At least we got as far as exec'ing init (sort of ;-)
Anything jump out at you?
--sjg