Subject: AS1200 SMP instability
To: None <port-alpha@netbsd.org>
From: David Hopper <dhop@nwlink.com>
List: port-alpha
Date: 02/05/2002 11:21:40
I've narrowed down the source of the instability on this tincup
platform. With the multiprocessor code removed from the kernel, I can now
get through full builds without the disk crashes that I had mentioned
earlier on the list.
The instability dates all the way back to when the multiprocessor code was
enabled (which makes a lot of sense now that I think of it -- I have run SMP
ever since the changes were committed). In other words, my kernel right now
is 1.5ZA (Jan 26), but the errors go back all the way through 1.5Y
(November) and beyond.
Since going single-processor, I'm solid.
The first clue that the MP code was the culprit was this halt I received
yesterday in cc1plus during a build; it's different from the other debugger
halts outlined previously:
db{0}> show registers
v0 0x6
t0 0x1
t1 0x1
t2 0x10001 rn+0xffe1
t3 0xf423f rn+0xf421f
t4 0xfffffc000062d798 lasttime.132
t5 0xfffffc00005e0168 microtime_slock.133
t6 0xfffffc0004df7640 end+0x47aefe8
t7 0xfffffc0004aaf14c end+0x4466af4
s0 0x4
s1 0x8 rettmp
s2 0x102 rn+0xe2
s3 0x200086d20000
s4 0x200099122000
s5 0x31 rn+0x11
s6 0x12029b91c
a0 0x6
a1 0x1
a2 0x199 rn+0x179
a3 0
a4 0
a5 0
t8 0x1e framesz+0xe
t9 0xfffffc00004f7d30 microtime+0xb0
t10 0x1116faa6ca6c0
t11 0x1fc1e058
ra 0xfffffe0023eb1688
t12 0xfffffc0000392280 spinlock_acquire_count
at 0xfffffc00005e0690 sched_whichqs
gp 0xfffffc00005d4fb8 special_symbols+0x8160
sp 0xfffffe0023eb1550
pc 0xfffffe0023eb168c
ps 0x6
ai 0x1fc1e058
pv 0xfffffc0000392280 spinlock_acquire_count
0xfffffe0023eb168c: call_pal halt
db{0}> ps
PID PPID PGRP UID S FLAGS COMMAND WAIT
>22799 22797 22796 0 7 0x84006 cc1plus
22797 22796 22796 0 3 0x84086 c++ wait
22796 22790 22796 0 3 0x84086 sh wait
22790 22782 22233 0 3 0x84086 nbmake select
22782 22237 22233 0 3 0x84086 sh wait
22237 22233 22233 0 3 0x84086 nbmake wait
22233 22232 22233 0 3 0x84086 sh wait
22232 22231 241 0 3 0x84086 nbmake select
22231 21324 241 0 3 0x84086 sh wait
21324 21323 241 0 3 0x84086 nbmake wait
21323 21320 241 0 3 0x84086 sh wait
21320 21319 241 0 3 0x84086 nbmake wait
21319 20599 241 0 3 0x84086 sh wait
20599 20598 241 0 3 0x84086 nbmake wait
20598 20597 241 0 3 0x84086 sh wait
20597 20596 241 0 3 0x84086 nbmake wait
20596 1102 241 0 3 0x84086 sh wait
1102 241 241 0 3 0x84086 nbmake wait
241 237 241 0 3 0x84086 sh wait
237 234 237 0 3 0x84086 tcsh pause
234 210 234 0 3 0x84086 csh pause
228 224 228 0 3 0x84086 tcsh ttyin
224 212 224 0 3 0x84086 csh pause
212 211 212 150 3 0x84086 tcsh pause
211 187 187 0 3 0x80084 sshd select
210 1 210 150 3 0x84086 tcsh pause
209 178 178 32767 3 0x80184 httpd lockf
208 178 178 32767 3 0x80184 httpd lockf
206 1 206 0 3 0x80084 cron nanosle
205 178 178 32767 3 0x80184 httpd lockf
204 178 178 32767 3 0x80184 httpd lockf
203 178 178 32767 3 0x80184 httpd select
198 1 198 0 3 0x80084 inetd select
190 1 190 0 3 0x80084 sendmail select
187 1 187 0 3 0x80084 sshd select
178 1 178 0 3 0x80085 httpd select
177 160 8 1000 3 0x84186 mysqld select
160 1 8 0 3 0x84086 sh wait
145 1 145 0 3 0x80084 ntpd pause
76 1 76 0 3 0x80084 syslogd select
14 0 0 0 7 0xa0204 raid
7 0 0 0 3 0xa0204 aiodoned aiodone
6 0 0 0 3 0xa0204 ioflush drainvp
5 0 0 0 3 0x20204 reaper reaper
4 0 0 0 3 0xa0204 pagedaemon pgdaemo
3 0 0 0 3 0xa0204 isp0:0 sccomp
2 0 0 0 3 0xa0204 siop0:0 sccomp
1 0 1 0 3 0x84084 init wait
0 -1 0 0 3 0xa0204 swapper schedul
spinlock_acquire_count: 27bb0024
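
For anyone who wants to sift through a ddb "ps" listing like the one above,
here is a rough Python sketch. It is not part of the debugger session. It
assumes the column layout shown (PID PPID PGRP UID S FLAGS COMMAND WAIT) and
that state 7 means SONPROC (currently on a processor) in this kernel's
sys/proc.h; the names parse_ps, on_cpu and SAMPLE are made up for
illustration, and SAMPLE holds only a few rows copied from the listing.

```python
# Rough sketch: pick out the processes that ddb's "ps" shows as on a CPU.
# Assumptions: columns are PID PPID PGRP UID S FLAGS COMMAND [WAIT], and
# state 7 is SONPROC in this kernel (check sys/proc.h to be sure).

SONPROC = 7  # assumed value, not taken from the debugger output itself

# A few rows copied from the listing above, for demonstration only.
SAMPLE = """\
PID PPID PGRP UID S FLAGS COMMAND WAIT
>22799 22797 22796 0 7 0x84006 cc1plus
22797 22796 22796 0 3 0x84086 c++ wait
14 0 0 0 7 0xa0204 raid
0 -1 0 0 3 0xa0204 swapper schedul
"""


def parse_ps(text):
    """Turn ddb 'ps' rows into dicts; the header and other lines are skipped."""
    procs = []
    for line in text.splitlines():
        # A leading '>' marks the current process; drop it before splitting.
        fields = line.strip().lstrip(">").split()
        if len(fields) < 7:
            continue
        try:
            pid, ppid, pgrp, uid, state = (int(f) for f in fields[:5])
            flags = int(fields[5], 16)  # accepts a 0x or 0X prefix
        except ValueError:
            continue  # header row or something that is not a process row
        procs.append({
            "pid": pid,
            "ppid": ppid,
            "pgrp": pgrp,
            "uid": uid,
            "state": state,
            "flags": flags,
            "command": fields[6],
            "wait": fields[7] if len(fields) > 7 else "",
        })
    return procs


def on_cpu(procs):
    """Return the rows whose state equals the assumed SONPROC value."""
    return [p for p in procs if p["state"] == SONPROC]


if __name__ == "__main__":
    for p in on_cpu(parse_ps(SAMPLE)):
        print(p["pid"], p["command"])
```

Run against the full listing, this should pick out cc1plus (22799) and raid
(14), the only two rows with state 7. If the SONPROC assumption holds, that
would fit two processors each running a process at the time of the halt.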