Port-powerpc archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]
locking problems since 7.0?
Hi,
I'm running two sandpoint NAS servers 24h and repeatedly observed two
problems since upgrading them to 7.0:
1. Sometimes /dev/null turns into a regular file with size 0 during reboot,
so running multiuser fails and I have to attach a serial console to fix the
problem.
2. Sometimes the file system locks up over night, usually during /etc/daily.
I can still ping the system, but everything accessing the file system, like
a log in, locks up.
For the first case I inserted some debugging output in /etc, but only to
find out that /dev/null is still correct shortly before the reboot/halt,
but is changed into a regular when the system comes up again.
For the second case I have established a constant serial connection with a
second server, so I can drop into ddb(4) and analyze the problem when it
happens. I also equipped one of the sandpoint servers with a LOCKDEBUG
kernel (no problem since then).
Attached you find output from a ddb-session after such a lock-up. It seems
like it happened while executing find(1) during /etc/daily. Many processes,
like find, are in "biowait", waiting for "softbio", which is stuck in
"tstile" and cannot handle softbio interrupts.
Anybody seen similar problems since 7.0, or has any idea how to debug or fix
that?
--
Frank Wille
db> ps
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
1997 1 3 0 0 2be7a00 find biowait
20743 1 3 0 80 1fd2300 postdrop netio
15398 1 3 0 80 23c16e0 sendmail pipe_rd
17386 1 3 0 80 1fd25c0 tee pipe_rd
3380 1 3 0 80 3c3e0c0 sh wait
12519 1 3 0 80 2be7480 sh wait
19166 1 3 0 80 38ee940 cron pipe_rd
11696 5 3 0 0 3b5e0e0 slave tstile
11696 4 3 0 0 38ee3c0 slave tstile
11696 3 3 0 0 23c1c60 slave biowait
11696 2 3 0 0 c0b1a0 slave tstile
11696 1 3 0 80 3b5e3a0 master select
15768 1 3 0 80 31c6c20 rpc.lockd select
7473 1 3 0 80 1fd2040 rpc.statd select
2898 1 3 0 80 38ee100 dhcpd select
134 1 3 0 0 3edc0a0 getty biowait
700 1 3 0 80 2c05980 httpd socket
719 1 3 0 80 3c3e380 httpd socket
725 1 3 0 80 3c3e900 httpd socket
899 1 3 0 80 3c3e640 httpd socket
894 1 3 0 80 3c3ebc0 httpd socket
651 1 3 0 80 c0b720 httpd select
660 11 3 0 80 c0b9e0 mysqld sigwait
660 10 3 0 80 122a700 mysqld parked
660 9 3 0 80 c0bca0 mysqld select
660 8 3 0 80 122a180 mysqld select
660 7 3 0 80 122a440 mysqld select
660 5 3 0 80 122a9c0 mysqld parked
660 4 3 0 80 122ac80 mysqld parked
660 3 3 0 80 23c1420 mysqld parked
660 2 3 0 80 23c1160 mysqld parked
660 1 3 0 80 2c05140 mysqld select
603 1 3 0 0 23c19a0 cron biowait
543 1 3 0 1000000 2c05400 qmgr tstile
562 1 3 0 80 2c05c40 inetd kqueue
601 1 3 0 80 2c056c0 sh wait
602 1 3 0 1000000 31c63e0 master biowait
356 1 3 0 0 31c6120 sshd biowait
221 1 3 0 80 3b5ebe0 mountd select
173 1 3 0 0 3b5e920 rpcbind biowait
150 1 3 0 80 38eec00 syslogd kqueue
1 1 3 0 80 3edd340 init wait
0 44 5 0 200 2be7740 (zombie)
0 41 3 0 200 3b5e660 physiod physiod
0 40 3 0 200 3edc360 aiodoned aiodoned
0 39 3 0 200 3eddb80 ioflush biolock
0 38 3 0 200 3edc620 pgdaemon pgdaemon
0 35 3 0 200 3f39060 usb2 usbevt
0 34 3 0 200 3f39320 usb1 usbevt
0 33 3 0 200 3edd8c0 usbtask-dr usbtsk
0 32 3 0 200 3edd600 usbtask-hc usbtsk
0 31 3 0 200 3edc8e0 usb0 usbevt
0 30 3 0 200 3edd080 unpgc unpgc
0 29 3 0 200 3edcba0 vmem_rehash vmem_rehash
0 19 3 0 200 3f395e0 atabus1 atath
0 18 3 0 200 3f398a0 atabus0 atath
0 17 3 0 200 3f39b60 sysmon smtaskq
0 16 3 0 200 3f4e040 iic0 iicintr
0 15 3 0 200 3f4e300 pmfsuspend pmfsuspend
0 14 3 0 200 3f4e5c0 pmfevent pmfevent
0 13 3 0 200 3f4e880 sopendfree sopendfr
0 12 3 0 200 3f4eb40 nfssilly nfssilly
0 11 3 0 200 3f68020 cachegc cachegc
0 10 3 0 200 3f682e0 vrele vrele
0 9 3 0 200 3f685a0 vdrain tstile
0 8 3 0 200 3f68860 modunload mod_unld
0 7 3 0 200 3f68b20 xcall/0 xcall
0 32 3 0 200 3edd600 usbtask-hc usbtsk
0 31 3 0 200 3edc8e0 usb0 usbevt
0 30 3 0 200 3edd080 unpgc unpgc
0 29 3 0 200 3edcba0 vmem_rehash vmem_rehash
0 19 3 0 200 3f395e0 atabus1 atath
0 18 3 0 200 3f398a0 atabus0 atath
0 17 3 0 200 3f39b60 sysmon smtaskq
0 16 3 0 200 3f4e040 iic0 iicintr
0 15 3 0 200 3f4e300 pmfsuspend pmfsuspend
0 14 3 0 200 3f4e5c0 pmfevent pmfevent
0 13 3 0 200 3f4e880 sopendfree sopendfr
0 12 3 0 200 3f4eb40 nfssilly nfssilly
0 11 3 0 200 3f68020 cachegc cachegc
0 10 3 0 200 3f682e0 vrele vrele
0 9 3 0 200 3f685a0 vdrain tstile
0 8 3 0 200 3f68860 modunload mod_unld
0 7 3 0 200 3f68b20 xcall/0 xcall
0 6 1 0 200 3f6c000 softser/0
0 5 1 0 200 3f6c2c0 softclk/0
0 4 3 0 200 3f6c580 softbio/0 tstile
0 3 1 0 200 3f6c840 softnet/0
0 > 2 7 0 201 3f6cb00 idle/0
0 1 3 0 200 304220 swapper uvm
db> show event
evcnt type 0: vmcmd kills = 648284
evcnt type 0: vmcmd calls = 4772700
evcnt type 0: vmemev bt_inuse = 240
evcnt type 0: vmemev bt_count = 1660
evcnt type 0: vmemev bt_pages = 10
evcnt type 0: pdpolicy reactanon = 29378421
evcnt type 0: pdpolicy reactfile = 2170945
evcnt type 0: pdpolicy reactexec = 58945706
evcnt type 0: softint net/0 = 6731232
evcnt type 0: softint net block/0 = 635
evcnt type 0: softint bio/0 = 535231551
evcnt type 0: softint bio block/0 = 1607207
evcnt type 0: softint clk/0 = 677594406
evcnt type 0: softint clk block/0 = 93
evcnt type 0: softint ser/0 = 4033653
evcnt type 0: callout late/0 = 1047121
evcnt type 0: crosscall unicast = 2
evcnt type 0: crosscall broadcast = 253866
evcnt type 0: namecache entries scanned = 4954107613
evcnt type 0: namecache entries collected = 4286155112
evcnt type 0: namecache over scan target = 5467673
evcnt type 0: namecache under scan target = 38964477
evcnt type 0: namecache forced reclaims = 16177057
evcnt type 1: cpu0 clock = 2825668254
evcnt type 2: cpu0 traps = 309712409
evcnt type 2: cpu0 kernel DSI traps = 78352599
evcnt type 2: cpu0 user DSI traps = 102150711
evcnt type 2: cpu0 user ISI traps = 46654819
evcnt type 2: cpu0 system call traps = 10299863570
evcnt type 2: cpu0 PGM traps = 424
evcnt type 2: cpu0 FPU unavailable traps = 14043008
evcnt type 2: cpu0 FPU context switches = 12724359
evcnt type 1: mpcpic irq 40 = 3695739
evcnt type 1: mpcpic irq 41 = 366925
evcnt type 1: mpcpic irq 16 = 549630106
evcnt type 1: mpcpic irq 17 = 222024
evcnt type 1: mpcpic irq 18 = 9479612
"find" process:
db> show proc 0t1997
find: pid 1997 proc 3a46398 vmspace/map 3e881e0 flags 4000
lwp 1 2be7a00 pcb d2430000
stat 3 flags 0 cpu 0 pri 42
wmesg biowait wchan 329be34
db> trace/t 0t1997
trace: pid 1997 lid 1 at 0xd2433b70
0xd2433bd0: at cpu_switchto+0x30
0xd2433be0: at mi_switch+0x17c
0xd2433c20: at sleepq_block+0xe0
0xd2433c40: at cv_wait+0x9c
0xd2433c60: at biowait+0x38
0xd2433c70: at bread+0x30
0xd2433c90: at ffs_read+0x40c
0xd2433d10: at VOP_READ+0x3c
0xd2433d40: at ufs_readdir+0x190
0xd2433dd0: at VOP_READDIR+0x44
0xd2433e00: at vn_readdir+0x128
0xd2433e90: at sys___getdents30+0x98
0xd2433ec0: at syscall_plain+0x1cc
0xd2433f20: user SC trap #390 by 0xfdf231cc: srr1=0xd032
r1=0xffffdb10 cr=0x28224042 xer=0x20000000 ctr=0xfdf231c4
db> show sched_qs
FL: flags
#define LW_IDLE 0x00000001 /* Idle lwp. */
#define LW_LWPCTL 0x00000002 /* Adjust lwpctl in userret */
#define LW_SINTR 0x00000080 /* Sleep is interruptible. */
#define LW_SYSTEM 0x00000200 /* Kernel thread */
#define LW_WSUSPEND 0x00020000 /* Suspend before return to user */
#define LW_BATCH 0x00040000 /* LWP tends to hog CPU */
#define LW_WCORE 0x00080000 /* Stop for core dump on return to user
#*/
#define LW_WEXIT 0x00100000 /* Exit before return to user */
#define LW_PENDSIG 0x01000000 /* Pending signal for us */
#define LW_CANCELLED 0x02000000 /* tsleep should not sleep */
#define LW_WREBOOT 0x08000000 /* System is rebooting, please suspend */
#define LW_UNPARKED 0x10000000 /* Unpark op pending */
#define LW_RUMP_CLEAR 0x40000000 /* Clear curlwp in RUMP scheduler */
#define LW_RUMP_QEXIT 0x80000000 /* LWP should exit ASAP */
ST: RQ(run queue), SQ(sleep squeue)
TCI: target cpu (not used)
LRTICKS: ticks since last run
Run-queue (CPU = 0):
pid.lid = 0.2, r_count = 0, r_avgcount = 0, maxpri = 0, mlwp = 0x0
bitmap[6] => [ 0 (0x0) ]
bitmap[5] => [ 0 (0x0) ]
bitmap[4] => [ 0 (0x0) ]
bitmap[3] => [ 0 (0x0) ]
bitmap[2] => [ 0 (0x0) ]
bitmap[1] => [ 0 (0x0) ]
bitmap[0] => [ 0 (0x0) ]
LID PRI EPRI FL ST LWP CPU TCI LRTICKS
/- 1997 (find)
| 1 42 85 0x00000000 SQ 0x2be7a00 0 -1 48189742
/- 20743 (postdrop)
| 1 43 85 0x00000080 SQ 0x1fd2300 0 -1 48483584
/- 15398 (sendmail)
| 1 43 85 0x00000080 SQ 0x23c16e0 0 -1 48483584
/- 17386 (tee)
| 1 43 85 0x00000080 SQ 0x1fd25c0 0 -1 48483595
/- 3380 (sh)
| 1 43 85 0x00000080 SQ 0x3c3e0c0 0 -1 48483578
/- 12519 (sh)
| 1 43 85 0x00000080 SQ 0x2be7480 0 -1 48483638
/- 19166 (cron)
| 1 43 85 0x00000080 SQ 0x38ee940 0 -1 48483639
/- 11696 (nfsd)
| 5 43 117 0x00000000 SQ 0x3b5e0e0 0 -1 31202903
| 4 43 117 0x00000000 SQ 0x38ee3c0 0 -1 39836233
| 3 43 117 0x00000000 SQ 0x23c1c60 0 -1 39843211
| 2 43 117 0x00000000 SQ 0xc0b1a0 0 -1 39829343
| 1 43 85 0x00000080 SQ 0x3b5e3a0 0 -1 52772301
/- 15768 (rpc.lockd)
| 1 43 85 0x00000080 SQ 0x31c6c20 0 -1 352
/- 7473 (rpc.statd)
| 1 43 85 0x00000080 SQ 0x1fd2040 0 -1 1417
/- 2898 (dhcpd)
| 1 43 85 0x00000080 SQ 0x38ee100 0 -1 51632865
/- 134 (getty)
| 1 43 85 0x00000000 SQ 0x3edc0a0 0 -1 189797
/- 700 (httpd)
| 1 38 83 0x00000080 SQ 0x2c05980 0 -1 2825662762
/- 719 (httpd)
| 1 38 83 0x00000080 SQ 0x3c3e380 0 -1 2825662763
/- 725 (httpd)
| 1 38 83 0x00000080 SQ 0x3c3e900 0 -1 2825662765
/- 899 (httpd)
| 1 38 83 0x00000080 SQ 0x3c3e640 0 -1 2825662761
/- 894 (httpd)
| 1 38 83 0x00000080 SQ 0x3c3ebc0 0 -1 2825662766
/- 651 (httpd)
| 1 43 85 0x00000080 SQ 0xc0b720 0 -1 75
/- 660 (mysqld)
| 11 40 84 0x00000080 SQ 0xc0b9e0 0 -1 2825663109
| 10 40 40 0x00000080 SQ 0x122a700 0 -1 2825663205
| 9 43 85 0x00000080 SQ 0xc0bca0 0 -1 303
| 8 43 85 0x00000080 SQ 0x122a180 0 -1 86
| 7 43 85 0x00000080 SQ 0x122a440 0 -1 57
| 5 40 40 0x00000080 SQ 0x122a9c0 0 -1 2825663375
| 4 40 40 0x00000080 SQ 0x122ac80 0 -1 2825663313
| 3 40 40 0x00000080 SQ 0x23c1420 0 -1 2825663310
| 2 40 40 0x00000080 SQ 0x23c1160 0 -1 2825663375
| 1 40 84 0x00000080 SQ 0x2c05140 0 -1 2825663247
/- 603 (cron)
| 1 43 85 0x00000000 SQ 0x23c19a0 0 -1 48189700
/- 543 (qmgr)
| 1 43 117 0x01000000 SQ 0x2c05400 0 -1 48183652
/- 562 (inetd)
| 1 43 85 0x00000080 SQ 0x2c05c40 0 -1 121562489
/- 601 (sh)
| 1 41 84 0x00000080 SQ 0x2c056c0 0 -1 2825663535
/- 602 (master)
| 1 43 117 0x01000000 SQ 0x31c63e0 0 -1 48188711
/- 356 (sshd)
| 1 43 85 0x00000000 SQ 0x31c6120 0 -1 233420
/- 221 (mountd)
| 1 43 85 0x00000080 SQ 0x3b5ebe0 0 -1 2488
/- 173 (rpcbind)
| 1 43 85 0x00000000 SQ 0x3b5e920 0 -1 10864375
/- 150 (syslogd)
| 1 43 85 0x00000080 SQ 0x38eec00 0 -1 1740
/- 1 (init)
| 1 43 85 0x00000080 SQ 0x3edd340 0 -1 51497003
/- 0 (system)
| 44 96 96 0x00000200 - 0x2be7740 0 -1 2184306522
| 41 123 123 0x00000200 SQ 0x3b5e660 0 -1 56346222
| 40 125 125 0x00000200 SQ 0x3edc360 0 -1 48210347
| 39 124 221 0x00000200 SQ 0x3eddb80 0 -1 48189743
| 38 126 126 0x00000200 SQ 0x3edc620 0 -1 48453627
| 35 96 96 0x00000200 SQ 0x3f39060 0 -1 4264
| 34 96 96 0x00000200 SQ 0x3f39320 0 -1 809
| 33 96 96 0x00000200 SQ 0x3edd8c0 0 -1 2825668203
| 32 96 96 0x00000200 SQ 0x3edd600 0 -1 2825668203
| 32 96 96 0x00000200 SQ 0x3edd600 0 -1 2825668203
| 31 96 96 0x00000200 SQ 0x3edc8e0 0 -1 809
| 30 96 96 0x00000200 SQ 0x3edd080 0 -1 2825668224
| 29 125 125 0x00000200 SQ 0x3edcba0 0 -1 775
| 19 96 96 0x00000200 SQ 0x3f395e0 0 -1 2825668220
| 18 96 96 0x00000200 SQ 0x3f398a0 0 -1 2825667916
| 17 96 96 0x00000200 SQ 0x3f39b60 0 -1 2825668254
| 16 96 96 0x00000200 SQ 0x3f4e040 0 -1 78
| 15 96 96 0x00000200 SQ 0x3f4e300 0 -1 2825668254
| 14 96 96 0x00000200 SQ 0x3f4e5c0 0 -1 2825668254
| 13 96 96 0x00000200 SQ 0x3f4e880 0 -1 51596332
| 12 96 96 0x00000200 SQ 0x3f4eb40 0 -1 2825668254
| 11 125 125 0x00000200 SQ 0x3f68020 0 -1 78
| 10 125 125 0x00000200 SQ 0x3f682e0 0 -1 78
| 9 125 221 0x00000200 SQ 0x3f685a0 0 -1 48189643
| 8 125 125 0x00000200 SQ 0x3f68860 0 -1 52740406
| 7 127 127 0x00000200 SQ 0x3f68b20 0 -1 52124382
| 6 223 223 0x00000200 - 0x3f6c000 0 -1 2825668254
| 5 220 220 0x00000200 - 0x3f6c2c0 0 -1 52256642
| 4 221 221 0x00000200 SQ 0x3f6c580 0 -1 48189861
| 3 222 222 0x00000200 - 0x3f6c840 0 -1 51649303
| 2 0 0 0x00000201 - 0x3f6cb00 0 -1 2825668254
| 1 125 125 0x00000200 SQ 0x304220 0 -1 78
db> show callout
hardclock_ticks now: -1469299042
ticks wheel arg func
1 0/159 0 pfslowtimo
5 0/163 0 pffasttimo
15 0/173 122a180 sleepq_timeout
22 0/180 3f68020 sleepq_timeout
22 0/180 304220 sleepq_timeout
22 0/180 3f682e0 sleepq_timeout
22 0/180 3f4e040 sleepq_timeout
26 0/184 c0b720 sleepq_timeout
34 0/192 0 if_slowtimo
34 0/192 0 nd6_timer
34 0/192 0 rt_timer_timer
44 0/202 122a440 sleepq_timeout
83 0/241 0 rnd_skew
225 1/327 0 vmem_rehash_all_kick
198 1/327 c0bca0 sleepq_timeout
513 1/328 3b5ebe0 sleepq_timeout
1261 1/331 38eec00 sleepq_timeout
1584 1/332 1fd2040 sleepq_timeout
1736 1/333 3f39060 sleepq_timeout
2649 1/336 31c6c20 sleepq_timeout
2716 1/337 0 arptimer
5191 1/346 3f39320 sleepq_timeout
5191 1/346 3edc8e0 sleepq_timeout
331773 2/625 0 nd6_slowtimo
490604 2/627 223cbd8 tcp_timer_keep
487128 2/627 223cdd0 tcp_timer_keep
480654 2/627 30e9008 tcp_timer_keep
5665767 2/706 0 in6_tmpaddrtimer
db> show uvm
Current UVM status:
pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12, ncolors=1 15062 VM pages: 4367 active, 231 inactive, 2 wired, 4454 free
pages 2025 anon, 231 file, 2353 exec
freemin=75, free-target=100, wired-max=5020
cpu0:
faults=227175182, traps=309712409, intrs=3379844322, ctxswitch=1491617966
softint=1223590842, syscalls=10299863570
fault counts:
noram=33, noanon=0, pgwait=1653, pgrele=0
ok relocks(total)=48976745(48976768), anget(retrys)=16227774(2146116), amapcopy=12790303
neighbor anon/obj pg=14312066/159921221, gets(lock/unlock)=93450491/46830654
cases: anon=9920567, anoncow=6150518, obj=79370017, prcopy=14080449, przero=44485986
daemon and swap counts:
woke=403132, revs=401151, scans=235944725, obscans=131324495, anscans=2648851
busy=697292, freed=133973344, reactivate=10779015, deactivate=238086308
pageouts=126077, pending=1788090, nswget=2147440
nswapdev=1, swpgavail=262205
swpages=262205, swpginuse=5126, swpgonly=4517, paging=0
Home |
Main Index |
Thread Index |
Old Index