tech-kern archive
frequent 5.0_RC2 lockups cvs co / upd'ing to RF volume on SunFire v100
I have a RAIDframe (RF) mirror across two IDE disks on my V100. I've moved a
bunch of data off that mirror to an external NAS box, so I now have a lot of
free space onto which I'd like to consolidate my various source trees.
However, this has been much less than successful: I've now hung the
machine four or five times attempting to check out a fresh 'src' or
'pkgsrc' from cvs.netbsd.org. Each time, the machine was
still responsive enough to get into ddb(4), but it didn't answer pings
and dropped all network connections.
Some interesting tidbits that might be related:
* In at least 2 or 3 of the hangs, RF parity rebuild was likely
running.
* In almost all the crashes, the disks had downgraded from UDMA to
PIO mode 4. This machine has a crappy IDE controller
that throws lots and lots of recoverable errors when both
channels of the IDE interface are active; I haven't put
my hack to fix this into the NetBSD 5.0 sources yet. The
downgrade to PIO mode 4 was most likely caused by the error
spew.
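To check how closely the downgrades track the hangs, the kernel messages can be tallied per drive. This is only a rough sketch: the function name count_pio_downgrades is made up for illustration, and it assumes the message text matches the "transfer error, downgrading to PIO mode" lines shown in the ddb output in this mail.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print how many
# "downgrading to PIO" messages each drive produced (count, then drive name).
count_pio_downgrades() {
    grep 'transfer error, downgrading to PIO mode' |
        cut -d: -f1 |   # keep only the drive name before the first colon
        sort |
        uniq -c
}
```

Something like `dmesg | count_pio_downgrades` would then show whether one drive or both are affected.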
Per Greg's suggestion, I'll try some parallel dd tests to the disks without
RAIDframe in the picture to see how the machine holds up. Any other clues
on what to look for?
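A minimal sketch of what such a parallel dd test might look like. The function name parallel_dd_read and the device paths in the usage note are placeholders, not something from Greg's suggestion. It only reads from the devices, so it is non-destructive; a write test like the one that triggered the DMA errors would need a scratch partition instead.

```shell
#!/bin/sh
# Hypothetical helper: read COUNT 64k blocks from every given device (or
# file) at the same time, so both IDE channels are busy simultaneously.
# Usage: parallel_dd_read COUNT DEVICE...
parallel_dd_read() {
    count=$1; shift
    pids=""
    for dev in "$@"; do
        # One background dd per device; output is discarded.
        dd if="$dev" of=/dev/null bs=64k count="$count" 2>/dev/null &
        pids="$pids $!"
    done
    rc=0
    for pid in $pids; do
        # Collect each dd's exit status; report failure if any dd failed.
        wait "$pid" || rc=1
    done
    return $rc
}
```

For example, `parallel_dd_read 16384 /dev/rwd0c /dev/rwd1c` would read 1 GB from each disk concurrently, assuming those are the raw whole-disk devices on this machine.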
Thanks,
--rafal
Here's some ddb info from when the machine hung:
wd1: transfer error, downgrading to PIO mode 4
wd1d: DMA error writing fsbn 107152720 of 107152720-107152723 (wd1 bn 123928864; cn 122945 tn 4 sn 52), retrying
wd1: soft error (corrected)
Stopped in pid 0.3 (system) at netbsd:cpu_Debugger+0x4: nop
db> bt
intr_list_handler(36ed380, 7, e0017c30, 8000000000000000, 14245e0, 1897e30) at netbsd:intr_list_handler+0x10
sparc_interrupt(d4a64b0, 7, e0017cf0, 8000000000000000, 1423400, d450930) at netbsd:sparc_interrupt+0x1dc
sparc_interrupt(d4a6000, 0, e0017ed0, 3c, 13a1e00, 0) at netbsd:sparc_interrupt+0x1dc
sparc_interrupt(d4a64b0, 7, e0017cf0, 8000000000000000, 1423400, d450930) at netbsd:sparc_interrupt+0x1dc
sparc_interrupt(4a8eaa0, 0, 0, c000000, 2e, 1) at netbsd:sparc_interrupt+0x1dc
in_arpinput(4a8eaa0, 1c14000, e0017ed0, a2, 13a1e00, 0) at netbsd:in_arpinput+0x18
arpintr(4a8eaa0, c94f740, c94f740, 6, 1, de) at netbsd:arpintr+0x188
softint_thread(c90c230, c94f740, 0, c903aa6, 1c05d18, c94ebf0) at netbsd:softint_thread+0x68
lwp_trampoline(f0061134, 0, 10fc00, fffc5d00, 10e9d0, fffc5e00) at netbsd:lwp_trampoline+0x8
db> ps/l
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
4657 1 3 0 84 e131420 fping select
8326 1 3 0 84 13a1b3e0 pickup kqueue
1723 1 3 0 84 e4de800 tail kqueue
5303 1 2 0 4 e59a060 ssh
6796 1 3 0 4 13a1a460 cvs biowait
2260 1 3 0 84 e59bb80 imap-login kqueue
5543 1 3 0 84 e4de040 screen-4.0.3 pause
5708 1 3 0 84 e59afe0 ksh pause
5417 1 3 0 84 e59a440 sshd select
2116 1 3 0 84 e4df3a0 sshd netio
3691 1 3 0 84 13a1ac20 top select
5331 1 3 0 84 13a1a840 ksh pause
4814 1 2 0 4 c978440 sshd
4294 1 3 0 84 13a1a080 sshd netio
2120 1 3 0 84 c978820 getty ttyraw
635 1 3 0 84 e4defc0 ksh pause
377 1 3 0 84 e59a820 screen-4.0.3 select
741 1 3 0 84 e4dfb60 imap-login kqueue
707 1 3 0 84 e41c400 imap-login kqueue
98 1 3 0 84 e41c7e0 pop3-login kqueue
97 1 3 0 84 e41cbc0 pop3-login kqueue
96 1 3 0 84 e41cfa0 pop3-login kqueue
775 1 3 0 84 e41d380 dovecot-auth kqueue
789 1 3 0 84 e1300c0 qmgr kqueue
484 132 5 0 4 e4de420 apcupsd
2 3 0 84 e4df780 apcupsd socket
1 2 0 4 e41d760 apcupsd
750 1 3 0 84 e2fcba0 cron nanoslp
495 1 3 0 84 e41db40 dovecot kqueue
645 1 3 0 84 e2fc7c0 httpd select
684 1 3 0 84 e2fc000 inetd kqueue
572 1 3 0 84 e2fcf80 master kqueue
466 1 3 0 84 e2fc3e0 smbd pause
422 1 3 0 84 e1304a0 smbd select
406 1 2 0 4 dfbc480 nmbd
393 1 3 0 84 e130c60 perl piperd
347 1 3 0 84 e131800 sshd select
296 1 3 0 84 e130880 sdpd select
285 1 2 0 1000004 e131040 ntpd
248 1 3 0 84 e131be0 perl nanoslp
244 1 3 0 84 dfbc0a0 bthcid kqueue
240 1 2 0 4 dfbc860 arpwatch
205 5 2 0 4 dfbcc40 slave
4 2 0 4 dfbd020 slave
3 2 0 4 dfbd400 slave
2 2 0 4 dd7fba0 slave
1 3 0 84 dfbd7e0 master select
203 1 3 0 84 dd7ec20 mountd select
151 1 3 0 84 dd7f000 rpcbind select
113 1 3 0 84 dd7f3e0 syslogd kqueue
89 1 3 0 84 c978060 dhclient select
1 1 3 0 84 c964be0 init wait
0 46 5 0 204 e2fdb20 (zombie)
44 3 0 204 dfbdbc0 nfsio nfsiod
43 3 0 204 dd7e080 nfsio nfsiod
42 3 0 204 dd7e460 nfsio nfsiod
41 3 0 204 dd7e840 nfsio nfsiod
40 3 0 204 dd7f7c0 physiod physiod
39 3 0 204 c978c00 vmem_rehash vmem_rehash
38 3 0 204 c978fe0 aiodoned aiodoned
37 3 0 204 c9793c0 ioflush syncer
36 3 0 204 c9797a0 pgdaemon pgdaemon
35 2 0 204 c979b80 raidio1
34 3 0 204 c964040 raid1 rfwcond
33 3 0 204 c964420 raidio0 raidiow
32 3 0 204 c954020 raid0 rfwcond
31 3 0 204 c954400 cryptoret crypto_wait
30 3 0 204 c965b60 atapibus0 sccomp
27 3 0 204 c965780 usbtask-dr usbtsk
26 3 0 204 c9653a0 usbtask-hc usbtsk
25 3 0 204 c964fc0 usb0 usbevt
24 3 0 204 c964800 unpgc unpgc
15 3 0 204 c9547e0 atabus1 atath
14 3 0 204 c954bc0 atabus0 atath
13 3 0 204 c954fa0 iic0 iicintr
12 3 0 204 c955380 sysmon smtaskq
11 3 0 204 c955760 pmfevent pmfevent
10 3 0 204 c955b40 nfssilly nfssilly
9 3 0 204 c94e000 cachegc cachegc
8 3 0 204 c94e3e0 vrele vrele
7 3 0 204 c94e7c0 xcall/0 xcall
6 1 0 204 c94eba0 softser/0
5 1 0 204 c94ef80 softclk/0
4 1 0 204 c94f360 softbio/0
> 3 7 0 204 c94f740 softnet/0
2 1 0 205 c94fb20 idle/0
1 3 0 204 181b9a0 swapper schedule
Another hang (with slightly less info gathered):
wd1: transfer error, downgrading to PIO mode 4
wd1d: DMA error writing fsbn 101127400 of 101127400-101127403 (wd1 bn 117903544; cn 116967 tn 12 sn 52), retrying
wd1: soft error (corrected)
Stopped in pid 0.5 (system) at netbsd:cpu_Debugger+0x4: nop
db> tr
intr_list_handler(36eb380, 7, e0017cf0, 8000000000000000, 141c7c0, 181c400) at netbsd:intr_list_handler+0x10
sparc_interrupt(d4a2000, 0, e0017ed0, 8000000000000000, 1399f80, c90fa40) at netbsd:sparc_interrupt+0x1dc
sparc_interrupt(0, 0, e0017ed0, 8000000000000000, 1399f80, 0) at netbsd:sparc_interrupt+0x1dc
sched_lwp_stats(c947f70, 0, c94ef80, 0, 1399f80, 49c1a0ea) at netbsd:sched_lwp_stats+0x114
sched_pstats(e517cc0, e50c7e0, 0, d44c000, 1399f80, c94f3b0) at netbsd:sched_pstats+0xf8
callout_softclock(1896be8, c947f80, 3, 6, 10e9d0, 18b6800) at netbsd:callout_softclock+0x170
softint_thread(c90c0c0, c94ef80, 0, 1fff, 18c1800, c94f3b0) at netbsd:softint_thread+0x64
lwp_trampoline(f0061134, 0, 10fc00, fffc5d00, 10e9d0, fffc5e00) at netbsd:lwp_trampoline+0x8
db> reboot
syncing disks... Mutex error: mutex_vector_enter: locking against myself
lock address : 0x000000000c947fa0
current cpu : 0
current lwp : 0x000000000c94ef80
owner field : 0x000000000c94ef80 wait/spin: 0/0
panic: lock error
Stopped in pid 0.5 (system) at netbsd:cpu_Debugger+0x4: nop
db> reboot
rebooting
--
Time is an illusion; lunchtime, doubly so. |/\/\| Rafal Boni
-- Ford Prefect |\/\/|
rafal%pobox.com@localhost