tech-kern archive


frequent 5.0_RC2 lockups cvs co / upd'ing to RF volume on SunFire v100



I have a RAIDframe (RF) mirror over 2 IDE disks on my V100.  I've moved a
bunch of data off that mirror to an external NAS box, so I've got a lot of
free space that I'd like to consolidate all my various source trees onto.

However, this has been far from successful; I've now hung the
machine four or five times attempting to check out a fresh 'src' or
'pkgsrc' from cvs.netbsd.org.  Each time, the machine was still
responsive enough to drop into ddb(4), but it didn't answer pings
and dropped all network connections.

Some interesting tidbits that might be related:
        * In at least 2 or 3 of the hangs, an RF parity rebuild was
          likely running.
        * In almost all the crashes, the disks had downgraded from UDMA
          to PIO mode 4.  This machine has a crappy IDE controller
          that throws lots and lots of recoverable errors when both
          channels of the IDE interface are running; I haven't put
          my hack to fix this into the NetBSD 5.0 sources yet.  The
          downgrade to PIO mode 4 was most likely caused by the error
          spew.

Per Greg's suggestion, I'll try some parallel dd tests to the disk without
RAIDframe in the picture to see how the machine holds up.  Any other clues
on what to look for?
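
As a rough illustration of what such a parallel write test could look
like (a sketch only, not the exact test Greg suggested): each writer
thread below behaves roughly like one "dd if=/dev/zero of=FILE bs=64k
count=N", and one thread is started per target so that both IDE channels
are busy at the same time.  The function names and the target paths are
hypothetical; on the real machine the targets would be scratch files on
filesystems that sit directly on each disk, outside RAIDframe.

```python
import os
import tempfile
import threading


def write_stream(path, block_size, count, results):
    """Write `count` zero-filled blocks of `block_size` bytes to `path`."""
    block = bytes(block_size)
    written = 0
    with open(path, "wb") as f:
        for _ in range(count):
            written += f.write(block)
        f.flush()
        os.fsync(f.fileno())  # push the data to the disk, not just the cache
    results[path] = written


def parallel_write_test(paths, block_size=64 * 1024, count=16):
    """Start one writer thread per path and wait for all of them."""
    results = {}
    threads = [
        threading.Thread(target=write_stream,
                         args=(path, block_size, count, results))
        for path in paths
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results  # maps each path to the number of bytes written


if __name__ == "__main__":
    # Hypothetical targets: a temporary directory stands in for one
    # scratch file per physical disk.
    with tempfile.TemporaryDirectory() as scratch:
        targets = [os.path.join(scratch, "wd0.test"),
                   os.path.join(scratch, "wd1.test")]
        for path, nbytes in parallel_write_test(targets).items():
            print(path, nbytes, "bytes written")
```

With a much larger count, watching the console for "transfer error,
downgrading to PIO mode 4" while this runs would show whether the
controller misbehaves without RAIDframe involved.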

Thanks,
--rafal

Here's some ddb info from when the machine hung:

wd1: transfer error, downgrading to PIO mode 4
wd1d: DMA error writing fsbn 107152720 of 107152720-107152723 (wd1 bn 123928864; cn 122945 tn 4 sn 52), retrying
wd1: soft error (corrected)
Stopped in pid 0.3 (system) at  netbsd:cpu_Debugger+0x4:        nop
db> bt
intr_list_handler(36ed380, 7, e0017c30, 8000000000000000, 14245e0, 1897e30) at netbsd:intr_list_handler+0x10
sparc_interrupt(d4a64b0, 7, e0017cf0, 8000000000000000, 1423400, d450930) at netbsd:sparc_interrupt+0x1dc
sparc_interrupt(d4a6000, 0, e0017ed0, 3c, 13a1e00, 0) at netbsd:sparc_interrupt+0x1dc
sparc_interrupt(d4a64b0, 7, e0017cf0, 8000000000000000, 1423400, d450930) at netbsd:sparc_interrupt+0x1dc
sparc_interrupt(4a8eaa0, 0, 0, c000000, 2e, 1) at netbsd:sparc_interrupt+0x1dc
in_arpinput(4a8eaa0, 1c14000, e0017ed0, a2, 13a1e00, 0) at netbsd:in_arpinput+0x18
arpintr(4a8eaa0, c94f740, c94f740, 6, 1, de) at netbsd:arpintr+0x188
softint_thread(c90c230, c94f740, 0, c903aa6, 1c05d18, c94ebf0) at netbsd:softint_thread+0x68
lwp_trampoline(f0061134, 0, 10fc00, fffc5d00, 10e9d0, fffc5e00) at netbsd:lwp_trampoline+0x8
db> ps/l
PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
4657     1 3   0        84            e131420              fping select
8326     1 3   0        84           13a1b3e0             pickup kqueue
1723     1 3   0        84            e4de800               tail kqueue
5303     1 2   0         4            e59a060                ssh
6796     1 3   0         4           13a1a460                cvs biowait
2260     1 3   0        84            e59bb80         imap-login kqueue
5543     1 3   0        84            e4de040       screen-4.0.3 pause
5708     1 3   0        84            e59afe0                ksh pause
5417     1 3   0        84            e59a440               sshd select
2116     1 3   0        84            e4df3a0               sshd netio
3691     1 3   0        84           13a1ac20                top select
5331     1 3   0        84           13a1a840                ksh pause
4814     1 2   0         4            c978440               sshd
4294     1 3   0        84           13a1a080               sshd netio
2120     1 3   0        84            c978820              getty ttyraw
635      1 3   0        84            e4defc0                ksh pause
377      1 3   0        84            e59a820       screen-4.0.3 select
741      1 3   0        84            e4dfb60         imap-login kqueue
707      1 3   0        84            e41c400         imap-login kqueue
98       1 3   0        84            e41c7e0         pop3-login kqueue
97       1 3   0        84            e41cbc0         pop3-login kqueue
96       1 3   0        84            e41cfa0         pop3-login kqueue
775      1 3   0        84            e41d380       dovecot-auth kqueue
789      1 3   0        84            e1300c0               qmgr kqueue
484    132 5   0         4            e4de420            apcupsd
               2 3   0        84            e4df780            apcupsd socket
               1 2   0         4            e41d760            apcupsd
750      1 3   0        84            e2fcba0               cron nanoslp
495      1 3   0        84            e41db40            dovecot kqueue
645      1 3   0        84            e2fc7c0              httpd select
684      1 3   0        84            e2fc000              inetd kqueue
572      1 3   0        84            e2fcf80             master kqueue
466      1 3   0        84            e2fc3e0               smbd pause
422      1 3   0        84            e1304a0               smbd select
406      1 2   0         4            dfbc480               nmbd
393      1 3   0        84            e130c60               perl piperd
347      1 3   0        84            e131800               sshd select
296      1 3   0        84            e130880               sdpd select
285      1 2   0   1000004            e131040               ntpd
248      1 3   0        84            e131be0               perl nanoslp
244      1 3   0        84            dfbc0a0             bthcid kqueue
240      1 2   0         4            dfbc860           arpwatch
205      5 2   0         4            dfbcc40              slave
               4 2   0         4            dfbd020              slave
               3 2   0         4            dfbd400              slave
               2 2   0         4            dd7fba0              slave
               1 3   0        84            dfbd7e0             master select
203      1 3   0        84            dd7ec20             mountd select
151      1 3   0        84            dd7f000            rpcbind select
113      1 3   0        84            dd7f3e0            syslogd kqueue
89       1 3   0        84            c978060           dhclient select
1        1 3   0        84            c964be0               init wait
0       46 5   0       204            e2fdb20           (zombie)
              44 3   0       204            dfbdbc0              nfsio nfsiod
              43 3   0       204            dd7e080              nfsio nfsiod
              42 3   0       204            dd7e460              nfsio nfsiod
              41 3   0       204            dd7e840              nfsio nfsiod
              40 3   0       204            dd7f7c0            physiod physiod
              39 3   0       204            c978c00        vmem_rehash vmem_rehash
              38 3   0       204            c978fe0           aiodoned aiodoned
              37 3   0       204            c9793c0            ioflush syncer
              36 3   0       204            c9797a0           pgdaemon pgdaemon
              35 2   0       204            c979b80            raidio1
              34 3   0       204            c964040              raid1 rfwcond
              33 3   0       204            c964420            raidio0 raidiow
              32 3   0       204            c954020              raid0 rfwcond
              31 3   0       204            c954400          cryptoret crypto_wait
              30 3   0       204            c965b60          atapibus0 sccomp
              27 3   0       204            c965780         usbtask-dr usbtsk
              26 3   0       204            c9653a0         usbtask-hc usbtsk
              25 3   0       204            c964fc0               usb0 usbevt
              24 3   0       204            c964800              unpgc unpgc
              15 3   0       204            c9547e0            atabus1 atath
              14 3   0       204            c954bc0            atabus0 atath
              13 3   0       204            c954fa0               iic0 iicintr
              12 3   0       204            c955380             sysmon smtaskq
              11 3   0       204            c955760           pmfevent pmfevent
              10 3   0       204            c955b40           nfssilly nfssilly
               9 3   0       204            c94e000            cachegc cachegc
               8 3   0       204            c94e3e0              vrele vrele
               7 3   0       204            c94e7c0            xcall/0 xcall
               6 1   0       204            c94eba0          softser/0
               5 1   0       204            c94ef80          softclk/0
               4 1   0       204            c94f360          softbio/0
           >   3 7   0       204            c94f740          softnet/0
               2 1   0       205            c94fb20             idle/0
               1 3   0       204            181b9a0            swapper schedule

Another hang (with slightly less info gathered):

wd1: transfer error, downgrading to PIO mode 4
wd1d: DMA error writing fsbn 101127400 of 101127400-101127403 (wd1 bn 117903544; cn 116967 tn 12 sn 52), retrying
wd1: soft error (corrected)
Stopped in pid 0.5 (system) at  netbsd:cpu_Debugger+0x4:        nop
db> tr
intr_list_handler(36eb380, 7, e0017cf0, 8000000000000000, 141c7c0, 181c400) at netbsd:intr_list_handler+0x10
sparc_interrupt(d4a2000, 0, e0017ed0, 8000000000000000, 1399f80, c90fa40) at netbsd:sparc_interrupt+0x1dc
sparc_interrupt(0, 0, e0017ed0, 8000000000000000, 1399f80, 0) at netbsd:sparc_interrupt+0x1dc
sched_lwp_stats(c947f70, 0, c94ef80, 0, 1399f80, 49c1a0ea) at netbsd:sched_lwp_stats+0x114
sched_pstats(e517cc0, e50c7e0, 0, d44c000, 1399f80, c94f3b0) at netbsd:sched_pstats+0xf8
callout_softclock(1896be8, c947f80, 3, 6, 10e9d0, 18b6800) at netbsd:callout_softclock+0x170
softint_thread(c90c0c0, c94ef80, 0, 1fff, 18c1800, c94f3b0) at netbsd:softint_thread+0x64
lwp_trampoline(f0061134, 0, 10fc00, fffc5d00, 10e9d0, fffc5e00) at netbsd:lwp_trampoline+0x8
db> reboot
syncing disks... Mutex error: mutex_vector_enter: locking against myself

lock address : 0x000000000c947fa0
current cpu  :                  0
current lwp  : 0x000000000c94ef80
owner field  : 0x000000000c94ef80 wait/spin:                0/0

panic: lock error
Stopped in pid 0.5 (system) at  netbsd:cpu_Debugger+0x4:        nop
db> reboot
rebooting

-- 
  Time is an illusion; lunchtime, doubly so.     |/\/\|           Rafal Boni
                   -- Ford Prefect               |\/\/|      
rafal%pobox.com@localhost

