Re: unresponsive domu

To: Sarton O'Brien <bsd-xen%roguewrt.org@localhost>
Subject: Re: unresponsive domu
From: Andrew Doran <ad%netbsd.org@localhost>
Date: Thu, 24 Apr 2008 00:54:43 +0100
On Thu, Apr 24, 2008 at 09:49:05AM +1000, Sarton O'Brien wrote:

> This has happened before but I thought I'd update kernel and userland to see 
> if it happened again ... and it has.
> 
> I really don't know what I'm doing, so here goes:
> 
> gogeta# xm console spike
> 
> Stopped in pid 0.2 (system) at  netbsd:breakpoint+0x4:  popl    %ebp
> db> bt
> breakpoint(65c87,0,caa75f78,11,1) at netbsd:breakpoint+0x4
> xencons_tty_input(c0ea1ef0,c05be011,1,c03a1afd,3b9aca00) at netbsd:xencons_tty_input+0xa6
> xencons_handler(c0ea1ef0,caa3ac90,0,caa75fec,20) at netbsd:xencons_handler+0x5f
> evtchn_do_event(2,caa3ac90,caa3ac48,0,0) at netbsd:evtchn_do_event+0xbd
> --- switch to interrupt stack ---
> call_evtchn_do_event(caa3ac90,0,11,ca000031,c0100011) at netbsd:call_evtchn_do_event+0x1e
> hypervisor_callback(ca27dbb4,1,c02d6489,0,0) at netbsd:hypervisor_callback+0x65
> idle_loop(ca286d80,0,c010006b,c0100063,c010006b) at netbsd:idle_loop+0xf9
> db> ps/l
>  PID         LID S     FLAGS       STRUCT LWP *               NAME WAIT
>  6248          1 3         4           cd1f7000             master tstile
>  18497         1 3         4           cd1f7d80               find vnode
>  22163         1 3        84           cda8e260           postdrop netio
>  13874         1 3        84           cda8e020           sendmail piperd
>  29998         1 3        84           cda8eb60                tee piperd
>  4904          1 3        84           cc71c260                 sh wait
>  16457         1 3        84           cc609000                 sh wait
>  6994          1 3        84           cc71c6e0               cron piperd
>  16586         1 3        84           cda8e4a0                ssh select
>  8291          1 3        80           cd1f7b40                cvs piperd
>  3980          1 3         0           cda8eda0                cvs vnode
>  12492         1 3        80           cda8e920                 sh wait
>  4448          1 3        80           cc609b40               cron piperd
>  664           1 3        80           ca2894a0               smbd pause
>  647           1 3         4           ca289260              getty tstile
>  628           1 3         4           cc609480               cron tstile
>  496           1 3        84           cc71c920           postgres select
>  533           1 3        84           cc71cda0           postgres select
>  383           1 3         4           cc71cb60           postgres tstile
>  510           1 3        84           cc609d80               qmgr kqueue
>  530           1 3        84           cc6096c0             master kqueue
>  328           8 3        84           cc71c020              slapd parked
>                7 3        80           cc71c4a0              slapd parked
>                6 3        80           ce131000              slapd parked
>                5 3        84           cda8e6e0              slapd parked
>                4 3        84           cc609900              slapd parked
>                3 3        84           cc609240              slapd parked
>                2 3        84           cc4ceb60              slapd select
>                1 3        80           cc4ce020              slapd parked
>  325           1 3         4           cc4ce260               smbd tstile
>  313           1 3         4           cc4ce4a0               nmbd tstile
>  300           1 3        80           cc4ce6e0               sshd select
>  230           1 3        80           ca2896e0             powerd kqueue
>  271           1 3   1000004           cc4ce920               ntpd tstile
>  214           1 3        84           cc4ceda0          rpc.lockd select
>  209           1 3        84           cb696000          rpc.statd select
>  206           5 3        84           cb696240              slave nfsd
>                4 3        84           cb696480              slave nfsd
>                3 3        84           cb6966c0              slave nfsd
>                2 3        84           cb696900              slave nfsd
>                1 3        80           cb696b40             master select
>  187           1 3        84           cb445260             mountd select
>  156           1 3        84           cb696d80       lfs_cleanerd segment
>  151           1 3        84           cb4454a0      rpc.yppasswdd select
>  141           1 3         4           cb4456e0             ypbind tstile
>  137           1 3         4           cb445920             ypserv tstile
>  131           1 3         4           cb445b60            rpcbind tstile
>  106           1 3        84           ca28e000            syslogd kqueue
>  1             1 3        84           ca28e900               init wait
> >0            28 3       204           cb445020         lfs_writer lfswriter
>               27 3       204           cb445da0            physiod physiod
>               26 3       204           ca289020        vmem_rehash vmem_rehash
>               25 3       204           ca28ed80           aiodoned aiodoned
>               24 3       204           ca28eb40            ioflush syncer
>               23 3       204           ca28e6c0           pgdaemon pgdaemon
>               22 3       204           ca289920          cryptoret crypto_wait
>               21 3       204           ca28e240             xenbus rdst
>               20 3       204           ca28e480           xenwatch evtsq
>               10 3       204           ca289b60           pmfevent pmfevent
>                9 3       204           ca289da0            cachegc cachegc
>                8 3       204           ca286000              vrele vrele
>                7 3       204           ca286240            xcall/0 xcall
>                6 1       204           ca286480          softser/0
>                5 1       204           ca2866c0          softclk/0
>                4 1       204           ca286900          softbio/0
>                3 1       204           ca286b40          softnet/0
>            >   2 7  20000205           ca286d80             idle/0
>                1 3       204           c044b240            swapper schedule
> db> ps
>  PID           PPID     PGRP        UID S   FLAGS LWPS          COMMAND WAIT
>  6248           530      530          0 2   0x100    1           master tstile
>  18497         4904    16457          0 2  0x4000    1             find vnode
>  22163        13874    16457          0 2  0x4100    1         postdrop netio
>  13874        16457    16457          0 2  0x4000    1         sendmail piperd
>  29998        16457    16457          0 2  0x4000    1              tee piperd
>  4904         16457    16457          0 2  0x4000    1               sh wait
>  16457         6994    16457          0 2  0x4000    1               sh wait
>  6994           628      628          0 2       0    1             cron piperd
>  16586         8291    12492          0 2  0x4000    1              ssh select
>  8291          3980    12492          0 2       0    1              cvs piperd
>  3980         12492    12492          0 2  0x4000    1              cvs vnode
>  12492         4448    12492          0 2  0x4000    1               sh wait
>  4448           628      628          0 2       0    1             cron piperd
>  664            325      325          0 2   0x101    1             smbd pause
>  647              1      647          0 2  0x4000    1            getty tstile
>  628              1      628          0 2       0    1             cron tstile
>  496            383      496       1003 2       0    1         postgres select
>  533            383      533       1003 2       0    1         postgres select
>  383              1        2       1003 2  0x4000    1         postgres tstile
>  510            530      530         12 2  0x4100    1             qmgr kqueue
>  530              1      530          0 2  0x4100    1           master kqueue
>  328              1      328       1001 2   0x101    8            slapd *
>  325              1      325          0 2   0x101    1             smbd tstile
>  313              1      313          0 2     0x1    1             nmbd tstile
>  300              1      300          0 2       0    1             sshd select
>  230              1      230          0 2       0    1           powerd kqueue
>  271              1      271          0 2       0    1             ntpd tstile
>  214              1      214          0 2       0    1        rpc.lockd select
>  209              1      209          0 2 0xa0000    1        rpc.statd select
>  206              1      206          0 2       0    5             nfsd *
>  187              1      187          0 2       0    1           mountd select
>  156              1      156          0 2       0    1     lfs_cleanerd segment
>  151              1      151          0 2       0    1    rpc.yppasswdd select
>  141              1      141          0 2       0    1           ypbind tstile
>  137              1      137          0 2 0xa0000    1           ypserv tstile
>  131              1      131          0 2       0    1          rpcbind tstile
>  106              1      106          0 2       0    1          syslogd kqueue
>  1                0        1          0 2  0x4001    1             init wait
> >0               -1        0          0 2 0x20002   19           system *
> db> reboot
> syncing disks... panic: assert_sleepable: interrupt caller=0xc0348fa1
> Stopped in pid 0.2 (system) at  netbsd:breakpoint+0x4:  popl    %ebp
> 
> I don't know how to reproduce this, it just happens. If there are more 
> commands I can issue to assist ... please let me know. Is there a basic set 
> of commands or procedure to follow when dropping to ddb? Sorry for my 
> naivety.

It's a file system / disk I/O problem. What is the domain's exact file
system configuration?

Andrew
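
A note on reading the trace: the ps/l listing shows many LWPs asleep on
the tstile and vnode wait channels. In NetBSD, tstile generally means the
LWP is blocked on a kernel lock (a turnstile) and vnode means it is
waiting on a vnode lock, which is consistent with a file system / disk
I/O diagnosis, though the reply above does not spell out its reasoning.
The following is a minimal Python sketch, not part of the original
thread, that tallies wait channels from a few rows copied out of the
quoted listing. The names PS_L_SAMPLE and tally_wait_channels are made up
for illustration, and the seven-column row layout is an assumption based
on the header shown in the listing.

```python
from collections import Counter

# Rows copied from the ps/l output quoted above, with the "> " quote
# markers stripped. Only a subset is included to keep the sketch short.
PS_L_SAMPLE = """\
 6248          1 3         4           cd1f7000             master tstile
 18497         1 3         4           cd1f7d80               find vnode
 22163         1 3        84           cda8e260           postdrop netio
 13874         1 3        84           cda8e020           sendmail piperd
 3980          1 3         0           cda8eda0                cvs vnode
 647           1 3         4           ca289260              getty tstile
 628           1 3         4           cc609480               cron tstile
 383           1 3         4           cc71cb60           postgres tstile
 325           1 3         4           cc4ce260               smbd tstile
 271           1 3   1000004           cc4ce920               ntpd tstile
 131           1 3         4           cb445b60            rpcbind tstile
"""


def tally_wait_channels(ps_l_text):
    """Count LWPs per wait channel (the last column of a ps/l row)."""
    counts = Counter()
    for line in ps_l_text.splitlines():
        fields = line.split()
        # Assumption: a complete row has exactly seven columns
        # (PID, LID, S, FLAGS, STRUCT LWP *, NAME, WAIT) and starts with a
        # numeric PID. Header rows, continuation rows for extra LWPs, and
        # rows without a wait channel are skipped rather than guessed at.
        if len(fields) != 7 or not fields[0].isdigit():
            continue
        counts[fields[-1]] += 1
    return counts


if __name__ == "__main__":
    for wait, n in tally_wait_channels(PS_L_SAMPLE).most_common():
        print(f"{wait:8} {n}")
```

On this sample, tstile comes out on top with 7 LWPs, followed by vnode
with 2. A tally like this only summarises where things are stuck; it
does not say which lock holder is responsible.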