Re: Hang when starting sshd on MP machine

To: Martin Husemann <martin%duskware.de@localhost>
Subject: Re: Hang when starting sshd on MP machine
From: Andrew Doran <ad%netbsd.org@localhost>
Date: Sat, 22 Mar 2008 14:25:17 +0000

On Sat, Mar 22, 2008 at 01:38:44PM +0100, Martin Husemann wrote:

> With a SMP kernel from some minutes ago, I get a hang when starting sshd.
> 
> ...
> Starting local daemons:.
> Updating motd.
> Starting ntpd.
> Starting sshd.
> [machine hangs here - I break in ddb:]
> Stopped in pid 220.1 (ntpd) at  netbsd:cpu_Debugger+0x4:        nop
> db{0}> bt
> zsc_intr_hard(1, 11d2350, 1203c00, 1237400, ca1c8c0, badcafe) at netbsd:zsc_intr_hard+0xfc
> zshard(24f1000, 5a1baab0d, 5a1e7c05e, 8000000000000000, 5a1e7c05e, 50) at netbsd:zshard+0x18
> intr_list_handler(0, a, e0017ed0, 1, 11d35c0, 0) at netbsd:intr_list_handler+0x14
> sparc_interrupt(0, 1203800, 0, ca1c8c0, ca4f090, badcafe) at netbsd:sparc_interrupt+0x29c
> mutex_vector_enter(ca4f090, 1232238, 5d5, 1232270, ca1c8c0, ca1c8c0) at netbsd:mutex_vector_enter+0xa0
> knote_activate(ca4f090, 0, 14892f8, 0, 0, 0) at netbsd:knote_activate+0x38
> knote(c93ec90, 0, 0, 2502138, 0, 0) at netbsd:knote+0x44
> sowakeup(27edba0, 27edc90, 1, 2502200, 30, badcafe) at netbsd:sowakeup+0x18
> unp_output(0, 0, 1204598, 27edba0, badcafe, badcafe) at netbsd:unp_output+0xb0
> uipc_usrreq(27ed860, 9, ca1c8c0, 0, 0, 39) at netbsd:uipc_usrreq+0x4f4
> sosend(0, 0, cbe5bb8, 3b, 0, 0) at netbsd:sosend+0x470
> do_sys_sendmsg(3b, 3, 0, 0, cbe5e00, badcafe) at netbsd:do_sys_sendmsg+0x31c
> sys_sendto(ca1c8c0, cbe5dc0, cbe5e00, 19, 1471000, ca1c8c0) at netbsd:sys_sendto+0x4c
> syscall_plain(cbe5ed0, 6, 40e3a2ec, 19, 40e3a2ec, 800) at netbsd:syscall_plain+0x2d4
> ?(3, ffffffffffff9f60, 3b, 0, 0, 0) at 0x10093e4
> db{0}> mach cpu 1
> db{1}> bt
> sys_kevent(b972000, c9d7dc0, c9d7e00, ffffffffffffb160, 409f6d18, 4) at netbsd:sys_kevent+0x2c
> syscall_plain(c9d7ed0, 6, 4093b3fc, 9, 4093b3fc, 800) at netbsd:syscall_plain+0x138
> ?(5, 0, 0, ffffffffffffb9a0, 10, 0) at 0x10093e4
> db{0}> ps/w
>  PID        LID          COMMAND     EMUL  PRI WAIT-MSG    WAIT-CHANNEL
>  234          1               sh   netbsd   41 biowait      27e25d8    
>  249          1               sh   netbsd   41 wait         b95d7d8
> >220          1             ntpd   netbsd   41              0      
>  111          1          syslogd   netbsd   43              0
>  85           1         dhclient   netbsd   43 select       1489300
>  2            1               sh   netbsd   41 wait         b95da78
>  1            1             init   netbsd   43 wait         b95dd18
>  0           24           system   netbsd  123 physiod      bf8e308
>  0           23           system   netbsd  125 vmem_rehash  bf8e208
>  0           22           system   netbsd  125 aiodoned     bf8e148
>  0           21           system   netbsd  124 syncer       1452a30
>  0           20           system   netbsd  126 pgdaemon     146ffc8
>  0           19           system   netbsd   96 raidiow      27790f8
>  0           18           system   netbsd   96 rfwcond      27791b8
>  0           17           system   netbsd   96 nellevt      2755918
>  0           16           system   netbsd   96 sccomp       274e650
>  0           15           system   netbsd  127 xcall        c038310
>  0           14           system   netbsd  223              0      
>  0           13           system   netbsd  220              0
>  0           12           system   netbsd  221              0
>  0           11           system   netbsd  222              0
>  0           10           system   netbsd    0              0
>  0            9           system   netbsd   96 pmfevent     bf8e048
>  0            8           system   netbsd  125 vrele        14529e0
>  0            7           system   netbsd  127 xcall        1814310
>  0            6           system   netbsd  223              0      
>  0            5           system   netbsd  220              0
>  0            4           system   netbsd  221              0
>  0            3           system   netbsd  222              0
>  0            2           system   netbsd    0              0
>  0            1           system   netbsd  125 schedule     1470008
> 
> cpu 1 is running syslogd

It's a deadlock involving kernel_lock. I haven't figured out what the problem
is yet, but I've put kernel_lock back around the kqueue code, which should
sort it out for the time being.
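
For readers unfamiliar with the approach, here is a minimal userland
sketch of what "putting a big lock back around" a subsystem looks like in
general. It is not the actual change: big_lock, subsystem_enter(),
event_count and run_model() are hypothetical names, and a pthread mutex
stands in for kernel_lock. Every entry into the subsystem is serialized by
one coarse lock, so its internal state is never touched concurrently.

```c
#include <pthread.h>

#define NTHREADS 4
#define NITER    100000

/* Hypothetical userland stand-in for a single big kernel lock. */
static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for subsystem state that is not yet safe to share. */
static long event_count;

/*
 * Hypothetical subsystem entry point: the whole operation runs with
 * the big lock held, so callers on different CPUs are serialized.
 */
static void
subsystem_enter(void)
{
	pthread_mutex_lock(&big_lock);
	event_count++;
	pthread_mutex_unlock(&big_lock);
}

/* Each worker enters the subsystem NITER times. */
static void *
worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < NITER; i++)
		subsystem_enter();
	return NULL;
}

/* Start NTHREADS workers, wait for them, and return the final count. */
static long
run_model(void)
{
	pthread_t threads[NTHREADS];

	event_count = 0;
	for (int i = 0; i < NTHREADS; i++)
		pthread_create(&threads[i], NULL, worker, NULL);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(threads[i], NULL);
	return event_count;
}
```

Calling run_model() from a test program (built with -pthread) should
return NTHREADS * NITER, since no update is lost while the big lock is
held. The trade-off is the usual one: correctness is easy to reason
about, but the subsystem no longer runs in parallel.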

Thanks,
Andrew

Follow-Ups:
- Re: Hang when starting sshd on MP machine
  - From: Chris Ross

References:
- Hang when starting sshd on MP machine
  - From: Martin Husemann
