NetBSD-Bugs archive


Re: kern/58871: Stuck processes



The following reply was made to PR kern/58871; it has been noted by GNATS.

From: Benny Siegert <bsiegert%gmail.com@localhost>
To: Taylor R Campbell <riastradh%NetBSD.org@localhost>
Cc: gnats-bugs%NetBSD.org@localhost, netbsd-bugs%NetBSD.org@localhost
Subject: Re: kern/58871: Stuck processes
Date: Tue, 10 Dec 2024 20:54:45 +0100

 This happened again, so I tried at least some of the things you suggested:
 
 On 03.12.24 at 21:03, Taylor R Campbell wrote:
 > Can you start crash(8) and get output from `ps', `ps/w', and `show all
 > tstiles'?
 
 Crash version 10.0_STABLE, image version 10.0_STABLE.
 Kernel compiled without options LOCKDEBUG.
 Output from a running system is unreliable.
 crash> ps/w
 PID     LID          COMMAND     EMUL  PRI WAIT-MSG         WAIT-CHANNEL
 21931>21931            crash   netbsd   37                  0
 24382 24382               sh   netbsd   36 wait             9262b88c
 15343>15343           screen   netbsd   37                  0
 10543 10543           screen   netbsd   38 pause            92c32c80
 8677   8677               sh   netbsd   35 wait             957b854c
 3770   3770             find   netbsd   26 tstile           92b31f80
 28839 28839         postdrop   netbsd   43 netio            929e5150
 21098 21098         sendmail   netbsd   43 pipe_rd          92d0c814
 13750 13750              tee   netbsd   43 pipe_rd          91d21df4
 7775   7775               sh   netbsd   42 wait             95b6bacc
 24218 24218               sh   netbsd   43 wait             9170ea4c
 28302 28302             cron   netbsd   43 pipe_rd          92d0c8bc
 2374   2374               go   netbsd   40 tstile           95430f80
 8089 > 8089               go   netbsd   25                  0
 2534  17497    pipeline.test   netbsd   43 lwpwait          92a53a54
 2534  28139    pipeline.test   netbsd   43                  0
 24155 24155               sh   netbsd   43                  0
 24215 18770               go   netbsd   43 parked           95594c80
 24215  8412               go   netbsd   43 parked           92bdb3c0
 24215 24883               go   netbsd   43 parked           92c40a00
 24215  8759               go   netbsd   43 parked           9519e940
 24215 19943               go   netbsd   40 pipe_rd          9558aabc
 24215 23618               go   netbsd   43 wait             91ec930c
 24215  9006               go   netbsd   40 parked           92bdb680
 24215  5455               go   netbsd   41 wait             91ec930c
 24215 26737               go   netbsd   43 kqueue           95776eb8
 24215 13345               go   netbsd   43 parked           92d18d00
 24215 11347               go   netbsd   42 parked           9519d900
 24215  7163               go   netbsd   43 parked           92a678c0
 24215 24215               go   netbsd   38 parked           95531040
 4239  12336   result_adapter   netbsd   43 parked           955de1c0
 4239  26804   result_adapter   netbsd   43 kqueue           94980678
 4239   7242   result_adapter   netbsd   43 parked           92b5d900
 4239  10307   result_adapter   netbsd   42 parked           91f28780
 4239  10406   result_adapter   netbsd   43 parked           92b5d380
 4239  20061   result_adapter   netbsd   43 parked           91d153c0
 4239   4430   result_adapter   netbsd   43 parked           925377c0
 4239  19249   result_adapter   netbsd   43 wait             955d40cc
 4239   2513   result_adapter   netbsd   43 parked           951b36c0
 4239   4239   result_adapter   netbsd   42 parked           95538b80
 2274  27130              rdb   netbsd   43 parked           9557ec40
 2274  27888              rdb   netbsd   43                  0
 2274  18470              rdb   netbsd   43 parked           955949c0
 2274   2725              rdb   netbsd   43 parked           9557bc00
 2274  22316              rdb   netbsd   43 parked           95538340
 2274   9005              rdb   netbsd   43 wait             955d484c
 2274   8849              rdb   netbsd   43 parked           9519d0c0
 2274  22878              rdb   netbsd   43 parked           95461180
 2274  15929              rdb   netbsd   43                  0
 2274   2274              rdb   netbsd   43                  0
 19250 13439          bbagent   netbsd   43                  0
 19250  6648          bbagent   netbsd   43                  0
 19250 15191          bbagent   netbsd   43 lwpwait          95b6b0d4
 19250 10220          bbagent   netbsd   43                  0
 20545 20545       python3.11   netbsd   43 wait             957b804c
 3732   1996       python3.11   netbsd   43 lwpwait          9170e554
 3732   3732       python3.11   netbsd   43                  0
 3053   3053            getty   netbsd   39 ttyraw           915e0c28
 3187   3187            getty   netbsd   39 ttyraw           915e0a28
 3497   3497            getty   netbsd   39 ttyraw           915e0828
 2344   2344            login   netbsd   42 wait             9170e2cc
 662     662             cron   netbsd   43 nanoslp          929b1300
 658     658             estd   netbsd   43 nanoslp          92537a80
 
 648     648            inetd   netbsd   40 kqueue           9295dab8
 2892   2979    node_exporter   netbsd   41 parked           929b1b40
 2892    671    node_exporter   netbsd   43 parked           91f284c0
 2892    669    node_exporter   netbsd   43 parked           929b1880
 2892    668    node_exporter   netbsd   43 kqueue           92a09d38
 2892    664    node_exporter   netbsd   43 parked           929b15c0
 2892   2892    node_exporter   netbsd   43 parked           918bb300
 2853   2853             qmgr   netbsd   43 kqueue           9295d1b8
 2859   2859           master   netbsd   43                  0
 333   29950       python3.11   netbsd   43 parked           91808d00
 333     333       python3.11   netbsd   43 wait             91d1a2cc
 411     411             sshd   netbsd   43                  0
 394     394             ntpd   netbsd   43 pause            91f28200
 2358    328   bootstrapswarm   netbsd   43 parked           91f0f480
 2358    396   bootstrapswarm   netbsd   43 parked           91d15100
 2358    389   bootstrapswarm   netbsd   40 parked           91f0f740
 2358    385   bootstrapswarm   netbsd   42 parked           91f0f1c0
 2358    384   bootstrapswarm   netbsd   41 wait             91ec908c
 2358   1960   bootstrapswarm   netbsd   43                  0
 2358   2641   bootstrapswarm   netbsd   43 nanoslp          91ee4700
 
 2358   2358   bootstrapswarm   netbsd   43 parked           91ec16c0
 1998   1998         multilog   netbsd   43 pipe_rd          91d212cc
 2674   2674         multilog   netbsd   41 pipe_rd          91d21224
 1501   1501               sh   netbsd   40 wait             918ecacc
 2339   2339               sh   netbsd   41 wait             91d1aa4c
 2235   2235               sh   netbsd   41 wait             91d1a54c
 2100   2100        supervise   netbsd   43 poll             915d0280
 2128   2128        supervise   netbsd   43                  0
 2238   2238        supervise   netbsd   43 poll             915d0280
 
 1952   1952        supervise   netbsd   43 poll             913c5340
 2236   2236         multilog   netbsd   43 pipe_rd          918438ac
 2303   2303           svscan   netbsd   43 nanoslp          918bb5c0
 1884   1884          syslogd   netbsd   43 kqueue           91835f38
 2033   2033            mdnsd   netbsd   43 select           915c6cc0
 932     932           dhcpcd   netbsd   43 poll             915d0280
 990     990           dhcpcd   netbsd   36 poll             915c6f40
 879     879           dhcpcd   netbsd   43                  0
 991     991           dhcpcd   netbsd   43 poll             915c6cc0
 570     570          devpubd   netbsd   33 devmon           80aad06c
 1         1             init   netbsd   41 wait             9170e04c
 0       366           system   netbsd  123 physiod          9180fe04
 0       218           system   netbsd   43 bwfm0            9180f444
 
 0       217           system   netbsd   96 lnxcmplt         917c2a18
 0       216           system   netbsd  125 pooldrain        80abe900
 0       215           system   netbsd  124 syncer           91808200
 0       214           system   netbsd  126 pgdaemon         80abdcf0
 0       213           system   netbsd  123 data             917bdad4
 0       212           system   netbsd   96 semacv           80a86408
 0       211           system   netbsd   96 semacv           80a863f4
 0       210           system   netbsd   96 semacv           80a863e0
 0       207           system   netbsd   43 swwreboot        917dea44
 0       205           system   netbsd   96 sccomp           917c23dc
 0       203           system   netbsd   96 npfgcw           917bf144
 0       202           system   netbsd  222 rt_free          91709444
 0       201           system   netbsd   96 unpgc            80b274c8
 0       200           system   netbsd  222 key_timehandler  91709384
 
 0       199           system   netbsd  222 icmp6_wqinput    91707484
 0       198           system   netbsd  222 icmp6_wqinput    91707444
 0       197           system   netbsd  222 icmp6_wqinput    91707404
 0       196           system   netbsd  222 icmp6_wqinput    917073c4
 0       195           system   netbsd   96 usbevt           913cbb98
 0       194           system   netbsd  222 nd6_timer        916e5744
 0       193           system   netbsd  222 carp6_wqinput    913cad44
 0       192           system   netbsd  222 carp6_wqinput    913cad04
 0       170           system   netbsd  222 carp6_wqinput    913cacc4
 0       177           system   netbsd  222 carp6_wqinput    913cac84
 0       174           system   netbsd  222 carp_wqinput     913cabc4
 0       179           system   netbsd  222 carp_wqinput     913cab84
 0       176           system   netbsd  222 carp_wqinput     913cab44
 0        31           system   netbsd  222 carp_wqinput     913cab04
 
 0        63           system   netbsd  222 icmp_wqinput     913caa44
 0       126           system   netbsd  222 icmp_wqinput     913caa04
 0       125           system   netbsd  222 icmp_wqinput     913ca9c4
 0       124           system   netbsd  222 icmp_wqinput     913ca984
 0       123           system   netbsd  222 rt_timer         916e5684
 0       122           system   netbsd  125 vmem_rehash      915ebc84
 0       121           system   netbsd   43 vcmbox0          916e5444
 0       120           system   netbsd   96 usbtsk           80aacdac
 0       119           system   netbsd   96 usbtsk           80aacd8c
 0       118           system   netbsd   43 dwc2             916d58c4
 0       117           system   netbsd  221 mmctaskq         916e438c
 0       116           system   netbsd  221 mmctaskq         916e408c
 0       107           system   netbsd   43 xclocv           809e0c04
 0       105           system   netbsd  127 xcall            807c2448
 0       104           system   netbsd  223                  0
 0       103           system   netbsd  220                  0
 
 0       102           system   netbsd  221                  0
 0       101           system   netbsd  222                  0
 0    >  100           system   netbsd    0                  0
 0        99           system   netbsd  127                  0
 0        98           system   netbsd  223                  0
 0        97           system   netbsd  220                  0
 0        96           system   netbsd  221                  0
 0        30           system   netbsd  222                  0
 0        29           system   netbsd    0                  0
 0        28           system   netbsd  127 xcall            807c15c8
 0        27           system   netbsd  223                  0
 0        26           system   netbsd  220                  0
 0        25           system   netbsd  221                  0
 0        24           system   netbsd  222                  0
 0        23           system   netbsd    0                  0
 
 0        22           system   netbsd   43 lnxsyswq         913c5844
 0        21           system   netbsd   43 lnxubdwq         913c57c4
 0        20           system   netbsd   43 lnxpwrwq         913c5744
 0        19           system   netbsd   43 lnxlngwq         913c56c4
 0        18           system   netbsd   43 lnxhipwq         913c5644
 0        17           system   netbsd   43 lnxrcugc         809d7fc4
 0        16           system   netbsd   96 smtaskq          80aaf33c
 0        15           system   netbsd   43 pmfsuspend       913a2a44
 0        14           system   netbsd   43 pmfevent         913a2984
 0        13           system   netbsd   96 sopendfr         80b27484
 0        12           system   netbsd  222 ifwdog           913a28c4
 0        11           system   netbsd  222 iflnkst          913a2804
 0        10           system   netbsd   43 nfssilly         913a2744
 0         9           system   netbsd  125 vdrain           80b27f9c
 0         8           system   netbsd  125 mod_unld         80b1fd7c
 
 0         7           system   netbsd  127 xcall            807c0e88
 0         6           system   netbsd  223                  0
 0         5           system   netbsd  220                  0
 0         4           system   netbsd  221                  0
 0         3           system   netbsd  222                  0
 0         2           system   netbsd    0                  0
 0         0           system   netbsd  125 uvm              807fa9c0
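
For picking the tstile waiters out of a long listing like the one above, a small filter can help. This is only a sketch: the column layout (PID, optional `>` marker, LID, COMMAND, EMUL, PRI, optional WAIT-MSG, WAIT-CHANNEL) is inferred from the header line of the output, not taken from crash(8) documentation, and the names `ROW` and `threads_waiting_on` are made up for illustration. The sample rows are copied from the listing.

```python
import re

# Layout inferred from the listing's header line (an assumption):
# PID, optional '>' marker, LID, COMMAND, EMUL, PRI, optional WAIT-MSG, WAIT-CHANNEL.
ROW = re.compile(
    r"^\s*(?P<pid>\d+)\s*>?\s*(?P<lid>\d+)\s+(?P<command>\S+)\s+(?P<emul>\S+)"
    r"\s+(?P<pri>\d+)\s+(?:(?P<wmesg>\S+)\s+)?(?P<wchan>[0-9a-f]+)\s*$"
)

def threads_waiting_on(listing, wmesg):
    """Return (pid, lid, command, wchan) for rows whose WAIT-MSG equals wmesg."""
    found = []
    for line in listing.splitlines():
        m = ROW.match(line)  # header and blank lines simply do not match
        if m and m.group("wmesg") == wmesg:
            found.append((int(m.group("pid")), int(m.group("lid")),
                          m.group("command"), m.group("wchan")))
    return found

# A few rows copied from the ps/w output above.
SAMPLE = """\
PID     LID          COMMAND     EMUL  PRI WAIT-MSG         WAIT-CHANNEL
21931>21931            crash   netbsd   37                  0
3770   3770             find   netbsd   26 tstile           92b31f80
2374   2374               go   netbsd   40 tstile           95430f80
8089 > 8089               go   netbsd   25                  0
2534  17497    pipeline.test   netbsd   43 lwpwait          92a53a54
"""

for pid, lid, command, wchan in threads_waiting_on(SAMPLE, "tstile"):
    print(pid, lid, command, wchan)
```

On the rows above this selects the find process (3770) and one go process (2374), the two entries shown in tstile.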
 
 
 Is it normal that ps/w prints output continuously until you press Ctrl+C?
 
 > Can you start crash(8) and get stack traces from the processes not in RUN
 > state, like the tstile one with `bt 0t18154'?
 
 I tried looking at the "find" process in tstile (3770) and the 
 "pipeline.test" process (2534) but got the following. Is this a bug in 
 crash(8)?
 
 According to htop, process 2534 was hogging 100% of one core. It looks 
 like it was actually spinning on the CPU?
 
 crash> bt/t 2534
 trace: pid 9524 not found
 crash> bt/t 3770
 trace: pid 14192 not found
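
One possible reading of the two errors above: the "not found" pids are exactly what the typed numbers give when parsed as hexadecimal, which would fit the `0t` decimal prefix in the quoted suggestion `bt 0t18154`. The arithmetic can be checked with a minimal sketch; the helper name `as_hex_input` is made up, and that crash(8) reads bare numbers as hex here is an assumption drawn only from these two values.

```python
def as_hex_input(typed):
    """Value obtained when the typed digits are parsed as base 16."""
    return int(typed, 16)

# The two pids typed at the crash> prompt above.
for typed in ("2534", "3770"):
    print(f"typed {typed}: as hex = {as_hex_input(typed)}, "
          f"decimal form would be 0t{typed}")
```

If that reading is right, `bt/t 0t2534` and `bt/t 0t3770` would be the forms to try next time.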
 
 > Can you run dtrace to sample what's happening?
 > 
 > dtrace -n 'profile:::profile-97 { @[stack()] = count() }'
 
 Next time :) I had to enable dtrace in modules.conf.
 
 Sorry, this is probably not the most helpful answer. When this 
 inevitably happens again, I will try the other debugging techniques.
 
 FWIW, typing "sync" made the whole machine hang, so it could also be 
 storage-related. The go processes are running off USB storage, and there 
 is also a swap partition on the USB storage. dmesg did not contain 
 anything relevant though.
 
 Would it be useful to run with a LOCKDEBUG kernel?
 
 -- 
 Benny
 

