NetBSD-Bugs archive


Re: kern/58871: Stuck processes



The following reply was made to PR kern/58871; it has been noted by GNATS.

From: Taylor R Campbell <riastradh%NetBSD.org@localhost>
To: Benny Siegert <bsiegert%gmail.com@localhost>
Cc: gnats-bugs%netbsd.org@localhost, kern-bug-people%netbsd.org@localhost,
	gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
Subject: Re: kern/58871: Stuck processes
Date: Sat, 28 Dec 2024 15:13:44 +0000

 > Date: Sat, 14 Dec 2024 15:13:02 +0100
 > From: Benny Siegert <bsiegert%gmail.com@localhost>
 > 
 > Here is another set of logs from a system in this state.
 > 
 > What struck me is that like last time, there is a "find" process in 
 > tstile state. I actually have a trace of where it is stuck, see below.
 > 
 > I don't think the Go tests (which are the main thing running on these 
 > boxes) run find at all, so maybe this is some cronjob?
 
 Probably something in /etc/daily or /etc/security, yes.  You can
 disable these in /etc/daily.conf or /etc/security.conf; see the
 daily.conf(5) and security.conf(5) man pages for details.  But the
 symptoms suggest there's a deep problem here, and we should really
 find its source.
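 A minimal sketch of that workaround follows. It assumes the stock
 variable names from /etc/defaults/daily.conf (check daily.conf(5) on
 your release before relying on them): `find_core' gates the nightly
 search for core dump files, and `run_security' gates the call into
 /etc/security.  Both are plain shell assignments that /etc/daily
 sources, so local overrides go in /etc/daily.conf.

```sh
# /etc/daily.conf: local overrides, sourced by /etc/daily after
# /etc/defaults/daily.conf (variable names assumed from the stock defaults).
find_core=NO      # skip the nightly find(1) search for core dump files
run_security=NO   # skip /etc/security entirely; its checks also use find(1)
```

 This only removes the trigger.  It does not explain why find blocks on
 a lock that nobody holds, so treat it as a stopgap while the
 underlying bug is tracked down.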
 
 > Am 03.12.24 um 21:05 schrieb Taylor R Campbell via gnats:
 > >   Can you start crash(8) and get output from `ps', `ps/w', and `show all
 > >   tstiles'?
 > 
 > See below.
 > 
 > >   Can you start crash(8) and get stack traces from the processes not
 > >   in RUN state, like the tstile one with `bt 0t18154'?
 > 
 > Did that for the one process that was in tstile, the "find" one.
 
 Thanks, that's pretty weird: find is waiting for a lock but nobody is
 holding it!
 
   PID   LID          COMMAND      WAITING-FOR     WAIT-CHANNEL
 11160 11160             find                0         95e9ea40
 
 Just to be absolutely sure, can you send output of the following on
 your kernel?
 
 $ ident /netbsd | grep -F cpuswitch.S
 
 And can you share the disassembly of the function cpu_switchto in your
 kernel, with objdump or gdb?
 
 Also, if you have netbsd.gdb for the kernel, can you force a crash
 dump (enter ddb and run `sync'), so we can see the arguments and
 locals in the stack trace?  Curious to see what file find is stuck on.
 
 > >   Can you run dtrace to sample what's happening?
 > >   
 > >   dtrace -n 'profile:::profile-97 { @[stack()] = count() }'
 > 
 > This made the system hang, for whatever reason.
 
 Does it make the system hang only while in this state, or does it
 happen every time?  How does it hang -- is it completely unresponsive,
 can you enter ddb, can you interact with crash, or what?
 
 (See earlier messages on this ticket for ways to enter ddb if C-A-ESC
 isn't working.)
 

