NetBSD-Bugs archive


Re: kern/58871: Stuck processes



> Date: Sat, 14 Dec 2024 15:13:02 +0100
> From: Benny Siegert <bsiegert%gmail.com@localhost>
> 
> Here is another set of logs from a system in this state.
> 
> What struck me is that like last time, there is a "find" process in 
> tstile state. I actually have a trace of where it is stuck, see below.
> 
> I don't think the Go tests (which are the main thing running on these 
> boxes) run find at all, so maybe this is some cronjob?

Probably something in /etc/daily or /etc/security, yes.  You can
disable these in /etc/daily.conf or /etc/security.conf; see the man
pages for details.  But the symptoms suggest there's a deeper problem
here, and we should really find the source of it.
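If you do want to rule out the cron jobs in the meantime, something
like the following in /etc/daily.conf should skip the nightly search
for core files and the nightly /etc/security run.  I'm going from
memory of daily.conf(5) for the variable names, so treat them as an
assumption and check the man page on your release first:

```shell
# /etc/daily.conf (sketch; variable names assumed from daily.conf(5))
find_core=NO        # skip the find(1) sweep for *.core files
run_security=NO     # skip running /etc/security from /etc/daily
```

That only hides the trigger, of course; it doesn't explain why find
gets stuck.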

> Am 03.12.24 um 21:05 schrieb Taylor R Campbell via gnats:
> >   Can you start crash(8) and get output from `ps', `ps/w', and `show all
> >   tstiles'?
> 
> See below.
> 
> >   Can you start crash(8) and get stack traces from the processes not in RUN
> >   state, like the tstile one with `bt 0t18154'?
> 
> Did that for the one process that was in tstile, the "find" one.

Thanks, that's pretty weird: find is waiting for a lock but nobody is
holding it!

  PID   LID          COMMAND      WAITING-FOR     WAIT-CHANNEL
11160 11160             find                0         95e9ea40

Just to be absolutely sure, can you send the output of the following
command, run against your kernel?

$ ident /netbsd | grep -F cpuswitch.S

And can you share the disassembly of the function cpu_switchto in your
kernel, with objdump or gdb?
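
For the disassembly, a minimal sketch with gdb in batch mode, assuming
gdb is installed and that /netbsd is the kernel you are actually
running (the helper name disas_cpu_switchto is just something I made
up for illustration):

```shell
# Print the disassembly of cpu_switchto from a kernel image.
# Usage: disas_cpu_switchto [kernel-image]   (defaults to /netbsd)
disas_cpu_switchto() {
    kernel=${1:-/netbsd}
    if [ ! -r "$kernel" ]; then
        echo "cannot read kernel image: $kernel" >&2
        return 1
    fi
    # -batch makes gdb exit after running the -ex command.
    gdb -batch -ex 'disassemble cpu_switchto' "$kernel"
}
```

Then `disas_cpu_switchto /netbsd > cpu_switchto.txt' and attach the
resulting file.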

Also, if you have netbsd.gdb for the kernel, can you force a crash
dump (enter ddb and run `sync'), so we can see the arguments and
locals in the stack trace?  Curious to see what file find is stuck on.

> >   Can you run dtrace to sample what's happening?
> >   
> >   dtrace -n 'profile:::profile-97 { @[stack()] = count() }'
> 
> This made the system hang, for whatever reason.

Does it make the system hang only while in this state, or does it
happen every time?  How does it hang -- is it completely unresponsive,
can you enter ddb, can you interact with crash, or what?

(See earlier messages on this ticket for ways to enter ddb if C-A-ESC
isn't working.)

