Subject: Re: tset lossage on sparc64/1.6ZE
To: None <port-sparc64@netbsd.org>
From: Sean Davis <dive-nb@endersgame.net>
List: port-sparc64
Date: 12/10/2003 02:10:54
On Tue, Dec 09, 2003 at 10:48:25PM -0500, Sean Davis wrote:
> First some background: The machine in question is an Ultra 5, running NetBSD
> 1.6ZE. It has been up for 20 & 1/2 days, and this is the first time this has
> happened, but it is now happening constantly.
> 
> When logging in via rsh or ssh, the machine sticks (for lack of a better
> word) in this part of .login:
> eval `tset -s -m 'network:?xterm'`
> 
> I have no idea why, as it has been working perfectly until today. Just last
> night I was able to rsh and ssh in just fine. Nothing on the machine has
> changed. Hitting control-C kills tset, and the login continues successfully,
> but obviously without the correct terminal voodoo happening.
> 

Okay, I tracked it down. The sleep(1) call in tset was sleeping forever. I
wrote a test program that calls sleep(1), and it hung in exactly the same way.
getty also never respawned on the console, and 'sleep 1' in the shell would
hang until killed. I rebooted the machine (not cleanly, unfortunately) and got
some fs corruption, but it appeared to be limited to /usr/bin/tset. I'm
currently cross-building a -current userland from i386 for sparc64 to replace
the running userland. I'll also build a new kernel (this time with KTRACE),
but that doesn't address the root issue: why would nanosleep() sleep forever?
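
A check along these lines can show whether a one-second sleep ever returns.
This is only a minimal sketch, not the original test program: the names
guarded_sleep() and on_alarm() are made up for illustration, and it assumes
that alarm() still delivers SIGALRM while nanosleep() is stuck. If the fault
is in the kernel's timer handling, the watchdog may never fire either, and the
call will simply hang, which is informative in its own right.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

/* Empty handler: its only job is to make SIGALRM interrupt nanosleep(). */
static void on_alarm(int sig)
{
    (void)sig;
}

/*
 * Hypothetical helper: sleep for `secs` seconds with nanosleep(), guarded
 * by an alarm() watchdog of `guard_secs` seconds.
 *
 * Returns  0 if the sleep completed normally,
 *          1 if the watchdog interrupted it (the sleep overran),
 *         -1 on any other error.
 */
static int guarded_sleep(time_t secs, unsigned int guard_secs)
{
    struct sigaction sa, old;
    struct timespec req;
    int rc, saved_errno;

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGALRM, &sa, &old) == -1)
        return -1;

    req.tv_sec = secs;
    req.tv_nsec = 0;

    alarm(guard_secs);              /* arm the watchdog */
    rc = nanosleep(&req, NULL);     /* the call under suspicion */
    saved_errno = errno;
    alarm(0);                       /* disarm the watchdog */
    sigaction(SIGALRM, &old, NULL); /* restore the previous handler */

    if (rc == 0)
        return 0;
    return (saved_errno == EINTR) ? 1 : -1;
}
```

Calling guarded_sleep(1, 5) from main() and printing the result should give 0
on a healthy system; a result of 1 would mean the one-second sleep was still
pending when the five-second watchdog fired.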

After a reboot, 'sleep 1' sleeps for 1 second as expected, but since I have
no idea what caused the earlier behavior, I can't say whether it will happen
again. Hopefully a new -current userland/kernel will fix it.

-Sean

--
/~\ The ASCII
\ / Ribbon Campaign                   Sean Davis
 X  Against HTML                       aka dive
/ \ Email!