Subject: Re: xsrc/15357: stack trashing bug crashing the sparc Xservers
To: NetBSD GNATS submissions and followups <gnats-bugs@gnats.netbsd.org>
From: Greg A. Woods <woods@weird.com>
List: netbsd-bugs
Date: 03/07/2002 03:10:42
Well, I've now been running an XsunMono server built by lang/gcc-ssp
(which is GCC 2.95.3 with the stack protector patch), with '-g -O
-static -fstack-protector', for over 48 hours.

It still crashes.  Four times so far.  None so far have been aborts
generated by the stack protector.  All have resulted in
unusable/incorrect stack backtraces (even with gdb-5.1.1).

Two of the core dumps have almost identical backtraces.  One is
suspiciously like some of the crashes from the version built by the
system compiler (egcs-1.1.2 on 1.5W).

One of the crashes happened shortly after the window manager was unable
to change keyboard focus, and just after the client (emacs) which still
had focus was "killed" supposedly because some X operation was "bad"
(there was an Xlib error message but I didn't get time to copy it down
before the crash happened).  This suggests to me that something goes
rotten long before a pointer gets trashed.

What I don't understand is how the stack can be too broken for GDB to
walk it back up to main(), and yet the stack protector hasn't detected a
FUBAR'ed return address.

The only good thing I can say is that I think this binary, compiled with
just '-O', is slightly faster than the egcs-1.1.2 binary compiled with -O2.  :-)

I've now got quite a pile of core files available for anyone who wishes
to look at them.  Several have apparently consistent stacks, and what's
weird in them is that parameters (pointers IIRC) have different values
inside the called function from what the variables they are taken from
have in the calling function.

There doesn't appear to be any real pattern to the apparent cause.  The
only weird hint I've had so far is that if I have the scrolling pane of
an xterm or emacs window clipped by the edge of the screen then sometimes
it seems to crash faster.  For example I've been using this instance since
about 5pm (it's 3am here now), though not constantly, and I've done a
lot more with it than I have with other sessions that have lasted only
an hour or less, and the only thing I've done differently, as far as I
know, is to keep all the working panes fully on the screen.  (I use ctwm
and have it configured to normally not allow a window to be bigger than
the screen or to be shoved off the screen, but a full height window
sometimes has a wee bit extra.  What I'm suggesting is that the server
seems to last longer if I put the window title bar part-way off the
screen rather than leave the working pane a wee bit (i.e. a few pixels)
off the screen.)

Does anyone have whatever would be necessary to get hardware debugger
support for a SparcStation 1+?  :-)

-- 
								Greg A. Woods

+1 416 218-0098;  <gwoods@acm.org>;  <g.a.woods@ieee.org>;  <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>