Subject: xsrc/15357: stack trashing bug crashing the sparc Xservers
To: None <gnats-bugs@gnats.netbsd.org>
From: Greg A. Woods <woods@weird.com>
List: netbsd-bugs
Date: 01/24/2002 21:36:21
>Number: 15357
>Category: xsrc
>Synopsis: stack trashing bug crashing the sparc Xservers
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: xsrc-manager
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Jan 24 18:37:00 PST 2002
>Closed-Date:
>Last-Modified:
>Originator: Greg A. Woods
>Release: xsrc-2001/07/03
>Organization:
Planix, Inc.; Toronto, Ontario; Canada
>Environment:
System: NetBSD 1.5W
Architecture: sparc
Machine: sparc
>Description:
I've been suffering occasional crashes of the Xserver on my
primary workstation, a SPARCstation-1, now a 1+, ever since I
first began to use it.
Originally (for me) it ran NetBSD-1.3.2/sparc.
Now it runs 1.5W from sources last updated 2001/06/24, and xsrc
built from sources last updated 2001/07/03.
It runs diskless, and has 16MB of RAM and a bwtwo frame buffer.
Since upgrading last week the crashes seem to have become even
more frequent, every other day instead of every other week
(though since I don't know exactly what causes them I'm not
sure how to rate their frequency).
It doesn't seem to make any difference whether I run Xsun or
XsunMono, but since I find the latter to perform slightly
better, and since it is sufficient for this hardware, that's
what I prefer to run.
It doesn't matter whether I start it from xdm or xinit.
I generally run it with xfs (xset fp= tcp/server:7100).
Yesterday I decided to suffer the overhead of gdb and I attached
gdb to the running XsunMono process shortly after I had started
it with xinit. Here are the results:
14:04 [19] $ gdb /usr/X11R6/bin/XsunMono 6720
GNU gdb 4.17
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc--netbsd"...
/proven/work/woods/NetBSD-xsrc/xc/programs/Xserver/6720: No such file or directory.
Attaching to program `/usr/X11R6/bin/XsunMono', process 6720
0x18e978 in select ()
(gdb) cont
Continuing.
Program received signal SIGPIPE, Broken pipe.
0x1a3758 in writev ()
(gdb) cont
Continuing.
Program received signal SIGBUS, Bus error.
0x19960 in DeliverEventsToWindow (pWin=0x1f02e9, pEvents=0x417808, count=8,
filter=2147483648, grab=0x0, mskidx=37159673) at events.c:1199
1199 if (filter != CantBeFiltered &&
(gdb) where
#0 0x19960 in DeliverEventsToWindow (pWin=0x1f02e9, pEvents=0x417808,
count=8, filter=2147483648, grab=0x0, mskidx=37159673) at events.c:1199
#1 0x1ac4c in DeliverFocusedEvent (keybd=0xb0680, xE=0x417808,
window=0x46e940, count=8) at events.c:1921
#2 0x382734 in ?? ()
Error accessing memory address 0x3d: Invalid argument.
(gdb) list
1194
1195 /* CantBeFiltered means only window owner gets the event */
1196 if ((filter == CantBeFiltered) || !(type & EXTENSION_EVENT_BASE))
1197 {
1198 /* if nobody ever wants to see this event, skip some work */
1199 if (filter != CantBeFiltered &&
1200 !((wOtherEventMasks(pWin)|pWin->eventMask) & filter))
1201 return 0;
1202 if ( (attempt = TryClientEvents(wClient(pWin), pEvents, count,
1203 pWin->eventMask, filter, grab)) )
(gdb)
Here's all gdb can tell me from an earlier dump of Xsun:
21:19 [28] $ gdb Xsun ~/Xsun.core
GNU gdb 4.17
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc--netbsd"...
Core was generated by `Xsun'.
Program terminated with signal 11, Segmentation fault.
#0 mieqProcessInputEvents () at mieq.c:191
191 }
(gdb) where
#0 mieqProcessInputEvents () at mieq.c:191
Cannot access memory at address 0x168.
(gdb) list
186 (*miEventQueue.pPtr->processInputProc)
187 (&xe, (DeviceIntPtr)miEventQueue.pPtr, 1);
188 break;
189 }
190 }
191 }
192 }
(gdb)
As I recall this was as much information as I was able to get
from the cores from the 1.3.2 release too.
It seems the stack is always so thoroughly trashed that finding
the real backtrace is impossible.
I've no idea how to debug this further without getting much more
familiar with the Xserver code (I know almost nothing about it
now). I've thought of various compiler hacks that might detect
the stack trashing earlier: saving the return address just after
every call and comparing it to the value still on the stack
before executing the return instruction, or even saving a copy
of the entire stack just after every function call (before
executing the first instruction of the function), etc. However
there doesn't seem to be any quick hack that would be both
efficient enough to run with and effective enough to catch the
stack trashing.
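For illustration only, here is a minimal sketch of the first idea
(remember the return address at entry, compare it at exit). It
assumes a GCC that supports -finstrument-functions, which has not
been checked against the compiler used for this build. The
__cyg_profile_func_enter and __cyg_profile_func_exit hooks are the
ones GCC calls under that option; shadow_push, shadow_pop_ok and
SHADOW_DEPTH are hypothetical names invented for this example.
Whether the value seen at exit has really been re-read from the
stack depends on the architecture and optimisation level: on sparc
the return address lives in a register window that is only spilled
to the stack on window overflow, so corruption may be noticed later
than it happened, or not at all. Code that uses longjmp would also
unbalance the shadow stack.

```c
#include <assert.h>   /* only needed if you exercise the helpers by hand */
#include <stdio.h>
#include <stdlib.h>

#define SHADOW_DEPTH 4096   /* hypothetical limit on recorded call depth */
#define NOINSTR __attribute__((no_instrument_function))

static void *shadow[SHADOW_DEPTH];  /* return addresses saved at entry */
static int shadow_top;              /* current call depth */

/* Remember the return address seen when a function is entered. */
NOINSTR void shadow_push(void *ret)
{
    if (shadow_top < SHADOW_DEPTH)
        shadow[shadow_top] = ret;
    shadow_top++;
}

/* At exit: return 1 if the return address matches the saved one. */
NOINSTR int shadow_pop_ok(void *ret)
{
    if (shadow_top <= 0)
        return 0;               /* exit without a matching entry */
    shadow_top--;
    if (shadow_top >= SHADOW_DEPTH)
        return 1;               /* too deep to have been recorded */
    return shadow[shadow_top] == ret;
}

/* Called by GCC-instrumented code on entry to every function. */
NOINSTR void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    (void)this_fn;
    shadow_push(call_site);
}

/* Called by GCC-instrumented code just before every function returns. */
NOINSTR void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    if (!shadow_pop_ok(call_site)) {
        fprintf(stderr, "return address changed in function %p\n", this_fn);
        abort();                /* dump core while the culprit is still live */
    }
}
```

The server objects would be built with -g -finstrument-functions
and this file linked in. Aborting in the exit hook should leave a
core whose innermost frame is the function whose return address
was overwritten, though the two extra calls per function may well
be too slow on this class of hardware.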
Maybe if there were a compiler option that could be used in
conjunction with debugger watchpoints so that a watch would be
automatically set on every return address on the stack.....
Even then I suspect the overhead would make X unusable on any
SPARCstation-1 or -2 class machine and thus make it impossible
to run long enough to trigger the bug.
>How-To-Repeat:
unknown
>Fix:
unknown
>Release-Note:
>Audit-Trail:
>Unformatted: