Subject: bin/13301: ksh will dump core sometimes if it gets a spurious SIGWINCH
To: None <gnats-bugs@gnats.netbsd.org>
From: Greg A. Woods <woods@weird.com>
List: netbsd-bugs
Date: 06/24/2001 23:34:20
>Number: 13301
>Category: bin
>Synopsis: ksh will dump core sometimes if it gets a spurious SIGWINCH
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: bin-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Jun 24 20:32:00 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator: Greg A. Woods
>Release: 2001/06/19
>Organization:
Planix, Inc.; Toronto, Ontario; Canada
>Environment:
all NetBSD IIRC, though most recently noticeable on sparc
Architecture: sparc
Machine: sparc
>Description:
This problem has been "bugging" me for nearly forever (well, ever
since ksh became a standard part of NetBSD, and maybe even from
before that). I don't know if I've reported this before or not,
but at least now I've got a half-useful traceback.
Normally I don't trip over it on fast machines, but when
opening an xterm on a slower or loaded machine I'll sometimes
go to resize the window before the shell has finished sourcing
~/.profile et al, and some sub-shell being started by .profile or
.kshrc or whatever (never the parent) will dump core.
When not compiled with '-g', at least on sparc, the error is
SIGBUS or SIGILL, and the stack frames are always pretty much
totally corrupt and useless. This one at least starts out in an
apparently valid place, but as we'll see it's just as broken:
Core was generated by `ksh'.
Program terminated with signal 4, Illegal instruction.
#0 0x11000 in c_pwd ()
(gdb) where
#0 0x11000 in c_pwd ()
#1 0xa78ec in ?? ()
Cannot access memory at address 0x2d703a58.
Tonight I finally got bored enough while watching "make build"s
run to try debugging this.
Now the tricky part is you can't just run your login shell under
the debugger, particularly if it's a shell being started by the
likes of: rsh -n host "xterm -ls"
Luckily that's not necessary since the binary compiled with '-g'
seems to generate a valid core dump.
>How-To-Repeat:
1. make your ~/.profile fairly complex so that it takes a bit of
time and so that it needs to run several subshells.
2. start an xterm with '-ls' (i.e. so that it runs the shell as
a login shell).
3. resize the xterm window constantly while your .profile is
doing its thing
4. watch for error messages and look for the resulting core
dump after you get your shell prompt....
$ gdb /bin/ksh ksh.core
GNU gdb 4.17
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc--netbsd"...
Core was generated by `ksh'.
Program terminated with signal 11, Segmentation fault.
#0 0x2fce0 in trapsig (i=87996)
at /proven/work/woods/NetBSD-src/bin/ksh/trap.c:117
117 trap = p->set = 1;
(gdb) where
#0 0x2fce0 in trapsig (i=87996)
at /proven/work/woods/NetBSD-src/bin/ksh/trap.c:117
#1 0xefffff74 in ?? ()
#2 0x2c154 in shf_flush (shf=0xacb68)
at /proven/work/woods/NetBSD-src/bin/ksh/shf.c:316
#3 0x1d3b4 in execute (t=0xacb68, flags=50)
at /proven/work/woods/NetBSD-src/bin/ksh/exec.c:99
#4 0x1c570 in comsub (xp=0xefffedf0, cp=0xacb68 "")
at /proven/work/woods/NetBSD-src/bin/ksh/eval.c:877
#5 0x1b1c0 in expand (cp=0xa72b1 "expr \":$varvalue:\" : \".*:$1:.*\"",
wp=0xefffee78, f=11) at /proven/work/woods/NetBSD-src/bin/ksh/eval.c:243
#6 0x1adac in eval (ap=0xa7224, f=11)
at /proven/work/woods/NetBSD-src/bin/ksh/eval.c:95
#7 0x1d42c in execute (t=0xa71c8, flags=256)
at /proven/work/woods/NetBSD-src/bin/ksh/exec.c:116
#8 0x1de14 in execute (t=0xa7198, flags=0)
at /proven/work/woods/NetBSD-src/bin/ksh/exec.c:376
#9 0x1d8b4 in execute (t=0xa7168, flags=0)
at /proven/work/woods/NetBSD-src/bin/ksh/exec.c:194
#10 0x1ddb0 in execute (t=0xa7050, flags=0)
at /proven/work/woods/NetBSD-src/bin/ksh/exec.c:369
#11 0x1d8b4 in execute (t=0xa7020, flags=0)
at /proven/work/woods/NetBSD-src/bin/ksh/exec.c:194
---Type <return> to continue, or q <return> to quit---
#12 0x1df28 in execute (t=0xa6858, flags=0)
at /proven/work/woods/NetBSD-src/bin/ksh/exec.c:394
#13 0x1e794 in comexec (t=0xa7928, tp=0xa6820, ap=0xa6040, flags=0)
at /proven/work/woods/NetBSD-src/bin/ksh/exec.c:664
#14 0x1d724 in execute (t=0xa7928, flags=0)
at /proven/work/woods/NetBSD-src/bin/ksh/exec.c:157
#15 0x28f88 in shell (s=0xa1820, toplevel=0)
at /proven/work/woods/NetBSD-src/bin/ksh/main.c:623
#16 0x28bd8 in include (name=0x9fe18 "/home/most/woods/.profile", argc=0,
argv=0x0, intr_ok=1) at /proven/work/woods/NetBSD-src/bin/ksh/main.c:504
#17 0x288a8 in main (argc=1, argv=0xeffff7c4)
at /proven/work/woods/NetBSD-src/bin/ksh/main.c:379
#18 0x10238 in ___start ()
(gdb) list
112 trapsig(i)
113 int i;
114 {
115 Trap *p = &sigtraps[i];
116
117 trap = p->set = 1;
118 if (p->flags & TF_DFL_INTR)
119 intrsig = 1;
120 if ((p->flags & TF_FATAL) && !p->trap) {
121 fatal_trap = 1;
(gdb) print p
$1 = (Trap *) 0x340888
(gdb) print *p
Cannot access memory at address 0x340888.
(gdb) print i
$2 = 87996
(gdb) print sizeof(sigtraps)
$3 = 1088
(gdb)
OK, well the value of 'i' is clearly wonky.
Can NetBSD really be calling signal handlers differently than
other systems?
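One way to tell whether the handler really is being entered with a
bad argument would be a range check at the top of trapsig(). The
following is only a minimal sketch of that idea, not the ksh source
and not a proposed fix: the cut-down Trap struct, the NTRAPS value,
and the names trapsig_checked(), rejected, and self_check() are all
assumptions made up for illustration.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical, cut-down stand-in for ksh's Trap; the real struct has more fields. */
typedef struct {
    volatile sig_atomic_t set;   /* signal has been delivered */
    int flags;
} Trap;

#define NTRAPS 34                /* assumed table size, for illustration only */

static Trap sigtraps[NTRAPS];
static volatile sig_atomic_t trap;       /* "some trap is pending" flag */
static volatile sig_atomic_t rejected;   /* count of out-of-range arguments seen */

/* Like trapsig(), but refuses to index the table with a bogus signal number. */
static void
trapsig_checked(int i)
{
    if (i < 0 || i >= NTRAPS) {
        rejected++;              /* record it instead of touching memory */
        return;
    }
    trap = 1;
    sigtraps[i].set = 1;
}

/* Small self-check: the value seen in the core dump is rejected, SIGINT is not. */
static void
self_check(void)
{
    trapsig_checked(87996);
    assert(rejected == 1 && trap == 0);
    trapsig_checked(SIGINT);
    assert(trap == 1 && sigtraps[SIGINT].set == 1);
}
```

With a check like this in place, a non-zero "rejected" count after
the resize test would suggest the argument is already wrong on entry
to the handler, rather than being damaged later.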
>Fix:
unknown
>Release-Note:
>Audit-Trail:
>Unformatted: