Subject: install/15400: sysinst sometimes dumps core in curses routine
To: None <gnats-bugs@gnats.netbsd.org>
From: Duncan McEwan <duncan@mcs.vuw.ac.nz>
List: netbsd-bugs
Date: 01/28/2002 17:25:31
>Number: 15400
>Category: install
>Synopsis: sysinst sometimes dumps core in curses routine
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: install-manager
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Jan 27 20:26:01 PST 2002
>Closed-Date:
>Last-Modified:
>Originator: Duncan McEwan
>Release: NetBSD 1.5ZA build with sources from early January.
>Organization:
Victoria University of Wellington, New Zealand
>Environment:
System: NetBSD turakirae.mcs.vuw.ac.nz 1.5ZA NetBSD 1.5ZA (GEN_X) #0: Fri Jan 4 12:56:58 NZDT 2002 mark@turakirae.mcs.vuw.ac.nz:/mnt/SAVE/build.obj/sys/arch/i386/compile/GEN_X i386
Architecture: i386
Machine: i386
>Description:
Our (slightly modified!) sysinst can be made to dump core repeatably
while extracting distribution sets with pax.
We are reasonably sure that our modifications have not caused the
problem. They do a couple of things: (a) set a few default answers
to values appropriate for our local environment; and (b) add a couple
of extra distribution sets of local software that we want installed
on all our NetBSD machines.
>How-To-Repeat:
The problem occurs most frequently when you (a) update the MBR on
the disk; and (b) answer "yes" when sysinst asks whether you want to
see files listed as they are extracted by pax.
Our previous workaround for this problem was to do the installation
in two stages. First update the MBR, then reboot and rerun sysinst.
However, today we discovered that saying "no" when asked whether we
want to see the extracted files listed seems to prevent the coredump
from occurring.
As stated above, due to the nature of our modifications we don't
believe they are *directly* to blame. However, as we are not aware of
anyone else reporting this problem, it is possible that a side-effect
of them triggers an existing bug. For example, perhaps
the fact that we extract more/larger distribution sets might cause
sysinst to exhaust memory?
>Fix:
We don't have a fix. However, we did compile a non-crunchgen'd
sysinst binary with '-g' and got that onto a machine we were about
to install using a floppy disk. We also used a floppy to get the
resulting corefile from the machine.
Running gdb on the core file showed that the coredump occurred
at the following line in the __waddbytes curses routine.
Core was generated by `sysinst'.
Program terminated with signal 11, Segmentation fault.
#0 0x805ff13 in __waddbytes (win=0x8112880,
bytes=0xbfbfd3cb "<25 bytes of binary junk deleted>", count=0,
attr=0) at /src/work/src/lib/libcurses/addbytes.c:165
165 if (lp->line[x].ch != c ||
I used the gdb print command to look at a few variables that __waddbytes
uses and found that x has the reasonable-looking value of 0,
as does the variable y, but printing win->lines[0].line[0] (which is
what the above line is equivalent to) causes gdb to say "Cannot access
memory at address 0x732f7972".
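One thing that may be worth noting about the faulting address: all four
of its bytes fall in the printable ASCII range. The small C sketch below
decodes a 32-bit value into the bytes it would occupy in memory on a
little-endian machine such as i386. The helper name decode_le32 is made
up for illustration; it is not part of libcurses or sysinst.

```c
#include <ctype.h>
#include <stdint.h>

/*
 * Decode a 32-bit value into the four bytes it occupies in memory on a
 * little-endian machine (lowest-order byte first).  Bytes that are not
 * printable ASCII are shown as '.'.  "out" must have room for 5 chars.
 * Hypothetical helper, for illustration only.
 */
void
decode_le32(uint32_t value, char out[5])
{
	int i;

	for (i = 0; i < 4; i++) {
		unsigned char b = (unsigned char)((value >> (8 * i)) & 0xff);

		out[i] = isprint(b) ? (char)b : '.';
	}
	out[4] = '\0';
}
```

Calling decode_le32(0x732f7972, buf) leaves the string "ry/s" in buf,
which looks like a fragment of a pathname. That would be consistent with
string data (possibly the file names listed during extraction) having
overwritten one of the pointers on the path to win->lines[0].line[0],
though this is only a guess and not something the core file confirms.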
Further debugging analysis is hard because I can't see any way of
running gdb on a live sysinst while it is installing a system.
So I'm hoping that someone who knows the code better might be able
to suggest what might be going wrong here, even if they cannot
reproduce the problem themselves. To help with this
I've made the sysinst binary we used (compiled with debugging symbols)
and the sysinst.core file available at
http://www.mcs.vuw.ac.nz/~duncan/{sysinst,sysinst.core}
for further postmortem analysis. I'll be more than happy to try some
additional experiments to gather more information if it will help.
>Release-Note:
>Audit-Trail:
>Unformatted: