Subject: Re: How to read a crash file?
To: Hubert Feyrer <hubert@feyrer.de>
From: Michael Bubb <michael.bubb@gmail.com>
List: current-users
Date: 11/17/2006 15:36:12
Thank you - that is an invaluable example. This is something I've
wondered about and have played with a bit.
Have been very much enjoying your blog, btw.
Michael
On 11/17/06, Hubert Feyrer <hubert@feyrer.de> wrote:
>
> I've put some information on this into my NetBSD blog a few days ago, see
> <http://www.feyrer.de/NetBSD/blog.html/nb_20061115_0123.html>:
>
> ``Post mortem debugging, or: what happened before it crashed?
>
> So your machine panicked, and as you were running X you have no clue what
> went on? Here's a nice way to find out, assuming you have a kernel crash
> dump. To ensure the latter, set kern.dump_on_panic=1 in /etc/sysctl.conf.
> Now, what to do with those crashdumps?
>
> % ls -l /var/crash/
> total 3183838
> -rw-r--r-- 1 root wheel 3 Nov 2 02:09 bounds
> -rw-r--r-- 1 root wheel 5 Jun 30 2004 minfree
> ...
> -rw------- 1 root wheel 181265401 Nov 2 02:11 netbsd.26.core.gz
> -rw------- 1 root wheel 2162696 Nov 2 02:11 netbsd.26.gz
>
> In /var/crash, "bounds" contains an increasing counter for the crashdump
> number (it would be "27" in the above example), and "minfree" contains the
> minimum amount of free space in kilobytes that should be kept free - both
> files are read by savecore(8) when /etc/rc.conf has "savecore=yes", which
> is the default.
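A minimal sketch of how the "bounds" counter could be used to find the
most recent dump, assuming (as described above) that it holds the number
the next dump will get. The function name latest_dump is made up for
illustration and is not part of savecore(8):

```shell
# latest_dump [DIR]: print the number of the most recent crash dump.
# Assumption: DIR/bounds holds the NEXT number savecore(8) will use,
# so the newest existing dump is that value minus one.
latest_dump() {
    dir=${1:-/var/crash}
    next=$(cat "$dir/bounds") || return 1
    echo $((next - 1))
}
```

With the listing above, where bounds would contain "27", this prints 26,
matching netbsd.26.gz and netbsd.26.core.gz.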
>
> The actual crashdump consists of two gzipped files - the memory dump
> "netbsd.XX.core.gz" and a copy of the running kernel "netbsd.XX.gz".
> After uncompressing, the files can be used for looking at the system at
> the point of its panic:
>
> # gunzip netbsd.26*.gz
> #
>
> Note that the crashdump may contain sensitive data and is therefore
> readable only by root!
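If you would rather keep the compressed originals, something along these
lines might do. It is only a sketch: unpack_dump is a hypothetical helper
name, and the umask is there so the uncompressed copies stay private, in
keeping with the note above.

```shell
# unpack_dump N [DIR]: uncompress kernel and core for dump number N,
# leaving the .gz files in place (gunzip -c writes to stdout).
unpack_dump() {
    n=$1
    dir=${2:-/var/crash}
    (
        umask 077   # new files are readable by their owner only
        for f in "$dir/netbsd.$n.gz" "$dir/netbsd.$n.core.gz"; do
            gunzip -c "$f" > "${f%.gz}" || exit 1
        done
    )
}
```

Run as root, "unpack_dump 26" would leave netbsd.26 and netbsd.26.core
next to the .gz files. Keep in mind that the uncompressed core is about
as large as the machine's memory.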
>
> The crashdump can be read by programs that use libkvm to read through the
> crashdump's kernel memory, e.g. gdb(1), dmesg(8), ps(1), fstat(8),
> ipcs(1), netstat(8), nfsstat(8), pmap(1), w(1), pstat(8), vmstat(8) etc.,
> using the -M and -N switches.
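Since the same pair of -M/-N arguments gets repeated for each tool, a
small helper can save typing. This is just a sketch; kvm_args is a
made-up name, and it assumes the file names contain no spaces so that
the unquoted expansion splits into four words:

```shell
# kvm_args N [DIR]: print the -M/-N arguments for dump number N,
# assuming the uncompressed files are DIR/netbsd.N.core and DIR/netbsd.N.
kvm_args() {
    n=$1
    dir=${2:-.}
    printf '%s\n' "-M $dir/netbsd.$n.core -N $dir/netbsd.$n"
}
```

It would be used like "dmesg $(kvm_args 26)" or "vmstat $(kvm_args 26) -s"
from within the directory holding the uncompressed files.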
>
> Some examples:
>
> * To show the system's message buffer at the time of the crash:
>
> % dmesg -M netbsd.26.core -N netbsd.26
> ...
> unmounting /home (/dev/wd1e)...
> unmounting /tmp (mfs:371)...warning: mfs read during shutdown
> dev = 0xff00, block = 10496, fs = /tmp
> panic: blkfree: freeing free block
> Begin traceback...
> uvm_fault(0xcbfd07f0, 0x2000, 1) -> 0xe
> fatal page fault in supervisor mode
> trap type 6 code 0 eip c0305083 cs 8 eflags 10246 cr2 2900 ilevel 0
> panic: trap
> Faulted in mid-traceback; aborting...
> dumping to dev 0,1 offset 2024327
> dump 511 510 509 508 507 506 505 504 503 502 501 500 499 498 497 496
> 495 494 493 ...
>
> Apparently the system tried to free a block that was already freed
> here when unmounting /tmp.
>
> * Display virtual memory parameters:
>
> % vmstat -M netbsd.26.core -N netbsd.26 -s
> 4096 bytes per page
> 8 page colors
> 127888 pages managed
> ...
>
> * Attach the GNU debugger gdb(1) to the system crash dump, to poke
> around deeply:
>
> % gdb netbsd.26
> ...
> (gdb) target kcore netbsd.26.core
> panic: blkfree: freeing free block
> #0 0x0ac04000 in ?? ()
> (gdb) bt
> #0 0x0ac04000 in ?? ()
> #1 0xc03084b5 in cpu_reboot ()
> #2 0xc02a57aa in panic ()
> #3 0xc0313127 in trap ()
> #4 0xc0102dfd in calltrap ()
> #5 0xc0182544 in db_get_value ()
> #6 0xc03058f1 in db_stack_trace_print ()
> #7 0xc02a577c in panic ()
> #8 0xc0205db7 in ffs_blkfree ()
> #9 0xc020b8d5 in ffs_indirtrunc ()
> ...
>
> * Unfortunately there are a number of programs that I didn't get to
> work with my crashdump, but that may be because the panic happened
> during/after system shutdown; ps(1), for example, didn't work.
>
> Still that should give some start for poking around...''
>
>
> - Hubert
>
--
Michael Bubb | Hoboken, NJ | 201.736.0870 | fax 201.377.1717