current-users: Re: How to read a crash file?

Subject: Re: How to read a crash file?
To: Hubert Feyrer <hubert@feyrer.de>
From: Michael Bubb <michael.bubb@gmail.com>
List: current-users
Date: 11/17/2006 15:36:12
Thank you - that is an invaluable example. This is something I've
wondered about and have played with a bit.

Have been very much enjoying your blog, btw.

Michael

On 11/17/06, Hubert Feyrer <hubert@feyrer.de> wrote:
>
> I've put some information on this into my NetBSD blog a few days ago, see
> <http://www.feyrer.de/NetBSD/blog.html/nb_20061115_0123.html>:
>
> ``Post mortem debugging, or: what happened before it crashed?
>
> So your machine panicked, and as you were running X you have no clue what
> went on? Here's a nice way to find out, assuming you have a kernel crash
> dump. To ensure the latter, set kern.dump_on_panic=1 in /etc/sysctl.conf.
> Now, what to do with those crashdumps?
>
> % ls -l /var/crash/
> total 3183838
> -rw-r--r--  1 root  wheel          3 Nov  2 02:09 bounds
> -rw-r--r--  1 root  wheel          5 Jun 30  2004 minfree
> ...
> -rw-------  1 root  wheel  181265401 Nov  2 02:11 netbsd.26.core.gz
> -rw-------  1 root  wheel    2162696 Nov  2 02:11 netbsd.26.gz
>
> In /var/crash, "bounds" contains an increasing counter for the crashdump
> number (it would be "27" in the above example), and "minfree" contains the
> minimum amount of free space in kilobytes that should be kept free - both
> files are read by savecore(8) when /etc/rc.conf has "savecore=yes", which
> is the default.
>
> The actual crashdump consists of two gzipped files - the actual memory
> dump "netbsd.XX.core.gz" and a copy of the running kernel "netbsd.XX.gz".
> After uncompressing, the files can be used for looking at the system at
> the point of its panic:
>
> # gunzip netbsd.26*.gz
> #
>
> Note that the crashdump may contain sensitive data and as such is only
> readable by root!
>
> The crashdump can be read by programs that use libkvm to read through the
> crashdump's kernel memory, e.g. gdb(1), dmesg(8), ps(1), fstat(1),
> ipcs(1), netstat(1), nfsstat(1), pmap(1), w(1), pstat(8), vmstat(1) etc.,
> using the -M and -N switches.
>
> Some examples:
>
>      * To show the system's message buffer at the time of the crash:
>
>        % dmesg -M netbsd.26.core -N netbsd.26
>        ...
>        unmounting /home (/dev/wd1e)...
>        unmounting /tmp (mfs:371)...warning: mfs read during shutdown
>        dev = 0xff00, block = 10496, fs = /tmp
>        panic: blkfree: freeing free block
>        Begin traceback...
>        uvm_fault(0xcbfd07f0, 0x2000, 1) -> 0xe
>        fatal page fault in supervisor mode
>        trap type 6 code 0 eip c0305083 cs 8 eflags 10246 cr2 2900 ilevel 0
>        panic: trap
>        Faulted in mid-traceback; aborting...
>        dumping to dev 0,1 offset 2024327
>        dump 511 510 509 508 507 506 505 504 503 502 501 500 499 498 497 496
>        495 494 493 ...
>
>        Apparently the system tried to free a block that was already freed
>        when unmounting /tmp.
>
>      * Display virtual memory parameters:
>
>        % vmstat -M netbsd.26.core -N netbsd.26 -s
>             4096 bytes per page
>                8 page colors
>           127888 pages managed
>                  ...
>
>      * Attach the GNU debugger gdb(1) to the system crash dump, to poke
>        around deeply:
>
>        % gdb netbsd.26
>        ...
>        (gdb) target kcore netbsd.26.core
>        panic: blkfree: freeing free block
>        #0  0x0ac04000 in ?? ()
>        (gdb) bt
>        #0  0x0ac04000 in ?? ()
>        #1  0xc03084b5 in cpu_reboot ()
>        #2  0xc02a57aa in panic ()
>        #3  0xc0313127 in trap ()
>        #4  0xc0102dfd in calltrap ()
>        #5  0xc0182544 in db_get_value ()
>        #6  0xc03058f1 in db_stack_trace_print ()
>        #7  0xc02a577c in panic ()
>        #8  0xc0205db7 in ffs_blkfree ()
>        #9  0xc020b8d5 in ffs_indirtrunc ()
>        ...
>
>      * Unfortunately there are a number of programs that I didn't get to
>        work with my crashdump, but that may be because the dump was taken
>        during/after system shutdown; e.g. ps(1) didn't work.
>
> Still that should give some start for poking around...''
>
>
>   - Hubert
>
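
As a small addendum to the quoted post: the naming convention it describes
("bounds" holds the number the next dump will get, and the saved files are
called netbsd.N.core.gz and netbsd.N.gz) can be turned into a helper that
finds the newest dump. This is only a sketch under those assumptions; the
function names latest_dump_number and latest_dump_files are made up for
illustration and are not part of savecore(8) or any NetBSD tool.

```shell
#!/bin/sh
# Sketch: work out the file names of the most recent crash dump.
# Assumption (taken from the quoted post): "bounds" holds the number the
# NEXT dump will get, so the newest saved dump is bounds - 1.

# latest_dump_number [DIR]: print the number of the newest dump in DIR.
latest_dump_number() {
    dir=${1:-/var/crash}
    next=$(cat "$dir/bounds") || return 1
    # Keep digits only, in case of stray whitespace in the file.
    next=$(printf '%s' "$next" | tr -cd '0-9')
    # A counter of 0 (or an empty file) means no dump has been saved yet.
    [ -n "$next" ] && [ "$next" -gt 0 ] || return 1
    echo $((next - 1))
}

# latest_dump_files [DIR]: print the memory dump and kernel copy, one per line.
latest_dump_files() {
    dir=${1:-/var/crash}
    n=$(latest_dump_number "$dir") || return 1
    echo "$dir/netbsd.$n.core.gz"
    echo "$dir/netbsd.$n.gz"
}
```

After sourcing this as root, "latest_dump_files /var/crash" prints the two
files to gunzip; the uncompressed names can then be passed to dmesg, vmstat
and friends with -M and -N as shown in the post.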


-- 
Michael Bubb | Hoboken, NJ | 201.736.0870 | fax 201.377.1717