NetBSD-Bugs archive


Re: kern/46096: uvmwait test case sometimes panics kernel



The following reply was made to PR kern/46096; it has been noted by GNATS.

From: Lars Heidieker <lars%heidieker.de@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: 
Subject: Re: kern/46096: uvmwait test case sometimes panics kernel
Date: Wed, 29 Feb 2012 17:50:52 +0100

 On 02/25/2012 05:10 PM, Andreas Gustafsson wrote:
 >> Number:         46096
 >> Category:       kern
 >> Synopsis:       uvmwait test case sometimes panics kernel
 >> Confidential:   no
 >> Severity:       critical
 >> Priority:       high
 >> Responsible:    kern-bug-people
 >> State:          open
 >> Class:          sw-bug
 >> Submitter-Id:   net
 >> Arrival-Date:   Sat Feb 25 16:10:01 +0000 2012
 >> Originator:     Andreas Gustafsson
 >> Release:        NetBSD-current, source date 2012.02.24.19.40.49
 >> Organization:
 >> Environment:
 > System: NetBSD
 > Architecture: i386
 > Machine: i386
 >> Description:
 >
 > There have now been multiple incidents where a kernel panic has
 > occurred while running the uvmwait test case of the rump/rumpkern/t_vm
 > test, for example:
 >
 >    http://releng.netbsd.org/b5reports/i386/build/2012.01.30.12.19.45/test.log
 >    http://www.gson.org/netbsd/bugs/build/build/2012.02.06.17.51.47/test.log
 >    http://www.gson.org/netbsd/bugs/build.i386-debug/build/2012.02.24.19.40.49/test.log
 >
 > The uvmwait test case has been consistently failing since the vmem
 > commits of January 29, which would be worthy of a PR in itself, but
 > this PR is specifically about the kernel panics, not the test failures.
 >
 > Tracking down the problem should be easier than usual, because the
 > latest failure occurred on a test system that was built with full
 > debug symbols (using "build.sh -V MKDEBUG=yes -V DBG=-g"), installed
 > with full source, and run under a new test fixture that automatically
 > archived a full disk image of the failed system, including the kernel
 > crash dump.
 >
 > This disk image is available for downloading at:
 >
 >     http://www.gson.org/netbsd/bugs/i386-debug-2012.02.24.19.40.49.img.gz
 >
 > The compressed image is 832 MB in size and decompresses to 4 GB.
 >
 > To debug the problem while enjoying the comforts of source-level
 > debugging, download and gunzip the disk image, and then boot it with
 >
 >    qemu -snapshot -nographic -hda i386-debug-2012.02.24.19.40.49.img
 >
 > Note that you don't need to be running the i386 port, or even NetBSD,
 > to do this.
 >
 > Log in as root (there is no password).  To help gdb find the kernel
 > sources, type:
 >
 >    mkdir -p /tmp/bracket/build/2012.02.24.19.40.49-i386-debug
 >    ln -s /usr/src /tmp/bracket/build/2012.02.24.19.40.49-i386-debug/src
 >
 > Then type:
 >
 >    cd /var/crash
 >    gunzip netbsd*
 >    gdb /netbsd
 >    target kvm netbsd.0.core
 >    where
 >
 >> How-To-Repeat:
 >
 > Run the ATF tests enough times.  But there should be no need to
 > reproduce the problem since an exceptionally complete set of evidence
 > was collected from the latest crime scene.
 >
 >> Fix:
 >
 >
 
 The address in v is fine, but how did pr_itemoffset in the pool struct
 change to 1? With pr_align = 4096, (v + 1) & 4095 is nonzero, so the
 KASSERT at subr_pool.c:1113 fires.
 
 (gdb)
 #4  0xc099ec2c in pool_get (pp=0xc0f805c0, flags=2)
      at /tmp/bracket/build/2012.02.24.19.40.49-i386-debug/src/sys/kern/subr_pool.c:1113
 1113           KASSERT((((vaddr_t)v + pp->pr_itemoffset) & (pp->pr_align - 1)) == 0);
 (gdb) list
 1108                    * a caller's assumptions about interrupt protection, etc.
 1109                    */
 1110           }
 1111   
 1112           mutex_exit(&pp->pr_lock);
 1113           KASSERT((((vaddr_t)v + pp->pr_itemoffset) & (pp->pr_align - 1)) == 0);
 1114           FREECHECK_OUT(&pp->pr_freecheck, v);
 1115           return (v);
 1116   }
 1117   
 (gdb) print v
 $1 = (void *) 0xc1781000
 (gdb) print *pp
 $2 = {pr_poollist = {tqe_next = 0xc0f80800, tqe_prev = 0xc0f80c80},
    pr_emptypages = {lh_first = 0x0}, pr_fullpages = {lh_first = 0xc14b8414},
    pr_partpages = {lh_first = 0xc14b84ec}, pr_curpage = 0xc14b84ec,
    pr_phpool = 0xc0f7849c, pr_cache = 0xc0f805c0, pr_size = 4096,
    pr_align = 4096, pr_itemoffset = 1, pr_minitems = 0, pr_minpages = 0,
    pr_maxpages = 4294967295, pr_npages = 64, pr_itemsperpage = 16,
    pr_slack = 0, pr_nitems = 181, pr_nout = 843, pr_hardlimit = 4294967295,
    pr_refcnt = 0, pr_alloc = 0xc0f79c5c, pr_alloc_list = {tqe_next = 0x0,
      tqe_prev = 0xc0f80858}, pr_drain_hook = 0, pr_drain_hook_arg = 0x0,
    pr_wchan = 0xc0f79c88 "kva-4096", pr_flags = 0, pr_roflags = 3584,
    pr_lock = {u = {mtxa_owner = 1537}}, pr_cv = {cv_opaque = {0x0, 0xc0f80638,
        0xc0f79c88}}, pr_ipl = 6, pr_phtree = {sph_root = 0xc14b8798},
    pr_maxcolor = 0, pr_curcolor = 0, pr_phoffset = 0,
    pr_hardlimit_warning = 0x0, pr_hardlimit_ratecap = {tv_sec = 0,
      tv_usec = 0}, pr_hardlimit_warning_last = {tv_sec = 0, tv_usec = 0},
    pr_nget = 2651, pr_nfail = 0, pr_nput = 1808, pr_npagealloc = 73,
    pr_npagefree = 9, pr_hiwat = 68, pr_nidle = 0, pr_log = 0x0,
    pr_curlogentry = 0, pr_logsize = 0, pr_entered_file = 0x0,
    pr_entered_line = 0, pr_reclaimerentry = {ce_q = {tqe_next = 0x0,
        tqe_prev = 0x0}, ce_func = 0, ce_obj = 0x0}, pr_freecheck = 0x0,
    pr_qcache = 0xc0f79c80}
 (gdb)
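
To make the failure concrete, here is a minimal sketch in C of the condition
that the KASSERT at subr_pool.c:1113 evaluates, applied to the values printed
above. The helper name pool_item_aligned is hypothetical and not part of the
kernel; it only mirrors the arithmetic, and it assumes the alignment is a
power of two, as pr_align is here (4096).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of subr_pool.c: evaluates the same
 * expression as the KASSERT at subr_pool.c:1113.
 * Returns 1 if the assertion would hold, 0 if it would fire.
 */
static int
pool_item_aligned(uintptr_t v, unsigned int itemoffset, unsigned int align)
{
	/* The mask test is only meaningful for a nonzero power-of-two align. */
	assert(align != 0 && (align & (align - 1)) == 0);

	/* Same check as the KASSERT: (v + offset) must be a multiple of align. */
	return ((v + itemoffset) & (uintptr_t)(align - 1)) == 0;
}
```

With v = 0xc1781000, pr_align = 4096 and pr_itemoffset = 1, the call
pool_item_aligned(0xc1781000, 1, 4096) evaluates to 0, which is the case where
the KASSERT fires. With an item offset of 0 the same call evaluates to 1, so
the returned address itself is page aligned.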
 
 
 -- 
 ------------------------------------
 
 Mystical explanations:
 Mystical explanations are considered deep;
 the truth is that they are not even superficial.
 
     -- Friedrich Nietzsche
     [ Die Fröhliche Wissenschaft, Book 3, 126 ]
 

