Subject: Re: 2.0_BETA weirdness (was Re: 1.6.2 weirdness)
To: Andrey Petrov <petrov@NetBSD.org>
From: ali (Anders Lindgren) <dat94ali@ludat.lth.se>
List: port-sparc64
Date: 09/19/2004 16:02:19
On Fri, 17 Sep 2004, Andrey Petrov wrote:
>
> That's exactly 24648. If you can find what application (ps/l) and under
> what conditions causes that, it would be big help in resolving an issue.
It's happened again. ps/l and some other output included.
It looks like it's a thread context switch in named9 (from pkgsrc) that
panics the machine? I was just running a "make install" in
/usr/pkgsrc/databases/p5-DBI when this happened, but I'm not sure it has
anything to do with disk load; the very first time I don't know what
was going on when it happened. The second time, with my DEBUG kernel,
it was probably doing nothing because it had finished the perl5 build
I started on it. Third time it was building mysql4-server from pkgsrc,
and now the fourth time, it was installing p5-DBI.
I have built several other fairly large packages (apache, perl5) without
triggering this bug, so I'm not going to make guesses... named9 was at
the time running in a chroot jail in /var/chroot/named. As always, the
load on this machine is quite low; rarely uses swap or runs low on memory.
Please let me know if there is anything more I can do to provide useful
information, and let me know if you want me to try to arrange access to
the serial console for you (it's on someone else's box, so I can't promise
anything).
ps/l output:
--------8<--------8<--------
db> ps/l
PID LID S FLAGS STRUCT LWP * UAREA * WAIT
9399 1 2 0x4 0x64b3320 0x6114000
20704 1 3 0x84 0x64b2e20 0x55a8000 wait
6441 1 3 0x84 0x64b3220 0x6118000 wait
11031 1 3 0x80 0x64b2520 0x6286000 wait
6651 1 3 0x80 0x64b2720 0x628c000 wait
673 1 3 0x80 0x64b2d20 0x626a000 wait
2290 1 3 0x80 0x64b2820 0x61e8000 wait
24084 7 3 0x30900084 0x64b2420 0x6220000 sawait
5 3 0xd00084 0x64b2f20 0x60dc000 select
3 3 0xd00084 0x64b2120 0x655a000 select
4 3 0xd00080 0x64b2020 0x60d8000 select
2 3 0xd00080 0x64b2b20 0x7bf0000 sigwait
27565 1 3 0x80 0x64b2220 0x66fa000 wait
28478 1 3 0x80 0x5e93100 0x7ea8000 pause
13482 1 3 0x80 0x5e93400 0x7ea0000 pause
9631 1 3 0x84 0x5e92d00 0x7ece000 select
25317 1 3 0x80 0x5e92700 0x7ed2000 netio
27583 1 3 0x80 0x5e93d00 0x6766000 netcon
18151 1 3 0x80 0x5e93c00 0x6718000 netcon
18530 1 3 0x80 0x5e93700 0x6750000 netcon
23673 1 3 0x80 0x5e93a00 0x6770000 netcon
16125 1 3 0x80 0x5e93600 0x6738000 netcon
22667 1 3 0x80 0x5e92600 0x6724000 netcon
17033 1 3 0x80 0x5e93b00 0x675c000 netcon
12776 1 3 0x80 0x5e93900 0x671c000 netcon
26954 1 3 0x80 0x5e93e00 0x6756000 netcon
21658 1 3 0x80 0x5e93500 0x65ce000 select
24542 1 3 0x84 0x5e92f00 0x6744000 select
23086 1 3 0x80 0x5e92e00 0x66bc000 select
>16856 5 8 0x900004 0x64b2320 0x60d4000
> 2 3 0x4900004 0x5e92a00 0x6710000 biowait
3 3 0xd00080 0x5e93300 0x667e000 select
1 3 0x500080 0x5e93200 0x6670000 sigwait
5006 1 3 0x80 0x5e92500 0x5eb8000 ttyin
269 1 3 0x84 0x5e92c00 0x63d6000 nanosleep
264 1 3 0x80 0x5e92b00 0x607e000 kqread
208 1 3 0x80 0x5e92900 0x606e000 select
120 1 3 0x80 0x5e92800 0x606a000 poll
5 1 3 0x4 0x5e92400 0x5ea6000 aiodoned
4 1 3 0x4 0x5e92300 0x5ea2000 syncer
3 1 3 0x4 0x5e92200 0x5e9e000 pgdaemon
2 1 3 0x4 0x5e92100 0x5e94000 sccomp
1 1 3 0x84 0x5e92000 0x5e80000 wait
0 1 3 0x4 0x1890eb8 0x1c02000 scheduler
--------8<--------8<--------
Output from "ps" command:
--------8<--------8<--------
db> ps
PID PPID PGRP UID S FLAGS LWPS COMMAND WAIT
9399 20704 2290 0 2 0x4002 1 perl
20704 6441 2290 0 2 0x4002 1 sh wait
6441 11031 2290 0 2 0x4002 1 make wait
11031 6651 2290 0 2 0x4002 1 sh wait
6651 673 2290 0 2 0x4002 1 make wait
673 2290 2290 0 2 0x4002 1 sh wait
2290 28478 2290 0 2 0x4002 1 make wait
24084 27565 3613 1001 2 0x4502 5 mysqld *
27565 1 3613 0 2 0x4002 1 sh wait
28478 13482 28478 0 2 0x4002 1 ksh pause
13482 9631 13482 100 2 0x4002 1 ksh pause
9631 25317 25317 100 2 0x100 1 sshd select
25317 208 25317 0 2 0 1 sshd netio
27583 24542 24542 1000 2 0x100 1 httpd netcon
18151 24542 24542 1000 2 0x100 1 httpd netcon
18530 24542 24542 1000 2 0x100 1 httpd netcon
23673 24542 24542 1000 2 0x100 1 httpd netcon
16125 24542 24542 1000 2 0x100 1 httpd netcon
22667 24542 24542 1000 2 0x100 1 httpd netcon
17033 24542 24542 1000 2 0x100 1 httpd netcon
12776 24542 24542 1000 2 0x100 1 httpd netcon
26954 21658 24542 1000 2 0x4000 1 perl netcon
21658 24542 24542 1000 2 0x100 1 httpd select
24542 1 24542 0 2 0 1 httpd select
23086 1 23086 8 2 0x100 1 exim-4.42-1 select
>16856 1 16856 14 2 0x500 4 named *
5006 1 5006 0 2 0x4002 1 ksh ttyin
269 1 269 0 2 0 1 cron nanosle
264 1 264 0 2 0 1 inetd kqread
208 1 208 0 2 0 1 sshd select
120 1 120 0 2 0 1 syslogd poll
5 0 0 0 2 0x20200 1 aiodoned aiodone
4 0 0 0 2 0x20200 1 ioflush syncer
3 0 0 0 2 0x20200 1 pagedaemon pgdaemo
2 0 0 0 2 0x20200 1 scsibus0 sccomp
1 0 1 0 2 0x4000 1 init wait
0 -1 0 0 2 0x20200 1 swapper schedul
--------8<--------8<--------
The "bt" output again, for reference.
--------8<--------8<--------
db> bt
ltsleep(1c64200, 204, 11bc680, 0, 6058bc8, 1891800) at netbsd:ltsleep+0x664
uvmfault_anonget(0, 5eb1ce0, 6058bc0, 11bc400, 1897800, 0) at netbsd:uvmfault_anonget+0x13c
uvm_fault(6000, 42000000, ffffffffffffffff, 6712da0, 1, 3) at netbsd:uvm_fault+0x420
data_access_fault(6712f90, 30, 100b0d4, 41ffe2a1, 11081f, 1814c00) at netbsd:data_access_fault+0x2d4
?(67100a8, 41ffebc0, 78, 6710000, 100b168, 1ca3200) at 0x1008e40
rwindow_save(41ffebc0, 0, 0, 620e050, 3e2a000, 2000) at netbsd:rwindow_save+0xcc
cpu_getmcontext(5e92a00, 620e080, 620e040, 1e74750, 6713440, 1891800) at netbsd:cpu_getmcontext+0x10
sa_upcall_getstate(620e040, 5e92a00, 372, 11b2568, 620e000, 64b2348) at netbsd:sa_upcall_getstate+0x10
sa_upcall0(620e000, 2, 5e92a00, 0, 0, 0) at netbsd:sa_upcall0+0x98
sa_switch(6083960, 6078000, 1ca, 11b84c0, 1, 1891800) at netbsd:sa_switch+0x2a4
ltsleep(212e230, 11, 0, 0, 212e240, 0) at netbsd:ltsleep+0x560
biowait(212e230, 212e230, 3, 11b2568, 67139cc, 1891800) at netbsd:biowait+0x50
uvm_swap_io(0, cfb0, 1, 100000, 1897990, 1ca3200) at netbsd:uvm_swap_io+0x1a8
uvm_swap_get(5, 11bf000, 1, 11bc4a8, 46, 0) at netbsd:uvm_swap_get+0x5c
uvmfault_anonget(6713ca0, 5eb1ce0, 6058bc0, 11bc400, 1897800, 0) at netbsd:uvmfault_anonget+0x5ac
uvm_fault(6000, 42000000, ffffffffffffffff, 6713ce0, 1, 3) at netbsd:uvm_fault+0x420
data_access_fault(6713ed0, 30, 41307250, 41ffe2a1, 11081d, 1814c00) at netbsd:data_access_fault+0x2d4
?(44200000, 44200000, badcafe, badcafe, badcafe, badcafe) at 0x1008e40
ddb_regs(ffffffffffffffff, ffffffffffffffff, ffffffffffffffff,
ffffffffffffffff, ffffffffffffffff, ffffffffffffffff) at 0x4130c688
--------8<--------8<--------
... and the lock complaints while trying to reboot (Reboot hangs,
but serial console is functional enough to break into OFW and
reset the box from there):
--------8<--------8<--------
db> reboot
syncing disks...
simple_lock: lock held
lock: 0x1810af8, currently at: ../../../../kern/kern_synch.c:1159
last locked: ../../../../kern/kern_synch.c:421
last unlocked: ../../../../kern/kern_synch.c:658
hme0: status=30001<GOTFRAME,RXTOHOST,NORXD>
pool_get(PR_WAITOK) with held simple_lock 0x6058bc8 CPU 0
../../../../uvm/uvm_f4
pool_get(PR_WAITOK) with held simple_lock 0x6058bc8 CPU 0
../../../../uvm/uvm_f4
pool_get(PR_WAITOK) with held simple_lock 0x6058bc8 CPU 0
../../../../uvm/uvm_f4
pool_get(PR_WAITOK) with held simple_lock 0x6058bc8 CPU 0
../../../../uvm/uvm_f4
pool_get(PR_WAITOK) with held simple_lock 0x6058bc8 CPU 0
../../../../uvm/uvm_f4
pool_get(PR_WAITOK) with held simple_lock 0x6058bc8 CPU 0
../../../../uvm/uvm_f4
pool_get(PR_WAITOK) with held simple_lock 0x6058bc8 CPU 0
../../../../uvm/uvm_f4
pool_get(PR_WAITOK) with held simple_lock 0x6058bc8 CPU 0
../../../../uvm/uvm_f4
pool_get(PR_WAITOK) with held simple_lock 0x6058bc8 CPU 0
../../../../uvm/uvm_f4
switching with held simple_lock 0x6058bc8 CPU 0
../../../../uvm/uvm_fault.c:1034
--------8<--------8<--------
--
/ali
:wq