Subject: Re: kern/10016: ddb can get stuck in infinite uvm_fault()ing
To: None <gnats-bugs@gnats.netbsd.org>
From: John Hawkinson <jhawk@mit.edu>
List: netbsd-bugs
Date: 05/04/2000 15:06:59
OK, I feel like I'm missing something with respect to how this is supposed
to work.
uvm_fault() is faulting inside db_read_bytes(), because the
code segment is invalid:
_db_read_bytes+0x10: movb 0(%ecx),%al
db> t
_db_read_bytes(0,4,c0548cf8,c04dab64,c0548d38) at _db_read_bytes+0x10
_db_get_value(0,4,0,5,0) at _db_get_value+0x18
_db_stop_at_pc(c04dab64,c0548d38) at _db_stop_at_pc+0x1c2
_db_trap(5,0,1,ffffffff,c0548e30) at _db_trap+0x39
_kdb_trap(5,0,c0548dac) at _kdb_trap+0xb4
_trap() at _trap+0x168
--- trap (number 5) ---
(null)(ff2) at 0
_pnpbios_getnode(1,c0548e30,c06b9600,d2) at _pnpbios_getnode+0x6a
_pnpbios_attach(c06bdfc0,c06afe80,c0548ed4,c06afe80,c0548ed4) at _pnpbios_attach
+0x223
_config_attach(c06bdfc0,c0458fb8,c0548ed4,c031d7b8,c06bdfc0) at _config_attach+0
x30a
_config_found_sm(c06bdfc0,c0548ed4,c031d7b8,0) at _config_found_sm+0x29
_mainbus_attach(0,c06bdfc0,0,c06bdfc0,0) at _mainbus_attach+0x41
_config_attach(0,c0458164,0,0,c04b9694) at _config_attach+0x30a
_config_rootfound(c02f86c9,0) at _config_rootfound+0x37
_cpu_configure(bfeff000,c0548fa8,c01a202b,c0546010,546000) at _cpu_configure+0x1
e
_configure(c0546010,546000,54d000,90,500007ff) at _configure+0x5a
_main(0,0,0,0,0) at _main+0x317
db> show reg
es 0x160010
ds 0x10
edi 0x5
esi 0x4
ebp 0xc0548cdc _end+0x6bdbc
ebx 0xc0548cf9 _end+0x6bdd9
edx 0x2
ecx 0x1
eax 0x3
eip 0xc02fbb14 _db_read_bytes+0x10
cs 0x8
eflags 0x10006
esp 0xc0548cd8 _end+0x6bdb8
ss 0xc04d0010 _playstats+0x6270
So one bug appears to be that ddb needs to do some checking before
blindly assuming it can deal here. 32/16 etc./ugh. Ignoring that
for the moment...
A page fault is incurred, handled by the i386 trap(), which calls
uvm_fault() which fails, and so we print the message, and loop to the
"we_re_toast:" label. Said label special-cases DDB and KGDB (for
breakpoint handling?) and in the ddb case, invokes kdb_trap() and then
returns, rather than panic()ing.
[ aside: I find it ironic that if you've built with DDB enabled
you don't see the trap data that you'd otherwise get printf()d out
before the panic("trap") [i.e. arch/i386/i386/trap.c ll 301-307],
but if you haven't built with DEBUG you cannot enable the display
of this upon entry to trap() via setting "trapdebug". So if you
build without DDB and without DEBUG, you get it, but if you build
with DDB and without debug, you don't, but if you build with DEBUG
and set trapdebug, you do get it. It seems odd, since one presumes
the normal case for people trying to debug things is they have
kernels build with DDB (for ocassional debugging) but not with
DEBUG set.
I don't really understand why the "if (trapdebug)" is
conditionalized on DEBUG -- it does not seem like one cmpl upon
every trap() invokation is particularly expensive, so I'd be
curious if there's objection to removing the #ifdef DEBUG (ll
262).
]
In any event, kdb_trap() (in db_interface.c) prints the trap type
information and then calls the MI db_trap(). db_trap() calls
db_stop_at_pc() to see if there's a ddb breakpoint in effect (or an
"until", etc.), and if not, calls db_restart_at_pc().
For the case of a pagefault, this causes an infinite loop since
the pagefault didn't happen on a breakpoint, and db_restart_at_pc() just
results in trying to re-execute the instruction which triggers the
pagefault. There's no opportunity for user interaction.
I'm unclear on whether I'm misunderstanding the sequence of events
of it they're just that broken :-(.
The following workaround seems to be helpful:
Index: db_trap.c
===================================================================
RCS file: /cvsroot/syssrc/sys/ddb/db_trap.c,v
retrieving revision 1.14
diff -c -3 -2 -p -r1.14 db_trap.c
*** db_trap.c 1999/04/12 20:38:21 1.14
--- db_trap.c 2000/05/04 18:54:23
***************
*** 47,82 ****
--- 47,85 ----
void
db_trap(type, code)
int type, code;
{
boolean_t bkpt;
boolean_t watchpt;
bkpt = IS_BREAKPOINT_TRAP(type, code);
watchpt = IS_WATCHPOINT_TRAP(type, code);
if (db_stop_at_pc(DDB_REGS, &bkpt)) {
if (db_inst_count) {
db_printf("After %d instructions (%d loads, %d stores),\n",
db_inst_count, db_load_count, db_store_count);
}
if (curproc != NULL) {
if (bkpt)
db_printf("Breakpoint in %s at\t", curproc->p_comm);
else if (watchpt)
db_printf("Watchpoint in %s at\t", curproc->p_comm);
else
db_printf("Stopped in %s at\t", curproc->p_comm);
} else if (bkpt)
db_printf("Breakpoint at\t");
else if (watchpt)
db_printf("Watchpoint at\t");
else
db_printf("Stopped at\t");
db_dot = PC_REGS(DDB_REGS);
db_print_loc_and_inst(db_dot);
db_command_loop();
+ } else if (type==T_PAGEFLT) {
+ db_printf("pagefault stop\t");
+ db_command_loop();
}
db_restart_at_pc(DDB_REGS, watchpt);
}
it certainly prevents the endless looping of uvm_fault()s
and allowed me to get the above stack trace.
I guess it's probably not legit to use "T_PAGEFLT" in MI code.
I suppose I could define IS_PAGEFAULT_TRAP() in the MD db_machdep.h
and then #ifdef the db_trap() code on IS_PAGEFAULT_TRAP().
But I'm not convinced that the code is in-principal correct (i.e.
ignoring the prev. paragraph's issue). I suppose it's certainly better
than the status quo.
Feedback would be appreciated.
--jhawk