Subject: kern/22005: i386 panic running ifconfig with heavy UDP traffic
To: None <gnats-bugs@gnats.netbsd.org>
From: Douglas Wade Needham <dneedham@naapo.org>
List: netbsd-bugs
Date: 06/27/2003 23:19:36
>Number: 22005
>Category: kern
>Synopsis: System panic while ifconfig is running
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Jun 28 03:20:01 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator: Douglas Wade Needham
>Release: NetBSD 1.6.1 (sources as of Apr 6 09:44:00EDT)
>Organization:
North American Astrophysical Observatory
>Environment:
System: NetBSD bfc0 1.6.1 NetBSD 1.6.1 (BFC) #0: Fri Jun 27 06:53:05 EDT 2003 root@display:/usr/src/sys/arch/i386/compile/BFC i386
Architecture: i386
Machine: i386
Hardware: Multiple Gigabyte GA7VAX systems, each with 256MB of RAM and dual NICs.
>Description:
The machine panics while sustaining an extremely heavy UDP
traffic flow (~80Mbps) on one of its two rtk interfaces. The
traffic consists primarily of packets carrying 4KB of
application data from a radio telescope. The console messages
end with the following:
uvm_fault(0xd413f17c, 0x0, 0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 eip c0240526 cs 8 eflags 10286 cr2 1 cpl c0000000
panic: trap
syncing disks... uvm_fault(0xd413f17c, 0x0, 0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 eip c0240526 cs 8 eflags 10286 cr2 1 cpl c0000000
panic: trap
Inspection of the crash dump indicates that ifconfig was
running at the time, and that the kernel stack is as follows:
#0 0x1 in ?? ()
#1 0xc029c64b in cpu_reboot (howto=260, bootstr=0x0)
at /usr/src/sys/arch/i386/compile/BFC/../../../../arch/i386/i386/machdep.c:2236
#2 0xc01fc426 in panic ()
at /usr/src/sys/arch/i386/compile/BFC/../../../../kern/subr_prf.c:253
#3 0xc02a36e2 in trap (frame={tf_gs = 16, tf_fs = 16, tf_es = 16, tf_ds = 16,
tf_edi = -1, tf_esi = -757862384, tf_ebp = -736859296, tf_ebx = -1,
tf_edx = -1063837184, tf_ecx = -1064109312, tf_eax = 0, tf_trapno = 6,
tf_err = 0, tf_eip = -1071381210, tf_cs = 8, tf_eflags = 66182,
tf_esp = -1064109312, tf_ss = -1072685334, tf_vm86_es = 4,
tf_vm86_ds = -1071657377, tf_vm86_fs = 10, tf_vm86_gs = 5})
at /usr/src/sys/arch/i386/compile/BFC/../../../../arch/i386/i386/trap.c:231
#4 0xc0100c39 in calltrap ()
#5 0xc0240253 in ipintr ()
at /usr/src/sys/arch/i386/compile/BFC/../../../../netinet/ip_input.c:381
#6 0xc0101fc4 in Xsoftnet ()
#7 0xc029c623 in cpu_reboot (howto=256, bootstr=0x0)
at /usr/src/sys/arch/i386/compile/BFC/../../../../arch/i386/i386/machdep.c:2223
#8 0xc01fc426 in panic ()
at /usr/src/sys/arch/i386/compile/BFC/../../../../kern/subr_prf.c:253
#9 0xc02a36e2 in trap (frame={tf_gs = 16, tf_fs = 16, tf_es = 16, tf_ds = 16,
tf_edi = -1, tf_esi = -757759984, tf_ebp = -736858912, tf_ebx = -1,
tf_edx = -1063837184, tf_ecx = -1063525376, tf_eax = 0, tf_trapno = 6,
tf_err = 0, tf_eip = -1071381210, tf_cs = 8, tf_eflags = 66182,
tf_esp = -1063525376, tf_ss = -1072685282, tf_vm86_es = 4,
tf_vm86_ds = 99747316, tf_vm86_fs = -1454069420, tf_vm86_gs = 23708})
at /usr/src/sys/arch/i386/compile/BFC/../../../../arch/i386/i386/trap.c:231
#10 0xc0100c39 in calltrap ()
#11 0xc0240253 in ipintr ()
at /usr/src/sys/arch/i386/compile/BFC/../../../../netinet/ip_input.c:381
#12 0xc0101fc4 in Xsoftnet ()
#13 0xc0257eef in udp_usrreq (so=0xc096bd24, req=11, m=0x8040691a,
nam=0xd4146ec0, control=0xc08ba42c, p=0xd4173744)
at /usr/src/sys/arch/i386/compile/BFC/../../../../netinet/udp_usrreq.c:963
#14 0xc0227587 in ifioctl (so=0xc096bd24, cmd=2151704858,
data=0xd4146ec0 "rtk1", p=0xd4173744)
at /usr/src/sys/arch/i386/compile/BFC/../../../../net/if.c:1532
#15 0xc0201e08 in soo_ioctl (fp=0xd413009c, cmd=2151704858,
data=0xd4146ec0 "rtk1", p=0xd4173744)
at /usr/src/sys/arch/i386/compile/BFC/../../../../kern/sys_socket.c:139
#16 0xc01ff50d in sys_ioctl (p=0xd4173744, v=0xd4146f80, retval=0xd4146f78)
at /usr/src/sys/arch/i386/compile/BFC/../../../../kern/sys_generic.c:616
#17 0xc02a31cb in syscall_plain (frame={tf_gs = 31, tf_fs = 31, tf_es = 31,
tf_ds = 31, tf_edi = 134716555, tf_esi = -1077945544,
tf_ebp = -1077945684, tf_ebx = 0, tf_edx = 0, tf_ecx = 134783568,
tf_eax = 54, tf_trapno = 3, tf_err = 2, tf_eip = 134692867, tf_cs = 23,
tf_eflags = 663, tf_esp = -1077945824, tf_ss = 31, tf_vm86_es = 0,
tf_vm86_ds = 0, tf_vm86_fs = 0, tf_vm86_gs = 0})
at /usr/src/sys/arch/i386/compile/BFC/../../../../arch/i386/i386/syscall.c:140
#18 0xc0100d06 in syscall1 ()
can not access 0xbfbfdaac, invalid translation (invalid PDE)
can not access 0xbfbfdaac, invalid translation (invalid PDE)
Cannot access memory at address 0xbfbfdaac
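gdb prints the trap frame registers as signed decimal integers,
which makes them hard to compare with the hex values in the panic
message. The following is a minimal sketch (the helper name
to_u32 is hypothetical, not part of any tool) that reinterprets
the values from frame #3 as unsigned 32-bit quantities:

```python
def to_u32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as an unsigned one."""
    return value & 0xFFFFFFFF


# Values copied from the trap frame shown in frame #3 of the backtrace.
frame = {
    "tf_eip": -1071381210,
    "tf_esp": -1064109312,
    "tf_ebp": -736859296,
    "tf_esi": -757862384,
    "tf_eflags": 66182,
}

for name, value in frame.items():
    # Zero-padded 32-bit hex, e.g. 0xc0240526
    print(f"{name} = {to_u32(value):#010x}")
```

With this conversion, tf_eip comes out as 0xc0240526 and tf_eflags
as 0x10286, which agree with the "eip c0240526" and "eflags 10286"
fields of the panic message.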
The kernel in question is essentially a GENERIC kernel with most
of the unused NICs/HBAs disabled, and APM enabled. This may be a
race condition between the ifconfig and an interrupt. A more
complete crash dump analysis is available at the following URL:
http://cinnion.ka8zrt.com/bfc0_crash_analysis
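The ioctl command that ifconfig issued is visible in frames #14
and #15 as cmd=2151704858. As a rough sketch, assuming the usual
BSD ioctl encoding from sys/ioccom.h (direction in the top three
bits, a 13-bit parameter length, then an 8-bit group and an 8-bit
command number), it can be unpacked as follows. The function name
decode_ioctl is hypothetical:

```python
# Assumed constants, following the BSD <sys/ioccom.h> layout.
IOC_VOID = 0x20000000      # no parameters
IOC_OUT = 0x40000000       # parameters copied out to userland
IOC_IN = 0x80000000        # parameters copied in from userland
IOCPARM_MASK = 0x1FFF      # parameter length, 13 bits


def decode_ioctl(cmd: int) -> dict:
    """Split an ioctl command word into its encoded fields."""
    direction = []
    if cmd & IOC_VOID:
        direction.append("void")
    if cmd & IOC_IN:
        direction.append("in")
    if cmd & IOC_OUT:
        direction.append("out")
    return {
        "direction": "+".join(direction),
        "length": (cmd >> 16) & IOCPARM_MASK,
        "group": chr((cmd >> 8) & 0xFF),
        "number": cmd & 0xFF,
    }


# cmd value copied from the ifioctl and soo_ioctl frames.
print(decode_ioctl(2151704858))
```

Under these assumptions the command decodes to direction "in",
group 'i', number 26, and a 64-byte parameter. That appears to
correspond to SIOCAIFADDR (add or change an interface address),
but this should be confirmed against sys/sockio.h in the 1.6.1
source tree.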
Panics happen about once or twice a day, and occur on both a
system with an Athlon 2400+ and one with an Athlon 2500+
(Barton). In addition, the following related deficiencies have
been noted:
- Large numbers of watchdog timeouts can occur on the interface
handling the data from the telescope. However, none were seen before
the latest panic.
- Process listings using both ps and the xps gdb macro do not return a
valid PPID.
And finally, though to a lesser degree:
- While other OSes (Linux, UnixWare, HP/UX) permit the
transmission of UDP with application data packet sizes of 4KB,
NetBSD does not permit this even though this is perfectly valid
per the RFCs (IP should fragment and reassemble). Yes, this is
not ideal, but it is what our main researcher will use as an
argument for Linux.
Please send email, and I can get you additional information if
necessary.
>How-To-Repeat:
Subject a system to an extremely heavy UDP/IP load (around 2K
packets/sec), and run ifconfig (exact arguments currently
unknown). The 4KB application payload may be needed to force
fragmentation and trigger the problem, but I suspect it only
amplifies the problem rather than causing it.
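To give a feel for the load involved, here is a back-of-the-envelope
sketch of the fragmentation arithmetic. It assumes a standard
1500-byte Ethernet MTU, IPv4 headers without options, and that
"4KB" means exactly 4096 bytes; none of these are confirmed for
the affected network, and the function names are hypothetical:

```python
import math

IP_HEADER = 20     # IPv4 header without options (assumption)
UDP_HEADER = 8
MTU = 1500         # assumption: standard Ethernet MTU


def fragments_per_datagram(payload: int, mtu: int = MTU) -> int:
    """Number of IP fragments needed for one UDP datagram."""
    ip_payload = payload + UDP_HEADER
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil(ip_payload / per_fragment)


def ip_bits_per_second(payload: int, datagrams_per_sec: int,
                       mtu: int = MTU) -> int:
    """IP-layer bit rate, counting one IP header per fragment."""
    frags = fragments_per_datagram(payload, mtu)
    ip_bytes = payload + UDP_HEADER + frags * IP_HEADER
    return ip_bytes * 8 * datagrams_per_sec


print(fragments_per_datagram(4096))
print(ip_bits_per_second(4096, 2000) / 1e6, "Mbps")
```

Under these assumptions each datagram arrives as 3 fragments, and
2000 datagrams/sec works out to roughly 67 Mbps at the IP layer,
before Ethernet framing overhead. That is the same order as the
~80Mbps observed, so a reproduction attempt would need to sustain
several thousand fragments per second on the receiving interface.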
>Fix:
Unknown at this time
>Release-Note:
>Audit-Trail:
>Unformatted: