Subject: kern/27037: ipfilter or ipv6 crash, something to do with fragments, on sparc64
To: None <gnats-bugs@gnats.NetBSD.org>
From: None <carton@Ivy.NET>
List: netbsd-bugs
Date: 09/26/2004 02:50:27
>Number: 27037
>Category: kern
>Synopsis: ipfilter or ipv6 crash, something to do with fragments, on sparc64
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Sep 26 02:51:00 UTC 2004
>Closed-Date:
>Last-Modified:
>Originator: Miles Nordin
>Release: NetBSD 2.0_BETA
>Organization:
>Environment:
System: NetBSD lucette 2.0_BETA NetBSD 2.0_BETA (LUCETTE-$Revision: 1.1 $) #4: Sat Sep 11 13:03:44 EDT 2004 carton@castrovalva:/scratch/src/sys/arch/sparc64/compile/LUCETTE sparc64
Architecture: sparc64
Machine: sparc64
>Description:
In the text below, 3ffe:0bc0:206::/48 is my IPv6 prefix.
Script started on Sun Sep 26 02:09:38 2004
$ sudo cu -l ttyC1 -s 9600
Password:
Connected.
db> db mesg 0t400
ansmit underrun; new threshold: 96/256 bytes
tlp1: transmit underrun; new threshold: 128/512 bytes
tlp0: receive ring overrun
tlp1: transmit underrun; new threshold: 160/1024 bytes
arpresolve: can't allocate llinfo on tlp2 for 127.0.0.1
tlp2: transmit underrun; new threshold: 96/256 bytes
tlp2: transmit underrun; new threshold: 128/512 bytes
panic: lockmgr: no context
kdb breakpoint at 130ba04
db> bt
lockmgr(18468a0, 1, 0, 0, 0, 318c000) at netbsd:lockmgr+0x28c
uvmfault_lookup(e00170a0, 0, e0017948, 0, 3ffe0bc0, 0) at netbsd:uvmfault_lookup+0x1c0
uvm_fault(1846898, 0, 2, 2, 180c400, 500) at netbsd:uvm_fault+0x6c
data_access_fault(e00172a0, 30, 1044534, 0, 0, 80080d) at netbsd:data_access_fault+0x418
?(0, e00176c0, 4d0, 1, 3, e0017c80) at 0x100871c
fr_coalesce(e00176c0, e00176f0, ffffffffffffffff, e00176e0, 2, e00176e0) at netbsd:fr_coalesce+0xc
frpr_ipv6hdr(e00176c0, 996fe, 10b1ba0, 0, 2dbc0c17fc0, 1) at netbsd:frpr_ipv6hdr+0x1b8
fr_makefrip(28, c6ba840, e00176c0, fefefefefefefeff, 12ff564, 2ea400) at netbsd:fr_makefrip+0x60
fr_checkicmp6matchingstate(e0017a20, 30, ffffffffffffffff, 0, 0, 318c000) at netbsd:fr_checkicmp6matchingstate+0xdc
fr_stlookup(0, 180c800, e0017948, 0, 3ffe0bc0, 0) at netbsd:fr_stlookup+0x518
fr_checkstate(e0017a20, e0017a1c, e0017a20, 180c400, 180c400, 500) at netbsd:fr_checkstate+0x27c
fr_check(3358880, 10, 30c0078, 0, e0017ba8, a) at netbsd:fr_check+0x6d4
pfil_run_hooks(189d138, e0017d20, 30c0078, 1, 3, e0017c80) at netbsd:pfil_run_hooks+0x54
ip6_input(3358880, fc660000, 3082b80, 0, 7, 512) at netbsd:ip6_input+0xc78
ip6intr(0, 996fe, 10b1ba0, 0, 2dbc0c17fc0, 1) at netbsd:ip6intr+0x54
softnet(1000000, 0, e0017ed0, fefefefefefefeff, 12ff564, 2ea400) at netbsd:softnet+0x88
sparc64_ipi_flush_all(0, 0, 136ba84, 0, ffffffffffffffff, 0) at netbsd:sparc64_ipi_flush_all+0x23c
db> ps
PID PPID PGRP UID S FLAGS LWPS COMMAND WAIT
1003 1008 1003 405 2 0x4002 1 ksh ttyin
1008 1005 1005 405 2 0x100 1 sshd select
1005 96 1005 0 2 0x101 1 sshd netio
1428 1079 1428 405 2 0x4002 1 ksh ttyin
1079 609 609 405 2 0x100 1 sshd select
609 96 609 0 2 0x101 1 sshd netio
662 1050 1050 12 2 0x4100 1 pickup select
531 1 531 0 2 0x4002 1 getty ttyin
295 1 295 0 2 0x101 1 bgpd select
172 1 172 0 2 0 1 cron nanosle
1212 1 1212 0 2 0 1 inetd kqread
984 1 984 67 2 0 1 ircd nanosle
658 1 658 0 2 0x101 1 ospf6d select
989 1 989 0 2 0x101 1 ospfd select
227 1050 1050 12 2 0x4100 1 qmgr select
1050 1 1050 0 2 0x4108 1 master select
96 1 96 0 2 0 1 sshd select
913 1 913 0 2 0 1 rtadvd poll
821 1 821 0 2 0 1 rarpd select
893 1 893 15 2 0x100 1 ntpd pause
824 1 824 0 2 0 1 dhcpd select
537 1 537 0 2 0 1 mount_mfs mfsidl
498 1 498 0 2 0 1 rpcbind poll
471 1 471 14 2 0x500 3 named *
443 1 443 0 2 0 1 ipmon nanosle
382 1 382 0 2 0 1 altqd select
319 1 319 0 2 0 1 syslogd poll
374 1 374 0 2 0x101 1 zebra select
307 1 16 0 2 0x4002 1 choparp select
15 0 0 0 2 0x20200 1 aiodoned aiodone
14 0 0 0 2 0x20200 1 ioflush syncer
13 0 0 0 2 0x20200 1 pagedaemon pgdaemo
12 0 0 0 2 0x20200 1 lfs_writer lfswrit
11 0 0 0 2 0x20200 1 atapibus0 sccomp
10 0 0 0 2 0x20200 1 scsibus1 sccomp
9 0 0 0 2 0x20200 1 scsibus0 sccomp
8 0 0 0 2 0x20200 1 usb1 usbevt
7 0 0 0 2 0x20200 1 atabus1 atath
6 0 0 0 2 0x20200 1 atabus0 atath
5 0 0 0 2 0x20200 1 usbtask usbtsk
4 0 0 0 2 0x20200 1 usb0 usbevt
3 0 0 0 2 0x20200 1 sysmon smtaskq
2 0 0 0 2 0x20200 1 cryptoret crypto_
1 0 1 0 2 0x4000 1 init wait
0 -1 0 0 2 0x20200 1 swapper schedul
db> reboot
syncing disks... gem1: MAC rx fault, status 3
tlp2: receive ring overrun
tlp1: receive ring overrun
tlp0: receive ring overrun
3 3 2 1 done
rebooting
Resetting ...
LOM event: +15d+5h10m35s host reset
Netra T1 200 (UltraSPARC-IIe 500MHz), No Keyboard
[...]
Script done on Sun Sep 26 02:17:35 2004
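The prefix is noted because the value 3ffe0bc0 shows up in the argument
lists of the uvmfault_lookup and fr_stlookup frames above, and it is the
top 32 bits of that prefix. A minimal Python sketch to check the match
(not part of the original session; the helper name top_word is
hypothetical):

```python
import ipaddress

# Prefix quoted in the report.
prefix = ipaddress.ip_network("3ffe:0bc0:206::/48")

def top_word(net):
    """Return the most significant 32 bits of the network address as hex."""
    return format(int(net.network_address) >> 96, "x")

print(top_word(prefix))  # 3ffe0bc0
```

This only shows that the register contents resemble an address inside the
prefix; it does not by itself identify the packet that triggered the panic.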
>How-To-Repeat:
Mount NFS over IPv4. The client is 192.168.3.102, on the crashed machine's tlp3;
the server is 216.158.24.196, on the crashed machine's tlp2.
The crashed machine carries a lot of network traffic. The NFS mount is the
only thing I know of that I don't usually do, but I can't be absolutely
sure what caused the crash.
Excerpts from the crashed machine's ipf.conf:
# grimalkin nfs
pass in quick on tlp2 proto udp from 216.158.24.196/32 port > 1023 to 192.168.3.102/32 port > 1023
pass in quick on tlp2 proto udp from 216.158.24.196/32 port = nfs to 192.168.3.102/32
pass out quick on tlp2 proto udp from 192.168.3.102/32 to 216.158.24.196/32 with frag
pass in quick on tlp2 proto udp from 216.158.24.196/32 to 192.168.3.102/32 with frag
#
# outgoing only tcp
pass out quick on tlp2 proto tcp from 192.168.0.0/16 to any flags S/SAFR keep state
block out log on tlp2 proto tcp from 192.168.0.0/16 to any
block return-icmp(filter-prohib) in log on tlp2 proto tcp from any to 192.168.0.0/16
#
# outgoing only udp
# hrm... maybe too permissive.
pass out quick on tlp2 proto udp from 192.168.0.0/16 to any keep state
block out log on tlp2 proto udp from 192.168.0.0/16 to any
block return-icmp(filter-prohib) in log on tlp2 proto udp from any to 192.168.0.0/16
#
# for ICMP_INFOTYPE stuff like echo-request, ask to keep state.
# not sure if it works for all these.
block in quick on tlp2 proto icmp from any to 192.168.0.0/16 icmp-type echo
block in quick on tlp2 proto icmp from any to 192.168.0.0/16 icmp-type timest
block in quick on tlp2 proto icmp from any to 192.168.0.0/16 icmp-type inforeq
block in quick on tlp2 proto icmp from any to 192.168.0.0/16 icmp-type maskreq
pass out quick on tlp2 proto icmp from 192.168.0.0/16 to any icmp-type echo keep state
pass out quick on tlp2 proto icmp from 192.168.0.0/16 to any icmp-type timest keep state
pass out quick on tlp2 proto icmp from 192.168.0.0/16 to any icmp-type inforeq keep state
pass out quick on tlp2 proto icmp from 192.168.0.0/16 to any icmp-type maskreq keep state
block in quick on tlp2 proto icmp from any to 192.168.0.0/16 icmp-type echorep
block in quick on tlp2 proto icmp from any to 192.168.0.0/16 icmp-type timestrep
block in quick on tlp2 proto icmp from any to 192.168.0.0/16 icmp-type inforep
block in quick on tlp2 proto icmp from any to 192.168.0.0/16 icmp-type maskrep
#
# no redirs.
block in quick on tlp2 proto icmp from any to 192.168.0.0/16 icmp-type redir
#
# to facilitate experimentation, pass what we don't understand.
pass in quick on tlp2 proto icmp from any to 192.168.0.0/16
#
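The "with frag" rules in the grimalkin nfs block are there because NFS
over UDP commonly sends datagrams larger than the link MTU, so each one
arrives as several IPv4 fragments. A rough sketch of the arithmetic,
assuming a 1500-byte MTU and an 8192-byte payload (neither value is stated
in this report; fragment_count is a hypothetical helper):

```python
import math

IP_HEADER = 20    # IPv4 header without options, in bytes
UDP_HEADER = 8    # UDP header, in bytes

def fragment_count(udp_payload, mtu=1500):
    """Number of IPv4 fragments needed for one UDP datagram."""
    # Every fragment except the last carries a multiple of 8 data bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return math.ceil((udp_payload + UDP_HEADER) / per_fragment)

print(fragment_count(8192))
```

Under those assumptions a single 8192-byte payload needs six fragments, so
the fragment-handling paths of the filter would be exercised heavily while
the mount is in use.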
>Fix:
Unknown. The firewall is a semi-production machine, and I am not sure I can reproduce the crash.
>Release-Note:
>Audit-Trail:
>Unformatted:
I'm running 2.0 BETA 2004-08-15 with the following files upgraded:
netinet/fil.c 1.61.2.7 pr#26666
kern/uipc_mbuf.c 1.80.2.3 pr#26733
sys/mbuf.h 1.90.2.3 pr#26733
netinet/ip_fil_netbsd.c 1.3.2.10 pr#26733
netinet6/raw_ip6.c 1.63.2.2 pr#26733
kern/kern_lock.c 1.75.2.1