Re: kern/52263: Frequent ixg(4) panic

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost,Hauke Fath <hf%spg.tu-darmstadt.de@localhost>
Subject: Re: kern/52263: Frequent ixg(4) panic
From: Masanobu SAITOH <msaitoh%execsw.org@localhost>
Date: Thu, 10 Aug 2017 04:55:00 +0000 (UTC)

The following reply was made to PR kern/52263; it has been noted by GNATS.

From: Masanobu SAITOH <msaitoh%execsw.org@localhost>
To: gnats-bugs%NetBSD.org@localhost, kern-bug-people%netbsd.org@localhost,
 gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
Cc: msaitoh%execsw.org@localhost
Subject: Re: kern/52263: Frequent ixg(4) panic
Date: Thu, 10 Aug 2017 13:54:32 +0900

 Hi.
 
 On 2017/05/31 1:05, Hauke Fath wrote:
 >> Number:         52263
 >> Category:       kern
 >> Synopsis:       Frequent ixg(4) panic in ixgbe_rxeof()
 >> Confidential:   no
 >> Severity:       critical
 >> Priority:       high
 >> Responsible:    kern-bug-people
 >> State:          open
 >> Class:          sw-bug
 >> Submitter-Id:   net
 >> Arrival-Date:   Tue May 30 16:05:00 +0000 2017
 >> Originator:     Hauke Fath
 >> Release:        NetBSD 7.99.73
 >> Organization:
 > Technische Universitaet Darmstadt
 >> Environment:
 > 	
 > 	
 > System: NetBSD Zinnenwand 7.99.73 NetBSD 7.99.73 (FIFI-$Revision$) #0: Mon May 29 17:00:08 CEST 2017 hf@Hochstuhl:/var/obj/netbsd-builds/developer/amd64/sys/arch/amd64/compile/FIFI amd64
 > Architecture: x86_64
 > Machine: amd64
 >> Description:
 > 
 > 	A pf & carp router under current (7.99.73 here, but happens in
 > 	yesterday's .75, too) panics frequently with
 > 
 > NetBSD 7.99.73 (FIFI-$Revision$) #2: Fri May 26 15:51:24 CEST 2017
 >          hf@Hochstuhl:/var/obj/netbsd-builds/developer/amd64/sys/arch/amd64/compile/FIFI
 > 
 > [...]
 > 
 > fatal protection fault in supervisor mode
 > trap type 4 code 0 rip 0xffffffff8029646d cs 0x8 rflags 0x10202 cr2 0xffff80008f799000 ilevel 0x8 rsp 0xfffffe810e8aeeb0
 > curlwp 0xfffffe810e89d4c0 pid 0.18 lowest kstack 0xfffffe810e8ab2c0
 > panic: trap
 > cpu1: Begin traceback...
 > vpanic() at netbsd:vpanic+0x140
 > snprintf() at netbsd:snprintf
 > trap() at netbsd:trap+0xbab
 > --- trap (number 4) ---
 > ixgbe_rxeof() at netbsd:ixgbe_rxeof+0x523
 > ixgbe_handle_que() at netbsd:ixgbe_handle_que+0x98
 > softint_dispatch() at netbsd:softint_dispatch+0xd4
 > DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e8aeff0
 > Xsoftintr() at netbsd:Xsoftintr+0x4f
 > --- interrupt ---
 > 0:
 > cpu1: End traceback...
 > rebooting...
 > 
 > 	According to objdump(1) probing, the relevant instruction is
 > 	at sys/dev/pci/ixgbe/ix_txrx.c:1933
 > 
 >     1922                         /*
 >     1923                          * Optimize.  This might be a small packet,
 >     1924                          * maybe just a TCP ACK.  Do a fast copy that
 >     1925                          * is cache aligned into a new mbuf, and
 >     1926                          * leave the old mbuf+cluster for re-use.
 >     1927                          */
 >     1928                         if (eop && len <= IXGBE_RX_COPY_LEN) {
 >     1929                                 sendmp = m_gethdr(M_NOWAIT, MT_DATA);
 >     1930                                 if (sendmp != NULL) {
 >     1931                                         sendmp->m_data +=
 >     1932                                             IXGBE_RX_COPY_ALIGN;
 >     1933                                         ixgbe_bcopy(mp->m_data,
 >     1934                                             sendmp->m_data, len);
 >     1935                                         sendmp->m_len = len;
 >     1936                                         rxr->rx_copies.ev_count++;
 >     1937                                         rbuf->flags |= IXGBE_RX_COPY;
 >     1938                                 }
 >     1939                         }
 > 
 > 	I tried to KASSERT() for zero pointers, but it wasn't that
 > 	easy.
 > 
 > 	Sometimes I also see
 > 
 > fatal protection fault in supervisor mode
 > trap type 4 code 0 rip 0xffffffff8061e443 cs 0x8 rflags 0x10202 cr2 0x6b1e00 ilevel 0x4 rsp 0xfffffe810e913ef0
 > curlwp 0xfffffe810e904540 pid 0.30 lowest kstack 0xfffffe810e9102c0
 > panic: trap
 > cpu3: Begin traceback...
 > vpanic() at netbsd:vpanic+0x140
 > snprintf() at netbsd:snprintf
 > trap() at netbsd:trap+0xbab
 > --- trap (number 4) ---
 > ether_input() at netbsd:ether_input+0x83
 > if_percpuq_softint() at netbsd:if_percpuq_softint+0x5b
 > softint_dispatch() at netbsd:softint_dispatch+0xd4
 > DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e913ff0
 > Xsoftintr() at netbsd:Xsoftintr+0x4f
 > --- interrupt ---
 > f557b81a7cde3fa1:
 > cpu3: End traceback...
 > rebooting...
 > 
 > 
 >> How-To-Repeat:
 > 
 > 	Run serious amounts of traffic over an ixg(4) equipped pf/carp
 > 	router machine - 9 vlans here.
 
   Does this problem still occur?
 
 I suspect this is a bug in pf rather than in ixg(4).
 Have you ever tested without pf?
 
   The following change avoids using the optimization, but
 it won't solve your machine's problem.
 
 ------------------
 Index: ix_txrx.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/pci/ixgbe/ix_txrx.c,v
 retrieving revision 1.27
 diff -u -p -r1.27 ix_txrx.c
 --- ix_txrx.c	13 Jun 2017 09:37:22 -0000	1.27
 +++ ix_txrx.c	10 Aug 2017 04:40:59 -0000
 @@ -1915,6 +1915,7 @@ ixgbe_rxeof(struct ix_queue *que)
   			 * is cache aligned into a new mbuf, and
   			 * leave the old mbuf+cluster for re-use.
   			 */
 +#if 0
   			if (eop && len <= IXGBE_RX_COPY_LEN) {
   				sendmp = m_gethdr(M_NOWAIT, MT_DATA);
   				if (sendmp != NULL) {
 @@ -1927,6 +1928,7 @@ ixgbe_rxeof(struct ix_queue *que)
   					rbuf->flags |= IXGBE_RX_COPY;
   				}
   			}
 +#endif
   			if (sendmp == NULL) {
   				rbuf->buf = rbuf->fmp = NULL;
   				sendmp = mp;
 ------------------
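 
 For illustration only, the control flow that the #if 0 changes can be
 sketched in user-space C. This is not driver code: struct pkt_buf,
 rx_deliver(), the copy_enabled flag and the RX_COPY_LEN / RX_COPY_ALIGN
 values are hypothetical stand-ins for the mbuf, the body of
 ixgbe_rxeof(), the #if 0 and the IXGBE_RX_COPY_* constants. With the
 copy enabled, a small end-of-packet frame is copied into a fresh buffer
 and the receive buffer stays with the ring; with it disabled, the
 receive buffer itself is always handed up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder values, NOT the driver's real IXGBE_RX_COPY_* constants. */
#define RX_COPY_LEN   160	/* assumed small-packet threshold */
#define RX_COPY_ALIGN 2		/* assumed payload offset in the copy */

/* Hypothetical, much simplified stand-in for an mbuf. */
struct pkt_buf {
	unsigned char *base;	/* start of the allocation */
	unsigned char *data;	/* start of the payload */
	size_t len;		/* payload length in bytes */
};

/*
 * Return the buffer to pass up the stack.  *reused is set to 1 when the
 * receive buffer rx was copied and can stay on the ring, 0 when rx
 * itself is handed up.  copy_enabled == 0 models the "#if 0" case.
 */
static struct pkt_buf *
rx_deliver(struct pkt_buf *rx, int eop, int copy_enabled, int *reused)
{
	struct pkt_buf *send = NULL;

	assert(rx != NULL && rx->data != NULL && reused != NULL);
	*reused = 0;

	if (copy_enabled && eop && rx->len <= RX_COPY_LEN) {
		send = malloc(sizeof(*send));
		if (send != NULL) {
			send->base = malloc(RX_COPY_ALIGN + RX_COPY_LEN);
			if (send->base == NULL) {
				/* Allocation failed: fall back below. */
				free(send);
				send = NULL;
			} else {
				send->data = send->base + RX_COPY_ALIGN;
				memcpy(send->data, rx->data, rx->len);
				send->len = rx->len;
				*reused = 1;
			}
		}
	}

	/* No copy was made: hand up the receive buffer itself. */
	if (send == NULL)
		send = rx;
	return send;
}
```

 In this sketch the only bound on the copy is the length check against
 RX_COPY_LEN, so a corrupted length or data pointer in rx would not be
 caught by it.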
 
 
 > 	Happens once every few hours here, so I can provide details,
 > 	and/or try things easily.
 > 	
 > 	
 >> Fix:
 > 	I'd love to.
 > 
 > 	
 > 
 >> Unformatted:
 >   	
 >   	
 > 
 
 
 -- 
 -----------------------------------------------
                  SAITOH Masanobu (msaitoh%execsw.org@localhost
                                   msaitoh%netbsd.org@localhost)
