Subject: port-alpha/28361: bge(4) locks up on AlphaServer ES40 when any significant traffic is transmitted
To: port-alpha-maintainer@netbsd.org, gnats-admin@netbsd.org
From: Greg A. Woods <woods@weird.com>
List: netbsd-bugs
Date: 11/19/2004 21:05:01
>Number: 28361
>Category: port-alpha
>Synopsis: bge(4) locks up on AlphaServer ES40 when any significant traffic is transmitted
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: port-alpha-maintainer
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Nov 19 21:05:00 +0000 2004
>Originator: Greg A. Woods
>Release: NetBSD-current (2.99.10) 2004/11/15
>Organization:
Planix, Inc.; Toronto, Ontario; Canada
>Environment:
System: NetBSD 2.99.10 NetBSD 2.99.10 (TSUNAMI) #0: Thu Nov 18 15:27:36 EST 2004 root@woffi.planix.com:/m5/netbsd-current/src/sys/arch/alpha/compile/obj.alpha/TSUNAMI alpha
Architecture: alpha
Machine: alpha
>Description:
The bge(4) driver, when used with a DEGXA-TX card on an
AlphaServer ES40, seems to lock up and go catatonic when one
attempts to send or receive any significant amount of traffic
through it.
It will work sufficiently for a basic "ping", but it dies
immediately (i.e. seemingly on the first packet) with the likes
of "ttcp" or "ping -f".
Given the "netstat -i -I bge0" and "netstat -b -I bge0" results,
it appears the interface receives some or all of the packets,
but they are never handed to the application.
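As a rough cross-check of that reading, the counters can be reduced to per-packet averages. The following is only a sketch: the numbers are copied from the netstat output in How-To-Repeat, the variable and function names are illustrative, and nothing is assumed about whether the byte counters include link-level headers.

```python
# Counters copied from the "netstat -i -I bge0" and "netstat -b -I bge0"
# output shown in How-To-Repeat (taken after the interface hung).
ipkts, ierrs, opkts = 512, 19, 356
ibytes, obytes = 755254, 25282


def average(total_bytes, packets):
    """Mean bytes per packet; 0.0 if no packets were counted."""
    return total_bytes / packets if packets else 0.0


avg_in = average(ibytes, ipkts)     # mean size of counted input packets
avg_out = average(obytes, opkts)    # mean size of counted output packets
ierr_pct = 100.0 * ierrs / ipkts    # input errors relative to Ipkts

print(f"average bytes per input packet:  {avg_in:.1f}")
print(f"average bytes per output packet: {avg_out:.1f}")
print(f"input errors as a share of Ipkts: {ierr_pct:.1f}%")
```

An input average close to the 1500-byte MTU would be consistent with the interface counting mostly full-size data frames from the ttcp sender, while the small output average would fit ping replies and short acknowledgements. This is only an observation about the counters, not a diagnosis.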
Note also that it spits out some warnings on the console when it
is first configured: "bge0: pcistate failed to revert"
Also, attempts to ifconfig the wm0 card after the bge0 device is
hung result in the following panic:
[console]<@> # ifconfig wm0 inet 10.11.11.2 netmask 255.255.255.0 up
wm0: unable to load rx DMA map 1, error = 35
panic: wm_add_rxbuf
Stopped in pid 49.1 (ifconfig) at netbsd:cpu_Debugger+0x4: ret zero,(ra)
db> trace
cpu_Debugger() at netbsd:cpu_Debugger+0x4
panic() at netbsd:panic+0x1f8
wm_add_rxbuf() at netbsd:wm_add_rxbuf+0x4dc
wm_init() at netbsd:wm_init+0x490
ether_ioctl() at netbsd:ether_ioctl+0xac
wm_ioctl() at netbsd:wm_ioctl+0x90
ifioctl() at netbsd:ifioctl+0x434
soo_ioctl() at netbsd:soo_ioctl+0xf8
sys_ioctl() at netbsd:sys_ioctl+0x12c
syscall_plain() at netbsd:syscall_plain+0xc4
XentSys() at netbsd:XentSys+0x5c
--- syscall (54) ---
--- user mode ---
db>
About this panic, Jason Thorpe speculated:
> The bge driver is probably gobbling up all of the SGMAP
> resources...
I can make lots of other information available, and I can grant
temporary console access to this machine to anyone who can help
fix the bug!
I do need the fix to work on NetBSD-1.6.x, but for now I can
test any kernel that'll work with a 1.6 userland (and if really
necessary I could do a full re-install on one or more of the
other currently unused drives).
>How-To-Repeat:
P00>>>boot -file netbsd-cur dkc100
(boot dkc100.1.0.102.0 -file netbsd-cur -flags A)
block 0 of dkc100.1.0.102.0 is a valid boot block
reading 15 blocks from dkc100.1.0.102.0
bootstrap code read in
base = 200000, image_start = 0, image_bytes = 1e00(7680)
initializing HWRPB at 2000
initializing page table at 3fb54000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code
NetBSD/alpha 1.6.2_STABLE FFS Primary Bootstrap
Jumping to entry point...
NetBSD/alpha 1.6.2_STABLE Secondary Bootstrap, Revision 1.13
(woods@building, Wed Sep 22 19:07:04 EDT 2004)
VMS PAL rev: 0x4006800010162
OSF PAL rev: 0x400690002015c
Switch to OSF PAL code succeeded.
Boot file: netbsd-cur
Boot flags: A
3670688+386744 [221112+137599]=0x436790
Entering netbsd-cur at 0xfffffc00003012e0...
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004
The NetBSD Foundation, Inc. All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
NetBSD 2.99.10 (TSUNAMI) #0: Thu Nov 18 15:27:36 EST 2004
root@woffi.planix.com:/m5/netbsd-current/src/sys/arch/alpha/compile/obj.alpha/TSUNAMI
AlphaServer ES40, 666MHz, s/n NI94900217
8192 byte page size, 4 processors.
total memory = 16384 MB
(7080 KB reserved for PROM, 16377 MB used by NetBSD)
avail memory = 16088 MB
mainbus0 (root)
cpu0 at mainbus0: ID 0 (primary), 21264A-14
cpu0: Architecture extensions: 307<PAT,MVI,CIX,FIX,BWX>
cpu1 at mainbus0: ID 1, 21264A-14
cpu1: processor off-line; multiprocessor support not present in kernel
cpu2 at mainbus0: ID 2, 21264A-14
cpu2: processor off-line; multiprocessor support not present in kernel
cpu3 at mainbus0: ID 3, 21264A-14
cpu3: processor off-line; multiprocessor support not present in kernel
tsc0 at mainbus0: 21272 Core Logic Chipset, Cchip rev 0
tsc0: 8 Dchips, 2 memory buses of 32 bytes
tsc0: arrays present: 4096MB (split), 4096MB (split), 4096MB (split), 4096MB (split), Dchip 0 rev 1
tsp0 at tsc0
tsp0: window 2: 0/base 3ff00000/mask 5300000 reinitialized
tsp0: window 3: 0/base fff00000/mask 5800000 reinitialized
pci0 at tsp0 bus 0
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
vga0 at pci0 dev 1 function 0: ATI 3D Rage II+ (rev. 0x9a)
wsdisplay0 at vga0 kbdmux 1
wsmux1: connecting to wsdisplay0
ahc0 at pci0 dev 2 function 0: Adaptec 3960D Ultra160 SCSI adapter
ahc0: interrupting at dec 6600 irq 12
ahc0: aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
scsibus0 at ahc0: 16 targets, 8 luns per target
ahc1 at pci0 dev 2 function 1: Adaptec 3960D Ultra160 SCSI adapter
ahc1: interrupting at dec 6600 irq 13
ahc1: aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
scsibus1 at ahc1: 16 targets, 8 luns per target
isp0 at pci0 dev 3 function 0: QLogic Dual Port FC-AL and 2Gbps Fabric HBA
isp0: interrupting at dec 6600 irq 16
isp0: bad execution throttle of 0- using 16
scsibus2 at isp0: 256 targets, 8 luns per target
tlp0 at pci0 dev 4 function 0: DECchip 21143 Ethernet, pass 3.0
tlp0: interrupting at dec 6600 irq 20
tlp0: DEC DE500-BA, Ethernet address 08:00:2b:c4:b5:26
tlp0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
sio0 at pci0 dev 7 function 0: Acer Labs M1543 PCI-ISA Bridge (rev. 0xc3)
Acer Labs M5229 UDMA IDE Controller (IDE mass storage, interface 0xfa, revision 0xc1) at pci0 dev 15 function 0 not configured
Acer Labs M5237 USB 1.1 Host Controller (USB serial bus, interface 0x10, revision 0x03) at pci0 dev 19 function 0 not configured
isa0 at sio0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
isabeep0 at pcppi0
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
mcclock0 at isa0 port 0x70-0x71: mc146818 or compatible
tsp1 at tsc0
tsp1: window 2: 0/base 3ff00000/mask 5200000 reinitialized
tsp1: window 3: 0/base fff00000/mask 5400000 reinitialized
pci1 at tsp1 bus 0
pci1: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
wm0 at pci1 dev 1 function 0: Intel i82542 1000BASE-X Ethernet, rev. 3
wm0: interrupting at dec 6600 irq 24
wm0: Ethernet address 00:d0:b7:82:33:b0
wm0: 1000baseSX, 1000baseSX-FDX, auto
isp1 at pci1 dev 2 function 0: QLogic Dual Port FC-AL and 2Gbps Fabric HBA
isp1: interrupting at dec 6600 irq 28
isp1: bad execution throttle of 0- using 16
scsibus3 at isp1: 256 targets, 8 luns per target
esiop0 at pci1 dev 4 function 0: Symbios Logic 53c895 (ultra2-wide scsi)
esiop0: using on-board RAM
esiop0: interrupting at dec 6600 irq 36
scsibus4 at esiop0: 16 targets, 8 luns per target
bge0 at pci1 dev 6 function 0: Broadcom BCM5703X Gigabit Ethernet
bge0: interrupting at dec 6600 irq 44
bge0: ASIC BCM5703 A2 (0x1002), Ethernet address 00:08:02:91:89:ae
ukphy0 at bge0 phy 1: Generic IEEE 802.3u media interface
ukphy0: BCM5703 1000BASE-T media interface (OUI 0x001018, model 0x0016), rev. 2
ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
Kernelized RAIDframe activated
scsibus0: waiting 2 seconds for devices to settle...
scsibus1: waiting 2 seconds for devices to settle...
scsibus2: waiting 2 seconds for devices to settle...
scsibus3: waiting 2 seconds for devices to settle...
scsibus4: waiting 2 seconds for devices to settle...
cd0 at scsibus0 target 4 lun 0: <TOSHIBA, CD-ROM XM-5701TA, 0557> cdrom removable
cd0: sync (100.00ns offset 8), 8-bit (10.000MB/s) transfers
sd0 at scsibus1 target 0 lun 0: <COMPAQ, BF01864663, 3B07> disk fixed
sd0: 17365 MB, 7001 cyl, 20 head, 254 sec, 512 bytes/sect x 35565080 sectors
sd0: sync (25.00ns offset 63), 16-bit (80.000MB/s) transfers, tagged queueing
sd1 at scsibus1 target 1 lun 0: <COMPAQ, BF01864663, 3B07> disk fixed
sd1: 17365 MB, 7001 cyl, 20 head, 254 sec, 512 bytes/sect x 35565080 sectors
sd1: sync (25.00ns offset 63), 16-bit (80.000MB/s) transfers, tagged queueing
sd2 at scsibus1 target 2 lun 0: <COMPAQ, BF03685A35, HPB7> disk fixed
sd2: 34732 MB, 31310 cyl, 4 head, 567 sec, 512 bytes/sect x 71132000 sectors
sd2: sync (12.50ns offset 63), 16-bit (160.000MB/s) transfers, tagged queueing
sd3 at scsibus1 target 3 lun 0: <COMPAQ, BF03685A35, HPB7> disk fixed
sd3: 34732 MB, 31310 cyl, 4 head, 567 sec, 512 bytes/sect x 71132000 sectors
sd3: sync (12.50ns offset 63), 16-bit (160.000MB/s) transfers, tagged queueing
sd4 at scsibus1 target 4 lun 0: <COMPAQ, BF03685A35, HPB7> disk fixed
sd4: 34732 MB, 31310 cyl, 4 head, 567 sec, 512 bytes/sect x 71132000 sectors
sd4: sync (12.50ns offset 63), 16-bit (160.000MB/s) transfers, tagged queueing
sd5 at scsibus2 target 1 lun 0: <APPLE, Xserve RAID, 1.21> disk fixed
sd5: 1402 GB, 179526 cyl, 128 head, 128 sec, 512 bytes/sect x 2941353984 sectors
sd6 at scsibus3 target 1 lun 0: <APPLE, Xserve RAID, 1.21> disk fixed
sd6: 1402 GB, 179526 cyl, 128 head, 128 sec, 512 bytes/sect x 2941353984 sectors
sd2: no disk label
sd3: no disk label
sd4: no disk label
root on sd1a dumps on sd1b
root file system type: ffs
WARNING: preposterous clock chip time
-- CHECK AND RESET THE DATE!
/etc/rc.conf is not configured. Multiuser boot aborted.
N O T I C E : Please do not use the console except to run shutdown!
We recommend creating a non-root account and using su(1) for root access.
Terminal type is wsvt25m.
chmod: /tmp: Read-only file system
We recommend creating a non-root account and using su(1) for root access.
[console]<@> # uname -a
NetBSD 2.99.10 NetBSD 2.99.10 (TSUNAMI) #0: Thu Nov 18 15:27:36 EST 2004 root@woffi.planix.com:/m5/netbsd-current/src/sys/arch/alpha/compile/obj.alpha/TSUNAMI alpha
[console]<@> # ifconfig bge0
bge0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> mtu 1500
capabilities=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
enabled=0<>
address: 00:08:02:91:89:ae
media: Ethernet autoselect (1000baseT full-duplex,master)
status: active
[console]<@> # ifconfig bge0 inet 10.10.10.2 netmask 255.255.255.0 up
bge0: pcistate failed to revert
bge0: pcistate failed to revert
[console]<@> # ping 10.10.10.1
PING 10.10.10.1 (10.10.10.1): 48 data bytes
64 bytes from 10.10.10.1: icmp_seq=0 ttl=64 time=0.381 ms
64 bytes from 10.10.10.1: icmp_seq=1 ttl=64 time=0.145 ms
64 bytes from 10.10.10.1: icmp_seq=2 ttl=64 time=0.238 ms
64 bytes from 10.10.10.1: icmp_seq=3 ttl=64 time=0.160 ms
64 bytes from 10.10.10.1: icmp_seq=4 ttl=64 time=0.247 ms
64 bytes from 10.10.10.1: icmp_seq=5 ttl=64 time=0.178 ms
64 bytes from 10.10.10.1: icmp_seq=6 ttl=64 time=0.260 ms
64 bytes from 10.10.10.1: icmp_seq=7 ttl=64 time=0.194 ms
64 bytes from 10.10.10.1: icmp_seq=8 ttl=64 time=0.129 ms
64 bytes from 10.10.10.1: icmp_seq=9 ttl=64 time=0.209 ms
64 bytes from 10.10.10.1: icmp_seq=10 ttl=64 time=0.146 ms
^C
----10.10.10.1 PING Statistics----
11 packets transmitted, 11 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.129/0.208/0.381/0.072 ms
[console]<@> # ifconfig bge0
bge0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
enabled=0<>
address: 00:08:02:91:89:ae
media: Ethernet autoselect (1000baseT full-duplex,master)
status: active
inet 10.10.10.2 netmask 0xffffff00 broadcast 10.10.10.255
[console]<@> # ttcp -v -r -s
ttcp-r: buflen=8192, nbuf=2048, align=16384/0, port=5001 tcp
ttcp-r: socket
ttcp-r: accept from 10.10.10.1
load: 0.06 cmd: ttcp 31 [netio] 0.00u 0.02s 0% 264k
load: 0.14 cmd: ttcp 31 [netio] 0.00u 0.02s 0% 264k
load: 0.12 cmd: ttcp 31 [netio] 0.00u 0.02s 0% 264k
load: 0.07 cmd: ttcp 31 [netio] 0.00u 0.02s 0% 264k
^C
[console]<@> # ping 10.10.10.1
PING 10.10.10.1 (10.10.10.1): 48 data bytes
^C
----10.10.10.1 PING Statistics----
11 packets transmitted, 0 packets received, 100.0% packet loss
[console]<@> # ifconfig bge0
bge0: flags=8c43<UP,BROADCAST,RUNNING,OACTIVE,SIMPLEX,MULTICAST> mtu 1500
capabilities=7<IP4CSUM,TCP4CSUM,UDP4CSUM>
enabled=0<>
address: 00:08:02:91:89:ae
media: Ethernet autoselect (1000baseT full-duplex,master)
status: active
inet 10.10.10.2 netmask 0xffffff00 broadcast 10.10.10.255
[console]<@> # netstat -i -I bge0
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Colls
bge0 1500 <Link> 00:08:02:91:89:ae 512 19 356 0 0
bge0 1500 10.10.10/24 10.10.10.2 512 19 356 0 0
[console]<@> # netstat -b -I bge0
Name Mtu Network Address Ibytes Obytes
bge0 1500 <Link> 00:08:02:91:89:ae 755254 25282
bge0 1500 10.10.10/24 10.10.10.2 755254 25282
[console]<@> #
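The two "ifconfig bge0" outputs taken before and after the ttcp run differ only in the flags word (8843 versus 8c43). A minimal sketch for decoding that difference follows. The bit table is an assumption limited to the six flag names that appear in the transcript, with values chosen to match the hex words printed there; it is not a full copy of <net/if.h>, and the function names are illustrative.

```python
# Flag bits limited to the names visible in the transcript above.
# Values are assumed from the printed hex words, not from <net/if.h>.
IFF_BITS = {
    0x0001: "UP",
    0x0002: "BROADCAST",
    0x0040: "RUNNING",
    0x0400: "OACTIVE",
    0x0800: "SIMPLEX",
    0x8000: "MULTICAST",
}


def decode(flags):
    """Return the known flag names set in `flags`, lowest bit first."""
    return [name for bit, name in sorted(IFF_BITS.items()) if flags & bit]


def changed(before, after):
    """Return the known flag names whose state differs between two words."""
    return decode(before ^ after)


print(decode(0x8843))           # flags after "ifconfig ... up"
print(decode(0x8C43))           # flags after the ttcp attempt
print(changed(0x8843, 0x8C43))  # what changed across the hang
```

Under those assumptions the only bit that changes is OACTIVE, which marks a transmission as still in progress. That would fit a transmit path that never completes, although it does not by itself say why.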
>Fix:
unknown