Subject: port-sparc/4777: 1.3 si0 SCSI driver bombs.
To: None <gnats-bugs@gnats.netbsd.org>
From: David Gilbert <dgilbert@jaywon.pci.on.ca>
List: netbsd-bugs
Date: 01/04/1998 23:32:39
>Number: 4777
>Category: port-sparc
>Synopsis: New in 1.3, si0 SCSI dies with flags 0x03
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: gnats-admin (GNATS administrator)
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Jan 4 20:35:00 1998
>Last-Modified:
>Originator: David Gilbert
>Organization:
============================================================================
|David Gilbert, Velocet Communications. | Two things can only be |
|Mail: dgilbert@velocet.net | equal if and only if they |
|http://www.velocet.net/~dgilbert | are precisely opposite. |
=========================================================GLO================
>Release: Got 1.3 from master release site.
>Environment:
System: NetBSD repeat 1.3 NetBSD 1.3 (REPEAT) #0: Sun Jan 4 17:08:39 EST 1998 dgilbert@repeat:/u3/dgilbert/downloads/NetBSD/src/src/sys/arch/sparc/compile/REPEAT sparc
Machine is Sun4/260 with 5 SCSI drives and 4 'xd' drives. The following
is the kernel config:
# $NetBSD: GENERIC,v 1.51.2.1 1997/11/20 08:46:57 mellon Exp $
include "arch/sparc/conf/std.sparc"
maxusers 128
# Options for variants of the Sun SPARC architecture.
# At least one is required.
options SUN4 # sun4/100, sun4/200, sun4/300
#options SUN4C # sun4c - SS1, 1+, 2, ELC, SLC, IPC, IPX, etc.
#options SUN4M # sun4m - SS10, SS20, Classic, etc.
#options MMU_3L # 3-level MMU on sun4/400; (incomplete)
# Standard system options
options KTRACE # system call tracing
options SYSVMSG # System V message queues
options SYSVSEM # System V semaphores
options SYSVSHM # System V shared memory
#options SHMMAXPGS=1024 # 1024 pages is the default
options LKM # loadable kernel modules
#options INSECURE # disable kernel security level
#options UCONSOLE # allow anyone to steal the virtual console
# Debugging options
options DDB # kernel dynamic debugger
#options DEBUG # kernel debugging code
#options DIAGNOSTIC # extra kernel sanity checking
#options KGDB # support for kernel gdb
#options KGDBDEV=0xc01 # kgdb device number (dev_t)
#options KGDBRATE=38400 # baud rate
#options SCSIVERBOSE # Verbose SCSI errors
# Compatibility options
options COMPAT_43 # 4.3BSD system interfaces
options COMPAT_10 # NetBSD 1.0 binary compatibility
options COMPAT_11 # NetBSD 1.1 binary compatibility
options COMPAT_12 # NetBSD 1.2 binary compatibility
options COMPAT_SUNOS # SunOS 4.x binary compatibility
options COMPAT_SVR4 # SunOS 5.x binary compatibility
options EXEC_ELF32 # Exec module for SunOS 5.x binaries.
# Filesystem options
file-system FFS # Berkeley Fast Filesystem
file-system NFS # Sun NFS-compatible filesystem client
file-system KERNFS # kernel data-structure filesystem
file-system NULLFS # NULL layered filesystem
file-system MFS # memory-based filesystem
file-system FDESC # user file descriptor filesystem
file-system UMAPFS # uid/gid remapping filesystem
file-system LFS # Log-based filesystem (still experimental)
file-system PORTAL # portal filesystem (still experimental)
file-system PROCFS # /proc
file-system CD9660 # ISO 9660 + Rock Ridge file system
file-system UNION # union file system
file-system MSDOSFS # MS-DOS FAT filesystem(s).
options NFSSERVER # Sun NFS-compatible filesystem server
options QUOTA # FFS quotas
options FIFO # POSIX fifo support (in all filesystems)
# Networking options
options INET # IP stack
options TCP_COMPAT_42 # compatibility with 4.2BSD TCP/IP
options GATEWAY # IP packet forwarding
#options ISO,TPIP # OSI networking
#options EON # OSI tunneling over IP
#options CCITT,LLC,HDLC # X.25
#options PFIL_HOOKS # pfil(9) packet filter hooks.
# Options for SPARCstation hardware
options RASTERCONSOLE # fast rasterop console
options BLINK # blink the led on supported machines
# Generic swap; second partition of root disk or network.
config netbsd root on ? type ?
# Main bus and CPU .. all systems.
mainbus0 at root
cpu0 at mainbus0
# Bus types found on SPARC systems.
#sbus0 at mainbus0 # sun4c
obio0 at mainbus0 # sun4 and sun4m
vmes0 at mainbus0 # sun4
vmel0 at mainbus0 # sun4
#iommu0 at mainbus0 # sun4m
#sbus0 at iommu0 # sun4m
#audioamd0 at mainbus0 # sun4c
#audio* at audioamd0
#audioamd0 at obio0 # sun4m
#audio* at audioamd0
#audioamd0 at sbus0 slot ? offset ? # sun4m
#audio* at audioamd0
#auxreg0 at mainbus0 # sun4c
#auxreg0 at obio0 # sun4m
# Power status and control register found on Sun4m systems
#power0 at obio0
# Mostek clock found on 4/300, sun4c, and sun4m systems.
# The Mostek clock NVRAM is the "eeprom" on sun4/300 systems.
#clock0 at mainbus0 # sun4c
#clock0 at obio0 # sun4m
#clock0 at obio0 addr 0xf2000000 # sun4/300
# Intersil clock found on 4/100 and 4/200 systems.
oclock0 at obio0 addr 0xf3000000 # sun4/200
#oclock0 at obio0 addr 0x03000000 # sun4/100
# Memory error registers.
#memreg0 at mainbus0 # sun4c
#memreg0 at obio0 # sun4m
memreg0 at obio0 addr 0xf4000000 # sun4/200 and sun4/300
#memreg0 at obio0 addr 0x04000000 # sun4/100
# Timer chip found on 4/300, sun4c, and sun4m systems.
#timer0 at mainbus0 # sun4c
#timer0 at obio0 # sun4m
#timer0 at obio0 addr 0xef000000 # sun4/300
# EEPROM found on 4/100 and 4/200 systems. Note that the 4/300
# doesn't use this driver; the `EEPROM' is in the NVRAM on the
# Mostek clock chip on 4/300 systems.
eeprom0 at obio0 addr 0xf2000000 # sun4/200
#eeprom0 at obio0 addr 0x02000000 # sun4/100
# Zilog 8530 serial chips. Each has two channels.
# zs0 is ttya and ttyb. zs1 is the keyboard and mouse.
#zs0 at mainbus0 # sun4c
#zs0 at obio0 # sun4m
zs0 at obio0 addr 0xf1000000 level 12 flags 0x103 # sun4/200 and sun4/300
#zs0 at obio0 addr 0x01000000 level 12 flags 0x103 # sun4/100
zstty0 at zs0 channel 0 # ttya
zstty1 at zs0 channel 1 # ttyb
#zs1 at mainbus0 # sun4c
#zs1 at obio0 # sun4m
zs1 at obio0 addr 0xf0000000 level 12 flags 0x103 # sun4/200 and sun4/300
#zs1 at obio0 addr 0x00000000 level 12 flags 0x103 # sun4/100
kbd0 at zs1 channel 0 # keyboard
ms0 at zs1 channel 1 # mouse
#zs2 at obio0 addr 0xe0000000 level 12 flags 0x103 # sun4/300
#zstty2 at zs2 channel 0 # ttyc
#zstty3 at zs2 channel 1 # ttyd
#
# Note the flags on the esp entries below, that work around
# deficiencies in the current driver:
# bits 0-7: disable disconnect/reselect for the corresponding target
# bits 8-15: disable synch negotiation for target [bit-8]
# Note: targets 4-7 have disconnect/reselect enabled on the premise
# that tape devices normally have one of these targets. Tape
# devices should be allowed to disconnect for the SCSI bus
# to operate acceptably.
#
# sun4/300 SCSI - an NCR53c94 or equivalent behind
# an LSI Logic DMA controller
#dma0 at obio0 addr 0xfa001000 level 4 # sun4/300
#esp0 at obio0 addr 0xfa000000 level 4 flags 0x0000 #
# sun4c or sun4m SCSI - an NCR53c94 or equivalent behind
# specialized DMA glue
#dma0 at sbus0 slot ? offset ? # on-board SCSI
#esp0 at sbus0 slot ? offset ? flags 0x0000 # sun4c
#esp0 at dma0 flags 0x0000 # sun4m
# FSBE/S SCSI - an NCR53c94 or equivalent behind
#dma* at sbus? slot ? offset ? # SBus SCSI
#esp* at sbus? slot ? offset ? flags 0x0000 # two flavours
#esp* at dma? flags 0x0000 # depending on model
# Qlogic ISP SBus SCSI Card
#isp* at sbus? slot ? offset ?
# sun4m Ethernet - an AMD 7990 LANCE behind
# specialized DMA glue
#ledma0 at sbus0 slot ? offset ? # sun4m on-board
#le0 at ledma0 #
# Additional SBus LANCE devices - glued on by lebuffer
#lebuffer0 at sbus0 slot ? offset ? # sun4m SBus
#lebuffer* at sbus? slot ? offset ? # sun4m SBus
#le0 at lebuffer0 #
#le* at lebuffer? #
# sun4/300 and sun4c Ethernet - an AMD 7990 LANCE
#le0 at sbus0 slot ? offset ? # sun4c on-board
#le* at sbus? slot ? offset ?
#le0 at obio0 addr 0xf9000000 level 6 # sun4/300
# sun4/100 and sun4/200 Ethernet - an Intel 82586 on-board
# or on a Multibus/VME card.
ie0 at obio0 addr 0xf6000000 level 6 # sun4/200 on-board
#ie0 at obio0 addr 0x06000000 level 6 # sun4/100 on-board
ie1 at vmes0 addr 0xffe88000 level 5 vect 0x75
ie2 at vmel0 addr 0xff31ff02 level 5 vect 0x76
ie3 at vmel0 addr 0xff35ff02 level 5 vect 0x77
ie4 at vmel0 addr 0xff2dff02 level 5 vect 0x7c
# Xylogics 753 or 7053 VME SMD disk controllers and disks, found
# on sun4 systems.
xdc0 at vmel0 addr 0xffffee80 level 3 vect 0x44
xdc1 at vmel0 addr 0xffffee90 level 3 vect 0x45
xdc2 at vmel0 addr 0xffffeea0 level 3 vect 0x46
xdc3 at vmel0 addr 0xffffeeb0 level 3 vect 0x47
xd* at xdc? drive ?
# Xylogics 450 or 451 VME SMD disk controllers and disks, found
# on sun4 systems.
xyc0 at vmes0 addr 0xffffee40 level 3 vect 0x48
xyc1 at vmes0 addr 0xffffee48 level 3 vect 0x49
xy* at xyc? drive ?
# NCR5380-based "Sun SCSI 3" VME SCSI controller.
# This driver has several flags which may be enabled by OR'ing
# the values and using the "flags" directive.
# Valid flags are:
#
# 0x01 Use DMA (may be polled)
# 0x02 Use DMA completion interrupts
# 0x04 Allow disconnect/reselect
#
# E.g. the following would enable DMA, interrupts, and reselect:
# si0 at vmes0 addr 0xff200000 level 3 vect 0x40 flags 0x07
#
# By default, DMA is enabled in the driver.
si0 at vmes0 addr 0xff200000 level 3 vect 0x40 flags 0x03
# NCR5380-based "SCSI Weird" on-board SCSI interface found
# on sun4/100 systems. The flags are the same as the "si"
# controller. Note, while DMA is enabled by default, only
# polled DMA works at this time, and reselects do not work
# on this particular controller.
#sw0 at obio0 addr 0x0a000000 level 3
# Sun "bwtwo" black and white framebuffer, found on sun4, sun4c, and sun4m
# systems. If your sun4 system has a cgfour installed in the P4 slot,
# the P4 entries for "bwtwo" will attach to the overlay plane of the
# "cgfour".
#bwtwo0 at sbus0 slot ? offset ? # sun4c on-board
#bwtwo* at sbus? slot ? offset ? # sun4c and sun4m
bwtwo0 at obio0 addr 0xfd000000 level 4 # sun4/200
#bwtwo0 at obio0 addr 0xfb300000 level 4 # sun4/300 in P4 slot
#bwtwo0 at obio0 addr 0x0b300000 level 4 # sun4/100 in P4 slot
# Sun "cgtwo" VME color framebuffer
cgtwo0 at vmes0 addr 0xff400000 level 4 vect 0xa8
# Sun "cgthree" Sbus color framebuffer
#cgthree0 at sbus? slot ? offset ?
#cgthree* at sbus? slot ? offset ?
#cgthree0 at obio? slot ? offset ? # sun4m
#cgthree* at obio? slot ? offset ? # sun4m
# Sun "cgfour" color framebuffer with overlay plane. See above comment
# regarding overlay plane.
#cgfour0 at obio0 addr 0xfb300000 level 4 # sun4/300 P4
#cgfour0 at obio0 addr 0x0b300000 level 4 # sun4/100 P4
# Sun "cgsix" accelerated color framebuffer.
#cgsix0 at sbus? slot ? offset ?
#cgsix* at sbus? slot ? offset ?
#cgsix0 at obio0 addr 0xfb000000 level 4 # sun4/300 P4
#cgsix0 at obio0 addr 0x0b000000 level 4 # sun4/100 P4
# Sun "cgeight" 24-bit framebuffer
#cgeight0 at obio0 addr 0xfb300000 level 4 # sun4/300 P4
#cgeight0 at obio0 addr 0x0b300000 level 4 # sun4/100 P4
# Sun "tcx" accelerated color framebuffer.
#tcx0 at sbus? slot ? offset ?
#tcx* at sbus? slot ? offset ?
# Sun "cgfourteen" accelerated 24-bit framebuffer.
#cgfourteen0 at obio0 # sun4m
# SCSI bus layer. SCSI devices attach to the SCSI bus, which attaches
# to the underlying hardware controller.
#scsibus* at esp?
#scsibus* at isp?
scsibus* at si?
#scsibus* at sw?
# These entries find devices on all SCSI busses and assign
# unit numbers dynamically.
sd* at scsibus? target ? lun ? # SCSI disks
st* at scsibus? target ? lun ? # SCSI tapes
cd* at scsibus? target ? lun ? # SCSI CD-ROMs
ch* at scsibus? target ? lun ? # SCSI changer devices
# Floppy controller and drive found on SPARCstations.
#fdc0 at mainbus0 # sun4c controller
#fdc0 at obio0 # sun4m controller
#fd* at fdc0 # the drive itself
pseudo-device loop # loopback interface; required
pseudo-device pty 32 # pseudo-ttys (for network, etc.)
pseudo-device sl 2 # SLIP interfaces
pseudo-device ppp 2 # PPP interfaces
pseudo-device tun 4 # Network "tunnel" device
pseudo-device bpfilter 16 # Berkeley Packet Filter
pseudo-device vnd 4 # disk-like interface to files
pseudo-device ccd 4 # concatenated and striped disks
#pseudo-device strip 1 # radio clock
#pseudo-device ipfilter # ip filter
# rnd is EXPERIMENTAL
pseudo-device rnd # /dev/random and in-kernel generator
>Description:
Machine drops into DDB ... haven't noticed a panic message. The calls
at the top of the 'trace' command output are:
ncr5380_data_xfer
ncr5380_machine
ncr5380_sched
ncr5380_scsi_cmd
So far, the machine has come back without major disk lossage. I
vaguely suspect that the loading of screensavers is hosing the system.
The system runs a small news feed, but /var and /var/news are both on the
xd drives (/var/news is a ccd across two of them). /var/spool/uucp
is written from NFS. Swapping is done to all four xd drives. I don't
currently have a dump partition large enough for the 80 Megs RAM I have.
So... the only thing running from SCSI disks should be the
screensavers (/usr/X11R6 is on a ccd covering two SCSI drives).
>How-To-Repeat:
I don't know how much of the problem lies in the changes to the
SCSI driver. It seems very different from the last time I looked at it.
One thing that cropped up in 1.2 was a cache flushing problem that
basically screwed my disks. This smells similar since my install for
1.3 is acting strange for any portion of it that is residing on SCSI
disk.
The machine is available for testing. I even have an x86 and/or
a Sun3 that I can hook the console to (such that someone testing could
do useful debugging).
>Fix:
Right now, I'm compiling a kernel (luckily that's on an xd
drive) that has flags 0x01 on si0 (DMA without completion
interrupts). This would suck in the long term. It is my
perception (may not be accurate) that the SCSI disks seem
25-50% slower (one-quarter to one-half the speed) than under
1.2.
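For reference, the si0 flag values mentioned here can be decoded from
the three bits documented in the config comments for the "si" driver
(0x01, 0x02, 0x04). The following is only a minimal sketch in Python:
the constant names and the decode_si_flags() helper are made up for
illustration and are not taken from the driver source.

```python
# Flag bits as documented in the kernel config comments for the "si" driver.
# The constant names below are hypothetical; they are not the driver's own.
SI_FLAG_DMA = 0x01        # Use DMA (may be polled)
SI_FLAG_DMA_INTR = 0x02   # Use DMA completion interrupts
SI_FLAG_RESELECT = 0x04   # Allow disconnect/reselect

FLAG_NAMES = [
    (SI_FLAG_DMA, "DMA"),
    (SI_FLAG_DMA_INTR, "DMA completion interrupts"),
    (SI_FLAG_RESELECT, "disconnect/reselect"),
]


def decode_si_flags(flags):
    """Return the names of the features enabled by a config 'flags' value."""
    return [name for bit, name in FLAG_NAMES if flags & bit]


# Compare the failing setting (0x03), the workaround (0x01), and the
# "everything on" example from the config comments (0x07).
for value in (0x03, 0x01, 0x07):
    enabled = ", ".join(decode_si_flags(value)) or "none"
    print(f"flags 0x{value:02x}: {enabled}")
```

Read this way, going from 0x03 to 0x01 leaves DMA on but turns off
DMA completion interrupts; disconnect/reselect is off in both cases.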
Dave.
>Audit-Trail:
>Unformatted: