Subject: Re: kern/33445: Fixes for Promise SATA (pdcsata) PCI driver
To: None <buhrow@lothlorien.nfbcal.org>
From: Timo Schoeler <timo.schoeler@riscworks.net>
List: netbsd-bugs
Date: 05/17/2006 16:28:15
thus Brian Buhrow spake:
>> Number: 33445
>> Category: kern
>> Synopsis: These patches fix a number of problems with the pdcsata
>> driver for NetBSD-3.0 and -current.
>> Confidential: no
>> Severity: critical
>> Priority: high
>> Responsible: kern-bug-people
>> State: open
>> Class: sw-bug
>> Submitter-Id: net
>> Arrival-Date: Mon May 08 23:30:00 +0000 2006
>> Originator: Brian Buhrow
>> Release: NetBSD 3.0_STABLE and -current
>> Organization:
> Vianet Communications
>> Environment:
> System: NetBSD fserv1.via.net 3.0_STABLE NetBSD 3.0_STABLE
> (NFBNETBSD) #0: Tue Jan 31 14:45:08 PST 2006
> buhrow@lothlorien.nfbcal.org:/usr/src/sys/arch/i386/compile/NFBNETBSD
> i386
> Architecture: i386
> Machine: i386
>> Description:
> The driver for the family of Promise SATA controllers,
> /usr/src/sys/dev/pci/pdcsata.c, is not very robust when it comes to
> handling transient drive errors, or interrupt hiccups when the card
> is under load. Worse, my experience seems to indicate, and the Linux
> driver confirms, that these cards tend to fall over rather frequently
> during high load operations or if drives unexpectedly reset or go to
> sleep. Symptoms include interrupt timeouts during heavy load, the
> inability to reset drives if they go to sleep, and a failure of the
> card to generate interrupts at all if the interrupt load gets too
> high.
>
>> How-To-Repeat:
> To check whether you're encountering the problems these patches fix
> before you patch, try the following steps:
>
> 1. Install a card supported by the pdcsata driver, either one of the
> 203xx cards, or the 205xx cards. The Promise PDC40718 and PDC40719
> cards are also supported by the pdcsata driver.
>
> 2. After the card is configured and you have a running disk on it,
> run the command:
>
>     atactl /dev/wd3d sleep
>
> This assumes the drive attached to your pdcsata-driven card is wd3;
> change the drive number to match the drive actually attached to your
> pdcsata card. Now, making the same assumption, run:
>
>     disklabel -r wd3
>
> If you have the broken version of the driver, you won't be able to
> revive the drive without a reboot.
>> Fix:
> These patches solve all the bugs listed above, as well as simplify
> the driver. I have tested these patches on production systems
> running at high volume, and they work well. I have been working with
> abs@netbsd.org, who also has one of these cards, and they help him as
> well, although there are still some minor issues to work out with his
> setup. These patches apply cleanly against 3.0 sources as of April
> 21, 2006, but I believe they'll apply equally cleanly to current 3.0
> sources, as well as -current sources. I would like to see these fixes
> get into 3.0, as well as the 4.0 branch.
>
> -thanks -Brian
>
>
> Index: pdcsata.c
> ===================================================================
> RCS file: /cvsroot/src/sys/dev/pci/pdcsata.c,v
> retrieving revision 1.3.2.2
> diff -u -r1.3.2.2 pdcsata.c
> --- pdcsata.c	5 Feb 2006 17:13:57 -0000	1.3.2.2
> +++ pdcsata.c	5 May 2006 17:07:57 -0000
> @@ -48,17 +48,19 @@
>
>  #define PDC203xx_BAR_IDEREGS 0x1c /* BAR where the IDE registers are mapped */
>
> +#define PDC_CHANNELBASE(ch) 0x200 + ((ch) * 0x80)
> +#define PDC_ERRMASK 0x00780700
> +
>  static void pdcsata_chip_map(struct pciide_softc *, struct pci_attach_args *);
>  static void pdc203xx_setup_channel(struct ata_channel *);
> -static int  pdc203xx_pci_intr(void *);
>  static void pdc203xx_irqack(struct ata_channel *);
>  static int  pdc203xx_dma_init(void *, int, int, void *, size_t, int);
>  static void pdc203xx_dma_start(void *,int ,int);
>  static int  pdc203xx_dma_finish(void *, int, int, int);
> +static int  pdcsata_pci_intr(void *);
> +static void pdcsata_do_reset(struct ata_channel *, int);
>
>  /* PDC205xx, PDC405xx and PDC407xx. but tested only pdc40718 */
> -static int  pdc205xx_pci_intr(void *);
> -static void pdc205xx_do_reset(struct ata_channel *, int);
>  static void pdc205xx_drv_probe(struct ata_channel *);
>
>  static int  pdcsata_match(struct device *, struct cfdata *, void *);
> @@ -183,30 +185,8 @@
>  		return;
>  	}
>  	intrstr = pci_intr_string(pa->pa_pc, intrhandle);
> -
> -	switch (sc->sc_pp->ide_product) {
> -	case PCI_PRODUCT_PROMISE_PDC20318:
> -	case PCI_PRODUCT_PROMISE_PDC20319:
> -	case PCI_PRODUCT_PROMISE_PDC20371:
> -	case PCI_PRODUCT_PROMISE_PDC20375:
> -	case PCI_PRODUCT_PROMISE_PDC20376:
> -	case PCI_PRODUCT_PROMISE_PDC20377:
> -	case PCI_PRODUCT_PROMISE_PDC20378:
> -	case PCI_PRODUCT_PROMISE_PDC20379:
> -	default:
> -		sc->sc_pci_ih = pci_intr_establish(pa->pa_pc,
> -		    intrhandle, IPL_BIO, pdc203xx_pci_intr, sc);
> -		break;
> -
> -	case PCI_PRODUCT_PROMISE_PDC40718:
> -	case PCI_PRODUCT_PROMISE_PDC40719:
> -	case PCI_PRODUCT_PROMISE_PDC20571:
> -	case PCI_PRODUCT_PROMISE_PDC20575:
> -	case PCI_PRODUCT_PROMISE_PDC20579:
> -		sc->sc_pci_ih = pci_intr_establish(pa->pa_pc,
> -		    intrhandle, IPL_BIO, pdc205xx_pci_intr, sc);
> -		break;
> -	}
> +	sc->sc_pci_ih = pci_intr_establish(pa->pa_pc,
> +	    intrhandle, IPL_BIO, pdcsata_pci_intr, sc);
>
>  	if (sc->sc_pci_ih == NULL) {
>  		aprint_error("%s: couldn't establish native-PCI interrupt",
> @@ -258,6 +238,8 @@
>  	sc->sc_wdcdev.sc_atac.atac_set_modes = pdc203xx_setup_channel;
>  	sc->sc_wdcdev.sc_atac.atac_channels = sc->wdc_chanarray;
>
> +	sc->sc_wdcdev.reset = pdcsata_do_reset;
> +
>  	switch (sc->sc_pp->ide_product) {
>  	case PCI_PRODUCT_PROMISE_PDC20318:
>  	case PCI_PRODUCT_PROMISE_PDC20319:
> @@ -281,7 +263,6 @@
>  		bus_space_write_4(sc->sc_ba5_st, sc->sc_ba5_sh, 0x60, 0x00ff00ff);
>  		sc->sc_wdcdev.sc_atac.atac_nchannels = PDC40718_NCHANNELS;
>
> -		sc->sc_wdcdev.reset = pdc205xx_do_reset;
>  		sc->sc_wdcdev.sc_atac.atac_probe = pdc205xx_drv_probe;
>
>  		break;
> @@ -290,7 +271,6 @@
>  		bus_space_write_4(sc->sc_ba5_st, sc->sc_ba5_sh, 0x60, 0x00ff00ff);
>  		sc->sc_wdcdev.sc_atac.atac_nchannels = PDC20575_NCHANNELS;
>
> -		sc->sc_wdcdev.reset = pdc205xx_do_reset;
>  		sc->sc_wdcdev.sc_atac.atac_probe = pdc205xx_drv_probe;
>
>  		break;
> @@ -403,53 +383,37 @@
>  }
>
>  static int
> -pdc203xx_pci_intr(void *arg)
> +pdcsata_pci_intr(void *arg)
>  {
>  	struct pciide_softc *sc = arg;
>  	struct pciide_channel *cp;
>  	struct ata_channel *wdc_cp;
>  	int i, rv, crv;
> -	u_int32_t scr;
> -
> -	rv = 0;
> -	scr = bus_space_read_4(sc->sc_ba5_st, sc->sc_ba5_sh, 0x00040);
> -
> -	for (i = 0; i < sc->sc_wdcdev.sc_atac.atac_nchannels; i++) {
> -		cp = &sc->pciide_channels[i];
> -		wdc_cp = &cp->ata_channel;
> -		if (scr & (1 << (i + 1))) {
> -			crv = wdcintr(wdc_cp);
> -			if (crv == 0) {
> -				printf("%s:%d: bogus intr (reg 0x%x)\n",
> -				    sc->sc_wdcdev.sc_atac.atac_dev.dv_xname,
> -				    i, scr);
> -			} else
> -				rv = 1;
> -		}
> -	}
> -	return rv;
> -}
> -
> -static int
> -pdc205xx_pci_intr(void *arg)
> -{
> -	struct pciide_softc *sc = arg;
> -	struct pciide_channel *cp;
> -	struct ata_channel *wdc_cp;
> -	int i, rv, crv;
> -	u_int32_t scr, status;
> +	u_int32_t scr, status, chanbase;
>
>  	rv = 0;
>  	scr = bus_space_read_4(sc->sc_ba5_st, sc->sc_ba5_sh, 0x40);
> +	if (scr == 0xffffffff) return(1);
>  	bus_space_write_4(sc->sc_ba5_st, sc->sc_ba5_sh, 0x40, scr & 0x0000ffff);
> -
> -	status = bus_space_read_4(sc->sc_ba5_st, sc->sc_ba5_sh, 0x60);
> -	bus_space_write_4(sc->sc_ba5_st, sc->sc_ba5_sh, 0x60, status & 0x000000ff);
> +	scr = scr & 0x0000ffff;
> +	if (!scr) return(1);
>
>  	for (i = 0; i < sc->sc_wdcdev.sc_atac.atac_nchannels; i++) {
>  		cp = &sc->pciide_channels[i];
>  		wdc_cp = &cp->ata_channel;
>  		if (scr & (1 << (i + 1))) {
> +			chanbase = PDC_CHANNELBASE(i) + 0x48;
> +			status = bus_space_read_4(sc->sc_ba5_st, sc->sc_ba5_sh, chanbase);
> +			if (status & PDC_ERRMASK) {
> +				chanbase = PDC_CHANNELBASE(i) + 0x60;
> +				status = bus_space_read_4(sc->sc_ba5_st, sc->sc_ba5_sh, chanbase);
> +				status |= 0x800;
> +				bus_space_write_4(sc->sc_ba5_st, sc->sc_ba5_sh, chanbase, status);
> +				status &= ~0x800;
> +				bus_space_write_4(sc->sc_ba5_st, sc->sc_ba5_sh, chanbase, status);
> +				status = bus_space_read_4(sc->sc_ba5_st, sc->sc_ba5_sh, chanbase);
> +				continue;
> +			}
>  			crv = wdcintr(wdc_cp);
>  			if (crv == 0) {
>  				printf("%s:%d: bogus intr (reg 0x%x)\n",
> @@ -541,24 +505,29 @@
>
>
>  static void
> -pdc205xx_do_reset(struct ata_channel *chp, int poll)
> +pdcsata_do_reset(struct ata_channel *chp, int poll)
>  {
>  	struct pciide_softc *sc = CHAN_TO_PCIIDE(chp);
> -	u_int32_t scontrol;
> -
> -	wdc_do_reset(chp, poll);
> +	int reset, status, i, chanbase;
>
>  	/* reset SATA */
> -	scontrol = SControl_DET_INIT | SControl_SPD_ANY | SControl_IPM_NONE;
> -	SCONTROL_WRITE(sc, chp->ch_channel, scontrol);
> -	delay(50*1000);
> -
> -	scontrol &= ~SControl_DET_INIT;
> -	SCONTROL_WRITE(sc, chp->ch_channel, scontrol);
> -	delay(50*1000);
> -}
> +	reset = (1 << 11);
> +	chanbase = PDC_CHANNELBASE(chp->ch_channel) + 0x60;
> +	for (i = 0; i < 11;i ++) {
> +		status = bus_space_read_4(sc->sc_ba5_st, sc->sc_ba5_sh, chanbase);
> +		if (status & reset) break;
> +		delay(100);
> +		status |= reset;
> +		bus_space_write_4(sc->sc_ba5_st, sc->sc_ba5_sh, chanbase, status);
> +	}
> +	status = bus_space_read_4(sc->sc_ba5_st, sc->sc_ba5_sh, chanbase);
> +	status &= ~reset;
> +	bus_space_write_4(sc->sc_ba5_st, sc->sc_ba5_sh, chanbase, status);
> +	status = bus_space_read_4(sc->sc_ba5_st, sc->sc_ba5_sh, chanbase);
>
> +	wdc_do_reset(chp, poll);
>
> +}
>
>  static void
>  pdc205xx_drv_probe(struct ata_channel *chp)
>
>> Unformatted:
hi,
i have massive trouble with fxp* since adding the patch to a netbsd-3
machine (built on may 10th, see uname):
test: {11} ping -f 192.168.100.2
(takes about four or five seconds to start! should start immediately)
PING packetvermuckler.ts39-bln.riscworks.net (192.168.100.2): 56 data bytes
...............................................................................................^C...........
----packetvermuckler.ts39-bln.riscworks.net PING Statistics----
824 packets transmitted, 600 packets received, 27.2% packet loss
round-trip min/avg/max/stddev = 0.243/1.640/80.343/8.880 ms
314.8 packets/sec sent, 268.3 packets/sec received
test: {12} ifconfig fxp0
fxp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=6<TCP4CSUM,UDP4CSUM>
enabled=0
address: 00:02:b3:8e:29:83
media: Ethernet autoselect (none flowcontrol,rxpause,txpause)
status: no carrier
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
inet 192.168.100.123 netmask 0xffffff00 broadcast 192.168.100.255
inet6 fe80::202:b3ff:fe8e:2983%fxp0 prefixlen 64 scopeid 0x1
test: {13} uname -a
NetBSD test.riscworks.net 3.0_STABLE NetBSD 3.0_STABLE (GENERIC) #0: Wed
May 10 15:22:29 CEST 2006
root@deneb.ts39-bln.riscworks.net:/usr/obj/sys/arch/i386/compile/GENERIC
i386
i can login via ssh, and a ping flood from the LAN gets me this:
localhost:~ tis$ sudo ping -f 192.168.100.123
PING 192.168.100.123 (192.168.100.123): 56 data bytes
...................................................................^C
--- 192.168.100.123 ping statistics ---
2319158 packets transmitted, 2319091 packets received, 0% packet loss
round-trip min/avg/max = 0.153/0.435/192.297 ms
which looks much better.
summary: the machine says 'no carrier', but pings external hosts. it
allows login via ssh (reliable, i work on four shells right now).
the problem appears with both MP and uniprocessor kernels (the machine
is MP).
NetBSD 3.0-RELEASE runs very well on the same machine; i'll try a kernel
without above patch soon.
dmesg following:
NetBSD 3.0_STABLE (GENERIC) #0: Wed May 10 15:22:29 CEST 2006
root@deneb.ts39-bln.riscworks.net:/usr/obj/sys/arch/i386/compile/GENERIC
total memory = 1279 MB
avail memory = 1240 MB
BIOS32 rev. 0 found at 0xfd8b0
mainbus0 (root)
cpu0 at mainbus0: (uniprocessor)
cpu0: Intel Pentium III (686-class), 864.02 MHz, id 0x683
cpu0: features 383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features 383fbff<PGE,MCA,CMOV,PAT,PSE36,MMX>
cpu0: features 383fbff<FXSR,SSE>
cpu0: I-cache 16 KB 32B/line 4-way, D-cache 16 KB 32B/line 4-way
cpu0: L2 cache 256 KB 32B/line 8-way
cpu0: ITLB 32 4 KB entries 4-way, 2 4 MB entries fully associative
cpu0: DTLB 64 4 KB entries 4-way, 8 4 MB entries 4-way
cpu0: 8 page colors
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: ServerWorks CNB20-HE PCI bridge (rev. 0x22)
ppb0 at pci0 dev 0 function 1: ServerWorks CNB20-HE PCI/AGP bridge (rev.
0x01)
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
vga1 at pci1 dev 0 function 0: Matrox MGA G400 AGP (rev. 0x85)
wsdisplay0 at vga1 kbdmux 1: console (80x25, vt100 emulation)
wsmux1: connecting to wsdisplay0
pchb1 at pci0 dev 0 function 2
pchb1: ServerWorks CNB30-LE PCI bridge (rev. 0x00)
pchb2 at pci0 dev 0 function 3
pchb2: ServerWorks CNB30-LE PCI bridge (rev. 0x00)
pci2 at pchb2 bus 2
pci2: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
ahc1 at pci2 dev 1 function 0: Adaptec 3960D Ultra160 SCSI adapter
ahc1: interrupting at irq 11
ahc1: aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
scsibus0 at ahc1: 16 targets, 8 luns per target
ahc2 at pci2 dev 1 function 1: Adaptec 3960D Ultra160 SCSI adapter
ahc2: interrupting at irq 11
ahc2: aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
scsibus1 at ahc2: 16 targets, 8 luns per target
fxp0 at pci0 dev 1 function 0: i82550 Ethernet, rev 12
fxp0: interrupting at irq 10
fxp0: Ethernet address 00:02:b3:8e:29:83
inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
acardide0 at pci0 dev 2 function 0
acardide0: Acard ATP860-A Ultra66 IDE Controller (rev. 0x01)
acardide0: bus-master DMA support present
acardide0: primary channel wired to native-PCI mode
acardide0: using irq 5 for native-PCI interrupt
atabus0 at acardide0 channel 0
acardide0: secondary channel wired to native-PCI mode
atabus1 at acardide0 channel 1
pdcsata0 at pci0 dev 3 function 0
pdcsata0: Promise PDC40718 SATA300 controller (rev. 0x02)
pdcsata0: interrupting at irq 11
pdcsata0: bus-master DMA support present
atabus2 at pdcsata0 channel 0
atabus3 at pdcsata0 channel 1
atabus4 at pdcsata0 channel 2
atabus5 at pdcsata0 channel 3
fxp1 at pci0 dev 7 function 0: i82559 Ethernet, rev 8
fxp1: interrupting at irq 11
fxp1: May need receiver lock-up workaround
fxp1: Ethernet address 00:10:83:ff:e1:5a
inphy1 at fxp1 phy 1: i82555 10/100 media interface, rev. 4
inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pcib0 at pci0 dev 15 function 0
pcib0: ServerWorks OSB4 southbridge (rev. 0x50)
rccide0 at pci0 dev 15 function 1
rccide0: ServerWorks OSB4 IDE Controller (rev. 0x00)
rccide0: bus-master DMA support present
rccide0: primary channel configured to compatibility mode
rccide0: primary channel interrupting at irq 14
atabus6 at rccide0 channel 0
rccide0: secondary channel configured to compatibility mode
rccide0: secondary channel interrupting at irq 15
atabus7 at rccide0 channel 1
isa0 at pcib0
lpt0 at isa0 port 0x378-0x37b irq 7
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
sysbeep0 at pcppi0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
npx0 at isa0 port 0xf0-0xff: using exception 16
isapnp0: no ISA Plug 'n Play devices found
pdcsata0:1: bogus intr (reg 0x14)
pdcsata0:3: bogus intr (reg 0x14)
Kernelized RAIDframe activated
scsibus0: waiting 2 seconds for devices to settle...
scsibus1: waiting 2 seconds for devices to settle...
wd0 at atabus3 drive 0sd0 at scsibus0 target 0 lun 0: <QUANTUM,
ATLAS10K2-TY184L, DA40> disk fixed
sd0: 17366 MB, 17338 cyl, 5 head, 410 sec, 512 bytes/sect x 35566480 sectors
sd0: sync (12.50ns offset 127), 16-bit (160.000MB/s) transfers, tagged
queueing
pdcsata0:1:0: lost interrupt
type: ata tc_bcount: 512 tc_skip: 0
: <WDC WD2500YD-01NVB1>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 233 GB, 486344 cyl, 16 head, 63 sec, 512 bytes/sect x 490234752 sectors
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd0(pdcsata0:1:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using
DMA)
wd1 at atabus5 drive 0pdcsata0:3:0: lost interrupt
type: ata tc_bcount: 512 tc_skip: 0
: <WDC WD2500YD-01NVB1>
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 233 GB, 486344 cyl, 16 head, 63 sec, 512 bytes/sect x 490234752 sectors
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd1(pdcsata0:3:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using
DMA)
atapibus0 at atabus6: 2 targets
cd0 at atapibus0 drive 0: <MATSHITADVD-ROM SR-8585, , 1W21> cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
cd0(rccide0:0:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 2
(Ultra/33) (using DMA)
raid0: RAID Level 1
raid0: Components: component0[**FAILED**] /dev/wd0a
raid0: Total Sectors: 490234624 (239372 MB)
boot device: raid0
root on raid0a dumps on raid0b
root file system type: ffs
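
for reference, the two `bogus intr (reg 0x14)` lines in the dmesg above can be decoded with the channel bit test the patch's interrupt handler uses (`scr & (1 << (i + 1))`). the following is only a minimal sketch in C, not driver code: the helper names `pdc_flagged_channels` and `pdc_chan_reg` are made up for illustration, and the macro is a parenthesised variant of the patch's `PDC_CHANNELBASE`. the patch does not say what the registers at +0x48 and +0x60 mean, so the sketch only computes their offsets.

```c
#include <stdint.h>

/*
 * Parenthesised variant of PDC_CHANNELBASE from the patch (assumption:
 * same values, only the outer parentheses are added so the macro is
 * safe inside larger expressions).
 */
#define PDC_CHANNELBASE(ch)	(0x200 + ((ch) * 0x80))

/*
 * Hypothetical helper: returns a bitmask of channel indices whose
 * bit (i + 1) is set in scr, mirroring the test in the handler.
 */
static uint32_t
pdc_flagged_channels(uint32_t scr, int nchannels)
{
	uint32_t mask = 0;
	int i;

	for (i = 0; i < nchannels; i++) {
		if (scr & (1u << (i + 1)))
			mask |= 1u << i;
	}
	return mask;
}

/*
 * Hypothetical helper: offset of a per-channel register, where off is
 * e.g. 0x48 or 0x60 as used in the patch.
 */
static uint32_t
pdc_chan_reg(int ch, uint32_t off)
{
	return PDC_CHANNELBASE(ch) + off;
}
```

with scr = 0x14 and four channels, `pdc_flagged_channels()` returns 0xa, i.e. channels 1 and 3, which matches the pdcsata0:1 and pdcsata0:3 messages. for those channels, `pdc_chan_reg(1, 0x48)` is 0x2c8 and `pdc_chan_reg(3, 0x60)` is 0x3e0.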
--
Timo Schoeler | http://riscworks.net/~tis | timo.schoeler@riscworks.net
RISCworks -- Perfection is a powerful message
ISP | POWER & PowerPC afficinados | Networking, Security, BSD services
GPG Key fingerprint = B5F6 68A4 EC45 C309 6770 38C4 50E8 2740 9E0C F20A
There are 10 types of people in the world. Those who understand binary
and those who don't.