tech-kern archive
Re: SunFire v100 / Acer M5229 IDE DMA error workaround
On Wed, Oct 29, 2008 at 12:05:39PM -0400, Rafal Boni wrote:
> Folks:
> I've been taunted by the IDE interface on my SunFire V100 for a long
> (LOOONG!) time with messages along the lines of:
>
> wdNN: DMA error writing fsbn xxxx of xxxx-yyy (wdN bn pppp; cn ccc tn
> tt sn ss), retrying
> wdN: soft error (corrected)
>
> This box is running RAIDFrame over 2 110GB IDE drives, one on each
> channel. While the errors have not caused any data loss, they do
> eventually cause the IDE subsystem to downgrade to slower-and-slower
> DMA modes and eventually even to PIO access to the disks. See [1]
> and the messages in that thread for my prior attempts at getting
> rid of these errors.
>
> I've since tried a bunch more stuff, and found that none of the
> aceride-specific changes made any real difference. It looks like
> for whatever reason, the chip asserts interrupts before the DMA
> is complete, or the PCI IDE code at least believed that was the
> case. So last night, after looking at the FreeBSD and OpenBSD
> IDE code, I came up with the following set of changes, which
> so far has not had any negative consequences on the system and
> has survived a complete RAID parity rebuild (this was the one
> case where I *always* got the DMA errors) without spewing a
> single IDE DMA-related error. It also makes the box feel a
> bit snappier, but maybe I'm just imagining that ;)
>
> I realize this change is probably done in the wrong place -- I
> should probably have created an aceride-specific dma_finish method
> and done the checks there, but this is at least a proof-of-concept
> that the change works; the change also includes some more debug
> logging in the case of DMA errors, which isn't necessary to fix
> the issue but helped me diagnose it, so I've left it in for now.
>
> Finally, I know Manuel mentioned that doing something along these
> lines would likely have an impact on ATAPI DMA operations, and I
> have not tested it with anything beyond ATA disk -- however, I'm
> not sure that ATAPI DMA ever worked on my V100 -- I think it always
> falls back to PIO, at least with the CDROM in the system.
>
> Patch below... I'd love comments / feedback, esp. on ATAPI use cases,
> --rafal
>
> ---8<------8<------8<------8<------8<------8<------8<------8<------8<---
> Index: pci/pciide_common.c
> ===================================================================
> RCS file: /cvsroot/src/sys/dev/pci/pciide_common.c,v
> retrieving revision 1.38
> diff -u -p -r1.38 pciide_common.c
> --- pci/pciide_common.c 18 Mar 2008 20:46:37 -0000 1.38
> +++ pci/pciide_common.c 29 Oct 2008 15:30:21 -0000
> @@ -737,7 +738,9 @@ pciide_dma_finish(v, channel, drive, for
> ATADEBUG_PRINT(("pciide_dma_finish: status 0x%x\n", status),
> DEBUG_XFERS);
>
> - if (force == WDC_DMAEND_END && (status & IDEDMA_CTL_INTR) == 0)
> + /* XXXrkb: From FreeBSD; should probably add an evcnt here */
> + if (force == WDC_DMAEND_END &&
> + ((status & (IDEDMA_CTL_INTR | IDEDMA_CTL_ACT)) != IDEDMA_CTL_INTR))
> return WDC_DMAST_NOIRQ;
I have a hunch that this is not necessary. After you introduce the new
bus_space_write_1() call, below, does the status ever show both
IDEDMA_CTL_INTR and IDEDMA_CTL_ACT set at once?
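
To make the question concrete, here is a small sketch comparing the
original and the patched test. The bit values are assumptions that follow
the conventional bus-master IDE status layout (bit 0 active, bit 1 error,
bit 2 interrupt); noirq_old(), noirq_new() and checks_differ() are
hypothetical helper names, not functions from pciide_common.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values, following the usual bus-master IDE status layout. */
#define IDEDMA_CTL_ACT  0x01u	/* DMA engine still active */
#define IDEDMA_CTL_ERR  0x02u	/* DMA error */
#define IDEDMA_CTL_INTR 0x04u	/* interrupt pending */

/* Original test: report "no IRQ" only when INTR is clear. */
static bool
noirq_old(uint8_t status)
{
	return (status & IDEDMA_CTL_INTR) == 0;
}

/* Patched test: report "no IRQ" unless INTR is set and ACT is clear. */
static bool
noirq_new(uint8_t status)
{
	return (status & (IDEDMA_CTL_INTR | IDEDMA_CTL_ACT)) !=
	    IDEDMA_CTL_INTR;
}

/* True when the two tests disagree about a given status value. */
static bool
checks_differ(uint8_t status)
{
	return noirq_old(status) != noirq_new(status);
}
```

Under these assumptions the two tests disagree only when INTR and ACT are
both set, so the extra check matters only if that combination is actually
observed on the hardware.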
> /* stop DMA channel */
> @@ -752,6 +755,9 @@ pciide_dma_finish(v, channel, drive, for
> BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE);
> bus_dmamap_unload(sc->sc_dmat, dma_maps->dmamap_xfer);
>
> + /* Clear status bits */
> + bus_space_write_1(sc->sc_dma_iot, cp->dma_iohs[IDEDMA_CTL], 0, status);
> +
I may be missing something, but by my reading of a PCI IDE controller
spec that I scrounged off the web, it is important to acknowledge the
interrupt in this way. It seems to me that the code should already
acknowledge the interrupt by calling pciide_irqack(). Is that not so?
Note that this write may not be flushed to the device, and the
interrupt deasserted, until a second call to pciide_dma_finish() calls
bus_space_read_1(, cp->dma_iohs[IDEDMA_CTL], ). In other words, you
may take two interrupts per DMA completed.
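
The concern can be illustrated with a toy model. This is only a sketch
under stated assumptions: struct fake_dma_chan and the fake_*() helpers
are hypothetical and stand in for a device behind a bridge that holds a
write until the next read; they are not NetBSD bus_space functions. The
bit values are assumed as in the conventional bus-master IDE status
register, where the interrupt and error bits are cleared by writing 1 to
them. Whether a real bridge posts this particular write is not something
the model establishes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values, following the usual bus-master IDE status layout. */
#define IDEDMA_CTL_ACT  0x01u
#define IDEDMA_CTL_ERR  0x02u
#define IDEDMA_CTL_INTR 0x04u

/* Hypothetical model: a status register behind a write-posting bridge. */
struct fake_dma_chan {
	uint8_t status;		/* device-side status register */
	uint8_t posted_val;	/* write still held in the bridge */
	bool write_pending;	/* true while the write is undelivered */
};

/* Deliver a held write; INTR and ERR are write-1-to-clear. */
static void
deliver_posted(struct fake_dma_chan *c)
{
	if (c->write_pending) {
		c->status &= (uint8_t)~(c->posted_val &
		    (IDEDMA_CTL_INTR | IDEDMA_CTL_ERR));
		c->write_pending = false;
	}
}

/* A write is only queued; the device does not see it yet. */
static void
fake_write_status(struct fake_dma_chan *c, uint8_t v)
{
	c->posted_val = v;
	c->write_pending = true;
}

/* A read forces any held write out to the device first. */
static uint8_t
fake_read_status(struct fake_dma_chan *c)
{
	deliver_posted(c);
	return c->status;
}

/* The interrupt line follows the device-side INTR bit. */
static bool
irq_asserted(const struct fake_dma_chan *c)
{
	return (c->status & IDEDMA_CTL_INTR) != 0;
}
```

In this model, calling fake_write_status() with IDEDMA_CTL_INTR leaves
irq_asserted() true until a later fake_read_status() flushes the write,
which is the window in which a second interrupt could be taken.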
Dave
--
David Young OJC Technologies
dyoung%ojctech.com@localhost Urbana, IL * (217) 278-3933 ext 24