Subject: Re: kern/29936: isp(4) with Qlogic 2312 FC HBA hangs with: "unable to load DMA (35)"
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Greg A. Woods <woods@planix.com>
List: netbsd-bugs
Date: 04/11/2005 00:01:02
The following reply was made to PR kern/29936; it has been noted by GNATS.
From: "Greg A. Woods" <woods@planix.com>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: NetBSD GNATS submissions and followups <gnats-bugs@netbsd.org>,
<kern-bug-people@NetBSD.org>,
NetBSD GNATS Administrator <gnats-admin@NetBSD.org>
Subject: Re: kern/29936: isp(4) with Qlogic 2312 FC HBA hangs with: "unable to load DMA (35)"
Date: Sun, 10 Apr 2005 19:59:57 -0400 (EDT)
[ On Sunday, April 10, 2005 at 22:42:12 (+0200), Manuel Bouyer wrote: ]
> Subject: Re: kern/29936: isp(4) with Qlogic 2312 FC HBA hangs with: "unable to load DMA (35)"
>
> > isp1: unable to load DMA (35)
>
> This is EAGAIN. My guess is that pci_sgmap_pte64_load() is running into
> a resource shortage.
Indeed, but why is it so "fatal" -- a "shortage" is not an "outage" and
I wouldn't have thought it to be a permanent condition....
> > sd6(isp1:0:1:0): adapter resource shortage
>
> the scsipi subsystem will sleep for one second and try again, 5 times.
Which of course won't help if the "shortage" never goes away (in time?).
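The retry behaviour described above can be sketched roughly as follows. This is a toy model only, not the scsipi code: the function names, the callable standing in for the DMA load, and the reading of "5 times" as five retries after the first attempt are all assumptions made for illustration. The value 35 is the EAGAIN from the log message.

```python
EAGAIN_NETBSD = 35   # EAGAIN as reported in the log above; other systems number it differently
MAX_RETRIES = 5      # "5 times" (assumed to mean five retries after the first attempt)
RETRY_DELAY = 1.0    # "sleep for one second"


def submit_with_retry(load_dma, sleep):
    """Call load_dma(); on EAGAIN wait and retry, up to MAX_RETRIES times.

    Returns 0 on success, or the last error code once the retries run out.
    """
    error = load_dma()
    retries = 0
    while error == EAGAIN_NETBSD and retries < MAX_RETRIES:
        sleep(RETRY_DELAY)
        retries += 1
        error = load_dma()
    return error


def make_loader(failures):
    """Hypothetical DMA load that fails `failures` times, then succeeds."""
    state = {"left": failures}

    def load_dma():
        if state["left"] > 0:
            state["left"] -= 1
            return EAGAIN_NETBSD
        return 0

    return load_dma


waits = []  # records each delay instead of really sleeping
# A shortage that clears after two failed attempts: the request goes through.
print(submit_with_retry(make_loader(2), waits.append))
# A shortage that outlasts every retry: EAGAIN is handed back to the caller.
print(submit_with_retry(make_loader(10**6), waits.append))
```

The second call is the case at issue here: if whatever holds the resources never lets go, no number of one-second sleeps will change the outcome.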
> What is strange is that you say other isp devices don't have this problem.
No, so far it hasn't, though I wasn't going to let a sample of 2 decide
that for certain. :-)
> If there is a resource shortage it should be for everyone using this sgmap.
> If I understood it properly, the sgmap is per-tsp bus, which means that
> the resource shortage is only for devices on the pci1 bus.
> I see you have lots of network adapters on pci1; it's possible that their
> drivers allocate DMA resources statically, causing this condition.
> You should try to arrange to have all network devices on one PCI bus,
> and all scsi ones on the second PCI bus.
Well that's a very good clue! Thanks!
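A toy model of that hypothesis may make the reasoning easier to follow. It assumes the scatter/gather map (the sgmap behind pci_sgmap_pte64_load()) is a fixed pool of entries per bus, and that some drivers reserve entries at attach time and never release them. The class, the pool size and the per-NIC figures are all hypothetical; only the device and bus names come from this report.

```python
EAGAIN_NETBSD = 35  # EAGAIN as reported in the log above


class BusSgmap:
    """Toy model of one bus's scatter/gather map: a fixed number of entries."""

    def __init__(self, entries):
        self.free = entries

    def load(self, pages):
        """Reserve `pages` entries; return 0, or EAGAIN if too few are free."""
        if pages > self.free:
            return EAGAIN_NETBSD
        self.free -= pages
        return 0

    def unload(self, pages):
        """Give `pages` entries back to the pool."""
        self.free += pages


# Hypothetical sizes: 1024 entries per bus, each NIC pinning 480 of them.
pci0 = BusSgmap(1024)
pci1 = BusSgmap(1024)

for nic in ("bge0", "wm0"):
    # Static allocation at attach time; unload() is never called for these.
    assert pci1.load(480) == 0

# A 128-page transfer now fails on the crowded bus but not on the other one.
print(pci1.load(128))  # EAGAIN: only 64 entries are left on pci1
print(pci0.load(128))  # 0: pci0 has had nothing pinned
```

In this model the failure is confined to one bus, and because the pinned entries are never returned, waiting and retrying cannot clear it. That would be consistent with a "shortage" that behaves like an outage.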
Indeed the bge0 device on pci1 (along with isp1) is not being used,
partly because it alone can trigger a very similar kind of problem
with DMA resources. Like I say, it's unused; however, I suppose there
could be some situation which might somehow trigger it and cause it to
try to allocate DMA buffers. As far as I know nobody has ifconfig'ed it
before either hang, but it's possible someone or something did something
to activate it. (However, the third crash -- the one where everything
hung completely -- was, perhaps not coincidentally, right after I had done
a "pcictl pci0 list" command to get the product code for the Qlogic
card.)
I had thought I had applied Jason's patches from the "bge(4) (DEGXA-TX)
no-go on the AlphaServer ES40" thread on tech-kern (& port-alpha) to the
1.6.x code too, but it seems I had not, so the 1.6.x version definitely
still causes problems on big-memory machines.
I guess this still all boils down to needing a proper fix for PR# 28362
as well as complete support for 64-bit DMA so that mapping doesn't have
to be done for 64-bit cards on 64-bit systems like this.
In the meantime I will remove the bge driver from the kernel entirely
and hope that it was indeed the underlying cause.
However that still leaves wm0 (and the unused wm1) on pci1 along with
isp1. I'm not very comfortable with moving all the isp and ahc devices
to one bus just to put the network devices alone on the other, but I
suppose if that's what it takes.... I guess I won't know for sure
whether the bge removal fixes it, though, until at least a couple of
weeks go by without further problems along these lines.
(Note: I cannot bring up wm1 concurrently with wm0 with this kernel -- I
encounter a similar DMA resource problem.... I'm not even sure it
worked with a -current kernel. I didn't want a dual-port card, but they
were the same price as the single, and a dual is of more use in other
kinds of machines if I can ever get the bge to work again, and if we
ever get a copper GigE port on the/a switch to connect it with, but in
the meantime, even without the DMA resource issues, the bge driver still
only goes about half the speed of the wm driver. :-)
--
Greg A. Woods
H:+1 416 218-0098 W:+1 416 489-5852 x122 VE3TCP RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com> Secrets of the Weird <woods@weird.com>