Port-vax archive

Re: I/O bus reset to fix CMD MSCP controllers (and probably others)



On Mon, Mar 31, 2025 at 12:33:51PM +0200, Johnny Billquist wrote:
> > But none of that happens in udamatch() on a CMD controller because we
> > don't even get to step 1. Even if we fix early return and add a retry,
> > that's usually not enough to get it to step 1.
> 
> Oh. So we just write to IP, and nothing happens? That would be seriously
> broken.

Yes, exactly that.

> > > Hmm, and by the way, when we say "step 1", are we talking about the
> > > controller even indicating that it is at step 1, or the response after we
> > > write the data in step 1?
> > 
> > I'm talking about SA indicating it's at step 1, following the write to IP:
> > 
> > 	tries = 0;
> > again:
> > 
> > 	bus_space_write_2(mi.mi_iot, mi.mi_iph, 0, 0); /* Start init */
> > 	if (mscp_waitstep(&mi, MP_STEP1, MP_STEP1) == 0)
> > 		return 0; /* Nothing here... */
> 
> And error detection isn't even triggering? So it just sits and waits
> until timeout?

Yes. Sometimes errors happen after 9 seconds, sometimes not at all.

> And one or two additional writes eventually get it to a reset state?

Eventually, yes. For testing I added code that repeatedly checks SA in
1s intervals, and nothing happens for >10s, sometimes >60s.

> > > I think mscp_waitstep really should be modified to detect error states.
> > 
> > Sure, I have a patch attached to do that. It just isn't enough to get
> > the CMD controller going after boot. But it shouldn't hurt either, so
> > I'll probably just go and commit that, and the fix to udamatch() to try
> > again if we don't get to step 1 as well.
> 
> Yeah. That fix should go in there. It might also be a good idea to somehow
> report if we get an error, I think.

Yes, I also think it would be worth revisiting mscp_waitstep() to return
more useful information, and changing all the callers to use that.

But there are so many different MSCP controllers out there that testing
any major change is a bit of a nightmare. The MSCP support in NetBSD is
working fine as far as I know and I don't want to risk breaking it.

But right now, none of that really makes a difference. The comment above
mscp_waitstep() mentions that Dilog controllers can't handle fast
back-to-back register accesses, but doesn't specify which model. I've
never heard anything about DEC controllers being problematic (U/KDA50,
KFQSA), or Emulex, or even the Viking. There are also several iterations
of CMD controllers (200, 220, 220A, 4xx?), and all I know is that the
220 has these issues. I took mine out of a uVAX 3500 because 4.3BSD
wouldn't boot off it (VMS did fine) and ultimately put it in an 11/73,
knowing that it's working on 2.11BSD...

And now we're having these issues on NetBSD, too. I really wonder
whether NetBSD might have done an I/O bus reset in the past that went
missing during some refactoring or rototilling.


> > The real initialization of the controller happens in udaattach(),
> > including the real interrupt setup and all that. But for the CMD
> > controller, we don't even get there because udamatch() can't detect it.
> 
> Really strange. Are you booting from that controller? So it's currently
> operational when we get to udamatch() ?

Yes. That's been the point all along. The same kernel detects it fine
when that kernel was booted from the network, and in that case the
network adapter isn't on the QBus either. The same binary does not
detect the CMD controller when booting off a disk attached to it.

That being said, what I'm really after isn't a change to uda.c or
mscp_subr.c, because the code used by udamatch() hasn't changed
significantly since the days of 3.1.1 when it was working fine.

What I really want to know is whether there is any good reason not to
reset the I/O bus again after the kernel has been loaded but before the
devices on the bus are enumerated, because doing so makes the CMD
controller work just fine without any additional delays.

Which peripherals do take several seconds to recover from a bus reset?

What's ubainit() good for if it's not being used?


Hans



-- 
%SYSTEM-F-ANARCHISM, The operating system has been overthrown

