Port-vax archive


Re: I/O bus reset to fix CMD MSCP controllers (and probably others)



I agree with almost all the things you said.

Do we know which model of VAX and which model of CMD controller this is?

It does seem we should expect a transition to step 1 within 100ms after a reset, and could do a second reset much sooner (than after 10s).

I don't agree that skipping the retry in udaattach() is correct. Just because udamatch() was successful doesn't mean the reset can't fail for some weird reason, at which point it should be retried at least once.

You can also check (in 2.11BSD) /usr/src/sys/pdpuba/ra.c, where you have the initialization for normal operation in the OS (as opposed to the boot code you just looked at). But you'll see that the 2.11BSD ra driver does the initialization fully interrupt driven.
(The RSX driver is also fully interrupt driven, by the way.)

And I can tell you that I'm running with a CMD controller in my 11/93 and booting 2.11BSD just fine, which means it goes through both of those initializations without problems.

  Johnny

On 2025-03-31 18:17, Hans Rosenfeld wrote:
On Fri, Mar 28, 2025 at 05:27:48PM +0100, Johnny Billquist wrote:
Here is the actual patch:

*** usr/src/sys/conf/boot/raboot.s.old  Mon Aug 17 21:41:34 2009
--- usr/src/sys/conf/boot/raboot.s      Mon Aug 17 22:44:12 2009
***************
*** 1,5 ****
--- 1,9 ----
   /*
    *    SCCS id @(#)raboot.s    2.0 (2.11BSD)   4/13/91
+  *
+  * Code corrected as per the other primitive mscp drivers
+  * to handles other mscp controllers than DECs.
+  * /bqt - 20090817
    */
   #include "localopts.h"

***************
*** 59,65 ****

   MSCPSIZE =    64.     / One MSCP command packet is 64bytes long (need 2)

! RASEMAP       =       140000  / RA controller owner semaphore

   RAERR =               100000  / error bit
   RASTEP1 =     04000   / step1 has started
--- 63,69 ----

   MSCPSIZE =    64.     / One MSCP command packet is 64bytes long (need 2)

! RASEMAP       =       100000  / RA controller owner semaphore

   RAERR =               100000  / error bit
   RASTEP1 =     04000   / step1 has started
***************
*** 153,170 ****
         mov     $RASEMAP,*$ra+RARSPH    / set mscp semaphores
         mov     $RASEMAP,*$ra+RACMDH
         mov     *_bootcsr,r0            / tap controllers shoulder
!       mov     $ra+RACMDI,r0
   1:
         tst     (r0)
!       beq     1b                      / Wait till command read
!       clr     (r0)+                   / Tell controller we saw it, ok.
   2:
         tst     (r0)
!       beq     2b                      / Wait till response written
         clr     (r0)                    / Tell controller we got it
         rts     pc

! icons:        RAERR
         ra+RARING
         0
         RAGO
--- 157,176 ----
         mov     $RASEMAP,*$ra+RARSPH    / set mscp semaphores
         mov     $RASEMAP,*$ra+RACMDH
         mov     *_bootcsr,r0            / tap controllers shoulder
!       mov     $ra+RACMDH,r0
   1:
         tst     (r0)
!       bmi     1b                      / Wait till command read
!       mov     $ra+RARSPH,r0
   2:
         tst     (r0)
!       bmi     2b                      / Wait till response written
!       mov     $ra+RACMDI,r0
!       clr     (r0)+                   / Tell controller we saw it, ok.
         clr     (r0)                    / Tell controller we got it
         rts     pc

! icons:        RAERR + 033
         ra+RARING
         0
         RAGO

So just out of curiosity, I took a look at the whole 2.11BSD rauboot.s
as I wanted to know what it is doing and what wisdom may be gleaned from
this patch. Not much, it seems, as it apparently fixes a different
problem.

But the initialization bits look similar:

RAERR =         100000  / error bit
RASTEP1 =       04000   / step1 has started
RAGO =          01      / start operation, after init
...
RARING =        8.      / Ring base
...
/
/ RA initialize controller
/
         mov     $RASTEP1,r0
         mov     raip,r1
         clr     (r1)+                   / go through controller init seq.
         mov     $icons,r2
1:
         bit     r0,(r1)
         beq     1b
         mov     (r2)+,(r1)
         asl     r0
         bpl     1b
         ...

icons:  RAERR + 033
         ra+RARING
         0
         RAGO

So it writes 0 into IP just once, and loops until the step 1 bit is set
in SA. Once there, it writes the values beginning at icons, each
corresponding to an initialization value for SA for each step, and waits
for each step bit by shifting RASTEP1.
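For readers who don't speak PDP-11 assembly, the loop can be sketched in C roughly as follows. This is only an illustration: the register hooks in struct ra_port and the toy controller model (fake_ctrl, fake_run) are hypothetical names made up for this sketch, not anything in 2.11BSD, and the model skips the self-tests a real controller runs between steps.

```c
#include <assert.h>
#include <stdint.h>

#define RAERR   0100000u  /* bit 15 */
#define RASTEP1 0004000u  /* step 1 bit in SA */
#define RAGO    0000001u  /* start operation, after init */

/* Hypothetical register hooks; real boot code touches the CSRs directly. */
struct ra_port {
    uint16_t (*read_sa)(void *ctx);
    void     (*write_sa)(void *ctx, uint16_t value);
    void     (*write_ip)(void *ctx, uint16_t value);
    void      *ctx;
};

/* Same sequence as the assembly: one write to IP, four wait/write rounds. */
static void ra_init(const struct ra_port *p, const uint16_t icons[4])
{
    uint16_t step = RASTEP1;

    assert(p != 0 && icons != 0);
    p->write_ip(p->ctx, 0);                 /* clr (r1)+ */
    for (int i = 0; i < 4; i++) {           /* the assembly stops once asl reaches bit 15 */
        while ((p->read_sa(p->ctx) & step) == 0)
            ;                               /* bit r0,(r1); beq 1b (no timeout) */
        p->write_sa(p->ctx, icons[i]);      /* mov (r2)+,(r1) */
        step = (uint16_t)(step << 1);       /* asl r0 */
    }
}

/* Toy controller: raises the next step bit as soon as SA is written. */
struct fake_ctrl {
    int      step;      /* 0 = idle, 1..4 = current step, 5 = done */
    uint16_t seen[4];   /* words the host wrote in steps 1..4 */
};

static uint16_t fake_read_sa(void *ctx)
{
    struct fake_ctrl *f = ctx;
    if (f->step < 1 || f->step > 4)
        return 0;
    return (uint16_t)(RASTEP1 << (f->step - 1));
}

static void fake_write_sa(void *ctx, uint16_t value)
{
    struct fake_ctrl *f = ctx;
    if (f->step >= 1 && f->step <= 4) {
        f->seen[f->step - 1] = value;
        f->step++;
    }
}

static void fake_write_ip(void *ctx, uint16_t value)
{
    struct fake_ctrl *f = ctx;
    (void)value;        /* any value written to IP starts initialization */
    f->step = 1;
}

/* Runs ra_init() against the model and returns what the model recorded. */
static struct fake_ctrl fake_run(const uint16_t icons[4])
{
    struct fake_ctrl f = { 0, { 0, 0, 0, 0 } };
    struct ra_port p = { fake_read_sa, fake_write_sa, fake_write_ip, &f };

    ra_init(&p, icons);
    return f;
}
```

Calling fake_run() with the four icons words (using any made-up address for ra+RARING) should leave the model with those four words recorded in order. Note that, like the assembly, ra_init() spins forever if a step bit never shows up.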

Step 1: RAERR + 033 (100033)
         Bit 15 needs to be 1, and RAERR does that, but it has nothing to
         do with an error here. 033 corresponds to interrupt vector 154,
         which is the default vector for the first MSCP controller. But
         IE is 0, so it shouldn't matter. Ring length is 0 for both
         commands and responses, corresponding to 2**0 == 1 entry each.

Step 2: ra + RARING
         ra is the base of the communications area, but the controller
         actually expects to be given the base of the response and
         command descriptor rings, which are at +8 in the comm area.
         That's the low 16 bits of the full Unibus or Qbus address.

Step 3: 0
         That's the high bits of the full Unibus or Qbus address of the
         comm area.

Step 4: RAGO
         Set DMA burst = 0 (1 longword), request no "last fail" message,
         and kick the controller into action.

So, 9 instructions of code plus 4 words of data to get the thing going.
Nice.
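To double-check the Step 1 arithmetic above, here is a small C sketch that composes the word from its fields. The field positions (ring length exponents in bits 13-11 and 10-8, IE in bit 7, vector/4 in bits 6-0) are my reading of the port description and of how drivers build this word, so treat them as assumptions; step1_word() is a made-up helper name.

```c
#include <assert.h>
#include <stdint.h>

#define STEP1_BIT15 0100000u  /* must be 1; same bit as RAERR */
#define STEP1_IE    0000200u  /* bit 7, interrupt enable */

/*
 * Compose the Step 1 word written to SA.
 * cmd_log2 / rsp_log2 are ring lengths as powers of two (0 means 1 entry),
 * vector is the interrupt vector address, which is stored divided by 4.
 */
static uint16_t step1_word(unsigned cmd_log2, unsigned rsp_log2,
                           int ie, unsigned vector)
{
    assert(cmd_log2 <= 7 && rsp_log2 <= 7);
    assert(vector % 4 == 0 && vector / 4 <= 0177);

    return (uint16_t)(STEP1_BIT15 |
                      (cmd_log2 << 11) |
                      (rsp_log2 << 8) |
                      (ie ? STEP1_IE : 0u) |
                      (vector / 4));
}
```

With ring exponents 0, IE off, and vector 0154 this gives 0100033, i.e. RAERR + 033; turning IE on would give 0100233.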


Anyway, I've re-read most of the UDA50 programming manual this
morning and I'd like to share a few things from Section 9.2:
(https://bitsavers.org/pdf/dec/disc/uda50/AA-L621A-TK_UnibusPortDescription_1982.pdf)

   In the event of an initialization error, the port driver must retry
   the sequence at least once. It is suggested, however, that a second
   failure be considered as meaning that the port/controller is "down".

That's where the requirement for (at least) one retry comes from. We do
that only in udamatch(), assuming it won't ever be needed in udaattach().
I don't think that's necessarily a bad assumption, given that udamatch()
must have succeeded talking to the controller for us to ever reach
udaattach().
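Whichever function ends up doing it, the "retry at least once, treat a second failure as down" rule is easy to express as a wrapper. A minimal sketch in C, with hypothetical names (init_with_retry, flaky_init) that are not part of the uda(4) driver:

```c
#include <assert.h>

/* One full pass through the init sequence; returns 0 on success. */
typedef int (*init_fn)(void *ctx);

/*
 * Run the init sequence, retrying up to max_retries times on failure.
 * Returns the number of retries that were needed, or -1 if every
 * attempt failed (controller considered "down").
 */
static int init_with_retry(init_fn try_init, void *ctx, int max_retries)
{
    assert(try_init != 0 && max_retries >= 0);

    for (int attempt = 0; attempt <= max_retries; attempt++) {
        if (try_init(ctx) == 0)
            return attempt;
    }
    return -1;
}

/* Stand-in for a controller that fails a fixed number of times. */
struct flaky {
    int failures_left;
    int calls;
};

static int flaky_init(void *ctx)
{
    struct flaky *f = ctx;

    f->calls++;
    if (f->failures_left > 0) {
        f->failures_left--;
        return -1;
    }
    return 0;
}
```

With max_retries = 1, a controller that fails once comes up on the second attempt, and one that fails twice is reported as down after exactly two attempts.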

   The host begins the initialization sequence either by issuing a bus
   INIT or by writing any value to the IP register. The port must
   guarantee that the host will read zeroes in SA on the next bus cycle.
   Initialization then sequences through Steps 1-4 as described on the
   following pages.

So we're kinda expected to read SA=0 once before we get to Step 1.

   From the host's viewpoint, Step n is deemed to have begun when reading
   SA shows the transition Sn 0-->1. Of course, Step n ends when Step
   n+1 begins as just defined. This transition from Step n to Step n+1
   may be accompanied by an interrupt, depending on whether interrupts
   are enabled.

Obviously the transition to Step 1 cannot cause an interrupt, but then
we're not using interrupts anyway despite enabling them.

   Steps 1-3 each are required to complete within 10 seconds. If any of
   these steps fails to complete within that period, this is to be
   treated as a host-detected fatal error.

This is where the 10s timeout in mscp_waitstep() comes from.

   During initialization, the host must wait 100 microseconds after any
   interrupt before reading the SA register to see if there was an error.
   This is because the port may use the SA register to deliver the vector
   address to the processor interrupt sequence. If it does, then time
   will be required by the port to set SA to the value to be read by the
   host initialization code.

We're probably good on that as mscp_waitstep() waits 10ms. Except for
the first read of SA, which is done with no delay. That's probably worth
fixing, just in case.

   This pattern should appear within 100 microseconds after the
   hard-initialize.

This is about the Step 1 bit in SA appearing following a write to IP.
We're currently waiting the whole 10s if it doesn't appear, which
shouldn't do any harm but seems unnecessary. Also, this is where the CMD
controller is failing to react.
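A polling helper along the following lines would cover both points: it delays before every read of SA, including the first, and takes the timeout as a parameter so the wait for Step 1 can be much shorter than 10s. This is a sketch under assumptions, not the real mscp_waitstep(): the hooks in struct sa_poll, the fake clock, and the 10ms poll interval used by fake_wait() are all made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define RAERR   0100000u  /* error bit in SA */
#define RASTEP1 0004000u  /* step 1 bit in SA */

enum wait_result { WAIT_OK, WAIT_ERROR, WAIT_TIMEOUT };

/* Hypothetical hooks: how to read SA and how to delay. */
struct sa_poll {
    uint16_t (*read_sa)(void *ctx);
    void     (*delay_us)(void *ctx, unsigned long us);
    void      *ctx;
};

/* Poll SA for step_bit, giving up after timeout_us. */
static enum wait_result wait_step(const struct sa_poll *p, uint16_t step_bit,
                                  unsigned long poll_us,
                                  unsigned long timeout_us)
{
    unsigned long waited = 0;

    assert(poll_us > 0);
    for (;;) {
        p->delay_us(p->ctx, poll_us);   /* delay first: no undelayed read */
        waited += poll_us;

        uint16_t sa = p->read_sa(p->ctx);
        if (sa & RAERR)
            return WAIT_ERROR;
        if (sa & step_bit)
            return WAIT_OK;
        if (waited >= timeout_us)
            return WAIT_TIMEOUT;
    }
}

/* Fake SA register that shows `value` once the fake clock reaches ready_at_us. */
struct fake_sa {
    unsigned long now_us;
    unsigned long ready_at_us;
    uint16_t      value;
};

static uint16_t fake_read(void *ctx)
{
    struct fake_sa *f = ctx;
    return f->now_us >= f->ready_at_us ? f->value : 0;
}

static void fake_delay(void *ctx, unsigned long us)
{
    struct fake_sa *f = ctx;
    f->now_us += us;    /* advance the fake clock instead of sleeping */
}

/* Wait for Step 1 on the fake register, polling every 10ms. */
static enum wait_result fake_wait(uint16_t value, unsigned long ready_at_us,
                                  unsigned long timeout_us,
                                  unsigned long *elapsed_us)
{
    struct fake_sa f = { 0, ready_at_us, value };
    struct sa_poll p = { fake_read, fake_delay, &f };
    enum wait_result r = wait_step(&p, RASTEP1, 10000UL, timeout_us);

    *elapsed_us = f.now_us;
    return r;
}
```

With a short timeout for Step 1, a controller that never reacts is detected after the chosen interval rather than after 10s, leaving time for a second reset.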

   Upon receipt of the above data the port/controller begins running its
   integrity check diagnostics. When finished, the port conditionally
   interrupts the host as described above. If enabled, the interrupt
   will take place whether the diagnostics succeeded or failed.

   Step 1 must complete within 10 seconds after the host writes to the SA
   register. The completion will result in an interrupt if IE was set to
   one in Step 1.

This is what we expect to have happened towards the end of udamatch()
before we return 1, or as the comment says: "should have interrupted by
now". Since we waited for SA to indicate transition to Step 2, we can be
sure that the interrupt has happened by now.


So, I'm not sure this helps much with our problems with uda(4) on CMD
controllers, but I found it interesting nonetheless. The system with the
CMD controller will be offline and unreachable until Friday, so I won't
be able to conduct any more experiments until then.


Hans



--
Johnny Billquist                  || "I'm on a bus
                                  ||  on a psychedelic trip
email: bqt%softjar.se@localhost             ||  Reading murder books
pdp is alive!                     ||  tryin' to stay hip" - B. Idol


