Port-vax archive


Re: I/O bus reset to fix CMD MSCP controllers (and probably others)



I took a quick look at it.

udamatch() only does the initial steps to see if there is a uda at all, then leaves the rest to the mscp bus routines.

The MP_STEP1 parts you quote below are just that: they reset the uda and check whether it answers at all. If that succeeds, STEP1MASK etc. are checked later on to walk through the initialization process.

I do not know where Robert's CMD controller fails, but as some of you have written, there is a logic error in udamatch(): it does not retry if nothing is found the first time.  Hans, have you tried changing udamatch() to attempt the init multiple times?

I should also note that when this code was written around 30 years ago, I did not have access to all the documentation that is available today.
So there are most likely errors in how it is implemented :-)

-- R


On 2025-03-29 at 02:23, Johnny Billquist wrote:
Actually, udamatch() confuses me. I don't understand how it is expected to deal with steps 3 and 4. And we have a proper initialization in mscp/mscp_subr.c in mscp_init(), which also walks through all the initialization steps.

I honestly don't understand the thinking behind that code...

  Johnny

On 2025-03-29 02:18, Johnny Billquist wrote:
Hmm. I haven't read through all the code, but I at least see some problems.

In the initialization, the code looks like this:

         bus_space_write_2(mi.mi_iot, mi.mi_iph, 0, 0); /* Start init */
         if (mscp_waitstep(&mi, MP_STEP1, MP_STEP1) == 0)
                 return 0; /* Nothing here... */


and so on for the next step. The problem is that mscp_waitstep then only checks that the controller moves to the next step, but cannot detect if the controller indicates any error. The first MP_STEP1 really should be ALLSTEPS, and there should be some code to do a reset for a second try in case you see an error condition.

But actually, it would be even more proper to use STEP1MASK and compare against STEP1GOOD, and so on... There are all these nice values defined in mscp/mscpreg.h, but they are not used, and we have this half-broken code instead. I wonder how that happened...?
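
To illustrate the point about the mask, here is a minimal sketch in C. It is not the NetBSD code: wait_step() and the samples array are hypothetical stand-ins for polling SA. SA_ERR and SA_STEP1 match RAERR and RASTEP1 in the raboot.s patch quoted further down; the remaining step bits are assumed to follow the same pattern.

```c
#include <stddef.h>
#include <stdint.h>

/* SA status bits. SA_ERR and SA_STEP1 correspond to RAERR (0100000) and
 * RASTEP1 (04000) in the raboot.s patch; the other step bits are an
 * assumption made for this sketch. */
#define SA_ERR      0x8000u
#define SA_STEP4    0x4000u
#define SA_STEP3    0x2000u
#define SA_STEP2    0x1000u
#define SA_STEP1    0x0800u
#define SA_ALLSTEPS (SA_ERR | SA_STEP4 | SA_STEP3 | SA_STEP2 | SA_STEP1)

enum wait_result { WAIT_OK, WAIT_ERROR, WAIT_TIMEOUT };

/* Hypothetical replacement for a register poll loop: samples[] holds the
 * values that successive reads of SA would return. Unlike a plain
 * "(sa & mask) == want" loop, an error is reported as soon as it is seen
 * instead of being indistinguishable from a timeout. */
static enum wait_result
wait_step(const uint16_t *samples, size_t nsamples, uint16_t mask, uint16_t want)
{
    for (size_t i = 0; i < nsamples; i++) {
        uint16_t sa = samples[i];

        if (sa & SA_ERR)
            return WAIT_ERROR;      /* caller can reset and try again */
        if ((sa & mask) == want)
            return WAIT_OK;
    }
    return WAIT_TIMEOUT;
}
```

With a mask of SA_ALLSTEPS the comparison also rejects values where more than one step bit is set, which a mask of SA_STEP1 alone would accept.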

   Johnny

On 2025-03-28 18:59, Hans Rosenfeld wrote:
On Fri, Mar 28, 2025 at 05:27:48PM +0100, Johnny Billquist wrote:

Here is the actual patch:

*** usr/src/sys/conf/boot/raboot.s.old  Mon Aug 17 21:41:34 2009
--- usr/src/sys/conf/boot/raboot.s      Mon Aug 17 22:44:12 2009
***************
*** 1,5 ****
--- 1,9 ----
   /*
    *    SCCS id @(#)raboot.s    2.0 (2.11BSD)   4/13/91
+  *
+  * Code corrected as per the other primitive mscp drivers
+  * to handles other mscp controllers than DECs.
+  * /bqt - 20090817
    */
   #include "localopts.h"

***************
*** 59,65 ****

   MSCPSIZE =    64.    / One MSCP command packet is 64bytes long (need 2)

! RASEMAP       =       140000  / RA controller owner semaphore

   RAERR =               100000  / error bit
   RASTEP1 =     04000   / step1 has started
--- 63,69 ----

   MSCPSIZE =    64.    / One MSCP command packet is 64bytes long (need 2)

! RASEMAP       =       100000  / RA controller owner semaphore

   RAERR =               100000  / error bit
   RASTEP1 =     04000   / step1 has started
***************
*** 153,170 ****
         mov     $RASEMAP,*$ra+RARSPH    / set mscp semaphores
         mov     $RASEMAP,*$ra+RACMDH
         mov     *_bootcsr,r0            / tap controllers shoulder
!       mov     $ra+RACMDI,r0
   1:
         tst     (r0)
!       beq     1b                      / Wait till command read
!       clr     (r0)+                   / Tell controller we saw it, ok.
   2:
         tst     (r0)
!       beq     2b                      / Wait till response written
         clr     (r0)                    / Tell controller we got it
         rts     pc

! icons:        RAERR
         ra+RARING
         0
         RAGO
--- 157,176 ----
         mov     $RASEMAP,*$ra+RARSPH    / set mscp semaphores
         mov     $RASEMAP,*$ra+RACMDH
         mov     *_bootcsr,r0            / tap controllers shoulder
!       mov     $ra+RACMDH,r0
   1:
         tst     (r0)
!       bmi     1b                      / Wait till command read
!       mov     $ra+RARSPH,r0
   2:
         tst     (r0)
!       bmi     2b                      / Wait till response written
!       mov     $ra+RACMDI,r0
!       clr     (r0)+                   / Tell controller we saw it, ok.
         clr     (r0)                    / Tell controller we got it
         rts     pc

! icons:        RAERR + 033
         ra+RARING
         0
         RAGO



Anyway, not sure if this helps, since now we're in PDP-11 assembler. But
maybe it gives a bit of an idea what the problem was.
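
For readers less used to PDP-11 assembler, the changed wait loops can be sketched in C. This is only an illustration of what the patched "tst (r0) / bmi" loops test, namely bit 15 of the descriptor high words RACMDH and RARSPH. The names controller_owns() and first_released() are hypothetical, and reading bit 14 (the extra bit in the old RASEMAP value 140000) as an interrupt-request flag is an assumption, not something the patch states.

```c
#include <stdbool.h>
#include <stdint.h>

#define DESC_OWN  0100000u  /* new RASEMAP value in the patch: bit 15 only */
#define DESC_FLAG 0040000u  /* extra bit in the old value 0140000
                             * (assumed: interrupt-request flag) */

/* Equivalent of "tst (r0); bmi ...": true while bit 15 is still set,
 * i.e. while the controller has not handed the descriptor back. */
static bool controller_owns(uint16_t desc_hi)
{
    return (desc_hi & DESC_OWN) != 0;
}

/* Hypothetical stand-in for the busy-wait: samples[] holds the values
 * successive reads of the descriptor word would return. Returns the index
 * of the first read in which the controller has released the descriptor,
 * or -1 if it never does. */
static int first_released(const uint16_t *samples, int n)
{
    for (int i = 0; i < n; i++)
        if (!controller_owns(samples[i]))
            return i;
    return -1;
}
```

The old code instead waited for the words at RACMDI to become non-zero, which only works if the controller writes them.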

I've actually looked at that and tried to understand it when I was
looking into this issue. The PDP-11 assembly doesn't scare me, I've
written my fair share of it and I'm still comfortable reading it. Too
bad the patch doesn't show the definition of RACMDI and RARSPH, and I'm
too lazy to google that. Maybe I'll boot the 11/73 later this weekend
and look at the full code.

What I did read was the MSCP programming document for the UDA50 that's
on Bitsavers.

But if someone points me at the specific code in NetBSD, I can try to see if
it's a similar kind of issue.

The problem is in sys/dev/qbus/uda.c, in particular in udamatch(). All
that udamatch() wants to do is to go through the first initialization
steps to cause an interrupt.

The state of the controller when udamatch() is running is that it has
been used already by VMB and boot to get the kernel loaded.

The UDA50 register interface really consists only of two registers, IP
and SA. Writing anything into IP should cause an initialization sequence
to be started, with SA indicating Step1 shortly after. If it doesn't,
udamatch() should try one more time, but currently doesn't. It only
retries the initialization if Step1 was reached and we then fail to
reach Step2.
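
The missing retry can be sketched as follows. This is a toy model in C, not a patch for uda.c: struct fake_uda, write_ip(), reached_step1() and probe_with_retry() are all hypothetical names, and the model simply assumes a controller that ignores a given number of writes to IP before it enters Step1. In a real driver reached_step1() would be a timed poll of SA.

```c
#include <stdbool.h>
#include <stdint.h>

#define SA_STEP1 0x0800u   /* same bit as RASTEP1 (04000) in raboot.s */

/* Hypothetical controller model: it ignores the first `ignored` writes
 * to IP and enters Step1 on the next one. */
struct fake_uda {
    int      ignored;
    uint16_t sa;
};

/* Model of "write anything into IP to start initialization". */
static void write_ip(struct fake_uda *c)
{
    if (c->ignored > 0) {
        c->ignored--;
        c->sa = 0;              /* nothing happens this time */
    } else {
        c->sa = SA_STEP1;       /* controller signals Step1 */
    }
}

/* In a real driver this would poll SA with a timeout. */
static bool reached_step1(const struct fake_uda *c)
{
    return (c->sa & SA_STEP1) != 0;
}

/* Start initialization up to max_tries times. Returns the number of the
 * attempt that reached Step1, or 0 if none did. */
static int probe_with_retry(struct fake_uda *c, int max_tries)
{
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        write_ip(c);
        if (reached_step1(c))
            return attempt;
    }
    return 0;
}
```

With max_tries set to 1 this behaves like the current code for the Step1 case: a controller that misses the first write is reported as absent.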

The first thing I did was check that the CSR was mapped correctly and
that the physical addresses were what was expected. I also read and
wrote the registers directly at the VMB console. It would have been
surprising if anything was wrong there, as the same code works just fine
when the controller hasn't been touched since the last I/O bus reset
because we've booted off the network.

One of the things I did as an experiment was have udamatch() write 0
into SA, and then read it once per second until something happened. Most
of the time, SA would have the error bit set after a few seconds. From
there, writing 0 into IP would kick off a controller initialization and
get SA to indicate Step1 2s later. But as I said, this incurs a boot
delay of around 12s.


Hans






