port-hp300: Re: SCSI

Subject: Re: SCSI
To: None <thorpej@nas.nasa.gov>
From: Elmar Kolkman <kolkmae@la1.apd.dec.com>
List: port-hp300
Date: 01/28/1997 07:59:38
This was mailed to me by Jason Thorpe, but I'll reply to other messages I got
too.

> On Mon, 27 Jan 1997 08:45:43 +0100 (CET) 
>  "Elmar Kolkman" <kolkmae@la1.apd.dec.com> wrote:
> 
>  > I've tried a bit more, and I'm sure it ISN'T the SCSI code. I will attach
>  > the full boot log from SCSI at the end of this file, but I've also tried by
>  > removing ALL SCSI hardware, including the controller, from my system. It
>  > still hangs with the 'old' 1.2 kernel (I didn't have a 1.2b prerelease
>  > kernel) when netbooting from my linux-machine.
> 
> Thanks, your stack trace is _very_ helpful...  From looking at the
> info you've provided, I see where the problem is, and I'm fairly sure
> I know what's causing it...  More below..
> 
>  > > Well, I know the DCM driver works, since I'm using it to ppp to my ISP,
>  > > so I can type this mail :-)
>  > 
>  > But then again, your machine at least starts, which I can't say about mine.
>  > ;-)
> 
> Yes, but it's worth noting, I'm not using the DCM as the console (I'm
> using a Catseye framebuffer).

I thought so. I would love to try that too. But I've only a HIL connector, no
connector for a screen.

> 
>  > OK, but to make the debugging a bit easier, I will copy the whole booting
>  > process, so you (all) see the rest of the process too. Maybe it is some
>  > setting, because I don't have any documentation on this machine...
>  > (I removed some '^H ^H stuff...)
> 
> Cool... (BTW, that self-test is _really_ cool looking, with all the
> serial MUX entries :-)

Hm. And that's just part of it. In real life I have four more of them
connected... ;-)
> 
>  > dca0 at scode 9 ipl 5 flags 0x1: no fifo
>  > dcm0 at scode 10 ipl 3 flags 0
>  > dcm1 at scode 11 ipl 3 flags 0xetrap: bad kernel read access at 0x6e
> 
> ...ok, I'm assuming that the console is on dcm1?  Can you tell
> me _exactly_ which board the console is on?  (I'm assuming it's on
> the port marked "console", since that's the only one the remote bit
> affects :-)

Nope. It's dcm0 I'm using as the console port.
But, after I've made a cable to connect my HP2624a terminal to the HP9000 on
the small RS-232 connector, I'll try it without any dcm's. Or is it possible
to connect it to a PC's RS-232 connector with a straight 9-pin RS-232 cable?
(I must have been tired yesterday not to think of that possibility.) What I
mean is: does this RS-232 connector have a HOST or a SLAVE pinout?
(This information isn't yet on the FAQ-page. What I discovered about my
hardware so far, I'll mail to its maintainer).

> 
> Ok, here's a quick tutorial on using this kind of information...
> 
> Note the address in the "trap" message:
> 
> 	trap: bad kernel read access at 0x6e
> 
> 0x6e is in the first "page" (i.e. it's less than 0x1000).  This page
> is not mapped ... i.e. the pte for this page doesn't have the PG_V bit
> set.  This causes dereferences of NULL pointers to cause the trap
> you're seeing (i.e. it's designed to catch bugs :-).

Sounds obvious.
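
For anyone following along, the page-zero reasoning quoted above can be
sketched in C. This is only an illustration, not kernel code: the names
PAGE_SIZE_ASSUMED, fault_in_page_zero and null_deref_offset are made up
here, and the 4 KB page size is taken from the "less than 0x1000" remark.
The idea is that a fault address inside the unmapped first page is
usually NULL plus the offset of whatever structure member was being read.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KB pages, as implied by "less than 0x1000". */
#define PAGE_SIZE_ASSUMED 0x1000u

/* Returns 1 if the faulting address lies in the (unmapped) first page. */
static int fault_in_page_zero(uint32_t fault_addr)
{
    return fault_addr < PAGE_SIZE_ASSUMED;
}

/*
 * For a page-zero fault caused by a NULL base pointer, the fault address
 * is simply the offset that was added to NULL (e.g. a member offset).
 * Only meaningful when the fault really is in page zero.
 */
static uint32_t null_deref_offset(uint32_t fault_addr)
{
    assert(fault_in_page_zero(fault_addr));
    return fault_addr - 0u; /* base pointer is NULL, so offset == address */
}
```

With v = 0x6e from the trap message, fault_in_page_zero() is true, which
suggests (but does not prove) that a member at offset 0x6e was read
through a NULL pointer.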

> So, what that has told you is that you attempted to deref NULL.  This

*I* didn't do anything. It goes wrong, even if I am at least 5 meters away
from the console... ;-<

> is the kernel equivalent of getting a SIGSEGV (and, like catching SIGSEGV
> in a user program, it's fatal).
> 
>  > trap type 8, code = 0x402074d, v = 0x6e
>  > kernel program counter = 0xa6c0a
>  > kernel: MMU fault trap
>  > panic: MMU fault
>  > Stopped at      _Debugger+0x6:  unlk    a6
>  > db> trace
>  > _Debugger(200ac,a0ccf,144cdc,2304,144d0c) + 6
>  > _panic(a0ccf,1,1,eb4c4,3) + 34
>  > _trap(8,402074d,6e) + 21a
>  > _addrerr(?)
>  > _dcmxint(eb4c4,1,12,0,0) + 10c
> 
> Ok...this is the part of the stack trace that tells you where the
> problem occurred.  Basically, you were in the function dcmxint()
> when an address error occurred; the CPU jumps to that function
> when an invalid address is used.
> 
> Ok, so, if you look at the dcmxint() function (sys/arch/hp300/dev/dcm.c,
> line 898), it's pretty clear what's happening...

I would love to, but I would need the sources. I could, of course, set up a
cross compile environment on my Linux-box (I saw some mailing on that too,
this week).
> 
> You're getting an interrupt, and you're dereferencing "tp", which
> is NULL... it's NULL because the port hasn't yet been opened, which
> means the tty structure hasn't been allocated yet.
> 
> "Oops!"  :-)

Well, it's kind of usual for me to find this kind of bug. Don't know if
it's me or my hardware... Happened with my PC too.
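
To make the failure mode concrete, here is a minimal sketch of the kind
of guard that avoids dereferencing a tty pointer before the port has been
opened. It is NOT the real dcm.c code and not necessarily the fix Jason
has in mind: struct fake_tty, struct fake_port, FAKE_TS_BUSY and
fake_xint() are hypothetical stand-ins invented for this example.

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_TS_BUSY 0x01   /* hypothetical "output in progress" flag */

/* Hypothetical stand-ins; not the real NetBSD structures. */
struct fake_tty {
    int t_state;
};

struct fake_port {
    struct fake_tty *tp;    /* NULL until the port is first opened */
    int spurious;           /* count of interrupts seen while unopened */
};

/*
 * Transmit-interrupt sketch.
 * Returns 1 if the interrupt was handled, 0 if it was ignored because
 * no tty structure has been allocated for the port yet.
 */
static int fake_xint(struct fake_port *port)
{
    assert(port != NULL);

    if (port->tp == NULL) {
        /* Port never opened: touching tp here would fault in page zero. */
        port->spurious++;
        return 0;
    }

    /* Normal path: transmitter is idle again. */
    port->tp->t_state &= ~FAKE_TS_BUSY;
    return 1;
}
```

Calling fake_xint() on a port whose tp is still NULL just bumps the
spurious counter instead of trapping; once tp points at a tty, the busy
flag is cleared as usual.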
> 
> So, I have a question for you... "Do you have XON/XOFF flow control
> enabled on the terminal you're using?"

Yes. Both on the HP2624a and in minicom I used XON/XOFF.
> 
> If you do, please try disabling it, and tell me if that helps.  In the
> mean time, I'll look for the nicest way to fix that bug...

I'll try this as soon as I'm home.

> I may need to send you a kernel or two to netboot, for testing, as well.
> 
>  > _dcmpint(eb4c4,1,1) + 2c
>  > _dcmintr(eb4c4,219df,2004,a20000,144e84) + de
>  > _isrdispatch(6c) + 7a
>  > _intrhand(?)
>  > _dcmselftest(eb52e,c8ca8,eb52e,28) + a
> 
> ...hmm, and given that this is in the trace... I decided to look and see
> what it does, and I've found a couple of slight bugs in it... *sigh*

This shouldn't be happening if I don't have any dcm's?

>  > _dcmattach(c8c7c,e8d88,9263e,de45c,de46c) + 82
>  > _find_device(de45c) + 15e
>  > _configure(c,ff801000,fffffffc,13a000,ffeffffc) + 92
>  > _cpu_startup(c992c,c,ff801000,fffffffc,13a000) + 2f2
>  > _main() + 4a
>  > _main() + 4a
>  > db> 
>  > ----- End of minicom.cap ----

Some more information: It also happens when I have only one dcm, but then
it occurs in the SCSI drivers. That was the original reason to post it on
this thread.

And: I have only 1 terminal connected: the console. The rest of the ports are
currently empty.


						Elmar
-- 
Alp =	1) One of a number of ski mountains in Europe
	2) A shouted request for assistance made by a European skier in
	   America. An appropriate reply is "What's Zermatter ?".
				Henry Beard & Roy McKie 

This mail was brought to you by:

		Elmar Kolkman.

He can be reached as 'kolkmae@apd.dec.com' or 'elmar@usn.nl'