port-mac68k: Re: mp3 on m68k (was headless)

Subject: Re: mp3 on m68k (was headless)
To: John Klos <john@sixgirls.org>
From: Michael R.Zucca <mrz5149@acm.org>
List: port-mac68k
Date: 03/02/2003 18:45:00
On Sunday, March 2, 2003, at 05:11  PM, Riccardo Mottola wrote:

> on 3/2/03 8:12 PM, John Klos at john@sixgirls.org wrote:
> The problem would also remain SCSI access, which on the 840 is still
> dog-slow and CPU-consuming (no DMA)

I'm working on that as we speak. I've gotten DMA to work, but I'm not 
getting the big improvements in performance that I was expecting. Here 
are some bonnie scores from my latest test kernel (1.5-pre 1.6 -current 
vintage):

./bonnie -s 10 (This makes a 10 megabyte test file)
              -------Sequential Output-------- ---Sequential Input-- --Random--
              -Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Kernel     MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU  /sec %CPU
PIO         0   533 98.9  2217 98.7  1449 98.3   474 99.3  3315 100.1 211.4 97.6
DMATEST     0   533 98.5  1737 69.7  1394 98.2   475 99.8  3469  99.9 212.9 99.4

So, on block output (writes) we're using less CPU (69.7% vs. 98.7%, 
good) but we're getting less throughput (1737 vs. 2217 K/sec, bad). On 
block input (reads) we're using marginally less CPU and getting only 
marginally better throughput (3469 vs. 3315 K/sec). (Though, it seems 
to feel "subjectively faster" when I run it with DMA :-) )

My theory at the moment is that the throughput isn't better because the 
drives are not doing synchronous transfers. When the drive probe is 
going on at boot, the generic NCR SCSI code is using unaligned ECB 
buffers to do the sync negotiation. Thus, the first few bytes of the 
transfer are done with PIO, and this seems to cause the sync 
negotiation to fail. If I were able to use DMA for that transaction, I 
think the sync negotiation would pass and we'd be seeing much, much 
better bonnie scores. I've got a few ideas on how to do this, but I'm 
still working on it. Right now I'm trying to use a DMA bounce buffer to 
pick up short, unaligned transfers, but I'm having some problems.

The biggest problem is that the SCSI DMA channel seems to want things 
16-byte aligned and it wants transfer lengths to be a multiple of 16 
bytes. This makes sense since the SCSI chip FIFO is 16 bytes deep and 
the AV documentation lists the SCSI DMA channel buffer as 16 bytes, 
but it's still a pain. :-)
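
To make the bounce-buffer idea concrete, here is a minimal userland 
sketch in C, not the actual test-kernel code. The names (bounce_buf, 
needs_bounce, bounce_stage_out, bounce_finish_in) and the 256-byte 
buffer size are made up for illustration. It also assumes that padding 
a short transfer up to the next multiple of 16 is acceptable to the 
hardware, which is not established here and may well be one of the 
things that goes wrong.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DMA_ALIGN   16u   /* alignment/length granularity described above */
#define BOUNCE_SIZE 256u  /* assumed upper bound for a "short" transfer */

/* Hypothetical bounce buffer, forced onto a 16-byte boundary (C11). */
static _Alignas(16) unsigned char bounce_buf[BOUNCE_SIZE];

/* Round a length up to the next multiple of 16. */
static size_t
round_up16(size_t n)
{
    return (n + DMA_ALIGN - 1) & ~(size_t)(DMA_ALIGN - 1);
}

/* A transfer needs bouncing if its address or length is not 16-byte clean. */
static int
needs_bounce(const void *p, size_t len)
{
    return ((uintptr_t)p % DMA_ALIGN) != 0 || (len % DMA_ALIGN) != 0;
}

/*
 * Stage outgoing data in the bounce buffer. Returns the padded length
 * that would be handed to the DMA engine, or 0 if the transfer is empty
 * or too large for the buffer.
 */
static size_t
bounce_stage_out(const void *src, size_t len)
{
    size_t padded = round_up16(len);

    if (len == 0 || padded > BOUNCE_SIZE)
        return 0;
    memset(bounce_buf, 0, padded);   /* zero the padding bytes */
    memcpy(bounce_buf, src, len);
    return padded;
}

/* After an inbound DMA, copy back only the bytes the caller asked for. */
static void
bounce_finish_in(void *dst, size_t len)
{
    assert(len <= BOUNCE_SIZE);
    memcpy(dst, bounce_buf, len);
}
```

In this sketch, a 5-byte message at an arbitrary address would be 
staged as a single 16-byte aligned transfer, and only the original 5 
bytes are copied back out afterwards.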

As for CPU usage, we have to take an interrupt for each non-physically 
contiguous block we transmit. I suspect we're not getting more than a 
couple of contiguous blocks in a row, so we're still taking a number of 
expensive SCSI interrupts. Right now, each segment requires a new SCSI 
transaction. Ideally, I'd like to change the code so that the SCSI 
transaction is set up once, and the DMA engine interrupts at the end of 
a contiguous block, at which point the next block is loaded from the 
dmamap for the transfer. I believe that the DMA engine is capable of 
this; I just haven't gotten around to finding the interrupt and setting 
up the infrastructure to do this. If it were done this way there would 
still be interrupts for each block but they would be much less costly. 
Though, I suspect interrupts on the 68k are so costly that I may only 
see a few percentage points of improvement in CPU usage.
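
To illustrate why physical contiguity matters here, the following is a 
small sketch in C of merging a list of physical page addresses into 
contiguous segments. It is only an illustration, not the kernel's 
dmamap code: the struct seg type, the coalesce_pages function and the 
4096-byte page size are assumptions. With one SCSI transaction (and 
its interrupt) per segment, the segment count is what drives the cost.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u  /* assumed page size, for illustration only */

/* Hypothetical segment descriptor: one physically contiguous run. */
struct seg {
    uintptr_t addr;  /* physical start address */
    size_t    len;   /* length in bytes */
};

/*
 * Merge a list of physical page addresses into contiguous segments.
 * A page extends the previous segment only if it starts exactly where
 * that segment ends. Returns the number of segments written.
 */
static size_t
coalesce_pages(const uintptr_t *pages, size_t npages,
               struct seg *segs, size_t max_segs)
{
    size_t nsegs = 0;

    for (size_t i = 0; i < npages; i++) {
        if (nsegs > 0 &&
            segs[nsegs - 1].addr + segs[nsegs - 1].len == pages[i]) {
            segs[nsegs - 1].len += PAGE_SZ;   /* extend the current run */
        } else {
            assert(nsegs < max_segs);         /* caller sized the array */
            segs[nsegs].addr = pages[i];      /* start a new run */
            segs[nsegs].len = PAGE_SZ;
            nsegs++;
        }
    }
    return nsegs;
}
```

For example, the six pages 0x10000, 0x11000, 0x20000, 0x21000, 0x22000 
and 0x40000 collapse into three segments, so a six-page transfer would 
still cost three transactions under the current scheme.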

I've been contemplating releasing the above code so other folks can 
play with it too. If folks are interested enough, I can apply a little 
polish to what I've got and put it up for download. Though, be 
forewarned, the code is still a bit primitive. I think we could be 
doing a lot more with the DMA engine than what I'm doing currently.

Right now, the code uses PIO to 16-byte align transfer addresses and 
lengths. All other transactions DMA directly to/from memory. For many 
transactions that are page aligned, like paging to/from disk, the 
entire transaction is done with DMA!
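
The PIO/DMA split described above can be sketched roughly as follows. 
This is a simplified userland illustration in C, not the code from the 
test kernel; struct xfer_split and split_transfer are hypothetical 
names, and the only hardware fact assumed is the 16-byte alignment and 
length requirement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_ALIGN 16u  /* addresses and lengths must be multiples of 16 */

/* Hypothetical description of how one transfer is carved up. */
struct xfer_split {
    size_t pio_head;  /* bytes moved by PIO to reach a 16-byte boundary */
    size_t dma_body;  /* aligned middle part, a multiple of 16 bytes */
    size_t pio_tail;  /* leftover bytes after the last full 16-byte chunk */
};

static struct xfer_split
split_transfer(uintptr_t addr, size_t len)
{
    struct xfer_split s = { 0, 0, 0 };
    size_t misalign = (size_t)(addr & (DMA_ALIGN - 1));

    /* Bytes needed to reach the next boundary (0 if already aligned). */
    size_t head = misalign ? DMA_ALIGN - misalign : 0;
    if (head > len)
        head = len;               /* short transfer: everything is PIO */

    size_t rest = len - head;
    s.pio_head = head;
    s.dma_body = rest & ~(size_t)(DMA_ALIGN - 1);  /* round down to 16 */
    s.pio_tail = rest - s.dma_body;

    assert(s.pio_head + s.dma_body + s.pio_tail == len);
    return s;
}
```

A page-aligned 4096-byte transfer comes out as pure DMA (head 0, body 
4096, tail 0), while a 100-byte transfer starting at an address ending 
in 0x3 would be 13 bytes of PIO, 80 bytes of DMA and 7 more bytes of 
PIO.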

Any takers?

-- 
----------------------------------------------
  Michael Zucca - mrz5149@acm.org
----------------------------------------------
  "I'm too old to use Emacs." -- Rod MacDonald
----------------------------------------------