port-vax: VAX 6460 being slow, IO bottlenecks and SMP woes ...

Subject: VAX 6460 being slow, IO bottlenecks and SMP woes ...
To: None <classiccmp@classiccmp.org, port-vax@netbsd.org>
From: Gunther Schadow <gunther@aurora.regenstrief.org>
List: port-vax
Date: 03/20/2002 17:55:14
It is very contemplative to sit in the basement, room filled with
big iron and lots of blower and disk noise, and watch the system
creep ahead compiling stuff. I remember those old days back in, say,
1994 when I sat there on an i486/33 watching GCC compile (and those
*endless* runs of all the fix-include stuff.) What surprises me is
that the feeling on a 1989 vintage 6-processor million-dollar VAX
isn't very different! I expected it to be faster with that.

So I looked at statistics. Nothing paged out (0.5 GB of RAM),
but it looks like the system is heavily I/O bound and I/O
is quite slow. I even distributed disk load over 3 RA90, though all on
the same controller, but at least that's a KDM70, directly on the XMI
bus. Still, disk access seems just a bit on the slow side. Why?

All terminal I/O went through the Ethernet, which I thought was a
DEBNT but is reported to me as a DEBNA. Anyway, that's just the
few lines of text that are being logged as make creeps forward with
its job.

Looked at cpustat (we're on Ultrix 4.5 BTW), where you can see the
load on the CPUs. Sorry, no "screenshot" here, but in short it lists
6 CPUs and the load on each. 5 of them tend to be 97% idle and CPU#1
is 75% idle. CPU#1 gets bombarded with all interrupt requests while
the others get none of that. Unfortunately I couldn't figure out how
one can see process to CPU allocation. I suppose that each process
runs in a single thread on one CPU every time it is active. Since the
making and the cc-ing is a sequential thing writing temporary files
to disk, I suppose that the 5 idle processors have nothing to do
while CPU#1 takes all the burden of the compiling task plus all
other system interrupts (essentially all I/O.) So, that's kind of
not optimal.
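
As a rough sanity check on those cpustat figures, the idle percentages
can be summed to see how much work the whole machine is doing. This is
only a sketch in plain sh arithmetic; the variable names are made up
for illustration and the inputs are the approximate numbers quoted
above, not a real cpustat log.

```shell
# Approximate idle percentages quoted above: CPU#1 at 75%,
# the other five at about 97% each.
idle_list="75 97 97 97 97 97"

busy_total=0
ncpu=0
for idle in $idle_list; do
    busy_total=$((busy_total + 100 - idle))
    ncpu=$((ncpu + 1))
done

# Integer arithmetic only, so the average is rounded down.
avg_busy=$((busy_total / ncpu))
echo "CPUs: $ncpu, summed busy: ${busy_total}%, average busy: ${avg_busy}%"
```

Under these assumptions all six CPUs together do well under half of
one CPU's worth of work, which fits the picture of a build that waits
on I/O most of the time.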

I suppose, once I have my first GCC built I should use the -pipe
option to avoid temporary files with the hope that the two ends
of each pipe would then be allocated to two different CPUs. That
should then speed up the process a lot, basically could stream cpp
on CPU#1 to cc1 on CPU#2 to as on CPU#3 right through. I hope. That
is, if UNIX domain sockets (i.e. pipes) are implemented so as to not
require any hardware IO. I *hope* this is simply done by CPU#1
entering kernel with an mbuf and CPU#2 entering kernel shortly
thereafter reading that mbuf, so only memory should be involved.
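
The shape of such a pipeline can be tried out without a compiler. The
sketch below uses hypothetical stand-in stages named after cpp, cc1
and as; each one only reports its own process ID and passes its input
through. It shows that every stage is a separate process and that no
temporary file is involved. Whether a given kernel then spreads those
processes over several CPUs is exactly the open question, and this
does not answer it.

```shell
# Stand-in stage (hypothetical): report the stage name and the PID of
# the process running it on stderr, then copy stdin to stdout.
stage() {
    sh -c 'echo "$0 runs as pid $$" >&2; cat' "$1"
}

# Three processes joined by two pipes, as cpp | cc1 | as would be.
result=$(echo "int x;" | stage cpp | stage cc1 | stage as)
echo "came out the far end: $result"
```

The three PIDs printed on stderr differ, and the text arrives
unchanged at the far end of the pipeline.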

It's interesting. Sometime soon, maybe early this summer,
I'll give a VAX party where one of the highlights will be a
race between my i486/33 and the 6460 in compiling something,
there we can see if it's really just my perception that the
6460 is kind of slow for the price, even measured by past
standards.

Another thing that made me wonder is that writing to the TU81+
on KLESI-B showed signs of an I/O bottleneck as well. When I just
did

# tar cvb 20 -f /dev/nrmt0h /usr5/gcc-2.7.2

the tape would write block by block in staccato and would not
stream. The TU81 is still nice even in block mode, not that
endless back and forth of the TK50 or any other cassette media
that I have seen operating, it's a fast staccato tatatatatatata,
you gotta see this!

Only if I used my dd buffering trick with

# tar cvb 20 -f - /usr5/gcc-2.7.2 |dd ibs=2048000 of=/dev/nrmt0h obs=10240

would it stream over larger sections. But the slightest disk
read activity would cause a little pause to the tape transport.
First I thought I should be rearranging my cards on the VAXBI
busses, but then I remembered that the disks are on XMI directly.
So, a simple RA90 read through KDM70 on XMI is just not fast
enough in order to keep the VAXBI - KLESI-B - TU81+ streaming.
Are the RA90 disks so slow? Or maybe it is Ultrix' bad way of
using the multiple CPUs again, i.e., they still handle all the
work through one single CPU#1 while the others are chatting idly?
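
For reference, the sizes in the tar and dd commands above work out as
follows. This is a sketch that assumes tar's usual 512-byte block; the
variable names are only for illustration.

```shell
# Numbers taken from the tar and dd commands above.
tar_blocking=20      # tar "b 20": blocks per tape record
tar_block=512        # bytes per tar block (assumed, the usual value)
ibs=2048000          # dd input block size in bytes
obs=10240            # dd output block size in bytes

record=$((tar_blocking * tar_block))
records_per_read=$((ibs / obs))

echo "tar record: $record bytes"
echo "tape records per full dd input block: $records_per_read"
```

So obs matches the tar record size, and one completely filled dd input
block would hold 200 tape records. That is an upper bound only: a read
from a pipe may return fewer bytes than requested, so dd is not
guaranteed to have that much queued up at any moment.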

Does NetBSD do a better job with SMP? Would it use one CPU for
the disk IO and another CPU for the KLESI-B IO with shared
memory buffers in the middle? Maybe not if I used just one
process to both read from disk and write to tape (like tar
only), but with the pipe, tar | dd it should do it and that
should allow me to use a smaller ibs for dd, AND have real
streaming write to the tape. Or is SMP on NetBSD/VAX still a
sub-optimal hack?

When I attended BSDcon 2002 in San Francisco a few weeks ago,
it seemed like all the BSDs would go different ways about SMP.
I liked what Jason announced about NetBSD, like IO being
handled without memory data copies, and the kernel actually
shrinking in size. I would hope that all BSD/SMP efforts
nowadays seek to allow true load sharing between the CPUs,
rather than scheduling IRQs to only one and hogging that one primary
I/O CPU with *all* processes that have any I/O to do. And
I sure hope that it will be natural for pipelined processes
to operate on different CPUs. Right?

regards,
-Gunther


-- 
Gunther Schadow, M.D., Ph.D.                    gschadow@regenstrief.org
Medical Information Scientist      Regenstrief Institute for Health Care
Adjunct Assistant Professor        Indiana University School of Medicine
tel:1(317)630-7960                         http://aurora.regenstrief.org