Port-vax archive
Re: NetBSD/vax - worth continuing?
It has been a month since I last updated this thread. Since I at least have some
observations to offer now, I figured I should post them.
On 2016-09-21 09:59, Johnny Billquist wrote:
On 2016-09-21 08:26, Anders Magnusson wrote:
Den 2016-09-20 kl. 23:55, skrev Johnny Billquist:
On 2016-09-20 21:24, Anders Magnusson wrote:
Den 2016-09-20 kl. 19:12, skrev Johnny Billquist:
Hi, Ragge...
Tried to work that out many times, but never got far. You want console
access to a hung or crashed system? :-)
If you cannot get into DDB then something really evil has happened. Is
this the case?
No. Seems I must have really mis-stated this. The system hangs as in
the OS stalls. The hardware is working fine, and I can break into DDB
as well. If I were to make a guess, it appears that all processes that
do disk I/O stall.
Other things continue running. But as most things touch disk sooner or
later, pretty much everything draws to a standstill.
You are using MSCP, eh? I would guess that it loses MSCP
buffers somewhere then.
Yes, MSCP.
Hmm... Losing buffers. That's an interesting idea I hadn't considered.
Could be.
After lots of experimentation and playing around, I think the problem is
not related to losing buffers. I'll try to explain my observations, but
this requires a bit of describing my setup as well, so bear with me.
The machine is a real VAX 8650, with 60 Megs of memory, eight RA73
disk drives, and one Ethernet interface.
The disk drives are connected to two UDA-50s, and the Ethernet is a DELUA.
The machine has two Unibuses. The first Unibus has one UDA-50 and the DELUA.
The second Unibus only has one UDA-50. Disks are numbered ra0 to ra7, with
ra0-ra3 on UDA-50 #0, and ra4-ra7 on UDA-50 #1.
/, swap, /var and /home are all on ra0.
/usr is now on ra1
/usr/src is on ra2
/usr/src/external/gpl3 is on ra5
Earlier I had a ccd disk, which consisted of ra4-ra7, and this held all of /usr.
In between I also tried having /usr/src on ra4.
In short, I have had various setups for the disks, but what I have been
changing around is on which controller the different file systems have
been located.
Now, with ccd, the system gets stuck in uvn_fp2 when I'm running cvs. It
does not happen right away, but eventually it always happens.
Having skipped ccd, and just working on disks on the first UDA-50, the
system seems to not have any problems. But when I do disk operations on
the second UDA-50, sooner or later, the process gets stuck in biowait,
and never recovers.
Now, I have tried this on different disks, and with different
controllers, so I think the problem is not there. I have not tried
replacing the Unibus adapter as such.
However, it seems the problem is somehow related to the controller on
the second bus. Either we have some bug in the NetBSD code, or I have
some other problem that I haven't noticed. I've tried exercising the
disks through VMS, and haven't seen any problem through there, but I'm
sure this testing has not been very thorough.
The machine runs both VMS and Ultrix fine, and passes all the diagnostics
I've thrown at it so far. Unfortunately I do not have any diagnostics
for the RA73 drives. The MSCP disk diagnostics I have do not recognize
the RA73 drives (too new), so they do not show up that way. If anyone
has DS diagnostics for MSCP drives newer than around 1990, I would be
interested in getting copies.
But for now, this really smells as if we have some kind of issue with
additional Unibuses in NetBSD. An interesting detail is that, looking at
vmstat -i, I can see that uba0 has generated some interrupts, but uba1
has never generated any interrupts.
Ragge, what interrupts would the Unibus adapter generate, and does it
make sense that only one of the adapters is generating interrupts?
In the end, I have not been able to run the actual tests with cvs that I
intended to, since the machine always hangs sooner or later, while
working on the disk. And since /usr/src is so big, I need at least two
ra73 to hold it. I could allocate another ra73 on the first UDA-50, and
see if I can get through all the work then, but since I have some data
on the other disks, this is a bit messy.
And really, seems like we have a problem that needs solving here.
Next time it happens, please get a process list (ps axl or from ddb) so
we can get further on diagnosing it.
Sure. That is easy.
Done a lot more than that, but it certainly seems like it gets stuck on
disk, but only for disks on the second controller/second Unibus.
In fact, the machine is partially stuck right now and in ddb.
Here is ps from ddb:
db> ps
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
9866 1 3 0 80 812f1d40 telnetd netio
7388 1 3 0 80 80e8b7e0 telnetd netio
7220 1 3 0 1000000 81548aa0 df vcache
6190 1 3 0 80 812f1800 telnetd netio
7785 1 3 0 80 81548560 tcsh pause
6934 1 3 0 80 80be22c0 tcsh pause
6966 1 3 0 80 80be2020 pickup kqueue
3234 1 3 0 0 80be2560 find vcache
921 1 3 0 80 815ed2a0 postdrop netio
3880 1 3 0 80 81836aa0 sendmail pipe_rd
3650 1 3 0 80 812f1aa0 tee pipe_rd
3453 1 3 0 80 81548d40 sh wait
1088 1 3 0 80 81548800 sh wait
1092 1 3 0 80 82dbe000 cron pipe_rd
27046 1 3 0 0 83aa72a0 cvs vcache
25830 1 3 0 80 8393a7e0 tcsh ttyraw
26574 1 3 0 80 81836800 tcsh pause
28141 1 3 0 80 82194d40 login wait
26357 1 3 0 80 80e8ba80 telnetd select
21901 1 3 0 0 815ed7e0 find biowait
16289 1 3 0 80 80e8b000 postdrop netio
22585 1 3 0 80 821942c0 sendmail pipe_rd
21391 1 3 0 80 80be2aa0 tee pipe_rd
23186 1 3 0 80 812f1020 sh wait
21144 1 3 0 80 812f1560 sh wait
19187 1 3 0 80 838ceaa0 cron pipe_rd
12339 1 3 0 0 82194020 find biowait
12025 1 3 0 80 815ed000 postdrop netio
10843 1 3 0 80 82194800 sendmail pipe_rd
10822 1 3 0 80 82dbea80 tee pipe_rd
11165 1 3 0 80 83aa7000 sh wait
10919 1 3 0 80 80be2800 sh wait
12324 1 3 0 80 80e8b2a0 cron pipe_rd
1939 1 3 0 80 83aa77e0 getty ttyraw
1882 1 3 0 80 83b11d40 getty ttyraw
2007 1 3 0 80 82dbe540 cron nanoslp
463 1 3 0 80 812f12c0 inetd kqueue
1759 1 3 0 80 838ce2c0 qmgr kqueue
1747 1 3 0 80 815ed540 master kqueue
1381 1 3 0 80 82dbe2a0 sshd select
1338 1 3 0 80 8393aa80 rwhod select
1012 1 3 0 80 82dbe7e0 ntpd pause
1168 1 3 0 80 82dbed20 rpc.lockd select
862 1 3 0 80 82e7caa0 rpc.statd select
1148 5 3 0 80 82e7c020 slave nfsd
1148 4 3 0 80 82e7c2c0 slave nfsd
1148 3 3 0 80 82e7c560 slave nfsd
1148 2 3 0 80 82e7c800 slave nfsd
1148 1 3 0 80 82e7cd40 master select
349 1 3 0 80 838ce020 mountd select
1068 1 3 0 80 838ce800 rpcbind select
985 1 3 0 80 838ce560 syslogd kqueue
1 1 3 0 80 83b11aa0 init wait
0 39 3 0 200 8393a000 nfsio nfsiod
0 38 3 0 200 8393a2a0 nfsio nfsiod
0 37 3 0 200 8393a540 nfsio nfsiod
0 36 3 0 200 8393ad20 nfsio nfsiod
0 35 3 0 200 83b48000 physiod physiod
0 34 3 0 200 83aa7a80 aiodoned aiodoned
0 33 3 0 200 83aa7d20 ioflush tstile
0 32 3 0 200 83b482a0 pgdaemon pgdaemon
0 29 3 0 200 83b11800 unpgc unpgc
0 28 3 0 200 83b11560 nd6_timer nd6_timer
0 27 3 0 200 83b112c0 rt_timer rt_timer
0 26 3 0 200 83b11020 vmem_rehash vmem_rehash
0 17 3 0 200 83b48540 mscp_wq mscp_wq
0 16 3 0 200 83b487e0 mscp_wq mscp_wq
0 15 3 0 200 83b48a80 pmfsuspend pmfsuspend
0 14 3 0 200 83b48d20 pmfevent pmfevent
0 13 3 0 200 83b6a020 sopendfree sopendfr
0 12 3 0 200 83b6a2c0 nfssilly nfssilly
0 11 3 0 200 83b6a560 cachegc cachegc
0 10 3 0 200 83b6a800 vrele vrele
0 9 3 0 200 83b6aaa0 vdrain vdrain
0 8 3 0 200 83b6ad40 modunload mod_unld
0 7 3 0 200 83b80000 xcall/0 xcall
0 6 1 0 200 83b802a0 softser/0
0 5 1 0 200 83b80540 softclk/0
0 4 1 0 200 83b807e0 softbio/0
0 3 1 0 200 83b80a80 softnet/0
0 > 2 7 0 201 83b80d20 idle/0
0 1 3 0 200 802cac40 swapper uvm
db>
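For anyone who wants to summarise a listing like the one above, here is a minimal sketch in Python that tallies processes by wait channel. It is not part of the original report. It assumes the whitespace-separated column order shown in the header (PID LID S CPU FLAGS STRUCT-LWP NAME WAIT), that process names contain no spaces, and that a lone ">" marks the current LWP. The function name tally_wait_channels is made up for illustration.

```python
from collections import defaultdict


def tally_wait_channels(ddb_ps_text):
    """Group process names by wait channel from pasted ddb 'ps' output.

    Assumes columns: PID LID S CPU FLAGS STRUCT-LWP NAME [WAIT].
    """
    by_wait = defaultdict(list)
    for line in ddb_ps_text.splitlines():
        # Drop the ">" marker that flags the currently running LWP.
        fields = [f for f in line.split() if f != ">"]
        # Skip the header, the "db>" prompt, and blank lines.
        if len(fields) < 7 or not fields[0].isdigit():
            continue
        name = fields[6]
        # Runnable LWPs (e.g. the soft interrupt threads) have no wait channel.
        wait = fields[7] if len(fields) > 7 else "(none)"
        by_wait[wait].append(name)
    return dict(by_wait)


# A few rows copied from the listing above.
sample = """\
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
7220 1 3 0 1000000 81548aa0 df vcache
21901 1 3 0 0 815ed7e0 find biowait
12339 1 3 0 0 82194020 find biowait
27046 1 3 0 0 83aa72a0 cvs vcache
0 33 3 0 200 83aa7d20 ioflush tstile
0 6 1 0 200 83b802a0 softser/0
0 > 2 7 0 201 83b80d20 idle/0
"""

if __name__ == "__main__":
    for wait, names in sorted(tally_wait_channels(sample).items()):
        print(f"{wait:10s} {len(names):3d}  {', '.join(names)}")
```

Run against the full listing, this would put the two find processes under biowait, df, find and cvs under vcache, and ioflush under tstile, which is the group of disk-related waiters described above.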
Fixing BDPs would also help to improve Unibus speed I assume. That
wouldn't be too much work.
One potential issue is that the kernel is spending a damn large amount
of time in the system these days. Performance is really sluggish, while
the same hardware with another OS really performs much better.
I still don't have any final numbers here, but I can say that in Ultrix,
doing a cvs update on usr/src takes about 2h. NetBSD gets stuck (for me)
after maybe 12h, and at that point it has hardly started going through
the files yet...
If I ever get the system to work right, I will be able to provide more
interesting numbers comparing to Ultrix.
Johnny
--
Johnny Billquist || "I'm on a bus
|| on a psychedelic trip
email: bqt%softjar.se@localhost || Reading murder books
pdp is alive! || tryin' to stay hip" - B. Idol