Subject: Re: [PATCH] uvm_map.c
To: Jason Thorpe <thorpej@nas.nasa.gov>
From: J. Scott Kasten <jsk@titan.tetracon-eng.net>
List: netbsd-bugs
Date: 03/01/2000 10:21:33
Well, I'll take another look at the code as I'm not yet
very familiar with the NetBSD kernel.
I suspect that my immediate response to this is to backtrack
my steps a little to verify my results this weekend.
Let me explain:
#0 Barebones 3/80 was built up using DRAM and HD previously
used in a Sparc 1+ that was known to function well.
(The 1+ was built up with more ram and disk for duty as
a web/mail server. The spares went into the 3/80.)
#1 Installed the base OS. Found I could not use "ps". In
fact, found that the installed kernel did not even appear
to support proc.
#2 Installed kernel sources. Compiled using the egcs compiler
in the distro. Proc and ps now worked, but started getting
the kmem_map panics periodically while under heavy load.
#3 Built binutils-2.9.1, and gcc-2.7.2.3. Played some path
games and got the kernel to build with those. Still got
kmem_map panic under heavy load.
#4 Researched the list archives, came up with this possible
patch, recompiled using gcc-2.7.2.3. Kernel up 24 hours
under load, no panic.
#5 System reports "NFS not responding" on my telnet screen.
Hardware console shows "le0 lost carrier" a dozen
or so times. Box responds to ICMP (ping), but higher level
protocols are "dead". Tried ifconfig down/up to no avail.
Rebooted.
#6 System up about 24 hours again, no kmem_map panic. However
I get the same "le0 lost..." and same IP stack lockup.
Replaced the hub, which was shown to have one bad port, the
cable, and the AUI xcvr, which also proved to be questionable.
#7 With new hub, cable, xcvr, and a reboot, box is now up
and stable for over 48 hrs and counting under heavy load.
#8 I concluded that the hub/xcvr problems were unrelated to
the kmem_map panic since the two had appeared at different
points in the time sequence above. I had believed that the
hub/xcvr problem spontaneously developed, but that may
or may not be true.
It sounds like I really should research this more. I'll
drop back to my egcs kernel and my first gcc-2.7.2.3
kernels without my patch and see if the kmem_map panic
still occurs now that the hub/xcvr have been replaced.
I find it unlikely that that would be the source of the problem,
but I suppose there is at least the possibility of a flaky
net interface hosing some state machine in the kernel.
If I am able to produce the kmem_map panic again, I'll
surely save the kcore for some well earned spelunking.
The stats on this system are:
DRAM: 16MB, SWAP 60MB partition and 40MB partition -p 1
Box uses NIS, NFS for user access. The C++ compiles were
done on the NFS partition using both the original egcs and
binutils, and gcc-2.7.2.3 with binutils-2.9.1, in various
combinations. The code under compilation consisted of the
lesstif and ddd packages. Lesstif takes about 12 hours
on this box, and ddd about 24 hours. Kernel, about 3-4 hours?
On Tue, Feb 29, 2000 at 04:37:23PM -0800, Jason Thorpe wrote:
> It appears that this code block is in the function uvm_map_submap(). This
> function is used to create submaps of the kernel_map. I.e. the function
> is called only during bootstrap, and the VA range is already checked.
>
> I don't think this is the reason your system is losing the way it is.
>
> -- Jason R. Thorpe <thorpej@nas.nasa.gov>
>
--
J. Scott Kasten
jsk AT tetracon-eng DOT net
"That wasn't an attack. It was preemptive retaliation!"