Subject: Re: [PATCH] uvm_map.c
To: Jason Thorpe <thorpej@nas.nasa.gov>
From: J. Scott Kasten <jsk@titan.tetracon-eng.net>
List: netbsd-bugs
Date: 03/01/2000 10:21:33
Well, I'll take another look at the code as I'm not yet 
very familiar with the NetBSD kernel.

My immediate response to this is to backtrack a little
and verify my results this weekend.

Let me explain:

#0 Barebones 3/80 was built up using DRAM and HD previously
used in a Sparc 1+ that was known to function well.
(The 1+ was built up with more ram and disk for duty as
 a web/mail server.  The spares went into the 3/80.)

#1 Installed the base OS.  Found I could not use "ps".  In
fact, found that the installed kernel did not even appear
to support proc.

#2 Installed kernel sources.  Compiled using the egcs compiler
in the distro.  Proc and ps now worked, but started getting
the kmem_map panics periodically while under heavy load.

#3 Built binutils-2.9.1, and gcc-2.7.2.3.  Played some path
games and got the kernel to build with those.  Still got
kmem_map panic under heavy load.

#4 Researched the list archives, came up with this possible
patch, recompiled using gcc-2.7.2.3.  Kernel up 24 hours
under load, no panic.

#5 System reports "NFS not responding" on my telnet screen.
Hardware console displays "le0 lost carrier" a dozen
or so times.  Box responds to ICMP (ping), but higher level
protocols are "dead".  Tried ifconfig down/up to no avail.
Rebooted.

#6 System up about 24 hours again, no kmem_map panic.  However,
I get the same "le0 lost..." and same IP stack lockup.
Replaced the hub (which was shown to have one bad port), the
cable, and the AUI xcvr, which also proved to be questionable.

#7 With new hub, cable, xcvr, and a reboot, box is now up
and stable for over 48 hrs and counting under heavy load.

#8 I concluded that the hub/xcvr problem was unrelated to
the kmem_map panic since the two had appeared at different
points in the sequence above.  I had believed that the
hub/xcvr problem spontaneously developed, but that may
or may not be true.

It sounds like I really should research this more.  I'll
drop back to my egcs kernel and my first gcc-2.7.2.3
kernels without my patch and see if the kmem_map panic
still occurs now that the hub/xcvr have been replaced.
I find it unlikely that that would be the source of the problem,
but I suppose there is at least the possibility of a flaky
net interface hosing some state machine in the kernel.

If I am able to produce the kmem_map panic again, I'll
surely save the kcore for some well earned spelunking.

The stats on this system are:

DRAM: 16MB, SWAP 60MB partition and 40MB partition -p 1
Box uses NIS, NFS for user access.  The c++ compiles were
done on the NFS partition using both the original egcs and
binutils and gcc-2.7.2.3 and binutils-2.9.1 in various
combinations.  The code under compilation consisted of
lesstif and ddd packages.  Lesstif takes about 12 hours
on this box, and ddd about 24 hours.  Kernel, about 3-4 hours?


On Tue, Feb 29, 2000 at 04:37:23PM -0800, Jason Thorpe wrote:
> It appears that this code block is in the function uvm_map_submap().  This
> function is used to create submaps of the kernel_map.  I.e. the function
> is called only during bootstrap, and the VA range is already checked.
> 
> I don't think this is the reason your system is losing the way it is.
> 
>         -- Jason R. Thorpe <thorpej@nas.nasa.gov>
> 

-- 
J. Scott Kasten

jsk AT tetracon-eng DOT net

"That wasn't an attack.  It was preemptive retaliation!"