Current-Users archive
Re: build.sh soft-halts machine on 4.99.49 kernel, with 4.99.48 userland.
On Friday 11 January 2008, Andrew Doran wrote:
> On Fri, Jan 11, 2008 at 12:36:58PM -0800, Marc Tooley wrote:
> > uname -a:
> >
> > NetBSD shog 4.99.49 NetBSD 4.99.49 (shog) #0: Thu Jan 10 21:01:10
> > PST 2008 root@shog:/v/src-current-build/sys/arch/i386/compile/shog
> > i386
> >
> > The machine itself is:
> >
> > . Pentium D 930 (obviously running in 32-bit mode.)
> > . Intel D945GNT motherboard
> > . 2.5 GB RAM (already run through memtest86+)
> >
> > Kernel is an i386 debug kernel built from sources rsync'd from I
> > believe January 9. Userland was from 4.99.48, which I believe was
> > sync'd and built about two weeks earlier.
> >
> > The symptoms are that the machine will happily churn along doing
> > its thing with a build.sh -j 4 until.. suddenly it'll stop.
> > Interactive bash commands seem to function, until I try to create a
> > new process or run a non-builtin bash command. Then that session
> > will simply never return.
> >
> > I can hit enter and the keyboard is responsive. Breaking into the
> > kernel debugger gives me a list of processes that are sitting in
> > vm_map.
> >
> > I have no backtraces, as there is no panic. It's just sitting
> > there, responsive to keyboard input and *already running* network
> > login sessions; but nothing new gets done.
> >
> > I was trying to watch vmstat -i but I lost my screen session.
> >
> > Hints appreciated.
>
> What does 'ps axs' or 'ps/w' from the debugger say? What kind of file
> system configuration does the machine have?
>
> Cheers,
> Andrew
Hello Andrew,
I will properly retrieve that information for you this evening. Running
'show procs' from the debugger showed normal process lists. I pared down
everything hoping to nail down a particular culprit: even shut down
syslogd. The only thing "running" (or not, since everything was paused)
was the build, some child cc1, and so on.
The filesystem is standard ffs--on top of a RAIDFrame raid0 striped
between two 500GB drives (total 1TB or so.) The build is happening on
the raid0e partition, which is a volume spanning the whole raid0.
I'll double-check the actual filesystem type (v1 or v2.. I think it was
newfs -o time -O 2) tonight, as it's paused again.
One positive thing: I was able to log in just a moment ago and run a
single command after about five hours of paused-ness.
top showed:
load averages: 4.99, 4.97, 4.92 up 0 days, 6:06 03:08:35
18 processes: 17 sleeping, 1 on processor
CPU0 states: 0.0% user, 0.0% nice, 0.0% system,0.0% interrupt, 100% idle
CPU1 states: 0.0% user, 0.0% nice, 0.0% system,0.0% interrupt, 100% idle
Memory: 1143M Act, 316M Inact,780K Wired,4384K Exec,1443M File,774M Free
Swap: 2048M Total, 2048M Free
  PID USERNAME PRI NICE   SIZE    RES STATE     TIME   WCPU    CPU COMMAND
    0 root     125    0     0K    14M schedu/0  2:17  0.00%  0.00% [system]
  420 root      85    0   752K  1248K vm_map/0  0:38  0.00%  0.00% cp
  276 root      85    0   756K   876K vm_map/0  0:10  0.00%  0.00% cron
  413 root      85    0   752K  1292K inoded/0  0:06  0.00%  0.00% rm
  302 root      85    0   768K  1372K vm_map/0  0:05  0.00%  0.00% screen-4.0.3
  397 root      85    0   752K  1256K inoded/0  0:05  0.00%  0.00% rm
  115 root      85    0   760K  1012K kqread/0  0:04  0.00%  0.00% syslogd
  464 root      85    0   752K   828K inoded/1  0:04  0.00%  0.00% rm
  299 root      85    0  2828K  2736K vm_map/1  0:02  0.00%  0.00% bash
  454 root      42    0   756K  1168K CPU/0     0:00  0.00%  0.00% top
  497 root      85    0   768K  3084K select/1  0:00  0.00%  0.00% sshd
  425 root      85    0  2828K  2748K vm_map/0  0:00  0.00%  0.00% bash
  310 root      85    0  2828K  2748K wait/0    0:00  0.00%  0.00% bash
  444 root      85    0  2824K  2728K wait/0    0:00  0.00%  0.00% bash
  259 root      85    0   768K  2060K select/1  0:00  0.00%  0.00% sshd
... and then when I tried another command (that would be 'w') it's
paused-but-letting-me-type-stuff again.
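
For anyone wanting to summarise a listing like the one above, here is a rough Python sketch that tallies processes by wait channel, so the vm_map and inoded sleepers stand out. The function name wait_channels and the sample list are hypothetical, made up for illustration. It assumes the column order shown in the listing (STATE is the seventh field) and uses only a handful of rows copied from it.

```python
from collections import Counter


def wait_channels(top_lines):
    """Count processes per wait channel from top(1)-style process rows."""
    counts = Counter()
    for line in top_lines:
        fields = line.split()
        # A process row starts with a numeric PID and has at least the ten
        # columns PID..CPU; the COMMAND column may have wrapped onto the
        # next line, which is then skipped by this check.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        state = fields[6]               # STATE column, e.g. "vm_map/0"
        channel = state.split("/")[0]   # drop the trailing CPU number
        counts[channel] += 1
    return counts


# A few rows copied from the listing in this message (assumed layout).
sample = [
    "420 root 85 0 752K 1248K vm_map/0 0:38 0.00% 0.00% cp",
    "276 root 85 0 756K 876K vm_map/0 0:10 0.00% 0.00% cron",
    "413 root 85 0 752K 1292K inoded/0 0:06 0.00% 0.00% rm",
    "299 root 85 0 2828K 2736K vm_map/1 0:02 0.00% 0.00% bash",
    "454 root 42 0 756K 1168K CPU/0 0:00 0.00% 0.00% top",
    "310 root 85 0 2828K 2748K wait/0 0:00 0.00% 0.00% bash",
]

for channel, count in wait_channels(sample).most_common():
    print(channel, count)
```

This only groups what top already reports; it says nothing about why the processes are blocked.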