Current-Users archive
Re: Strange hang on 4.99.49
On Tuesday 22 January 2008, Paul Goyette wrote:
> On Wed, 23 Jan 2008, Zafer Aydogan wrote:
> > 2008/1/22, Paul Goyette <paul%whooppee.com@localhost>:
> >> I'm running a kernel + userland from sources as of just a few
> >> hours ago.
> >>
> >> I started up a build.sh with -j 4 and all of a sudden, after an
> >> hour or more of running, the system just went completely idle even
> >> though the build is nowhere near complete.
> >>
> >> Looking around, I found three cc1 processes (only 3, not 4). One
> >> of them was in a 'vnode' state while the other two are in
> >> 'vnlock'. Both my /usr/src and /usr/obj are null-mounts
> >>
> >> /dev/wd0a on / type ffs (NFS exported, local)
> >> /dev/wd0e on /var type ffs (local)
> >> /dev/wd0g on /home type ffs (NFS exported, local)
> >>
> >>>>> /dev/wd0h on /build type ffs (soft dependencies, NFS
> >>>>> exported, local)
> >>
> >> /dev/wd1a on /amanda type ffs (local)
> >>
> >>>>> /build/src on /usr/src type null (local)
> >>>>> /build/obj on /usr/obj type null (local)
> >>
> >> /build/xsrc on /usr/xsrc type null (local)
> >> /build/pkgsrc on /usr/pkgsrc type null (local)
> >> kernfs on /kern type kernfs (local)
> >> ptyfs on /dev/pts type ptyfs (local)
> >> tmpfs on /tmp type tmpfs (local)
> >>
> >> Is there some sort of locking race condition that I've managed to
> >> hit? If so, is there a way to avoid it?
> >
> > I'm having the same problems with LFS.
> > Processes get stuck in vnode, especially when read/write is
> > involved. FFS seems not to be affected.
>
> Well, since I couldn't kill any of the processes, I tried to shut the
> system down. It hung for several hours at "dismounting file systems"
> and I finally had to hit the reset button.
>
> If there's anything I can do to help debug this, I'll be happy to
> help. In the mean time I'm going to try again to see if I can get a
> build.sh running - this time I think I'll try without -j4 and see if
> it gets any further.
I had an almost identical problem on my system. System specs:
Intel(R) Pentium(R) D CPU 3.00GHz (dual-core, EM64T)
Intel D945GNT motherboard
total memory = 3062 MB
... in my case, a long time ago the AMD64 port basically refused to run
and would crash constantly (port-amd64/34122), so I switched to running
in 32-bit i386 mode. From about 3.99.23 up until somewhere around
4.99.15 or so, i386 was rock-solid on that machine. The 3.99.23 kernel
was so solid that I have an uptime of well over a year on another
machine:
:?:09:56:53 ~# uptime
9:56AM up 381 days, 3:26, 20 users, load averages: 0.21, 0.30, 0.28
... and that's running multiple MySQL databases, ntpd, Apache, and
thousands of emails in and out per day. It's also acting as an OpenVPN
hub for about six machines, an active NFS server, a Squid-based
anonymizing web proxy, a BIND9 name server, a busy Samba server for
about five Windows machines, and a CVS pserver for NetBSD,
DragonFlyBSD, MythTV, and a half dozen others. On top of that it runs
about six Perforce daemons tracking hundreds of thousands of files from
lots of open source applications, along with local branches filled with
customizations and modifications.
Then recently, when I tried to update the other machine (the Pentium D
again) to -current around 4.99.45 or so, it began getting strange
freezes. Existing processes kept going, but new processes would fail
with complaints about too many processes in the system (when there were
only about four of them running). It was a lot like the livelock
problems reported recently on -sparc64.
A build.sh on it would trigger the freeze within a dozen minutes, and
the machine would need to be reset. Andrew Doran thought I was running
softdeps, which I was, but when I turned off softdeps the issues
continued.
http://mail-index.netbsd.org/current-users/2008/01/11/0018.html
http://mail-index.netbsd.org/current-users/2008/01/11/0020.html
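For anyone who wants to check whether their own null mounts sit on top of a softdep filesystem, here is a rough sketch. It assumes mount(8) output in the same format as the listing quoted earlier in this thread; the function names and the trimmed sample text are mine, for illustration only. It does not diagnose the hang (turning softdeps off did not help here), it only reports how the mounts are layered, and it ignores nested mount points.

```python
import re

# Matches lines in the format quoted in this thread, e.g.:
#   /dev/wd0h on /build type ffs (soft dependencies, NFS exported, local)
MOUNT_RE = re.compile(r"^(\S+) on (\S+) type (\S+) \((.*)\)$")


def parse_mounts(text):
    """Return a list of (source, mountpoint, fstype, options) tuples."""
    mounts = []
    for line in text.splitlines():
        m = MOUNT_RE.match(line.strip())
        if m:
            src, mnt, fstype, opts = m.groups()
            mounts.append((src, mnt, fstype, [o.strip() for o in opts.split(",")]))
    return mounts


def null_mounts_over_softdep(text):
    """Return (null mountpoint, null source, softdep mountpoint) triples."""
    mounts = parse_mounts(text)
    # Mount points of filesystems mounted with soft dependencies.
    softdep = [mnt for _, mnt, _, opts in mounts if "soft dependencies" in opts]
    hits = []
    for src, mnt, fstype, _ in mounts:
        if fstype != "null":
            continue
        for base in softdep:
            # The null mount's source lives on or below the softdep mount point.
            if src == base or src.startswith(base.rstrip("/") + "/"):
                hits.append((mnt, src, base))
    return hits


# Trimmed copy of the mount listing quoted earlier in the thread.
SAMPLE = """\
/dev/wd0a on / type ffs (NFS exported, local)
/dev/wd0h on /build type ffs (soft dependencies, NFS exported, local)
/build/src on /usr/src type null (local)
/build/obj on /usr/obj type null (local)
tmpfs on /tmp type tmpfs (local)
"""

if __name__ == "__main__":
    for mnt, src, base in null_mounts_over_softdep(SAMPLE):
        print(f"{mnt} is a null mount of {src}, on softdep filesystem {base}")
```

To use it on a live system, feed it the text printed by mount(8) instead of SAMPLE.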
I've now switched the system to the AMD64 port (4.99.49, updated as of a
few days ago). The original installation problems are fixed, but
building anything on it produces all sorts of nasty problems.
I get these errors in dmesg:
free inode /v/3430621 had 4 blocks
free inode /v/3430622 had 4 blocks
free inode /v/3430623 had 4 blocks
free inode /v/3430624 had 4 blocks
And when building various versions of NetBSD:
/v/netbsd-build/netbsd-3.1_STABLE/i386/TOOLS/bin/i386--netbsdelf-strip -g -o netbsd netbsd.gdb
echo netbsd | GZIP=-9 /v/netbsd-build/netbsd-3.1_STABLE/i386/TOOLS/bin/nbpax -O -zw -M -N /v/src-3-build/etc -f /v/netbsd-build/netbsd-3.1_STABLE/i386/REL/i386/binary/sets/kern-GENERIC.tgz
[1] Segmentation fault (core dumped) GZIP=-9 /v/netbs...
And when building various packages from pkgsrc inside a pkg_comp chroot:
Processing hints file hints/netbsd.pl
Unable to find a perl 5 (by these names: ../../miniperl miniperl perl
perl5 perl5.8.8, in these
dirs: ../.. /pkg_comp/obj/pkgsrc/lang/perl5/default/.wrapper/bin
/pkg_comp/obj/pkgsrc/lang/perl5/default/.buildlink/bin
/pkg_comp/obj/pkgsrc/lang/perl5/default/.tools/bin
/pkg_comp/obj/pkgsrc/lang/perl5/default/.gcc/bin /usr/pkg/bin /sbin /usr/sbin
/bin /usr/bin /usr/pkg/sbin /usr/pkg/bin /usr/X11R6/bin /usr/local/sbin
/usr/local/bin /usr/pkg/bin /usr/X11R6/bin /usr/pkg/bin)
Writing Makefile for DynaLoader
sh: /pkg_comp/obj/pkgsrc/lang/perl5/default/perl-5.8.8/ext/DynaLoader/0: not found
*** Error code 127
... while others are filled with:
/etc/ld.so.conf: invalid/unknown sysctl for libm.so.0 (22)
/etc/ld.so.conf: invalid/unknown sysctl for libm.so.0 (22)
/etc/ld.so.conf: invalid/unknown sysctl for libm.so.0 (22)
Anyway, off to try an update. :)
-Marc