Subject: LFS and Xen3 testing
To: None <current-users@netbsd.org>
From: Daniel Carosone <dan@geek.com.au>
List: current-users
Date: 09/05/2006 13:56:44
I reinstated my LFS-testing setup from a while ago. For convenience,
this time around I tested on a Xen3 domU, but now it's not clear to
me whether the problems I find are due to LFS or Xen. So, sorry, I'm
going to mix the two together.
* Sometimes, all disk activity will stop, and something (usually the
cleaner) is stuck in biowait. I suspect this to be a Xen issue.
Dom0 is Linux with LVM2 volumes for the xbd backend; domU is
-current a day or two old. It seems most easily (or even only?)
triggered when dom0 is busy with CPU-heavy tasks. I saw a commit
go by recently that looked promising for something like this, but
it doesn't seem to have helped this case.
* If I run screen, the screen process takes 100% of the CPU, in state
either "lfs sb" or "lfs_ioco", and can't be killed. The cleaner
and several other things are then in "lfs segl" and the system gets
generally unhappier from there. The whole system (including /tmp)
is all on one root LFS, so perhaps this is related to screen's socket
usage in /tmp? It doesn't matter whether screen is run on the xm
console or in a sshd pty. I probably wouldn't have found this if
I'd remembered to enable tmpfs in the kernel, and I'll confirm
whether that affects the issue.
* The kernel prints "lfs_segwrite: loopcount=2" every so often, and
  just once or twice "lfs_writeinode: looping count=2". This happens
  every few minutes while the cleaner runs after a crash (forced via
  xm destroy) following one of the above. If this is a diagnostic for
  something, it is being triggered here, in case that's
  interesting.
* resize_lfs produces an almost instant, repeatable panic when trying to
  shrink a filesystem:
panic: lfs_rescount
Stopped in pid 138.1 (lfs_cleanerd) at netbsd:cpu_Debugger+0x4: popl %ebp
db> tr
cpu_Debugger(c03fc581,d3b3aa48,d3b3aa4c,c0282ade,ccadde8c) at netbsd:cpu_Debugger+0x4
panic(c03f3849,0,0,200,c1f69218) at netbsd:panic+0x155
lfs_reserve(c1f69000,ccadde8c,0,ffffffb8,cd511900) at netbsd:lfs_reserve+0x2c1
lfs_create(d3b3aab8,d3b54f50,0,0,1b0713) at netbsd:lfs_create+0x135
VOP_CREATE(ccadde8c,d3b3abb8,d3b3abcc,d3b3aafc,d3b54f50) at netbsd:VOP_CREATE+0x31
vn_open(d3b3aba8,602,1a4,d3b41c44,bbbd5000) at netbsd:vn_open+0x274
sys_open(d3b54f50,d3b3ac48,d3b3ac68,0,bbbd5098) at netbsd:sys_open+0xb6
syscall_plain() at netbsd:syscall_plain+0xb3
--- syscall (number 5) ---
0xbbb206cb:
db>
I recall this appearing to work last time I tried it, but I may not
have had DIAGNOSTIC in that kernel, more fool me :)
--
Dan,