Subject: kern/23494: panic results in (massive) fs corruption
To: None <gnats-bugs@gnats.netbsd.org>
From: Marc Recht <recht@netbsd.org>
List: netbsd-bugs
Date: 11/19/2003 14:27:40
>Number: 23494
>Category: kern
>Synopsis: panic results in (massive) fs corruption
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Nov 19 17:45:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator: Marc Recht
>Release: NetBSD 1.6ZF
>Organization:
>Environment:
System: NetBSD leeloo.intern.geht.de 1.6ZF NetBSD 1.6ZF (LEELOO) #0: Tue
Nov 18 23:28:17 CET 2003
marc@leeloo.intern.geht.de:/usr/src/sys/arch/i386/compile/LEELOO i386
Architecture: i386
Machine: i386
>Description:
For a while now I have been suffering from panics that result in fs
corruption, sometimes massive. I have not yet found a pattern that triggers
this behaviour. It always seems to happen under medium/high load with I/O
(disk + net). The weird thing is what gets corrupted.
E.g.:
I have /, /var, /home and /usr as separate partitions (on the same disk) and
/tmp as mfs. After doing some I/O on /home the box panicked, and after the
fsck, /var/log (and other stuff in /var), some stuff in /usr (e.g.
/usr/libexec/getty) and much of / were gone. In /, /dev was missing
completely, as was most of /etc.
A reliable way to panic my box is to copy a large amount of data from one cgd
to another. (I have never managed to copy more than 8GB before the box
panics.) But this normally only ends in unclean disks...
Normally I don't get a crash dump, but today I got one.
panic("blkfree: bad size");
(gdb) bt
#0 0x00000001 in ?? ()
#1 0xc0263c26 in cpu_reboot (howto=256, bootstr=0x0)
at /usr/src/sys/arch/i386/i386/machdep.c:769
#2 0xc01f6e55 in panic (
fmt=0xc0321c18 "ÂÚãðþ]Å±2\bE\032äW\022\030nÝ¹êi\024Ö¢\rr\016·GÉ Í\215§;'\204)\205Á") at /usr/src/sys/kern/subr_prf.c:242
#3 0xc01920b2 in ffs_blkfree (ip=0xe85c14d0, bno=514, size=8192)
at /usr/src/sys/ufs/ffs/ffs_alloc.c:1530
#4 0xc0198151 in ffs_truncate (v=0xe85c5ce4)
at /usr/src/sys/ufs/ffs/ffs_inode.c:427
#5 0xc022b2e9 in VOP_TRUNCATE (vp=0xe85c05dc, length=0, flags=0,
cred=0xc1cfff00, p=0xe858b9ac) at /usr/src/sys/kern/vnode_if.c:1490
#6 0xc01ae9fc in ufs_setattr (v=0xe85c5d84)
at /usr/src/sys/ufs/ufs/ufs_vnops.c:441
#7 0xc022a917 in VOP_SETATTR (vp=0xe85c05dc, vap=0xe85c5dd4,
cred=0xc1cfff00,
p=0xe858b9ac) at /usr/src/sys/kern/vnode_if.c:388
#8 0xc0229636 in vn_open (ndp=0xe85c5e84, fmode=1026, cmode=420)
at /usr/src/sys/kern/vfs_vnops.c:284
#9 0xc0224685 in sys_open (l=0xe855e7f8, v=0xe85c5f64, retval=0xe85c5f5c)
at /usr/src/sys/kern/vfs_syscalls.c:1120
#10 0xc026e1e4 in syscall_plain (frame=0xe85c5fa8)
at /usr/src/sys/arch/i386/i386/syscall.c:159
$NetBSD: ffs_alloc.c,v 1.70 2003/09/05 21:58:35 itojun Exp $
$NetBSD: ffs_inode.c,v 1.60 2003/08/07 16:34:30 agc Exp $
$NetBSD: ufs_vnops.c,v 1.109 2003/11/08 06:38:10 dbj Exp $
$NetBSD: subr_prf.c,v 1.93 2003/08/07 16:31:53 agc Exp $
$NetBSD: vfs_syscalls.c,v 1.201 2003/11/15 01:19:38 thorpej Exp $
$NetBSD: vfs_vnops.c,v 1.75 2003/10/15 11:29:01 hannken Exp $
$NetBSD: vnode_if.c,v 1.45 2003/08/07 16:32:05 agc Exp $
$NetBSD: machdep.c,v 1.543 2003/10/28 22:52:53 mycroft Exp $
$NetBSD: syscall.c,v 1.27 2003/10/31 03:28:13 simonb Exp $
$NetBSD: vm_machdep.c,v 1.112 2003/10/27 14:11:47 junyoung Exp $
(I can put the core for this kernel online.)
The controller is an onboard VIA controller (VIA Technologies VT8233 ATA100
controller).
$NetBSD: pciide_machdep.c,v 1.3 2003/10/30 21:19:54 fvdl Exp $
$NetBSD: pciide_common.c,v 1.2 2003/10/23 19:29:35 bouyer Exp $
I'm pretty sure that the disks and the cables are ok.
>How-To-Repeat:
unknown
>Fix:
unknown
>Release-Note:
>Audit-Trail:
>Unformatted: