netbsd-bugs: kern/30038: LFS panic: lfs_segwrite: ifile read

Subject: kern/30038: LFS panic: lfs_segwrite: ifile read
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org
From: carton@Ivy.NET
List: netbsd-bugs
Date: 04/23/2005 02:10:00
>Number:         30038
>Category:       kern
>Synopsis:       LFS panic: lfs_segwrite: ifile read
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Apr 23 02:10:00 +0000 2005
>Originator:     Miles Nordin
>Release:        NetBSD 3.99.3 (2005-04-07)
>Organization:
Ivy Ministries
>Environment:
System: NetBSD sohryu 3.99.3 NetBSD 3.99.3 (SOHRYU-$Revision: 1.1.1.7 $) #1: Tue Apr 19 13:02:46 EDT 2005  carton@castrovalva:/scratch/src-current/sys/arch/macppc/compile/SOHRYU macppc
Architecture: powerpc
Machine: macppc
>Description:
Script started on Sat Apr 23 00:59:56 2005
ezln:~$ sudo cu -l ttyC1 -s 38400
Password:
^GConnected.
dmesg 0t400
ng fsbn 365776022 of 365776022-365776037 (wd0 bn 366300310; cn 178857 tn 36 sn 22), retrying
pdcide0:1: bogus intr
pdcide0:1:0: piomode drive fault
wd0d: device fault reading fsbn 365776022 of 365776022-365776037 (wd0 bn 366300310; cn 178857 tn 36 sn 22)
pdcide0:1: bogus intr
panic: lfs_segwrite: ifile read
trap: kernel read DSI trap @ 0x10000084 by 0x3eee38 (DSISR 0x40000000, err=14)
panic: trap
db> t/l
0xd51f4d80: at panic+0x19c
0xd51f4e10: at lfs_segwrite+0x368
0xd51f4e50: at lfs_sync+0x84
0xd51f4e70: at sync_fsync+0xbc
0xd51f4e90: at VOP_FSYNC+0x60
0xd51f4ee0: at sched_sync+0x214
0xd51f4f40: at cpu_switchto+0x44
0xd51f4f50: at ADBDevTable+0x1cf57c
trap: kernel read DSI trap @ 0x10000084 by 0x3eee38 (DSISR 0x40000000, err=14)
panic: trap
Faulted in DDB; continuing...

(gdb) list *(lfs_segwrite+0x368)
0x2d1c6c is in lfs_segwrite (../../../../ufs/lfs/lfs_segment.c:680).
675                                                     (long)bp->b_flags);
676                                             ++do_panic;
677                                     }
678                             }
679                             if (do_panic)
680                                     panic("dirty blocks");
681                     }
682     #endif
683                     splx(s);
684                     VOP_UNLOCK(vp, 0);

However, the panic message that was printed ("lfs_segwrite: ifile read")
comes from lfs_segment.c line 598, not from line 680, which is where both
the ddb 't' trace and gdb point.  I'm certain the netbsd.gdb matches the
running kernel, so I think something is wrong with either ddb or gdb on
macppc.
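
For anyone cross-checking the trace, the address arithmetic behind the gdb
lookup can be sketched as below.  This is only an illustration: the helper
names are hypothetical, and treating the frame address as a saved return
address (so that the call itself sits one 4-byte PowerPC instruction
earlier) is an assumption about how ddb reports frames, not something
confirmed here.

```python
# Sketch: relate the ddb frame "lfs_segwrite+0x368" to the address gdb
# resolved it to (0x2d1c6c).  Helper names are hypothetical.

PPC_INSN_BYTES = 4  # PowerPC instructions are fixed-width, 4 bytes each


def function_base(resolved_addr, offset):
    """Start address of the function, given symbol+offset and its address."""
    return resolved_addr - offset


def call_site(return_addr):
    """Address of the call instruction, ASSUMING return_addr is a saved
    return address (i.e. it points just past the call)."""
    return return_addr - PPC_INSN_BYTES


resolved = 0x2d1c6c   # from: list *(lfs_segwrite+0x368)
offset = 0x368        # from the ddb trace

base = function_base(resolved, offset)
site = call_site(resolved)

print(f"lfs_segwrite starts at {base:#x}")
print(f"candidate call site:  {site:#x} (lfs_segwrite+{site - base:#x})")
```

If that assumption holds, 'list *(lfs_segwrite+0x364)' in gdb would be the
lookup to compare against line 598; if it still lands near line 680, the
mismatch lies elsewhere.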

I didn't get a core dump; I think core dumps are not implemented on
macppc right now (dumpsys printed "TBD"; see the 'reboot 0x104' attempt
below).

db> ps
 PID           PPID     PGRP        UID S   FLAGS LWPS          COMMAND    WAIT
 1278          1244     1244          0 2       0    1     lfs_cleanerd segment
 1244             1     1244          0 2       0    1     lfs_cleanerd    wait
 1008             0        0          0 2 0x20200    1       lfs_writer lfswrit
 595            558      595        405 2  0x4002    1              top    poll
 558            561      558        405 2  0x4002    1            ksh93    wait
 561            527      527        405 2   0x100    1             sshd  select
 527            341      527          0 2   0x101    1             sshd   netio
 440            434      440        405 2  0x4002    1            ksh93  select
 434            407      407        405 2   0x100    1             sshd  select
 407            341      407          0 2   0x101    1             sshd   netio
 438              1      438          0 2  0x4002    1            getty   ttyin
 414              1        1          0 2  0x4000    1            getty nanosle
 404              1      404          0 2       0    1             cron nanosle
 410              1      410          0 2 0x80000    1            inetd  kqread
 341              1      341          0 2       0    1             sshd  select
 219              1      219          0 2       0    1        mount_mfs  mfsidl
 157              1      157          0 2       0    1          syslogd
 12               0        0          0 2 0x20200    1         aiodoned aiodone
>11               0        0          0 2 0x20200    1          ioflush
 10               0        0          0 2 0x20200    1       pagedaemon pgdaemo
 9                0        0          0 2 0x20200    1            nfsio  nfsidl
 8                0        0          0 2 0x20200    1            nfsio  nfsidl
 7                0        0          0 2 0x20200    1            nfsio  nfsidl
 6                0        0          0 2 0x20200    1            nfsio  nfsidl
 5                0        0          0 2 0x20200    1        atapibus0  sccomp
 4                0        0          0 2 0x20200    1         scsibus0  sccomp
 3                0        0          0 2 0x20200    1          atabus1  atardl
 2                0        0          0 2 0x20200    1          atabus0   atath
 1                0        1          0 2  0x4000    1             init    wait
 0               -1        0          0 2 0x20200    1          swapper schedul
db> ps/w
 PID          COMMAND     EMUL  PRI UTIME STIME WAIT-MSG    WAIT-CHANNEL
 1278      lfs_cleanerd   netbsd   55   4.95644.4 segment     0xd0158040
 1244      lfs_cleanerd   netbsd   32   0.0   0.0 wait        0xd6f1e20
 1008      lfs_writer   netbsd    4   0.0   0.0 lfswriter   netbsd:lfs_writer_daemon
 595              top   netbsd   24  80.4  57.3 poll        netbsd:selwait
 558            ksh93   netbsd   32   0.1   1.7 wait        0xd6f1970
 561             sshd   netbsd   24  15.6  16.5 select      netbsd:selwait
 527             sshd   netbsd   24   0.3   0.0 netio       netbsd:ADBDevTable+0xa167c
 440            ksh93   netbsd   24   0.2   2.3 select      netbsd:selwait
 434             sshd   netbsd   24   0.7   0.6 select      netbsd:selwait
 407             sshd   netbsd   24   0.3   0.1 netio       netbsd:ADBDevTable+0xa131c
 438            getty   netbsd   25   0.0   0.0 ttyin       0x74500c
 414            getty   netbsd   32   0.2   6.4 nanosleep   netbsd:nanowait.0
 404             cron   netbsd   32   0.7  10.3 nanosleep   netbsd:nanowait.0
 410            inetd   netbsd   24   0.0   0.0 kqread      0xd684000
 341             sshd   netbsd   24   3.9   0.0 select      netbsd:selwait
 219        mount_mfs   netbsd   32   0.0   0.0 mfsidl      0xdfebd28
 157          syslogd   netbsd   24   0.2   5.9
 12          aiodoned   netbsd    4   0.0   1.1 aiodoned    netbsd:uvm+0x70
>11           ioflush   netbsd   17   0.0 135.5
 10        pagedaemon   netbsd    4   0.0   2.4 pgdaemon    netbsd:uvm+0x64
 9              nfsio   netbsd   32   0.0   0.2 nfsidl      netbsd:nfs_asyncdaemon+0x38
 8              nfsio   netbsd   32   0.0   0.2 nfsidl      netbsd:nfs_asyncdaemon+0x28
 7              nfsio   netbsd   32   0.0   0.5 nfsidl      netbsd:nfs_asyncdaemon+0x18
 6              nfsio   netbsd   32   0.0   1.2 nfsidl      netbsd:nfs_asyncdaemon+0x8
 5          atapibus0   netbsd   16   0.0   0.0 sccomp      0xd00b1a08
 4           scsibus0   netbsd   16   0.0   0.0 sccomp      0xd0031b08
 3            atabus1   netbsd   16   0.0   0.9 atardl      0xd51d1ed0
 2            atabus0   netbsd   16   0.0   0.0 atath       0xd00b1a2c
 1               init   netbsd   32   0.1   0.9 wait        netbsd:ADBDevTable+0xc4d7c
 0            swapper   netbsd    4   0.0   0.4 scheduler   netbsd:proc0
db> reboot 0x104
tlp0: receive ring overrun
dumpsys: TBD
panic: wdc_exec_command: polled command not done
Stopped in pid 11.1 (ioflush) at        netbsd:cpu_Debugger+0x10:       lwz     r0, r1, 0x14
db> 
>How-To-Repeat:
Happened on an idle system after a couple of days.  The LFS is a little
less than 200GB and was created with the NetBSD 2.0 release's newfs_lfs.
The machine has crashed many times.  I've run 'fsck_lfs -f -y /dev/rwd0d'
a few times, and fsck_lfs itself has crashed while doing so, but before
this panic a full fsck_lfs (not -p) had completed successfully, so the
filesystem was supposedly clean.
>Fix:
Not known.  It is a development machine, so I can test whatever is needed.