Subject: kern/30038: LFS panic: lfs_segwrite: ifile read
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org>
From: None <carton@Ivy.NET>
List: netbsd-bugs
Date: 04/23/2005 02:10:00
>Number: 30038
>Category: kern
>Synopsis: LFS panic: lfs_segwrite: ifile read
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Apr 23 02:10:00 +0000 2005
>Originator: Miles Nordin
>Release: NetBSD 3.99.3 (2005-04-07)
>Organization:
Ivy Ministries
>Environment:
System: NetBSD sohryu 3.99.3 NetBSD 3.99.3 (SOHRYU-$Revision: 1.1.1.7 $) #1: Tue Apr 19 13:02:46 EDT 2005 carton@castrovalva:/scratch/src-current/sys/arch/macppc/compile/SOHRYU macppc
Architecture: powerpc
Machine: macppc
>Description:
Script started on Sat Apr 23 00:59:56 2005
ezln:~$ sudo cu -l ttyC1 -s 38400
Password:
^GConnected.
dmesg 0t400
ng fsbn 365776022 of 365776022-365776037 (wd0 bn 366300310; cn 178857 tn 36 sn 22), retrying
pdcide0:1: bogus intr
pdcide0:1:0: piomode drive fault
wd0d: device fault reading fsbn 365776022 of 365776022-365776037 (wd0 bn 366300310; cn 178857 tn 36 sn 22)
pdcide0:1: bogus intr
panic: lfs_segwrite: ifile read
trap: kernel read DSI trap @ 0x10000084 by 0x3eee38 (DSISR 0x40000000, err=14)
panic: trap
db> t/l
0xd51f4d80: at panic+0x19c
0xd51f4e10: at lfs_segwrite+0x368
0xd51f4e50: at lfs_sync+0x84
0xd51f4e70: at sync_fsync+0xbc
0xd51f4e90: at VOP_FSYNC+0x60
0xd51f4ee0: at sched_sync+0x214
0xd51f4f40: at cpu_switchto+0x44
0xd51f4f50: at ADBDevTable+0x1cf57c
trap: kernel read DSI trap @ 0x10000084 by 0x3eee38 (DSISR 0x40000000, err=14)
panic: trap
Faulted in DDB; continuing...
(gdb) list *(lfs_segwrite+0x368)
0x2d1c6c is in lfs_segwrite (../../../../ufs/lfs/lfs_segment.c:680).
675 (long)bp->b_flags);
676 ++do_panic;
677 }
678 }
679 if (do_panic)
680 panic("dirty blocks");
681 }
682 #endif
683 splx(s);
684 VOP_UNLOCK(vp, 0);
However, the panic message that was printed comes from lfs_segment.c
line 598, not from line 680, which is where both the ddb 't' backtrace
and gdb point. I'm certain that netbsd.gdb matches the running kernel,
so I think something is wrong with either ddb or gdb on macppc.
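One possible explanation for the mismatch between line 598 and line 680,
stated as an assumption and not verified against this kernel image: the
addresses in a ddb backtrace are typically return addresses, which point
at the instruction after the call. panic() does not return, so the
instruction following the call that printed the message may already
belong to a different source line, such as the panic("dirty blocks")
block at line 680. PowerPC instructions are a fixed 4 bytes, so the call
itself would be at lfs_segwrite+0x364. The sketch below does that
arithmetic in Python using the two values from the transcript; the names
call_site, return_addr and func_start are illustrative only.

```python
# Sketch: turn a backtrace return address into the address of the call
# instruction on PowerPC, where every instruction is 4 bytes long.
# Assumption: the frame address printed by ddb is a return address.

PPC_INSN_BYTES = 4  # fixed instruction width on 32-bit PowerPC


def call_site(return_addr: int) -> int:
    """Address of the branch-and-link that produced this return address."""
    return return_addr - PPC_INSN_BYTES


# Values taken from the report: gdb resolved lfs_segwrite+0x368 to 0x2d1c6c.
frame_offset = 0x368
return_addr = 0x2D1C6C
func_start = return_addr - frame_offset  # start of lfs_segwrite

site = call_site(return_addr)
print(f"lfs_segwrite starts at {func_start:#x}")
print(f"call instruction at    {site:#x} (lfs_segwrite+{site - func_start:#x})")
```

If this guess is right, 'list *(lfs_segwrite+0x364)' in gdb should land
on or near line 598 rather than line 680.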
I didn't get a core dump; I think crash dumps are not implemented on
macppc right now (see the 'reboot 0x104' attempt below, which prints
'dumpsys: TBD').
db> ps
PID PPID PGRP UID S FLAGS LWPS COMMAND WAIT
1278 1244 1244 0 2 0 1 lfs_cleanerd segment
1244 1 1244 0 2 0 1 lfs_cleanerd wait
1008 0 0 0 2 0x20200 1 lfs_writer lfswrit
595 558 595 405 2 0x4002 1 top poll
558 561 558 405 2 0x4002 1 ksh93 wait
561 527 527 405 2 0x100 1 sshd select
527 341 527 0 2 0x101 1 sshd netio
440 434 440 405 2 0x4002 1 ksh93 select
434 407 407 405 2 0x100 1 sshd select
407 341 407 0 2 0x101 1 sshd netio
438 1 438 0 2 0x4002 1 getty ttyin
414 1 1 0 2 0x4000 1 getty nanosle
404 1 404 0 2 0 1 cron nanosle
410 1 410 0 2 0x80000 1 inetd kqread
341 1 341 0 2 0 1 sshd select
219 1 219 0 2 0 1 mount_mfs mfsidl
157 1 157 0 2 0 1 syslogd
12 0 0 0 2 0x20200 1 aiodoned aiodone
>11 0 0 0 2 0x20200 1 ioflush
10 0 0 0 2 0x20200 1 pagedaemon pgdaemo
9 0 0 0 2 0x20200 1 nfsio nfsidl
8 0 0 0 2 0x20200 1 nfsio nfsidl
7 0 0 0 2 0x20200 1 nfsio nfsidl
6 0 0 0 2 0x20200 1 nfsio nfsidl
5 0 0 0 2 0x20200 1 atapibus0 sccomp
4 0 0 0 2 0x20200 1 scsibus0 sccomp
3 0 0 0 2 0x20200 1 atabus1 atardl
2 0 0 0 2 0x20200 1 atabus0 atath
1 0 1 0 2 0x4000 1 init wait
0 -1 0 0 2 0x20200 1 swapper schedul
db> ps/w
PID COMMAND EMUL PRI UTIME STIME WAIT-MSG WAIT-CHANNEL
1278 lfs_cleanerd netbsd 55 4.95644.4 segment 0xd0158040
1244 lfs_cleanerd netbsd 32 0.0 0.0 wait 0xd6f1e20
1008 lfs_writer netbsd 4 0.0 0.0 lfswriter netbsd:lfs_writer_daemon
595 top netbsd 24 80.4 57.3 poll netbsd:selwait
558 ksh93 netbsd 32 0.1 1.7 wait 0xd6f1970
561 sshd netbsd 24 15.6 16.5 select netbsd:selwait
527 sshd netbsd 24 0.3 0.0 netio netbsd:ADBDevTable+0xa167c
440 ksh93 netbsd 24 0.2 2.3 select netbsd:selwait
434 sshd netbsd 24 0.7 0.6 select netbsd:selwait
407 sshd netbsd 24 0.3 0.1 netio netbsd:ADBDevTable+0xa131c
438 getty netbsd 25 0.0 0.0 ttyin 0x74500c
414 getty netbsd 32 0.2 6.4 nanosleep netbsd:nanowait.0
404 cron netbsd 32 0.7 10.3 nanosleep netbsd:nanowait.0
410 inetd netbsd 24 0.0 0.0 kqread 0xd684000
341 sshd netbsd 24 3.9 0.0 select netbsd:selwait
219 mount_mfs netbsd 32 0.0 0.0 mfsidl 0xdfebd28
157 syslogd netbsd 24 0.2 5.9
12 aiodoned netbsd 4 0.0 1.1 aiodoned netbsd:uvm+0x70
>11 ioflush netbsd 17 0.0 135.5
10 pagedaemon netbsd 4 0.0 2.4 pgdaemon netbsd:uvm+0x64
9 nfsio netbsd 32 0.0 0.2 nfsidl netbsd:nfs_asyncdaemon+0x38
8 nfsio netbsd 32 0.0 0.2 nfsidl netbsd:nfs_asyncdaemon+0x28
7 nfsio netbsd 32 0.0 0.5 nfsidl netbsd:nfs_asyncdaemon+0x18
6 nfsio netbsd 32 0.0 1.2 nfsidl netbsd:nfs_asyncdaemon+0x8
5 atapibus0 netbsd 16 0.0 0.0 sccomp 0xd00b1a08
4 scsibus0 netbsd 16 0.0 0.0 sccomp 0xd0031b08
3 atabus1 netbsd 16 0.0 0.9 atardl 0xd51d1ed0
2 atabus0 netbsd 16 0.0 0.0 atath 0xd00b1a2c
1 init netbsd 32 0.1 0.9 wait netbsd:ADBDevTable+0xc4d7c
0 swapper netbsd 4 0.0 0.4 scheduler netbsd:proc0
db> reboot 0x104
tlp0: receive ring overrun
dumpsys: TBD
panic: wdc_exec_command: polled command not done
Stopped in pid 11.1 (ioflush) at netbsd:cpu_Debugger+0x10: lwz r0, r1, 0x14
db>
>How-To-Repeat:
Happened on an idle system after a couple of days. The LFS is a little
under 200 GB and was created with the NetBSD 2.0 release's newfs_lfs.
The machine has crashed many times, and I have run
'fsck_lfs -f -y /dev/rwd0d' a few times; fsck_lfs has itself crashed
while doing this. Before this panic, however, a full fsck_lfs (not -p)
had completed successfully, so the filesystem was supposedly clean.
>Fix:
Not known. It is a development machine, so I can test whatever is needed.