Re: dump -X of large LVM based FFSv2 with WAPBL panics
Hello Jaromir,
Actually, I had already done a forced fsck on the respective FS upfront, while
it was unmounted. To be sure, I just ran the command again; it passes with no
errors the second time. When I run dump -X again, the panic still occurs.
Best regards,
Matthias
nuc# fsck -P /dev/mapper/vg0-photo
** /dev/mapper/rvg0-photo
** File system is clean; not checking
nuc# fsck -P -f /dev/mapper/vg0-photo
** /dev/mapper/rvg0-photo
** File system is already clean
** Last Mounted on /p
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
FREE BLK COUNT(S) WRONG IN SUPERBLK
SALVAGE? [yn] y
59411 files, 63408414 used, 35694535 free (2079 frags, 4461557 blocks,
0.0% fragmentation)
***** FILE SYSTEM WAS MODIFIED *****
nuc# fsck -P -f /dev/mapper/vg0-photo
** /dev/mapper/rvg0-photo
** File system is already clean
** Last Mounted on /p
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
59411 files, 63408414 used, 35694535 free (2079 frags, 4461557 blocks,
0.0% fragmentation)
nuc# mount /p
nuc# touch /p/test.ignore
nuc# umount /p
nuc# fsck -P -f /dev/mapper/vg0-photo
** /dev/mapper/rvg0-photo
** File system is already clean
** Last Mounted on /p
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
** Phase 5 - Check Cyl groups
59412 files, 63408414 used, 35694535 free (2079 frags, 4461557 blocks,
0.0% fragmentation)
nuc#
On 15.11.2017 at 20:29, Jaromír Doleček wrote:
Hi,
can you try whether doing a full forced fsck (fsck -f) resolves this?
I've seen several such persistent panics when I was debugging WAPBL.
Even after kernel fixes I had persistent panics around ffs_newvnode()
due to disk data corruption from previous runs. This is worth trying.
Some day I plan to add a counter, so that boot would actually force an fsck
every X boots even when the filesystem is marked clean, similarly to what
Linux does with ext3/4.
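(For comparison, on Linux that is configured per filesystem with tune2fs; if
I remember right, something like the following forces an e2fsck every 30
mounts or every 180 days, the device name being just an example:
# tune2fs -c 30 -i 180d /dev/sda1
# tune2fs -l /dev/sda1
The second command just lists the current mount count and check interval.)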
Jaromir
2017-11-15 12:56 GMT+01:00 Matthias Petermann <matthias%petermann-it.de@localhost>:
Hello,
On my system I have observed a serious panic when doing FFSv2 dumps
under certain conditions. I did some googling on my own and found
some references regarding the lead symptom
"ffs_newvnode: ino=113 on /p: gen 55fd2f1f/55fd2f1f has non
zero blocks ffffffffffffff00 or size 0"
but all of them appear to have been resolved back in 2016. So I wanted to
share my observation here, in the hope that somebody can give me some
pointers on how the issue could be narrowed down further.
1) Given:
- NetBSD 8.0_BETA (Kernel built from branches/netbsd-8 around
2017-11-06)
NetBSD nuc.local 8.0_BETA NetBSD 8.0_BETA (XEN3_DOM0_XHCI)
#0: Mon Nov 6 14:31:17 CET 2017
admin@nuc.local:/s/src/sys/arch/amd64/compile/XEN3_DOM0_XHCI amd64
- A large (392 GB) LVM volume hosting a FFSv2 filesystem with WAPBL
enabled
(/dev/mapper/vg0-photo mounted at /p)
- An external USB 3.0 drive (the dump target, mounted at /mnt)
2) What I tried:
- make a dump of the aforementioned filesystem, using snapshots
# dump -X -0auf /mnt/photo.0.dump /p
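(As far as I understand, dump -X internally creates a temporary fss(4)
snapshot of the mounted filesystem and dumps from that. If I read the man
pages right, the manual equivalent would be roughly the following, with the
backing store path and size only placeholders:
# fssconfig fss0 /p /var/tmp/photo-backing 1g
# dump -0auf /mnt/photo.0.dump /dev/rfss0
# fssconfig -u fss0
)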
3) What happens then:
- the system crashes, leaving a core dump with the following
indication:
ffs_newvnode: ino=113 on /p: gen 55fd2f1f/55fd2f1f has non zero
blocks ffffffffffffff00 or size 0
fatal page fault in supervisor mode
trap type 6 code 0x2 rip 0xffffffff8022c0cc cs 0x8 rflags
0x10246 cr2 0xfffffe82deaddf1d ilevel 0x3 rsp 0xfffffe810e6b1eb8
curlwp 0xfffffe827f736000 pid 0.4 lowest kstack 0xfffffe810e6ae2c0
panic: trap
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x140
snprintf() at netbsd:snprintf
trap() at netbsd:trap+0xc6b
--- trap (number 6) ---
mutex_enter() at netbsd:mutex_enter+0xc
biodone2() at netbsd:biodone2+0x9b
biodone2() at netbsd:biodone2+0x9b
biointr() at netbsd:biointr+0x3a
softint_dispatch() at netbsd:softint_dispatch+0xd3
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e6b1ff0
Xsoftintr() at netbsd:Xsoftintr+0x4f
--- interrupt ---
0:
cpu0: End traceback...
dumping to dev 0,1 (offset=168119, size=2076255):
dump
- gdb backtrace shows:
(gdb) target kvm netbsd.3.core
0xffffffff80229545 in cpu_reboot ()
(gdb) bt
#0 0xffffffff80229545 in cpu_reboot ()
#1 0xffffffff809a4afc in vpanic ()
#2 0xffffffff809a4bb0 in panic ()
#3 0xffffffff8022b176 in trap ()
#4 0xffffffff8020113e in alltraps ()
#5 0xffffffff8022c0cc in mutex_enter ()
#6 0xffffffff80a029f5 in wapbl_biodone ()
#7 0xffffffff809e2f20 in biodone2 ()
#8 0xffffffff809e2f20 in biodone2 ()
#9 0xffffffff809e303e in biointr ()
#10 0xffffffff8097bc1d in softint_dispatch ()
#11 0xffffffff80223eef in Xsoftintr ()
(gdb)
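(If it helps, I could redo this against a kernel with debug symbols; I assume
something along these lines would give file and line information for the
wapbl_biodone() frame, provided the kernel was built with
makeoptions DEBUG="-g" so that a netbsd.gdb image exists in the build tree:
# gdb /s/src/sys/arch/amd64/compile/XEN3_DOM0_XHCI/netbsd.gdb
(gdb) target kvm netbsd.3.core
(gdb) bt
(gdb) frame 6
(gdb) list
)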
4) What I tried afterwards:
- make a dump of the aforementioned filesystem, using NO snapshots
# dump -0auf /mnt/photo.0.dump /p
-> works
- umount the filesystem, forcing a manual fsck
-> no problems
- dumpfs -s /dev/mapper/vg0-photo
nuc# dumpfs -s /dev/mapper/vg0-photo
file system: /dev/mapper/vg0-photo
format FFSv2
endian little-endian
location 65536 (-b 128)
magic 19540119 time Wed Nov 15 12:26:52 2017
superblock location 65536 id [ 59f8026a 16319237 ]
cylgrp dynamic inodes FFSv2 sblock FFSv2 fslevel 5
nbfree 4461561 ndir 1865 nifree 24770027 nffree
2079
ncg 530 size 100663296 blocks 99102949
bsize 32768 shift 15 mask 0xffff8000
fsize 4096 shift 12 mask 0xfffff000
frag 8 shift 3 fsbtodb 3
bpg 23742 fpg 189936 ipg 46848
minfree 5% optim time maxcontig 2 maxbpg 4096
symlinklen 120 contigsumsize 2
maxfilesize 0x000800800805ffff
nindir 4096 inopb 128
avgfilesize 16384 avgfpdir 64
sblkno 24 cblkno 32 iblkno 40 dblkno 2968
sbsize 4096 cgsize 32768
csaddr 2968 cssize 12288
cgrotor 0 fmod 0 ronly 0 clean 0x01
wapbl version 0x1 location 2 flags 0x0
wapbl loc0 402688128 loc1 131072 loc2 512 loc3 3
flags none
fsmnt /p
volname swuid 0
5) Further observations:
- dump -X of other FSs on the same machine seems to work fine, but
those FSs are smaller
I'd be glad to help identify the root cause further.
Best regards,
Matthias
--
Matthias Petermann <matthias%petermann-it.de@localhost> | www.petermann-it.de
GnuPG: 0x5C3E6D75 | 5930 86EF 7965 2BBA 6572 C3D7 7B1D A3C3 5C3E 6D75
--
Matthias Petermann <matthias%petermann-it.de@localhost> | www.petermann-it.de
GnuPG: 0x5C3E6D75 | 5930 86EF 7965 2BBA 6572 C3D7 7B1D A3C3 5C3E 6D75