bin/45933: fsck_ffs can't find wapbl journal on 4k-block disks

To: gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: bin/45933: fsck_ffs can't find wapbl journal on 4k-block disks
From: krinsky%bantha.org@localhost
Date: Mon, 6 Feb 2012 03:30:00 +0000 (UTC)
>Number:         45933
>Category:       bin
>Synopsis:       fsck_ffs can't find wapbl journal on 4k-block disks
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Feb 06 03:30:00 +0000 2012
>Originator:     David Krinsky
>Release:        NetBSD 5.99.59
>Organization:
>Environment:
        
        
System: NetBSD foundation.krinsky.net 5.99.59 NetBSD 5.99.59 
(NEWBANTHA_XEN3_DOM0) #0: Thu Jan 12 20:56:21 EST 2012 
root%foundation.krinsky.net@localhost:/usr/obj/sys/arch/amd64/compile/NEWBANTHA_XEN3_DOM0
 amd64
Architecture: x86_64
Machine: amd64

fsck_ffs ident:
     $NetBSD: crt0-common.c,v 1.7 2011/06/30 20:07:35 matt Exp $
     $NetBSD: crt0.S,v 1.3 2011/07/01 02:59:05 joerg Exp $
     $NetBSD: crti.S,v 1.1 2010/08/07 18:01:35 joerg Exp $
     $NetBSD: crtbegin.S,v 1.2 2010/11/30 18:37:59 joerg Exp $
     $NetBSD: dir.c,v 1.53 2011/03/06 17:08:16 bouyer Exp $
     $NetBSD: inode.c,v 1.64 2011/03/06 17:08:16 bouyer Exp $
     $NetBSD: main.c,v 1.80 2011/08/29 14:34:59 joerg Exp $
     $NetBSD: pass1.c,v 1.49 2011/08/14 12:32:01 christos Exp $
     $NetBSD: pass1b.c,v 1.22 2011/03/06 17:08:16 bouyer Exp $
     $NetBSD: pass2.c,v 1.48 2011/08/14 12:32:01 christos Exp $
     $NetBSD: pass3.c,v 1.20 2011/03/06 17:08:16 bouyer Exp $
     $NetBSD: pass4.c,v 1.26 2011/03/06 17:08:16 bouyer Exp $
     $NetBSD: pass5.c,v 1.49 2011/08/14 12:32:01 christos Exp $
     $NetBSD: fsutil.c,v 1.20 2011/06/09 19:57:50 christos Exp $
     $NetBSD: setup.c,v 1.94 2011/08/14 12:32:01 christos Exp $
     $NetBSD: utilities.c,v 1.60 2011/06/09 19:57:52 christos Exp $
     $NetBSD: ffs_bswap.c,v 1.35 2011/03/06 17:08:38 bouyer Exp $
     $NetBSD: ffs_subr.c,v 1.47 2011/08/14 12:37:09 christos Exp $
     $NetBSD: ffs_tables.c,v 1.9 2005/12/11 12:25:25 christos Exp $
     $NetBSD: ffs_appleufs.c,v 1.12 2011/11/19 22:51:31 tls Exp $
     $NetBSD: partutil.c,v 1.11 2011/11/13 22:04:51 christos Exp $
     $NetBSD: quota2_subr.c,v 1.4 2011/06/07 14:56:13 bouyer Exp $
     $NetBSD: progress.c,v 1.5 2009/04/11 06:48:36 lukem Exp $
     $NetBSD: vfs_wapbl.c,v 1.48 2011/12/02 12:38:59 yamt Exp $
     $NetBSD: wapbl.c,v 1.5 2010/03/06 11:31:40 mlelstv Exp $
     $NetBSD: crtend.S,v 1.1 2010/08/07 18:01:34 joerg Exp $
     $NetBSD: crtn.S,v 1.1 2010/08/07 18:01:35 joerg Exp $

>Description:

When attempting to fsck an FFSv2 filesystem that had been mounted with "-o log"
but then not cleanly unmounted, fsck_ffs fails to find and read the journal if
the filesystem is on a device with a sector size of 4K:

    foundation# fsck /dev/mapper/rvgr00-wapbltest
    ** /dev/mapper/rvgr00-wapbltest

    CANNOT READ: BLK 8388608
    CONTINUE? [yn] y

    THE FOLLOWING DISK SECTORS COULD NOT BE READ: 67108864 (8388608), 67108865 
(8388608),
    Unrecognized wapbl magic: 0x00000000
    UNABLE TO READ JOURNAL FOR REPLAY
    CONTINUE? [yn] 

(This example is from a 4096 MiB FFSv2 filesystem located on a 4160
MiB LVM logical volume, which is ultimately resident on a RAID1
RAIDframe array of two 2TB Seagate Barracuda Green drives--which have
4K disk sectors.  But I've observed the same thing on filesystems
created on the same drive directly with no RAID or LVM.  Note also
that the only crashes I've experienced to trigger this have been by
killing a Xen DomU with xm destroy, but the problem doesn't appear
Xen-related; the DomUs, like the Dom0 above, are running recent
versions of -current for amd64.)

Block 8388608 is the correct location of the WAPBL journal in 512-byte blocks;
manually doing 

    dd if=/dev/mapper/rvgr00-wapbltest of=/tmp/whatever bs=512 count=131072 
skip=8388608

succeeds, and /tmp/whatever is not just zeroes.
        
>How-To-Repeat:
    - Create an FFSv2 filesystem on a device with 4K blocks (I have been 
leaving space 
        for a WAPBL journal after the filesystem but it seems unlikely that 
this is
        necessary.)
    - mount -o log /dev/{filesystem} /mnt
    - Crash
    - fsck /dev/{filesystem}

>Fix:

The problem appears to be that check_wapbl() in sbin/fsck_ffs/wapbl.c calls
wapbl_replay_start with addr=8388608 and blksize=512, as apparently specified 
in the
superblock:

(gdb) bt
#0  bread (fd=5, buf=0x7f7ff7b1e000 "", blk=8388608, size=1024)
    at /usr/src/sbin/fsck_ffs/utilities.c:342
#1  0x000000000041cf82 in wapbl_read (data=0x7f7ff7b1e000, len=1024, devvp=0x0, 
pbn=8388608)
    at /usr/src/sbin/fsck_ffs/wapbl.c:84
#2  0x000000000041c426 in wapbl_replay_start (wrp=0x624850, vp=0x0, 
off=8388608, 
    count=131072, blksize=512) at /usr/src/sys/kern/vfs_wapbl.c:2421
#3  0x000000000041d4c3 in check_wapbl () at /usr/src/sbin/fsck_ffs/wapbl.c:210
#4  0x0000000000411d68 in setup (dev=0x7f7ff7b01060 
"/dev/mapper/rvgr00-wapbltest", 
    origdev=0x7f7ff7b01060 "/dev/mapper/rvgr00-wapbltest")
    at /usr/src/sbin/fsck_ffs/setup.c:185
#5  0x0000000000408f35 in checkfilesys (
    filesys=0x7f7ff7b01060 "/dev/mapper/rvgr00-wapbltest", 
    origfs=0x7f7ff7b01060 "/dev/mapper/rvgr00-wapbltest", child=0)
    at /usr/src/sbin/fsck_ffs/main.c:306
#6  0x0000000000408de7 in main (argc=0, argv=0x7f7fffffdc60)
    at /usr/src/sbin/fsck_ffs/main.c:245
(gdb) fr 3
#3  0x000000000041d4c3 in check_wapbl () at /usr/src/sbin/fsck_ffs/wapbl.c:210
210                                     error = wapbl_replay_start(
(gdb) l
205                                     sbdirty();
206                                     ret = FSCK_EXIT_CHECK_FAILED;
207                                     break;
208                             }
209                             if (sblock->fs_flags & FS_DOWAPBL) {
210                                     error = wapbl_replay_start(
211                                         &wapbl_replay, 0, addr, count, 
blksize);
212                                     if (error) {
213                                             pfatal(
214                                                "UNABLE TO READ JOURNAL FOR 
REPLAY");
(gdb) p addr
$23 = 8388608
(gdb) p blksize
$24 = 512

But then wapbl_replay_start doesn't pass this information on to wapbl_read--it 
just
calls wapbl_read with the address in 512-byte blocks (as "pbn") without 
specifying
the blocksize:

(gdb) fr 2
#2  0x000000000041c426 in wapbl_replay_start (wrp=0x624850, vp=0x0, 
off=8388608, 
    count=131072, blksize=512) at /usr/src/sys/kern/vfs_wapbl.c:2421
2421            error = wapbl_read(scratch, 2<<log_dev_bshift, devvp, pbn);
(gdb) l
2416    
2417            pbn = logpbn;
2418    #ifdef _KERNEL
2419            pbn = btodb(pbn << log_dev_bshift);
2420    #endif
2421            error = wapbl_read(scratch, 2<<log_dev_bshift, devvp, pbn);
2422            if (error)
2423                    goto errout;
2424    
2425            wch = (struct wapbl_wc_header *)scratch;
(gdb) p pbn
$25 = 8388608

and the ultimate result is that bread tries to pread from a block way beyond 
the end of the
partition (i.e., 4096*pbn bytes into the partition instead of 512*pbn bytes), 
because
dev_bsize == 4096:

(gdb) fr 0
#0  bread (fd=5, buf=0x7f7ff7b1e000 "", blk=8388608, size=1024)
    at /usr/src/sbin/fsck_ffs/utilities.c:344
344             if ((pread(fd, buf, (int)size, offset) == size) &&
(gdb) l
339             int i, errs;
340             off_t offset;
341     
342             offset = blk;
343             offset *= dev_bsize;
344             if ((pread(fd, buf, (int)size, offset) == size) &&
345                 read_wapbl(buf, size, blk) == 0)
346                     return (0);
347             rwerror("READ", blk);
348             errs = 0;
(gdb) p offset
$37 = 34359738368
(gdb) p blk
$38 = 8388608
(gdb) p dev_bsize
$39 = 4096

I am not sufficiently familiar with this code to tell whether the cleanest 
solution
is to pass the blocksize further down, to recalculate pbn in terms of the larger
correct blocksize, to make sure the location of the log is stored in units that 
match
the device, or something else.

>Unformatted:
Prev by Date: Re: PR/45932 CVS commit: src/sys/dev/ic
Next by Date: PR/45932 CVS commit: src/sys/dev/isa
Previous by Thread: PR/45932 CVS commit: src/sys/dev/ic
Next by Thread: Re: bin/45933: fsck_ffs can't find wapbl journal on 4k-block disks
Indexes:
Home | Main Index | Thread Index | Old Index