current-users: Re: fsck problems

Subject: Re: fsck problems
To: None <current-users@sun-lamp.cs.berkeley.edu>
From: Juergen Keil <jk@tools.de>
List: current-users
Date: 07/27/1994 14:37:32
In article <Pine.3.89.9407270926.M301-0100000@newdaisy.ee.und.ac.za>
barrett@daisy.ee.und.ac.za (Alan Barrett) writes:

> I have been having trouble with fsck for several weeks now.  My root
> partition is in a state that the current fsck is unable to fix, and I
> don't like that.  Sometimes fsck core dumps with SIGSEGV, and sometimes
> it doesn't.  On the occasions that it doesn't coredump, it leaves the
> disk in an inconsistent state, which sometimes leads to kernel panics
> (something about "mangled directory").  An old fsck from -0.9 also
> SIGSEGV's.

I had similar problems when upgrading to -current last week. The
affected filesystem still uses the old inode format. I found two
bugs in fsck:

1. On a filesystem using the old inode format, fsck always
   generates new directory entries (i.e. in lost+found) in the
   new directory record format!

2. If a directory is corrupted immediately after the '..' entry (e.g.
   because of 1.), fsck crashes with a segmentation violation. The cause
   is a directory record with d_reclen > DIRBLKSIZ. The crash happens in
   dirscan, where the damaged directory record is copied into a local
   variable of exactly DIRBLKSIZ bytes.

   The record with d_reclen > DIRBLKSIZ is produced in 'fsck_readdir',
   which can be called recursively via dofix->direrror->fileerror->
   getpathname->...

   fsck_readdir increments the '..' entry's d_reclen field twice by
   the amount of bogus data that follows the '..' entry in the
   directory block.

I suggest the following patch, which should fix both problems.

===================================================================
RCS file: /home/cvs.kurt/NetBSD/sbin/fsck/dir.c,v
retrieving revision 1.1.1.3
diff -c -r1.1.1.3 dir.c
*** 1.1.1.3	1994/07/14 12:28:45
--- dir.c	1994/07/21 19:11:32
***************
*** 190,202 ****
--- 190,214 ----
  	ndp = (struct direct *)(bp->b_un.b_buf + idesc->id_loc);
  	if (idesc->id_loc < blksiz && idesc->id_filesize > 0 &&
  	    dircheck(idesc, ndp) == 0) {
+ 		long fixed_reclen;
+ 
  		size = DIRBLKSIZ - (idesc->id_loc % DIRBLKSIZ);
  		idesc->id_loc += size;
  		idesc->id_filesize -= size;
+ 		fixed_reclen = dp->d_reclen + size;
  		fix = dofix(idesc, "DIRECTORY CORRUPTED");
  		bp = getdirblk(idesc->id_blkno, blksiz);
  		dp = (struct direct *)(bp->b_un.b_buf + dploc);
+ #if	0
  		dp->d_reclen += size;
+ #else
+ 		/*
+ 		 * dofix above might scan over our broken directory record
+ 		 * and fix its size, too.  The old code increments
+ 		 * dp->d_reclen twice.
+ 		 */
+ 		dp->d_reclen = fixed_reclen;
+ #endif
  		if (fix)
  			dirty(bp);
  	}
***************
*** 336,341 ****
--- 348,367 ----
  	dirp->d_reclen = newent.d_reclen;
  	dirp->d_namlen = newent.d_namlen;
  	bcopy(idesc->id_name, dirp->d_name, (size_t)dirp->d_namlen + 1);
+ 
+ 	/* 
+ 	 * 'dirscan' will eventually swap the original dirp back to the
+ 	 * old inode format.  Handle the new dirent here.
+ 	 */
+ #	if (BYTE_ORDER == LITTLE_ENDIAN)
+ 		if (!newinofmt) {
+ 			u_char tmp;
+ 
+ 			tmp = dirp->d_namlen;
+ 			dirp->d_namlen = dirp->d_type;
+ 			dirp->d_type = tmp;
+ 		}
+ #	endif
  	return (ALTERED|STOP);
  }
  
-- 
Juergen Keil          jk@tools.de ...!{uunet,mcsun}!unido!tools!jk

