Subject: Re: ffs compatibility added, fsck may complain
To: William Allen Simpson <wsimpson@greendragon.com>
From: Darrin B. Jewell <dbj@netbsd.org>
List: current-users
Date: 03/14/2004 11:15:29
I've been meaning to add an option to fsck to downgrade the
filesystem, but even that won't completely help your blind upgrade
problem.  The real answer is that booting a broken -current kernel in
a blind upgrade situation is dangerous, but that happened months
ago, and we're past that already.

So, my current best guess as to a blind upgrade path would
be something like:

Modify a -current fsck_ffs so that it does not try to
remount a filesystem after it has repaired it.  The following
patch should do this:

--- src/sbin/fsck_ffs/main.c.~1.49.~      Sat Jan 17 17:17:07 2004
+++ src/sbin/fsck_ffs/main.c      Sun Mar 14 11:13:24 2004
@@ -373,7 +373,7 @@
                pwarn("\n***** FILE SYSTEM WAS MODIFIED *****\n");
        if (rerun)
                pwarn("\n***** PLEASE RERUN FSCK *****\n");
-       if (hotroot()) {
+       if (0 && hotroot()) {
                struct statfs stfs_buf;
                /*
                 * We modified the root.  Do a mount update on

Compile that fsck_ffs on your existing system by cd'ing
into src/sbin/fsck_ffs and running:
  make USETOOLS=no DESTDIR=/
  cp ./fsck_ffs /root/fsck_ffs.repair
  make USETOOLS=no DESTDIR=/ cleandir

Compile a -current kernel.
Stop all processes you can without removing access to the machine.
Unmount all possible filesystems except for / and /usr.
Copy the new kernel into place.
For the root and /usr filesystems, downgrade the mount to read-only.
If the mount downgrades fail, don't try to force them.  Put
the old kernel back in place and look around for processes that
are still writing the disk.
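
The read-only downgrade could be scripted roughly as follows.  This is
an untested sketch, not part of the procedure itself: downgrade_ro and
the RUN hook are hypothetical names, and it assumes that "mount -u -r"
is how your system requests a read-only update of an existing mount.
Setting RUN=echo prints the commands instead of running them.

```sh
# Sketch only: downgrade each named filesystem to read-only, never forcing.
# RUN is a hypothetical dry-run hook; leave it empty to really run mount.
RUN=${RUN:-}

downgrade_ro() {
    for fs in "$@"; do
        # -u updates an existing mount, -r asks for read-only.
        # No -f here: a failed downgrade must not be forced.
        if ! $RUN mount -u -r "$fs"; then
            echo "downgrade of $fs failed; put the old kernel back" >&2
            return 1
        fi
    done
}

# Example (dry run):  RUN=echo; downgrade_ro /usr /
```

The function stops at the first failure so that nothing further is
touched while you look for processes that are still writing.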

Wave a few dead chickens.  Type sync.  Wait a few seconds.  Type it
again.  It should return immediately.  If you can, verify that sync
does not introduce any disk activity.  iostat -x can be useful for
this.

Verify that the filesystems are clean and up to date on disk first
by running
  /root/fsck_ffs.repair -n -f -b 16 -c 3
If this reports any required changes to the filesystems, stop.
Re-mount the filesystem read-write and put the running
kernel back in place.  Ask here about how to proceed before
continuing with the upgrade.
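
If there are several filesystems, the check could be wrapped in a small
loop like the one below.  This is an untested sketch: verify_clean and
the FSCK variable are hypothetical names, the device names in the
example line are placeholders, and it assumes that the repair binary
exits non-zero when it finds changes it would have made.  Reading the
output by eye remains the safer test.

```sh
# Sketch only: run the no-write check on each raw device and stop at the
# first one that reports problems.
FSCK=${FSCK:-/root/fsck_ffs.repair}

verify_clean() {
    for dev in "$@"; do
        # -n answers "no" to every question, so nothing is written.
        if ! "$FSCK" -n -f -b 16 -c 3 "$dev"; then
            echo "$dev reports required changes; stop here" >&2
            return 1
        fi
    done
    echo "all filesystems clean"
}

# Example (placeholder devices):  verify_clean /dev/rwd0a /dev/rwd0e
```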

Upgrade the filesystems by running:
  /root/fsck_ffs.repair -b 16 -c 4
on the raw devices of your filesystems.

Be gentle while doing this.  You don't want the old kernel
to try to access any newly upgraded superblocks.  Even a
read-only access may cause a panic.  Hopefully, for filesystems
which are already mounted read-only, it will not need to
go back to disk to re-read the superblock.

Reboot.

I would test this upgrade path on systems that are not
in a blind upgrade situation first.  I have not tested
this upgrade path.

Good luck.

Darrin

William Allen Simpson <wsimpson@greendragon.com> writes:

> I never saw an answer to Perry (and my and I'm sure many others) 
> problem with blind updating co-lo space to more recent -current:
> 
> "Perry E. Metzger" wrote:
> > 
> > Also, the situation is REALLY unfortunate. It means that you're going
> > to end up with machines mysteriously failing on people without much
> > recourse in the field if you don't happen to remember the cure. Also,
> > people needing to blind upgrade boxes in colos will get screwed -- I'm
> > one of those.
> > 
> > Is there any way to either get the kernel to fix this for you during
> > boot, or to provide a way to fix it in advance so that fsck doesn't
> > fail during reboot? This is actually pretty important.
> > 
> I'm trying to get ready to test -current in preparation for 2.0, but 
> I'm not sure that everything will be hunky-dory after simply installing 
> a new kernel, reboot, tar zxpf base.tgz et alia, reboot.
> 
> As Perry suggests, is there a way to fix it in advance?
> -- 
> William Allen Simpson
>     Key fingerprint =  17 40 5E 67 15 6F 31 26  DD 0D B9 9B 6A 15 2C 32