extending LVM logical volumes for Xen root partitions is NOT so simple!
TL;DR: lvresize on Xen domU root volumes requires re-writing the disk label!
So, during an upgrade of my main build server, which is a Xen domU with
LVM-backed filesystems, I decided to increase the size of its root
filesystem. I had tested this many months ago and all went well,
although the test was with a non-root filesystem on a throw-away domU.
In both the test and the upgrade I had done the fsck and the resize_ffs
from the dom0.
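For the record, the procedure from the dom0 was roughly the following
(approximate, reconstructed with the volume names shown later in this
message):

xentastic# lvm lvresize -L 21G vg0/lv20
xentastic# fsck_ffs -fy /dev/mapper/rvg0-lv20
xentastic# resize_ffs -y /dev/mapper/rvg0-lv20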
However, the first boot quite surprisingly dropped into single-user mode:
Mon Feb 11 01:31:05 PST 2019
Starting root file system check:
CANNOT READ: BLK 41569216
/dev/rxbd0a: UNEXPECTED INCONSISTENCY; RUN fsck_ffs MANUALLY.
Automatic file system check failed; help!
ERROR: ABORTING BOOT (sending SIGTERM to parent)!
[1] Terminated rc_real_work "${@}" 2>&1 |
Done(1) rc_postprocess
Enter pathname of shell or RETURN for /bin/sh:
The more verbose output from a manual attempt gives more clues:
future# fsck -n /dev/rxbd0a
** /dev/rxbd0a (NO WRITE)
CANNOT READ: BLK 41569216
CONTINUE? yes
THE FOLLOWING DISK SECTORS COULD NOT BE READ: 41569216, 41569217, 41569218, 41569219,
/dev/rxbd0a: CANNOT FIGURE OUT SECTORS PER CYLINDER
Basically I had doubled the size, and now it seems none of the
superblocks fsck wants to read can be read (from the domU). Notably,
the failing sector, 41569216, is beyond the old 20971520-sector size
recorded in the label, but well within the resized volume.
Yesterday I finally figured out that it must be due to one kernel or
another (likely the domU's) believing the original disklabel, which I'm
guessing was written by sysinst during the first install of the system.
My test of expanding an LVM-backed filesystem had been on a
non-sysinst-created filesystem, and I have not been putting any labels
on any of the filesystem devices added after boot. Without a label on
disk, nothing restricts access to the whole logical volume, and all is
well in the domU after the filesystem has been resized to match the LV.
What's confusing is that despite the disklabel appearing in the dom0,
and appearing identical to how it appears in the domU, the dom0 system
completely ignores it and just gets on with things. So I'm not sure
whether it is the domU kernel, the dom0 device mapper or backend
device, or something else that is restricting reads to the original
device size; but given that the dom0 has full access to the whole
resized logical volume, it is most likely the domU driver that has read
the on-disk label and used it.
And thus I'm hoping that it is the on-disk disklabel that is setting
this limit (since I can't find any other source of the old size still in
the dom0). (I haven't tried looking at the related kernel code -- the
last time I read that code I had too many urges to "fix" it! :-))
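One way to compare notes is to read the label both ways in the domU --
plain disklabel shows the kernel's in-core copy, while -r reads the raw
on-disk one:

future# disklabel xbd0       # in-core label, as the xbd driver sees it
future# disklabel -r xbd0    # label as actually written on the disk

If the in-core copy shows the old total-sector count then it was almost
certainly taken from the on-disk label.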
The disklabel (on disk, as seen from the dom0) is as follows:
xentastic# disklabel -r /dev/mapper/rvg0-lv20
# /dev/mapper/rvg0-lv20:
type: unknown
disk: future root00
label:
flags:
bytes/sector: 512
sectors/track: 2048
tracks/cylinder: 1
sectors/cylinder: 2048
cylinders: 10240
total sectors: 20971520
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0 # microseconds
track-to-track seek: 0 # microseconds
drivedata: 0
16 partitions:
# size offset fstype [fsize bsize cpg/sgs]
a: 20971520 0 4.2BSD 2048 16384 0 # (Cyl. 0 - 10239)
c: 20971520 0 unused 0 0 # (Cyl. 0 - 10239)
d: 20971520 0 unused 0 0 # (Cyl. 0 - 10239)
As you can see, the label still says 10 GiB, while the new logical
volume is at 21 GiB:
xentastic# lvm lvdisplay -v vg0/lv20
Using logical volume(s) on command line
--- Logical volume ---
LV Name /dev/vg0/lv20
VG Name vg0
LV UUID xguHwv-fP4f-2cO0-SLzy-pNye-IAD6-H01gsB
LV Write Access read/write
LV Status available
# open 0
LV Size 21.00 GiB
Current LE 5376
Segments 2
Allocation inherit
Read ahead sectors auto
- currently set to 0
Block device 169:8
xentastic# dmsetup -v status /dev/mapper/vg0-lv20
Name: vg0-lv20
State: ACTIVE
Read Ahead: 0
Tables present: LIVE
Open count: 0
Event number: 0
Major, minor: 169, 8
Number of targets: 2
0 20971520 linear
20971520 23068672 linear
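(Checking the arithmetic: the label's 20971520 sectors times 512 bytes
is exactly 10 GiB, while the two device-mapper segments above total
20971520 + 23068672 = 44040192 sectors, i.e. exactly 21 GiB, matching
the LV size.)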
My swap partition has a label saved to disk too, though I'm not sure how
that happened, and it appears to be a copy of the fictitious label:
future# disklabel -r xbd1
# /dev/rxbd1d:
type: ESDI
disk: Xen Virtual ESDI
label: fictitious
flags:
bytes/sector: 512
sectors/track: 2048
tracks/cylinder: 1
sectors/cylinder: 2048
cylinders: 16384
total sectors: 33554432
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0 # microseconds
track-to-track seek: 0 # microseconds
drivedata: 0
4 partitions:
# size offset fstype [fsize bsize cpg/sgs]
a: 33554432 0 swap # (Cyl. 0 - 16383)
d: 33554432 0 unused 0 0 # (Cyl. 0 - 16383)
Given the current and normally low rate of change on my system's root
filesystem (it has a separate /var, etc.), and the fact that it fscks
fine from the dom0, I've gone ahead and brought the domU system up
multi-user without any problem. However, I want to do another upgrade
on it, and that'll churn the filesystem, so I want to fix this issue
first.
My instinct is to just replace the label with a block of zeros (after
making a backup of it in the dom0, of course). However, I'm unsure
whether any tools might be assuming a root disklabel exists (e.g. does
sysinst make use of it for an upgrade?).
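If I do go that route, something like the following from the dom0
should do it -- just a sketch, assuming the x86 convention that the
label lives in sector 1 of the device (check LABELSECTOR for your
port), and with a hypothetical backup file name:

xentastic# dd if=/dev/mapper/rvg0-lv20 of=/root/lv20-head.bak bs=512 count=2
xentastic# dd if=/dev/zero of=/dev/mapper/rvg0-lv20 bs=512 seek=1 count=1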
Perhaps, though, the best option is to re-write the label with a newly
minted copy of what would be the fictitious label. (I think I could do
that by blanking the label, then using disklabel to read the fictitious
label from the driver, and then writing it back to the disk.)
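That would be something like the following in the domU, probably after
a reboot so the driver forgets the old in-core label (a sketch only;
disklabel's output is accepted back as a protofile by -R, and -r makes
it write the on-disk label too):

future# disklabel xbd0 > /tmp/xbd0.proto
future# disklabel -R -r xbd0 /tmp/xbd0.proto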
I'd try one of those right now, but I'm typing this message on that
system.
Ideally I would like to see the OS handle all these issues
automatically somehow, though it wouldn't be the end of the world if
another step were required to resize a root filesystem.
However, if another step is always going to be needed, then it should
not require the admin to calculate, or even copy, any value to
facilitate it. I.e. one should not have to copy the value for the new
size into the disklabel manually -- a new option to disklabel(8), or
some new tool, should do that automatically (i.e. if there's only one
non-whole-disk partition on the disk, and it is the same size as the
whole-disk partition, then the size of both partitions should be
adjusted to match the new size of the logical volume). Perhaps there
could be a new option to resize_ffs to have it call disklabel to fix
the label, thus reducing the number of actions required of the admin
and making it very easy to set up boot-time scripts for cloud hosting
that would automatically resize (or at least grow) all filesystems
based on the current size of their backing stores, as in the sketch
below.
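Such a boot-time script might look something like this rough, untested
sketch. It assumes the kernel's idea of the total disk size already
reflects the resized backing store, that the label has the simple
offset-0 layout shown above, and that it runs before the root
filesystem is mounted read-write; the names are all hypothetical:

#!/bin/sh
# Grow every offset-0 partition in the label to the full device size,
# then grow the FFS in partition 'a' to fill it.
disk=xbd0
proto=/tmp/${disk}.proto

disklabel ${disk} > ${proto}
total=$(awk '/^total sectors:/ {print $3}' ${proto})

# Crude edit of the size column for offset-0 partitions; a real tool
# would parse and validate the label properly.
sed -E "s/^( *[a-p]: +)[0-9]+( +0 )/\1${total}\2/" ${proto} > ${proto}.new
disklabel -R -r ${disk} ${proto}.new

resize_ffs -y /dev/r${disk}a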
--
Greg A. Woods <gwoods%acm.org@localhost>
+1 250 762-7675 RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost> Avoncote Farms <woods%avoncote.ca@localhost>