Port-xen archive


Re: NetBSD 10.0 RC1 PVH and XenServer/XCP



On Mon, 19 Aug 2024, Manuel Bouyer wrote:
On Thu, Aug 15, 2024 at 01:46:24PM +0100, Stephen Borrill wrote:
On Fri, 17 Nov 2023, Stephen Borrill wrote:

On Fri, 17 Nov 2023, Stephen Borrill wrote:
On Tue, 14 Nov 2023, Manuel Bouyer wrote:
On Tue, Nov 14, 2023 at 09:55:55AM +0000, Stephen Borrill wrote:
On Tue, 14 Nov 2023, Stephen Borrill wrote:
On Mon, 13 Nov 2023, Brian Buhrow wrote:
Hello. If you place an uncompressed copy of netbsd-INSTALL.gz from the
kernels directory of a release build as /netbsd on the ISO image that you
boot from, do you then find you can boot and then format and populate the
xbd disk that shows up? I've done this to build NetBSD systems under
hosted VM offerings, specifically Linode's.

I expect this would get the installer going, but without the cd being
accessible, I would have to resort to trying to fetch the sets over the
network. Lack of access to the ISO library may cause problems later too.

In PVH mode, the cdrom device is present in the xenstore at all times,
but its status does not change when an ISO image is inserted:

# xenstore-read domid
11
# xenstore-ls /local/domain/20/device/vbd
[snip]
5696 = ""
backend = "/local/domain/0/backend/vbd3/20/5696"
backend-id = "0"
device-type = "cdrom"
state = "1"
virtual-device = "5696"

Compare this to PV mode. The vbd only appears when the ISO image is
inserted, but looks like this:

# xenstore-read domid
11
# xenstore-ls /local/domain/11/device/vbd
[snip]
51760 = ""
backend = "/local/domain/0/backend/vbd3/11/51760"
backend-id = "0"
device-type = "cdrom"
event-channel = "19"
protocol = "x86_64-abi"
ring-ref = "494"
state = "4"
virtual-device = "51760"

From this it is clear the PVH device is not properly initialised and a
clue to this is found in dmesg:
xenbus0: ignoring device/vbd/5696 type cdrom
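
For readers comparing the two listings above: the numeric "state" values follow the XenbusState enum in Xen's public io/xenbus.h header, where 1 is Initialising and 4 is Connected. The helpers below are a small sketch for decoding them; the function names xenbus_state_name and vbd_state are made up for illustration and are not part of any Xen tool.

```shell
# Map a numeric xenbus "state" value to its name, following the
# XenbusState enum in Xen's public header io/xenbus.h.
xenbus_state_name() {
    case "$1" in
        0) echo "Unknown" ;;
        1) echo "Initialising" ;;
        2) echo "InitWait" ;;
        3) echo "Initialised" ;;
        4) echo "Connected" ;;
        5) echo "Closing" ;;
        6) echo "Closed" ;;
        7) echo "Reconfiguring" ;;
        8) echo "Reconfigured" ;;
        *) echo "invalid" ;;
    esac
}

# Print the first state value found in xenstore-ls style output on stdin.
# Leading indentation is tolerated because xenstore-ls indents nested keys.
vbd_state() {
    sed -n 's/^ *state = "\([0-9][0-9]*\)"$/\1/p' | head -n 1
}
```

Something like `xenbus_state_name "$(xenstore-ls /local/domain/11/device/vbd | vbd_state)"` would then print the state name for the first vbd listed.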

Comparing to a Linux PVH VM, I can see my logic isn't quite right. The
device, as listed in the xenstore, is identical to that found on NetBSD.
Nothing changes in the xenstore as a result of inserting an ISO.

The device is explicitly skipped:
https://nxr.netbsd.org/xref/src/sys/arch/xen/xenbus/xenbus_probe.c#455

Is this because xbd doesn't have the concept of 'not ready'?

Mainly because (as the commit log says) the emulated device isn't
disabled so the virtual cd would show up as both cd0 and an xbd instance.

There is no cd0, but I think the virtual cd shows up as the broken wd0:

wd0 at atabus0 drive 0
wd0: <ST506>
wd0: 69632 KB, 1024 cyl, 8 head, 17 sec, 512 byte/sect x 139264 sectors
wd0d: error reading fsbn 0 (wd0 nb 0; cn 0 tn 0 sn 0), xfer 30, retry 0
wd0: (aborted command)

I can confirm that commenting out the code that skips the type=cdrom
devices does lead to a hang as described in the commit message, but
ONLY if the virtual cd drive is empty, i.e. not ready. If the
machine is booted with an ISO inserted, it boots just fine and
mount_cd9660 /dev/xbd1a /mnt works as expected. So the problem is
more how xbd(4) copes with not ready devices.

And to back this up, if I umount /mnt, then eject the ISO from the
virtual cd drive, and try to run mount_cd9660 /dev/xbd1a /mnt again, I
get the permanent hang rather than ENODEV that you'd get with an empty
cd0:

mount_cd9660: /dev/cd0a on /mnt: Operation not supported by device

However I think that in the same way that the emulated NIC is masked
when the PV NIC is present, the virtual cd should be presented as an
xbd device.

The change to explicitly ignore cdrom devices has broken pure PV operation.
When an ISO image is inserted, no xbd device appears, just the "xenbus0:
ignoring device/vbd/5696 type cdrom" message. There is no emulated cd0 with
PV.
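
To check whether a given guest is affected, the ignored devices can be picked out of dmesg. This is a sketch that assumes the message keeps exactly the format quoted above; ignored_vbds is a hypothetical helper name.

```shell
# Print "<vbd-id> <type>" for every vbd that xenbus reported ignoring.
# Reads dmesg-style text on stdin and matches only the message format
# quoted in this thread; other kernel versions may word it differently.
ignored_vbds() {
    sed -n 's|^xenbus[0-9]*: ignoring device/vbd/\([0-9][0-9]*\) type \([a-z]*\)$|\1 \2|p'
}
```

It would typically be run as `dmesg | ignored_vbds`.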

I still consider ignoring the devices to be incorrect, as it's just a quick
hack instead of fixing the root problem of not ready devices hanging.
However, for the moment I propose the following change. If it's OK, I'll
commit.

It's not OK as not ready devices will still hang (and it doesn't help for
the PVH case). More tests needed, and yes probably a better fix is needed too.

This is explicitly only for the PV case as the netbsd-10 and -current behaviour is a regression over all previous versions.

Yes, the better fix would be to solve the hang when devices are not ready as that is the real problem here (for non-PV guests).

Here's what happens with Linux when the device is not ready on a non-PV guest:

# mount /dev/sr0 /mnt
mount: /mnt: no medium found on /dev/sr0.
# echo $?
32

As a workaround, don't use type=cdrom.

That's not an option with XenServer/XCP-NG.

If I remember properly, there is an issue with HVM: the cd device appears as
both emulated and PV, and AFAIK there's no way to disable the emulated
device the same way it's done for drives. So at least for HVM guests the
cdrom type has to be ignored (this is what Linux does).

Yes, I get that, but my patch does not change behaviour for HVM or PVH (or if it does, it's not intended!).
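
The patch itself is not reproduced in this message. Purely as an illustration of the policy being argued for (attach type=cdrom vbds on PV guests only, keep ignoring them on HVM and PVH), the decision can be written out as below. This is not the kernel change; should_attach_vbd and its mode strings are hypothetical, and the assumption that non-cdrom vbds always attach is mine.

```shell
# Illustration only: a hypothetical helper, not NetBSD kernel code.
# Usage: should_attach_vbd <guest-mode> <device-type>
# guest-mode is one of pv, pvh, hvm; returns 0 (attach) or 1 (ignore).
should_attach_vbd() {
    mode=$1
    type=$2
    if [ "$type" != "cdrom" ]; then
        return 0    # assumption: ordinary disks attach as xbd in every mode
    fi
    case "$mode" in
        pv) return 0 ;;   # PV has no emulated cd, so xbd is the only path
        *)  return 1 ;;   # HVM/PVH: cdrom vbds stay ignored, as they are now
    esac
}
```

Note that this does nothing about the underlying hang on not ready devices, which is the objection raised earlier in the thread.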

--
Stephen

