On 5/29/12 4:16 PM, Roger Pau Monné wrote:
Well, I just want to provide a working combo of Xen 4.1 + xl + netbsd today, specifically with the file: scheme. Since xl doesn't support calling block-<custom_scheme> like xm does, this phy:<custom_type>: scheme is just a temporary workaround that was easy to implement.

In upstream Xen we have split some of these backend checking functions into two separate files, one for NetBSD and one for Linux, so we have a little more liberty to decide what we allow and what we deny. Also, I'm not convinced about using phy:vnd:.., since it will break the common Linux/NetBSD syntax. The right solution, from my point of view, seems to be to allow the user to define custom hotplug scripts in the guest config file, so you can decide how to handle the disk (keeping both the "phy:..." and "file:.." nomenclatures, which merely indicate the type of disk being used, not the backend used to attach it).
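To make the workaround syntax concrete, here is a minimal sketch of how a disk target in the phy:<custom_type>: form could be split into its parts. It is an illustration only: the function name parse_disk_target and the example paths are made up, and this is not how xl itself parses disk specifications.

```shell
#!/bin/sh
# Sketch only: split a disk target such as "phy:vnd:/path/to/img" into
# scheme, optional subtype and path. The "phy:<subtype>:" form is the
# temporary workaround discussed in this thread, not upstream xl syntax.
# parse_disk_target is a hypothetical helper name.
parse_disk_target() {
    target=$1
    scheme=${target%%:*}    # text before the first colon
    rest=${target#*:}       # text after the first colon
    subtype=
    case $scheme:$rest in
    phy:/*)
        # plain phy:/dev/... target, no subtype present
        ;;
    phy:*:*)
        subtype=${rest%%:*}
        rest=${rest#*:}
        ;;
    esac
    # Print the three fields separated by '|'; subtype may be empty.
    printf '%s|%s|%s\n' "$scheme" "$subtype" "$rest"
}
```

For example, parse_disk_target phy:vnd:/var/xen/disk.img yields the scheme phy, the subtype vnd and the path /var/xen/disk.img, while a plain file: or phy:/dev/... target comes back with an empty subtype.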
If I read it correctly, libxl_device.c:libxl__devices_destroy() (plural) waits for all of the devices of the domU to get destroyed, with a timeout of LIBXL_DESTROY_TIMEOUT, which is 10 seconds (see the (!force) block). I'm not sure what would keep xenbackendd from responding to a device state change from 4 to 5 for more than, say, 250 msec, even though there is no guarantee. It's true that theoretically it's racy, but in practice it works most of the time, which, just for running the cleanup logic, is better than not working at all.

2. Another problem with xl, which I suspect is netbsd dom0 specific, is that the block script doesn't get called after domU shutdown, so if you have vnd-based disks, vnconfig -u won't get called. This problem didn't appear with xl block-attach and xl block-detach, however. These changes are in patch-libxl_vbd_destroy_fixup. This patch might be readily submitted upstream to the Xen folks, but I don't know whether the intrusive file: support for netbsd and the unsanctioned phy:<subtype>: scheme in the previous patch file would be well received upstream.

This fix is racy: xl will destroy the backend xenstore entries when the backend reaches state 6, but xenbackendd needs those entries to disconnect the device. So although it might work, it is prone to errors, and I guess you have been very lucky that xenbackendd has always executed the scripts before xl deleted the backend entries. I'm working on this, and we are going to make libxl udev agnostic for 4.2. Hotplug scripts will be called from xl (the toolstack), so we can keep strict control over when they are executed and over the result of the execution. This change will also get rid of xenbackendd, and Linux and NetBSD will have the same mechanism for calling hotplug scripts.
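The wait-with-timeout idea discussed above can be sketched in shell. This is only an illustration of the pattern, not libxl's code: it polls once per second instead of using a xenstore watch, and the names wait_for_state and READ_STATE are invented here. READ_STATE defaults to xenstore-read but can be pointed at a stub so the loop can be tried without a running xenstore.

```shell
#!/bin/sh
# Sketch only: poll a backend's xenstore "state" node until it reaches a
# wanted value or a timeout expires. This polling loop is a stand-in for
# the wait described above, not what libxl actually does.
# READ_STATE is a hypothetical hook; it defaults to xenstore-read.
: "${READ_STATE:=xenstore-read}"

# wait_for_state PATH WANTED TIMEOUT_SECONDS
# Returns 0 once PATH/state equals WANTED, 1 if the timeout runs out first.
wait_for_state() {
    path=$1
    wanted=$2
    remaining=$3
    while :; do
        # A failed read (e.g. the node was already removed) counts as "no state".
        state=$($READ_STATE "$path/state" 2>/dev/null) || state=
        [ "$state" = "$wanted" ] && return 0
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
}
```

A caller would pass the backend path of the vbd, the state to wait for (6 in the discussion above) and a timeout such as the 10 seconds mentioned for LIBXL_DESTROY_TIMEOUT. Note that a loop like this still races with anything that removes the backend entries, which is the concern raised above.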
And just for kicks, at the very beginning of my block script I put a sleep of 60 seconds if the status is 6. It took a while until vnconfig -u was called, but eventually the vnd device was unconfigured. While I was waiting for the vnd device to go away, I also ran xsdump (a snippet from the Xen wiki) multiple times to look at /local/domain/0/backend, and the backend entry sticks around well after the domU is gone (no longer displayed by xl list) until, I guess, the block script calls xenstore-rm on the backend vbd entry of the domid. I'm guessing that there is some sort of ref counting going on. I'd be interested to find out why this actually works. FYI, this is what I put at the beginning of my block script for this experiment:
#!/bin/sh -e
case $2 in
6)
	sleep 60
	;;
esac

The reason that I put the changes in libxl_device.c:libxl__device_destroy() (singular device) is that it didn't handle the case where the device is already in state 5 at the beginning of the function. For some reason the netbsd dom0 xbdback does that during shutdown, e.g. when using shutdown -p now from the domU. I don't know whether the same occurs with the linux dom0 vbdback. At any rate, that caused the devices not to be waited upon for destruction.
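For anyone following the state numbers used in this thread (4, 5 and 6), the small helper below maps them to the names from Xen's public xenbus.h header. The function name xenbus_state_name is made up for this sketch; something like it could be dropped into a block script to make debug output easier to read.

```shell
#!/bin/sh
# Sketch only: translate a numeric xenbus state into its symbolic name
# (the XenbusState values from Xen's public io/xenbus.h header).
# xenbus_state_name is a hypothetical helper name.
xenbus_state_name() {
    case $1 in
    0) echo Unknown ;;
    1) echo Initialising ;;
    2) echo InitWait ;;
    3) echo Initialised ;;
    4) echo Connected ;;
    5) echo Closing ;;
    6) echo Closed ;;
    7) echo Reconfiguring ;;
    8) echo Reconfigured ;;
    *) echo "invalid($1)" ;;
    esac
}
```

In a block script the state arrives as the second argument, so xenbus_state_name "$2" would print Closed for the status 6 case used in the experiment above, and Closing for the state 5 that xbdback is already in during shutdown.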
Well, keep up the good work of getting xl in 4.2 to work better with netbsd...
Cheers, Toby