tech-kern archive


Re: Support for ramdisks in PVH boot



Hi tech-kern@,

On Mon, 31 Mar 2025 04:40:30 -0000 (UTC), Pierre Pronchery wrote:

> This post is related to iMil's recent work on PVH support for
> NetBSD/amd64.
> I was unable to use his work to boot on ramdisks directly with QEMU's
> -initrd flag, when using -kernel.
> 
> Well, after a deep dive into it, I think I am almost there:
> https://git.edgebsd.org/gitweb/?p=src.git;a=commitdiff;h=629621f41089af50584214a4d32b50ae8ee414f2
> 
> This patch:
> - extends sys/arch/amd64/amd64/genassym.cf for additional knowledge of
>   Xen's hvm_start_info (notably nr_modules and modlist_paddr)
> - extends .start_genpvh in locore.S to copy the module entries, and their
>   respective command lines and contents
> - teaches x86_machdep.c to load Xen modules when running as a
>   VM_GUEST_GENPVH guest
> 
> The code is not working yet unfortunately.

Well, now it does, with MICROVM on an Intel macOS host:

> $ qemu-system-x86_64 -m 512 -accel hvf -display none -serial stdio \
>   -M microvm,rtc=off,acpi=off,pic=off -kernel netbsd-MICROVM \
>   -append "console=com rw -v" -initrd ramdisk-cgdroot.fs \
>   -action reboot=shutdown \
>   -D qemu.log -d cpu_reset,in_asm,guest_errors,unimp \
>   -device virtio-blk-device,drive=hd0 \
>   -drive file=ld0.img,format=raw,id=hd0
> qemu-system-x86_64: warning: host doesn't support requested feature: CPUID.80000001H:ECX.svm [bit 2]
> [   1.0000000] WARNING: system needs entropy for security; see entropy(7)
> [   1.0000000] [ Kernel symbol table missing! ]
> [   1.0000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003,
> [   1.0000000]     2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013,
> [   1.0000000]     2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023,
> [   1.0000000]     2024, 2025
> [   1.0000000]     The NetBSD Foundation, Inc.  All rights reserved.
> [   1.0000000] Copyright (c) 1982, 1986, 1989, 1991, 1993
> [   1.0000000]     The Regents of the University of California.  All rights reserved.
> 
> [   1.0000000] NetBSD 10.99.12 (MICROVM) #0: Wed Apr  9 08:52:24 UTC 2025
> [   1.0000000]  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/MICROVM
> [   1.0000000] total memory = 511 MB
> [   1.0000000] avail memory = 480 MB
> [   1.0000000] KERNBASE=0xffffffff80000000
> [   1.0000000] modlist_paddr=0xffffffff80a00038 cmdline_paddr=0xffffffff80ee2075 cmdline="console=com rw -v virtio_mmio.device=512@0xfeb00e00:12"
> [   1.0000000] Xen module info at boot (0xffffffff80a00038, 1)
> [   1.0000000] timecounter: Timecounters tick every 10.000 msec
> [   1.0000000] mainbus0 (root)
> [   1.0000000] mainbus0: Intel MP Specification (Version 1.4) (QBOOT    000000000000)
> [   1.0000000] cpu0 at mainbus0 apid 0
> [   1.0000000] cpu0: Use lfence to serialize rdtsc
> [   1.0000000] cpu0: QEMU Virtual CPU version 2.5+, id 0x60fb1
> [   1.0000000] cpu0: node 0, package 0, core 0, smt 0
> [   1.0000000] mpbios: bus 0 is type ISA   
> [   1.0000000] ioapic0 at mainbus0 apid 2: pa 0xfec00000, version 0x20, 24 pins
> [   1.0000000] isa0 at mainbus0
> [   1.0000000] com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, 16-byte FIFO
> [   1.0000000] com0: console
> [   1.0000000] allocated pic ioapic0 type edge pin 4 level 8 to cpu0 slot 0 idt entry 129
> [   1.0000000] pv0 at mainbus0
> [   1.0000000] virtio0 at pv0
> [   1.0000000] virtio0: kernel parameters: console=com rw -v virtio_mmio.device=512@0xfeb00e00:12
> [   1.0000000] virtio0: viommio: 512@0xfeb00e00:12
> [   1.0000000] virtio0: VirtIO-MMIO-v1
> [   1.0000000] virtio0: block device (id 2, rev. 0x00)
> [   1.0000000] ld0 at virtio0: features: 0x10002e54<INDIRECT_DESC,DISCARD,CONFIG_WCE,TOPOLOGY,FLUSH,BLK_SIZE,GEOMETRY,SEG_MAX>
> [   1.0000000] ld0: Unknown SIZE_MAX, assuming 65536
> [   1.0000000] ld0: max 254 segs of max 65536 bytes
> [   1.0000000] virtio0: allocated 4227072 byte for virtqueue 0 for I/O request, size 1024
> [   1.0000000] virtio0: using 4194304 byte (262144 entries) indirect descriptors
> [   1.0000000] allocated pic ioapic0 type level pin 12 level 6 to cpu0 slot 1 idt entry 96
> [   1.0000000] virtio0: interrupting on -1
> [   1.0000000] ld0: 1953 MB, 3968 cyl, 16 head, 63 sec, 512 bytes/sect x 4000000 sectors
> [   1.0000000] virtio1 at pv0
> [   1.0000000] timecounter: Timecounter "lapic" frequency 1046204000 Hz quality -100
> [   1.0000000] timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
> [   1.0000030] timecounter: Timecounter "TSC" frequency 2410445480 Hz quality -100
> [   1.0000030] boot device: ld0
> [   1.0000030] md0: internal 5000 KB image area
> [   1.0000030] root on md0a dumps on md0b
> [   1.0000030] root file system type: ffs
> [   1.0000030] kern.module.path=/stand/amd64/10.99.12/modules
> [   1.0100030] WARNING: no TOD clock present
> [   1.0100030] WARNING: using filesystem time
> [   1.0100030] WARNING: CHECK AND RESET THE DATE!
> [   1.0100030] warning: no /dev/console
> Created tmpfs /dev (1835008 byte, 3552 inodes)
> Could not mount the boot partition
> erase ^?, werase ^W, kill ^U, intr ^C
> This image contains utilities which may be needed
> to get you out of a pinch.
> # 

Your help in reviewing this work before committing will be very welcome!
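For reviewers, here is a standalone C sketch of the two structures that the new genassym.cf entries describe. The layouts are copied for illustration from Xen's public header arch-x86/hvm/start_info.h (the kernel itself uses the real header), and the C11 static assertions pin down the offsets the new locore.S code relies on, assuming a conventional x86 ABI. print_pvh_assym() is a hypothetical helper, not part of the patch. Note that MODLIST_ENTRY_SIZE is the offset of the size field (8), while HVM_MODLIST_ENTRY_SIZE is sizeof the whole entry (32); the two names are easy to mix up. The 56-byte hvm_start_info is also consistent with modlist_paddr=0xffffffff80a00038 in the boot log, presumably because __kernel_end is page aligned and the modlist array is placed right after hvm_start_info.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative copy of the PVH start-of-day structures from Xen's public
 * header arch-x86/hvm/start_info.h; the kernel uses the real header.
 */
struct hvm_start_info {
	uint32_t magic;		/* XEN_HVM_START_MAGIC_VALUE (0x336ec578) */
	uint32_t version;
	uint32_t flags;
	uint32_t nr_modules;	/* number of entries at modlist_paddr */
	uint64_t modlist_paddr;	/* paddr of struct hvm_modlist_entry[] */
	uint64_t cmdline_paddr;	/* paddr of the kernel command line */
	uint64_t rsdp_paddr;
	uint64_t memmap_paddr;	/* version 1 and later only */
	uint32_t memmap_entries;
	uint32_t reserved;
};

struct hvm_modlist_entry {
	uint64_t paddr;		/* paddr of the module (e.g. the -initrd image) */
	uint64_t size;		/* module size in bytes */
	uint64_t cmdline_paddr;	/* paddr of the module command line, or 0 */
	uint64_t reserved;
};

/* Values the new genassym.cf entries are expected to evaluate to. */
static_assert(offsetof(struct hvm_start_info, nr_modules) == 12,
    "START_INFO_NR_MODULES");
static_assert(offsetof(struct hvm_start_info, modlist_paddr) == 16,
    "START_INFO_MODLIST_PADDR");
static_assert(sizeof(struct hvm_modlist_entry) == 32,
    "HVM_MODLIST_ENTRY_SIZE");
static_assert(offsetof(struct hvm_modlist_entry, paddr) == 0,
    "MODLIST_ENTRY_PADDR");
static_assert(offsetof(struct hvm_modlist_entry, size) == 8,
    "MODLIST_ENTRY_SIZE");
static_assert(offsetof(struct hvm_modlist_entry, cmdline_paddr) == 16,
    "MODLIST_ENTRY_CMDLINE");

/* Print the constants for comparison with the kernel build's output. */
void
print_pvh_assym(FILE *fp)
{
	fprintf(fp, "START_INFO_NR_MODULES %zu\n",
	    offsetof(struct hvm_start_info, nr_modules));
	fprintf(fp, "START_INFO_MODLIST_PADDR %zu\n",
	    offsetof(struct hvm_start_info, modlist_paddr));
	fprintf(fp, "HVM_MODLIST_ENTRY_SIZE %zu\n",
	    sizeof(struct hvm_modlist_entry));
	fprintf(fp, "MODLIST_ENTRY_CMDLINE %zu\n",
	    offsetof(struct hvm_modlist_entry, cmdline_paddr));
}
```

A layout mismatch fails at compile time; calling print_pvh_assym(stdout) from a small test program shows the values.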

The patch:

From caa038822350a7f30a7975dc29386c052dca32de Mon Sep 17 00:00:00 2001
From: Pierre Pronchery <khorben@EdgeBSD.org>
Date: Mon, 31 Mar 2025 04:36:00 +0200
Subject: [PATCH] amd64: add support for -initrd with VM_GUEST_GENPVH

Tested on NetBSD/amd64
---
 sys/arch/amd64/amd64/genassym.cf |  6 +++
 sys/arch/amd64/amd64/locore.S    | 65 +++++++++++++++++++++++++++++---
 sys/arch/amd64/conf/MICROVM      |  4 ++
 sys/arch/x86/x86/x86_machdep.c   | 32 ++++++++++++++++
 4 files changed, 102 insertions(+), 5 deletions(-)

diff --git a/sys/arch/amd64/amd64/genassym.cf b/sys/arch/amd64/amd64/genassym.cf
index d8f31cd51a22..c93c79ffb32c 100644
--- a/sys/arch/amd64/amd64/genassym.cf
+++ b/sys/arch/amd64/amd64/genassym.cf
@@ -384,6 +384,12 @@ define SIR_XENIPL_HIGH		SIR_XENIPL_HIGH
 define EVTCHN_UPCALL_MASK	offsetof(struct vcpu_info, evtchn_upcall_mask)
 define HVM_START_INFO_SIZE	sizeof(struct hvm_start_info)
 define START_INFO_VERSION	offsetof(struct hvm_start_info, version)
+define START_INFO_MODLIST_PADDR	offsetof(struct hvm_start_info, modlist_paddr)
+define START_INFO_NR_MODULES	offsetof(struct hvm_start_info, nr_modules)
+define HVM_MODLIST_ENTRY_SIZE	sizeof(struct hvm_modlist_entry)
+define MODLIST_ENTRY_CMDLINE	offsetof(struct hvm_modlist_entry, cmdline_paddr)
+define MODLIST_ENTRY_PADDR	offsetof(struct hvm_modlist_entry, paddr)
+define MODLIST_ENTRY_SIZE	offsetof(struct hvm_modlist_entry, size)
 define MMAP_PADDR		offsetof(struct hvm_start_info, memmap_paddr)
 define MMAP_ENTRIES		offsetof(struct hvm_start_info, memmap_entries)
 define MMAP_ENTRY_SIZE		sizeof(struct hvm_memmap_table_entry)
diff --git a/sys/arch/amd64/amd64/locore.S b/sys/arch/amd64/amd64/locore.S
index 6711b572324f..f3db58189b45 100644
--- a/sys/arch/amd64/amd64/locore.S
+++ b/sys/arch/amd64/amd64/locore.S
@@ -1106,10 +1106,60 @@ ENTRY(start_pvh)
 	shrl $2, %ecx
 	rep movsl
 
-	/* Copy cmdline_paddr after hvm_start_info */
+	/* Copy hvm_modlist_entry[] after hvm_start_info */
+	movl $RELOC(__kernel_end), %ebx
+	movl START_INFO_MODLIST_PADDR(%ebx), %esi
+	movl %edi, START_INFO_MODLIST_PADDR(%ebx)   /* Set new modlist_paddr in hvm_start_info */
+	movl START_INFO_NR_MODULES(%ebx), %eax /* Get nr_modules */
+	movl $HVM_MODLIST_ENTRY_SIZE, %ecx /* ecx = sizeof(hvm_modlist_entry) */
+	mull %ecx			 /* eax * ecx => edx:eax */
+	movl %eax, %ecx
+	shrl $2, %ecx
+	rep movsl
+
+	/* Copy the modules after the hvm_modlist_entry[] */
+	xorl %ecx, %ecx			/* ecx = i = 0 */
+	.modlist_copy:
+	movl $RELOC(__kernel_end), %ebx	/* ebx = &hvm_start_info */
+	movl START_INFO_NR_MODULES(%ebx), %eax /* eax = nr_modules */
+	cmpl %eax, %ecx			/* if (ecx == nr_modules) */
+	je .modlist_copy_done		/*   goto modlist_copy_done */
+	push %ecx
+	/* Copy the module */
+	movl START_INFO_MODLIST_PADDR(%ebx), %ebx /* ebx = &hvm_modlist_entry[0] */
+	movl $HVM_MODLIST_ENTRY_SIZE, %eax /* eax = sizeof(hvm_modlist_entry) */
+	mull %ecx			/* eax *= ecx */
+	addl %eax, %ebx			/* ebx = &hvm_modlist_entry[i] */
+	/* Copy the module's cmdline */
+	movl MODLIST_ENTRY_CMDLINE(%ebx), %esi
+	xorl %eax, %eax
+	movl %eax, MODLIST_ENTRY_CMDLINE(%ebx)
+	cmpl %eax, %esi
+	je .modlist_cmdline_copy_done
+
+	movl %edi, MODLIST_ENTRY_CMDLINE(%ebx)	/* Set new cmdline_paddr in hvm_modlist_entry */
+	.modlist_cmdline_copy:
+	movb (%esi), %al
+	movsb
+	cmp $0, %al
+	jne .modlist_cmdline_copy
+	.modlist_cmdline_copy_done:
+
+	/* Copy the module's content */
+	movl MODLIST_ENTRY_PADDR(%ebx), %esi /* esi = hvm_modlist_entry[i].paddr */
+	movl %edi, MODLIST_ENTRY_PADDR(%ebx) /* Set new paddr in hvm_modlist_entry */
+	movl MODLIST_ENTRY_SIZE(%ebx), %ecx /* ecx = hvm_modlist_entry[i].size */
+	rep movsb
+
+	pop %ecx			/* restore i */
+	inc %ecx			/* i++ */
+	jmp .modlist_copy
+	.modlist_copy_done:
+
+	/* Copy cmdline_paddr after the modules */
+	movl $RELOC(__kernel_end), %ebx
 	movl CMDLINE_PADDR(%ebx), %esi
-	movl $RELOC(__kernel_end), %ecx
-	movl %edi, CMDLINE_PADDR(%ecx)	/* Set new cmdline_paddr in hvm_start_info */
+	movl %edi, CMDLINE_PADDR(%ebx)	/* Set new cmdline_paddr in hvm_start_info */
 	.cmdline_copy:
 	movb (%esi), %al
 	movsb
@@ -1136,11 +1186,17 @@ ENTRY(start_pvh)
 	/* announce ourself */
 	movl	$VM_GUEST_GENPVH, RELOC(vm_guest)
 
+	/* determine the amount of data needed */
+	movl	%edi, %edx
+	subl	$RELOC(__kernel_end), %edx
+
 	jmp .save_hvm_start_paddr
 
 .start_xen32:
 	pop %ebx
 	movl	$VM_GUEST_XENPVH, RELOC(vm_guest)
+	/* XXX assume hvm_start_info+dependent structure fits in a single page */
+	movl	$PAGE_SIZE, %edx
 
 .save_hvm_start_paddr:
  	/*
@@ -1166,9 +1222,8 @@ ENTRY(start_pvh)
 	movl	$RELOC(HYPERVISOR_shared_info_pa),%ebp
 	movl	%ebx,(%ebp)
 	movl	$0,4(%ebp)
-	/* XXX assume hvm_start_info+dependant structure fits in a single page */
 .add_hvm_start_info_page:
-	addl	$PAGE_SIZE, %ebx
+	addl	%edx, %ebx
 	addl	$PGOFSET,%ebx
 	andl	$~PGOFSET,%ebx
 	addl	$KERNBASE_LO,%ebx
diff --git a/sys/arch/amd64/conf/MICROVM b/sys/arch/amd64/conf/MICROVM
index 65982d42b4a9..864002a5eb25 100644
--- a/sys/arch/amd64/conf/MICROVM
+++ b/sys/arch/amd64/conf/MICROVM
@@ -23,3 +23,7 @@ machine amd64 x86 xen
 include         "arch/x86/conf/MICROVM.common"
 
 options         EXEC_ELF64      # exec ELF binaries
+options 	MODULAR		# new style module(7) framework
+
+options 	MEMORY_DISK_HOOKS	# enable md specific hooks
+options 	MEMORY_DISK_DYNAMIC	# enable dynamic resizing
diff --git a/sys/arch/x86/x86/x86_machdep.c b/sys/arch/x86/x86/x86_machdep.c
index ab5ffaf35410..7f3d2308ba46 100644
--- a/sys/arch/x86/x86/x86_machdep.c
+++ b/sys/arch/x86/x86/x86_machdep.c
@@ -215,6 +215,32 @@ mm_md_physacc(paddr_t pa, vm_prot_t prot)
 }
 
 #ifdef MODULAR
+#ifdef XEN
+void x86_add_xen_modules(void);
+void x86_add_xen_modules(void)
+{
+	uint32_t i;
+#if defined(MEMORY_DISK_HOOKS) && defined(MEMORY_DISK_DYNAMIC)
+	struct hvm_modlist_entry *modlist;
+#endif
+
+	if (hvm_start_info->nr_modules == 0) {
+		aprint_verbose("No Xen module info at boot\n");
+		return;
+	}
+#if defined(MEMORY_DISK_HOOKS) && defined(MEMORY_DISK_DYNAMIC)
+	modlist = (void *)((uintptr_t)hvm_start_info->modlist_paddr + KERNBASE);
+#endif
+	for (i = 0; i < hvm_start_info->nr_modules; i++) {
+		/* XXX can be a filesystem image or ELF module or splashscreen */
+#if defined(MEMORY_DISK_HOOKS) && defined(MEMORY_DISK_DYNAMIC)
+		md_root_setconf(
+		    (void *)((uintptr_t)modlist[i].paddr + KERNBASE),
+		    modlist[i].size);
+#endif
+	}
+}
+#endif
 /*
  * Push any modules loaded by the boot loader.
  */
@@ -224,6 +250,12 @@ module_init_md(void)
 	struct btinfo_modulelist *biml;
 	struct bi_modulelist_entry *bi, *bimax;
 
+#ifdef XEN
+	if (vm_guest_is_pvh()) {
+		x86_add_xen_modules();
+	}
+#endif /* XEN */
+
 	biml = lookup_bootinfo(BTINFO_MODULELIST);
 	if (biml == NULL) {
 		aprint_debug("No module info at boot\n");
-- 
2.48.1
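To make the new .start_genpvh code easier to review, here is a C model of what the assembly appears to do. It is a sketch under stated assumptions, not kernel code: a byte array stands in for physical memory, every paddr is an offset into it, and relocate_pvh_info(), phys_copy(), phys_copy_str(), pvh_info_end() and SKETCH_PAGE_SIZE are hypothetical names. The order follows the patch: the hvm_modlist_entry[] array goes right after hvm_start_info, then each module's command line (if any) followed by its contents, then the kernel command line. The return value corresponds to %edi - __kernel_end, which the patch keeps in %edx, and pvh_info_end() mirrors the page rounding at .add_hvm_start_info_page (before KERNBASE_LO is added), assuming %ebx holds the hvm_start_info paddr there. Differences to keep in mind: the assembly only rewrites the low 32 bits of the 64-bit paddr fields (equivalent while everything sits below 4 GB), the hunk does not show whether it handles a zero kernel cmdline_paddr (the sketch skips it), and phys_copy() asserts that regions do not overlap, whereas the forward rep movs copies make no such check.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative copies of the layouts in Xen's arch-x86/hvm/start_info.h. */
struct hvm_start_info {
	uint32_t magic, version, flags, nr_modules;
	uint64_t modlist_paddr, cmdline_paddr, rsdp_paddr, memmap_paddr;
	uint32_t memmap_entries, reserved;
};
struct hvm_modlist_entry {
	uint64_t paddr, size, cmdline_paddr, reserved;
};

/* Hypothetical stand-ins for the kernel's PAGE_SIZE and PGOFSET (4 KB pages). */
#define SKETCH_PAGE_SIZE	4096u
#define SKETCH_PGOFSET		(SKETCH_PAGE_SIZE - 1)

/* Copy len bytes inside the simulated physical memory; assumes no overlap. */
static void
phys_copy(uint8_t *mem, uint32_t dst, uint32_t src, uint32_t len)
{
	assert(dst + len <= src || src + len <= dst);
	memcpy(mem + dst, mem + src, len);
}

/* Copy a NUL-terminated string; returns its length including the NUL. */
static uint32_t
phys_copy_str(uint8_t *mem, uint32_t dst, uint32_t src)
{
	uint32_t len = (uint32_t)strlen((const char *)mem + src) + 1;

	phys_copy(mem, dst, src, len);
	return len;
}

/*
 * hvm_start_info has already been copied to kernel_end.  Pack everything
 * it references right after it, rewrite the paddr fields to point at the
 * copies, and return the number of bytes used from kernel_end.
 */
static uint32_t
relocate_pvh_info(uint8_t *mem, uint32_t kernel_end)
{
	struct hvm_start_info si;
	struct hvm_modlist_entry me;
	uint32_t dst = kernel_end + (uint32_t)sizeof(si);
	uint32_t list = dst, listlen, i, len;

	memcpy(&si, mem + kernel_end, sizeof(si));

	/* 1. The hvm_modlist_entry[] array, right after hvm_start_info. */
	listlen = si.nr_modules * (uint32_t)sizeof(me);
	phys_copy(mem, dst, (uint32_t)si.modlist_paddr, listlen);
	si.modlist_paddr = list;
	dst += listlen;

	/* 2. For each module: its cmdline (if any), then its contents. */
	for (i = 0; i < si.nr_modules; i++) {
		uint32_t ent = list + i * (uint32_t)sizeof(me);

		memcpy(&me, mem + ent, sizeof(me));
		if (me.cmdline_paddr != 0) {
			len = phys_copy_str(mem, dst, (uint32_t)me.cmdline_paddr);
			me.cmdline_paddr = dst;
			dst += len;
		}
		phys_copy(mem, dst, (uint32_t)me.paddr, (uint32_t)me.size);
		me.paddr = dst;
		dst += (uint32_t)me.size;
		memcpy(mem + ent, &me, sizeof(me));
	}

	/* 3. The kernel cmdline, after the modules. */
	if (si.cmdline_paddr != 0) {
		len = phys_copy_str(mem, dst, (uint32_t)si.cmdline_paddr);
		si.cmdline_paddr = dst;
		dst += len;
	}

	memcpy(mem + kernel_end, &si, sizeof(si));
	return dst - kernel_end;
}

/* End of the copied data, rounded up to the next page boundary. */
static uint32_t
pvh_info_end(uint32_t start_info_paddr, uint32_t used)
{
	return (start_info_paddr + used + SKETCH_PGOFSET) & ~SKETCH_PGOFSET;
}
```

For example, with hvm_start_info at 0x1000, one 5-byte module with a 6-character command line and an 11-character kernel command line, relocate_pvh_info() reports 112 bytes used (56 + 32 + 7 + 5 + 12) and pvh_info_end() rounds the end up to 0x2000.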

Cheers & HTH,
-- 
khorben


