NetBSD-Bugs archive
kern/59170: bogus reply from NVMM for invalid CPUID on AMD CPU
>Number: 59170
>Category: kern
>Synopsis: bogus reply from NVMM for invalid CPUID on AMD CPU
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Mar 11 08:15:00 +0000 2025
>Originator: Emile `iMil' Heitor
>Release: NetBSD 10.99.12
>Organization:
NetBSD
>Environment:
System: NetBSD nbgdb 10.99.12 NetBSD 10.99.12 (MICROVM) #9: Thu Jan 23 06:15:45 CET 2025 imil@tatooine:/home/imil/src/github.com/NetBSD-src/sys/arch/amd64/compile/obj/MICROVM amd64
Architecture: x86_64
Machine: amd64
>Description:
On AMD CPUs, NVMM's default reply to a request for an invalid CPUID leaf
can lead to a panic() in the guest.
In sys/dev/nvmm/x86/nvmm_x86_vmx.c and sys/dev/nvmm/x86/nvmm_x86_svm.c,
similar code checks CPUID leaf boundaries. The SVM (AMD) version reads:
	svm_cpuid_max_basic = uimin(cpuid_level, SVM_CPUID_MAX_BASIC);
	[...]
	svm_cpuid_max_extended = uimin(descs[0], SVM_CPUID_MAX_EXTENDED);
	[...]
	#define SVM_CPUID_MAX_BASIC		0xD
	#define SVM_CPUID_MAX_HYPERVISOR	0x40000000
	#define SVM_CPUID_MAX_EXTENDED		0x800000
	[...]
	if (eax < 0x40000000) {	/* basic CPUID range */
		if (__predict_false(eax > svm_cpuid_max_basic)) {
			eax = svm_cpuid_max_basic;
			svm_inkernel_exec_cpuid(cpudata, eax, ecx);
		}
	} else if (eax < 0x80000000) {	/* hypervisor CPUID range */
		if (__predict_false(eax > SVM_CPUID_MAX_HYPERVISOR)) {
			eax = svm_cpuid_max_basic;
			svm_inkernel_exec_cpuid(cpudata, eax, ecx);
		}
	} else {	/* extended CPUID range */
		if (__predict_false(eax > svm_cpuid_max_extended)) {
			eax = svm_cpuid_max_basic;
			svm_inkernel_exec_cpuid(cpudata, eax, ecx);
		}
	}
Because of this check, on AMD CPUs every request for an unsupported CPUID
leaf is redirected to leaf svm_cpuid_max_basic (0xD whenever cpuid_level is
at least 0xD), so the following emulation code is executed:
	case 0x0000000D: /* Processor Extended State Enumeration */
		if (svm_xcr0_mask == 0) {
			break;
		}
		switch (ecx) {
		case 0:
			cpudata->vmcb->state.rax = svm_xcr0_mask & 0xFFFFFFFF;
			if (cpudata->gxcr0 & XCR0_SSE) {
				cpudata->gprs[NVMM_X64_GPR_RBX] = sizeof(struct fxsave);
			} else {
				cpudata->gprs[NVMM_X64_GPR_RBX] = sizeof(struct save87);
			}
			cpudata->gprs[NVMM_X64_GPR_RBX] += 64; /* XSAVE header */
			cpudata->gprs[NVMM_X64_GPR_RCX] = sizeof(struct fxsave) + 64;
			cpudata->gprs[NVMM_X64_GPR_RDX] = svm_xcr0_mask >> 32;
			break;
		case 1:
			cpudata->vmcb->state.rax &=
			    (CPUID_PES1_XSAVEOPT | CPUID_PES1_XSAVEC |
			    CPUID_PES1_XGETBV);
			cpudata->gprs[NVMM_X64_GPR_RBX] = 0;
			cpudata->gprs[NVMM_X64_GPR_RCX] = 0;
			cpudata->gprs[NVMM_X64_GPR_RDX] = 0;
			break;
		default:
			cpudata->vmcb->state.rax = 0;
			cpudata->gprs[NVMM_X64_GPR_RBX] = 0;
			cpudata->gprs[NVMM_X64_GPR_RCX] = 0;
			cpudata->gprs[NVMM_X64_GPR_RDX] = 0;
			break;
		}
		break;
This can lead to harmful results. For instance, a guest kernel that
implements support for leaf 0x40000010 receives 3 as the TSC frequency
(svm_xcr0_mask & 0xFFFFFFFF), which then leads to a panic().
I haven't observed any impact on Intel processors, probably because
VMX_CPUID_MAX_BASIC is defined as 0x16: the host CPUID replies to leaf 0x16
with all registers set to 0, and the switch (eax) case for 0x16 simply
breaks.
Intel documentation says:
If a value entered for CPUID.EAX is higher than the maximum
input value for basic or extended function for that
processor then the data for the highest basic information
leaf is returned.
AMD documentation says nothing about how to handle that case.
>How-To-Repeat:
Implement a VMware-compatible TSC and LAPIC frequency query using CPUID
leaf 0x40000010:
---
+#ifdef _KERNEL
+static uint64_t
+tsc_freq_vmware_cpuid(struct cpu_info *ci)
+{
+	uint32_t descs[4];
+	uint64_t freq;
+
+	if (ci->ci_max_ext_cpuid < 0x40000010)
+		return 0;
+
+	x86_cpuid(0x40000010, descs);
+
+	freq = descs[0];
+	if (freq == 0)
+		return 0;
+
+	aprint_verbose(
+	    "got tsc frequency from vmware compatible cpuid\n");
+
+#if NLAPIC > 0
+	if (descs[1] > 0) {
+		aprint_verbose(
+		    "got lapic frequency from vmware compatible cpuid\n");
+		lapic_per_second = descs[1] * 1000;
+		lapic_from_cpuid = true;
+	}
+#endif
+
+	return freq * 1000;
+}
+#endif
---
Then observe the reported frequencies:
[ 1.0000030] got tsc frequency from vmware compatible cpuid
[ 1.0000030] got lapic frequency from vmware compatible cpuid
[ 1.0000030] cpu0: TSC freq CPUID 3000 Hz
3000 is 3 (the result of svm_xcr0_mask & 0xFFFFFFFF) multiplied by 1000.
Example panic():
https://releng.netbsd.org/b5reports/amd64/2025/2025.03.06.09.02.46/install.log
>Fix:
Possible fixes:
* Implement this particular CPUID leaf (0x40000010) in NVMM.
* Set ECX to a value greater than 0x3E when redirecting to leaf 0xD? As per
  the AMD64 Architecture Programmer's Manual:
    For CPUID Fn0000_000D, if the subfunction (specified by contents of ECX)
    passed as input to the instruction is greater than 3Eh, the instruction
    returns zero in the EAX, EBX, ECX, and EDX registers.
* Set EAX, EBX, ECX, and EDX to 0 ourselves?