So, yes, there's definitely a Xen timekeeping regression exhibited on
some hardware.

It's definitely in the Xen kernel though, not in NetBSD (though I
believe xen_clock.c can be improved, as in the version I posted some
time ago).

The problem begins somewhere between Xen-4.11 and Xen-4.18 (probably
since 4.13 actually, though that'll be proven in about a week's time).

I've reinstalled 4.11 on one of the machines and it has been running
with a stable clock for nearly 15 days now, and I recall previously
having it run for much longer without problem.

(I've still not yet tried 4.15 or any other intermediate version -- it's
a lot of work to fix it up to make it usable in my configuration.)

The problem is not fixed in 4.20-rc.

It only seems to happen on some hardware though.

I have the following machines as reported by Xen & NetBSD (two variants
of the latter):

    CPU Vendor: Intel, Family 6 (0x6), Model 44 (0x2c), Stepping 2 (raw 000206c2)
    Intel(R) Xeon(R) CPU E5645 @ 2.40GHz, id 0x206c2

    CPU Vendor: Intel, Family 6 (0x6), Model 23 (0x17), Stepping 6 (raw 00010676)
    Intel(R) Xeon(R) CPU X5460 @ 3.16GHz, id 0x10676
    Intel(R) Xeon(R) CPU E5440 @ 2.83GHz, id 0x10676

(/proc/cpuinfo details for each below)

The first one, model 44, works fine. It's running Xen-4.18 and has a
current uptime of 83 days, and its clock, as well as those of the NetBSD
and FreeBSD domUs running under it, have been rock solid.

The second CPU model has the problem.

To recap, what seems to happen is that the emulated TSC "suddenly"
starts to run at a different rate (always slower?) somewhere between
about 650,000 and 680,000 seconds of uptime (roughly 7.5 to 7.9 days).

BTW, I now remember why I was eager to upgrade Xen on all my machines --
I'm back running 4.11 and 4.13 on the two older machines and I've
already had one domU lock up with a spew of "xvif1i0 GNTTABOP_copy[0] Rx
-3" messages on the dom0 console. This problem was fixed in 4.18.

-- 
					Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>


processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 44
model name	: Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
stepping	: 2
cpu MHz		: 2400.09
apicid		: 0
initial apicid	: 32
fpu		: yes
fpu_exception	: yes
cpuid level	: 11
wp		: no
flags		: fpu de tsc msr pae mce cx8 apic sep mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx rdtscp lm pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes lahf_lm dtherm ida arat
clflush size	: 64


processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 23
model name	: Intel(R) Xeon(R) CPU X5460 @ 3.16GHz
stepping	: 6
cpu MHz		: 3158.79
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: no
flags		: fpu vme de tsc msr pae mce cx8 apic sep mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm dtherm
clflush size	: 64
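
As a cross-check on the family/model/stepping figures, the raw CPUID
signatures quoted above (000206c2 and 00010676) can be decoded by hand.
The following is only a minimal sketch in Python, assuming the standard
x86 CPUID leaf 1 EAX layout; the function name decode_cpuid_signature is
made up for illustration and is not part of Xen or NetBSD.

```python
def decode_cpuid_signature(eax):
    """Split a CPUID leaf 1 EAX value into (family, model, stepping).

    Hypothetical helper for illustration only; it assumes the usual x86
    bit layout of the processor signature.
    """
    stepping = eax & 0xF             # bits 3:0
    base_model = (eax >> 4) & 0xF    # bits 7:4
    base_family = (eax >> 8) & 0xF   # bits 11:8
    ext_model = (eax >> 16) & 0xF    # bits 19:16
    ext_family = (eax >> 20) & 0xFF  # bits 27:20

    # The extended family is only added when the base family is 0xF.
    if base_family == 0xF:
        family = base_family + ext_family
    else:
        family = base_family

    # The extended model supplies the high nibble for families 6 and 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model

    return family, model, stepping


if __name__ == "__main__":
    # The two raw signatures reported by Xen for the machines above.
    for raw in (0x000206C2, 0x00010676):
        family, model, stepping = decode_cpuid_signature(raw)
        print(f"raw {raw:08x}: family {family}, model {model} "
              f"({model:#x}), stepping {stepping}")
```

Both raw values decode to the same figures Xen prints: family 6, model
44, stepping 2 for the E5645, and family 6, model 23, stepping 6 for the
X5460/E5440.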