Port-xen archive


Re: timekeeping regression



So, yes, there's definitely a Xen timekeeping regression exhibited on
some hardware.

It's definitely in the Xen kernel though, not in NetBSD (though I
believe xen_clock.c can be improved, as in the version I posted some
time ago).

The problem begins somewhere between Xen-4.11 and Xen-4.18 (probably
since 4.13 actually, though that'll be proven in about a week's time).
I've reinstalled 4.11 on one of the machines and it has been running
with a stable clock for nearly 15 days now, and I recall previously
having it run for much longer without problem.

(I've still not yet tried 4.15 or any other intermediate version -- it's
a lot of work to fix it up to make it usable in my configuration).

The problem is not fixed in 4.20-rc.

It only seems to happen on some hardware though.

I have the following machines as reported by Xen & NetBSD (two variants
of the latter):

CPU Vendor: Intel, Family 6 (0x6), Model 44 (0x2c), Stepping 2 (raw 000206c2)
Intel(R) Xeon(R) CPU           E5645  @ 2.40GHz, id 0x206c2

CPU Vendor: Intel, Family 6 (0x6), Model 23 (0x17), Stepping 6 (raw 00010676)
Intel(R) Xeon(R) CPU           X5460  @ 3.16GHz, id 0x10676
Intel(R) Xeon(R) CPU           E5440  @ 2.83GHz, id 0x10676

(/proc/cpuinfo details for each below)
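
For reference, the family/model/stepping values quoted above can be
recovered from the raw signatures.  This is a minimal sketch, assuming
the "raw" value is the EAX output of CPUID leaf 1 and using the standard
Intel extended-model rule; the function name is made up for
illustration.

```python
def decode_cpuid_signature(eax: int) -> dict:
    """Split a CPUID leaf 1 EAX signature into family, model, stepping."""
    stepping = eax & 0xF
    base_model = (eax >> 4) & 0xF
    base_family = (eax >> 8) & 0xF
    ext_model = (eax >> 16) & 0xF
    ext_family = (eax >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family

    # The extended model supplies the high nibble for families 6 and 15.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model

    return {"family": family, "model": model, "stepping": stepping}


# The two raw signatures listed above.
for raw in (0x000206C2, 0x00010676):
    print(f"{raw:08x}", decode_cpuid_signature(raw))
```

Both signatures decode to the values shown in the listing: family 6,
model 44, stepping 2 and family 6, model 23, stepping 6.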

The first one, model 44, works fine.  It's running Xen-4.18 and has a
current uptime of 83 days, and its clock, as well as those of the NetBSD
and FreeBSD domUs running under it, has been rock solid.

The second CPU model has the problem.

To recap, what seems to happen is that the emulated TSC "suddenly"
starts to run at a different rate (always slower?) somewhere between
about 650,000 and 680,000 seconds of uptime (roughly 7.5 to 7.9 days).
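
The arithmetic behind that window can be checked with a short script.
This is only a sketch: it assumes the emulated TSC ticks at the nominal
3158.79 MHz reported for the X5460 below, which may not match what Xen
actually presents to the guest, and the cycle counts are given purely
for scale, not as a diagnosis.

```python
import math

SECONDS_PER_DAY = 86_400
# Assumption: nominal frequency taken from the X5460 "cpu MHz" line.
NOMINAL_HZ = 3158.79e6


def uptime_days(seconds: float) -> float:
    """Convert an uptime in seconds to days."""
    return seconds / SECONDS_PER_DAY


def nominal_cycles(seconds: float, hz: float = NOMINAL_HZ) -> float:
    """Cycles elapsed after `seconds` at a constant rate of `hz`."""
    return seconds * hz


for s in (650_000, 680_000):
    cycles = nominal_cycles(s)
    print(f"{s} s = {uptime_days(s):.2f} days, "
          f"about 2^{math.log2(cycles):.2f} cycles at nominal rate")
```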

BTW, I now remember why I was eager to upgrade Xen on all my machines --
I'm back running 4.11 and 4.13 on the two older machines and I've
already had one domU lock up with a spew of "xvif1i0 GNTTABOP_copy[0] Rx
-3" messages on the dom0 console.  This problem was fixed in 4.18.

--
					Greg A. Woods <gwoods%acm.org@localhost>

Kelowna, BC     +1 250 762-7675           RoboHack <woods%robohack.ca@localhost>
Planix, Inc. <woods%planix.com@localhost>     Avoncote Farms <woods%avoncote.ca@localhost>


processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 44
model name      : Intel(R) Xeon(R) CPU           E5645  @ 2.40GHz
stepping        : 2
cpu MHz         : 2400.09
apicid          : 0
initial apicid  : 32
fpu             : yes
fpu_exception   : yes
cpuid level     : 11
wp              : no
flags           : fpu de tsc msr pae mce cx8 apic sep mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx rdtscp lm pni pclmulqdq ssse3 cx16 sse4_1 sse4_2 popcnt aes lahf_lm dtherm ida arat
clflush size    : 64



processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Xeon(R) CPU           X5460  @ 3.16GHz
stepping        : 6
cpu MHz         : 3158.79
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : no
flags           : fpu vme de tsc msr pae mce cx8 apic sep mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 lahf_lm dtherm
clflush size    : 64




