Port-xen archive
Re: Strange numbers from gettimeofday(2)
Jed Davis <jdev%panix.com@localhost> writes:
> Sometimes, gettimeofday(2) will return large negative numbers for the
> tv_usec field; this has been happening spontaneously, but I've been
> able to reproduce it by either pausing or ddb-breaking a domU for a
> few seconds (the exact interval varies on different hardware).
It's worse than that, actually. The problem AFAICT is in the MI
cc_microtime (in kern_microtime.c):
    t.tv_usec += (cc * ci->ci_cc_ms_delta) / ci->ci_cc_denom;
    while (t.tv_usec >= 1000000) {
        t.tv_usec -= 1000000;
        t.tv_sec++;
    }
It looks like the RHS of the += winds up as complete garbage, and
sometimes the multiply overflows and winds up making tv_usec negative.
When it's negative, that gets passed back to userspace, and e.g. BIND
will notice this and complain; when it's positive, it gets folded into
the tv_sec, and then cron sees the clock suddenly bounce back and
forth by up to ~35 minutes (1<<31 us) and does undesirable things
(like running three copies of a script at once that step on each
other's temp files).
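To make the two failure modes concrete, here is a small user-space model of the quoted loop. It assumes, hypothetically, that tv_usec is a 32-bit signed field (as a long is on i386) and that the quotient is computed in 64 bits and then truncated when stored; struct tv32, truncate_to_i32() and add_usec() are made-up names for illustration, not NetBSD code.

```c
#include <stdint.h>

/* Hypothetical stand-in for a timeval whose tv_usec is 32 bits wide. */
struct tv32 {
    int64_t tv_sec;
    int32_t tv_usec;
};

/* Keep the low 32 bits of v and reinterpret them as signed, which is what
 * gcc does when an oversized value is stored into a 32-bit signed field.
 * Done by hand here so the result is well defined in portable C. */
static int32_t truncate_to_i32(int64_t v)
{
    int64_t w = (int64_t)(uint32_t)v;   /* low 32 bits, as 0 .. 2^32-1 */
    if (w > INT32_MAX)
        w -= (int64_t)1 << 32;          /* map the upper half to negatives */
    return (int32_t)w;
}

/* Model of the quoted code: add delta_us, then carry whole seconds. */
static struct tv32 add_usec(struct tv32 t, int64_t delta_us)
{
    t.tv_usec = truncate_to_i32((int64_t)t.tv_usec + delta_us);
    while (t.tv_usec >= 1000000) {      /* never runs if tv_usec < 0 */
        t.tv_usec -= 1000000;
        t.tv_sec++;
    }
    return t;
}
```

With a delta just under 1<<31 us, add_usec() carries 2147 seconds (about 35.8 minutes) into tv_sec; one microsecond more and tv_usec comes out negative and the loop never runs, which matches the cron and BIND symptoms respectively.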
One problem with cc_microtime is that it assumes that each CPU NetBSD
knows about is an actual physical CPU, and thus a cycle counter
timestamp taken some number of ticks ago is still valid.
That might not be a problem with HT; I don't know how that works.
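A sketch of why a timestamp taken on a different physical CPU is dangerous, assuming (hypothetically) that the elapsed-cycle count comes from a free-running 32-bit counter and is corrected for a single wraparound. If the two reads come from CPUs whose counters are not synchronized, a counter that is merely behind is indistinguishable from one that wrapped. cycles_since() is an illustrative name only.

```c
#include <stdint.h>

/* Cycles elapsed between two reads of a free-running 32-bit counter,
 * assuming at most one wraparound between them.  Unsigned subtraction
 * is modulo 2^32, so a genuine wrap is handled correctly; a read from a
 * CPU whose counter lags the first one is not. */
static uint32_t cycles_since(uint32_t then, uint32_t now)
{
    return (uint32_t)(now - then);
}
```

For example, cycles_since(5000, 4000) returns 4294966296: a counter that is 1000 cycles behind is read as almost 2^32 cycles elapsed, and that bogus count then feeds the multiply in cc_microtime.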
Oh, of course: It will *definitely* break if the domain is paused or
in ddb for more than a second, because xen_timer_handler will call
cc_microset multiple times, passing it time(9) each time, which is
dutifully being advanced by the stacked-up calls to hardclock(9). So
the second time cc_microset sees that more or less a second has
passed, but very few CPU cycles have; it (if I read it correctly) then
estimates a very low CPU speed, and for the next second of real time
cc_microtime gets absurdly large values which overflow and make
tv_usec negative.
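The calibration failure can be modelled in a few lines. This is a simplified, hypothetical version of the cc_microset/cc_microtime arithmetic (the field names echo ci_cc_ms_delta and ci_cc_denom, but the struct and functions are invented for illustration), assuming a 2 GHz CPU and that a back-to-back cc_microset call sees only about 100,000 cycles against a full second of time(9).

```c
#include <stdint.h>

/* Hypothetical, simplified per-CPU calibration state. */
struct cc_cal {
    int64_t ms_delta;   /* wall-clock microseconds over the last interval */
    int64_t denom;      /* CPU cycles counted over the same interval */
};

/* What a calibration call records: time(9) advanced by wall_us while
 * the cycle counter advanced by cycles. */
static struct cc_cal calibrate(int64_t wall_us, int64_t cycles)
{
    struct cc_cal c = { wall_us, cycles };
    return c;
}

/* Microseconds attributed to cc cycles since the last calibration,
 * using the same formula as the quoted line. */
static int64_t usec_since(struct cc_cal c, int64_t cc)
{
    if (c.denom == 0)
        return 0;               /* not calibrated yet */
    return (cc * c.ms_delta) / c.denom;
}
```

A sane calibration (1,000,000 us over 2,000,000,000 cycles) turns half a second of cycles into 500000 us. The bad one (1,000,000 us over 100,000 cycles) turns the same cycles into 10,000,000,000 us, far past INT32_MAX, so storing it into a 32-bit tv_usec yields garbage.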
Now, &cc_microtime gets set as microtime_func because a TSC is
detected in arch/xen/i386/identcpu.c, though I don't know that Xen
will work on anything without a TSC; my first thought was that it
might be desirable to disable that and always use xen_microtime.
However: xen_microtime would need to call get_tsc_offset_ns() to be of
any use, and really shouldn't need to call get_time_values_from_xen()
as long as the timer event handler does, since (according to the docs)
a timer event will be asserted whenever a domain becomes scheduled.
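A rough sketch of the xen_microtime approach described above: start from the snapshot the timer event handler last stored and add the TSC-derived offset since its tsc_timestamp. Everything here is hypothetical: struct shadow_time, tsc_offset_ns() (a stand-in for get_tsc_offset_ns(), whose real signature is not shown in this message), shadow_microtime(), and the assumption of a fixed TSC rate expressed in cycles per microsecond.

```c
#include <stdint.h>

/* Hypothetical snapshot refreshed by the timer event handler. */
struct shadow_time {
    int64_t  tv_sec;
    int32_t  tv_usec;
    uint64_t tsc_timestamp;     /* TSC value when the snapshot was taken */
};

/* Hypothetical stand-in for get_tsc_offset_ns(): nanoseconds since the
 * snapshot, assuming a constant, known TSC rate. */
static uint64_t tsc_offset_ns(const struct shadow_time *s, uint64_t now_tsc,
                              uint64_t cycles_per_usec)
{
    return (now_tsc - s->tsc_timestamp) * 1000 / cycles_per_usec;
}

/* Sketch of a microtime built on the shadow snapshot. */
static void shadow_microtime(const struct shadow_time *s, uint64_t now_tsc,
                             uint64_t cycles_per_usec,
                             int64_t *sec, int32_t *usec)
{
    int64_t us = (int64_t)(tsc_offset_ns(s, now_tsc, cycles_per_usec) / 1000);
    int64_t total = (int64_t)s->tv_usec + us;

    *sec  = s->tv_sec + total / 1000000;    /* carry whole seconds */
    *usec = (int32_t)(total % 1000000);
}
```

The multiply by 1000 can overflow for very long intervals; the sketch relies on the handler refreshing the snapshot every time the domain is scheduled, so the interval stays short.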
Problem 1: It's still completely ignoring time(9), which is very wrong
AIUI; and resettodr() is a no-op, which makes it worse.
Problem 2: I tried this, and the shadow_tv was 42 seconds fast and
drifting slowly but noticeably forward; the dom0 was running ntpdate
from cron and was on time. (Hm... the DOM0_SETTIME is called only
when resettodr() is, so if the drift is small enough for ntpdate to
always use adjtime(2), it won't ever correct Xen's time?)
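One way to read the question above, as a toy model: the ntpdate(8) manual says it steps the clock with settimeofday() when the error exceeds 0.5 s and otherwise slews it with adjtime(2). Assuming, as the message suggests, that only a step reaches resettodr() and hence DOM0_SETTIME, Xen's own clock is never corrected while dom0 stays within half a second. The drift figures and function names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ntpdate steps when the error exceeds 0.5 s, otherwise it slews. */
static bool ntpdate_would_step(int64_t offset_usec)
{
    int64_t mag = offset_usec < 0 ? -offset_usec : offset_usec;
    return mag > 500000;
}

/* Hypothetical model: dom0 is off by dom0_offset_usec at each cron run,
 * while Xen's clock drifts by xen_drift_usec per run.  Only a step is
 * assumed to push the time down to Xen (resettodr -> DOM0_SETTIME). */
static int64_t xen_error_after(int runs, int64_t dom0_offset_usec,
                               int64_t xen_drift_usec)
{
    int64_t xen_err = 0;

    for (int i = 0; i < runs; i++) {
        xen_err += xen_drift_usec;          /* Xen drifts between runs */
        if (ntpdate_would_step(dom0_offset_usec))
            xen_err = 0;                    /* a step also resets Xen */
    }
    return xen_err;
}
```

Under these assumptions, a dom0 that is only ever 10 ms off never triggers a step, so Xen's error simply accumulates run after run.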
Problem 3: yamt mentioned on ICB that the tsc_timestamp in the
shared_info page might not be right for the current CPU; I haven't
checked on what Xen actually does here yet, but it seems to me that
that would be a bug, as the timestamp is useless if it's from
an unspecified physical CPU that we can't access.
--
(let ((C call-with-current-continuation)) (apply (lambda (x y) (x y)) (map
((lambda (r) ((C C) (lambda (s) (r (lambda l (apply (s s) l)))))) (lambda
(f) (lambda (l) (if (null? l) C (lambda (k) (display (car l)) ((f (cdr l))
(C k))))))) '((#\J #\d #\D #\v #\s) (#\e #\space #\a #\i #\newline)))))