Port-xen archive


Re: Strange numbers from gettimeofday(2)



On Fri, Jan 13, 2006 at 11:21:58PM -0500, Jed Davis wrote:
> Jed Davis <jdev%panix.com@localhost> writes:
> 
> > Sometimes, gettimeofday(2) will return large negative numbers for the
> > tv_usec field; this has been happening spontaneously, but I've been
> > able to reproduce it by either pausing or ddb-breaking a domU for a
> > few seconds (the exact interval varies on different hardware).
> 
> It's worse than that, actually.  The problem AFAICT is in the MI
> cc_microtime (in kern_microtime.c):
> 
>       t.tv_usec += (cc * ci->ci_cc_ms_delta) / ci->ci_cc_denom;
>       while (t.tv_usec >= 1000000) {
>               t.tv_usec -= 1000000;
>               t.tv_sec++;
>       }
> 
> It looks like the RHS of the += winds up as complete garbage, and
> sometimes the multiply overflows and winds up making tv_usec negative.
> When it's negative, that gets passed back to userspace, and e.g. BIND
> will notice this and complain; when it's positive, it gets folded into
> the tv_sec, and then cron sees the clock suddenly bounce back and
> forth by up to ~35 minutes (1<<31 us) and does undesirable things
> (like running three copies of a script at once that step on each
> others' temp files).
> 
> One problem with cc_microtime is that it assumes that each CPU NetBSD
> knows about is an actual physical CPU, and thus a cycle counter
> timestamp taken some number of ticks ago is still valid.
> 
> That might not be a problem with HT; I don't know how that works.
> 
> Oh, of course: It will *definitely* break if the domain is paused or
> in ddb for more than a second, because xen_timer_handler will call
> cc_microset multiple times, passing it time(9) each time, which is
> dutifully being advanced by the stacked-up calls to hardclock(9).  So
> the second time cc_microset sees that more or less a second has
> passed, but very few CPU cycles have; it (if I read it correctly) then
> estimates a very low CPU speed, and for the next second of real time
> cc_microtime gets absurdly large values which overflow and cause
> negativeness.
> 
> Now, &cc_microtime gets set as microtime_func because a TSC is
> detected in arch/xen/i386/identcpu.c, though I don't know that Xen
> will work on anything without a TSC; my first thought was that it
> might be desirable to disable that and always use xen_microtime.

Yes, of course this was my intent, at least. I didn't notice microtime_func
was being overwritten.
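
As a rough illustration of the arithmetic in the quoted text (a sketch only,
not the kernel code): the helpers below reproduce the shape of the
(cc * ci_cc_ms_delta) / ci_cc_denom expression and then truncate the result
to 32 bits. The function names, the 64-bit intermediate, and the 32-bit
tv_usec are all assumptions made for the example. With a sane denominator the
estimate is small; with the tiny denominator that a bad calibration after a
pause might produce, the result is either a large positive value that gets
folded into tv_sec, or it wraps negative.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model storing a 64-bit microsecond count into a signed 32-bit field
 * (assumption: tv_usec is 32 bits wide, as on i386).
 */
int32_t wrap32(int64_t v)
{
    uint32_t u = (uint32_t)v;   /* well-defined: reduces modulo 2^32 */

    return (u >= 0x80000000u)
        ? (int32_t)((int64_t)u - 4294967296LL)
        : (int32_t)u;
}

/*
 * Same shape as the quoted expression: cycles since the last calibration,
 * scaled by (microseconds per interval) / (cycles per interval).
 */
int64_t estimate_usec(int64_t cc, int64_t ms_delta, int64_t denom)
{
    assert(denom != 0);
    return (cc * ms_delta) / denom;
}

/*
 * Healthy calibration, 2e9 cycles counted over 1e6 us:
 *   estimate_usec(1000000000, 1000000, 2000000000) is 500000.
 * Bad calibration, only 1000 cycles "seen" over 1e6 us:
 *   wrap32(estimate_usec(1000000, 1000000, 1000)) is 1000000000
 *     (positive, so it gets folded into tv_sec as 1000 seconds);
 *   wrap32(estimate_usec(3000000, 1000000, 1000)) is -1294967296
 *     (negative, which is what userspace would then see in tv_usec).
 */
```

The largest positive value that survives the truncation is just under
1<<31 us, i.e. about 35.8 minutes, which matches the size of the jumps
described above.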

> However: xen_microtime would need to call get_tsc_offset_ns() to be of
> any use, and really shouldn't need to call get_time_values_from_xen()
> as long as the timer event handler does, since (according to the docs)
> a timer event will be asserted whenever a domain becomes scheduled.
> 
> Problem 1: It's still completely ignoring time(9), which is very wrong

Yes, this part was never finished.

> AIUI; and resettodr() is a no-op, which makes it worse.

What could resettodr() do for a domU? Only dom0 can set Xen's time.

> 
> Problem 2: I tried this, and the shadow_tv was 42 seconds fast and
> drifting slowly but noticeably forward; the dom0 was running ntpdate
> from cron and was on time.  (Hm... the DOM0_SETTIME is called only
> when resettodr() is, so if the drift is small enough for ntpdate to
> always use adjtime(2), it won't ever correct Xen's time?)

The problem is that this would cause the Xen time to go backward on
occasion, wouldn't it? Do we really want this?
I don't think there's a way to adjust the clock for the hypervisor,
at least for Xen-2.0. I'll have to check what Xen-3 offers for this.

> 
> Problem 3: yamt mentioned on ICB that the tsc_timestamp in the
> shared_info page might not be right for the current CPU; I haven't
> checked on what Xen actually does here yet, but it seems to me that
> that would be a bug, as the timestamp is useless if it's from
> an undefined physical CPU that we can't access.

I think tsc_timestamp is right. The HYPERVISOR_shared_info is per-domain,
and each domain is single-CPU and attached to a CPU for its life.
I didn't check Xen's sources, but I think it would be quite hard to
not get this right.
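
To make concrete why the timestamp has to come from the CPU we are running
on, here is a minimal sketch of extrapolating the current time from a
(cycle count, time) snapshot. The struct and its field names are hypothetical
and are not the layout of Xen's shared_info page; the function is only meant
to show the kind of calculation a get_tsc_offset_ns()-style helper would do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot; field names are illustrative, not Xen's. */
struct time_snapshot {
    uint64_t tsc_timestamp;   /* cycle counter when the snapshot was taken */
    uint64_t system_time_ns;  /* time at that instant, in nanoseconds */
    uint64_t cpu_freq_hz;     /* cycle counter frequency, in Hz */
};

/*
 * Extrapolate the time at cycle count tsc_now.  The delta is split into
 * whole seconds plus a remainder so the multiply by 1e9 cannot overflow
 * 64 bits for realistic frequencies (below about 1.8e10 Hz).
 */
uint64_t extrapolate_ns(const struct time_snapshot *s, uint64_t tsc_now)
{
    assert(s->cpu_freq_hz != 0);

    uint64_t delta = tsc_now - s->tsc_timestamp;
    uint64_t secs  = delta / s->cpu_freq_hz;
    uint64_t rem   = delta % s->cpu_freq_hz;

    return s->system_time_ns
        + secs * 1000000000ULL
        + (rem * 1000000000ULL) / s->cpu_freq_hz;
}
```

If tsc_now were read on a different physical CPU than the one that produced
tsc_timestamp, delta would be meaningless and so would the result, which is
the concern raised in Problem 3.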

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 ans d'experience feront toujours la difference