Current-Users archive
Re: What to do about "WARNING: negative runtime; monotonic clock has gone backwards"
> Date: Thu, 27 Jul 2023 15:05:23 +1000
> from: matthew green <mrg%eterna.com.au@localhost>
>
> one problem i've seen in kern_tc.c when the timecounter returns
> a smaller value is that tc_delta() ends up returning a very large
> (underflowed) value, and that makes the consumers of it do a very
> wrong thing. eg, -2 becomes 2^32-2, and then eg in binuptime:
>
> 477 bintime_addx(bt, th->th_scale * tc_delta(th));
>
> or in tc_windup():
>
> 933 delta = tc_delta(th);
> 938 th->th_offset_count += delta;
> 939 bintime_addx(&th->th_offset, th->th_scale * delta);
>
> i "fixed" the time goes backwards on sparc issue a few years ago
> with this change, which avoids the above issue:
>
> http://mail-index.netbsd.org/source-changes/2018/01/12/msg091064.html
>
> but i really think that the way tc_delta() can underflow is a
> bad problem we should fix properly, i just wasn't sure of the
> right way to do it.

I don't understand: what do you mean by underflow here?

Part of the API contract of a k-bit timecounter(9) is that the
underlying clock must not have a frequency higher than f * 2^k / 2,
where f is the frequency of tc_windup calls.[*]

For example, if f = 100 Hz (i.e., hz=100), and k = 32 (as we use),
then the maximum timecounter frequency is 100 Hz * 2^32 / 2 ~= 215
GHz. Even if f = 10 Hz, this is 21.5 GHz.
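
To make the arithmetic easy to re-check, here is a minimal sketch in C
that evaluates the bound f * 2^k / 2. The helper name tc_max_freq is
hypothetical and not part of timecounter(9); it assumes 1 <= k <= 32.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): largest counter frequency,
 * in Hz, allowed for a k-bit timecounter when tc_windup is called
 * windup_hz times per second, i.e. windup_hz * 2^k / 2.
 * Assumes 1 <= k <= 32 and a windup_hz small enough not to overflow.
 */
static uint64_t
tc_max_freq(uint64_t windup_hz, unsigned k)
{
	/* 2^k / 2 == 2^(k-1) */
	return windup_hz * (UINT64_C(1) << (k - 1));
}
```

Calling tc_max_freq(100, 32) and tc_max_freq(10, 32) gives the exact
values in Hz for the two cases discussed above.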

Under this premise, in the duration between two tc_windup calls,
consecutive calls to get_timecount() mod 2^k can't differ by more than
2^k / 2. And each call to tc_windup resets th->th_offset_count :=
get_timecount().

So no matter how many times you call tc_delta(th) within that time,
(get_timecount() - th->th_offset_count) mod 2^k can't wrap around,
i.e., a sequence of calls must yield a nondecreasing sequence of k-bit
integers.
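
The modular subtraction can be sketched in isolation as follows. This
is an illustration modeled on the expression above, not the kernel's
actual tc_delta(); the function and parameter names are assumptions.

```c
#include <stdint.h>

/*
 * Illustrative stand-in for (get_timecount() - th_offset_count) mod 2^k.
 *   now:          current counter reading
 *   offset_count: reading saved at the last windup
 *   mask:         2^k - 1 for a k-bit counter
 * Unsigned subtraction wraps mod 2^32, and the mask reduces it to k bits.
 */
static uint32_t
delta_mod_2k(uint32_t now, uint32_t offset_count, uint32_t mask)
{
	return (now - offset_count) & mask;
}
```

With a 32-bit mask, delta_mod_2k(5, 0xfffffffe, 0xffffffff) is 7: the
hardware counter rolling over is harmless. If instead the counter
reads 2 lower than the saved value, delta_mod_2k(98, 100, 0xffffffff)
is 0xfffffffe, i.e., 2^32 - 2, which is the large value described in
the quoted message.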

I don't know what the sparc timecounter frequency is, but the Xen
system timecounter returns units of nanoseconds, i.e., runs at 1 GHz,
well within these bounds. So this kind of wraparound leading to
apparently negative runtime -- that is, l->l_stime going backwards --
should not be possible as long as we are calling tc_windup() at a
frequency of at least 1 GHz / (2^k / 2) = 0.47 Hz.

That said, at a 32-bit timecounter frequency of 1 GHz, if there is a
period of about 2^32 / 1 GHz ~= 4.3 sec during which we miss all
consecutive hardclock ticks, that would violate the timecounter(9)
assumptions, and tc_delta(th) may go backwards if that happens.
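
The two figures above, and the failure mode when windups are missed
for too long, can be checked with a small sketch in C. All three
helpers are hypothetical names introduced here for illustration; they
assume 1 <= k <= 32.

```c
#include <stdint.h>

/* Minimum tc_windup rate (Hz) for a k-bit counter: freq / (2^k / 2). */
static double
tc_min_windup_hz(double tc_freq_hz, unsigned k)
{
	return tc_freq_hz / (double)(UINT64_C(1) << (k - 1));
}

/* Seconds until a k-bit counter at tc_freq_hz wraps: 2^k / freq. */
static double
tc_wrap_seconds(double tc_freq_hz, unsigned k)
{
	return (double)(UINT64_C(1) << k) / tc_freq_hz;
}

/*
 * What a k-bit delta reports when true_elapsed counts have really
 * passed since the last windup: only the low k bits survive.
 */
static uint32_t
tc_observed_delta(uint64_t true_elapsed, unsigned k)
{
	uint64_t mask = (UINT64_C(1) << k) - 1;

	return (uint32_t)(true_elapsed & mask);
}
```

For a 1 GHz, 32-bit counter, tc_observed_delta((UINT64_C(1) << 32) + 5, 32)
is 5: after one full wrap plus 5 ns without a windup, the delta looks
as if only 5 ns had passed, so a later reading can appear smaller than
an earlier one.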

So I think we need to find out why we're missing Xen hardclock timer
interrupts. We should also make the dtrace probe report exactly how
many hardclock ticks happened in a batch, and raise an alarm (with or
without dtrace) if that number exceeds a threshold.

[*] Actually the limit is closer to f * 2^k, not f * 2^k / 2, but
there probably has to be a little slop for the computational
overhead of tc_windup to ensure the timehands are updated before
tc_delta would wrap around; a factor of two gives a comfortable
margin of error here.