> Date: Sun, 23 Oct 2022 07:39:25 -0600
> From: Warner Losh <imp%bsdimp.com@localhost>
>
> I guess a more accurate way of saying this is that leap seconds simply
> aren't reliable, cannot be made reliable, and this affects normalization
> in ways too numerous to mention due to the details of the tz files, bugs
> in the system, and lack of others to implement them correctly.
I think you mean `POSIX clocks simply aren't reliable'. They _could_
be made reliable, though, by fixing the calendar arithmetic
formula in POSIX for mapping time_t to and from UTC -- just like POSIX
already fixed the bug in its Gregorian leap year formula.
Except they can't, at least not practically enough to be a standard. The
Gregorian Leap Year formula is a mathematical formula that needs no
further data other than the broken down time to compute. It's not an
observational calendar, but a computational or arithmetic one. UTC is an
observational calendar. We barely know if there's going to be a leap
second in the coming months, and have nothing more than a vague notion
of when the one after that might be. You must have a table of all past
leap seconds to do any kind of sensible mapping. And you also must
have some way to keep that up to date, even when machines are
powered off, or installed from media that isn't really that old (anything
older than 6 months can't possibly have the right leap table, except by
chance). And then the question becomes how you get it: do you
assume connectivity, some standard media format, some standard file
format, etc.? All of these details mean POSIX can't really fix this. And
even if they do, the current formula has been around so long there's a
lot of dusty decks of code that will likely silently break. You can ameliorate
that somewhat by inventing new interfaces, but issues like the one you go
into below will still persist.
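
The difference between an arithmetic and an observational calendar can be sketched in a few lines of Python. The leap-year test needs nothing but the year; the leap-second lookup needs a table supplied from outside. The table here is deliberately truncated to three known entries, and the names LEAP_SECOND_DAYS and has_leap_second are hypothetical, for illustration only.

```python
import datetime


def is_leap_year(year):
    """Gregorian rule: purely arithmetic, valid for any future year."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Hypothetical, truncated table of UTC days that ended with 23:59:60.
# A real table goes back to 1972 and must be refreshed whenever a new
# leap second is announced.
LEAP_SECOND_DAYS = {
    datetime.date(2012, 6, 30),
    datetime.date(2015, 6, 30),
    datetime.date(2016, 12, 31),
}


def has_leap_second(day):
    """True/False inside the table's range, None where the table is silent.

    Assumption: this sketch treats every day outside the span of the
    entries as unknown; a real table carries an explicit expiry date.
    """
    if day < min(LEAP_SECOND_DAYS) or day > max(LEAP_SECOND_DAYS):
        return None
    return day in LEAP_SECOND_DAYS
```

is_leap_year(2400) can be answered today; has_leap_second(datetime.date(2030, 6, 30)) can only return "unknown" until someone updates the table.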
> > The code works with either set of tzdata files, POSIX stretchy secs,
> > or UTC with leap secs - claiming that one doesn't happen, or cannot
> > happen, isn't really correct.
>
> Yea, and even 'posix stretchy sec' is really a misnomer. POSIX simply
> counts time without leap seconds. Each second is the same length,
Not at all.
If you use a monotonic timer to sample the POSIX clock before and
after a leap second, the POSIX clock will appear to have taken twice
as long as it should to pass the leap second.
Of course, it's worse. If sampled at _subsecond_ intervals, a POSIX
clock behaves _erratically_: it spontaneously rewinds itself!
Suppose we have a machine with a monotonic clock that counts SI
seconds as well as a POSIX clock:
    SI monotonic    POSIX
    123.25          1483228799.00
    123.50          1483228799.25
    123.75          1483228799.50   # t0 = boottime + 123.75
    124.00          1483228799.75
    124.25          1483228800.00   # leap second begins at 2016-12-31T23:59:60Z
    124.50          1483228800.25
    124.75          1483228800.50
    125.00          1483228800.75   # t1 = boottime + 125.00
    125.25          1483228800.00   # POSIX clock rewinds at 2017-01-01T00:00:00Z!
    125.50          1483228800.25
    125.75          1483228800.50   # t2 = boottime + 125.75
    126.00          1483228800.75
At supersecond resolution, t2 - t0 is a duration of 2 SI seconds, but
a POSIX clock reports a time difference POSIX(t2) - POSIX(t0) of 1, so
`POSIX seconds' are not always SI seconds -- it is not the case that
`each [POSIX] second is the same length', even ignoring physical clock
sampling error.
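
The table can be reproduced with a small model, sketched here in Python. The function posix_clock and the constants are hypothetical names; the model assumes the POSIX clock simply repeats second 1483228800, as in the table, and uses exact fractions to keep sampling error out of the picture.

```python
from fractions import Fraction

# POSIX time corresponding to SI monotonic time 0, before the leap second.
BOOT_OFFSET = Fraction(1483228799) - Fraction("123.25")
# Monotonic time at which 2017-01-01T00:00:00Z begins (end of 23:59:60).
LEAP_END_SI = Fraction("125.25")


def posix_clock(si):
    """POSIX clock reading at monotonic time `si` (model of the table)."""
    si = Fraction(si)
    if si < LEAP_END_SI:
        return si + BOOT_OFFSET
    # After the leap second the POSIX clock has been stepped back by one.
    return si + BOOT_OFFSET - 1


t0, t1, t2 = Fraction("123.75"), Fraction("125.00"), Fraction("125.75")

si_elapsed = t2 - t0                               # 2 SI seconds
posix_elapsed = posix_clock(t2) - posix_clock(t0)  # 1 "POSIX second"
rewound = posix_clock(t1) > posix_clock(t2)        # True although t1 < t2
```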
Right. Except during that brief interval around a leap second, all the seconds
are the same size. They aren't expanded or contracted by running the clock
at a different frequency, as was done between approximately 1960 and 1972,
which is often referred to as the rubber leap second era. It was more on that
basis that I was objecting to the turn of phrase, because that sort of thing
isn't happening.
And, of course, at subsecond resolution, the POSIX clock rewinds. The
monotonic clock correctly has t1 < t2, but POSIX(t1) > POSIX(t2). And
this erratic behaviour is much worse than a typical NTP-driven clock
adjustment at random times, because by design this erratic behaviour
happens on ~every computer on the planet simultaneously!
Yea, if NTP knows about the leap, it can deal with it. The problem as you
say comes in when the stratum 1 servers don't announce the leap second
far enough in advance for the implementations to cope. It then devolves to
the 'when, exactly, do you step the time back' problem, since there are a
couple of choices, unless you have the 'leap smear' NTP servers, which do it
over a few hours.
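
For comparison with a one-second step, a leap smear can be sketched as follows. This is only an illustration: the four-hour window, the linear ramp, and the names SMEAR_LEN, smear_fraction and smeared_clock are all assumptions, and real smearing servers pick their own window and shape.

```python
SMEAR_LEN = 4 * 3600  # assumed window length in SI seconds ("a few hours")


def smear_fraction(si_since_start):
    """Portion of the inserted second absorbed so far, from 0.0 to 1.0."""
    if si_since_start <= 0:
        return 0.0
    if si_since_start >= SMEAR_LEN:
        return 1.0
    return si_since_start / SMEAR_LEN


def smeared_clock(posix_at_start, si_since_start):
    """Clock reading `si_since_start` SI seconds after the smear begins.

    The clock runs slightly slow during the window instead of stepping
    back, so it never rewinds; afterwards it is one second behind where
    a free-running count of SI seconds would be.
    """
    return posix_at_start + si_since_start - smear_fraction(si_since_start)
```

Over the whole window the smeared clock advances SMEAR_LEN - 1 seconds while SMEAR_LEN SI seconds elapse, so its seconds are slightly long, but it stays monotonic.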
There's no need for this nonsense except insistence on the formula
that says every UTC day is counted by 86400 `POSIX seconds'. POSIX
could be revised to fix this bug in the clock by just not doing civil
calendar adjustments in the basic clock that goes tick-tick-tick for
counting what most people think are going to be SI seconds. For the
time_t<->UTC conversion in libc, machines with out-of-date tzdb would
just be off by a few seconds sometimes, no worse than being off by an
hour in the time_t<->localtime conversion with an out-of-date tzdb
across an updated summer time change.
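
The formula in question is the "Seconds Since the Epoch" expression from the POSIX base definitions. Transcribed into Python (the function name is hypothetical, and the integer divisions are only meaningful for years from 1970 on), it shows how 23:59:60 on a leap-second day and 00:00:00 on the next day collapse onto the same value:

```python
def posix_seconds_since_epoch(tm_year, tm_yday, tm_hour, tm_min, tm_sec):
    """POSIX mapping from broken-down UTC to time_t.

    tm_year is years since 1900 and tm_yday is days since January 1,
    as in struct tm. Every day contributes exactly 86400.
    """
    return (tm_sec + tm_min * 60 + tm_hour * 3600 + tm_yday * 86400
            + (tm_year - 70) * 31536000
            + ((tm_year - 69) // 4) * 86400
            - ((tm_year - 1) // 100) * 86400
            + ((tm_year + 299) // 400) * 86400)


# 2016-12-31T23:59:60Z: day 365 (zero-based) of leap year 2016.
leap = posix_seconds_since_epoch(116, 365, 23, 59, 60)
# 2017-01-01T00:00:00Z
after = posix_seconds_since_epoch(117, 0, 0, 0, 0)
# Both are 1483228800: two distinct UTC seconds, one time_t value.
```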
Yes, it could be. But that 'off by a few seconds sometimes' is a deal killer
for many applications, especially since there's no good mechanism today
to tell applications there's a new database (though the crude one of
stat()ing the files every so often is used for some of the database calls).
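
The stat-based approach looks roughly like the following sketch. The class StatCheckedFile is hypothetical and not taken from any libc; it only illustrates the idea of rereading a database file when its stat() result changes.

```python
import os


class StatCheckedFile:
    """Cache a file's contents and reread them when stat() shows a change."""

    def __init__(self, path):
        self.path = path
        self._stamp = None
        self.data = None

    def get(self):
        st = os.stat(self.path)
        # Modification time, size and inode together catch both in-place
        # edits and replacement of the file by rename.
        stamp = (st.st_mtime_ns, st.st_size, st.st_ino)
        if stamp != self._stamp:
            with open(self.path, "rb") as f:
                self.data = f.read()
            self._stamp = stamp
        return self.data
```

The cost is a system call on every lookup (or staleness between polls if the check is rate-limited), and the application still learns about a new table only when it next asks.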
And that's the problem with leap seconds in a nutshell: You need perfect
knowledge of them to implement them perfectly, and when you don't have
that knowledge, people shrug it off since it's only a second or two, and so
never really work to fix the status quo. And perfect knowledge about the
future is hard: we'd be way better off accepting a DUT1 larger than 1s and
going to an arithmetic formula, because we can predict to within +/- 10s how
many leaps there will be in the next century....
But I guess this back and forth just reinforces your original point that we
shouldn't get into it at all in the limited space of the manual.
Warner