SYNOPSIS
#include <sys/timex.h>
int ntp_adjtime(struct timex *buf);
DESCRIPTION
struct timex {
// ...
/* Maximum error (microseconds) */
long maxerror;
/* Current time (read-only, except for
ADJ_SETOFFSET); upon return, time.tv_usec
contains nanoseconds, if STA_NANO status
flag is set, otherwise microseconds */
struct timeval time;
// ...
};
NTP
The Network Time Protocol (NTP) is the most widely available and widely used time synchronization protocol, but it is claimed to offer the worst maximum clock error.
Overview
Usage
The API for fetching NTP’s state is ntp_adjtime(2). A significantly shortened version of its man page is excerpted at the top of this section.
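As a rough illustration of reading that state, here is a minimal sketch assuming Linux with glibc. The helper names read_ntp_state and print_ntp_state are hypothetical; setting modes to 0 asks for a read-only query.

```c
#include <stdio.h>
#include <string.h>
#include <sys/timex.h>

/* Hypothetical helper: snapshot the kernel's NTP state without changing it.
 * Returns the clock state (TIME_OK ... TIME_ERROR), or -1 on failure. */
int read_ntp_state(struct timex *tx)
{
    memset(tx, 0, sizeof(*tx));
    tx->modes = 0;              /* modes == 0: query only, adjust nothing */
    return ntp_adjtime(tx);
}

/* Hypothetical helper: print the fields discussed in this section. */
void print_ntp_state(int state, const struct timex *tx)
{
    if (state == -1) {
        fprintf(stderr, "ntp_adjtime failed\n");
        return;
    }
    /* tv_usec holds nanoseconds only when STA_NANO is set. */
    const char *unit = (tx->status & STA_NANO) ? "ns" : "us";
    printf("state=%d%s time=%lld s + %lld %s maxerror=%lld us\n",
           state,
           state == TIME_ERROR ? " (unsynchronized)" : "",
           (long long)tx->time.tv_sec,
           (long long)tx->time.tv_usec, unit,
           (long long)tx->maxerror);
}
```

Calling read_ntp_state(&tx) and passing the result to print_ntp_state gives one line of output per sample; the query needs no special privileges because nothing is modified.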
[Kudu] Wait for NTP synchronization on startup before checking the time
The system time can experience a large jump (forwards or backwards) when a system synchronizes with NTP for the first time. It is wise to wait for that synchronization to occur before checking the system time.
Specifically, chrony will never step the system clock after the initial synchronization, so time will always be monotonic thereafter (as long as chrony remains in control of the system clock).
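One way to wait might look like the sketch below. It is an assumption-laden illustration, not Kudu's actual code: the function names, the polling approach, and the choice to treat TIME_ERROR or STA_UNSYNC as "not yet synchronized" are mine.

```c
#define _GNU_SOURCE             /* for nanosleep() under strict -std= modes */
#include <stdbool.h>
#include <string.h>
#include <sys/timex.h>
#include <time.h>

/* True if an ntp_adjtime() result describes a synchronized clock. */
bool clock_is_synchronized(int state, const struct timex *tx)
{
    return state != -1 && state != TIME_ERROR && !(tx->status & STA_UNSYNC);
}

/* Hypothetical startup gate: poll up to max_attempts times, sleeping
 * poll_interval_ms between polls. Returns true once synchronized. */
bool wait_for_ntp_sync(int max_attempts, long poll_interval_ms)
{
    for (int i = 0; i < max_attempts; i++) {
        struct timex tx;
        memset(&tx, 0, sizeof(tx));     /* modes == 0: read-only query */
        int state = ntp_adjtime(&tx);
        if (clock_is_synchronized(state, &tx))
            return true;
        if (i + 1 < max_attempts) {
            struct timespec pause = {
                .tv_sec = poll_interval_ms / 1000,
                .tv_nsec = (poll_interval_ms % 1000) * 1000000L,
            };
            nanosleep(&pause, NULL);
        }
    }
    return false;
}
```

A service could call something like wait_for_ntp_sync(60, 1000) before its first read of the system time, and refuse to start if it returns false.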
[Kudu] Call sched_yield() before checking and sending the time
Any unpredictable delay between fetching the current time and sending it across the network increases the error in clock synchronization. One potential cause of unpredictable delay is preemption. Invoking sched_yield(2) immediately before fetching and sending the time makes it highly unlikely that the thread’s time quantum expires between the two steps.
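A minimal sketch of that ordering follows. The helper name send_current_time and the wire format (a single 64-bit nanosecond count in host byte order) are assumptions made for illustration only.

```c
#define _GNU_SOURCE             /* for clock_gettime()/sched_yield() under strict -std= modes */
#include <sched.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical helper: yield, read the clock, and send it immediately.
 * Returns the number of bytes sent, or -1 on error. */
ssize_t send_current_time(int sock_fd)
{
    struct timespec now;

    /* Yield first, so that preemption is less likely to land between
     * the clock read and the send. */
    sched_yield();

    if (clock_gettime(CLOCK_REALTIME, &now) != 0)
        return -1;

    /* Assumed wire format: nanoseconds since the epoch, host byte order. */
    int64_t now_ns = (int64_t)now.tv_sec * 1000000000LL + now.tv_nsec;
    return send(sock_fd, &now_ns, sizeof(now_ns), 0);
}
```

Keeping the work between clock_gettime() and send() to a single multiplication is deliberate: anything done in that gap adds directly to the synchronization error.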
[Kudu] Detect cloud type, and configure NTP to cloud’s NTP provider
A large number of users are running on the cloud, and a number of cloud providers offer highly accurate NTP endpoints. Therefore, it’s a nice addition to auto-detect the fact that the program is running on a cloud provider, and automatically configure NTP to point at the special highly-accurate NTP endpoint.
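As a very rough sketch of the lookup half of this idea: the function below maps a vendor string to an NTP endpoint. The function name is hypothetical, and matching on "Amazon" or "Google" in a DMI vendor string (for example from /sys/class/dmi/id/sys_vendor) is my assumption about how detection could work, not a description of what Kudu does. The two endpoints are the ones AWS and Google Cloud document for their instances.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup: given a vendor string, return the provider's NTP
 * endpoint, or NULL if the provider is unknown.
 * Assumption: substring matching on the vendor name is good enough. */
const char *cloud_ntp_server(const char *sys_vendor)
{
    if (sys_vendor == NULL)
        return NULL;
    if (strstr(sys_vendor, "Amazon") != NULL)
        return "169.254.169.123";           /* Amazon Time Sync Service */
    if (strstr(sys_vendor, "Google") != NULL)
        return "metadata.google.internal";  /* Google Cloud internal NTP */
    return NULL;                            /* unknown: keep existing config */
}
```

The returned string could then be written into the NTP daemon's configuration as its server; returning NULL leaves whatever the administrator configured untouched.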
[Kudu] Always re-check STA_NANO, as it can change at runtime.
NTP will offer your time in either microseconds or nanoseconds. Some NTP implementations will toggle that choice at runtime, so your code must tolerate transitions between the two.
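In practice this means checking the flag on every sample rather than caching it. A minimal sketch, with a hypothetical helper name:

```c
#include <stdint.h>
#include <sys/timex.h>

/* Convert the time reported by ntp_adjtime() to nanoseconds since the
 * epoch, consulting STA_NANO on this particular sample. */
int64_t timex_time_to_ns(const struct timex *tx)
{
    int64_t subsec = tx->time.tv_usec;
    if (!(tx->status & STA_NANO))
        subsec *= 1000;         /* field held microseconds: scale to ns */
    return (int64_t)tx->time.tv_sec * 1000000000LL + subsec;
}
```

Because the flag is read from the same struct timex as the time itself, a toggle between two calls cannot cause a value to be interpreted in the wrong unit.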
[Kudu] Tolerate transient failures in ntp_adjtime
It’s possible that ntp_adjtime will fail for a brief window as the clock temporarily loses synchronization. During this time, you can continue to advance the clock and maxerror manually, and give up if synchronization has not returned after a short window.
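Advancing manually could look something like the sketch below. Everything here is an assumption for illustration: the struct and function names, the 500 ppm worst-case drift rate, and the 60 second cutoff are mine, and a real system should pick values that match its hardware and tolerance.

```c
#include <stdbool.h>
#include <stdint.h>

#define ASSUMED_MAX_DRIFT_PPM 500                    /* assumption: worst-case drift */
#define MAX_UNSYNC_WINDOW_NS  (60LL * 1000000000LL)  /* assumption: give up after 60 s */

/* State captured at the last successful ntp_adjtime() call. */
struct clock_sample {
    int64_t wall_ns;        /* wall-clock time at that call */
    int64_t maxerror_us;    /* maxerror reported by that call */
    int64_t mono_ns;        /* CLOCK_MONOTONIC reading taken alongside it */
};

/* Extrapolate time and error bound from the last good sample.
 * Returns false once the unsynchronized window has been exceeded. */
bool extrapolate_clock(const struct clock_sample *last, int64_t mono_now_ns,
                       int64_t *wall_ns_out, int64_t *maxerror_us_out)
{
    int64_t elapsed_ns = mono_now_ns - last->mono_ns;
    if (elapsed_ns < 0 || elapsed_ns > MAX_UNSYNC_WINDOW_NS)
        return false;

    *wall_ns_out = last->wall_ns + elapsed_ns;

    /* N ppm of drift adds N microseconds of error per elapsed second;
     * round up so the bound stays conservative. */
    int64_t drift_us =
        (elapsed_ns * ASSUMED_MAX_DRIFT_PPM + 999999999LL) / 1000000000LL;
    *maxerror_us_out = last->maxerror_us + drift_us;
    return true;
}
```

For example, two seconds after a sample with a maxerror of 100 microseconds, the extrapolated bound is 100 + 2 × 500 = 1100 microseconds.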
[Rust] CLOCK_MONOTONIC goes backwards
Also, be aware that monotonic clocks aren’t impervious to backward jumps. Widely deployed programs (Firefox) and libraries (the Rust standard library) have noticed that monotonic clocks are not as monotonic as they’re supposed to be.
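A common defensive workaround is to remember the largest value seen so far and never return anything smaller. The sketch below is single-threaded for brevity and uses hypothetical function names; a multi-threaded program would need an atomic or a lock around the stored maximum.

```c
#define _GNU_SOURCE             /* for clock_gettime() under strict -std= modes */
#include <stdint.h>
#include <time.h>

/* Never return less than the largest value seen so far. */
int64_t clamp_monotonic(int64_t *last_seen_ns, int64_t raw_ns)
{
    if (raw_ns > *last_seen_ns)
        *last_seen_ns = raw_ns;
    return *last_seen_ns;
}

/* CLOCK_MONOTONIC in nanoseconds, clamped so it cannot go backwards.
 * Not thread-safe: last_seen_ns is plain static state. */
int64_t monotonic_now_ns(void)
{
    static int64_t last_seen_ns = 0;
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    int64_t raw_ns = (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    return clamp_monotonic(&last_seen_ns, raw_ns);
}
```

The cost of the clamp is that time appears to stand still during a backward jump, which is usually preferable to computing a negative duration.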
Open Questions
SCM_DROP_IF_LATE
When pulling the current time and sending it across the network, calling sched_yield() makes a decent attempt at avoiding a context switch between the two steps. However, calling send() on a packet and the packet actually being sent are two different things. sendmsg() accepts a flag, SCM_DROP_IF_LATE, which adds a deadline when enqueuing a packet into the network stack. If the packet can’t be sent before the deadline, it is dropped. This seems like it would help bound how late a packet could be sent after the current system time was checked.
I don’t see anyone using it for this case. There are very few uses of it at all: only 10 hits on GitHub code search. Historically, this has correlated with kernel features that are either partially implemented or have significant limitations.
TCP Timestamps
TCP Extensions for High Performance defines a TCP extension, TCP Timestamps, which places a timestamp on all packets. This is used to build Round Trip Time Measurement into the TCP protocol. While this has utility for TCP itself, it also seems quite useful as a way to continuously and transparently synchronize your clock with peers. However, I can’t find any way in which the timestamps are exposed to userspace or eBPF for processing.
PTP
Huygens
[huygens] is an NTP-like approach, offering PTP-like time bounds.
References
- [spanner]: J. C. Corbett et al., "Spanner: Google’s Globally-Distributed Database," in 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12), Hollywood, CA, Oct. 2012, pp. 261-264. [Online].
- [sundial]: Y. Li et al., "Sundial: Fault-tolerant Clock Synchronization for Datacenters," in 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), 2020, pp. 1171-1186. [Online].
- [ptp-wifi]: P. Chen and Z. Yang, "Understanding Precision Time Protocol in Today’s Wi-Fi Networks: A Measurement Study," in 2021 USENIX Annual Technical Conference (USENIX ATC 21), Jul. 2021, pp. 597-610. [Online].
- [huygens]: Y. Geng et al., "Exploiting a Natural Network Effect for Scalable, Fine-grained Clock Synchronization," in 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18), Renton, WA, Apr. 2018, pp. 81-94. [Online].