Clock Synchronization

UTC (Coordinated Universal Time) is the primary time standard used worldwide. Time zones are defined as offsets from UTC. Computers track time and try to stay close to UTC. They accomplish this by periodically synchronizing their clocks with a more authoritative time source.

How Computers Keep Time

Computers maintain time using hardware and software components. When a computer boots, the operating system reads time from a battery-powered real-time clock (RTC), a chip designed for low power consumption to survive power outages, not for accuracy. Once the OS is running, it maintains a more accurate system clock by reading high-resolution hardware counters, such as the Intel timestamp counter (TSC) or the ARM Generic Timer.

Most systems represent time as a count of time units elapsed since an epoch: seconds since January 1, 1970, 00:00:00 UTC on Unix systems, or 100-nanosecond intervals since January 1, 1601 on Windows. This representation avoids time zone confusion and daylight saving time ambiguities, and it reduces time arithmetic to integer operations. The system clock that applications see is called wall time: it tracks UTC but can jump when synchronization corrections are applied.
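As a small illustration of why epoch timestamps simplify arithmetic, the following Python sketch uses only the standard library. The two event timestamps are made-up values chosen for the example.

```python
from datetime import datetime, timezone

# Unix time 0 is the epoch itself: 1970-01-01 00:00:00 UTC.
epoch = datetime.fromtimestamp(0, tz=timezone.utc)

# Two hypothetical event timestamps, in seconds since the Unix epoch.
start = 1_700_000_000
end = 1_700_003_600

# Elapsed time is plain integer subtraction; no time zone or DST rules apply.
elapsed_seconds = end - start

# Conversion to a human-readable form happens only when displaying the value.
start_utc = datetime.fromtimestamp(start, tz=timezone.utc).isoformat()
```

Keeping timestamps as integers internally and converting to a calendar representation only for display is the usual way to get the benefits described above.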

Accuracy, Precision, and Resolution

Accuracy measures how close a measurement is to the true value. If your clock shows 12:00:00.005 and true UTC is 12:00:00.000, your clock has 5ms of error.

Resolution is the smallest time increment a clock can represent. A nanosecond-resolution clock can distinguish events 1 nanosecond apart. Higher resolution does not guarantee accuracy.

Precision is the consistency of repeated measurements. A clock consistently 5ms fast is precise but not accurate.

When we say “NTP achieves 10ms accuracy,” we mean clocks are within 10ms of true UTC, not that they measure time in 10ms increments.
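The difference between accuracy and precision can be made concrete with a few repeated readings. This is a minimal sketch: the function name `summarize` and the sample readings are invented for illustration, with the mean error standing in for accuracy and the standard deviation standing in for precision.

```python
from statistics import mean, pstdev

def summarize(readings_ms, true_ms):
    """Return (mean error, spread) for repeated clock readings of one instant."""
    errors = [r - true_ms for r in readings_ms]
    return mean(errors), pstdev(errors)

# Hypothetical clock A: always about 5 ms fast (precise, but not accurate).
bias_a, spread_a = summarize([5.0, 5.1, 4.9, 5.0], true_ms=0.0)

# Hypothetical clock B: centered on the truth but noisy
# (accurate on average, but not precise).
bias_b, spread_b = summarize([-4.0, 4.0, -3.0, 3.0], true_ms=0.0)
```

Clock A has a mean error of about 5 ms with a spread under 0.1 ms, while clock B has a mean error of zero with a spread of several milliseconds.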

Why Physical Clocks Drift

All physical clocks drift. A quartz oscillator’s frequency depends on temperature, manufacturing variations, atmospheric pressure, humidity, and aging. Consumer hardware typically drifts at 50-100 parts per million (ppm). At 100 ppm, a clock gains or loses about 8.64 seconds per day, and two clocks drifting in opposite directions diverge even faster. Without synchronization, machines in a distributed system quickly lose any agreement on time.
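The arithmetic behind the per-day figure is a one-line conversion from parts per million to seconds. The helper name `drift_seconds` below is just for illustration.

```python
SECONDS_PER_DAY = 86_400

def drift_seconds(ppm, interval_s=SECONDS_PER_DAY):
    """Worst-case time error accumulated over interval_s at a given drift rate."""
    return ppm * 1e-6 * interval_s

per_day_50 = drift_seconds(50)    # about 4.32 seconds per day
per_day_100 = drift_seconds(100)  # about 8.64 seconds per day
```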

The Clock Model

A physical clock can be modeled as:

\[C(t) = \alpha t + \beta\]

where:

\(C(t)\) is the clock’s reading at true time \(t\)
\(\alpha\) represents the clock rate (ideally 1.0, but drift causes deviation)
\(\beta\) represents the offset from true time at \(t = 0\)

Drift is the rate error: how fast a clock runs compared to true time. Offset is the instantaneous difference between a clock and true time. Even after perfect synchronization (zero offset), drift causes the offset to grow again. This is why periodic resynchronization is essential.
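The model can be explored numerically. The sketch below assumes a hypothetical clock that runs 50 ppm fast and was perfectly synchronized at \(t = 0\); the function names are illustrative and not taken from any library.

```python
def clock_reading(t, alpha, beta):
    """C(t) = alpha * t + beta: reading of a drifting clock at true time t."""
    return alpha * t + beta

def offset(t, alpha, beta):
    """Difference between the clock reading and true time at true time t."""
    return clock_reading(t, alpha, beta) - t

def resync_interval(drift_ppm, max_offset_s):
    """Seconds until a freshly synchronized clock may exceed max_offset_s."""
    return max_offset_s / (abs(drift_ppm) * 1e-6)

# Hypothetical clock: 50 ppm fast, perfectly synchronized at t = 0.
alpha = 1 + 50e-6
after_one_hour = offset(3600, alpha, beta=0.0)       # about 0.18 s
interval = resync_interval(50, max_offset_s=0.010)   # about 200 s
```

Under these assumptions, keeping the clock within 10 ms of true time would require resynchronizing roughly every 200 seconds, unless the drift itself is measured and compensated.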

Clock Adjustment

When synchronization detects an offset, systems prefer slewing over stepping:

Slewing gradually adjusts the rate at which the clock advances, so each tick of real time advances the system clock by slightly more or slightly less than one tick. The displayed time catches up or falls back without ever going backward. This is preferred for small offsets (typically below 128ms).

Stepping instantly jumps to the correct time. This may be used for larger offsets (often ≥ 128ms). Stepping may break applications measuring durations or using timestamps (e.g., software build environments).
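The choice between the two can be sketched as a simple threshold test. This is an illustration only: the 128 ms threshold comes from the text above, while the maximum slew rate of 500 ppm is an assumption made for this example, and `plan_adjustment` is a hypothetical helper rather than part of any time daemon.

```python
STEP_THRESHOLD_S = 0.128   # threshold quoted in the text; often configurable
MAX_SLEW_PPM = 500         # assumed maximum slew rate for this sketch

def plan_adjustment(offset_s):
    """Return ("slew", seconds_needed) or ("step", 0.0) for a measured offset."""
    if abs(offset_s) >= STEP_THRESHOLD_S:
        return ("step", 0.0)
    # Slewing at MAX_SLEW_PPM removes MAX_SLEW_PPM microseconds of error
    # for every second of real time that passes.
    return ("slew", abs(offset_s) / (MAX_SLEW_PPM * 1e-6))
```

With these assumed values, a 50 ms offset takes about 100 seconds to slew away, which shows why large offsets are stepped instead: slewing them would take too long.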

Cristian’s Algorithm

The most direct synchronization approach sends a request to a time server and receives a timestamped reply. The challenge is network delay: by the time the response arrives, it no longer reflects the current time. Cristian’s algorithm assumes the delay is symmetric and the timestamp was generated at the midpoint of that delay.

Algorithm:

Client sends request at local time t₀
Server responds with timestamp T_S
Client receives reply at t₁
Client estimates time as \(T_S + \frac{t_1 - t_0}{2}\)

In reality, the server may have generated its timestamp before or after the midpoint of the round trip, so the estimate can be off by up to half the round-trip time. Knowing the best-case one-way transit time tightens this bound, because the timestamp cannot have been generated within that minimum transit time of either end of the round trip.

Error bound: If the minimum one-way delay is \(t_{\min}\), the magnitude of the error \(\epsilon\) is at most:

\[\epsilon \leq \frac{(t_1 - t_0) - 2t_{\min}}{2}\]

Clients can retry to find the lowest round-trip time, which yields the tightest error bound.
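The estimate, the error bound, and the retry strategy fit in a few lines. This is a sketch under the algorithm's symmetric-delay assumption; the function names and the tuple layout of `samples` are choices made for this example.

```python
def cristian_estimate(t0, t1, server_time, t_min=0.0):
    """Return (estimated current time, error bound) from one request/reply pair.

    t0, t1: client clock when the request was sent and the reply received.
    server_time: timestamp T_S carried in the reply.
    t_min: best-case one-way delay, if known (0.0 means unknown).
    """
    rtt = t1 - t0
    estimate = server_time + rtt / 2
    error_bound = (rtt - 2 * t_min) / 2
    return estimate, error_bound

def best_of(samples, t_min=0.0):
    """Pick the (t0, t1, server_time) sample with the smallest round-trip time."""
    t0, t1, server_time = min(samples, key=lambda s: s[1] - s[0])
    return cristian_estimate(t0, t1, server_time, t_min)
```

For example, a 20 ms round trip with a known 4 ms minimum one-way delay gives an error bound of 6 ms rather than the 10 ms implied by the round trip alone.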

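Additive errors: When machines synchronize in chains (A from B, B from C), errors accumulate. A’s total error is \(\epsilon_A + \epsilon_B\), where \(\epsilon_A\) is the error A incurs when synchronizing from B and \(\epsilon_B\) is the error B incurred when synchronizing from C. This is why systems try to avoid deep synchronization hierarchies.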

A limitation of Cristian’s algorithm is that it has a single point of failure: the server.

Network Time Protocol (NTP)

NTP solves the single point of failure problem through a hierarchical architecture:

Stratum 0: Reference sources (GPS, atomic clocks)
Stratum 1: Servers synchronizing directly from stratum 0
Stratum 2: Servers synchronizing directly from stratum 1 servers
Higher strata: Synchronize from lower strata (maximum 15 levels)

Fault tolerance through multiple sources: NTP encourages systems to query multiple time servers and use statistical techniques to identify and reject outliers. NTP combines the remaining time offset estimates using a weighted average, with more weight given to more reliable servers. NTP tracks each server’s jitter (delay variation) and dispersion (accumulated timing uncertainty), favoring more reliable sources.
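The idea of rejecting outliers and then weighting the survivors can be sketched as follows. This is a deliberately simplified stand-in, not NTP's actual selection and clustering algorithms: it drops offsets far from the median and weights the rest by the inverse of their dispersion. The function name and the 50 ms cutoff are assumptions made for this example.

```python
from statistics import median

def combine_offsets(samples, max_deviation_s=0.050):
    """Combine (offset_s, dispersion_s) pairs from several servers.

    Simplified stand-in for NTP's selection and combining steps:
    1. drop servers whose offset is far from the median (outliers),
    2. average the survivors, weighting each by 1 / dispersion.
    Dispersion values must be positive.
    """
    mid = median(offset for offset, _ in samples)
    survivors = [(o, d) for o, d in samples if abs(o - mid) <= max_deviation_s]
    if not survivors:
        raise ValueError("no servers agree closely enough to combine")
    weights = [1.0 / d for _, d in survivors]
    weighted_sum = sum(o * w for (o, _), w in zip(survivors, weights))
    return weighted_sum / sum(weights)
```

Given three servers reporting offsets near 11 ms and one reporting 900 ms, the sketch discards the 900 ms reading and returns a value close to the agreeing servers, pulled toward those with smaller dispersion.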

The synchronization algorithm uses four timestamps:

T₁: Client sends request (client clock)
T₂: Server receives request (server clock)
T₃: Server sends response (server clock)
T₄: Client receives response (client clock)

Offset: \(\theta = \frac{(T_2 - T_1) + (T_3 - T_4)}{2}\)

The network delay is the round-trip time measured by the client minus the time the server spent processing the request:

Delay: \(\delta = (T_4 - T_1) - (T_3 - T_2)\)
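A direct transcription of the offset and delay formulas, followed by a worked example. The scenario in the usage note (server 100 ms ahead, 10 ms each way, 2 ms of server processing) is invented for illustration.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Return (offset, delay) from the four NTP timestamps.

    t1 and t4 are read from the client clock; t2 and t3 from the server clock.
    A positive offset means the server clock is ahead of the client clock.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay

# Hypothetical exchange: server clock 100 ms ahead of the client,
# 10 ms network delay in each direction, 2 ms of server processing.
offset, delay = ntp_offset_delay(t1=0.000, t2=0.110, t3=0.112, t4=0.022)
```

Here the computed offset is about 0.100 seconds and the delay about 0.020 seconds. The offset is exact only because the two directions have equal delay; any asymmetry shows up as offset error.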

Clock discipline gradually adjusts the system clock. For small offsets (< 128ms), it slews. For large offsets, it steps. The discipline learns and compensates for drift over time by adjusting the tick frequency of the system clock.

SNTP is a stripped-down subset suitable for clients that only consume time. It omits the sophisticated filtering and clock discipline of full NTP.

Precision Time Protocol (PTP)

PTP achieves sub-microsecond synchronization through hardware timestamping. Network interface cards with PTP support capture packet transmission and receipt timestamps at the physical layer, eliminating millisecond-level variability from software network stacks.

Architecture: A grandmaster clock provides authoritative time. Unlike NTP, where clients initiate requests, PTP is master-initiated: the grandmaster periodically multicasts sync messages.

The Best Master Clock Algorithm (BMCA) automatically selects the most suitable grandmaster based on priority, clock quality, accuracy, and stability.

PTP uses a four-message exchange:

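Sync: sent by the master at T₁ and received by the slave at T₂
Follow_Up: sent by the master, containing T₁
Delay_Req: sent by the slave at T₃ and received by the master at T₄
Delay_Resp: sent by the master, containing T₄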

The first two messages are split because some hardware cannot embed an accurate timestamp in the Sync message itself. The Follow_Up message exists only to deliver the timestamp T₁ that the master captured when it sent Sync.

Offset: \(\frac{(T_2 - T_1) - (T_4 - T_3)}{2}\)
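The offset formula can be checked with a small worked example. In this sketch, T₂ is the time the slave receives Sync and T₄ is the time the master receives Delay_Req; the mean path delay expression is the standard companion to the offset formula, and the numbers are invented for illustration.

```python
def ptp_offset_delay(t1, t2, t3, t4):
    """Return (slave offset from master, mean one-way path delay).

    t1: master sends Sync (master clock)
    t2: slave receives Sync (slave clock)
    t3: slave sends Delay_Req (slave clock)
    t4: master receives Delay_Req (master clock)
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2
    mean_path_delay = ((t2 - t1) + (t4 - t3)) / 2
    return offset, mean_path_delay

# Hypothetical exchange in microseconds: the slave clock is 5 us ahead
# of the master and the path delay is 2 us in each direction.
offset_us, path_delay_us = ptp_offset_delay(t1=100, t2=107, t3=120, t4=117)
```

The result is an offset of 5 microseconds and a path delay of 2 microseconds. As with NTP, the calculation assumes the path delay is the same in both directions.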

Cost: Unlike NTP, PTP requires specialized network cards and switches with hardware timestamping support.

TrueTime

Google’s Spanner database uses a system called TrueTime that takes a different approach to distributed time. Instead of returning a single time value, TrueTime returns an interval that is guaranteed to contain true UTC. A typical interval might be 7 milliseconds wide. Spanner uses these intervals to order globally distributed transactions: when a transaction commits, Spanner waits out the uncertainty interval before acknowledging, so any later transaction is guaranteed to have a strictly later timestamp.
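The interval idea and the commit wait can be sketched in a few lines. Everything here is a hypothetical stand-in: `TTInterval`, `tt_now`, and `commit_wait` are illustrative names, and wrapping the local clock in a fixed uncertainty of 3.5 ms (half of a 7 ms interval) does not provide the real guarantee, which depends on the time infrastructure behind TrueTime.

```python
import time
from dataclasses import dataclass

@dataclass
class TTInterval:
    earliest: float  # true time is assumed to be >= earliest
    latest: float    # true time is assumed to be <= latest

def tt_now(epsilon_s=0.0035, clock=time.time):
    """Hypothetical stand-in: local time plus or minus an assumed uncertainty."""
    t = clock()
    return TTInterval(t - epsilon_s, t + epsilon_s)

def commit_wait(now=tt_now, sleep=time.sleep):
    """Choose a commit timestamp, then wait until it is definitely in the past."""
    commit_ts = now().latest
    # Keep waiting while the commit timestamp could still be in the future.
    while now().earliest <= commit_ts:
        sleep(0.001)
    return commit_ts
```

With an uncertainty of 3.5 ms on each side, the wait lasts roughly 7 ms, which is why shrinking the interval directly reduces commit latency.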

TrueTime requires GPS clocks and atomic clocks in every data center, so it is out of reach for most systems. The interesting contrast with NTP and PTP is that TrueTime makes the uncertainty in distributed time explicit rather than hiding it inside a point estimate.

Cloud Time Services

Cloud providers offer high-accuracy time services as managed infrastructure. AWS Time Sync, Azure Precision Time Protocol, and Google Public NTP each combine GPS and atomic clocks to deliver microsecond to sub-millisecond accuracy without requiring customers to deploy GPS hardware. Google’s NTP service uses leap second smearing, spreading the leap second over many hours so applications never see a 23:59:60 second or a sudden one-second jump. Hybrid and cross-cloud deployments still need to account for skew between providers.
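Leap second smearing amounts to running the clock slightly slow for a while. The sketch below assumes a linear smear over a 24-hour window; the window length and the function names are assumptions made for this example, since the text above only says the smear lasts many hours.

```python
SMEAR_WINDOW_S = 86_400  # assumed 24-hour smear window for this sketch

def smeared_fraction(seconds_into_window, window_s=SMEAR_WINDOW_S):
    """Fraction of the extra leap second already absorbed by a linear smear."""
    clamped = min(max(seconds_into_window, 0), window_s)
    return clamped / window_s

def slowdown_ppm(window_s=SMEAR_WINDOW_S):
    """How much slower the smeared clock runs during the window, in ppm."""
    return 1e6 / window_s
```

Under this assumption the smeared clock runs about 11.6 ppm slow, well within ordinary oscillator drift, so applications see a continuous clock. A smeared clock and an unsmeared one can differ by up to a second during the window, which is one source of skew between providers.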

When Physical Time Is Not Enough

Even perfectly synchronized physical clocks cannot order events that occur faster than clock resolution. At hundreds of thousands of events per second, many events share the same timestamp. More fundamentally, network delays obscure true ordering: an event timestamped earlier at one machine might arrive at another machine after local events with later timestamps.
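The resolution problem is easy to quantify. This sketch spreads events evenly over one second and counts how many land on the same clock tick; the function name and the chosen rates are illustrative.

```python
from collections import Counter

def timestamp_collisions(events_per_second, ticks_per_second):
    """Return (events sharing each timestamp at worst, distinct timestamps).

    Event i occurs at true time i / events_per_second seconds, and the clock
    truncates that to a whole number of ticks.
    """
    stamps = Counter(
        i * ticks_per_second // events_per_second
        for i in range(events_per_second)
    )
    return max(stamps.values()), len(stamps)

# 200,000 events per second recorded with a millisecond-resolution clock.
worst_case, distinct = timestamp_collisions(200_000, 1_000)
```

In this example each of the 1,000 distinct timestamps is shared by 200 events, so the timestamps alone cannot order events within a group.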

For distributed databases, the question that matters is causality, not chronology: whether one update could have seen another. This leads to logical clocks.

What You Don’t Need to Study

Specific oscillator frequencies (e.g., 32,768 Hz for RTC) or piezoelectric physics
ppm values for specific oscillator types
The Intel TSC or ARM Generic Timer
Windows epoch date (1601); knowing the Unix epoch (1970) is sufficient
Any exact thresholds for slewing vs stepping
Details of the adjtimex system call
Specific accuracy numbers for different NTP configurations
The exact formulas for computing NTP or PTP offset and delay
PTP message format details