pk.org: CS 417/Lecture Notes

Decentralized Storage -- The Domain Name System

A Planet-Scale Distributed Data Store

Paul Krzyzanowski – 2026-03-09

The Domain Name System (DNS) is so ubiquitous that it is easy to forget it is a distributed database. Every time you connect to a web server, fetch an email, or make an API call, a distributed lookup system resolves a human-readable name like cs.rutgers.edu to an IP address. DNS handles hundreds of billions of queries per day across a global infrastructure with no single server, no single point of control, and no central database.

DNS was designed in 1983 by Paul Mockapetris at USC/ISI. Before DNS, name-to-address mappings were maintained in a flat text file called HOSTS.TXT, distributed by the Network Information Center at SRI International. Every host on the ARPANET periodically downloaded a fresh copy. By the early 1980s this was already breaking down. The ARPANET had a few thousand hosts; the Internet was about to have millions. Mockapetris’s design replaced the flat file with a hierarchical, distributed, cacheable database.


The Structure of the Namespace

DNS organizes names in a hierarchy. A fully qualified domain name like www.cs.rutgers.edu. is read right to left: the root (.), then the top-level domain, or TLD (edu), then the second-level domain (rutgers), then the subdomain (cs), then the hostname (www).
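The right-to-left reading above can be sketched in a few lines of Python. This is pure illustration of how the labels nest, not anything a real resolver does; the function name is my own.

```python
# Split a fully qualified domain name into its hierarchy levels,
# read right to left as DNS does (root first, hostname last).

def hierarchy(fqdn: str) -> list[str]:
    """Return the labels of an FQDN from the TLD outward."""
    labels = fqdn.rstrip(".").split(".")   # drop the trailing root dot
    return list(reversed(labels))

print(hierarchy("www.cs.rutgers.edu."))
# ['edu', 'rutgers', 'cs', 'www']
```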

The DNS namespace is managed through delegation. The root is managed by a small number of root name servers (13 logical addresses, now anycast to hundreds of physical servers worldwide). The root delegates responsibility for each top-level domain to a set of TLD name servers. ICANN [1] oversees the delegation of .edu; Rutgers manages rutgers.edu; the CS department manages cs.rutgers.edu. Each zone owner controls the authoritative data for its portion of the namespace without asking permission from the level above.

This delegation makes DNS both decentralized and administratively scalable. No central authority knows every IP address; each organization maintains only its own records.


Resolution

When you type www.cs.rutgers.edu into a browser, your operating system contacts a recursive resolver, typically provided by your ISP or a public service like Google’s 8.8.8.8 or Cloudflare’s 1.1.1.1. The recursive resolver does the work of navigating the hierarchy.

Your machine runs a stub resolver: a minimal component built into the operating system that simply forwards queries to the configured recursive resolver and waits for the final answer. The term “recursive” refers to the resolver’s role: it resolves the name fully on the client’s behalf. Internally, it uses iterative queries, asking each nameserver in the hierarchy for a referral to the next, because authoritative nameservers return referrals rather than chasing lookups themselves.

Each level of the hierarchy is served by an authoritative nameserver: a server that holds the definitive records for a zone and answers queries about names within it.

The resolver asks a root name server: “Who is responsible for .edu?” The root server responds with the addresses of the .edu TLD servers. The resolver asks a .edu TLD server: “Who is responsible for rutgers.edu?” That server responds with the addresses of Rutgers’s authoritative name servers. The resolver asks a Rutgers name server: “Who is responsible for cs.rutgers.edu?” Rutgers delegates to the CS department’s servers. Finally, the resolver asks the CS name server for the IP address of www.cs.rutgers.edu and gets the answer. It returns the address to your browser.
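The referral chain above can be modeled with a toy in-memory delegation tree. Every server name, record, and IP address here is invented for illustration (192.0.2.80 is from the documentation address range); a real resolver speaks the DNS protocol over UDP/TCP rather than reading a dictionary.

```python
# A toy model of iterative resolution. Each "server" holds either an
# authoritative answer (an IP address) or a referral (the next server to ask).

ZONES = {
    "root":       {"edu.": ("referral", "edu-tld")},
    "edu-tld":    {"rutgers.edu.": ("referral", "rutgers-ns")},
    "rutgers-ns": {"cs.rutgers.edu.": ("referral", "cs-ns")},
    "cs-ns":      {"www.cs.rutgers.edu.": ("answer", "192.0.2.80")},
}

def resolve(name: str) -> str:
    """Follow referrals from the root until an authoritative answer appears."""
    server = "root"
    while True:
        # Find the entry for the suffix of `name` this server knows about.
        kind, value = next(v for k, v in ZONES[server].items()
                           if name.endswith(k))
        if kind == "answer":
            return value          # authoritative answer; the resolver caches it
        server = value            # referral: ask the next server down

print(resolve("www.cs.rutgers.edu."))
# 192.0.2.80
```

Note that the work all happens in `resolve`, mirroring the recursive resolver; each "server" only ever hands back one record.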

You can inspect these delegations using the dig command. For example,

dig +trace www.cs.rutgers.edu

Alternatively, you can use a web interface like Simple DNS Plus.

This is iterative resolution: the recursive resolver performs each step in sequence, collecting referrals. The resolver does all the work; each name server responds with either an answer or a referral to the next level of the hierarchy, and none of them chains queries on the resolver’s behalf.


Caching

Resolving a name from scratch touches multiple servers, which would be impossibly slow and costly if repeated for every request. The solution is aggressive caching.

Every DNS response carries a Time To Live (TTL) value set by the zone owner. The recursive resolver caches the response for the duration of the TTL. Subsequent queries for the same name are answered from cache without hitting the authoritative servers.

TTL values are a policy knob. A TTL of 86400 seconds (24 hours) means cached answers are valid for a day; changing an IP address takes up to a day to propagate worldwide. A TTL of 60 seconds means changes propagate in a minute, but the authoritative servers then handle far more traffic. Operators tune TTL based on how often addresses change and how quickly they need changes to take effect.
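The TTL mechanism amounts to a cache where every entry carries its own expiry time. Here is a minimal sketch in that style; the names, addresses, and TTLs are invented, and a real resolver takes the TTL from the DNS response itself.

```python
import time

# A minimal TTL cache in the style of a recursive resolver's answer cache.

class DnsCache:
    def __init__(self):
        self._entries = {}                 # name -> (address, expiry time)

    def put(self, name, address, ttl):
        """Store an answer; it is valid for `ttl` seconds from now."""
        self._entries[name] = (address, time.monotonic() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None                    # miss: must walk the hierarchy
        address, expiry = entry
        if time.monotonic() >= expiry:
            del self._entries[name]        # TTL expired: entry is stale
            return None
        return address                     # hit: no authoritative traffic

cache = DnsCache()
cache.put("www.cs.rutgers.edu", "192.0.2.80", ttl=60)
print(cache.get("www.cs.rutgers.edu"))
# 192.0.2.80
```

The policy trade-off in the text lives entirely in the `ttl` argument: a large value keeps entries serving from cache for hours, while a small one forces frequent trips back to the authoritative servers.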

Caching happens at multiple levels. The recursive resolver caches responses. The operating system has its own resolver cache. Browsers maintain yet another cache. This layered caching is why “clearing your DNS cache” sometimes requires flushing at multiple levels to actually see an updated address.


Limitations

DNS is not strongly consistent. Because answers are cached, a client may hold a stale address for up to a TTL period after an address change. If you change an IP address, some clients will still use the old address until their cached entry expires.

DNS provides limited querying. You can look up a name, but you cannot ask “which names resolve to this IP address?” without a separate mechanism. There is no general search facility.
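The separate mechanism that does exist for address-to-name lookups is reverse DNS: an IPv4 address maps to a PTR record under the in-addr.arpa domain, with the octets reversed. The sketch below only builds the query name; the IP address is an example, and the lookup still works only if the address owner publishes a PTR record.

```python
# Reverse DNS: to ask "which name does this IPv4 address map to?", a
# resolver queries a PTR record under in-addr.arpa, octets reversed.

def reverse_name(ipv4: str) -> str:
    """Build the in-addr.arpa name queried for a PTR lookup."""
    octets = ipv4.split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa."

print(reverse_name("128.6.4.2"))
# 2.4.6.128.in-addr.arpa.
```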

DNS was not originally designed with security in mind. An attacker who can intercept DNS responses can redirect traffic to a malicious server – a DNS spoofing or cache poisoning attack. DNSSEC was developed to address this using cryptographic signatures, but deployment has been slow and uneven.

Finally, DNS is read-mostly. Updates to authoritative data are made by zone administrators, and changes propagate through the cache TTL mechanism rather than through a distributed write protocol. This makes DNS excellent for data that changes infrequently, but unsuitable for rapidly changing data that must be immediately visible, as in a general-purpose database.


DNS as a Distributed System Design Lesson

DNS illustrates several principles that appear throughout distributed systems.

Hierarchical organization avoids the need for any single node to hold complete knowledge. No server knows all names; each knows only its own zone and where to refer queries it cannot answer.

Delegation lets each sub-authority manage its own data independently. Rutgers does not need to ask ICANN’s permission to add a hostname to rutgers.edu.

Caching reduces load on authoritative servers at the cost of bounded staleness. The TTL makes the consistency window explicit and tunable by the zone owner.

Read-mostly workloads naturally lend themselves to caching. DNS was designed around the observation that names change rarely and are read constantly. A system with frequent writes and strong consistency requirements would need a very different design.

These properties are the reason DNS has worked at Internet scale for four decades.


Next: Week 6 Study Guide

Back to CS 417 Documents


  1. ICANN (Internet Corporation for Assigned Names and Numbers) is a global, non-profit organization that coordinates unique identifiers like domain names (e.g., .com) and IP addresses.