When & Where

The final exam will be held in our regular classroom on Monday, May 11, 2026. It will start promptly at 4:00 pm and you will have until 7:00 pm to complete it.

Exam rules

Be sure to arrive on time. If you arrive after the exam starts, you will not be allowed to take it.

This will be a closed book, closed notes exam. Calculators, phones, augmented reality glasses, laptops, and tablets are neither needed nor permitted. If you have these devices, you must turn them off, put them out of sight, and not access them for the duration of the exam.

No other electronic devices are permitted except for hearing aids, pacemakers, electronic nerve stimulators, other implanted medical devices, or electronic watches that function only as timekeeping devices or chronographs.

Bring a couple of pens or pencils with you, an eraser, and your ID.

Plan to use a pen only if you are supremely confident in not changing your mind about your answers. . Check here for information about pencils, sharpeners, and the craft of pencil sharpening.

Past exams

The exam will be similar in structure to mid-semester exams but will cover material for the entire course.

You can use my past exams as a guide to what this exam may look like, but realize there are differences in topics and in the sequencing of the topics. Expect around 25 multiple-choice questions. I do not refer to old exams when I come up with a new one, so it is likely that many of the topics that I considered important in past exams will show up on future exams. Some material may have changed, however, so do not worry about questions that appear to relate to topics we have not covered.

Get past 417 exams here.

Study guide

You are responsible for the course material up through, and including, the week 13 lecture on clusters.

The study guide is a concatenation of the study guides from the past lectures. It attempts to cover most of the material you should know. It is not a substitute for the lectures, lecture material, and other reading matter. All the material may not be in the guide. My goal is to put most of the information you need to know a concise with fewer elaborations.

You can also prepare your own guide, which would be a much better way to prepare for the exam!

→ Get the study guide

Topics

Here is a list of topics you are expected to know for the exam. It should be viewed as a coverage guide rather than a catalog of detailed algorithms or definitions to memorize.

Use it as a checklist. Go through each item and ask yourself whether you understand the concept and could recognize or apply it in context. If not, review the relevant notes.

Because the exam is multiple choice, your focus should be on understanding and being able to reason about the material, not on memorizing or reciting formal definitions.

Topics that you be familiar with and may be on the exam include:

Introduction & Key Concepts

Distributed system - definition
Single system image
Moore’s law
Vertical vs. horizontal scaling
Transparency goals
Nines of Availability (not the calculations)
Failure models
- Partial failure
- Single point of failure
- Redundancy
- Availability vs. reliability
- Series vs. parallel failures (understand the impact but you don’t need to memorize the formulas)
Types of failures: Fail-stop, fail-restart, omission, delays, partition, Byzantine failure
Latency: synchronous and asynchronous transmission modes
State, replicas, caching
- Caching vs. replication
- Stale data
- Consistency
- Global state
Transparency goals (just a high-level understanding)
- Location, Migration, Replication, Concurrency
Service models
- Centralized
- Client-server
- Multi-tier (how does it compare with layered architectures?)
- Peer-to-peer
- Hybrid P2P
You don’t need to know any of the “-as-a-service” models

Networking

Understand the concept of protocols and layering
- I will not ask you to enumerate the OSI or IP protocol stack, but you should understand the functions of the data link, network, transport, and application layers.
Data Link Layer
- Local Area Network (LAN)
- Ethernet - service guarantees
Network Layer: purpose
Transport Layer: TCP & UDP
- Transport endpoints (ports) vs. network endpoints (addresses)
- Port numbers
- TCP: connection-oriented service, full-duplex connections, reliable & in-order data delivery, flow control, and congestion control
- UDP: connectionless service, no retransmits, drop bad packets
- - What are advantages of using TCP or using UDP?
QUIC: why was it created?
Protocol encapsulation concept
Sockets
- Concept
- Operations at the system call level: create, bind, connect, listen, shutdown, communication (know what they do, not what the parameters are)
General principles of the design of the Internet
- Connection of disparate networks via routers
- Unreliable communication
- End-to-end principle
- Fate sharing
- Best-effort packet delivery

Remote procedure calls

Concept of a remote procedure call (RPC)
Stub functions (proxies and skeletons)
Stub generation
Interface definition language (IDL) concept
Marshalling
- Data representation, endianess
- Serialization: canonical vs. receiver-makes-right
- Pass by reference issue
- Explicit versus implicit typing
Semantics of remote procedure calls
- Failure of RPCs
- at most once and at least once semantics
Idempotent vs. non-idempotent functions, retries
Idempotency keys
Purpose of an RPC compiler
Purpose of an RPC name service
Versioning
Timeouts vs. deadlines
Surrogate process (Microsoft)
Marshaling and serialization examples
- RPC-specfic solutions: XDR, NDR
- Google Protocol Buffers
- JSON
- XML
Garbage collection
- Reference counting, leases, pinging (heartbeats)

Web services

Motivation for web services
XML-RPC: general concept
SOAP: just that it extended XML-RPC; purpose of WSDL
REST
- What problems was it trying to solve?
- How is its approach different from RPC?
- CRUD operations: POST, GET, PUT, DELETE
Stateful vs. stateless services
Advantages of HTTP/2 vs older HTTP
gRPC: why was it created? General principles
Need for observability; request IDs

Clock synchronization

What is UTC?
System clock, wall time
What’s an epoch?
Accuracy, resolution, precision
Clock drift, offset, and jitter
Compensation: slewing and stepping
Clock discipline
Cristian’s algorithm
- Understand the goal and formula for Cristian’s algorithm
- Error bounds. Remember that errors are cumulative (additive).
NTP
- NTP synchronization subnet
- NTP strata
- What is jitter?
- You don’t need to memorize the NTP(SNTP) formula but be sure to understand how it gives you the same result as Cristian’s
- You don’t need to know how dispersion and jitter are calculated
- You don’t need to know the NTP message structure or validation tests
PTP
- Goal of PTP vs. NTP
- Role of grandmaster
- What is the best master clock? (Don’t remember the priority order)
- Don’t memorize the formula

Logical clocks

Causal vs. concurrent events
Lamport’s algorithm
- Goals
- Happened-before relationship
- Partial ordering vs. total ordering
- Algorithm: know how it works
- Deficiencies
- Remedy for creating total order
Vector clocks
- Goals
- Algorithm
- Know how to identify concurrent events by comparing timestamps
- Understand how to represent vector clocks if the group membership is unknown (sets of node:clock values)
Hybrid logical clocks
- Goals
- Role of the two values
- Understand when L and C change

Group communication

Unicast, broadcast, multicast
I won’t ask about anycast
Sending vs. receiving vs. delivering; holdback queue
Understand each of these failure modes:
- Crash failure
- Omission failure
- Byzantine failure
- Partition failure
Reliability of message delivery
- Unreliable, best-effort delivery
- Best-effort reliable
  - feedback implosion
- Reliable multicast
- Durable multicast
- Publish-subscribe model (concept)
Message Ordering
- Unordered
- Single Source FIFO (SSF): understand how to achieve this
- Causal (partial) ordering: understand how this is achieved
  - When is a message not delivered?
- Total order: understand how to achieve this
- Synchronous order: understand what this means
  - What is a sync primitive?
Atomic multicast: What’s cmobined?
IP multicasting
- IP multicast address – what is its purpose?
- Internet Group Management Protocol (IGMP)
  - Purpose (host to edge router)
  - Understand the basic protocol: membership query, membership report, leave group
  - What is pruning?
- Protocol Independent Multicast (PIM)
  - IGMP vs. PIM
  - PIM Dense Mode multicast: flooding
  - PIM Sparse Mode multicast
  - Rendezvous points

Failure detection

What does the FLP impossibility result tell us about reaching agreement?
Push-based heartbeating
Pull-based heartbeating/pinging
Phi accrual failure detector
- Just understand what it tries to do
- You don’t need to know how φ maps to probabilities

Virtual Synchrony

Group membership service: purpose
What is a view and a view change?
What is a stable vs. unstable message?
Steps in a view change: what does flushing do?
What is a state transfer and when is it needed?

Mutual exclusion

Understand properties: safety, liveness, fairness
Types: centralized, token-based, contention-based
Central-server algorithm: messages involved
Distributed mutual exclusion algorithms:
- Token ring
- Lamport
- Ricart & Agrawala: optimizatin over Lamport’s
- Understand pros and cons of each

Leader election algorithms

Bully algorithm: how does it work?
Ring algorithm
- How does it work?
- How does it avoid duplicate elections?

Consensus

Partitions, split-brain problem
What is quorum?
Purpose of replicated state machines
What are replicated logs?
Understand consensus requirements
- Validity
- Agreement
- Termination (= progress)
FLP Impossibility Result: what does it say?
What is the purpose of Paxos (not the algorithm or phases)?
Raft
- Goal: fault tolerant consensus algorithm
- Raft quorum
- States: follower, candidate, leader
- Terms and term numbers
- Leader election, timeouts, heartbeat, split votes
- AppendEntries
- Log matching, committing entries
- You don’t need to know about membership changes

Coordination Services

Goals, replication
Chubby
- Purpose: lock manager, name server, and simple file system
- Architecture of a Chubby cell: master and replicas
- Named resources (files, locks)
- Event notification (& purpose)
- Whole file reads and writes
- Don’t try to memorize the API
- Client cache consistency
- Consensus for replication
Zookeeper
- Persistent vs. Ephemeral nodes
- You don’t have to know about sequential znodes
- You don’t hae to know the consistency model
- You don’t hae to know locks can be constructed
etcd
- Similarities with Chubby and Zookeeper (replicas via consensus-based log replication)
- Key difference: leases are different objects
What is the purpose of fencing tokens?

Network-Attached Storage

Transparency via VFS, mounting filesystems
Access models: remote access vs. upload/downloa
Semantics: sequential vs. session semantics
Stateful vs. stateless design
Caching options
Case studies
- NFS (original design)
  - Goals
- You do not need to know the automounter (we did not discuss it in class)
  - UDP vs TCP transport
  - Directory and file access protocol (don’t memorize the list of RPC calls but understand why they exist, what lookup does, why there’s no open)
  - Caching, validation, possible inconsistencies
  - Know that a user-level lock manager was added to NFS but otherwise don’t bother with features of successive versions.
- AFS (original design)
  - Goals
  - Caching model, semantics, callbacks
  - Volumes, uniform global namespace
- Coda
  - Goals
  - Replicated volumes
  - Accessible Volume Storage Group
  - Client modification log
  - Resolution
  - Disconnected operation
  - Reintegration
- SMB
  - Goals
  - Caching model: oplocks/leases (don’t memorize oplocks or leases but know what their purpose is)
  - Microsoft DFS: just know that it adds consistent naming, similar to AFS
Enhancements to NFS and SMB
- You don’t have to know version numbers (e.g., NFSv4 vs SMB2)
- You don’t have to knw anything about the SMB3 features
- Just understand the point of the mechanisms
- SMB2 – understand these concepts that improved performance
  - Pipelining
  - Compounding
- NFSv4
  - the protocol is now stateful
  - supports better caching (similar to oplocks)
  - compound RPC (compounding)

Distributed Lookup (Distributed Hash Tables)

Purpose: distributed, highly available key-value store
What is consistent hashing?
Content-Addressable Network
- Multi-dimensional hashes, zones, routing, node insertion
Chord
- Ring, successor nodes
- How do finger tables make query routing more efficient?
Amazon Dynamo
- Understand the goal: distributed, highly available key-value store
- Functional interface: get, put
- Consistency model for replication (eventually consistent, but tunable via N, W, R)
- Goals: incremental scalability, symmetry, decentralization, heterogenous systems
- Conflict resolution
  - Who does it?
  - Use of vector timestamps for conflict detection
- Partitioning among multiple nodes
- What is a virtual node and why are they a good idea?
- Hinted handoff

Distributed Lookup: DNS

Stub resolver
Recursive resolver
Authoritative server
Caching/consistency

Distributed file systems

GFS
- Design goals, data & metadata
- chunkservers - what do they do?
- master - what does it do?
- Google cluster environment: colocate computation and data on same set of machines
- operation log
- why large chunks?
- writing & replication: data flow vs. control flow
- You don’t have to know record append and snapshot features
- You don’t have to know how checksums and chunk version numbers are used
HDFS
- HDFS is practically a clone of GFS. Focus on GFS terminology. If I use any HDFS terminology, I will identify the GFS equivalent
- NameNode = Master
- DataNode = Chunkserver
- blocks = chunks
- Transaction log = Operation log
- You do not have to know any of the differences between GFS and HDFS (they’re minor)
Dropbox (from the homework video and notes)
- Goal
- How does server traffic load differ from most other sites that store data (facebook, twitter, …)?
- What’s a metadata server (metaserver)?
- Why was a notification server added? Polling vs. notifications

Distributed transactions

Transactions, write-ahead log, commit, abort
ACID properties
Concurrency control
- Optimistic vs. pessimistic
- Read locks vs. write locks
- Two-phase locking
- Strong Strict Two-Phase Locking (SS2PL)
- Private workspace (for OCC)
- Multi-version concurrency control (MVCC)
- Snapshot isolation
Deadlock
- Wait-for-graph
- Centralized detection, phantom deadlocks
- Chandy-Misra-Haas algorithm, probe messages
- Timestamp-based prevention: wait-die and wound-wait
- You don’t have to remember the difference between wait-die and wound-wait
Two-phase commit protocol
- Understand the need for a log
- What happens in each of the phases?
- Problems if coordinator or participants die
- Uncertain state
Three-phase commit protocol
- Precommit phase
- Supporting coordinator recovery
- Main problem with 3PC
Consistency models
- Linearizability
- Sequential consistency
- Causal consistency
- Eventual consistency
Scaling via replication and Brewer’s CAP Theorem
- Consistency, Availability, Partitions
- PACELC: how does it extend CAP?
Eventual consistency: BASE approach vs. ACID

Distributed Databases

NoSQL models
Key-value stores
Column-family or wide-column stores

Bigtable

Row keys, column families, column qualifiers
Timestamps
Sparse storage
Tablets, splitting, versions
Tablet servers, master server
Memtable, SSTable
Consistency model
Single-row atomicity; no transactions across rows

Cassandra

Peer-to-peer design
Distributed hash table, hash ring, virtual nodes
Data model: Partition key, Clustering columns
Replication factor
Tunable consistency
Memtable, SSTable, compaction

Spanner

Splits and Paxos groups
Two-phase commit
Strict two-phase locking
Wound-wait
Multiversion concurrency control
TrueTime and commit wait
Read-only transactions
Snapshot reads
External consistency

Database Comparisons

Key-range partitioning vs. hash partitioning
Bigtable vs. Cassandra
Paxos vs. two-phase commit
Serializability vs. external consistency

Distributed Computation

MapReduce

Master and workers
Shards
Map tasks, reduce tasks
Intermediate key-value pairs
Partitioning by key
Shuffle and sort
Locality
Stragglers, speculative execution
Failure handling

BSP

Supersteps
Barrier synchronization
Message passing
Checkpointing

Pregel / Giraph

Vertex-centric computation
Supersteps
Message passing
Vote to halt
Active and inactive vertices
Global termination

Spark

Driver program, workers, executors
RDDs, immutability
Partitioning
Transformations and actions
Lazy evaluation
Lineage
Caching
Narrow vs. wide dependencies
Shuffle

Distributed Machine Learning

I won’t ask about Distributed Machine Learning

Comparisons

MapReduce vs. Spark
BSP vs. Pregel
Batch vs. iterative workloads

Data in Motion

Event streaming
CDNs
Peer-to-peer distribution

Kafka and event streaming

Queuing vs. publish-subscribe
Brokers
Producers, consumers, consumer groups
Topics and partitioned logs
Offsets
Ordering within a partition
Replication: Leaders and followers
Scale and fault tolerance
Retention and replay

Stream prcessing

Stateless vs. stateful processing
Idempotent sinks
Backpressure
Event time vs. processing time
Tumbling, sliding, session windows
Watermarks
Spark Structured Streaming
- Micro-batch model
- Unbouned table
- Checkpointing
Apache Flink
- Just know that it does record-at-a-time streaming

CDNs

Flash crowd problem
Browser caching, caching proxies, load balancing, mirroring
Origin servers and edge servers
Push CDN, pull CDN
Static content, dynamic content, streaming video

Akamai

Overlay network, routing
Caching hierarchy
Request mapping
Edge caching

BitTorrent

Peer-to-peer architecture
.torrent file
Swarm
Trackers, seeders, and leechers
Pieces, Rarest-first
Tit-for-tat incentive model
Use of DHT (not how it is implemented)
Scalability and fault tolerance

Edge computing

I won’t ask you about edge computing

Comparisons

RabbitMQ vs. Kafka
Event streaming vs. traditional messaging
CDN delivery vs. direct origin delivery
DNS-based routing vs. anycast
CDN vs. BitTorrent distribution

Security in Distributed Systems

Security goals: Confidentiality, integrity, authentication, authorization, non-repudiation
Lateral movement

Cryptography

Symmetric and asymmetric cryptography
Hashes, MACs, digital signatures
Replay attacks, freshness mechanisms

TLS and certificates

TLS vs. mTLS: what they do, what they authenticate, not the protocol steps
Certificates, certificate authorities (CA), certificate chain concept: purpose of each
Expiration, rotation, revocation

Identity and access

Authentication vs. authorization
OAuth, OpenID Connect: what is each used for?
Access tokens, ID tokens, refresh tokens
JWTs and claims: just the concept
Workload identity, SPIFFE - just what it does
Principle of least privilege

Architecture and operations

Zero Trust concept
Micro-segmentation
API gateway: purpose
Service meshes: purpose
Don’t bother remembering north-south, east-west
You don’t have to know secret management
Short-lived credentials vs. long-lived keys

Design mistakes

Trusting the internal network
Authenticating users but not services
Long-lived shared secrets
Missing authorization checks
Over-privileged roles
Storing secrets

Clusters

Definition, single system image
Commodity hardware
Clustering types: high performance (HPC), high availability (HA), load balancing. storage (just the goals)
Top-of-rack (ToR) switch, Spine-leaf topology
Purpose of Remote DMA (RDMA)
Handling failure: heartbeats, fencing, split brain problem, quorum
Borg: purpose
- Borgmaster role
- Borglet role
- Scoring and filtering
- I won’t ask about cgroups, namespaces, jobs, tasks, and allocs
Kubernetes: purpose
- Pod, scheduler, kubelets
- I won’t ask about etcd, controller manager, or cloud controller manager.
- I won’t ask about container runtime or Kube-proxies
- I won’t ask about kubernetes controllers or Services
Load balancing
- I won’t ask about load-balancing strategies
- Advantage of stateless services
I won’t ask about health checks
Cluster interconnect: understand there’s an overhead to leaving the rack and to leaving the datacenter
What is a clustered file system? How does it differ from a network (or distributed) file system?
What is a heartbeat network?
Active/active vs. active/passive operation
Failover types: cold, warm, hot; multidirectional, cascading

Last update: Tue May 05 12:24:11 2026

CS 417 Exam Info

When & Where

Exam rules

Past exams

Study guide

Topics

Introduction & Key Concepts

Networking

Remote procedure calls

Web services

Clock synchronization

Logical clocks

Group communication

Failure detection

Virtual Synchrony

Mutual exclusion

Leader election algorithms

Consensus

Coordination Services

Network-Attached Storage

Distributed Lookup (Distributed Hash Tables)

Distributed Lookup: DNS

Distributed file systems

Distributed transactions

Distributed Databases

Bigtable

Cassandra

Spanner

Database Comparisons

Distributed Computation

MapReduce

BSP

Pregel / Giraph

Spark

Distributed Machine Learning

Comparisons

Data in Motion

Kafka and event streaming

Stream prcessing

CDNs

Akamai

BitTorrent

Edge computing

Comparisons

Security in Distributed Systems

Cryptography

TLS and certificates

Identity and access

Architecture and operations

Design mistakes

Clusters