Message Handling Fundamentals
- Sending
- The act of transmitting a message from an application through the communication layer to the network.
- Receiving
- The act of a machine accepting a message from the network; the message has arrived but is not yet visible to the application.
- Delivering
- The act of passing a received message to the application; this is when the application actually sees and processes the message.
- Holdback queue
- A buffer where received messages are held when they cannot be delivered immediately, such as when waiting for earlier messages to maintain ordering guarantees.
- Multicast
- One-to-many communication where a single message is delivered to a specific group of processes.
- Unicast
- One-to-one communication where a message is sent to a single recipient.
- Broadcast
- One-to-all communication where a message is sent to every process on the network.
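The distinction between receiving and delivering, and the role of the holdback queue, can be sketched in a few lines of Python. This is a minimal illustration, not a real communication layer: the `Receiver` class, its method names, and the per-sender sequence numbers are assumptions introduced here to give the holdback queue something to wait on.

```python
from collections import defaultdict


class Receiver:
    """Separates receiving (from the network) from delivering (to the application)."""

    def __init__(self):
        self.next_seq = defaultdict(int)  # next sequence number expected per sender
        self.holdback = {}                # (sender, seq) -> payload: received, not yet delivered
        self.delivered = []               # what the application has actually seen

    def receive(self, sender, seq, payload):
        """Called when a message arrives from the network."""
        if seq < self.next_seq[sender] or (sender, seq) in self.holdback:
            return  # duplicate: already delivered or already held
        self.holdback[(sender, seq)] = payload
        self._deliver_ready(sender)

    def _deliver_ready(self, sender):
        """Deliver every held message from this sender that is now in order."""
        while (sender, self.next_seq[sender]) in self.holdback:
            payload = self.holdback.pop((sender, self.next_seq[sender]))
            self.delivered.append((sender, payload))
            self.next_seq[sender] += 1


r = Receiver()
r.receive("A", 1, "second")        # arrives early: received, but held back
held_after_first = len(r.holdback)
r.receive("A", 0, "first")         # gap filled: both messages are now delivered in order
```

After the first call the message sits in the holdback queue and the application has seen nothing; the second call releases both messages in sending order.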
IP Multicast
- IP Multicast
- Network-layer multicast using UDP, IGMP, and PIM; works well in controlled environments but is blocked by most ISPs on the public internet.
- Internet Group Management Protocol (IGMP)
- Protocol operating between hosts and local routers that allows hosts to join and leave multicast groups dynamically through membership reports.
- Protocol Independent Multicast (PIM)
- Multicast routing protocol that distributes traffic between routers using the existing unicast routing table for reverse path forwarding.
- Reverse Path Forwarding (RPF)
- Technique where routers accept multicast traffic only if it arrives on the interface used to reach the source, preventing loops.
- PIM Dense Mode (PIM-DM)
- PIM mode using a flood-and-prune approach: it floods multicast traffic everywhere, then prunes branches that have no receivers. Appropriate when most subnets have receivers.
- PIM Sparse Mode (PIM-SM)
- PIM mode requiring explicit joins to a Rendezvous Point; routers send Join messages toward the RP to build a shared distribution tree. Appropriate when receivers are sparsely distributed.
- Rendezvous Point (RP)
- In PIM Sparse Mode, a designated router that serves as a meeting point where sources send traffic and receivers join to receive it.
- Prune message
- In PIM Dense Mode, a message sent upstream by routers with no interested receivers to stop receiving multicast traffic for a group.
- Join message
- In PIM Sparse Mode, a message sent toward the Rendezvous Point to join a multicast group and build the distribution tree.
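The reverse path forwarding check defined above can be sketched as a lookup against the unicast routing table. The routing table contents, the interface names, and the function names below are hypothetical; a real router performs this check in its forwarding plane, not in Python.

```python
import ipaddress

# Hypothetical unicast routing table: prefix -> interface used to reach that prefix.
UNICAST_ROUTES = {
    ipaddress.ip_network("10.1.0.0/16"): "eth0",
    ipaddress.ip_network("10.1.2.0/24"): "eth1",
    ipaddress.ip_network("0.0.0.0/0"): "eth2",   # default route
}


def interface_toward(source):
    """Longest-prefix match: the interface this router would use to reach `source`."""
    addr = ipaddress.ip_address(source)
    matches = [net for net in UNICAST_ROUTES if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return UNICAST_ROUTES[best]


def rpf_accept(source, arrival_interface):
    """Accept a multicast packet only if it arrived on the interface that leads back to its source."""
    return interface_toward(source) == arrival_interface


accepted = rpf_accept("10.1.2.7", "eth1")   # arrived on the reverse path
rejected = rpf_accept("10.1.2.7", "eth0")   # arrived on another interface: possible loop, drop
```

A packet that fails the check is dropped instead of forwarded, which is what keeps flooded traffic from circulating.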
Multicast Reliability Levels
- Unreliable multicast
- Best-effort delivery with no guarantees; messages may be lost, duplicated, or delivered to only some recipients.
- Best-effort reliable multicast
- Multicast guaranteeing delivery to all live recipients if the sender completes without crashing; does not handle sender failures during transmission.
- Reliable multicast
- Multicast guaranteeing agreement (if any correct process delivers, all correct processes eventually do), integrity (at most once, only if sent), and validity (sender delivers to itself).
- Agreement (multicast property)
- Guarantee that if any correct process delivers a message, all correct processes eventually deliver it.
- Integrity (multicast property)
- Guarantee that each message is delivered at most once, only if it was actually sent, and with contents identical to what was sent.
- Validity (multicast property)
- Guarantee that if a correct process multicasts a message, it will eventually deliver that message to itself.
- Durable multicast
- Reliable multicast with persistence; messages are written to stable storage before they are acknowledged, so they survive crashes and restarts.
- Publish-subscribe (pub/sub)
- Communication pattern where publishers send to named topics and subscribers register interest in topics; decouples senders from receivers. The topic acts as a named multicast group.
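One common way to obtain the agreement property is for every process to relay a message to the whole group the first time it receives it, so the message survives a sender that crashes partway through transmission. The sketch below simulates this in memory; the `Process` class, the shared `group` dictionary standing in for the network, and the `only_to` parameter used to simulate a partial send are all assumptions made for illustration.

```python
class Process:
    def __init__(self, pid, group):
        self.pid = pid
        self.group = group      # shared dict pid -> Process (hypothetical in-memory "network")
        self.seen = set()       # message IDs already received (gives at-most-once delivery)
        self.delivered = []
        self.crashed = False

    def multicast(self, msg_id, payload, only_to=None):
        """Send to the group; `only_to` simulates a sender that crashes partway through."""
        targets = only_to if only_to is not None else list(self.group)
        for pid in targets:
            self.group[pid].receive(msg_id, payload)

    def receive(self, msg_id, payload):
        if self.crashed or msg_id in self.seen:
            return
        self.seen.add(msg_id)
        # Relay before delivering so the message outlives the original sender.
        self.multicast(msg_id, payload)
        self.delivered.append(payload)


group = {}
for pid in (1, 2, 3):
    group[pid] = Process(pid, group)

# Process 1 reaches only process 2 and then fails; mark it crashed so it
# neither relays nor delivers anything afterwards.
group[1].crashed = True
group[1].multicast("m1", "hello", only_to=[2])
```

Process 3 still delivers the message because process 2 relayed it. The price is roughly N² messages per multicast, since every process relays to every other.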
Multicast Ordering Levels
- Unordered delivery
- No guarantees about message sequence; messages may arrive in any order at different recipients.
- Single source FIFO ordering (SSF)
- Guarantee that messages from the same sender are delivered in the order they were sent. Formally: if a process sends m then m′, every process that delivers m′ will have already delivered m. Implemented using per-sender sequence numbers.
- Causal ordering
- Guarantee that if message m1 happened-before message m2, then m1 is delivered before m2 at all processes; implies single source FIFO ordering.
- Vector timestamp
- A vector clock attached to a message, used to implement causal ordering; the receiver buffers messages until all causally preceding messages have been delivered.
- Total ordering
- Guarantee that all processes deliver all messages in the same order; does not imply causal or single source FIFO ordering.
- Agreement property
- The key property of reliable multicast: if any correct process delivers a message, then all correct processes eventually deliver it. Provides “all or nothing” semantics.
- Sequencer
- A designated process that assigns global sequence numbers to achieve total ordering in multicast; it is a single point of failure and a potential bottleneck.
- Atomic multicast
- Reliable multicast with total ordering; also called atomic broadcast or ABCAST. Equivalent in power to consensus.
- Synchronous ordering
- Ordering enforced by a barrier primitive (sync) that blocks until all in-flight messages have been delivered, dividing the message stream into logical groups or epochs with clean boundaries between them.
- Sync primitive
- A barrier operation that blocks until all previously sent messages have been delivered to all recipients; used to create well-defined message epochs, particularly for view changes.
- Real-time ordering
- Hypothetical ordering where messages would be delivered in actual physical time order; impossible to implement perfectly due to clock synchronization limits.
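Causal ordering with vector timestamps can be sketched as a delivery test applied to the holdback queue. The rule used here is the usual one: a message from sender j is deliverable once it is the next message expected from j and every message it causally depends on from other senders has already been delivered. The `CausalReceiver` class and its field names are hypothetical, and processes are assumed to be numbered 0 to n-1.

```python
class CausalReceiver:
    def __init__(self, n):
        self.clock = [0] * n    # number of messages delivered so far from each sender
        self.holdback = []      # (sender, vector, payload) still waiting on predecessors
        self.delivered = []

    def _deliverable(self, sender, vector):
        for k, v in enumerate(vector):
            if k == sender:
                if v != self.clock[k] + 1:   # must be the next message from this sender
                    return False
            elif v > self.clock[k]:          # depends on a message we have not delivered
                return False
        return True

    def receive(self, sender, vector, payload):
        self.holdback.append((sender, vector, payload))
        progress = True
        while progress:                      # one delivery may unblock others
            progress = False
            for item in list(self.holdback):
                s, vec, data = item
                if self._deliverable(s, vec):
                    self.holdback.remove(item)
                    self.clock[s] += 1
                    self.delivered.append(data)
                    progress = True


# Process 0 multicasts "question" with timestamp [1, 0, 0]; process 1 delivers it
# and then multicasts "reply" with timestamp [1, 1, 0]. Process 2 receives them reversed.
p2 = CausalReceiver(3)
p2.receive(1, [1, 1, 0], "reply")      # held back: depends on a message from process 0
held_count = len(p2.holdback)
p2.receive(0, [1, 0, 0], "question")   # delivers "question", which unblocks "reply"
```

Because the test on the sender's own entry enforces consecutive sequence numbers, the same rule also yields single source FIFO ordering.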
Failure Detection
- Failure detector
- A distributed oracle that provides information about which processes have crashed; imperfect in asynchronous systems.
- FLP impossibility
- The result by Fischer, Lynch, and Paterson proving that no deterministic algorithm can guarantee consensus in an asynchronous system if even one process might crash.
- False positive
- An error where a failure detector incorrectly suspects a live process has crashed.
- False negative
- An error where a failure detector fails to detect that a process has crashed.
- Heartbeat
- A periodic message sent by a process to indicate it is alive.
- Push-based heartbeating
- Failure detection where monitored processes send heartbeats to monitors.
- Pull-based heartbeating
- Failure detection where monitors periodically query (ping) processes and expect responses.
- Phi accrual failure detector
- A failure detector that learns normal heartbeat timing patterns and outputs a continuous suspicion level (φ) on a logarithmic scale instead of a binary alive/dead verdict; φ = k means there is roughly a 10^(−k) probability that the current delay is just normal variation.
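The φ value can be computed as the negative base-10 logarithm of the probability that a heartbeat would arrive even later than the time already elapsed. The sketch below assumes heartbeat inter-arrival times are normally distributed, which is a modelling choice made for this example; the function name, the sample intervals, and the small floors on the standard deviation and probability are likewise assumptions.

```python
import math
from statistics import NormalDist, mean, stdev


def phi(intervals, elapsed):
    """Suspicion level after `elapsed` seconds without a heartbeat.

    intervals: recently observed gaps between heartbeats, in seconds.
    """
    sigma = max(stdev(intervals), 1e-3)          # floor avoids a zero-width distribution
    dist = NormalDist(mean(intervals), sigma)
    p_later = 1.0 - dist.cdf(elapsed)            # chance the heartbeat is merely late
    p_later = max(p_later, 1e-300)               # floor avoids log10(0)
    return -math.log10(p_later)


history = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]       # hypothetical observed intervals
on_time = phi(history, 1.0)                      # about 0.3: half of heartbeats arrive later than this
late = phi(history, 1.2)
very_late = phi(history, 2.0)
```

An application picks a threshold on φ to suit its tolerance for false positives, instead of relying on one fixed timeout.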
Group Membership and Virtual Synchrony
- Group membership service (GMS)
- A layer within each process that monitors other members using failure detection, participates in view change protocols, and notifies the application when membership changes.
- View
- A snapshot of group membership containing a unique identifier (typically a monotonically increasing number) and a list of member processes; all processes in a view agree on its membership.
- Stable message
- A message that has been received by all current group members. Stability is confirmed when the sender receives acknowledgments from all members. Only stable messages can be delivered to applications.
- Message stability
- The property that a message has been received by all group members. Essential for view changes: only stable messages are delivered before transitioning to a new view.
- View change
- A protocol that transitions all group members from one view to another when membership changes, ensuring agreement on which messages were delivered in the old view.
- Flush message
- In the view change protocol, a message exchanged by processes containing message IDs or stability summaries to ensure consistency before transitioning to a new view.
- View leader (coordinator)
- A designated member that drives the view change protocol; not a single point of failure since a new leader is elected if the current one fails.
- Virtual synchrony
- A model developed by Ken Birman that makes group membership changes appear to happen synchronously with message delivery, even in asynchronous systems.
- View synchrony
- The guarantee that if a message is delivered in some view, it is delivered in that same view at all processes that deliver it.
- ISIS
- A distributed programming toolkit developed at Cornell in the 1980s that introduced virtual synchrony; used in production at the NYSE, the Swiss Stock Exchange, French air traffic control, and the US Navy's AEGIS system.
- GBCAST
- The barrier primitive in ISIS used to coordinate group membership changes, ensuring all messages from the old view are delivered before transitioning to a new view.
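Message stability, as defined above, amounts to tracking acknowledgments against the membership of the current view. The following sketch shows only that bookkeeping; the `StabilityTracker` class and its methods are hypothetical and do not reproduce the ISIS implementation or a complete view change protocol.

```python
class StabilityTracker:
    """Tracks which messages every member of the current view has acknowledged."""

    def __init__(self, view_id, members):
        self.view_id = view_id
        self.members = frozenset(members)
        self.acks = {}                      # msg_id -> set of members that acknowledged receipt

    def sent(self, msg_id):
        self.acks[msg_id] = set()

    def ack(self, msg_id, member):
        if member in self.members:          # ignore processes outside the current view
            self.acks[msg_id].add(member)

    def is_stable(self, msg_id):
        return self.acks[msg_id] == self.members

    def unstable(self):
        """Message IDs a flush would still have to resolve before the view can change."""
        return sorted(m for m in self.acks if not self.is_stable(m))


tracker = StabilityTracker(view_id=7, members=["p1", "p2", "p3"])
tracker.sent("m1")
tracker.sent("m2")
for member in ("p1", "p2", "p3"):
    tracker.ack("m1", member)
tracker.ack("m2", "p1")                     # p2 and p3 have not acknowledged m2 yet
```

The list returned by `unstable()` is the kind of summary that flush messages would carry during a view change.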
Distributed Mutual Exclusion
- Distributed mutual exclusion
- Ensuring that at most one process is in a critical section at any time in a distributed system without shared memory.
- Critical section
- A code region that must be executed by at most one process at a time.
- Safety (mutual exclusion)
- The property that at most one process is in the critical section at any time.
- Liveness (mutual exclusion)
- The property that if a process requests the critical section and no process holds it forever, the requester eventually enters.
- Fairness (mutual exclusion)
- The property that there exists a bound on the number of times other processes may enter the critical section before a waiting process is granted access.
- Centralized mutual exclusion
- A coordinator-based approach requiring 3 messages per entry (request, grant, release); simple, but the coordinator is a single point of failure.
- Lamport’s mutual exclusion algorithm
- A distributed algorithm using Lamport timestamps to order requests; each process maintains a request queue and enters when its own request is at the head of the queue and it has received acknowledgments from all other processes. Requires 3(N−1) messages.
- Ricart-Agrawala algorithm
- An optimization of Lamport’s algorithm that eliminates release messages by deferring replies to lower-priority requesters until exiting the critical section. Requires 2(N−1) messages.
- Token ring mutual exclusion
- An algorithm where a token circulates among processes in a logical ring; only the token holder may enter the critical section. Provides bounded waiting but requires token recovery if lost.
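The deferral rule at the heart of the Ricart-Agrawala algorithm, together with the per-entry message counts quoted above, can be written down compactly. In this sketch a request is a (Lamport timestamp, process ID) tuple, so ties on the timestamp are broken by process ID; the state names `RELEASED`, `WANTED`, and `HELD` and both function names are assumptions introduced for illustration.

```python
def should_defer(my_state, my_request, their_request):
    """Decide whether to defer the reply to an incoming request.

    Requests are (lamport_timestamp, process_id) tuples; the smaller tuple has priority.
    """
    if my_state == "HELD":
        return True                          # in the critical section: reply on exit
    if my_state == "WANTED" and my_request < their_request:
        return True                          # our own request has priority
    return False                             # not competing, or they have priority: reply now


def messages_per_entry(algorithm, n):
    """Messages needed for one critical-section entry among n processes."""
    costs = {
        "centralized": 3,                    # request, grant, release
        "lamport": 3 * (n - 1),              # request, acknowledgment, release to each peer
        "ricart-agrawala": 2 * (n - 1),      # request and (possibly deferred) reply per peer
    }
    return costs[algorithm]


tie_break = should_defer("WANTED", (5, 1), (5, 2))   # same timestamp, lower ID wins
yield_now = should_defer("WANTED", (7, 1), (5, 2))   # their request is older
```

With five processes, for example, `messages_per_entry` gives 12 for Lamport's algorithm and 8 for Ricart-Agrawala, the saving that comes from folding the release into the deferred reply.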
Leader Election
- Leader election
- The process of selecting a single coordinator from a group of distributed processes.
- Coordinator
- A designated process responsible for sequencing operations, making decisions, or managing a shared resource.
- Bully algorithm
- A leader election algorithm where the process with the highest ID becomes coordinator; uses ELECTION, OK, and COORDINATOR messages. Assumes a synchronous model with timeouts. Worst case O(n²) messages.
- ELECTION message
- A message sent to higher-ID processes to initiate a new leader election.
- OK message
- In the bully algorithm, a response indicating that a higher-ID process is alive and will take over the election.
- COORDINATOR message
- A message announcing the winner of a leader election to all processes.
- Ring election algorithm
- A leader election algorithm where an election message circulates around a logical ring, collecting the highest process ID; the process that receives its own ID back wins. Also called the Chang-Roberts algorithm.
- Chang-Roberts algorithm
- The ring-based election algorithm where election messages circulate clockwise; each process forwards a larger ID unchanged or substitutes its own ID if that is larger. Worst case 3N−1 messages when a single process starts the election.