Distributed Agreement

Leader Election

Paul Krzyzanowski

March 2, 2021

Goal: Allow all processes in a group to elect exactly one of the processes in the group. All processes should agree to elect the same process.

Election algorithms are designed to allow a collection of processes to agree upon a single coordinator, or leader. All candidate processes are considered to be equally capable of acting as the leader. The task is to select exactly one of these processes — and make sure that every process is aware of the selection. In designing these algorithms, we assume that every process has a unique ID (for example, a process ID combined with the machine’s address). The algorithms we examine in this section will choose a winner (the leader) by selecting either the highest available process ID or the lowest one. It is important that whatever criteria is used will never result in one process choosing a different winner than another even multiple concurrent elections are run.

Bully algorithm

The bully algorithm selects the largest process ID as the leader. When a process detects a non-responding leader, it sends an ELECTION message to all processes with a higher ID number and waits for any replies. If the process gets no replies within a certain time, it announces itself as a leader.

When a process receives an ELECTION message it immediately sends a response back to the requestor (so the requestor won’t become the leader) and holds an election to see if there are any higher-numbered processes that will respond. If none respond, then it declares itself as leader.

Ring algorithm

The ring election algorithm requires arranging a logical communication ring among the processes in the group (as we had in the token ring mutual exclusion algorithm). When a process detects a non-responding leader, it creates an ELECTION message containing its own process ID and sends it to its successor – the next process in the ring. If that process is dead, then it sends the message to the following successor. This sequence continues, with each process sending an ELECTION message to the next process in the ring, skipping dead processes until it finds a process that can receive the message.

When a process receives an ELECTION message, it adds its own process ID to the list of processes i in the message body and sends that message out to its neighboring process. If a process receives an ELECTION message where its own process ID is at the head of the list, it knows that is the ELECTION message that was started by this process. It has traveled fully around the ring.

The sender looks at the list of active processes in the message body and chooses a leader from among them. Since multiple elections may have been started concurrently, the algorithm for choosing the leader must yield the same result among the different list. Hence, selecting the highest or the lowest process ID will work. Selecting the first or last process in the list will not.

Chang and Roberts algorithm

The Chang and Roberts ring algorithm improves on the ring algorithm in two ways. First, there is no need to keep the entire list of live processes in the election messages; we can perform the election as we go along. If the goal is to vote for the highest-numbered live process ID and a process receives an election message, it does one of two things: (1) passes it untouched if its process ID is smaller than the one in the message, or (2) replaces the process ID in the message with its own if the process ID is greater than the one in the message. When a process receives a message that contains its own process ID, it knows that the message has come full circle and it is the winner. It will then notify each process of the outcome.

The second optimization is to avoid having multiple election messages circulate if multiple processes have initiated elections concurrently. To do this, a process keeps track if it has participated in an election (i.e., has initiated or forwarded an election message). If a process receives a message with a smaller process ID than its own and it is a participant in an election, it simply discards the message. If, on the other hand, the received message contains a greater process ID than its own, it will forward the message even if it is a participant in an election since the message contains a candidate that takes priority over the previous one that was forwarded. Once election results are announced, each process clears its status of participating in an election. There is still the opportunity to have multiple elections circulate (no harm done) but they are aborted when it makes sense to do so.

Partitions

One problem with election algorithms is when a network gets partitioned (also known as segmented): one set of processes is separated from another and cannot communicate with the other. In this case, each segment may elect its own leader and each sub-group continues with its own computations with results that diverge from each other, no longer forming a single cohesive distributed system. This is a split brain situation.

One approach that is often taken to avoid this problem is to require a majority of prospective leaders to be available. A quorum is the minimum number of processes that need to be available to participate in an operation. In this case, the operation is an election and we need a quorum of over 50% of the processes. If a network is partitioned into two or more segments, at most one of the partitions will have over 50% of the processes. Those processes that do not will refuse to elect a leader.

Some clustered environments may employ a redundant network or some alternate communication mechanism, such a shared storage, to be able to check the state of remote processes and determine if a process is dead or a network link is broken.