The key invariant of Paxos is that once a majority of acceptors have accepted a value, that value is now chosen, and a different value can never reach majority.
Paxos is a powerful protocol with,
- No single point of failure
- Graceful handling of slow replicas
- General purpose ordering
The Paxos protocol is confusing but simple,
- A proposer chooses a new proposal number and sends a prepare request to each member in a majority of acceptors
- Each acceptor responds with a promise to never accept a proposal numbered less than and the proposal with the highest number less than that it has accepted (if any)
- If the proposer receives the responses from a majority, then it can issue a proposal with number and value , where is the value of the highest-numbered proposal among them, or otherwise any value selected by the proposer
This protocol essentially guarantees exclusion across majorities
It is an absolute requirement that the information to handle this protocol persists i.e. the highest-numbered proposal an acceptor has accepted
This whole protocol is really amazingly clever and handles any failure
Practical Usage
In order to learn about accepted values, we designate distinguished learners (this is done for practical reasons) to learn about accepted values and forward to any other learners when a value is chosen (majority)
To guarantee progress, we choose a distinguished proposer to propose the acceptors
In the general algorithm, each process acts as a proposer, acceptor, and learner. The algorithm chooses a leader, which plays the roles of the distinguished proposer and learner.
To guarantee uniqueness, one must find a way for proposers to choose numbers from disjoint sets (via some identification system)
Replicated State Machine
We can implement a replicated state machine with Paxos by having a separate instance of the protocol for every machine transition
Clients send commands to the leader, who slots commands into the sequence
If we let the leader propose commands ahead of the last chosen command (which we ought to allow to some extent for performance), then a gap of commands could potentially arise. To handle this, we fill in gaps with no-op commands
When we have a distinguished leader, most of our concurrency issues are really only edge cases (multiple leaders could exist for various reasons depending on the setup). We only really need to execute phase 1 when a leader is newly chosen, and this can be done on infinitely many instances (after all, we can operate on ranges just fine). Because of this, the cost of the RSM is really just phase 2, the accept message.
Lamport comments that phase 2 has the “minimum possible cost of any algorithm for reaching agreement in the presence of faults”. I have no idea what this practically means but he sounds very authoritative
The easiest way to handle a changing set of servers is through the state machine itself, which can track the set with ordinary commands
Question
What mechanism allows Paxos to gracefully handle a slow replica?
Using the language of the paper, Paxos only requires that a majority of acceptors are responsive in order to make progress. However, if the majority relies on a slow acceptor, the proposers can safely resend proposals with higher numbers. Or in a version of the protocol without ignored messages, proposers would know to resend the exact same proposals. Once the slow acceptor finishes its progress, the learners will be notified (either individually or through a distinguished leader) and all learners will see the proposed value as chosen. As a side point, if the distinguished proposer is stalled for any reason, a new proposer can be elected to continue progress.