Practical Byzantine Fault Tolerance
The idea of Byzantine fault tolerance was only theoretical for a while. The alternative is essentially fail-stop, killing any misbehaving server.
This a replicated state machine approach
We assume an attacker essentially has full control of (out of ) machines and can delay a message for an arbitrary but limited amount of time
An attacker cannot fail a client or break our cryptographic schemes
Let’s iterate on a design,
- Clients and servers have public/private key pairs and sign every message. We wait for all servers to reply with our answer. But this breaks liveness!
- Let’s include servers. We wait for a majority () matching replies. Our problem is that our majority may include bad nodes.
- nodes works. If we wait for replicas, we know a majority of nodes must be correct
A client sends pre-prepare messages to every replica
When a replica receives a pre-prepare, it multicasts a prepare message
Once a replica receives prepare messages, it may execute the attached operation and forward the result to the client
What do we do when the primary appears faulty? We must have a view-change protocol