Byzantine Fault Tolerance

Practical Byzantine Fault Tolerance

The idea of Byzantine fault tolerance was only theoretical for a while. The alternative is essentially fail-stop, killing any misbehaving server.

This a replicated state machine approach

We assume an attacker essentially has full control of $\leq f$ (out of $3 f + 1$ ) machines and can delay a message for an arbitrary but limited amount of time

An attacker cannot fail a client or break our cryptographic schemes

Let’s iterate on a design,

Clients and servers have public/private key pairs and sign every message. We wait for all servers to reply with our answer. But this breaks liveness!
Let’s include $2 f + 1$ servers. We wait for a majority ( $f + 1$ ) matching replies. Our problem is that our majority may include bad nodes.
$3 f + 1$ nodes works. If we wait for $2 f + 1$ replicas, we know a majority of nodes must be correct

A client sends pre-prepare messages to every replica
When a replica receives a pre-prepare, it multicasts a prepare message
Once a replica receives $2 f + 1$ prepare messages, it may execute the attached operation and forward the result to the client

What do we do when the primary appears faulty? We must have a view-change protocol

Binyamin's Notes

Explorer