Chardonnay: Fast and General Datacenter Transactions for On-Disk Databases

Chardonnay is a key-value store optimized for single-datacenter deployments with fast two-phase commit (2PC). It uses a novel lock-free read protocol.

  • Chardonnay uses fast RPC to let all committing transactions read from a global epoch counter, which is incremented periodically independent of transactions
  • Chardonnay first executes each read-write transaction as a dry run to prefetch its keys, then runs it for real
  • Prefetching allows for lightweight deadlock avoidance

Collectively, this gives Chardonnay excellent performance under high contention compared to similar systems. The availability of fast RPCs makes distributed designs look similar to multi-core designs.

Chardonnay is made up of the following components:

  1. Epoch service
  2. KV service
  3. Transaction state store
  4. Client library

The epoch service is an RSM (replicated state machine) whose counter is incremented every 10 ms. A reader queries each replica and takes the majority value as the current epoch.
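The majority rule described above can be sketched in a few lines. This is a minimal illustration based only on these notes, not the paper's actual protocol; `read_epoch` and its list-of-integers input are hypothetical names chosen for the example.

```python
from collections import Counter


def read_epoch(replica_epochs):
    """Return the epoch reported by a strict majority of replicas.

    `replica_epochs` is a hypothetical list with one integer per replica
    response. Returns None when no value reaches a strict majority, in
    which case a caller would presumably retry.
    """
    if not replica_epochs:
        return None
    value, count = Counter(replica_epochs).most_common(1)[0]
    # A strict majority means more than half of the responses agree.
    if count > len(replica_epochs) // 2:
        return value
    return None
```

For example, `read_epoch([7, 7, 8])` yields 7, while three replicas that all disagree yield `None`.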

The key universe is partitioned into ranges, which are assigned to a number of range servers, each comprising a database and a Paxos-replicated write-ahead log (WAL).
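To make the range partitioning concrete, here is a small sketch of mapping a key to its range server with a binary search over split points. The class `RangeMap`, the split points, and the server names are all assumptions made for illustration; the notes do not say how Chardonnay represents its range assignment.

```python
import bisect


class RangeMap:
    """Hypothetical mapping from contiguous key ranges to range servers."""

    def __init__(self, split_points, servers):
        # N sorted split points divide the key space into N + 1 ranges.
        if len(servers) != len(split_points) + 1:
            raise ValueError("need exactly one server per range")
        self.split_points = sorted(split_points)
        self.servers = list(servers)

    def server_for(self, key):
        # bisect_right sends a key equal to a split point to the upper range.
        return self.servers[bisect.bisect_right(self.split_points, key)]
```

With split points `["g", "p"]` and servers `["rs0", "rs1", "rs2"]`, the key `"apple"` maps to `rs0`, `"g"` to `rs1`, and `"zebra"` to `rs2`.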

One replica of each range holds a lease, measured in epochs, as the leader and maintains a lock table for two-phase locking (2PL), similar to Spanner. Load balancing happens automatically but is not the focus of this paper.

The client coordinates 2PC in Chardonnay.

Prefetching makes the biggest difference when the transaction must wait on a colder item while holding a lock on an item with high contention. In this case, prefetching eliminates the waiting.
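The dry-run idea can be sketched as a two-pass execution: a lock-free first pass that only records which keys the transaction touches, followed by a real pass that takes all locks up front. Acquiring the locks in sorted key order is one common deadlock-avoidance technique and is assumed here for illustration. All names (`Store`, `RecordingView`, `LockedView`, `run_transaction`) are hypothetical, and the sketch ignores the case where the real run touches keys the dry run did not see.

```python
import threading


class Store:
    """Toy in-memory store with one lock per key (illustrative only)."""

    def __init__(self, data):
        self.data = dict(data)
        self._locks = {}

    def lock_for(self, key):
        return self._locks.setdefault(key, threading.Lock())


class RecordingView:
    """Dry-run view: takes no locks, records keys, discards writes."""

    def __init__(self, store):
        self.store = store
        self.keys = set()

    def get(self, key):
        self.keys.add(key)
        return self.store.data[key]

    def put(self, key, value):
        self.keys.add(key)


class LockedView:
    """Real-run view: the caller already holds every needed lock."""

    def __init__(self, store):
        self.store = store

    def get(self, key):
        return self.store.data[key]

    def put(self, key, value):
        self.store.data[key] = value


def run_transaction(store, txn):
    dry = RecordingView(store)
    txn(dry)                      # pass 1: learn the key set
    ordered = sorted(dry.keys)    # a global order rules out lock cycles
    for key in ordered:
        store.lock_for(key).acquire()
    try:
        result = txn(LockedView(store))   # pass 2: the real execution
    finally:
        for key in reversed(ordered):
            store.lock_for(key).release()
    return ordered, result
```

In a real system the first pass would also pull cold items into memory, so that the second pass never waits on a slow read while holding a contended lock.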

Question

At the end of Section 6.2, the Chardonnay paper says that a snapshot read transaction can be made linearizable by waiting for the epoch to advance by one. By “linearizable” the paper means that the transaction observes the results of all transactions that completed before it started. Why does waiting make the transaction linearizable?

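Chardonnay’s epoch versioning system guarantees consistency at the granularity of epochs. Once a transaction observes an epoch of e, it knows that the versions of values as of the end of epoch e-1 form a consistent snapshot. Within epoch e itself, however, this is not necessarily true, since transactions may still be committing in it. So if the client wants external consistency, it must wait for epoch e+1 to begin before it can safely observe the changes made in epoch e. Every transaction that completed before the reader started committed in epoch e or earlier, so a snapshot taken at the end of epoch e includes all of them.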
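The difference between the two read modes can be sketched as follows, writing e for the epoch the reader observes when it starts. `FakeEpochService`, `snapshot_epoch`, and `linearizable_snapshot_epoch` are hypothetical names for this illustration. The functions only choose which epoch's end state to read; they do not perform the read itself.

```python
class FakeEpochService:
    """Stand-in for the epoch service; tick() simulates the periodic bump."""

    def __init__(self, epoch=0):
        self.epoch = epoch

    def read(self):
        return self.epoch

    def tick(self):
        self.epoch += 1


def snapshot_epoch(epochs):
    # Plain snapshot read: epoch e - 1 is already closed, so no waiting,
    # but commits made so far in epoch e are not visible.
    return epochs.read() - 1


def linearizable_snapshot_epoch(epochs, wait):
    # Observe e, then block until the service reports a later epoch.
    # After that, epoch e is closed and its end state is safe to read.
    e = epochs.read()
    while epochs.read() <= e:
        wait()
    return e
```

The cost of the linearizable variant is latency: the reader may wait up to one epoch interval before it can proceed.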