The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases
What is the motivation?
As main memory capacity has grown dramatically in commercial database systems, index performance has become a greater bottleneck. To achieve greater efficiency, data structures used in modern systems must consider many new variables, such as usage of SIMD parallelism and CPU caches. This paper aims to design a structure that’s aware of these factors. In particular, the authors’ are focused on the use of indexes in main memory database systems.
How and why have past approaches fallen short?
Traditional index structures like T-trees operate under the assumption of uniform memory access times, which is no longer true in modern systems with multiple layers of caching. B trees and binary trees solve some of these issues at the cost of more expensive operations. Another problem is that their operations cannot be pipelined easily. Some solutions have been proposed at the cost of allowing incremental updates. Meanwhile, hash tables are very appealing but cannot support range queries or quick resizing.
What is the key contribution?
The paper’s key contribution is the adaptive radix tree (ART), a fast and space-efficient indexing structure that takes advantage of radix comparisons. The authors’ show how the space consumption of ART is bounded to 52 bytes per key (as a maximum) and show how common data-types can be transformed to be radix ordered.
ART uses a high fanout, path compression and lazy expansion to reduce the tree height and limit space consumption.
What is missing?
The paper does a great job of justifying its design decisions. However, I think the authors could have explored the various parameters of ART in more detail, and experimented with alternative inner node representations. ART seems to perform extremely well compared to existing indexes, but perhaps it could be tuned with more precise experimentation to achieve even better throughput.
For example, the authors’ briefly mention that path compression leads to higher space usage on 32 bit keys. This makes the reader wonder what other decisions are dependent on the key size.
My last criticism is that the paper does not clearly explain the advantages of ART over existing radix trees. They only include GPT (another radix tree) as a comparison in experiments.
How does the paper support its claims?
The authors’ do a convincing number of experiments, evaluating ART against CSB, FAST, GPT, RB, and HT. They evaluate CPU branching and cache hit/misses. They measure single-threaded and multi-threaded lookup throughput. They also measure insertion throughput, as well as the impact of skew, cache size, and update percentage. They even integrate ART into their in-memory system HyPer for more realistic statistics.
What are some next steps?
The authors’ intend to explore synchronization techniques to allow for concurrent updates on ART.
As I wrote earlier, I think the authors could explore the decision space of the inner node representation. With more precise measurements, perhaps more performance could be extracted from the system. This could also involve clarifying how different systems would benefit from different index settings.