We have explored a lot of different index designs: B-trees, LSM-trees, radix trees, hash indexes, bitmap indexes, zonemaps, and cracking. Here is how they compare:

| Index          | Point | Short range | Long range | Skew | Updates |
|----------------|-------|-------------|------------|------|---------|
| B-tree         | Good  | Good        | Good       | Good | Ok      |
| LSM-tree       | Good  | Bad         | Ok         | Good | Good    |
| Radix tree     | Good  | Good        | Good       | Bad  | Ok      |
| Hash index*    | Good  | Ok          | Bad        | Bad  | Good    |
| Bitmap index** | Good  | Ok          | Ok         | Good | Bad     |
| Zonemaps       | Ok    | Ok          | Good       | Good | Ok      |
| Cracking       | Good  | Good        | Good       | Bad  | Bad     |

*depends on whether values are discrete
**generally very fast, limited applicability

Note that the radix tree and cracking both have improved variants (think ART and meta-adaptive indexing).
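
To make cracking concrete, here is a minimal sketch of the idea (my own illustrative code, not from a specific system): the first range query over a value pays to partition the column around its bounds, the crack positions are remembered in a small in-memory index, and later queries only partition the pieces they actually touch.

```cpp
// Minimal database-cracking sketch. `cracks` maps a value to the first
// position in the (physically reordered) column holding values >= it.
#include <algorithm>
#include <cstdio>
#include <map>
#include <vector>

struct CrackedColumn {
    std::vector<int> data;           // the cracker column, reordered as we go
    std::map<int, size_t> cracks;    // value -> first index with values >= value

    explicit CrackedColumn(std::vector<int> v) : data(std::move(v)) {}

    // Partition the piece that may still contain `value` so everything < value
    // comes first; remember and return the split position.
    size_t crack(int value) {
        auto it = cracks.find(value);
        if (it != cracks.end()) return it->second;
        size_t lo = 0, hi = data.size();
        auto next = cracks.lower_bound(value);           // piece upper bound
        if (next != cracks.end()) hi = next->second;
        if (next != cracks.begin()) lo = std::prev(next)->second;
        auto mid = std::partition(data.begin() + lo, data.begin() + hi,
                                  [value](int x) { return x < value; });
        size_t pos = static_cast<size_t>(mid - data.begin());
        cracks[value] = pos;
        return pos;
    }

    // Range query [low, high): after cracking on both bounds, the answer is a
    // contiguous slice of the cracker column.
    std::pair<size_t, size_t> range(int low, int high) {
        size_t lo = crack(low);
        size_t hi = crack(high);
        return {lo, hi};
    }
};

int main() {
    CrackedColumn col({13, 4, 55, 9, 2, 27, 18, 1, 42, 7});
    auto [lo, hi] = col.range(5, 20);   // first query pays the partitioning cost
    for (size_t i = lo; i < hi; ++i) std::printf("%d ", col.data[i]);
    std::printf("\n");
}
```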

Anyways… Which index do we use??

Our design primitives are:

  • How do we organize the data? Global data organization (sorted, unsorted, logging)
  • How do we search the data? Global search algorithm (scan, tight-loop search, direct addressing)
  • Can we accelerate search using metadata? Indexing technique (zonemaps/imprints, trees, hashing); see the zonemap sketch after this list
  • How do we update/add to data? Data modification policy (in-place, out-of-place, deferred in-place)
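
As an example of the "accelerate search using metadata" primitive, here is a minimal zonemap sketch (illustrative code, not from a particular engine): the data stays in insertion order, and per-zone min/max metadata lets a range scan skip zones that cannot contain qualifying rows.

```cpp
// Minimal zonemap: per-zone min/max metadata over an unsorted column.
#include <algorithm>
#include <cstdio>
#include <vector>

constexpr size_t kZoneSize = 4;   // rows per zone (tiny, for demonstration)

struct Zone { int min, max; size_t begin, end; };

std::vector<Zone> build_zonemap(const std::vector<int>& col) {
    std::vector<Zone> zones;
    for (size_t i = 0; i < col.size(); i += kZoneSize) {
        size_t end = std::min(i + kZoneSize, col.size());
        auto [mn, mx] = std::minmax_element(col.begin() + i, col.begin() + end);
        zones.push_back({*mn, *mx, i, end});
    }
    return zones;
}

// Count rows with low <= value <= high, scanning only zones that may qualify.
size_t range_count(const std::vector<int>& col, const std::vector<Zone>& zones,
                   int low, int high) {
    size_t count = 0;
    for (const Zone& z : zones) {
        if (z.max < low || z.min > high) continue;   // zone pruned via metadata
        for (size_t i = z.begin; i < z.end; ++i)
            if (col[i] >= low && col[i] <= high) ++count;
    }
    return count;
}

int main() {
    std::vector<int> col = {3, 1, 2, 4, 90, 95, 91, 99, 10, 12, 11, 13};
    auto zones = build_zonemap(col);
    std::printf("%zu rows in [10, 15]\n", range_count(col, zones, 10, 15));
}
```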

How do we make hardware-conscious designs? We’ve already talked about minimizing disk reads. To go further, let’s look at the cache hierarchy…

Each chip in a multi-socket multicore server contains a memory controller, inter-socket links, cores with their corresponding L1 and L2 caches, and a shared L3 cache. The cores do the actual work and contain the arithmetic logic unit along with registers. Modern physical cores typically expose multiple logical cores (hardware threads): separate sets of registers that share the core’s execution units.
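
If you want to see the hierarchy on your own machine, on Linux with glibc you can query the cache sizes directly (these `_SC_*` constants are a glibc extension; other platforms need different calls):

```cpp
// Print the cache hierarchy reported by the OS (Linux/glibc only).
#include <cstdio>
#include <unistd.h>

int main() {
    std::printf("L1d size: %ld bytes, line: %ld bytes\n",
                sysconf(_SC_LEVEL1_DCACHE_SIZE),
                sysconf(_SC_LEVEL1_DCACHE_LINESIZE));
    std::printf("L2 size:  %ld bytes\n", sysconf(_SC_LEVEL2_CACHE_SIZE));
    std::printf("L3 size:  %ld bytes\n", sysconf(_SC_LEVEL3_CACHE_SIZE));
}
```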

The cache system is fairly self-explanatory: when fetching data, the chip checks each level of the memory hierarchy in turn, and if the data is not in the L3 cache it must be fetched from main memory. Where we fetch from matters a great deal: NUMA (non-uniform memory access) means that memory access time depends on which socket’s memory holds the data relative to the core requesting it.
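
A rough sketch of why this matters: summing the same array in sequential order (cache- and prefetch-friendly) versus in a random permutation (mostly cache misses once the array outgrows L3) gives very different timings on most machines. The numbers are illustrative only; on a NUMA box, pinning the thread and the allocation to different sockets would widen the gap further.

```cpp
// Sequential vs. random access over the same data: same work, different locality.
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

int main() {
    const size_t n = 1 << 24;                     // 64 MB of ints, past typical L3
    std::vector<int> data(n, 1);
    std::vector<size_t> order(n);
    std::iota(order.begin(), order.end(), 0);

    auto time_sum = [&](const std::vector<size_t>& idx) {
        auto start = std::chrono::steady_clock::now();
        long long sum = 0;
        for (size_t i : idx) sum += data[i];
        auto stop = std::chrono::steady_clock::now();
        std::printf("sum=%lld  %.1f ms\n", sum,
                    std::chrono::duration<double, std::milli>(stop - start).count());
    };

    time_sum(order);                              // sequential access
    std::shuffle(order.begin(), order.end(), std::mt19937_64{42});
    time_sum(order);                              // random access, same work
}
```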