NoSQL means Not only SQL

  • Unstructured documents or files
  • Schema-less
  • Data is stored in an application-friendly way
  • Possible duplication

There are a few classifications of NoSQL systems,

  1. Key-value stores
  2. Document stores
  3. Columns stores
  4. Graph stores

Key-value pairs rely heavily on the Log-Structured Merge-Tree (LSM-tree)
The LSM-tree aims to improve inserting over the tree, which requires I/Os to insert a single entry

  1. LSM-tree uses very fast insertion and doesn’t typically delete entries; instead newer insertions are always read before older insertions, so updated or blank versions of data are inserted to update or delete an entry
  2. Reads are also competitive, since the LSM-tree keeps track of lots of data in order to increase speed
  3. Additionally, LSM-trees are tunable and can tradeoff read and write performance depending on application needs
  4. Finally, LSM-trees offer great space utilization compared to trees which are 66% full on average

The size ratio defines how much each lower level increases

Principles,

  1. Buffering ingestion (inserts are buffered and done in batches)
  2. Immutable files on storage (data is never changed once it’s moved to disk)
  3. Out-of-place updates & deletes
  4. Periodic data layout reorganization

Once the buffer is filled, it’s sorted and flushed into the next level (handling updates/deletes). Reads on entries still in buffer can be fetched without I/Os to disk

When leveling (eager merging), we merge right away when able,

  • Good read performance
  • Good space amplification
  • But high write amplification (37GB to 1GB potentially?)

When tiering (lazy merging), we don’t merge right away until the level has reached it’s capacity,

  • Poor read performance
  • Poor space amplification
  • Good ingestion performance

There are many hybrid designs that mix tiering and leveling

Without a filter or index, searching for something in the LSM tree is quite slow! Binary search on each level requires random accesses

Fence pointers are stored in memory and locate the minimum and maximum values in each page. This means there will only be potentially I/Os

Bloom filters tells you if an entry is in a page with high certainty (with some false positives, around 1%)