NoSQL means Not only SQL
- Unstructured documents or files
- Schema-less
- Data is stored in an application-friendly way
- Possible duplication
There are a few classifications of NoSQL systems,
- Key-value stores
- Document stores
- Columns stores
- Graph stores
Key-value pairs rely heavily on the Log-Structured Merge-Tree (LSM-tree)
The LSM-tree aims to improve inserting over the tree, which requires I/Os to insert a single entry
- LSM-tree uses very fast insertion and doesn’t typically delete entries; instead newer insertions are always read before older insertions, so updated or blank versions of data are inserted to update or delete an entry
- Reads are also competitive, since the LSM-tree keeps track of lots of data in order to increase speed
- Additionally, LSM-trees are tunable and can tradeoff read and write performance depending on application needs
- Finally, LSM-trees offer great space utilization compared to trees which are 66% full on average
The size ratio defines how much each lower level increases
Principles,
- Buffering ingestion (inserts are buffered and done in batches)
- Immutable files on storage (data is never changed once it’s moved to disk)
- Out-of-place updates & deletes
- Periodic data layout reorganization
Once the buffer is filled, it’s sorted and flushed into the next level (handling updates/deletes). Reads on entries still in buffer can be fetched without I/Os to disk
When leveling (eager merging), we merge right away when able,
- Good read performance
- Good space amplification
- But high write amplification (37GB to 1GB potentially?)
When tiering (lazy merging), we don’t merge right away until the level has reached it’s capacity,
- Poor read performance
- Poor space amplification
- Good ingestion performance
There are many hybrid designs that mix tiering and leveling
Without a filter or index, searching for something in the LSM tree is quite slow! Binary search on each level requires random accesses
Fence pointers are stored in memory and locate the minimum and maximum values in each page. This means there will only be potentially I/Os
Bloom filters tells you if an entry is in a page with high certainty (with some false positives, around 1%)