Note that some of the pages under this course are a bit incomplete.
Overview
This class is an introduction to “modern,” unstructured databases
- Data is generated at an unprecedented rate and volume, so it’s very important that systems SCALE
- How do we query unstructured data?
- New system design trends to meet modern requirements
- Playing with large-scale, commercial storage engines
Ask and answer! Advanced topics should foster discussion
Philosophy,
- Cutting-edge research
- Question everything, do not take things at face value
- Interactive and collaborative
We read lots of papers in this class and do presentations and reviews,
- Learn to read technical papers
- Learn to critique constructively
- Learn to prepare slides and present
LSM trees are key
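As a rough illustration of the idea behind an LSM tree, here is a minimal sketch: writes go to an in-memory buffer, full buffers are flushed as immutable sorted runs, and reads check the newest data first. The class name `TinyLSM`, the `memtable_limit` parameter, and the in-memory "runs" are all assumptions made for illustration. A real engine writes runs to disk and adds levels, Bloom filters, and a write-ahead log.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class TinyLSM:
    """Toy LSM-style store (hypothetical, in-memory only)."""

    def __init__(self, memtable_limit: int = 2) -> None:
        self.memtable_limit = memtable_limit
        self.memtable: Dict[str, str] = {}            # mutable write buffer
        self.runs: List[List[Tuple[str, str]]] = []   # sorted runs, newest last

    def put(self, key: str, value: str) -> None:
        # Writes only touch the memtable; a full memtable is flushed.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self) -> None:
        # Turn the memtable into an immutable run sorted by key.
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key: str) -> Optional[str]:
        # Newest data wins: memtable first, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))  # binary search within the run
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self) -> None:
        # Merge all runs into one, keeping only the newest value per key.
        merged: Dict[str, str] = {}
        for run in self.runs:  # oldest first, so newer values overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```

The design choice to notice is that `put` never modifies existing runs, so writes stay cheap, while `get` may have to look in several places until `compact` merges them.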
The class project will be the capstone of this course,
- By the project proposal in the first week of October, have a clear plan
- By the first week of November, significant preliminary work should be done
- By the end of the semester, present the key ideas of the implementation/approach along with experimental results to support any claims
Can relational databases support modern requirements? Yes, but performance is paramount, which is why NoSQL (Not only SQL) systems have become so popular.
Designing a database kernel is complex, but at its core are the data structures and algorithms that define how data is stored and accessed.
The main operations are put, get, scan, range scan, and count.
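To make these five operations concrete, here is a minimal in-memory sketch. The class name `SortedKVStore`, the string key and value types, and the half-open range `[low, high)` are assumptions for illustration, not something the course prescribes.

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple


class SortedKVStore:
    """Toy key-value store (hypothetical); keys are kept sorted for range scans."""

    def __init__(self) -> None:
        self._keys: List[str] = []          # all keys, in sorted order
        self._values: Dict[str, str] = {}   # key -> current value

    def put(self, key: str, value: str) -> None:
        # Insert a new key in sorted position, or overwrite an existing value.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: str) -> Optional[str]:
        # Point lookup; returns None when the key is absent.
        return self._values.get(key)

    def scan(self) -> Iterator[Tuple[str, str]]:
        # Full scan in key order.
        for key in self._keys:
            yield key, self._values[key]

    def range_scan(self, low: str, high: str) -> Iterator[Tuple[str, str]]:
        # Yield pairs with low <= key < high, located by binary search.
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_left(self._keys, high)
        for key in self._keys[start:end]:
            yield key, self._values[key]

    def count(self) -> int:
        # Number of distinct keys currently stored.
        return len(self._keys)
```

Usage: after `store.put("b", "2")` and `store.put("a", "1")`, `list(store.scan())` yields the pairs in key order, and `store.count()` returns 2. Keeping keys sorted makes range scans cheap at the cost of slower inserts, which is one instance of the trade-offs listed under the choices that follow.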
Choices,
- What is the key/value?
- Are they stored together?
- Is the read/write ratio constant or changing?
- What index to use?
- How to handle concurrency?
- How to handle memory limitations?
- What about privacy and security?
- How to guarantee robustness?
- How to minimize costs?