Introduction
In this class, we will learn:
- How to model and design good databases
- How to store and manage data
- How to query data
- How to reason about query performance
- How to update data
Always ask questions!
- There are no stupid questions (supposedly)
Big data
- Exponential data growth requires efficient database systems
- “Every two days we generate as much data as we did since the dawn of humanity until 2003.” -Eric Schmidt, 2010
- The 5 Vs:
  - Volume (size)
  - Velocity (rate)
  - Variety (sources)
  - Veracity (accuracy)
  - Value (utility)
Databases
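- A database is an organized collection of data that models some part of the real world
- It captures entities (e.g., students, courses) and the relationships between them (e.g., a student enrolls in a course)
- A database management system (DBMS) is the sophisticated piece of software used to store, query, and manage that data efficiently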
- “Relational databases are the foundation of western civilization” -Bruce Lindsay
Database management systems
- Always answer queries in well-defined, consistent ways
- Provide mechanisms for recovering from failures such as power outages and for coordinating concurrent access by many users
Data model
- A collection of concepts describing data
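- Common data models include relational, key-value, graph, time-series, and document models
- The relational model is the most commonly used
- In the relational model, each relation is represented as a table: rows are records (tuples) and columns are attributes
- A schema describes the structure of each table: its columns and their types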
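To make the relational terms concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory SQLite database. The tables (Student, Course, Enrolled) and their columns are hypothetical examples: two entities each become a table, and the relationship between them becomes a third table whose columns refer to both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")        # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces REFERENCES when this is on

# Each CREATE TABLE defines one relation; its column list is the schema.
conn.executescript("""
CREATE TABLE Student (
    sid  INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE Course (
    course_id TEXT PRIMARY KEY,
    title     TEXT NOT NULL
);
-- The relationship "a student is enrolled in a course" is itself a table
-- whose columns refer to the two entity tables.
CREATE TABLE Enrolled (
    sid       INTEGER REFERENCES Student(sid),
    course_id TEXT    REFERENCES Course(course_id),
    grade     TEXT,
    PRIMARY KEY (sid, course_id)
);
""")

conn.executemany("INSERT INTO Student VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.execute("INSERT INTO Course VALUES ('DB101', 'Databases')")
conn.execute("INSERT INTO Enrolled VALUES (1, 'DB101', 'A')")
conn.commit()

# PRAGMA table_info returns one row per column:
# (position, name, declared type, notnull, default, part of primary key)
for col in conn.execute("PRAGMA table_info(Enrolled)"):
    print(col[1], col[2])
```

The loop prints the schema of Enrolled: its three column names with their declared types.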
Abstraction
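- Users view data through external schemas (views tailored to what each group of users needs)
- Underlying all external schemas is a single conceptual (logical) schema, which defines the tables and their relationships
- Underneath this, the physical (internal) schema describes how the data is actually stored: files, indexes, and so on
- Data should have logical data independence (applications are insulated from changes to the conceptual schema) and physical data independence (the conceptual schema is insulated from changes to physical storage)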
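The three levels can be illustrated with a small sketch, again assuming Python's sqlite3 module. The Employee table and EmployeeDirectory view are hypothetical. The view plays the role of an external schema, the table is the conceptual schema, and the index is a purely physical structure. Adding the index, or adding a column to the table, leaves the application's query and its answer unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual (logical) schema: the table that applications ultimately rely on.
conn.execute(
    "CREATE TABLE Employee (eid INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [(1, "Ada", "Eng", 120), (2, "Grace", "Eng", 130), (3, "Alan", "Ops", 90)],
)

# External schema: a view exposing only what one group of users needs (no salaries).
conn.execute("CREATE VIEW EmployeeDirectory AS SELECT name, dept FROM Employee")

# The "application" only ever talks to the view.
query = "SELECT name FROM EmployeeDirectory WHERE dept = 'Eng' ORDER BY name"
before = conn.execute(query).fetchall()

# Physical change: add an index. How rows are located may change,
# but the query text and its answer do not (physical data independence).
conn.execute("CREATE INDEX idx_employee_dept ON Employee(dept)")
after_index = conn.execute(query).fetchall()

# Logical change: add a column to the conceptual schema. The view, and thus
# the application using it, is unaffected (logical data independence).
conn.execute("ALTER TABLE Employee ADD COLUMN hired TEXT")
after_alter = conn.execute(query).fetchall()

print(before)
print(after_index)
print(after_alter)
```

Adding a column is only the simplest kind of logical change. A larger restructuring, such as splitting Employee into two tables, would keep applications working only if the view were redefined over the new tables.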
SQL
- Structured Query Language
- A very powerful declarative language: a query states what result is wanted, and the DBMS decides how to compute it
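A small illustration of what "declarative" means, assuming Python's sqlite3 module and a hypothetical Sales table. The SQL query only describes the desired result (per-region totals above 4). The equivalent Python code has to spell out the loop, the grouping, and the filtering by hand.

```python
import sqlite3

rows = [("north", 10), ("south", 5), ("north", 7), ("east", 3)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO Sales VALUES (?, ?)", rows)

# Declarative: say WHAT is wanted; the DBMS picks HOW
# (scan order, grouping algorithm, whether to use an index, ...).
declarative = conn.execute(
    "SELECT region, SUM(amount) FROM Sales "
    "GROUP BY region HAVING SUM(amount) > 4 ORDER BY region"
).fetchall()

# Imperative: we choose the algorithm ourselves, step by step.
totals = {}
for region, amount in rows:
    totals[region] = totals.get(region, 0) + amount
imperative = sorted((region, total) for region, total in totals.items() if total > 4)

print(declarative)
print(imperative)
```

Both approaches produce the same answer. Only the declarative version leaves the DBMS free to choose and change the execution strategy, which is what makes query optimization possible.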
Transaction semantics (ACID)
- Atomicity - a transaction is executed entirely or not at all
- Consistency - a transaction leaves the database in a consistent state
- Isolation - a transaction behaves as if it were executed alone, even if transactions are actually interleaved
- Durability - once a transaction commits, its effects are never lost
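Atomicity, and the consistency it helps preserve, can be observed with a hypothetical bank transfer sketched with Python's sqlite3 module. Using the connection as a context manager commits the transaction on success and rolls it back on an exception. The CHECK constraint makes the second transfer fail halfway, and the rollback also undoes the credit that had already been applied. Isolation and durability are not demonstrated by this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Consistency rule, enforced by the DBMS: a balance may never go negative.
conn.execute("CREATE TABLE Account (id TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    # `with conn` commits the open transaction if the block succeeds
    # and rolls the whole transaction back if it raises.
    with conn:
        conn.execute("UPDATE Account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE Account SET balance = balance - ? WHERE id = ?", (amount, src))

transfer(conn, "A", "B", 30)       # succeeds: A = 70, B = 80

try:
    transfer(conn, "A", "B", 500)  # credit to B runs, then the debit violates CHECK
except sqlite3.IntegrityError as exc:
    print("transfer aborted:", exc)

# The failed transfer left no trace: B did not keep the 500 credit.
balances = dict(conn.execute("SELECT id, balance FROM Account"))
print(balances)
```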