Data model: A collection of concepts that allows us to translate real-world constructs into computer-understandable structures

  • Relation, entity-relationship, key-value, graph, time series, document…

Proper modeling is difficult

  1. Requirements analysis
  2. Conceptual design, often using an ER model
  3. Logical design, translating into the DBMS data model
  4. Schema refinement, consistency and normalization
  5. Physical design, indexes and disk layout
  6. Security design, access control

Conceptual design

  • What are the entities?
  • What are the relationships?
  • What must be stored for each?
  • What are the constraints?

ER models

There are a lotta rules,

  1. Attributes go in circles
  2. Entities go in rectangles
  3. Relationships go in diamonds
  4. ISA relationships go in triangles
  5. Lines connect attributes to entities, attributes to relationships, and entities to relationships
  6. Primary keys are underlined
  7. An arrow from an entity to a relationship indicates that entity’s uniqueness to that relationship
  8. Bolded lines from an entity to a relationship indicate a total relationship, where each entity belongs to a relationship
  9. A dashed underlined primary key indicates a weak entity, one whose uniqueness/existence depends on a one-to-many relationship

A ternary relationship (or more generally n-ary) relationship binds three entities. This can be necessary to fix redundancy of information.

Again, key constraints are one-to-one, one-to-many, etc. Participation constraints are whether all of an entity must participate in a relationship.

There are also other constraints that must sometimes be specified.
The overlap constraint specifies whether an entity can belong to multiple specifications in a hierarchy.
The covering constraint specifies whether an entity must be part of a specification in a hierarchy.

Aggregation means treating an entire relationship set as an entity set, for the purpose of connecting it to a new “broader” relationship.

All this brings up a few design choices:

  • Should a concept be an attribute or an entity?

Relational vs. Key-value

In a relational model there are tables with rows & columns. There is a well defined schema. The data model fits the data rather than functionality. Duplication of data is avoided.

In a key-value model there is unstructured documents and files, without any schema. Data is stored in an application-friendly way.

Relational database

A relational database stores a collection of distinct relations

Each relation is made up of a schema which has the name of the relation and the name/type information of each column, and an instance which is a table with rows & columns

  • Cardinality is the number of rows
  • Degree is the number of columns

How do we translate the ER model to a relational database?

SQL

Structured Query Language, introduced in the 1970s, still used by 40% of all developers globally
The Data Definition Language specifies the schema operations, constraints, and security
The Data Manipulation Language specifies queries for viewing, analyzing, or modifying data

Overview:

CREATE TABLE <name> (<field> <domain>, ... )
 
INSERT INTO <name> (<field names>) VALUES (<field values>)
 
DELETE FROM <name> WHERE <condition>
 
UPDATE <name> SET <field name> = <value> WHERE <condition>
 
SELECT <fields> FROM <name> WHERE <condition>

We need integrity constraints!

The primary key is a field that uniquely identifies any row
A set of fields is a superkey if no two distinct tuples can have same values in all key fields
A candidate key is any set of fields that could identify each record uniquely. One candidate key must be chosen to be primary.

SQL introduces constraints to help with this.

PRIMARY KEY (<field names>)
UNIQUE (<field names>)
FOREIGN KEY (<field>) REFERENCES <name>

Here’s some tips for translating ER diagrams into a relational database:

  • For ISA hierarchies