Entity-Relationship Modeling

Data model: A collection of concepts that allows us to translate real-world constructs into computer-understandable structures

Relation, entity-relationship, key-value, graph, time series, document…

Proper modeling is difficult

Requirements analysis
Conceptual design, often using an ER model
Logical design, translating into the DBMS data model
Schema refinement, consistency and normalization
Physical design, indexes and disk layout
Security design, access control

Conceptual design

What are the entities?
What are the relationships?
What must be stored for each?
What are the constraints?

ER models

There are a lotta rules,

Attributes go in circles
Entities go in rectangles
Relationships go in diamonds
ISA relationships go in triangles
Lines connect attributes to entities, attributes to relationships, and entities to relationships
Primary keys are underlined
An arrow from an entity to a relationship indicates that entity’s uniqueness to that relationship
Bolded lines from an entity to a relationship indicate a total relationship, where each entity belongs to a relationship
A dashed underlined primary key indicates a weak entity, one whose uniqueness/existence depends on a one-to-many relationship

A ternary relationship (or more generally n-ary) relationship binds three entities. This can be necessary to fix redundancy of information.

Again, key constraints are one-to-one, one-to-many, etc. Participation constraints are whether all of an entity must participate in a relationship.

There are also other constraints that must sometimes be specified.
The overlap constraint specifies whether an entity can belong to multiple specifications in a hierarchy.
The covering constraint specifies whether an entity must be part of a specification in a hierarchy.

Aggregation means treating an entire relationship set as an entity set, for the purpose of connecting it to a new “broader” relationship.

All this brings up a few design choices:

Should a concept be an attribute or an entity?
…

Relational vs. Key-value

In a relational model there are tables with rows & columns. There is a well defined schema. The data model fits the data rather than functionality. Duplication of data is avoided.

In a key-value model there is unstructured documents and files, without any schema. Data is stored in an application-friendly way.

Relational database

A relational database stores a collection of distinct relations

Each relation is made up of a schema which has the name of the relation and the name/type information of each column, and an instance which is a table with rows & columns

Cardinality is the number of rows
Degree is the number of columns

How do we translate the ER model to a relational database?

SQL

Structured Query Language, introduced in the 1970s, still used by 40% of all developers globally
The Data Definition Language specifies the schema operations, constraints, and security
The Data Manipulation Language specifies queries for viewing, analyzing, or modifying data

Overview:

CREATE TABLE <name> (<field> <domain>, ... )
 
INSERT INTO <name> (<field names>) VALUES (<field values>)
 
DELETE FROM <name> WHERE <condition>
 
UPDATE <name> SET <field name> = <value> WHERE <condition>
 
SELECT <fields> FROM <name> WHERE <condition>

We need integrity constraints!

The primary key is a field that uniquely identifies any row
A set of fields is a superkey if no two distinct tuples can have same values in all key fields
A candidate key is any set of fields that could identify each record uniquely. One candidate key must be chosen to be primary.

SQL introduces constraints to help with this.

PRIMARY KEY (<field names>)
UNIQUE (<field names>)
FOREIGN KEY (<field>) REFERENCES <name>

Here’s some tips for translating ER diagrams into a relational database:

For ISA hierarchies

Binyamin's Notes

Explorer

Conceptual design

ER models

Relational vs. Key-value

Relational database

SQL

Table of Contents