Schema Refinement

Not all database schemas are good! A bad schema stores a lot of redundant data, which leads to update, insertion, and deletion anomalies
We can normalize a schema by decomposing its relations into smaller ones

A set of attributes $X$ determines $Y$ (written $X \to Y$) if any two tuples that agree on $X$ also agree on $Y$
This is called a functional dependency (FD), and it arises from the rules of the application, not from the data that happens to be stored

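Armstrong's axioms (reflexivity, augmentation, transitivity) let us derive new functional dependencies from known ones; union and decomposition are rules derived from them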

Reflexivity: $X$ determines any subset of $X$
Augmentation: If $X \to Y$ then $X, Z \to Y, Z$
Transitivity: If $X \to Y$ and $Y \to Z$ then $X \to Z$
Union: If $X \to Y$ and $X \to Z$ then $X \to Y, Z$
Decomposition: If $X \to Y, Z$ then $X \to Y$ and $X \to Z$
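
The last two rules do not need to be assumed separately. As a short sketch, each follows from reflexivity, augmentation, and transitivity:

```latex
\begin{align*}
&\text{Union: given } X \to Y \text{ and } X \to Z \\
&X \to X, Y && \text{augment } X \to Y \text{ with } X \\
&X, Y \to Y, Z && \text{augment } X \to Z \text{ with } Y \\
&X \to Y, Z && \text{transitivity} \\[6pt]
&\text{Decomposition: given } X \to Y, Z \\
&Y, Z \to Y && \text{reflexivity} \\
&X \to Y && \text{transitivity (symmetrically for } X \to Z\text{)}
\end{align*}
```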

The attribute closure $X^{+}$ is the set of all attributes that are functionally determined by the attribute set $X$
To calculate,

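Start with an empty set
Use reflexivity to add the starting attributes themselves
Go through the given FDs starting from the first: if all attributes on the LHS are in the closure, add the attributes on the RHS to the closure
Repeat until a full pass over the FDs adds nothing new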
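
The steps above can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `attribute_closure`, the representation of an FD as an `(lhs, rhs)` pair, and the single-letter attribute names are all assumptions made for the example.

```python
def attribute_closure(attrs, fds):
    """Return the closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is an iterable of
    attribute names (a string like "AB" is read as attributes A and B).
    """
    closure = set(attrs)  # reflexivity: the attributes determine themselves
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for lhs, rhs in fds:
            # If the whole LHS is already in the closure, the RHS follows.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


# Hypothetical relation R(A, B, C, D) with A -> B and B -> C.
example_fds = [("A", "B"), ("B", "C")]
print(sorted(attribute_closure("A", example_fds)))  # ['A', 'B', 'C']
```

Here `D` never enters the closure of `A`, because no FD produces it.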

If the closure of a set of attributes contains every attribute of the relation, the set is a superkey; it is a candidate key if none of its proper subsets is a superkey.
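
As a rough sketch, candidate keys can be found by brute force: try attribute subsets from smallest to largest and keep the superkeys that contain no smaller key. The names `closure` and `candidate_keys` are made up for this example, and the search is exponential in the number of attributes, so it only suits small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all candidate keys of a relation as a list of sets."""
    attributes = set(attributes)
    keys = []
    # Smallest subsets first, so every key is found before its supersets.
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            # A superset of a known key is a superkey but not minimal.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys


# Hypothetical relation R(A, B, C, D) with A -> B and B -> C:
# D is determined by nothing, so every key must contain it.
print(candidate_keys("ABCD", [("A", "B"), ("B", "C")]))
```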

FD closure is the set of all functional dependencies that can be logically derived from a given set of FDs
A minimal cover of $F$ is a minimal set of FDs $S$ (not necessarily a subset of $F$) such that $S^{+} = F^{+}$, where $X^{+}$ denotes the FD closure of $X$
To find the minimal cover,

Put all the FDs in standard form, with one attribute on the RHS
For each FD, remove any LHS attribute whose removal leaves the FD closure unchanged
Delete redundant FDs, i.e. those that still follow from the remaining FDs
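
The three steps can be sketched as follows. The helper names and the FD representation (pairs of attribute collections, single-letter attribute names) are assumptions for illustration. Note that a minimal cover is generally not unique, so the result depends on the order in which attributes and FDs are tried.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    """Return one minimal cover of `fds` as (frozenset, frozenset) pairs."""
    # Step 1: standard form, exactly one attribute on each RHS.
    cover = [(frozenset(lhs), frozenset([a])) for lhs, rhs in fds for a in rhs]

    # Step 2: drop LHS attributes that are not needed to derive the RHS.
    for i in range(len(cover)):
        lhs, rhs = cover[i]
        for attr in sorted(lhs):
            reduced = cover[i][0] - {attr}
            if reduced and rhs <= closure(reduced, cover):
                cover[i] = (reduced, rhs)

    # Step 3: drop FDs that still follow from the remaining ones.
    i = 0
    while i < len(cover):
        lhs, rhs = cover[i]
        rest = cover[:i] + cover[i + 1:]
        if rhs <= closure(lhs, rest):
            cover = rest
        else:
            i += 1
    return cover


# Hypothetical input: A -> BC, B -> C, AB -> C reduces to A -> B, B -> C.
for lhs, rhs in minimal_cover([("A", "BC"), ("B", "C"), ("AB", "C")]):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```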

A decomposition breaks $R (A)$ into $R_{1} (A_{1})$ and $R_{2} (A_{2})$, where $A_{1} \cup A_{2} = A$

A decomposition is lossless-join if the original relation can always be recovered exactly by joining the parts
To test if a decomposition $R_{1} (A_{1})$ and $R_{2} (A_{2})$ is lossless-join,

Compute $A_{1} \cap A_{2} = X$
Compute $X^{+}$
If $A_{1} \subseteq X^{+}$ OR $A_{2} \subseteq X^{+}$ then the decomposition is lossless-join
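
A minimal sketch of this test for a two-way decomposition, assuming FDs are given as `(lhs, rhs)` pairs over single-letter attribute names (the function names are illustrative):

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_join(a1, a2, fds):
    """Test whether splitting a relation into attribute sets a1, a2 is lossless."""
    a1, a2 = set(a1), set(a2)
    shared_closure = closure(a1 & a2, fds)  # closure of the common attributes
    # Lossless if the common attributes determine all of one side.
    return a1 <= shared_closure or a2 <= shared_closure


# Hypothetical relation R(A, B, C) with A -> B.
fds = [("A", "B")]
print(is_lossless_join("AB", "AC", fds))  # True: A determines all of (A, B)
print(is_lossless_join("AB", "BC", fds))  # False: B determines neither side
```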

A decomposition is dependency preserving if every original functional dependency can still be enforced using only the FDs that hold within the individual relations

Project $F$ onto $R_{1}$ and $R_{2}$ to get $F_{1}$ and $F_{2}$ (the FDs in $F^{+}$ that only mention attributes of that relation)
Compute $F^{+}$ and $(F_{1} \cup F_{2})^{+}$
Check if $F^{+} = (F_{1} \cup F_{2})^{+}$
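
Computing $F^{+}$ explicitly is exponential, so the sketch below swaps in a different but equivalent check: for each FD in $F$, it grows the LHS using only attribute closures restricted to one relation at a time, and then tests whether the RHS was reached. The function names and FD representation are assumptions for the example.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(fds, schemas):
    """Check whether decomposing into `schemas` preserves every FD in `fds`."""
    schemas = [set(s) for s in schemas]
    for lhs, rhs in fds:
        reachable = set(lhs)
        changed = True
        while changed:
            changed = False
            for attrs in schemas:
                # What this one relation can derive from what we have so far.
                gained = closure(reachable & attrs, fds) & attrs
                if not gained <= reachable:
                    reachable |= gained
                    changed = True
        if not set(rhs) <= reachable:
            return False  # this FD cannot be enforced without a join
    return True


# Hypothetical relation R(A, B, C) with A -> B and B -> C.
fds = [("A", "B"), ("B", "C")]
print(preserves_dependencies(fds, ["AB", "BC"]))  # True
print(preserves_dependencies(fds, ["AB", "AC"]))  # False: B -> C is lost
```

In this example the split into (A, B) and (A, C) is lossless-join, since A determines everything, yet it does not preserve B -> C.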

Lossless-join and dependency preservation are NOT the same property, because functional dependencies are determined at the application level, not from the data. Even though a lossless-join decomposition can be joined back together to retrieve the same data, the individual tables may no longer be able to enforce all of the original functional dependencies without a join

Normalization

0NF is not normalized
1NF and 2NF are simple restrictions that any well-structured schema meets
3NF and BCNF are stricter restrictions that any good schema should meet
4NF and 5NF are out of scope

In BCNF, every FD is either trivial or has a superkey as its LHS
BCNF decomposition is always lossless but may not be dependency preserving
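
A minimal sketch of BCNF decomposition, under the assumption that FDs are `(lhs, rhs)` pairs over single-letter attributes. To stay short it finds violations by trying every attribute subset of a relation, which is exponential and only reasonable for small examples. The function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_bcnf_violation(attrs, fds):
    """Return (X, X+ within attrs) for a violating X, or None if in BCNF."""
    for size in range(1, len(attrs)):
        for combo in combinations(sorted(attrs), size):
            x = set(combo)
            determined = closure(x, fds) & attrs
            # X violates BCNF if it determines something new but not everything.
            if determined != x and determined != attrs:
                return x, determined
    return None


def bcnf_decompose(attrs, fds):
    """Recursively split a relation until every part is in BCNF."""
    attrs = set(attrs)
    violation = find_bcnf_violation(attrs, fds)
    if violation is None:
        return [attrs]
    x, determined = violation
    # Both parts share X, and X determines all of the first part,
    # so each split is lossless-join.
    first = determined
    second = x | (attrs - determined)
    return bcnf_decompose(first, fds) + bcnf_decompose(second, fds)


# Hypothetical relation R(A, B, C) with AB -> C and C -> B.
print(bcnf_decompose("ABC", [("AB", "C"), ("C", "B")]))
```

For this input the result is the pair of relations (B, C) and (A, C). No single relation contains A, B, and C together, so AB -> C can no longer be checked without a join, which is the loss of dependency preservation mentioned above.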

If you need to keep your dependencies, you probably have to add redundancy.
In 3NF, every FD is either trivial, has a superkey as its LHS, or has an RHS that is part of a candidate key
A decomposition into 3NF that is both lossless and dependency preserving always exists

To decompose to 3NF while preserving dependencies,

Apply BCNF decomposition until every relation is in 3NF
Compute a minimal cover $F''$ of $F$
For each non-preserved FD $X \to A$ in $F''$, add a new relation $R (X, A)$
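
The last step can be sketched as below, assuming the decomposition and a minimal cover have already been computed. The function names, the FD representation, and the example schema are all illustrative assumptions.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_preserved(lhs, rhs, schemas, fds):
    """Check whether lhs -> rhs can be enforced using one relation at a time."""
    reachable = set(lhs)
    changed = True
    while changed:
        changed = False
        for attrs in schemas:
            gained = closure(reachable & attrs, fds) & attrs
            if not gained <= reachable:
                reachable |= gained
                changed = True
    return set(rhs) <= reachable


def add_missing_relations(schemas, cover):
    """Add a relation (X, A) for every FD X -> A in `cover` that is not preserved."""
    schemas = [set(s) for s in schemas]
    for lhs, rhs in cover:
        if not is_preserved(lhs, rhs, schemas, cover):
            schemas.append(set(lhs) | set(rhs))
    return schemas


# Hypothetical relation over C, S, J, D, P, Q, V, already decomposed into
# (S, D, P), (J, S), (C, J, D, Q, V). The minimal cover is assumed given.
cover = [("C", "J"), ("C", "D"), ("C", "Q"), ("C", "V"),
         ("JP", "C"), ("SD", "P"), ("J", "S")]
for relation in add_missing_relations(["SDP", "JS", "CJDQV"], cover):
    print("".join(sorted(relation)))
```

In this example only JP -> C is not preserved, so a single relation over C, J, P is added.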

2NF is weaker than 3NF: it only guarantees that no non-key attribute is determined by a partial key (a proper subset of a candidate key)

In 1NF, every field must be atomic or single-valued

Binyamin's Notes
