A recap of the DBMS layer-cake:
- Queries
- Query processor
- Query optimizer
- Files & access methods
- Buffer manager
- Disk space manager
- Data
What if multiple queries access the same data simultaneously?
What if the system crashes in the middle of a transaction?
To generalize, how can we always maintain correctness?
A naive approach is to make sure that one and only one transaction runs at a time: before a transaction starts, create a copy of the database; if the transaction is successful, replace the old database with the new copy, otherwise discard the dirty copy.
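Roughly, that naive scheme looks like the sketch below (a toy illustration only, assuming the whole database lives in a single file; `run_serially` and its arguments are made-up names):

```python
import os
import shutil

def run_serially(db_path, transaction):
    """Naive correctness: run one transaction at a time, on a private copy.
    Commit = swap the copy in; abort = throw the dirty copy away."""
    scratch = db_path + ".tmp"
    shutil.copyfile(db_path, scratch)   # snapshot the whole database
    try:
        transaction(scratch)            # the transaction touches only the copy
    except Exception:
        os.remove(scratch)              # abort: discard the dirty copy
        raise
    os.replace(scratch, db_path)        # commit: atomically swap in the new copy
```

This is obviously far too slow for real workloads (copying the entire database per transaction, zero concurrency), which motivates the rest of this section.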
The correctness criteria are ACID: atomicity, consistency, isolation, and durability
- Atomicity means either all actions in a transaction happen successfully or none of them happen at all
- Consistency means the database is always consistent at the end of each transaction
- Isolation means a transaction’s execution is never affected by the simultaneous execution of other transactions
- Durability means that once a transaction commits, its changes must persist, even across crashes
Consistency
Integrity constraints come from application-level requirements:
- Primary key
- Foreign key
- NOT NULL
- Domain constraints
If the DBMS detects that a constraint is violated, the transaction is aborted
Every operation is modeled as a read or a write; a transaction starts with BEGIN and ends with COMMIT or ABORT
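As a toy model (a made-up representation, not any real system’s format), a transaction can be written down as just its ordered reads and writes:

```python
# Each operation: (transaction id, "R" or "W", data object id).
# BEGIN and COMMIT are implicit at the start and end of each list.
T1 = [("T1", "R", "A"), ("T1", "W", "A"), ("T1", "R", "B"), ("T1", "W", "B")]
T2 = [("T2", "R", "A"), ("T2", "R", "B")]

# A serial schedule: all of T1, then all of T2 (no interleaving).
serial_schedule = T1 + T2
```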
Concurrency
In practice, databases run very large numbers of transactions concurrently by interleaving their operations as much as possible
This is difficult, because not every interleaving preserves correctness. An interleaved schedule is acceptable if it is serializable, i.e. if it is equivalent to some serial schedule
A serial schedule is a schedule that does not interleave transactions
Two schedules S1 and S2 are equivalent if the final result of executing S1 is the same as the final result of executing S2
A transaction cannot read different values of a data object during the same transaction unless it set the value itself
A schedule is conflict serializable if it can be converted into a serial schedule by swapping adjacent non-conflicting operations, which means all conflicting operations are ordered the same way as in that serial schedule
This is the case exactly when its precedence graph is acyclic
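A small sketch of that check, using the (txn, op, obj) tuples from above (the function names are made up; a real DBMS does not test schedules this way, it enforces serializability with locking):

```python
from itertools import combinations

def precedence_graph(schedule):
    """Edge Ti -> Tj when an operation of Ti conflicts with a *later* operation
    of Tj: different transactions, same object, at least one of them a write."""
    edges = set()
    for (ti, k1, o1), (tj, k2, o2) in combinations(schedule, 2):
        if ti != tj and o1 == o2 and "W" in (k1, k2):
            edges.add((ti, tj))
    return edges

def is_conflict_serializable(schedule):
    """True iff the precedence graph is acyclic, checked by repeatedly
    peeling off nodes that have no incoming edge (a topological sort)."""
    edges = precedence_graph(schedule)
    nodes = {txn for txn, _, _ in schedule}
    while nodes:
        free = [n for n in nodes if not any(dst == n for _, dst in edges)]
        if not free:                      # every remaining node is on a cycle
            return False
        for n in free:
            nodes.discard(n)
            edges = {(a, b) for a, b in edges if a != n and b != n}
    return True

# R1(A) W2(A) W1(A): edges T1 -> T2 and T2 -> T1, a cycle, so not serializable.
bad = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
print(is_conflict_serializable(bad))      # False
```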
Locks
Transactions request locks, and the lock manager either grants or blocks those requests
Once completed, transactions release their locks
However, some operations only read data, in which case several transactions should be allowed to hold a lock on the same object at once: shared (S) locks are for transactions that only read, exclusive (X) locks are for transactions that intend to write
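A minimal lock-table sketch of that compatibility rule (a hypothetical class that ignores queuing, lock upgrades, and re-entrancy):

```python
from collections import defaultdict

class LockManager:
    """Per-object lock table: shared (S) locks are compatible with each other,
    exclusive (X) locks conflict with everything."""
    def __init__(self):
        self.table = defaultdict(lambda: {"mode": None, "holders": set()})

    def request(self, txn, obj, mode):
        entry = self.table[obj]
        if not entry["holders"]:                     # nothing held: grant
            entry["mode"], entry["holders"] = mode, {txn}
            return True
        if mode == "S" and entry["mode"] == "S":     # readers can share
            entry["holders"].add(txn)
            return True
        return False                                 # conflict: requester blocks

    def release(self, txn, obj):
        entry = self.table[obj]
        entry["holders"].discard(txn)
        if not entry["holders"]:
            entry["mode"] = None

lm = LockManager()
print(lm.request("T1", "A", "S"))   # True  -- first reader
print(lm.request("T2", "A", "S"))   # True  -- readers share
print(lm.request("T3", "A", "X"))   # False -- writer must wait
```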
However, this is still not a magic solution (think unrepeatable reads)
Two-phase locking (2PL) is a protocol that can safeguard against such problems. The idea is that a transaction cannot request a new lock once it has released any lock. This works well, but not so well if a transaction needs to abort: other transactions may already have read data it unlocked, forcing cascading rollbacks
In strict 2PL, all locks are held until the end of the transaction. This fixes the cascading rollback problem, but deadlocks can still occur
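One way to picture the rule is as a per-transaction guard (a sketch with made-up names; it assumes the lock manager has already granted each lock):

```python
class TwoPhaseLocks:
    """Tracks one transaction's locks and enforces the 2PL rule:
    once any lock has been released, no new locks may be acquired."""
    def __init__(self):
        self.held = {}              # obj -> "S" or "X"
        self.shrinking = False      # flips to True at the first release

    def acquire(self, obj, mode):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after a release")
        self.held[obj] = mode       # assume the lock manager granted it

    def release(self, obj):
        self.shrinking = True       # the growing phase is over
        del self.held[obj]

    def release_all(self):
        """Strict 2PL: this is the only release, done at COMMIT/ABORT."""
        for obj in list(self.held):
            self.release(obj)

t = TwoPhaseLocks()
t.acquire("A", "X")
t.release("A")
try:
    t.acquire("B", "S")             # not allowed once the shrinking phase began
except RuntimeError as err:
    print(err)
```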
We can fix deadlocks by detecting them: maintain a waits-for graph of blocked transactions and look for cycles, then select a victim transaction and KILL IT
We could also fix deadlocks by preventing them to begin with:
In priority-based lock access, transactions get priorities based on their age (older = higher priority); if a transaction tries to acquire a conflicting lock, the DBMS may decide to terminate one of the two (see the sketch after this list):
- Wait-Die: old waits for young; if the requesting transaction is older than the lock holder it waits, otherwise it aborts (dies)
- Wound-Wait: young waits for old; if the requesting transaction is older than the lock holder it aborts the holder (wounds it), otherwise it waits
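Both schemes boil down to comparing transaction ages (typically start timestamps, where a lower timestamp means older and higher priority). A small decision sketch with made-up names:

```python
def resolve_conflict(requester_ts, holder_ts, scheme):
    """The requester wants a lock that the holder currently has.
    Lower timestamp = older transaction = higher priority."""
    if scheme == "wait-die":
        # Old waits for young; a younger requester dies (aborts).
        return "requester waits" if requester_ts < holder_ts else "requester aborts"
    if scheme == "wound-wait":
        # Old wounds (aborts) the young holder; a younger requester waits.
        return "holder aborts" if requester_ts < holder_ts else "requester waits"
    raise ValueError(f"unknown scheme: {scheme}")

# T1 (ts=1, older) vs T2 (ts=2, younger):
print(resolve_conflict(1, 2, "wait-die"))    # requester waits  (old waits for young)
print(resolve_conflict(1, 2, "wound-wait"))  # holder aborts    (old wounds young)
print(resolve_conflict(2, 1, "wait-die"))    # requester aborts (young dies)
print(resolve_conflict(2, 1, "wound-wait"))  # requester waits  (young waits for old)
```

Because a transaction only ever waits for transactions in one age direction, the waits-for graph can never form a cycle, so deadlocks are impossible by construction.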
Logging
Atomicity and durability are both very important properties of database systems. So what should we do if a system crashes in the middle of a transaction?
- Hardware failure
- Software bugs
- Logical errors
- State errors
- Storage media failure (we don’t solve this)
To recap, the buffer manager is responsible for managing the buffer pool in memory
Assumptions:
- Non-volatile storage has no mechanical problems
- Updates are in-place, meaning they are realized by reading the page from disk, updating it in memory, and writing it back to disk
- Strict 2PL is in place, so there are no concurrency issues
One BIG problem is that even if two transactions never touch the same records, the buffer pool only deals with whole pages at a time, so their changes can land on the same page.
- If T1 commits its changes to a page and T2 aborts its changes on the same page, undoing T2 by restoring the old page image would also wipe out T1’s committed changes, which would then have to be redone somehow.
Two requirements can be used to help:
- Force changes to disk: all changes of a transaction must be written to non-volatile storage before it commits (but this can get expensive, since every commit must wait for all of the transaction’s dirty pages to reach disk)
- No-stealing changes of other transactions: changes made by uncommitted transactions cannot be written to non-volatile storage (but this creates memory pressure for longer/larger transactions, since their dirty pages must stay in the buffer pool)
Here’s an alternative:
- No-force changes to disk: a transaction can commit even before its changes are written back to non-volatile storage
- Steal changes of other transactions: when writing back one transaction’s data to non-volatile storage, data from other (possibly uncommitted) transactions on the same page may also be written back
On their own, these policies don’t ensure anything! Instead, we must be able to undo and redo transactions in order to fix the database state after a crash
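The standard way to see this is as a 2x2 matrix of buffer-pool policies versus the recovery work they imply (a summary table, not code from any particular system):

```python
# steal    -> uncommitted changes can reach disk      -> recovery needs UNDO
# no-force -> committed changes might not reach disk  -> recovery needs REDO
POLICY_NEEDS = {
    ("no-steal", "force"):    "neither undo nor redo (simple, but slow)",
    ("no-steal", "no-force"): "redo only",
    ("steal",    "force"):    "undo only",
    ("steal",    "no-force"): "undo and redo (the combination most DBMSs pick)",
}
```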
The DBMS maintains a WAL, or write-ahead log, which keeps track of all changes made in memory. The log records describing a page are flushed to disk before that page is written to disk
- Transactions are delimited by a BEGIN record and a COMMIT (or ABORT) record
- Every update record includes the transaction ID, the data object ID, the value before the modification (for undo), and the value after the modification (for redo); see the sketch below
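A simplified sketch of what such log records enable after a crash (hypothetical record layout and a toy in-memory “database”; real recovery, e.g. ARIES, is considerably more involved):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogRecord:
    kind: str                      # "BEGIN", "UPDATE", "COMMIT", or "ABORT"
    txn: str                       # transaction ID
    obj: Optional[str] = None      # data object ID (UPDATE only)
    before: Optional[int] = None   # value before the change, used for UNDO
    after: Optional[int] = None    # value after the change, used for REDO

def recover(log, db):
    """Redo every logged update (repeat history), then undo the updates of
    transactions that never reached COMMIT."""
    committed = {r.txn for r in log if r.kind == "COMMIT"}
    for r in log:                                      # redo phase
        if r.kind == "UPDATE":
            db[r.obj] = r.after
    for r in reversed(log):                            # undo phase, backwards
        if r.kind == "UPDATE" and r.txn not in committed:
            db[r.obj] = r.before
    return db

log = [
    LogRecord("BEGIN", "T1"),
    LogRecord("UPDATE", "T1", "A", before=10, after=20),
    LogRecord("COMMIT", "T1"),
    LogRecord("BEGIN", "T2"),
    LogRecord("UPDATE", "T2", "A", before=20, after=99),
    # crash happens here, before T2 commits
]
print(recover(log, {"A": 10}))   # {'A': 20} -- T1's change redone, T2's undone
```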