The query optimizer does magic to find a good way to execute queries. There is an inherent tradeoff between finding the “best” plan and finding it quickly.
“Find sailor names who have reserved boat ID 103 and have a rating >5”
SELECT name
FROM Sailors S, Reserves R
WHERE S.sid = R.sid
AND R.bid = 103
AND S.rating > 5
[Figures: three equivalent relational algebra trees for this query, differing in where the selections sit relative to the join.] The final one pushes both selections below the join, which minimizes the data going into the join.
How does it work?
- Find the relational algebra equivalences, storing as relational algebra (RA) trees
- Estimate the cost of different RA trees, using the size of the query result set, cost estimation for I/O and CPU
- Explore the search space for different operations, like the choice of join algorithm, order of operations, index selection, etc.
A bad query processor takes the simplest route, computing the cross product of all relations and then applying each selection predicate.
A good query optimizer picks a “good” plan “quickly”: it uses a fast implementation technique for each operator, explores equivalent (and hopefully faster) query plans, and uses cost modeling to find the cheapest one.
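As a toy sketch of that idea in Python (the plan descriptions and cost numbers below are invented for illustration; a real optimizer derives estimates from catalog statistics):

from dataclasses import dataclass

@dataclass
class Plan:
    description: str
    estimated_io: float  # estimated cost in I/Os

def choose_plan(candidates: list[Plan]) -> Plan:
    # Cost-based optimization in one line: keep the cheapest estimate.
    return min(candidates, key=lambda p: p.estimated_io)

candidates = [
    Plan("cross product, then filter", 50_001_000),  # made-up estimate
    Plan("push selections below the join", 4_060),   # made-up estimate
]
print(choose_plan(candidates).description)  # push selections below the join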
Optimizing Selection
Say you have two relations,
Sailors: sid, name, rating, age
- Tuple length: 50 B
- Tuples per page: 80
- Total pages: 500
Reserves: sid, bid, day, rname
- Tuple length: 40 B
- Tuples per page: 100
- Total pages: 1000
Therefore, there are 40K sailor tuples that take up 2 MB and 100K reserves tuples that take up 4 MB
Selections can be point or range queries. To execute a selection efficiently,
- Check for indexes
- Estimate the size of the result
- Think about the access path
Selectivity (ranging from 0 to 1) estimates the fraction of tuples that will qualify for a selection clause
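Under the usual uniformity assumption, these estimates come from a couple of catalog statistics. A minimal sketch, with function names of my own choosing rather than any real catalog API:

def eq_selectivity(n_distinct: int) -> float:
    # col = value: assuming values are uniformly distributed,
    # roughly 1/n_distinct of the tuples qualify.
    return 1.0 / n_distinct

def range_selectivity(lo: float, hi: float, col_min: float, col_max: float) -> float:
    # lo < col < hi: fraction of the column's value range covered.
    covered = max(0.0, min(hi, col_max) - max(lo, col_min))
    return covered / (col_max - col_min)

# rating > 5, assuming ratings spread uniformly over [0, 10]:
print(range_selectivity(5, 10, 0, 10))  # 0.5, about half the sailors qualify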
When estimating a point query, if the
- Relation is stored as a heap file with no index: scan until the entry is found
- Relation is stored sorted on the query attribute with no index: binary search
- Relation has an index on the query attribute:
probe the tree index (about 3 I/Os), then read the data
or probe the hash index (about 1.2 I/Os, where the extra 0.2 accounts for collisions and overflow pages), then read the data
Indexes are super helpful on heap files, but not as important on sorted files.
When estimating a range query, if the
- Relation is stored as a heap file with no index: scan the whole file
- Relation is sorted on the query attribute with no index: binary search, then scan to the end of the range
Now if the file has N pages and a fraction sel of the tuples qualify, the cost is about ⌈log2 N⌉ + sel × N I/Os
If sel is close to 1, the cost comes out to about N I/Os, which means the binary search was redundant
- Relation has a clustered index on the query attribute: index probe, then scan to the end of the range
- Relation has an unclustered index on the query attribute: index probe, then fetch each matching tuple (potentially one I/O per match)
This last one is quite bad! A good query optimizer wouldn't try this and would rather scan the heap file directly. The situation can be improved by fetching all the record IDs found in the index, sorting them by page ID, and then fetching the pages in order; this way, the same page is never read more than once (see the sketch below).
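Here is a minimal, self-contained sketch of that trick; the (page_id, slot) record-ID layout and the read_page helper are assumptions for illustration:

def fetch_in_page_order(rids, read_page):
    # rids: record IDs from an unclustered-index range probe,
    # as (page_id, slot) pairs; read_page does one heap-page I/O.
    rids = sorted(rids, key=lambda rid: rid[0])  # group by page
    result, cur_id, page = [], None, None
    for page_id, slot in rids:
        if page_id != cur_id:                    # one read per distinct page
            page, cur_id = read_page(page_id), page_id
        result.append(page[slot])
    return result

# Toy heap file: two pages of two tuples each.
heap = {0: ["a0", "a1"], 1: ["b0", "b1"]}
print(fetch_in_page_order([(1, 0), (0, 1), (1, 1)], heap.__getitem__))
# Pages 0 and 1 are each read exactly once -> ['a1', 'b0', 'b1']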
Optimizing Projections
SQL does not remove duplicates by default, so we must use DISTINCT
- Read all qualifying tuples
- Sort the result set (lexicographically)
- Remove adjacent duplicates
Say we project Reserves onto a subset of its attributes: ||R|| = 100K tuples, |R| = 1000 pages, tuples are 40 B, and the projected tuples are 10 B, so the output size is 250 pages
The total cost involves,
- Reading R and writing out the projected tuples: 1000 + 250 = 1250 I/Os
- Sorting the runs takes 250 + 250 = 500 I/Os (Pass 0) and merging the runs takes another 500 I/Os (Pass 1)
- Then removing duplicates takes another 250 I/Os (a final scan)
The total here is 2500 I/Os
We can do better with an in-memory heapsort, which produces runs with an average length of 2B pages
- The 1000 pages on disk are read B pages at a time: 1000 I/Os
- The tuples are projected and heap-sorted into runs before going through the output buffer: 250 I/Os
- The runs are merged in Pass 1, removing duplicates on the fly: 250 I/Os
This makes a total of 1500 I/Os
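Both totals can be checked with quick arithmetic over the page counts from the example:

R_PAGES = 1000   # Reserves
OUT = 250        # projected pages (10 B kept out of every 40 B tuple)

# Naive: read R and write the projected temp, sort it in two passes
# (each pass reads and writes all 250 pages), then scan to deduplicate.
naive = (R_PAGES + OUT) + 2 * OUT + 2 * OUT + OUT
print(naive)      # 2500 I/Os

# Optimized: project while reading (1000), write the sorted runs (250),
# and deduplicate on the fly during the single merge pass (250).
optimized = R_PAGES + OUT + OUT
print(optimized)  # 1500 I/Os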
Optimizing Joins
Joins are one of the most common operations in databases. Any interesting query has a join. However, they can also be very costly, especially if executed poorly.
The simplest join query
SELECT *
FROM Reserves R, Sailors S
WHERE R.sid = S.sid
The worst way you could do this is with a cross-product R × S.
Simple nested-loop join,
for each tuple r in R
    for each tuple s in S
        if r.sid == s.sid, add <r, s> to result
This is pretty bad; clearly there are many redundant I/Os, and the cost is |R| + ||R|| × |S| I/Os (writing |R| for R's page count and ||R|| for its tuple count)
However, note that swapping the inner and outer relations does affect the cost; the comparison below plugs in the example sizes
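Plugging the example sizes into the cost formula makes the asymmetry concrete:

# Simple nested-loop join cost: |outer| + ||outer|| * |inner| page I/Os
R_PAGES, R_TUPLES = 1000, 100_000   # Reserves
S_PAGES, S_TUPLES = 500, 40_000     # Sailors

print(R_PAGES + R_TUPLES * S_PAGES)  # Reserves outer: 50,001,000 I/Os
print(S_PAGES + S_TUPLES * R_PAGES)  # Sailors outer:  40,000,500 I/Os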
Page-oriented nested-loop join makes the first obvious optimization to this procedure,
for each page pR in R
    for each page pS in S
        for each tuple r in pR
            for each tuple s in pS
                if r.sid == s.sid, add <r, s> to result
This is better; the cost is |R| + |R| × |S| = 1000 + 1000 × 500 = 501,000 I/Os
Index nested-loop join,
for each tuple r in R
    probe the index on S.sid
    for each tuple s such that s.sid == r.sid
        fetch s and add <r, s> to result
This is much better; the cost is |R| + ||R|| × (probe cost + fetch cost per match)
The probe cost is about 3 I/Os for a B+-tree index and 1.2 I/Os for a hash index
In this case, the number of matches per Reserves tuple is 1, since each sid refers to one sailor (the schema is listed at the top of this document)
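With the example sizes, the two index types work out roughly to:

R_PAGES, R_TUPLES = 1000, 100_000   # Reserves is the outer relation
MATCHES = 1                          # one sailor per Reserves.sid

btree_cost = R_PAGES + R_TUPLES * (3 + MATCHES)     # B+-tree probe
hash_cost = R_PAGES + R_TUPLES * (1.2 + MATCHES)    # hash probe
print(btree_cost, round(hash_cost))  # 401000 221000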
Block nested-loop join
Now what if we have B buffers?
- 1 page is used to buffer the output
- 1 page is used to stream the inner relation
- B − 2 pages are used to hold a block of the outer relation
for each block of B − 2 pages of R
    for each page pS of S
        for each tuple r in the block
            for each tuple s in pS
                if r.sid == s.sid, add <r, s> to result
The cost is |R| + ⌈|R| / (B − 2)⌉ × |S| I/Os
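For instance, assuming B = 102 buffer pages (an arbitrary value chosen to make the arithmetic clean):

import math

R_PAGES, S_PAGES, B = 1000, 500, 102
cost = R_PAGES + math.ceil(R_PAGES / (B - 2)) * S_PAGES
print(cost)  # 1000 + 10 * 500 = 6000 I/Os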
Sort-merge join
This takes two phases,
- Sort: Sort both relations on the join attribute
- Merge: Step through the two sorted relations in lockstep to find matching tuples
sort R on the join attribute (sid)
sort S on the join attribute (sid)
r ← first tuple of R; s ← first tuple of S
while r and s both exist
    if r.sid > s.sid
        increment s
    if r.sid < s.sid
        increment r
        backtrack s if necessary
    else if r.sid == s.sid
        add <r, s> to result
        increment s
Backtracking is only necessary because of duplicate values
The best case cost of merging is |R| + |S| I/Os
The worst case cost of merging is |R| × |S| I/Os, which happens when every tuple of both relations shares the same join value
Sorting (with external merge sort) takes 2N × (1 + ⌈log_{B−1}(⌈N/B⌉)⌉) I/Os for a relation of N pages
This gets significantly better with more buffers
Sort-merge is useful in cases when the join output is expected to be sorted on the join attribute, or when either relation is already sorted
It should be avoided if either relation has many duplicates
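A runnable sketch of the merge phase, with plain Python lists standing in for the sorted relations; the inner loop over the duplicate group is where backtracking happens:

def sort_merge_join(R, S, key):
    R, S = sorted(R, key=key), sorted(S, key=key)
    out, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if key(R[i]) < key(S[j]):
            i += 1
        elif key(R[i]) > key(S[j]):
            j += 1
        else:
            # Matching group: emit the cross product of every R and S
            # tuple sharing this key, backtracking j for each new R tuple.
            k, j_start = key(R[i]), j
            while i < len(R) and key(R[i]) == k:
                j = j_start                      # backtrack to group start
                while j < len(S) and key(S[j]) == k:
                    out.append((R[i], S[j]))
                    j += 1
                i += 1
    return out

reserves = [(2, 104), (1, 103), (2, 103)]   # (sid, bid)
sailors = [(1, "Ann", 9), (2, "Bo", 7)]     # (sid, name, rating)
print(sort_merge_join(reserves, sailors, key=lambda t: t[0]))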
Hash join
Again, there are two phases,
- Hashing/Building: Use a hash function on the join attribute to populate the hash table
- Probing: Scan the inner relation page-wise, use the same hash function to probe for matching tuples
for each tuple r in R
    hash r into the hash table using h(r.sid)
for each tuple s in S
    probe the table with h(s.sid); if some entry r there has r.sid == s.sid, add <r, s> to result
In the hash table, we store the key plus the value (or a record ID)
The total cost is |R| + |S| I/Os
However, this requires that the hash table on R fits in memory
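A minimal in-memory version, with a Python dict standing in for the hash table; it works only because the build side fits in memory:

def hash_join(R, S, r_key, s_key):
    table = {}
    for r in R:                      # build phase: hash every R tuple
        table.setdefault(r_key(r), []).append(r)
    return [(r, s)                   # probe phase: look up each S tuple
            for s in S
            for r in table.get(s_key(s), [])]

reserves = [(1, 103), (2, 104)]           # (sid, bid)
sailors = [(1, "Ann", 9), (3, "Cy", 4)]   # (sid, name, rating)
print(hash_join(reserves, sailors, r_key=lambda r: r[0], s_key=lambda s: s[0]))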
Partitioned hash join
Two phases,
- Partitioning/Building: Use a hash function h1 on the join attribute to create B − 1 partitions, on both relations
- Probing: Fetch matching partition pairs into memory, use a different hash function h2 to build a table and probe for matching tuples
First all the R tuples are hashed, then all the S tuples are hashed
When a buffer page fills, it is written out to disk, growing its partition
Then, for each potentially matching partition pair, we bring the partitions into memory, build a table on one with the second hash function, and record any matches
Each relation must be read, written, and read again, so the total cost is 3(|R| + |S|) = 3 × (1000 + 500) = 4500 I/Os
If the smaller of the two partitions does not fit in B − 2 pages, then it must be recursively partitioned further
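A sketch of the partition-then-probe structure, assuming Python's built-in hash stands in for h1 and a dict for h2 (a real system hashes key bytes and spills partitions to disk):

def partition(relation, key, n):
    # h1: route each tuple to one of n partitions.
    parts = [[] for _ in range(n)]
    for t in relation:
        parts[hash(key(t)) % n].append(t)
    return parts

def grace_hash_join(R, S, r_key, s_key, n=4):
    out = []
    for Rp, Sp in zip(partition(R, r_key, n), partition(S, s_key, n)):
        # Only same-numbered partitions can contain matches. Build an
        # in-memory table on Rp (the "h2" step), then probe with Sp.
        table = {}
        for r in Rp:
            table.setdefault(r_key(r), []).append(r)
        out += [(r, s) for s in Sp for r in table.get(s_key(s), [])]
    return out

reserves = [(1, 103), (2, 104), (3, 105)]   # (sid, bid)
sailors = [(1, "Ann", 9), (3, "Cy", 4)]     # (sid, name, rating)
print(grace_hash_join(reserves, sailors, lambda r: r[0], lambda s: s[0]))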