The query optimizer does magic to find a good way to execute queries. There is an inherent tradeoff between finding the “best” plan and finding it quickly.
“Find sailor names who have reserved boat ID 103 and have a rating >5”
SELECT name
FROM Sailors S, Reserves R
WHERE S.sid = R.sid
AND R.bid = 103
AND S.rating > 5
[Figures: three equivalent relational algebra trees for this query, differing in where the selections sit relative to the join.] The final one pushes both selections below the join, which minimizes the data going into the join.
How does it work?
- Find the relational algebra equivalences, storing as relational algebra (RA) trees
- Estimate the cost of different RA trees, using the size of the query result set, cost estimation for I/O and CPU
- Explore the search space for different operations, like the choice of join algorithm, order of operations, index selection, etc.
A bad query processor takes the simplest route, computing the cross product of all relations and then applying each selection predicate.
A good query optimizer picks a “good” plan “quickly”: it uses a fast implementation technique for each operator, explores equivalent (and hopefully faster) query plans, and uses cost modeling to find the cheapest one.
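As a toy sketch of that idea in Python (the plan descriptions and cost numbers below are invented for illustration; a real optimizer derives estimates from catalog statistics):

from dataclasses import dataclass

@dataclass
class Plan:
    description: str
    estimated_io: float  # estimated cost in I/Os

def choose_plan(candidates: list[Plan]) -> Plan:
    # Cost-based optimization in one line: keep the cheapest estimate.
    return min(candidates, key=lambda p: p.estimated_io)

candidates = [
    Plan("cross product, then filter", 50_001_000),  # made-up estimate
    Plan("push selections below the join", 4_060),   # made-up estimate
]
print(choose_plan(candidates).description)  # push selections below the join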
Optimizing Selection
Say you have two relations,
Sailors: sid, name, rating, age
- Tuple length: 50 B
- Tuples per page: 80
- Total pages: 500
Reserves: sid, bid, day, rname
- Tuple length: 40 B
- Tuples per page: 100
- Total pages: 1000
Therefore, there are 40K sailor tuples that take up 2 MB and 100K reserves tuples that take up 4 MB
Selections can be point or range queries. To execute a selection efficiently,
- Check for indexes
- Estimate the size of the result
- Think about the access path
Selectivity (ranging from 0 to 1) estimates the fraction of tuples that will qualify for a selection clause
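Under the usual uniformity assumption, these estimates come from a couple of catalog statistics. A minimal sketch, with function names of my own choosing rather than any real catalog API:

def eq_selectivity(n_distinct: int) -> float:
    # col = value: assuming values are uniformly distributed,
    # roughly 1/n_distinct of the tuples qualify.
    return 1.0 / n_distinct

def range_selectivity(lo: float, hi: float, col_min: float, col_max: float) -> float:
    # lo < col < hi: fraction of the column's value range covered.
    covered = max(0.0, min(hi, col_max) - max(lo, col_min))
    return covered / (col_max - col_min)

# rating > 5, assuming ratings spread uniformly over [0, 10]:
print(range_selectivity(5, 10, 0, 10))  # 0.5, about half the sailors qualify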
When estimating a point query, if the
- Relation is stored as a heap file with no index: scan until the entry is found
- Relation is stored sorted on the query attribute with no index: binary search
- Relation has an index on the query attribute:
probe the tree index (about 3 I/Os), then read the data
or probe the hash index (about 1.2 I/Os, where the extra 0.2 accounts for collisions and overflow pages), then read the data
Indexes are super helpful on heap files, but not as important on sorted files.
When estimating a range query, if the
- Relation is stored as a heap file with no index: scan the whole file
- Relation is sorted on the query attribute with no index: binary search, then scan to the end of the range
Now if the file has N pages and a fraction sel of the tuples qualify, the cost is about ⌈log2 N⌉ + sel × N I/Os
If sel is close to 1, the cost comes out to about N I/Os, which means the binary search was redundant
- Relation has a clustered index on the query attribute: index probe, then scan to the end of the range
- Relation has an unclustered index on the query attribute: index probe, then fetch each matching tuple (potentially one I/O per match)
This last one is quite bad! A good query optimizer wouldn't try this and would rather scan the heap file directly. The situation can be improved by fetching all the record IDs found in the index, sorting them by page ID, and then fetching the pages in order; this way, the same page is never read more than once (see the sketch below).
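Here is a minimal, self-contained sketch of that trick; the (page_id, slot) record-ID layout and the read_page helper are assumptions for illustration:

def fetch_in_page_order(rids, read_page):
    # rids: record IDs from an unclustered-index range probe,
    # as (page_id, slot) pairs; read_page does one heap-page I/O.
    rids = sorted(rids, key=lambda rid: rid[0])  # group by page
    result, cur_id, page = [], None, None
    for page_id, slot in rids:
        if page_id != cur_id:                    # one read per distinct page
            page, cur_id = read_page(page_id), page_id
        result.append(page[slot])
    return result

# Toy heap file: two pages of two tuples each.
heap = {0: ["a0", "a1"], 1: ["b0", "b1"]}
print(fetch_in_page_order([(1, 0), (0, 1), (1, 1)], heap.__getitem__))
# Pages 0 and 1 are each read exactly once -> ['a1', 'b0', 'b1']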
Optimizing Projections
SQL does not remove duplicates by default, so we must use DISTINCT
- Read all qualifying tuples
- Sort the result set (lexicographically)
- Remove adjacent duplicates
Say we project Reserves onto a subset of its attributes: ||R|| = 100K tuples, |R| = 1000 pages, tuples are 40 B, and the projected tuples are 10 B, so the output size is 250 pages
The total cost involves,
- Reading R and writing out the projected tuples: 1000 + 250 = 1250 I/Os
- Sorting the runs takes 250 + 250 = 500 I/Os (Pass 0) and merging the runs takes another 500 I/Os (Pass 1)
- Then removing duplicates takes another 250 I/Os (a final scan)
The total here is 2500 I/Os
We can do better with an in-memory heapsort, which produces runs with an average length of 2B pages
- The 1000 pages on disk are read B pages at a time: 1000 I/Os
- The tuples are projected and heap-sorted into runs before going through the output buffer: 250 I/Os
- The runs are merged in Pass 1, removing duplicates on the fly: 250 I/Os
This makes a total of 1500 I/Os
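Both totals can be checked with quick arithmetic over the page counts from the example:

R_PAGES = 1000   # Reserves
OUT = 250        # projected pages (10 B kept out of every 40 B tuple)

# Naive: read R and write the projected temp, sort it in two passes
# (each pass reads and writes all 250 pages), then scan to deduplicate.
naive = (R_PAGES + OUT) + 2 * OUT + 2 * OUT + OUT
print(naive)      # 2500 I/Os

# Optimized: project while reading (1000), write the sorted runs (250),
# and deduplicate on the fly during the single merge pass (250).
optimized = R_PAGES + OUT + OUT
print(optimized)  # 1500 I/Os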
Optimizing Joins
Joins are one of the most common operations in databases. Any interesting query has a join. However, they can also be very costly, especially if executed poorly.
The simplest join query
SELECT *
FROM Reserves R, Sailors S
WHERE R.sid = S.sid
The worst way you could do this is with a cross-product R × S.
Simple nested-loop join,
for each tuple r in R
    for each tuple s in S
        if r.sid == s.sid, add <r, s> to result
This is pretty bad; clearly there are many redundant I/Os, and the cost is |R| + ||R|| × |S| I/Os (writing |R| for R's page count and ||R|| for its tuple count)
However, note that swapping the inner and outer relations does affect the cost; the comparison below plugs in the example sizes
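Plugging the example sizes into the cost formula makes the asymmetry concrete:

# Simple nested-loop join cost: |outer| + ||outer|| * |inner| page I/Os
R_PAGES, R_TUPLES = 1000, 100_000   # Reserves
S_PAGES, S_TUPLES = 500, 40_000     # Sailors

print(R_PAGES + R_TUPLES * S_PAGES)  # Reserves outer: 50,001,000 I/Os
print(S_PAGES + S_TUPLES * R_PAGES)  # Sailors outer:  40,000,500 I/Os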
Page-oriented nested-loop join makes the first obvious optimization to this procedure,
for each page pR in R
    for each page pS in S
        for each tuple r in pR
            for each tuple s in pS
                if r.sid == s.sid, add <r, s> to result
This is better; the cost is |R| + |R| × |S| = 1000 + 1000 × 500 = 501,000 I/Os
Index nested-loop join,
for each tuple r in R
    probe the index on S.sid
    for each tuple s such that s.sid == r.sid
        fetch s and add <r, s> to result
This is much better; the cost is |R| + ||R|| × (probe cost + fetch cost per match)
The probe cost is about 3 I/Os for a B+-tree index and 1.2 I/Os for a hash index
In this case, the number of matches per Reserves tuple is 1, since each sid refers to one sailor (the schema is listed at the top of this document)
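With the example sizes, the two index types work out roughly to:

R_PAGES, R_TUPLES = 1000, 100_000   # Reserves is the outer relation
MATCHES = 1                          # one sailor per Reserves.sid

btree_cost = R_PAGES + R_TUPLES * (3 + MATCHES)     # B+-tree probe
hash_cost = R_PAGES + R_TUPLES * (1.2 + MATCHES)    # hash probe
print(btree_cost, round(hash_cost))  # 401000 221000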
Block nested-loop join
Now what if we have B buffers?
- 1 page is used to buffer the output
- 1 page is used to stream the inner relation
- B − 2 pages are used to hold a block of the outer relation
for each block of B − 2 pages of R
    for each page pS of S
        for each tuple r in the block
            for each tuple s in pS
                if r.sid == s.sid, add <r, s> to result
The cost is |R| + ⌈|R| / (B − 2)⌉ × |S| I/Os
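For instance, assuming B = 102 buffer pages (an arbitrary value chosen to make the arithmetic clean):

import math

R_PAGES, S_PAGES, B = 1000, 500, 102
cost = R_PAGES + math.ceil(R_PAGES / (B - 2)) * S_PAGES
print(cost)  # 1000 + 10 * 500 = 6000 I/Os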
Sort-merge join
This takes two phases,
- Sort: Sort both relations on the join attribute
- Merge: Step through the two sorted relations in lockstep to find matching tuples
sort R on the join attribute (sid)
sort S on the join attribute (sid)
r ← first tuple of R; s ← first tuple of S
while r and s both exist
    if r.sid > s.sid
        increment s
    if r.sid < s.sid
        increment r
        backtrack s if necessary
    else if r.sid == s.sid
        add <r, s> to result
        increment s
Backtracking is only necessary because of duplicate values
The best case cost of merging is |R| + |S| I/Os
The worst case cost of merging is |R| × |S| I/Os, which happens when every tuple of both relations shares the same join value
Sorting (with external merge sort) takes 2N × (1 + ⌈log_{B−1}(⌈N/B⌉)⌉) I/Os for a relation of N pages
This gets significantly better with more buffers
Sort-merge is useful in cases when the join output is expected to be sorted on the join attribute, or when either relation is already sorted
It should be avoided if either relation has many duplicates
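A runnable sketch of the merge phase, with plain Python lists standing in for the sorted relations; the inner loop over the duplicate group is where backtracking happens:

def sort_merge_join(R, S, key):
    R, S = sorted(R, key=key), sorted(S, key=key)
    out, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if key(R[i]) < key(S[j]):
            i += 1
        elif key(R[i]) > key(S[j]):
            j += 1
        else:
            # Matching group: emit the cross product of every R and S
            # tuple sharing this key, backtracking j for each new R tuple.
            k, j_start = key(R[i]), j
            while i < len(R) and key(R[i]) == k:
                j = j_start                      # backtrack to group start
                while j < len(S) and key(S[j]) == k:
                    out.append((R[i], S[j]))
                    j += 1
                i += 1
    return out

reserves = [(2, 104), (1, 103), (2, 103)]   # (sid, bid)
sailors = [(1, "Ann", 9), (2, "Bo", 7)]     # (sid, name, rating)
print(sort_merge_join(reserves, sailors, key=lambda t: t[0]))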
Hash join
Again, there are two phases,
- Hashing/Building: Use a hash function on the join attribute to populate the hash table
- Probing: Scan the inner relation page-wise, use the same hash function to probe for matching tuples
for each tuple r in R
    hash r into the hash table using h(r.sid)
for each tuple s in S
    probe the table with h(s.sid); if some entry r there has r.sid == s.sid, add <r, s> to result
In the hash table, we store the key plus the value (or a record ID)
The total cost is |R| + |S| I/Os
However, this requires that the hash table on R fits in memory
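A minimal in-memory version, with a Python dict standing in for the hash table; it works only because the build side fits in memory:

def hash_join(R, S, r_key, s_key):
    table = {}
    for r in R:                      # build phase: hash every R tuple
        table.setdefault(r_key(r), []).append(r)
    return [(r, s)                   # probe phase: look up each S tuple
            for s in S
            for r in table.get(s_key(s), [])]

reserves = [(1, 103), (2, 104)]           # (sid, bid)
sailors = [(1, "Ann", 9), (3, "Cy", 4)]   # (sid, name, rating)
print(hash_join(reserves, sailors, r_key=lambda r: r[0], s_key=lambda s: s[0]))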
Partitioned hash join
Two phases,
- Partitioning/Building: Use a hash function h1 on the join attribute to create B − 1 partitions, on both relations
- Probing: Fetch matching partition pairs into memory, use a different hash function h2 to build a table and probe for matching tuples
First all the R tuples are hashed, then all the S tuples are hashed
When a buffer page fills, it is written out to disk, growing its partition
Then, for each potentially matching partition pair, we bring the partitions into memory, build a table on one with the second hash function, and record any matches
Each relation must be read, written, and read again, so the total cost is 3(|R| + |S|) = 3 × (1000 + 500) = 4500 I/Os
If the smaller of the two partitions does not fit in B − 2 pages, then it must be recursively partitioned further
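A sketch of the partition-then-probe structure, assuming Python's built-in hash stands in for h1 and a dict for h2 (a real system hashes key bytes and spills partitions to disk):

def partition(relation, key, n):
    # h1: route each tuple to one of n partitions.
    parts = [[] for _ in range(n)]
    for t in relation:
        parts[hash(key(t)) % n].append(t)
    return parts

def grace_hash_join(R, S, r_key, s_key, n=4):
    out = []
    for Rp, Sp in zip(partition(R, r_key, n), partition(S, s_key, n)):
        # Only same-numbered partitions can contain matches. Build an
        # in-memory table on Rp (the "h2" step), then probe with Sp.
        table = {}
        for r in Rp:
            table.setdefault(r_key(r), []).append(r)
        out += [(r, s) for s in Sp for r in table.get(s_key(s), [])]
    return out

reserves = [(1, 103), (2, 104), (3, 105)]   # (sid, bid)
sailors = [(1, "Ann", 9), (3, "Cy", 4)]     # (sid, name, rating)
print(grace_hash_join(reserves, sailors, lambda r: r[0], lambda s: s[0]))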