The query optimizer does magic to find a good way to execute queries. There is an inherent tradeoff between finding the “best” plan and finding it quickly.

“Find the names of sailors who have reserved boat 103 and have a rating > 5”

SELECT S.name
FROM Sailors S, Reserves R
WHERE S.sid = R.sid
	AND R.bid = 103
	AND S.rating > 5


[Plan diagrams lost here: several equivalent relational-algebra trees for this query, e.g., one that joins first and filters after, or one that pushes the selections σ(bid=103) and σ(rating>5) below the join.]

The final plan, which applies both selections before the join, minimizes the data going into the join

How does it work?

  1. Find relational algebra equivalences, storing them as relational algebra (RA) trees
  2. Estimate the cost of the different RA trees, using estimates of result-set sizes and of I/O and CPU work
  3. Explore the search space of different operations: choice of join algorithm, order of operations, index selection, etc.
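
To see why the choice of plan matters, here is a toy sketch in Python (the cardinalities and selectivities are made-up numbers for illustration, not from these notes) comparing how much data feeds the join under two equivalent plans:

# Hypothetical table sizes and predicate selectivities, for illustration only.
sailors, reserves = 40_000, 100_000
sel_rating = 0.5      # assumed fraction of sailors with rating > 5
sel_bid = 0.01        # assumed fraction of reservations for boat 103

# Plan 1: join first, filter after. Every tuple feeds the join.
plan1_join_input = sailors + reserves                         # 140,000 tuples

# Plan 2: push both selections below the join.
plan2_join_input = sailors * sel_rating + reserves * sel_bid  # 21,000 tuples

print(plan1_join_input, plan2_join_input)

Both plans return the same rows; the second simply hands the join far less data.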

A bad query processor takes the simplest route: computing the cross product of all relations and then applying each selection predicate.

A good query optimizer picks a “good” plan “quickly”: it uses a fast implementation technique for each operator, explores equivalent but cheaper query plans, and uses cost modeling to find the cheapest ones.

Optimizing Selections

Say you have two relations,

Sailors: sid, name, rating, age

  • Tuple length: 50B
  • Tuples per page: 80
  • Total pages: 500

Reserves: sid, bid, day, rname

  • Tuple length: 40B
  • Tuples per page: 100
  • Total pages: 1000

Therefore, there are 40K sailor tuples that take up 2MB and 100K reserves tuples that take up 4MB

Selections can be point or range queries. To execute a selection efficiently,

  1. Check for indexes
  2. Estimate the size of the result
  3. Think about the access path

Selectivity (ranging from 0 to 1) is used to estimate the fraction of tuples that will qualify for a selection clause.
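
As a rough illustration, selectivity is often estimated from simple column statistics such as the number of distinct values and the min/max of a column. This is a minimal Python sketch; the uniform-distribution assumption and the example statistics are hypothetical, not from the notes:

def eq_selectivity(num_distinct: int) -> float:
    """Point predicate col = v: assume values are uniformly distributed."""
    return 1.0 / num_distinct

def range_selectivity(lo: float, hi: float, col_min: float, col_max: float) -> float:
    """Range predicate lo < col < hi: fraction of the column's value range covered."""
    return max(0.0, (min(hi, col_max) - max(lo, col_min)) / (col_max - col_min))

# rating > 5 on a 0-10 rating column covers half the range:
print(range_selectivity(5, 10, 0, 10))   # 0.5
# bid = 103 with, say, 100 distinct boats qualifies ~1% of reservations:
print(eq_selectivity(100))               # 0.01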

When estimating a point query, if the relation

  1. is stored as a heap file with no index: scan until the entry is found (on average half the pages, or all of them if no match exists)
  2. is stored sorted on the query attribute with no index: binary search, about log2(#pages) I/Os
  3. has an index on the query attribute:
    probe the tree index, then read the data (about 3 I/Os for the probe)
    or probe the hash index, then read the data (about 1.2 I/Os, reflecting a ~20% collision rate)

Indexes are super helpful on heap files, but not as important on sorted files.

When estimating a range query, if the relation

  1. is stored as a heap file with no index: scan the whole file
  2. is sorted on the query attribute with no index: binary search, then scan to the end of the range
    Now if the file has P pages and a fraction s of its tuples qualify, then the cost is about log2(P) + s·P I/Os (e.g., for the 500-page Sailors file with 10% qualifying: ~9 + 50 = 59 I/Os)
    If s = 1, the cost comes out to log2(P) + P I/Os, which means the binary search was redundant
  3. has a clustered index on the query attribute: index probe, then scan to the end of the range
  4. has an unclustered index on the query attribute: index probe, then fetch the page of each matching record individually

This last one is quite bad! A good query optimizer wouldn’t try this and would rather scan the heap file directly. The situation can be improved by first fetching all the record IDs found in the index, sorting them by page ID, and then fetching the pages in order. This way, the same page is never read more than once.
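
A minimal sketch of that trick in Python; `index_lookup` and `read_page` are hypothetical stand-ins for the index probe and the buffer manager, which the notes don't name:

def unclustered_range_fetch(index_lookup, read_page, lo, hi):
    """Fetch tuples for lo <= key <= hi via an unclustered index,
    reading each heap page at most once."""
    # index_lookup returns record IDs as (page_id, slot) pairs.
    rids = index_lookup(lo, hi)
    results = []
    # Visit each page exactly once, in page-ID order.
    for page_id in sorted({pid for pid, _ in rids}):
        page = read_page(page_id)
        slots = [slot for pid, slot in rids if pid == page_id]
        results.extend(page[slot] for slot in slots)
    return results

Sorting the record IDs turns scattered random reads into a single ordered sweep over the heap file.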

Optimizing Projections

SQL does not remove duplicates by default, so we must use DISTINCT

  1. Read all qualifying tuples
  2. Sort the result set (lexicographically)
  3. Remove adjacent duplicates
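
A minimal in-memory sketch of this sort-based approach (real systems do this with external sorting; the Python here just shows the adjacent-duplicate step):

def sort_based_distinct(tuples, attrs):
    """Project onto attrs, sort, then drop adjacent duplicates."""
    projected = [tuple(t[a] for a in attrs) for t in tuples]   # 1. read + project
    projected.sort()                                           # 2. sort lexicographically
    out = []
    for t in projected:                                        # 3. drop adjacent duplicates
        if not out or out[-1] != t:
            out.append(t)
    return out

rows = [{"sid": 1, "bid": 103}, {"sid": 1, "bid": 103}, {"sid": 2, "bid": 104}]
print(sort_based_distinct(rows, ["sid", "bid"]))   # [(1, 103), (2, 104)]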

Say we project Reserves, whose 1000 pages hold 40B tuples, down to attributes totaling 10B per tuple, so the output size is 250 pages.
The total cost involves,

  1. Reading the relation and filtering down to the necessary attributes: 1000 I/Os to read plus 250 I/Os to write the projected tuples
  2. Sorting the runs takes 500 I/Os (reading and writing the 250 pages) and merging the runs takes another 500
  3. Then removing duplicates takes another 250 I/Os (one last read of the sorted result)

The total here is 1000 + 250 + 500 + 500 + 250 = 2500 I/Os

We can do better with in-memory heapsort (replacement selection), which produces runs with an average length of about twice the buffer size

  1. The 1000 pages on disk are read B pages at a time: 1000 I/Os
  2. The tuples are filtered down to the needed attributes and sorted before going through the output buffer, writing the 250 projected pages out as runs: 250 I/Os
  3. The heapsort runs are then merged, removing duplicates on the fly as we merge in Pass 1: 250 I/Os

This makes a total of 1000 + 250 + 250 = 1500 I/Os
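
A sketch of replacement selection, the heap-based run generation that yields those longer runs; this is illustrative Python, not the notes' implementation:

import heapq

def replacement_selection(records, buffer_size):
    """Yield sorted runs whose average length approaches 2 * buffer_size."""
    it = iter(records)
    heap = [(0, x) for _, x in zip(range(buffer_size), it)]   # fill the buffer
    heapq.heapify(heap)
    current_run, run = 0, []
    while heap:
        run_id, x = heapq.heappop(heap)
        if run_id != current_run:       # heap has moved on to the next run
            yield run
            run, current_run = [], run_id
        run.append(x)                   # emit x to the current run
        try:
            nxt = next(it)
            # nxt can join the current run only if it sorts after x
            heapq.heappush(heap, (current_run if nxt >= x else current_run + 1, nxt))
        except StopIteration:
            pass
    if run:
        yield run

print(list(replacement_selection([5, 1, 4, 2, 6, 3], 2)))   # [[1, 4, 5, 6], [2, 3]]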

Optimizing Joins

Joins are one of the most common operations in databases. Any interesting query has a join. However, they can also be very costly, especially if executed poorly.

The simplest join query

SELECT *
FROM Reserves R, Sailors S
WHERE R.sid = S.sid

The worst way you could do this is with a cross-product R × S followed by a selection.

Simple nested-loop join,

for each tuple r in R
	for each tuple s in S
		if r.sid == s.sid, add <r, s> to result

This is pretty bad; clearly there are many redundant I/Os, and the cost is pages(R) + tuples(R) × pages(S) = 1000 + 100,000 × 500 = 50,001,000 I/Os
However, note that swapping the inner and outer relation does affect the cost: with Sailors as the outer relation, it's pages(S) + tuples(S) × pages(R) = 500 + 40,000 × 1000 = 40,000,500 I/Os

Page-oriented nested-loop join makes the first clear optimization to this procedure

for each page p_R in R
	for each page p_S in S
		for each tuple r in p_R
			for each tuple s in p_S
				if r.sid == s.sid, add <r, s> to result

This is better, the cost is pages(R) + pages(R) × pages(S) = 1000 + 1000 × 500 = 501,000 I/Os

Index nested-loop join,

for each tuple r in R
	probe the index on S.sid with r.sid
	fetch each tuple s such that s.sid == r.sid
	add <r, s> to result

This is much better, the cost is pages(R) + tuples(R) × (c + f) I/Os
c is the probe cost (about 3 I/Os for a B+-tree index and 1.2 I/Os for a hash index)
f is the cost of fetching the matching tuples; in this case it's 1, since each reserves tuple refers to exactly one sailor (sid is the key of Sailors, whose schema is listed at the top of this document)
With a hash index, that comes out to 1000 + 100,000 × (1.2 + 1) = 221,000 I/Os

Block nested-loop join
Now what if we have B buffer pages?

  • 1 page is used to store the intermediate result (the output buffer)
  • 1 page is used to stream the inner relation
  • B − 2 pages are used to hold a block of the outer relation

for each block of B − 2 pages of R
	for each page p_S in S
		for each tuple r in the block
			for each tuple s in p_S
				if r.sid == s.sid, add <r, s> to result

The cost is pages(R) + ⌈pages(R) / (B − 2)⌉ × pages(S) I/Os (e.g., with B = 102 buffers: 1000 + 10 × 500 = 6000 I/Os)
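
A minimal sketch of the chunking logic in Python (the `blocks` helper and the tuple-counted block size are made up for the example; real blocks are B − 2 pages):

from itertools import islice

def blocks(relation, block_size):
    """Split a relation into blocks of block_size tuples (stand-ins for B-2 pages)."""
    it = iter(relation)
    while chunk := list(islice(it, block_size)):
        yield chunk

def block_nested_loop_join(R, S, key, block_size):
    # For each block of the outer relation, stream the inner relation once.
    for block in blocks(R, block_size):
        for s in S:                      # one full scan of S per outer block
            for r in block:
                if r[key] == s[key]:
                    yield (r, s)

R = [{"sid": 1, "bid": 103}, {"sid": 2, "bid": 104}]
S = [{"sid": 1, "name": "dustin"}, {"sid": 2, "name": "lubber"}]
print(list(block_nested_loop_join(R, S, "sid", block_size=1)))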

Sort-merge join
This takes two phases,

  1. Sort: Sort both relations on the join attribute
  2. Merge: Scan through the two sorted relations in lockstep to find matching tuples

sort R on the join attribute (sid)
sort S on the join attribute (sid)
r ← first tuple in R; s ← first tuple in S
while r and s both remain
	if r.sid > s.sid
		increment s
	else if r.sid < s.sid
		increment r
		backtrack s if necessary
	else
		add <r, s> to result
		increment s

Backtracking is only necessary because of duplicate values
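
Here is a runnable version of the merge phase with that backtracking made explicit (a Python sketch under the simplifying assumption that both relations fit in memory):

def sort_merge_join(R, S, key):
    R = sorted(R, key=lambda t: t[key])
    S = sorted(S, key=lambda t: t[key])
    out, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if R[i][key] < S[j][key]:
            i += 1
        elif R[i][key] > S[j][key]:
            j += 1
        else:
            # Output the cross product of the two matching groups;
            # mark is where S "backtracks" to for each duplicate in R.
            mark = j
            while i < len(R) and R[i][key] == S[mark][key]:
                j = mark                                   # backtrack S
                while j < len(S) and R[i][key] == S[j][key]:
                    out.append((R[i], S[j]))
                    j += 1
                i += 1
    return out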

The best case cost of merging is pages(R) + pages(S) = 1000 + 500 = 1500 I/Os
The worst case cost of merging is pages(R) × pages(S) I/Os, which happens when every tuple shares the same join value
Sorting (with external merge sort) takes 2 × #pages × #passes I/Os per relation
This gets significantly better with more buffers, since more buffers means fewer passes

Sort-merge is useful in cases when the join output is expected to be sorted on the join attribute, or when either relation is already sorted

It should be avoided if either relation has many duplicates

Hash join
Again, there are two phases,

  1. Hashing/Building: Use a hash function h on the join attribute of the smaller relation to populate an in-memory hash table
  2. Probing: Scan the other relation page-wise, using the same hash function to probe for matching tuples

for each tuple s in S
	hash s into the hash table on h(s.sid)
for each tuple r in R
	if a tuple s in bucket h(r.sid) has s.sid == r.sid, add <r, s> to result

In the hash table, we store key + value/id

The total cost is pages(R) + pages(S) = 1000 + 500 = 1500 I/Os
However, this requires that the hash table on the build relation (here Sailors, the smaller one) fits in memory
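
A minimal in-memory version in Python (illustrative only; a real system hashes pages through the buffer pool, which this ignores):

from collections import defaultdict

def hash_join(R, S, key):
    # Build: hash the smaller relation (S) into an in-memory table.
    table = defaultdict(list)
    for s in S:
        table[s[key]].append(s)          # bucket keyed by the join attribute
    # Probe: stream the larger relation (R) and look up each tuple's bucket.
    return [(r, s) for r in R for s in table.get(r[key], [])]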

Partitioned hash join
Two phases,

  1. Partitioning/Building: Use a hash function h1 on the join attribute to split both relations into B − 1 partitions on disk
  2. Probing: Fetch matching partition pairs into memory, and use a different hash function h2 to probe and find matching tuples

First all the R tuples are hashed, then all the S tuples are hashed
When a buffer page fills up, it's flushed to disk as part of its partition

Then, for each potentially matching partition pair, we bring the partitions into memory, run the second hash function h2 on them to build a new table, and record any matches

Each relation must be read, written, and read again, so the total cost is 3 × (pages(R) + pages(S)) = 3 × 1500 = 4500 I/Os

If the smaller of the two partitions does not fit in B − 2 pages, then it must be further partitioned with yet another hash function
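
A sketch of the two phases in Python (illustrative only: partitions live in memory here, and Python's built-in hash stands in for h1 and h2, whereas a real system spills partitions to disk as described above):

from collections import defaultdict

def partitioned_hash_join(R, S, key, num_partitions):
    h1 = lambda v: hash(v) % num_partitions          # partitioning hash
    h2 = lambda v: hash((v, "probe"))                # different hash for probing

    # Phase 1: partition both relations with h1.
    parts_R, parts_S = defaultdict(list), defaultdict(list)
    for r in R:
        parts_R[h1(r[key])].append(r)
    for s in S:
        parts_S[h1(s[key])].append(s)

    # Phase 2: join each matching partition pair with h2.
    out = []
    for p in range(num_partitions):
        table = defaultdict(list)
        for s in parts_S[p]:
            table[h2(s[key])].append(s)
        for r in parts_R[p]:
            for s in table.get(h2(r[key]), []):
                if r[key] == s[key]:                 # guard against h2 collisions
                    out.append((r, s))
    return out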