Analyzing Quick-Sort

Suppose that we want to sort $x_{1}, x_{2}, \dots, x_{n}$. The quick-sort algorithm is defined as follows:

For $n = 2$, put the two values in order.
For $n > 2$, choose a value $x_{i}$ uniformly at random as the pivot. Put all values smaller than $x_{i}$ on the left of $x_{i}$ and all values greater on the right of $x_{i}$. If the pivot lands in position $k$, run quick-sort on the two smaller subsets in positions $[1, k - 1]$ and $[k + 1, n]$. (A subset with fewer than two values is already sorted.)
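The steps above can be sketched in Python. This is a minimal illustration rather than a tuned implementation: the function name `quick_sort`, the `rng` parameter, and the choice to return the comparison count alongside the sorted list are assumptions made for this example. It also treats lists of length 0 or 1 as already sorted, and handles $n = 2$ with the same pivot step (which costs exactly one comparison).

```python
import random


def quick_sort(values, rng=random):
    """Return (sorted_list, comparisons) for a list of distinct numbers."""
    if len(values) <= 1:
        # Nothing to compare: an empty or single-element list is sorted.
        return list(values), 0

    # Choose the pivot uniformly at random by position.
    k = rng.randrange(len(values))
    pivot = values[k]

    smaller, larger = [], []
    comparisons = 0
    for idx, v in enumerate(values):
        if idx == k:
            continue  # the pivot is not compared with itself
        comparisons += 1  # one comparison of v against the pivot
        if v < pivot:
            smaller.append(v)
        else:
            larger.append(v)

    # Recurse on the two buckets and add up their comparison counts.
    left, c_left = quick_sort(smaller, rng)
    right, c_right = quick_sort(larger, rng)
    return left + [pivot] + right, comparisons + c_left + c_right


print(quick_sort([5, 3, 9, 1, 7]))
```

The comparison count varies from run to run because the pivots are random; that count is the quantity analyzed in the rest of this note.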

Define $X$ as the number of comparisons quick-sort makes when sorting $n$ distinct numbers. Because the pivots are random, $X$ is a random variable, and $E [X]$ is then a measure of the efficiency of quick-sort.

We express $X$ as a sum of smaller random variables to make this easier. First, let’s use $[n]$ as an alias for the $n$ numbers (the smallest number is $1$, the second smallest is $2$, and the largest is $n$). For $1 \leq i < j \leq n$, let $I (i, j) = 1$ if $i$ and $j$ are ever directly compared, and $I (i, j) = 0$ otherwise. Therefore, $X = \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} I (i, j)$.

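By linearity of expectation, and since $E [I (i, j)]$ is just the probability that $I (i, j) = 1$,

$E [X] = \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} P \{i \text{ and } j \text{ are ever compared}\}$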

All values are initially in the same bucket. $i$ and $j$ stay in the same bucket until either $i$, $j$, or a number between them is chosen as a pivot. They are compared only if that pivot is $i$ or $j$ itself; otherwise they land in different buckets and get separated forever :(

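Anyways, each of the $j - i + 1$ numbers from $i$ to $j$ is equally likely to be the first of them chosen as a pivot, and exactly two of those choices ($i$ or $j$) lead to a comparison. This means $P \{i \text{ and } j \text{ are ever compared}\} = \frac{2}{j - i + 1}$, so $E [X] = \sum_{i = 1}^{n - 1} \sum_{j = i + 1}^{n} \frac{2}{j - i + 1}$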
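The probability $\frac{2}{j - i + 1}$ can be checked empirically. The sketch below is one possible way to do it, not part of the original derivation: it runs randomized quick-sort on the numbers $1, \dots, n$, records which pairs are compared, and reports how often a given pair shows up. The helper names `compared_pairs` and `estimate_probability`, the trial count, and the seed are all assumptions made for this example.

```python
import random


def compared_pairs(values, rng):
    """Run randomized quick-sort on distinct values and return the set of
    (smaller, larger) pairs that were directly compared."""
    if len(values) <= 1:
        return set()
    pivot = values[rng.randrange(len(values))]
    # Every other value in this bucket is compared with the pivot once.
    pairs = {(min(v, pivot), max(v, pivot)) for v in values if v != pivot}
    smaller = [v for v in values if v < pivot]
    larger = [v for v in values if v > pivot]
    return pairs | compared_pairs(smaller, rng) | compared_pairs(larger, rng)


def estimate_probability(n, i, j, trials=20000, seed=0):
    """Estimate P{i and j are ever compared} when sorting 1..n."""
    rng = random.Random(seed)
    numbers = list(range(1, n + 1))
    hits = sum((i, j) in compared_pairs(numbers, rng) for _ in range(trials))
    return hits / trials


n, i, j = 8, 2, 5
print("simulated:", estimate_probability(n, i, j))
print("formula:  ", 2 / (j - i + 1))
```

With enough trials the simulated frequency should land close to the formula's value. Adjacent numbers ($j = i + 1$) are compared in every run, matching $\frac{2}{2} = 1$.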

$\sum_{j = i + 1}^{n} \frac{2}{j - i + 1} \approx \int_{i + 1}^{n} \frac{2}{x - i + 1} \, dx = [2 \log (x - i + 1)]_{i + 1}^{n} \approx 2 \log (n - i + 1)$

$E [X] \approx \sum_{i = 1}^{n - 1} 2 \log (n - i + 1) \approx 2 \int_{1}^{n - 1} \log (n - x + 1) \, dx = 2 \int_{2}^{n} \log (y) \, dy \approx 2 n \log (n)$

In big-$O$ notation, this means that quick-sort makes $O (n \log n)$ comparisons in expectation, as expected.
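To see how good the approximation is, the double sum for $E [X]$ can be evaluated directly and compared with $2 n \log (n)$ (natural logarithm). This is a small numerical check under the assumptions of this note; the function name `expected_comparisons` and the sample values of $n$ are chosen for illustration.

```python
import math


def expected_comparisons(n):
    """Evaluate the double sum of 2 / (j - i + 1) over 1 <= i < j <= n."""
    return sum(
        2 / (j - i + 1)
        for i in range(1, n)
        for j in range(i + 1, n + 1)
    )


for n in (10, 100, 1000):
    exact = expected_comparisons(n)
    approx = 2 * n * math.log(n)
    # Print n, the exact sum, the approximation, and their ratio.
    print(n, round(exact, 1), round(approx, 1), round(exact / approx, 3))
```

The ratio creeps toward $1$ only slowly. The exact sum works out to $2 (n + 1) H_{n} - 4 n$, where $H_{n}$ is the $n$-th harmonic number, so it differs from $2 n \log (n)$ by a term that is linear in $n$. That term does not affect the $O (n \log n)$ conclusion.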

Binyamin's Notes
