A Connection Between Binary Search Trees and Quicksort

Consider the running time of inserting any $n$ elements into an initially empty binary search tree.

In the best case, the process results in a balanced tree similar to the one shown below. In this case, note there are $n$ nodes to be created, each requiring at most $\left\lfloor\log_2 n\right\rfloor$ comparisons to insert, since the number of comparisons required is bound by the depth of the tree. Thus, the number of comparisons needed to insert all $n$ elements is $O(n \ln n)$.

In the worst case, the insertions are made "in order", resulting in a very unbalanced tree like the one shown below. Here there is one comparison to needed to place the $E$, two comparisons needed to place the $N$, three needed for $T$, etc., ending with six comparisons needed to place the $Z$. In general, we have $$1 + 2 + 3 + \cdots + (n-1) = n(n-1)/2$$ comparisons required to insert $n$ elements "in order", which of course is $O(n^2)$.

Fear not, however! We can protect ourselves against this worst case by randomly shuffling the order of insertion so that it is highly unlikely for them to be inserted "in order". By doing this, we don't have a guarantee that the tree is perfectly balanced -- but typically it is at least well-balanced (i.e., balanced enough that searching is still $O(\log_2 n)$).

So in the best case, we have a process that is $O(n\ln n)$. In the worst, the same process is $O(n^2)$, but we can protect against this worst case by randomly shuffling the order of elements. Sound familiar?

Recall that quicksort has the same behavior! What's even more amazing -- this agreement is not by chance -- there is a strong connection between these two processes.

If instead of inserting $n$ elements in some given order into an initially empty binary search tree, we put these $n$ elements into an array in a related order and apply the quicksort algorithm -- the exact same comparisons get made! The only difference is in the order in which they are made.

Let us consider a concrete example. Suppose you wish to insert the letters $A$ to $F$ into a binary search tree.

After shuffling the order of insertions to help protect us from an unbalanced tree, we end up inserting our letters into an initially empty binary search tree in the following order: $$\begin{array}{|c|c|c|c|c|c|c|}\hline C & A & G & E & D & B & F\\\hline \end{array}$$

This, of course, results in the following tree:

Now, let us turn our attention to sorting the same initial elements with quicksort.

Again, to minimize the probability of the worst-case occurring, we randomly shuffle the order of the elements. Consider one special ordering that could occur as a result:

$$\begin{array}{|c|c|c|c|c|c|c|}\hline C & B & A & G & F & D & E\\\hline \end{array}$$ Now consider the state of the array (shown below) after each pivot placement in the quicksort algorithm applied to this special initial ordering. Before each pivot placement, the pivots are compared with some subset of the letters in the array. The letters compared with the various pivots are shown in different colors. Red letters were compared with $C$; blue letters with $A$; pink with $G$; and green with $E$.

Note how the red letters below are precisely the ones beneath node $C$ in the tree above, and hence are the letters compared with $C$ during insertion. Similarly, the only blue letter below (i.e., B) was the only letter beneath and compared with $A$ during insertion. Likewise, the pink letters were the ones beneath and compared to G, and finally the two green letters were the two letters beneath and compared with $E$ in the tree.

C  B  A  G  F  D  E   initial order with pivot C 
*

A  B  C  G  F  D  E   placed C, now using pivot A  
*

A  B  C  G  F  D  E   placed A, now using pivot G
         *

A  B  C  E  F  D  G   placed G, now using pivot E
         *

A  B  C  D  E  F  G   after placing pivot E 

Consequently, the number of comparisons needed by any letter to be inserted into the tree corresponds exactly with the number of times that letter ends up colored in the array states during the quicksort algorithm shown above.

To see this, recall the different order in which the letters were inserted into the tree (i.e., $C,A,G,E,D,B,F$).

$C$ was the first to be inserted, and thus is positioned at the root without the need to be compared with anything. Notice $C$ is also never colored in arrays shown.

$A$ is next inserted, being only compared with $C$ and ending up on its left -- and $A$ is only colored once in the quicksort trace (red, for $C$).

$G$ is inserted after that, again compared only with $C$ before being made $C$'s right child -- and $G$ is only ever colored once as well.

$E$ is inserted into the tree next, being compared with both $C$ and $G$, before ending up as the left child of $G$ -- while $E$ shows up in the array colored twice (red and pink).

This one-to-one correspondence continues for the rest of the letters inserted into the tree. As such, the number of comparisons made when the quicksort algorithm is applied to $C,B,A,G,F,D,E$ is exactly the same number of comparisons made when $C,A,G,E,D,B,F$ are inserted in that order into an empty binary search tree.

Any order of elements inserted into a binary search tree can similarly be paired with a (likely different) order of the same elements that when sorted with quicksort achieves this duality in terms of comparisons made for both processes.

Granted, proving this is a bit more involved than one might initially think. As evidence of this, note that multiple insertion orders can create the same tree (consider changing the order in which the leaves $B,D,$ and $F$ are added above, or more generally the order in which elements at the same level are added.) Likewise, does the order of letters that never serve as pivots in an application of quicksort affect the number of comparisons made?

Still, it is not too difficult given an insertion order for a binary search tree to come up with an array order that could serve as its "quicksort dual". Can you see how? (Hint: think recursively)


This assumes, of course, that we have control over the insertion order. In constructing a general binary search tree class to be used by others, however, the insertion order will be driven by the client -- not the author of the binary search tree class.