Searching and Sorting - First Thoughts

Often in programming, we must work with large arrays of data. As a simple example, imagine working with student records at a large university. A small piece of that array might look similar to the table shown below. Notice, each row (like the one shown in blue) corresponds to a single item -- a student, in this case.

Suppose we frequently must search for the student information associated with a particular last name. When we search an array for an item containing a particular value (like the name "Andrews"), the value we seek is referred to as the key.

Chen	A	991-878-4944	308 Blair
Rohde	A	231-343-5555	343 Forbes
Gazsi	B	766-093-9873	101 Brown
Furia	A	766-093-9873	101 Brown
Kanaga	B	898-122-9643	22 Brown
Andrews	A	664-480-0023	097 Little
Battle	C	874-088-1212	121 Whitman

Not surprisingly, we can search for a given student by name much more quickly if the student records are stored in order by student name. As such, before conducting searches of this type, we should sort the array so that students appear in ascending order by student name, as shown below:

Andrews	A	664-480-0023	097 Little
Battle	C	874-088-1212	121 Whitman
Chen	A	991-878-4944	308 Blair
Furia	A	766-093-9873	101 Brown
Gazsi	B	766-093-9873	101 Brown
Kanaga	B	898-122-9643	22 Brown
Rohde	A	231-343-5555	343 Forbes

As it turns out, there are many ways to sort arrays, and some methods are more efficient than others. We, of course, wish to analyze these sorting algorithms so that we know which one is the best to use in a particular situation. To this end, let us "standardize" our analysis in the following ways:

Let us assume that we always wish to sort items so that their keys are in ascending order.
Let us consider the cost for each algorithm investigated in terms of the number of comparisons and exchanges needed to completely sort $N$ items.
Let us also note whether the sorting requires extra memory (e.g., a copy of the array to be sorted), or if the sorting can be accomplished "in place" solely through exchanging pairs of elements in the array repeatedly.

Also -- so that we don't have to rewrite the sorting methods for every type of data we might wish to sort -- we assume that the data we wish to sort implements the Comparable interface.

This interface requires the presence of a compareTo() method so that one can campare the "size" of some $v$ and $w$ with v.compareTo(w). Specifically, if $v \lt w$, the method should return a negative value; if $v = w$, the method should return $0$; and if $v \gt w$, the method should return a positive value.

As an example, suppose we define a Rectangle class and wish to be able to compare rectangles based on their area. Such a class could implement the Comparable interface in the following way:

public class Rectangle implements Comparable<Rectangle> {  // <-- notice the type parameter! 
    
    int width;
    int height;
    
    public Rectangle(int width, int height) {
        this.width = width;
        this.height = height;
    }

    public int compareTo(Rectangle o) {
        int diffArea = this.width * this.height - o.width * o.height;
        return diffArea;
    }
    
    public static void main(String[] args) {
        Rectangle r1 = new Rectangle(3,5);
        Rectangle r2 = new Rectangle(4,4);

        // Below, we invoke the compareTo() method.
        // Since r2 > r1, we know that r2.compareTo(r1) will be positive.
        // Had r2 < r1, then r2.compareTo(r1) would have been negative. 
        // Had r2 == r1, then r2.compareTo(r1) would have been zero. 
        System.out.println(r2.compareTo(r1)); 
    }
}

Importantly, when we implement the Comparable interface we want the associated compareTo() method to define what is called a total order -- that is to say, a binary relation $\le$ that satisfies the following three properties:

Antisymmetry: if both $v \le w$ and $w \le v$, then $v = w$
Transitivity: if both $v \le w$ and $w \le x$, then $v \le x$
Totality (also called Connexity): either $v \le w$ or $w \le v$ or both

These properties are all satisfied for the example shown above involving rectangular areas and for many of the orderings with which we are familiar (e.g., the orderings of the natural and real numbers, chronological orderings of dates or times, lexiographic ordering for strings of text), but they are certainly not guaranteed.

Consider the game of rock, paper, scissors, whose rules are shown below. Notice that rock loses to paper ($rock \le paper$), paper loses to scissors ($paper \le scissors$), but rock does not lose to scissors ($rock \not \le scissors$). As such, transitivity in this case does not hold, even though antisymmetry and totality do.

As another example, consider the relation defined on the natural numbers $\{1,2,3,\ldots\}$ by $a \mid b$, meaning $a$ divides $b$ without remainder. Here we have antisymmetry (if $a \mid b$ and $b \mid a$, then $a = b$) and transitivity (if $a \mid b$ and $b \mid c$, then $a \mid c$), but not totality (note for example $2 \not \mid 3$ and $3 \not \mid 2$).

If we expand this relation to the integers $\{\ldots,-3,-2,-1,0,1,2,3,\ldots\}$, we see that antisymmetry fails as well. For instance, $2 \mid 6$ and $-2 \mid 6$, but $2 \neq -2$.

Finally, we should ensure that if one tries to compare some $v$ and $w$ using v.compareTo(w) where $v$ and $w$ are incompatible types (or if either one is null) that an exception is thrown.

Provided a class properly implements the Comparable interface, we can use it to define one of two "helper" methods for sorting. These methods (named less() and exch()) allow us to quickly identify every time we compare two items or exchange two items during a sorting process. Consequently, they make the running-time cost functions for each sorting algorithm (or bounds on these) much easier to find.