Symbol Tables / Dictionaries

A symbol table (also called a dictionary) is an abstract data type that associates different keys (a.k.a. symbols) with various values. We use the word "value" here in a very general sense -- it often the case that the value in question is a reference to an object that contains a considerable amount of information.

As a simple example, consider an actual physical dictionary -- the kind you find on your bookshelf. Every word is associated with some definition, as the table below suggests:

word (key) definition (value)
aardvarka large, nocturnal, burrowing ant-eating mammal
boisterousrough and noisy; clamorous
cringeto shrink, bend or crouch, especially in fear

The primary operations a symbol table needs to accomplish are:

1. the insertion of a new key-value pair (which includes changing the value of an existing key to something new); and
2. being able to search the symbol table for a given key to return its associated value.

To understand why, think again about our example of a physical dictionary. Someone had to write it -- all of these word-definition pairs had to be initially "inserted" into the book. Further, think about how a dictionaries are used -- one can quickly search for some unrecognized word (the key) to find its definition (the value), but they are generally not used in the other direction (i.e., to find a word with a given definition).

Some example applications of symbol tables in various contexts, along with their respective keys sought and associated values include:

application purpose of search key value
dictionary find definition word definition
book index find relevant pages term list of page numbers
account management process transactions account number transaction details
web search find relevant web pages keyword list of page names
compiler find type and value variable name type and value
routing table route internet packets destination best route
reverseDNS find domain name IP address domain name
genomics find markers DNA string known positions
file system find file on disk filename location on disk

In some of these other contexts it may also be useful to be able to determine the number of pairs the symbol table holds or if it is empty; to determine if there is a value associated with a given key; to delete a key (and its associated value) from the symbol table; and to be able to iterate through all of the keys in the table.

With these other uses in mind, we can use the following as a reasonable SymbolTable interface:

public interface SymbolTable<Key, Value> {

void put(Key key, Value value);  // inserts a key-value pair into the table,
// or updates the value associated with a key if it

boolean contains(Key key);       // returns true if the table contains the specified key

Value get(Key key);              // returns the value paired with given key

void delete(Key key);            // removes the given key (and its associated value)
// from the table

boolean isEmpty();               // returns true if the table is empty

int size();                      // returns the number of key-value pairs in the table

Iterable<Key> keys();            // returns an Iterable collection of all keys in the table
}


While the values involved can be any generic type, having the keys be of a type that implements the Comparable interface will increase our options when it comes to how the symbol table can be implemented.

Additionally, it is a best practice that the keys be of an immutable type (e.g., Integer, Double, String, etc.)

Some Easy, but Inefficient Implementations

One could imagine implementing an elementary symbol table using an unordered linked list, where each node stored three things: a key, a value, and a reference to the next node. However, due to its unordered nature, one would need to traverse the entire list to search for a given key -- a process that requires $\sim n$ comparisons in the worst case, and $\sim n/2$ comparisons on average.
Similarly, since inserting new key-value pairs includes updates of values associated with existing keys, in any insertion we'll have to search for the key first. If we find it, we'll update the corresponding value; if we don't, we can add the key-value pair to the front of the list. This makes the number of comparisons for insertions of brand new keys $\sim n$.
If we switch to storing the necessary keys and values using ordered arrays, we can dramatically decrease the comparison cost of searching by employing a binary search for the key sought. This makes both the average and worst-case number of comparisons $\sim \ln n$.
Unfortunately, insertions still cause a problem. Consider the case where the new key-value pair needs to be inserted at the beginning of the array. So that the order of the array is maintained, this means that all $n$ elements of the existing array must be shifted to the right by one position. Consequently, the cost of insertion (in terms of exchanges needed) is still $\sim n$ in the worst case (and $n/2$ on average).