Symbol Tables / Dictionaries

A symbol table (also called a dictionary) is an abstract data type that associates different keys (a.k.a. symbols) with various values. We use the word "value" here in a very general sense -- it often the case that the value in question is a reference to an object that contains a considerable amount of information.

As a simple example, consider an actual physical dictionary -- the kind you find on your bookshelf. Every word is associated with some definition, as the table below suggests:

word (key)	definition (value)
aardvark	a large, nocturnal, burrowing ant-eating mammal
boisterous	rough and noisy; clamorous
cringe	to shrink, bend or crouch, especially in fear
⋮	⋮

The primary operations a symbol table needs to accomplish are:

the insertion of a new key-value pair (which includes changing the value of an existing key to something new); and
being able to search the symbol table for a given key to return its associated value.

To understand why, think again about our example of a physical dictionary. Someone had to write it -- all of these word-definition pairs had to be initially "inserted" into the book. Further, think about how dictionaries are used -- one can quickly search for some unrecognized word (the key) to find its definition (the value), but they are generally not used in the other direction (i.e., where one knows a definition, and seeks a word defined that way).

Some example applications of symbol tables in various contexts, along with their respective keys sought and associated values include:

application	purpose of search	key	value
dictionary	find definition	word	definition
book index	find relevant pages	term	list of page numbers
file share	find song to download	name of song	computer ID
account management	process transactions	account number	transaction details
web search	find relevant web pages	keyword	list of page names
compiler	find type and value	variable name	type and value
routing table	route internet packets	destination	best route
DNS	find IP address	domain name	IP address
reverseDNS	find domain name	IP address	domain name
genomics	find markers	DNA string	known positions
file system	find file on disk	filename	location on disk

In some of these other contexts it may also be useful to be able to determine the number of pairs the symbol table holds or if it is empty; to determine if there is a value associated with a given key; to delete a key (and its associated value) from the symbol table; and to be able to iterate through all of the keys in the table.

With these other uses in mind, we can use the following as a reasonable SymbolTable interface:

public interface SymbolTable<Key, Value> {
    
   void put(Key key, Value value);  // inserts a key-value pair into the table,
                                    // or updates the value associated with a key if it 
                                    // is already present
    
   boolean contains(Key key);       // returns true if the table contains the specified key
    
   Value get(Key key);              // returns the value paired with given key
    
   void delete(Key key);            // removes the given key (and its associated value)
                                    // from the table
    
   boolean isEmpty();               // returns true if the table is empty
    
   int size();                      // returns the number of key-value pairs in the table
    
   Iterable<Key> keys();            // returns an Iterable collection of all keys in the table
}

While the values involved can be any generic type, having the keys be of a type that implements the Comparable interface will increase our options when it comes to how the symbol table can be implemented.

Additionally, it is a best practice that the keys be of an immutable type (e.g., Integer, Double, String, etc.)

Some Easy, but Inefficient Implementations

Unordered Linked List

One could imagine implementing an elementary symbol table using an unordered linked list, where each node stored three things: a key, a value, and a reference to the next node. However, due to its unordered nature, one would need to traverse the entire list to search for a given key -- a process that requires $\sim n$ comparisons in the worst case, and $\sim n/2$ comparisons on average.

Similarly, since inserting new key-value pairs includes updates of values associated with existing keys, in any insertion we'll have to search for the key first. If we find it, we'll update the corresponding value; if we don't, we can add the key-value pair to the front of the list. This makes the number of comparisons for insertions of brand new keys $\sim n$.
Ordered Array

If we switch to storing the necessary keys and values using ordered arrays, we can dramatically decrease the comparison cost of searching by employing a binary search for the key sought. This makes both the average and worst-case number of comparisons $\sim \ln n$.

Unfortunately, insertions still cause a problem. Consider the case where the new key-value pair needs to be inserted at the beginning of the array. So that the order of the array is maintained, this means that all $n$ elements of the existing array must be shifted to the right by one position. Consequently, the cost of insertion (in terms of exchanges needed) is still $\sim n$ in the worst case (and $n/2$ on average).

There is, however, another and better way -- one involving a new data structure called a binary search tree...

Symbol Tables / Dictionaries

Some Easy, but Inefficient Implementations

Unordered Linked List

Ordered Array