| ![]() |
A symbol table (also called a dictionary) is an abstract data type that associates different keys (a.k.a. symbols) with various values. We use the word "value" here in a very general sense -- it often the case that the value in question is a reference to an object that contains a considerable amount of information.
As a simple example, consider an actual physical dictionary -- the kind you find on your bookshelf. Every word is associated with some definition, as the table below suggests:
| word (key) | definition (value) |
|---|---|
| aardvark | a large, nocturnal, burrowing ant-eating mammal |
| boisterous | rough and noisy; clamorous |
| cringe | to shrink, bend or crouch, especially in fear |
The primary operations a symbol table needs to accomplish are:
To understand why, think again about our example of a physical dictionary. Someone had to write it -- all of these word-definition pairs had to be initially "inserted" into the book. Further, think about how dictionaries are used -- one can quickly search for some unrecognized word (the key) to find its definition (the value), but they are generally not used in the other direction (i.e., where one knows a definition, and seeks a word defined that way).
Some example applications of symbol tables in various contexts, along with their respective keys sought and associated values include:
| application | purpose of search | key | value |
|---|---|---|---|
| dictionary | find definition | word | definition |
| book index | find relevant pages | term | list of page numbers |
| file share | find song to download | name of song | computer ID |
| account management | process transactions | account number | transaction details |
| web search | find relevant web pages | keyword | list of page names |
| compiler | find type and value | variable name | type and value |
| routing table | route internet packets | destination | best route |
| DNS | find IP address | domain name | IP address |
| reverseDNS | find domain name | IP address | domain name |
| genomics | find markers | DNA string | known positions |
| file system | find file on disk | filename | location on disk |
In some of these other contexts it may also be useful to be able to determine the number of pairs the symbol table holds or if it is empty; to determine if there is a value associated with a given key; to delete a key (and its associated value) from the symbol table; and to be able to iterate through all of the keys in the table.
With these other uses in mind, we can use the following as a reasonable SymbolTable interface:
public interface SymbolTable<Key, Value> {
void put(Key key, Value value); // inserts a key-value pair into the table,
// or updates the value associated with a key if it
// is already present
boolean contains(Key key); // returns true if the table contains the specified key
Value get(Key key); // returns the value paired with given key
void delete(Key key); // removes the given key (and its associated value)
// from the table
boolean isEmpty(); // returns true if the table is empty
int size(); // returns the number of key-value pairs in the table
Iterable<Key> keys(); // returns an Iterable collection of all keys in the table
}
While the values involved can be any generic type, having the keys be of a type that implements the Comparable interface will increase our options when it comes to how the symbol table can be implemented.
Additionally, it is a best practice that the keys be of an immutable type (e.g., Integer, Double, String, etc.)
One could imagine implementing an elementary symbol table using an unordered linked list, where each node stored three things: a key, a value, and a reference to the next node. However, due to its unordered nature, one would need to traverse the entire list to search for a given key -- a process that requires $\sim n$ comparisons in the worst case, and $\sim n/2$ comparisons on average.
Similarly, since inserting new key-value pairs includes updates of values associated with existing keys, in any insertion we'll have to search for the key first. If we find it, we'll update the corresponding value; if we don't, we can add the key-value pair to the front of the list. This makes the number of comparisons for insertions of brand new keys $\sim n$.
If we switch to storing the necessary keys and values using ordered arrays, we can dramatically decrease the comparison cost of searching by employing a binary search for the key sought. This makes both the average and worst-case number of comparisons $\sim \ln n$.
Unfortunately, insertions still cause a problem. Consider the case where the new key-value pair needs to be inserted at the beginning of the array. So that the order of the array is maintained, this means that all $n$ elements of the existing array must be shifted to the right by one position. Consequently, the cost of insertion (in terms of exchanges needed) is still $\sim n$ in the worst case (and $n/2$ on average).
There is, however, another and better way -- one involving a new data structure called a binary search tree...