A symbol table (also called a dictionary) is an abstract data type that associates different keys (a.k.a. symbols) with various values. We use the word "value" here in a very general sense -- it often the case that the value in question is a reference to an object that contains a considerable amount of information.
As a simple example, consider an actual physical dictionary -- the kind you find on your bookshelf. Every word is associated with some definition, as the table below suggests:
word (key) | definition (value) |
---|---|
aardvark | a large, nocturnal, burrowing ant-eating mammal |
boisterous | rough and noisy; clamorous |
cringe | to shrink, bend or crouch, especially in fear |
The primary operations a symbol table needs to accomplish are:
To understand why, think again about our example of a physical dictionary. Someone had to write it -- all of these word-definition pairs had to be initially "inserted" into the book. Further, think about how dictionaries are used -- one can quickly search for some unrecognized word (the key) to find its definition (the value), but they are generally not used in the other direction (i.e., where one knows a definition, and seeks a word defined that way).
Some example applications of symbol tables in various contexts, along with their respective keys sought and associated values include:
application | purpose of search | key | value |
---|---|---|---|
dictionary | find definition | word | definition |
book index | find relevant pages | term | list of page numbers |
file share | find song to download | name of song | computer ID |
account management | process transactions | account number | transaction details |
web search | find relevant web pages | keyword | list of page names |
compiler | find type and value | variable name | type and value |
routing table | route internet packets | destination | best route |
DNS | find IP address | domain name | IP address |
reverseDNS | find domain name | IP address | domain name |
genomics | find markers | DNA string | known positions |
file system | find file on disk | filename | location on disk |
In some of these other contexts it may also be useful to be able to determine the number of pairs the symbol table holds or if it is empty; to determine if there is a value associated with a given key; to delete a key (and its associated value) from the symbol table; and to be able to iterate through all of the keys in the table.
With these other uses in mind, we can use the following as a reasonable SymbolTable
interface:
public interface SymbolTable<Key, Value> { void put(Key key, Value value); // inserts a key-value pair into the table, // or updates the value associated with a key if it // is already present boolean contains(Key key); // returns true if the table contains the specified key Value get(Key key); // returns the value paired with given key void delete(Key key); // removes the given key (and its associated value) // from the table boolean isEmpty(); // returns true if the table is empty int size(); // returns the number of key-value pairs in the table Iterable<Key> keys(); // returns an Iterable collection of all keys in the table }
While the values involved can be any generic type, having the keys be of a type that implements the Comparable
interface will increase our options when it comes to how the symbol table can be implemented.
Additionally, it is a best practice that the keys be of an immutable type (e.g., Integer
, Double
, String
, etc.)
One could imagine implementing an elementary symbol table using an unordered linked list, where each node stored three things: a key, a value, and a reference to the next node. However, due to its unordered nature, one would need to traverse the entire list to search for a given key -- a process that requires $\sim n$ comparisons in the worst case, and $\sim n/2$ comparisons on average.
Similarly, since inserting new key-value pairs includes updates of values associated with existing keys, in any insertion we'll have to search for the key first. If we find it, we'll update the corresponding value; if we don't, we can add the key-value pair to the front of the list. This makes the number of comparisons for insertions of brand new keys $\sim n$.
If we switch to storing the necessary keys and values using ordered arrays, we can dramatically decrease the comparison cost of searching by employing a binary search for the key sought. This makes both the average and worst-case number of comparisons $\sim \ln n$.
Unfortunately, insertions still cause a problem. Consider the case where the new key-value pair needs to be inserted at the beginning of the array. So that the order of the array is maintained, this means that all $n$ elements of the existing array must be shifted to the right by one position. Consequently, the cost of insertion (in terms of exchanges needed) is still $\sim n$ in the worst case (and $n/2$ on average).
There is, however, another and better way -- one involving a new data structure called a binary search tree...