next up previous contents index CD CD Algorithms
Next: Priority Queues Up: Data Structures Previous: Data Structures

Dictionaries

  

  figure6819

Input description: A set of n records, each identified by one or more key fields.

Problem description: Build and maintain a data structure to efficiently locate, insert, or delete the record associated with any query key q.

Discussion: The abstract data type ``dictionary'' is one of the most important structures in computer science. Dozens of different data structures have been proposed for implementing dictionaries including hash tables, skip lists, and balanced/unbalanced binary search trees - so choosing the right one can be tricky. Depending on the application, it is also a decision that can significantly impact performance. In practice, it is more important to avoid using a bad data structure than to identify the single best option available.  

An essential piece of advice is to carefully isolate the implementation of the dictionary data structure from its interface. Use explicit calls to subroutines that initialize, search, and modify the data structure, rather than embedding them within the code. This leads to a much cleaner program, but it also makes it easy to try different dictionary implementations to see how they impact performance. Do not obsess about the cost of the procedure call overhead inherent in such an abstraction. If your application is so time-critical that such overhead can impact performance, then it is even more essential that you be able to easily experiment with different implementations of your dictionary.    

In choosing the right data structure for your dictionary, ask yourself the following questions:

Once you understand what your needs are, try to identify the best data structure from the list below:

Implementations: LEDA (see Section gif) provides an extremely complete collection of dictionary data structures in C++, including hashing, perfect hashing, B-trees, red-black trees, random search trees, and skip lists. Given all of these choices, their default dictionary implementation is a randomized search tree [AS89], presumably reflecting which structure they expect to be most efficient in practice.      

XTango (see Section gif) is an algorithm animation system for UNIX and X-windows that includes animations of such dictionary data structures as AVL trees, binary search trees, hashing, red-black trees, and treaps (randomized search trees). Many of these are interesting and quite informative to watch. Further, the C source code for each animation is included.      

The 1996 DIMACS implementation challenge focused on elementary data structures like dictionaries. The world's best available implementations were likely to be identified during the course of the challenge, and they are accessible from http://dimacs.rutgers.edu/ .    

Bare bones implementations in C and Pascal of a dizzying variety of dictionary data structures appear in [GBY91], among them several variations on hashing and binary search trees, and optimal binary search tree construction. See Section gif for details.  

Implementation-oriented treatments of a variety of dictionary data structures appear in [BR95], including hashing, splay trees, red-black trees, and what looks like a thorough implementation of B-trees. Code in C for these data structures is included in the text and is available on disk for a modest fee.

Notes: Mehlhorn and Tsakalidis [MT90b] give a thorough survey of the state of the art in modern data structures. Knuth [Knu73a] provides a detailed analysis and exposition on fundamental dictionary data structures but misses such modern data structures as red-black and splay trees. Gonnet and Baeza-Yates [GBY91] provide implementations (in C and Pascal), detailed references, and experimental results for a wide variety of dictionary data structures. We defer to these sources to avoid giving original references for each of the data structures described above.

Good expositions on red-black trees [GS78] include [BR95, CLR90, Woo93]. Good expositions on splay trees [ST85] include [Tar83, Woo93]. Good expositions on B-trees [BM72] include [BR95, CLR90]. Good expositions on hashing includes [Meh84, Woo93].

Several modern data structures, such as splay trees, have been studied via amortized analysis, where we bound the total amount of time used by any sequence of operations. In an amortized analysis, we show that if a single operation is very expensive, this is because we have already benefited from enough cheap operations before it to pay off the higher cost. A data structure realizing an amortized complexity of O(f(n)) is less desirable than one whose worst-case complexity is O(f(n)) (since a very bad operation might still occur) but better than one with an average-case complexity O(f(n)), since the amortized bound will achieve this average on any input.    

Newer dictionary data structures that explicitly incorporate randomization into the construction include randomized search trees [AS89] and skip lists [Pug90].  

Related Problems: Sorting (see page gif), Searching (see page gif).    


next up previous contents index CD CD Algorithms
Next: Priority Queues Up: Data Structures Previous: Data Structures

Algorithms
Mon Jun 2 23:33:50 EDT 1997