next up previous contents index CD CD Algorithms
Next: Median and Selection Up: Combinatorial Problems Previous: Sorting

Searching

  

  figure11392

Input description: A set S of n keys, a query key q.

Problem description: Where is q in S?

Discussion: Searching means different things to different people. Searching for the global maximum or minimum of a function is the problem of unconstrained optimization and is discussed in Section gif. Chess playing programs search for the best move to make next by using alpha-beta minimax search, which is an exhaustive search of the possible moves using a variation of backtracking (see Section gif).      

Here we consider the simple task of searching for a key in a list or in an array, which is a fundamental problem associated with information retrieval.     Dictionary data structures maintain efficient access to sets of keys under insertion and deletion and are discussed in Section gif. Typical dictionaries include binary trees and hash tables.

We treat searching here as a problem distinct from dictionaries because simpler and more efficient solutions emerge when our primary interest is static searching.   These little data structures can yield large performance improvements when properly employed in an innermost loop.     Also, several of the ideas from list and array searching, such as binary search and self-organization, apply to other problems and justify our attention.   

There are two basic approaches to array searching: sequential search and binary search. Both are simple, yet have interesting and subtle variations. In sequential search, we simply start from the front of our list or array of keys and compare each successive item against the key until we find a match or reach the end. In binary search, we start with a sorted array of keys. To search for key q, we compare q to the middle key tex2html_wrap_inline27905 . If q is before tex2html_wrap_inline27907 , it must reside in the top half of our set; if not, it must reside in the bottom half of our set. By repeating this process on the correct half, we find the key in a total of tex2html_wrap_inline27909 comparisons. This is a big win over the n/2 comparisons we expect with sequential search. See Section gif for more on binary search.

Sequential search is the simplest algorithm, and likely to be fastest on up to 10-20 elements. For such tiny problems, forget about binary search. Beyond 50-100 elements, there is no question that binary search will be more efficient than sequential search, even factoring in the cost of the sorting (assuming multiple queries).   Other issues do come into play, however, particularly in identifying the proper variant of the algorithm:

Implementations: The basic sequential and binary search algorithms are simple enough to implement that you should likely do them yourself. Still, the routines described below may be useful as models.

Gonnet and Baeza-Yates provides code fragments in C and Pascal for sequential, binary, and interpolation search, as well as for related dictionary structures. LEDA (see Section gif) provides a sorted array data type in C++ that supports binary search. Many textbooks include implementations of binary search, including [MS91]. See Section gif for details.        

Notes: Mehlhorn and Tsakalidis [MT90b] give a through survey of the state-of-the-art in modern data structures. Knuth [Knu73a] provides a detailed analysis and exposition on all fundamental search algorithms and dictionary data structures but omits such modern data structures as red-black and splay trees. Gonnet and Baeza-Yates [GBY91] provide detailed references and experimental results for a wide variety of search algorithms.

Manber [Man89] provides an interesting discussion of variations of binary search, including one-sided binary search and searching for an index in A where tex2html_wrap_inline27925 .

  In linear interpolation search on an array of sorted numbers, the next position probed is given by

displaymath27903

where q is the query numerical key and S the sorted numerical array. If the keys are drawn independently from a uniform distribution, the expected search time is tex2html_wrap_inline27927 [YY76]. Expositions on interpolation search include [Raw92].

Nonuniform access patterns can be exploited in binary search trees by structuring them so that popular keys are located near the root, thus minimizing search time.   Dynamic programming can be used to construct such optimal search trees in tex2html_wrap_inline27929 time [Knu73b]. Expositions include [AHU74].     Splay trees are self-organizing tree structures, as discussed in Section gif.

Related Problems: Dictionaries (see page gif), sorting (see page gif).    


next up previous contents index CD CD Algorithms
Next: Median and Selection Up: Combinatorial Problems Previous: Sorting

Algorithms
Mon Jun 2 23:33:50 EDT 1997