Randomization

Next: Bucketing Techniques Up: Approaches to Sorting Previous: Divide and Conquer

Randomization

Suppose we select, at random, one of the n items we seek to sort. Call this element p. In quicksort, we separate the n-1 other items into two piles: a low pile containing all the elements that will appear before p in sorted order and a high pile containing all the elements that will appear after p in sorted order. After sorting the two piles and arranging the piles correctly, we have succeeded in sorting the n items.



Quicksort(A, low, high)

		 if (low < high)

				 ploc = Partition(A,low,high)

				 Quicksort(A,low, ploc - 1)

				 Quicksort(A, ploc+1, high)



Partition(A,low,high)

		 swap(A[low],A[random(  )])

		 pivot = A[low]

		 leftwall = low

		 for i = low+1 to high

				  if (A[i] < pivot) then

						  leftwall = leftwall+1

						  swap(A[i],A[leftwall])

		 swap(A[low],A[leftwall])

Mergesort ran in time because we split the keys into two equal halves, sorted them recursively, and then merged the halves in linear time. Thus whenever our pivot element is near the center of the sorted array (i.e. the pivot is close to the median element), we get a good split and realize the same performance as mergesort. Such good pivot elements will often be the case. After all, half the elements lie closer to the middle than one of the ends. On average, Quicksort runs in time. If we are extremely unlucky and our randomly selected elements always are among the largest or smallest element in the array, Quicksort turn into selection sort and runs in . However, the odds against this are vanishingly small.

Randomization is a powerful, general tool to improve algorithms with bad worst-case but good average-case complexity. The worst case examples still exist, but they depend only upon how unlucky we are, not on the order that the input data is presented to us. For randomly chosen pivots, we can say that

``Randomized quicksort runs in time on any input, with high probability.''

If instead, we used some deterministic rule for selecting pivots (like always selecting A[(low+high)/2] as pivot), we can make no claim stronger than

``Quicksort runs in time if you give me random input data, with high probability.''

Randomization can also be used to drive search techniques such as simulated annealing, which are discussed in Section .

Next: Bucketing Techniques Up: Approaches to Sorting Previous: Divide and Conquer

Algorithms
Mon Jun 2 23:33:50 EDT 1997