Input description: An integer n.
Problem description: Generate (1) all or (2) a random or (3) the next subset of the integers .
Discussion: A subset describes a selection of objects, where the order among them does not matter. Many of the algorithmic problems in this catalog seek the best subset of a group of things: vertex cover seeks the smallest subset of vertices to touch each edge in a graph; knapsack seeks the most profitable subset of items of bounded total size; and set packing seeks the smallest subset of subsets that together cover each item exactly once.
There are distinct subsets of an n-element set, including the empty set as well as the set itself. This grows exponentially, but at a considerably smaller rate than the n! permutations of n items. Indeed, since 1,048,576, a brute-force search through all subsets of 20 elements is easily manageable, although by n=30, 1,073,741,824, so you will certainly be pushing things.
By definition, the relative order among the elements does not distinguish different subsets. Thus is the same as . However, it is a very good idea to maintain your subsets in a sorted or canonical order, in order to speed up such operations as testing whether two subsets are identical or making them look right when printed.
As with permutations (see Section ), the key to subset generation problems is establishing a numerical sequence among all subsets. There are three primary alternatives:
Subset generation in Gray code order can be very fast, because there is a nice recursive construction to sequence subsets. Further, since only one element changes between subsets, exhaustive search algorithms built on Gray codes can be quite efficient. A set cover program would only have to update the change in coverage by the addition or deletion of one subset. See the implementation section below for Gray code subset generation programs.
This binary representation is the key to solving all subset generation problems. To generate all subsets in order, simply count from 0 to . For each integer, successively mask off each of the bits and compose a subset of exactly the items corresponding to `1' bits. To generate the next or previous subset, increment or decrement the integer by one. Unranking a subset is exactly the masking procedure, while ranking constructs a binary number with 1's corresponding to items in S and then converts this binary number to an integer.
To generate a random subset, you could generate a random integer from 0 to and unrank, although you are probably asking for trouble because any flakiness with how your random number generator rounds things off means that certain subsets can never occur. Therefore, a better approach is simply to flip a coin n times, with the ith flip deciding whether to include element i in the subset. A coin flip can be robustly simulated by generating a random real or large integer and testing whether it is bigger or smaller than half the range. A Boolean array of n items can thus be used to represent subsets as a sort of premasked integer. The only complication is that you must explicitly handle the carry if you seek to generate all subsets.
Generation problems for two closely related problems arise often in practice:
The best way to construct all k-subsets is in lexicographic order. The ranking function is based on the observation that there are k-subsets whose smallest element is f. Using this, it is possible to determine the smallest element in the mth k-subset of n items. We then proceed recursively for subsequent elements of the subset. See the implementations below for details.
Implementations: Nijenhuis and Wilf [NW78] provide efficient Fortran implementations of algorithms to construct random subsets and to sequence subsets in Gray code and lexicographic order. They also provide routines to construct random k-subsets and sequence them in lexicographic order. See Section for details on ftp-ing these programs. Algorithm 515 [BL77] of the Collected Algorithms of the ACM is another Fortran implementation of lexicographic k-subsets, available from Netlib (see Section ).
An exciting WWW site developed by Frank Ruskey of the University of Victoria contains a wealth of material on generating combinatorial objects of different types, including permutations, subsets, partitions, and certain graphs. Specifically, it provides an interactive interface that lets you specify which type of objects you would like to construct and then returns the objects to you. Check this out at http://sue.csc.uvic.ca/ cos/.
Combinatorica [Ski90] provides Mathematica implementations of algorithms to construct random subsets and to sequence subsets in Gray code, binary, and lexicographic order. They also provide routines to construct random k-subsets and strings, and sequence them lexicographically. See Section for further information on Combinatorica.
Notes: The primary expositions on subset generation include [NW78, RND77, Rus97]. Wilf [Wil89] provides an update of [NW78], including a thorough discussion of modern Gray code generation problems.
Gray codes were first developed [Gra53] to transmit digital information in a robust manner over an analog channel. By assigning the code words in Gray code order, the ith word differs only slightly from the (i+1)st, so minor fluctuations in analog signal strength corrupts only a few bits. Gray codes have a particularly nice correspondence to Hamiltonian cycles on the hypercube. See any of the references above for details. An exposition on the more general problem of constructing Gray codes for k items (instead of subsets) appears in [Man89].
The popular puzzle Spinout, manufactured by Binary Arts Corporation, can be solved using Gray codes.
Related Problems: Generating permutations (see page ), generating partitions (see page ).