Input description: A set of subsets of the universal set .
Problem description: What is the smallest subset T of S such that ?
Discussion: Set cover arises when you try to efficiently acquire or represent items that have been packaged in a fixed set of lots. You want to obtain all the items, while buying as few lots as possible. Finding a cover is easy, because you can always buy one of each lot. However, by finding a small set cover you can do the same job for less money.
An interesting application of set cover is Boolean logic minimization. We are given a particular Boolean function of k variables, which for each of the possible input vectors describes whether the desired output is 0 or 1. We seek the simplest circuit that exactly implements this function. One approach is to find a disjunctive normal form (DNF) formula on the variables and their complements, such as . We could build one ``and'' term for each input vector and then ``or'' them all together, but we might save considerably by factoring out common subsets of variables. Given a set of feasible ``and'' terms, each of which covers a subset of the vectors we need, we seek to ``or'' together the smallest number of terms that realize the function. This is exactly the set cover problem.
There are several variations of set cover problems to be aware of:
It is instructive to model vertex cover as an instance of set cover. Let the universal set U be the set of edges . Construct n subsets, with consisting of the edges incident on vertex . Although vertex cover is just a set cover problem in disguise, you should take advantage of the fact that better algorithms exist for vertex cover.
Figure: Hitting set is dual to set cover
Hitting set is, in fact, dual to set cover, meaning it is exactly the same problem in disguise. Replace each element of U by a set of the names of the subsets that contain it. Now S and U have exchanged roles, for we seek a set of subsets from U to cover all the elements of S. This is is exactly the set cover problem. Thus we can use any of the set cover codes below to solve hitting set after performing this simple translation.
Since the vertex cover problem is NP-complete, the set cover problem must be at least as hard. In fact, it is somewhat harder. Approximation algorithms do no worse than twice optimal for vertex cover, but only a times optimal approximation algorithm exists for set cover.
The greedy heuristic is the right approach for set cover. Begin by placing the largest subset in the set cover, and then mark all its elements as covered. We will repeatedly add the subset containing the largest number of uncovered elements, until all elements are covered. This heuristic always gives a set cover using at most times as many sets as optimal, and in practice it usually does a lot better.
The simplest implementation of the greedy heuristic sweeps through the entire input instance of m subsets for each greedy step. However, by using such data structures as linked lists and a bounded-height priority queue (see Section ), the greedy heuristic can be implemented in O(S) time, where is the size of the input representation.
It pays to check whether there are certain elements that exist in only a few subsets, ideally only one. If so, we should select the biggest subsets containing these elements at the very beginning. We will have to take them eventually, and they carry with them extra elements that we might have to pay to cover by waiting until later.
Simulated annealing is likely to produce somewhat better set covers than these simple heuristics, if that is important for your application. Backtracking can be used to guarantee you an optimal solution, but typically it is not worth the computational expense.
Implementations: Pascal implementations of an exhaustive search algorithm for set packing, as well as heuristics for set cover, appear in [SDK83]. See Section .
Notes: Survey articles on set cover include [BP76]. See [CK75] for a computational study of set cover algorithms. An excellent exposition on algorithms and reduction rules for set cover is presented in [SDK83].
Good expositions of the greedy heuristic for set cover include [CLR90]. An example demonstrating that the greedy heuristic for set cover can be as bad as is presented in [Joh74, PS82]. This is not a defect of the heuristic. Indeed, it is provably hard to approximate set cover to within an approximation factor better than [LY93].
Related Problems: Matching (see page ), vertex cover (see page ), set packing (see page ).