Clique

Next: Independent Set Up: Graph Problems: Hard Problems Previous: Graph Problems: Hard Problems

Clique

Input description: A graph G=(V,E).

Problem description: What is the largest such that for all , ?

Discussion: When I went to high school, everybody complained about the ``clique'', a group of friends who all hung around together and seemed to dominate everything social. Consider a graph whose vertices represent a set of people, with edges between any pair of people who are friends. Thus the clique in high school was in fact a clique in this friendship graph.

Identifying ``clusters'' of related objects often reduces to finding large cliques in graphs. One example is in a program recently developed by the Internal Revenue Service (IRS) to detect organized tax fraud, where groups of phony tax returns are submitted in the hopes of getting undeserved refunds. The IRS constructs graphs with vertices corresponding to submitted tax forms and with edges between any two tax-forms that appear suspiciously similar. A large clique in this graph points to fraud.

Since any edge in a graph represents a clique of two vertices, the challenge lies not in finding a clique, but in finding a large one. And it is indeed a challenge, for finding a maximum clique is NP-complete. To make matters worse, not only is no good approximation algorithm known, it is provably hard to approximate even to within a factor of . Theoretically, clique is about as hard as a problem in this book can get. So what can we hope to do about it?

Will a maximal clique suffice? - A maximal clique is a subset of vertices, each pair of which defines an edge, that cannot be enlarged by adding any additional vertex. This doesn't mean that it has to be large relative to the largest possible clique, but it might be. To find a nice maximal clique, sort the vertices from highest degree to lowest degree, put the first vertex in the clique, and then test each of the other vertices in order to see whether it is adjacent to all the clique vertices thus far. If so, add it; if not, continue down the list. In O(n m) time you will have a maximal, and hopefully large, clique. An alternative approach would be to incorporate some randomness into your vertex ordering and accept the largest maximal clique you find after a certain number of trials.
What if I am looking for a large, dense subgraph instead of a perfect clique? - Insisting on perfect cliques to define clusters in a graph can be risky, since the loss of a single edge due to error will eliminate that vertex from consideration. Instead, we should seek large dense subgraphs, i.e. subsets of vertices that contain a large number of edges between them. Cliques are, by definition, the densest subgraphs possible.
A simple linear-time algorithm can be used to find the largest set of vertices whose induced (defined) subgraph has minimum vertex degree , beginning by repeatedly deleting all the vertices whose degree is less than k. This may reduce the degree of other vertices below k if they were adjacent to any deleted low-degree vertices. By repeating this process until all remaining vertices have degree , we eventually construct the largest dense subgraph. This algorithm can be implemented in O(n+m) time by using adjacency lists and the constant-width priority queue of Section . By continuing to delete the lowest-degree vertices, we will eventually end up with a clique, which may or may not be large depending upon the graph.
What if the graph is planar? - Planar graphs cannot have cliques of size larger than four, or else they cease to be planar. Since any edge defines a clique of size two, the only interesting cases are cliques of 3 and 4 vertices. Efficient algorithms to find such small cliques consider the vertices from lowest to highest degree. Any planar graph must contain a vertex of degree at most 5 (see Section ), so there is only a constant-sized neighborhood to check exhaustively for a clique. Once we finish with this vertex, we delete it to leave a smaller planar graph with at least one low-degree vertex. Repeat until the graph is empty.

If you really need to find the largest clique in a graph, an exhaustive search via backtracking provides the only real solution. We search through all k-subsets of the vertices, pruning a subset as soon as it contains a vertex that is not connected to all the rest. We never need consider a subset of size larger than the highest vertex degree in the graph, since the maximum degree gives an upper bound on the size of the largest clique in the graph. Similarly, as soon as we discover a clique of size k, no vertex of degree can help find a larger clique. To speed our search, we should delete all such useless vertices from G.

Heuristics for finding large cliques based on randomized techniques such as simulated annealing are likely to work reasonably well.

Implementations: Programs for the closely related problems of finding cliques and independent sets were sought for the Second DIMACS Implementation Challenge, held in October 1993. Programs and data from the challenge are available by anonymous ftp from dimacs.rutgers.edu. Source codes are available under pub/challenge/graph and test data under pub/djs.

In particular, two C language programs by David S. Johnson and David L. Applegate are available. The dfmax.c implements a simple-minded branch-and-bound algorithm similar to that of [CP90]. The dmclique.c uses a ``semi-exhaustive greedy'' scheme for finding large independent sets described in [JAMS91]. Performance data for both programs is available in files results.dfmax and results.dmclique in directories /pub/challenge/graph/benchmarks/clique and /pub/challenge/graph/benchmarks/volume.

Combinatorica [Ski90] provides (slow) Mathematica implementations of cliques, independent sets, and vertex covers. See Section .

Notes: Good expositions of the proof that clique is NP-complete [Kar72] include [CLR90, GJ79, Man89]. It is also given in Section . This reduction established that clique, vertex cover, and independent set are very closely related problems, so heuristics and programs that solve one of them may also produce reasonable solutions for the other two.

The linear-time algorithm for constructing maximal induced subgraphs is discussed in [Man89]. That clique cannot be approximated to within a factor of is shown in [BGS95].

Related Problems: Independent set (see page ), vertex cover (see page ).

Next: Independent Set Up: Graph Problems: Hard Problems Previous: Graph Problems: Hard Problems

Algorithms
Mon Jun 2 23:33:50 EDT 1997