The Theory of NP-Completeness
Several times this semester we have encountered problems for which we couldn't find efficient algorithms, such as the traveling salesman problem. We also couldn't prove an exponential time lower bound for the problem.
By the early 1970s, literally hundreds of problems were stuck in this limbo. The theory of NP-Compleness, developed by Stephen Cook and Richard Karp, provided the tools to show that all of these problems were really the same problem.
Polynomial vs. Exponential Time
n f(n) = n
f(n) = n! 10 0.01 s 0.1 s
1 s 3.63 ms
20 0.02 s 0.4 s
1 ms 77.1 years
30 0.03 s 0.9 s
1 sec years
40 0.04 s 1.6 s
18.3 min
50 0.05 s 2.5 s
13 days 100 0.1 s 10 s
years
1,000 1.00 s 1 ms
The Main Idea
Suppose I gave you the following algorithm to solve the bandersnatch problem:
Bandersnatch(G)
Convert G to an instance of the Bo-billy problem Y.
Call the subroutine Bo-billy on Y to solve this instance.
Return the answer of Bo-billy(Y) as the answer to G.
Such a translation from instances of one type of problem to instances of another type such that answers are preserved is called a reduction.
Now suppose my reduction translates G to Y in O(P(n)):
The second argument is the idea we use to prove problems hard!
Convex Hull and Sorting
A nice example of a reduction goes from sorting numbers to the convex hull problem:
Since this parabola is convex, every point is on the convex hull. Further since neighboring points on the convex hull have neighboring x values, the convex hull returns the points sorted by x-coordinate, ie. the original numbers.
Sort(S)
For each , create point .
Call subroutine convex-hull on this point set.
From the leftmost point in the hull,
read off the points from left to right.
Creating and reading off the points takes O(n) time.
What does this mean? Recall the sorting lower bound of . If we could do convex hull in better than , we could sort faster than - which violates our lower bound.
Thus convex hull must take as well!!!
Observe that any convex hull algorithm also gives us a complicated but correct sorting algorithm as well.
What is a problem?
A problem is a general question, with parameters for the input and conditions on what is a satisfactory answer or solution.
An instance is a problem with the input parameters specified.
Example: The Traveling Salesman
Problem: Given a weighted graph G, what tour minimizes .
Instance: , , , , ,
A problem with answers restricted to yes and no is called a decision problem. Most interesting optimization problems can be phrased as decision problems which capture the essence of the computation.
Example: The Traveling Salesman Decision Problem.
Given a weighted graph G and integer k, does there exist a traveling salesman tour with cost k?
Using binary search and the decision version of the problem we can find the optimal TSP solution.
For convenience, from now on we will talk only about decision problems.
Note that there are many possible ways to encode the input graph: adjacency matrices, edge lists, etc. All reasonable encodings will be within polynomial size of each other.
The fact that we can ignore minor differences in encoding is important. We are concerned with the difference between algorithms which are polynomial and exponential in the size of the input.
Satisfiability
Consider the following logic problem:
Instance: A set V of variables and a set of clauses C over V.
Question: Does there exist a satisfying truth assignment for C?
Example 1: and
A clause is satisfied when at least one literal in it is TRUE. C is satisfied when TRUE.
Example 2: ,
Although you try, and you try, and you try and you try, you can get no satisfaction.
There is no satisfying assigment since must be FALSE (third clause), so must be FALSE (second clause), but then the first clause is unsatisfiable!
For various reasons, it is known that satisfiability is a hard problem. Every top-notch algorithm expert in the world (and countless other, lesser lights) have tried to come up with a fast algorithm to test whether a given set of clauses is satisfiable, but all have failed.
Further, many strange and impossible-to-believe things have been shown to be true if someone in fact did find a fast satisfiability algorithm.
Clearly, Satisfiability is in NP, since we can guess an assignment of TRUE, FALSE to the literals and check it in polynomial time.
P versus NP
The precise distinction between whether a problem is in P or NP is somewhat technical, requiring formal language theory and Turing machines to state correctly.
However, intuitively a problem is in P, (ie. polynomial) if it can be solved in time polynomial in the size of the input.
A problem is in NP if, given the answer, it is possible to verify that the answer is correct within time polynomial in the size of the input.
Example P - Is there a path from to t in G of length less than k.
Example NP - Is there a TSP tour in G of length less than k. Given the tour, it is easy to add up the costs and convince me it is correct.
Example not NP - How many TSP tours are there in G of length less than k. Since there can be an exponential number of them, we cannot count them all in polynomial time.
Don't let this issue confuse you - the important idea here is of reductions as a way of proving hardness.
3-Satisfiability
Instance: A collection of clause C where each clause contains exactly 3 literals, boolean variable v.
Question: Is there a truth assignment to v so that each clause is satisfied?
Note that this is a more restricted problem than SAT. If 3-SAT is NP-complete, it implies SAT is NP-complete but not visa-versa, perhaps long clauses are what makes SAT difficult?!
After all, 1-Sat is trivial!
Theorem: 3-SAT is NP-Complete
Proof: 3-SAT is NP - given an assignment, just check that each clause is covered. To prove it is complete, a reduction from must be provided. We will transform each clause independantly based on its length.
Suppose the clause contains k literals.
, , , .
Note that the only way all four of these can be satisfied is if z is TRUE.
If none of the original variables in a clause are TRUE, there is no way to satisfy all of them using the additional variable:
But if any literal is TRUE, we have n-3 free variables and n-3 remaining 3-clauses, so we can satisfy each of them.
Since any SAT solution will also satisfy the 3-SAT instance and any 3-SAT solution sets variables giving a SAT solution - the problems are equivallent. If there were n clauses and m total literals in the SAT instance, this transform takes O(m) time, so SAT and 3-SAT.
Note that a slight modification to this construction would prove 4-SAT, or 5-SAT,... also NP-complete. However, it breaks down when we try to use it for 2-SAT, since there is no way to stuff anything into the chain of clauses. It turns out that resolution gives a polynomial time algorithm for 2-SAT.
Having at least 3-literals per clause is what makes the problem difficult. Now that we have shown 3-SAT is NP-complete, we may use it for further reductions. Since the set of 3-SAT instances is smaller and more regular than the SAT instances, it will be easier to use 3-SAT for future reductions. Remember the direction to reduction!