14: NP-Completeness
Complexity Classes
The theory of NP-completeness provides a framework for understanding why certain computational problems appear to resist efficient solution. We begin with the key complexity classes.
P is the class of decision problems solvable by a deterministic Turing machine in time O(n^k) for some constant k. Problems in P are considered efficiently solvable: sorting, shortest paths, maximum matching, linear programming, and primality testing all belong to P.
NP (nondeterministic polynomial time) is the class of decision problems for which a “yes” answer can be verified in polynomial time given a suitable certificate. Formally, a language L is in NP if there exists a polynomial-time verifier V and a polynomial p such that x ∈ L if and only if there is a certificate y with |y| ≤ p(|x|) for which V(x, y) accepts.
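To make the verifier definition concrete, here is a minimal sketch of a polynomial-time verifier for Subset Sum (the problem appears later in this chapter); the certificate is a list of indices into the input, and the function name and interface are illustrative:

```python
def verify_subset_sum(nums, target, certificate):
    """Polynomial-time verifier: the certificate is a list of indices."""
    # The certificate must name distinct, valid indices.
    if len(set(certificate)) != len(certificate):
        return False
    if any(i < 0 or i >= len(nums) for i in certificate):
        return False
    # Accept iff the chosen elements sum to the target.
    return sum(nums[i] for i in certificate) == target
```

The check runs in time linear in the input, yet finding a valid certificate appears to require searching exponentially many subsets — exactly the asymmetry NP captures.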
Every problem in P is also in NP (the verifier can simply solve the problem and ignore the certificate). The central open question in theoretical computer science is whether P = NP.
coNP is the class of problems whose complement is in NP; equivalently, problems where “no” answers have short certificates. It is widely believed that NP ≠ coNP, which would imply P ≠ NP.
Polynomial-Time Reductions
A polynomial-time reduction from problem A to problem B, written A ≤p B, is a polynomial-time computable function f such that x ∈ A if and only if f(x) ∈ B. A key property is transitivity: if A ≤p B and B ≤p C, then A ≤p C.
Reductions establish relative difficulty: if A ≤p B and B ∈ P, then A ∈ P. Contrapositively, if A ≤p B and A ∉ P, then B ∉ P. This is the mechanism by which we propagate hardness.
NP-Completeness
A problem B is NP-hard if every problem in NP reduces to it: A ≤p B for every A ∈ NP. A problem is NP-complete if it is both in NP and NP-hard. NP-complete problems are the “hardest” problems in NP: if any one of them has a polynomial-time algorithm, then P = NP.
To prove a new problem B is NP-complete:
- Show B ∈ NP by exhibiting a polynomial-time verifier.
- Show B is NP-hard by reducing a known NP-complete problem C to it: C ≤p B.
Since reductions are transitive, step 2 transfers the NP-hardness of C to B. But this bootstrapping requires a first NP-complete problem, established by the Cook-Levin theorem.
The Cook-Levin Theorem
Boolean satisfiability (SAT): given a Boolean formula φ over variables x_1, …, x_n, is there an assignment making φ true?
Theorem (Cook 1971, Levin 1973). SAT is NP-complete.
Proof sketch. SAT is clearly in NP: given an assignment, evaluate φ in polynomial time. For NP-hardness, take any L ∈ NP with verifier V. The computation of V on input x with certificate y can be encoded as a Boolean circuit, then converted to a CNF formula φ_x. The formula φ_x is satisfiable if and only if there exists a certificate y such that V(x, y) accepts. The construction is polynomial in the input size. Since L was arbitrary, SAT is NP-hard.
Key NP-Complete Problems
The following cascade of reductions establishes NP-completeness for a broad family of problems. Each reduction is polynomial-time and demonstrates the versatility of the framework.
3-SAT. A Boolean formula in CNF where every clause has exactly 3 literals. SAT ≤p 3-SAT by splitting long clauses using auxiliary variables. 3-SAT is the standard starting point for most NP-completeness proofs because its rigid structure makes reductions easier to construct.
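The clause-splitting step can be sketched directly. A clause (l_1 ∨ … ∨ l_k) with k > 3 becomes a chain of k − 2 three-literal clauses linked by fresh variables; the sketch below uses DIMACS-style integer literals (padding of short clauses up to exactly 3 literals is omitted for brevity):

```python
def split_clause(clause, next_var):
    """Split one CNF clause into equisatisfiable 3-literal clauses.

    Literals are nonzero ints (DIMACS style: -v means NOT x_v).
    next_var is the first unused variable index.
    Returns (new_clauses, updated_next_var).
    """
    k = len(clause)
    if k <= 3:
        return [clause], next_var  # short clauses kept as-is
    # (l1 v l2 v z1), (-z1 v l3 v z2), ..., (-z_{k-3} v l_{k-1} v l_k)
    out = [[clause[0], clause[1], next_var]]
    z = next_var
    for lit in clause[2:-2]:
        out.append([-z, lit, z + 1])
        z += 1
    out.append([-z, clause[-2], clause[-1]])
    return out, z + 1
```

Any satisfying assignment of the original clause extends to the chain (set the z variables to propagate the satisfied literal's position), and conversely, so satisfiability is preserved while the output size grows only linearly.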
Clique. Given a graph G and integer k, does G contain a complete subgraph on k vertices? 3-SAT ≤p Clique: for a formula with m clauses, create a vertex for each literal in each clause and connect two vertices if they are in different clauses and are not contradictory (neither is the negation of the other). The formula is satisfiable if and only if the resulting graph has a clique of size m.
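The construction is mechanical enough to sketch in a few lines; the representation (clauses as tuples of DIMACS-style integer literals, vertices as (clause_index, literal) pairs) is an illustrative choice:

```python
from itertools import combinations

def threesat_to_clique(clauses):
    """Build the Clique instance for a 3-CNF formula.

    clauses: list of 3-tuples of nonzero ints (-v means NOT x_v).
    Returns (vertices, edges, k) where k = number of clauses.
    The formula is satisfiable iff the graph has a clique of size k.
    """
    vertices = [(i, lit) for i, clause in enumerate(clauses)
                for lit in clause]
    edges = set()
    for u, v in combinations(vertices, 2):
        (i, a), (j, b) = u, v
        # Connect literals in different clauses that can be true together.
        if i != j and a != -b:
            edges.add((u, v))
    return vertices, edges, len(clauses)
```

A clique of size k must take exactly one vertex per clause, and the chosen literals are mutually consistent, so they extend to a satisfying assignment.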
Vertex Cover. Given a graph G = (V, E) and integer k, is there a set of at most k vertices touching every edge? Clique ≤p Vertex Cover via complementation: G has a clique of size k if and only if the complement of G has a vertex cover of size |V| - k.
Independent Set. A set of vertices with no edges between them. S is independent in G if and only if V \ S is a vertex cover, so Independent Set and Vertex Cover are equivalent under simple reductions.
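The complement relation between independent sets and vertex covers is easy to check exhaustively on a small graph; the sketch below does so for a 4-vertex path:

```python
from itertools import combinations

def is_independent(edges, S):
    """S is independent iff no edge has both endpoints in S."""
    return not any(u in S and v in S for u, v in edges)

def is_vertex_cover(edges, C):
    """C is a vertex cover iff every edge has an endpoint in C."""
    return all(u in C or v in C for u, v in edges)

# Exhaustive check of the complement relation on a small path graph.
V = {1, 2, 3, 4}
E = [(1, 2), (2, 3), (3, 4)]
for r in range(len(V) + 1):
    for S in combinations(V, r):
        assert is_independent(E, set(S)) == is_vertex_cover(E, V - set(S))
```

The two properties are literally complementary: an edge violates independence of S exactly when it is uncovered by V \ S.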
Hamiltonian Path/Cycle. Does a graph contain a path (cycle) visiting every vertex exactly once? 3-SAT ≤p Hamiltonian Path through a gadget construction encoding variable assignments and clause satisfaction.
Subset Sum. Given integers a_1, …, a_n and a target T, is there a subset summing to T? 3-SAT ≤p Subset Sum by encoding variables and clauses as digits of carefully constructed numbers. Subset Sum is notable as a weakly NP-complete problem: it admits a pseudo-polynomial-time dynamic programming algorithm running in O(nT) time.
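The pseudo-polynomial dynamic program is short enough to sketch in full. It is polynomial in the numeric value of T but exponential in T's bit length, which is exactly why it does not contradict NP-hardness:

```python
def subset_sum(nums, target):
    """Pseudo-polynomial DP: reachable[s] is True iff some subset sums to s.

    Runs in O(n * target) time -- polynomial in the *value* of target,
    exponential in its bit length, hence only weak NP-hardness.
    """
    reachable = [False] * (target + 1)
    reachable[0] = True          # the empty subset sums to 0
    for a in nums:
        # Descend so each item is used at most once.
        for s in range(target, a - 1, -1):
            if reachable[s - a]:
                reachable[s] = True
    return reachable[target]
```

Doubling the number of bits in T squares its value, so the table (and the running time) blows up exponentially in the input size.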
0/1 Knapsack. Given n items with weights and values, a capacity W, and a target value V, can we select items with total weight at most W and total value at least V? This generalizes Subset Sum and is also weakly NP-complete.
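The same pseudo-polynomial idea applies: a DP over capacities computes the maximum achievable value, and comparing it with the target V answers the decision question. A minimal sketch:

```python
def knapsack(weights, values, capacity):
    """0/1 knapsack in O(n * capacity) time.

    best[c] = maximum total value achievable with total weight <= c.
    """
    best = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        # Descend so each item is taken at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]
```

As with Subset Sum, the running time is polynomial in the numeric capacity, not in its bit length.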
The Landscape of NP-Completeness
Ladner’s theorem (1975) shows that if P ≠ NP, there exist problems in NP that are neither in P nor NP-complete, so-called NP-intermediate problems. Graph isomorphism and integer factoring are widely conjectured to be NP-intermediate.
The class NP-hard includes problems not necessarily in NP. The halting problem, for instance, is NP-hard (indeed undecidable) but not in NP. Optimization versions of NP-complete decision problems (e.g., finding a minimum vertex cover rather than deciding whether one of size at most k exists) are NP-hard but may not be in NP if the optimal value requires more than polynomial bits to represent.
Implications for Algorithm Design
NP-completeness does not mean a problem is unsolvable — it means we should not expect a polynomial-time exact algorithm. Practical responses include:
- Approximation algorithms: find near-optimal solutions with provable guarantees.
- Parameterized algorithms: solve problems in time f(k) · poly(n), where k is a problem parameter (fixed-parameter tractability).
- Heuristics and metaheuristics: SAT solvers, genetic algorithms, and simulated annealing often work well in practice despite worst-case intractability.
- Special cases: many NP-complete problems become polynomial on restricted inputs (e.g., 2-SAT is in P, graph coloring on interval graphs is in P).
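The parameterized approach can be illustrated with the classic bounded-search-tree algorithm for Vertex Cover, a sketch assuming edges are given as a list of pairs:

```python
def vertex_cover_fpt(edges, k):
    """Decide whether a vertex cover of size <= k exists, in O(2^k * m) time.

    Bounded search tree: pick any uncovered edge (u, v); some endpoint
    must be in the cover, so branch on including u or including v.
    """
    if not edges:
        return True          # everything covered
    if k == 0:
        return False         # edges remain but budget exhausted
    u, v = edges[0]
    for w in (u, v):
        remaining = [(a, b) for (a, b) in edges if a != w and b != w]
        if vertex_cover_fpt(remaining, k - 1):
            return True
    return False
```

The recursion depth is at most k and each node branches twice, so the exponential blowup is confined to the parameter k rather than the graph size n, the hallmark of fixed-parameter tractability.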
Connection to Machine Learning
Feature selection. Selecting the optimal subset of k features from n candidates to minimize prediction error is NP-hard (it reduces from Subset Sum or Set Cover, depending on the formulation). Practical methods use greedy forward/backward selection or regularization (Lasso) as a convex relaxation.
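Greedy forward selection can be sketched generically. Here `error` is a user-supplied evaluation (e.g., cross-validated loss), and the toy scoring function is a purely illustrative stand-in; this heuristic carries no optimality guarantee in general:

```python
def greedy_forward_selection(n_features, error, budget):
    """Greedily add the feature that most reduces error(selected).

    error: callable mapping a list of feature indices to a float.
    Stops early when no single addition improves the error.
    """
    selected = []
    for _ in range(budget):
        remaining = [f for f in range(n_features) if f not in selected]
        if not remaining:
            break
        best = min(remaining, key=lambda f: error(selected + [f]))
        if error(selected + [best]) >= error(selected):
            break  # no improvement: stop
        selected.append(best)
    return selected

# Toy error (assumed for illustration): features 0 and 2 are the truly
# predictive ones; a small penalty discourages extra features.
relevant = {0, 2}
def toy_error(S):
    return len(relevant - set(S)) + 0.01 * len(S)
```

On the toy objective the greedy picks exactly the relevant features and then stops, but on adversarial objectives greedy selection can be arbitrarily far from the optimal subset, which is what the NP-hardness result predicts.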
Neural network training. The problem of finding weights for a neural network that minimize training error is NP-hard in general. Blum and Rivest (1988) showed that training even a simple 3-node network is NP-complete. This hardness result underscores why gradient-based methods (which find local, not global, minima) are the practical approach — and why architectural choices, initialization schemes, and optimization landscapes matter so much in deep learning.
Constraint satisfaction. Many structured prediction tasks in ML (e.g., MAP inference in Markov random fields) are NP-hard. The theory of NP-completeness motivates the use of LP relaxations, message passing, and variational methods as tractable approximations.