14: NP-Completeness

Complexity Classes

The theory of NP-completeness provides a framework for understanding why certain computational problems appear to resist efficient solution. We begin with the key complexity classes.

P is the class of decision problems solvable by a deterministic Turing machine in polynomial time $O(n^k)$ for some constant $k$. Problems in P are considered efficiently solvable: sorting, shortest paths, maximum matching, linear programming, and primality testing all belong to P.

NP (nondeterministic polynomial time) is the class of decision problems for which a “yes” answer can be verified in polynomial time given a suitable certificate. Formally, a language $L \in \text{NP}$ if there exists a polynomial-time verifier $V$ and a polynomial $p$ such that:

$$x \in L \iff \exists \text{ certificate } c \text{ with } |c| \leq p(|x|) \text{ such that } V(x, c) = 1$$

Every problem in P is also in NP (the verifier can simply solve the problem and ignore the certificate). The central open question in theoretical computer science is whether $\text{P} = \text{NP}$ or $\text{P} \neq \text{NP}$.
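The verifier definition is concrete: membership in NP just requires a fast checker. As an illustrative sketch (the clause/literal encoding is my own choice, DIMACS-style), here is a polynomial-time verifier for SAT, where the certificate is a truth assignment:

```python
# A formula is a list of clauses; each clause is a list of nonzero ints,
# where literal i means x_i and -i means NOT x_i.

def verify_sat(formula, assignment):
    """Check in time linear in the formula size whether `assignment`
    (a dict var -> bool, the certificate) satisfies every clause."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )

# (x1 OR NOT x2) AND (x2 OR x3)
phi = [[1, -2], [2, 3]]
print(verify_sat(phi, {1: True, 2: False, 3: True}))   # True
print(verify_sat(phi, {1: False, 2: False, 3: False})) # False
```

Finding a satisfying assignment may require exponential search, but checking a proposed one is easy; that asymmetry is exactly what NP captures.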

coNP is the class of problems whose complement is in NP — equivalently, problems where “no” answers have short certificates. It is widely believed that $\text{NP} \neq \text{coNP}$; since P is closed under complement, $\text{NP} \neq \text{coNP}$ would imply $\text{P} \neq \text{NP}$.

Polynomial-Time Reductions

A polynomial-time reduction from problem $A$ to problem $B$, written $A \leq_P B$, is a polynomial-time computable function $f$ such that $x \in A \iff f(x) \in B$. A key property is transitivity: if $A \leq_P B$ and $B \leq_P C$, then $A \leq_P C$.

Reductions establish relative difficulty: if $A \leq_P B$ and $B \in \text{P}$, then $A \in \text{P}$. Contrapositively, if $A \notin \text{P}$ and $A \leq_P B$, then $B \notin \text{P}$. This is the mechanism by which hardness propagates.
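Both transitivity and the "solve $A$ via $B$" mechanism are just function composition. A hedged toy sketch (the three stand-in problems are deliberately trivial; real reductions are between hard problems, but the composition pattern is identical):

```python
# Toy problems:
#   A: "is the list's sum even?"   B: "is the integer even?"
#   C: "does the binary string end in 0?"
# A <=_P B via f, and B <=_P C via g; composing g after f reduces A to C,
# so any solver for C immediately yields a solver for A.

def f(xs):       # reduction A -> B: sum the list (polynomial time)
    return sum(xs)

def g(m):        # reduction B -> C: write m in binary (polynomial time)
    return bin(m)

def solve_C(s):  # assumed solver for C
    return s.endswith("0")

def solve_A(xs): # solver for A obtained purely by composition
    return solve_C(g(f(xs)))

print(solve_A([1, 3, 4]))  # True: 8 is even
print(solve_A([1, 2]))     # False: 3 is odd
```

The composed map `g(f(x))` runs in polynomial time because each stage does, which is why $\leq_P$ is transitive.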

NP-Completeness

A problem $B$ is NP-hard if every problem in NP reduces to it: $\forall A \in \text{NP},\; A \leq_P B$. A problem is NP-complete if it is both in NP and NP-hard. NP-complete problems are the “hardest” problems in NP: if any one of them has a polynomial-time algorithm, then $\text{P} = \text{NP}$.

To prove a new problem $B$ is NP-complete:

  1. Show $B \in \text{NP}$ by exhibiting a polynomial-time verifier.
  2. Show $B$ is NP-hard by reducing a known NP-complete problem $A$ to it: $A \leq_P B$.

Since reductions are transitive, step 2 transfers the NP-hardness of $A$ to $B$. But this bootstrapping requires a first NP-complete problem, established by the Cook-Levin theorem.

The Cook-Levin Theorem

Boolean satisfiability (SAT): given a Boolean formula $\phi$ over variables $x_1, \ldots, x_n$, is there an assignment making $\phi$ true?

Theorem (Cook 1971, Levin 1973). SAT is NP-complete.

Proof sketch. SAT is clearly in NP: given an assignment, evaluate $\phi$ in polynomial time. For NP-hardness, take any $L \in \text{NP}$ with verifier $V$. The computation of $V(x, c)$ on input $x$ with certificate $c$ can be encoded as a Boolean circuit, then converted to a CNF formula $\phi_x$. The formula $\phi_x$ is satisfiable if and only if there exists a certificate $c$ such that $V(x, c) = 1$. The construction is polynomial in the input size. Since $L$ was arbitrary, SAT is NP-hard.
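The circuit-to-CNF step works gate by gate (the Tseitin encoding): each gate gets a fresh variable and a constant number of clauses asserting that variable equals the gate's output. A hedged sketch for a single AND gate, with a brute-force sanity check (the literal encoding matches the SAT verifier example above and is illustrative):

```python
from itertools import product

def and_gate_clauses(a, b, g):
    """CNF clauses (nonzero-int literals) forcing g <-> (a AND b):
    (NOT g OR a), (NOT g OR b), (NOT a OR NOT b OR g)."""
    return [[-g, a], [-g, b], [-a, -b, g]]

def satisfies(clauses, assignment):
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)

# Sanity check over all 8 assignments: the clauses hold exactly when
# the gate variable equals the AND of its inputs.
clauses = and_gate_clauses(1, 2, 3)
ok = all(
    satisfies(clauses, {1: a, 2: b, 3: g}) == (g == (a and b))
    for a, b, g in product([False, True], repeat=3)
)
print(ok)  # True
```

Chaining such gadgets over every gate of the verifier's circuit yields $\phi_x$ with size polynomial in the circuit size.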

Key NP-Complete Problems

The following cascade of reductions establishes NP-completeness for a broad family of problems. Each reduction is polynomial-time and demonstrates the versatility of the framework.

3-SAT. SAT restricted to CNF formulas in which every clause has exactly 3 literals. SAT $\leq_P$ 3-SAT by splitting long clauses using auxiliary variables. 3-SAT is the standard starting point for most NP-completeness proofs because its rigid structure makes reductions easier to construct.
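The clause-splitting step can be sketched directly: a clause $(l_1 \lor \cdots \lor l_k)$ with $k > 3$ becomes a chain $(l_1 \lor l_2 \lor y_1), (\lnot y_1 \lor l_3 \lor y_2), \ldots, (\lnot y_{k-3} \lor l_{k-1} \lor l_k)$ over fresh auxiliary variables. The interface below (nonzero-int literals, explicit auxiliary counter) is an illustrative choice:

```python
def split_clause(clause, next_aux):
    """Split one clause with > 3 literals into an equisatisfiable chain
    of 3-literal clauses, using fresh variables numbered from next_aux.
    Returns (new clauses, next unused variable number)."""
    if len(clause) <= 3:
        return [clause], next_aux
    out = [[clause[0], clause[1], next_aux]]
    for lit in clause[2:-2]:
        out.append([-next_aux, lit, next_aux + 1])
        next_aux += 1
    out.append([-next_aux, clause[-2], clause[-1]])
    return out, next_aux + 1

# A 5-literal clause over x1..x5, auxiliaries starting at variable 6
print(split_clause([1, 2, 3, 4, 5], 6)[0])
# [[1, 2, 6], [-6, 3, 7], [-7, 4, 5]]
```

Each original clause of length $k$ yields $k - 2$ clauses and $k - 3$ auxiliaries, so the blowup is linear.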

Clique. Given a graph $G$ and integer $k$, does $G$ contain a complete subgraph on $k$ vertices? 3-SAT $\leq_P$ Clique: for a formula with $m$ clauses, create a vertex for each literal occurrence and connect two vertices if they lie in different clauses and are not contradictory (neither is the negation of the other). The formula is satisfiable if and only if the resulting graph has a clique of size $m$.
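A sketch of this construction (vertex/edge encodings are illustrative choices: vertices are (clause index, literal) pairs):

```python
from itertools import combinations

def three_sat_to_clique(clauses):
    """Map a 3-CNF formula to a Clique instance: one vertex per literal
    occurrence; edges join compatible literals in different clauses.
    Returns (vertices, edges, target clique size = #clauses)."""
    vertices = [(i, lit) for i, clause in enumerate(clauses)
                for lit in clause]
    edges = {(u, v) for u, v in combinations(vertices, 2)
             if u[0] != v[0] and u[1] != -v[1]}
    return vertices, edges, len(clauses)

phi = [[1, 2, 3], [-1, -2, 3]]
V, E, k = three_sat_to_clique(phi)
print(len(V), k)              # 6 2
print(((0, 3), (1, 3)) in E)  # True: x3 appears in both clauses, compatible
print(((0, 1), (1, -1)) in E) # False: x1 and NOT x1 are contradictory
```

Picking one satisfied literal per clause gives $m$ pairwise-compatible vertices (a clique); conversely an $m$-clique selects one literal per clause with no contradictions, which extends to a satisfying assignment.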

Vertex Cover. Given a graph $G$ and integer $k$, is there a set of $k$ vertices touching every edge? Clique $\leq_P$ Vertex Cover via complementation: $G$ has a clique of size $k$ if and only if $\bar{G}$ has a vertex cover of size $|V| - k$.

Independent Set. Given a graph $G$ and integer $k$, is there a set $S$ of $k$ vertices with no edges between them? $S$ is independent in $G$ if and only if $V \setminus S$ is a vertex cover, so Independent Set and Vertex Cover are equivalent under simple reductions.

Hamiltonian Path/Cycle. Does a graph contain a path (cycle) visiting every vertex exactly once? 3-SAT $\leq_P$ Hamiltonian Path through a gadget construction encoding variable assignments and clause satisfaction.

Subset Sum. Given integers $a_1, \ldots, a_n$ and target $t$, is there a subset summing to $t$? 3-SAT $\leq_P$ Subset Sum by encoding variables and clauses as digits of carefully constructed numbers. Subset Sum is notable as a weakly NP-complete problem: it admits a pseudo-polynomial-time dynamic programming algorithm running in $O(nt)$.
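The $O(nt)$ dynamic program is short enough to sketch in full. It is polynomial in the *value* $t$, but $t$ can be exponential in the number of bits used to write it down, which is why this does not contradict NP-completeness:

```python
def subset_sum(nums, t):
    """Pseudo-polynomial DP: reachable[s] is True iff some subset of
    nums sums to s. O(n * t) time, O(t) space. Iterating s downward
    ensures each number is used at most once."""
    reachable = [True] + [False] * t
    for a in nums:
        for s in range(t, a - 1, -1):
            reachable[s] = reachable[s] or reachable[s - a]
    return reachable[t]

print(subset_sum([3, 34, 4, 12, 5, 2], 9))   # True: 4 + 5
print(subset_sum([3, 34, 4, 12, 5, 2], 30))  # False
```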

0/1 Knapsack. Given items with weights and values, a capacity $W$, and a target value $V$, can we select items with total weight $\leq W$ and total value $\geq V$? This generalizes Subset Sum and is also weakly NP-complete.
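Knapsack's pseudo-polynomial DP mirrors Subset Sum's, now maximizing value per capacity (solving the optimization version answers the decision version by comparing against $V$):

```python
def knapsack(weights, values, W):
    """Pseudo-polynomial DP over capacities: best[w] is the maximum
    value achievable with total weight <= w. O(n * W) time; downward
    iteration over w enforces the 0/1 (use-once) constraint."""
    best = [0] * (W + 1)
    for wt, val in zip(weights, values):
        for w in range(W, wt - 1, -1):
            best[w] = max(best[w], best[w - wt] + val)
    return best[W]

print(knapsack([1, 3, 4, 5], [1, 4, 5, 7], 7))  # 9: items of weight 3 and 4
```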

The Landscape of NP-Completeness

Ladner’s theorem (1975) shows that if $\text{P} \neq \text{NP}$, there exist problems in NP that are neither in P nor NP-complete — so-called NP-intermediate problems. Graph isomorphism and integer factoring are widely conjectured to be NP-intermediate.

The class NP-hard includes problems not necessarily in NP. The halting problem, for instance, is NP-hard (indeed undecidable) but not in NP. Optimization versions of NP-complete decision problems (e.g., finding a minimum vertex cover rather than deciding if one of size $k$ exists) are NP-hard but may not be in NP if the optimal value requires more than polynomial bits to represent.

Implications for Algorithm Design

NP-completeness does not mean a problem is unsolvable — it means we should not expect a polynomial-time exact algorithm. Practical responses include:

  • Approximation algorithms: find near-optimal solutions with provable guarantees.
  • Parameterized algorithms: solve problems in time $f(k) \cdot n^{O(1)}$ where $k$ is a problem parameter (fixed-parameter tractability).
  • Heuristics and metaheuristics: SAT solvers, genetic algorithms, and simulated annealing often work well in practice despite worst-case intractability.
  • Special cases: many NP-complete problems become polynomial on restricted inputs (e.g., 2-SAT is in P, graph coloring on interval graphs is in P).
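The 2-SAT special case shows how restricting structure restores tractability. A hedged sketch of the standard implication-graph algorithm (Aspvall-Plass-Tarjan): each clause (a OR b) contributes edges NOT-a → b and NOT-b → a, and the formula is satisfiable iff no variable shares a strongly connected component with its negation. Kosaraju's SCC algorithm keeps the whole thing linear time; the literal encoding matches the earlier SAT examples and is illustrative:

```python
def two_sat(n, clauses):
    """Polynomial-time 2-SAT: satisfiable iff no variable x_v is in the
    same SCC of the implication graph as NOT x_v. Literals are nonzero
    ints in [-n, n]; clauses are pairs (a, b) meaning (a OR b)."""
    def node(lit):  # map literal to a graph node index in [0, 2n)
        return 2 * (abs(lit) - 1) + (0 if lit > 0 else 1)

    N = 2 * n
    adj = [[] for _ in range(N)]   # implication edges
    radj = [[] for _ in range(N)]  # reversed edges, for Kosaraju pass 2
    for a, b in clauses:
        for u, v in ((-a, b), (-b, a)):  # (a OR b) == (NOT u -> v) pair
            adj[node(u)].append(node(v))
            radj[node(v)].append(node(u))

    # Kosaraju pass 1: iterative DFS recording finish order
    seen, order = [False] * N, []
    for start in range(N):
        if seen[start]:
            continue
        seen[start] = True
        stack = [(start, iter(adj[start]))]
        while stack:
            x, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                order.append(x)
                stack.pop()
            elif not seen[nxt]:
                seen[nxt] = True
                stack.append((nxt, iter(adj[nxt])))

    # Pass 2: label SCCs on the reversed graph in reverse finish order
    comp, c = [-1] * N, 0
    for u in reversed(order):
        if comp[u] == -1:
            stack = [u]
            comp[u] = c
            while stack:
                x = stack.pop()
                for y in radj[x]:
                    if comp[y] == -1:
                        comp[y] = c
                        stack.append(y)
            c += 1
    return all(comp[node(v)] != comp[node(-v)] for v in range(1, n + 1))

print(two_sat(2, [(1, 2), (-1, 2), (-2, 1)]))  # True: x1 = x2 = True works
print(two_sat(1, [(1, 1), (-1, -1)]))          # False: forces x1 and NOT x1
```

Contrast with 3-SAT: with three literals per clause, a falsified pair of literals no longer *forces* the third, so the implication-graph trick breaks down, and the problem jumps to NP-complete.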

Connection to Machine Learning

Feature selection. Selecting the optimal subset of $k$ features from $n$ candidates to minimize prediction error is NP-hard (by reduction from problems such as Subset Sum or Set Cover, depending on the formulation). Practical methods use greedy forward/backward selection or $\ell_1$ regularization (Lasso) as a convex relaxation.
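Greedy forward selection can be sketched against a black-box error oracle. Everything here is hypothetical scaffolding: in practice `error` would be cross-validated model error, and the toy oracle below simply rewards the "true" features {0, 3}:

```python
def forward_select(n_features, k, error):
    """Greedy forward selection: repeatedly add the single feature that
    most reduces error(subset), stopping at k features. O(n * k) calls
    to the oracle -- a heuristic, with no optimality guarantee."""
    selected = []
    for _ in range(k):
        best = min(
            (f for f in range(n_features) if f not in selected),
            key=lambda f: error(selected + [f]),
        )
        selected.append(best)
    return selected

# Hypothetical toy oracle: error drops when features 0 and 3 are included,
# with a tiny penalty per extra feature.
def toy_error(subset):
    return 2.0 - 1.0 * (0 in subset) - 0.8 * (3 in subset) + 0.01 * len(subset)

print(forward_select(5, 2, toy_error))  # [0, 3]
```

Greedy selection can miss optimal subsets when features are only jointly informative, which is precisely the structure the NP-hardness proofs exploit.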

Neural network training. The problem of finding weights for a neural network that minimize training error is NP-hard in general. Blum and Rivest (1988) showed that training even a simple 3-node network is NP-complete. This hardness result underscores why gradient-based methods (which find local, not global, minima) are the practical approach — and why architectural choices, initialization schemes, and optimization landscapes matter so much in deep learning.

Constraint satisfaction. Many structured prediction tasks in ML (e.g., MAP inference in Markov random fields) are NP-hard. The theory of NP-completeness motivates the use of LP relaxations, message passing, and variational methods as tractable approximations.


Next: 15: Approximation Algorithms