14: NP-Completeness

Complexity Classes

The theory of NP-completeness provides a framework for understanding why certain computational problems appear to resist efficient solution. We begin with the key complexity classes.

P is the class of decision problems solvable by a deterministic Turing machine in polynomial time $O(n^k)$ for some constant $k$. Problems in P are considered efficiently solvable: sorting, shortest paths, maximum matching, linear programming, and primality testing all belong to P.

NP (nondeterministic polynomial time) is the class of decision problems for which a “yes” answer can be verified in polynomial time given a suitable certificate. Formally, a language $L \in \text{NP}$ if there exists a polynomial-time verifier $V$ and a polynomial $p$ such that:

$$x \in L \iff \exists \text{ certificate } c \text{ with } |c| \leq p(|x|) \text{ such that } V(x, c) = 1$$

Every problem in P is also in NP (the verifier can simply solve the problem and ignore the certificate). The central open question in theoretical computer science is whether $\text{P} = \text{NP}$ or $\text{P} \neq \text{NP}$.
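The verifier definition is concrete: membership in NP just requires a fast checker. As an illustrative sketch (the clause/literal encoding is my own choice, DIMACS-style), here is a polynomial-time verifier for SAT, where the certificate is a truth assignment:

```python
# A formula is a list of clauses; each clause is a list of nonzero ints,
# where literal i means x_i and -i means NOT x_i.

def verify_sat(formula, assignment):
    """Check in time linear in the formula size whether `assignment`
    (a dict var -> bool, the certificate) satisfies every clause."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )

# (x1 OR NOT x2) AND (x2 OR x3)
phi = [[1, -2], [2, 3]]
print(verify_sat(phi, {1: True, 2: False, 3: True}))   # True
print(verify_sat(phi, {1: False, 2: False, 3: False})) # False
```

Finding a satisfying assignment may require exponential search, but checking a proposed one is easy; that asymmetry is exactly what NP captures.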

coNP is the class of problems whose complement is in NP — equivalently, problems where “no” answers have short certificates. It is widely believed that $\text{NP} \neq \text{coNP}$; since P is closed under complement, $\text{NP} \neq \text{coNP}$ would imply $\text{P} \neq \text{NP}$.

Polynomial-Time Reductions

A polynomial-time reduction from problem $A$ to problem $B$, written $A \leq_P B$, is a polynomial-time computable function $f$ such that $x \in A \iff f(x) \in B$. A key property is transitivity: if $A \leq_P B$ and $B \leq_P C$, then $A \leq_P C$.

Reductions establish relative difficulty: if $A \leq_P B$ and $B \in \text{P}$, then $A \in \text{P}$. Contrapositively, if $A \notin \text{P}$ and $A \leq_P B$, then $B \notin \text{P}$. This is the mechanism by which hardness propagates.
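Both transitivity and the "solve $A$ via $B$" mechanism are just function composition. A hedged toy sketch (the three stand-in problems are deliberately trivial; real reductions are between hard problems, but the composition pattern is identical):

```python
# Toy problems:
#   A: "is the list's sum even?"   B: "is the integer even?"
#   C: "does the binary string end in 0?"
# A <=_P B via f, and B <=_P C via g; composing g after f reduces A to C,
# so any solver for C immediately yields a solver for A.

def f(xs):       # reduction A -> B: sum the list (polynomial time)
    return sum(xs)

def g(m):        # reduction B -> C: write m in binary (polynomial time)
    return bin(m)

def solve_C(s):  # assumed solver for C
    return s.endswith("0")

def solve_A(xs): # solver for A obtained purely by composition
    return solve_C(g(f(xs)))

print(solve_A([1, 3, 4]))  # True: 8 is even
print(solve_A([1, 2]))     # False: 3 is odd
```

The composed map `g(f(x))` runs in polynomial time because each stage does, which is why $\leq_P$ is transitive.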

NP-Completeness

A problem $B$ is NP-hard if every problem in NP reduces to it: $\forall A \in \text{NP},\; A \leq_P B$. A problem is NP-complete if it is both in NP and NP-hard. NP-complete problems are the “hardest” problems in NP: if any one of them has a polynomial-time algorithm, then $\text{P} = \text{NP}$.

To prove a new problem $B$ is NP-complete:

  1. Show $B \in \text{NP}$ by exhibiting a polynomial-time verifier.
  2. Show $B$ is NP-hard by reducing a known NP-complete problem $A$ to it: $A \leq_P B$.

Since reductions are transitive, step 2 transfers the NP-hardness of $A$ to $B$. But this bootstrapping requires a first NP-complete problem, established by the Cook-Levin theorem.

The Cook-Levin Theorem

Boolean satisfiability (SAT): given a Boolean formula $\phi$ over variables $x_1, \ldots, x_n$, is there an assignment making $\phi$ true?

Theorem (Cook 1971, Levin 1973). SAT is NP-complete.

Proof sketch. SAT is clearly in NP: given an assignment, evaluate $\phi$ in polynomial time. For NP-hardness, take any $L \in \text{NP}$ with verifier $V$. The computation of $V(x, c)$ on input $x$ with certificate $c$ can be encoded as a Boolean circuit, then converted to a CNF formula $\phi_x$. The formula $\phi_x$ is satisfiable if and only if there exists a certificate $c$ such that $V(x, c) = 1$. The construction is polynomial in the input size. Since $L$ was arbitrary, SAT is NP-hard.
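The circuit-to-CNF step works gate by gate (the Tseitin encoding): each gate gets a fresh variable and a constant number of clauses asserting that variable equals the gate's output. A hedged sketch for a single AND gate, with a brute-force sanity check (the literal encoding matches the SAT verifier example above and is illustrative):

```python
from itertools import product

def and_gate_clauses(a, b, g):
    """CNF clauses (nonzero-int literals) forcing g <-> (a AND b):
    (NOT g OR a), (NOT g OR b), (NOT a OR NOT b OR g)."""
    return [[-g, a], [-g, b], [-a, -b, g]]

def satisfies(clauses, assignment):
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)

# Sanity check over all 8 assignments: the clauses hold exactly when
# the gate variable equals the AND of its inputs.
clauses = and_gate_clauses(1, 2, 3)
ok = all(
    satisfies(clauses, {1: a, 2: b, 3: g}) == (g == (a and b))
    for a, b, g in product([False, True], repeat=3)
)
print(ok)  # True
```

Chaining such gadgets over every gate of the verifier's circuit yields $\phi_x$ with size polynomial in the circuit size.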

Key NP-Complete Problems

The following cascade of reductions establishes NP-completeness for a broad family of problems. Each reduction is polynomial-time and demonstrates the versatility of the framework.

3-SAT. SAT restricted to CNF formulas in which every clause has exactly 3 literals. SAT $\leq_P$ 3-SAT by splitting long clauses using auxiliary variables. 3-SAT is the standard starting point for most NP-completeness proofs because its rigid structure makes reductions easier to construct.
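The clause-splitting step can be sketched directly: a clause $(l_1 \lor \cdots \lor l_k)$ with $k > 3$ becomes a chain $(l_1 \lor l_2 \lor y_1), (\lnot y_1 \lor l_3 \lor y_2), \ldots, (\lnot y_{k-3} \lor l_{k-1} \lor l_k)$ over fresh auxiliary variables. The interface below (nonzero-int literals, explicit auxiliary counter) is an illustrative choice:

```python
def split_clause(clause, next_aux):
    """Split one clause with > 3 literals into an equisatisfiable chain
    of 3-literal clauses, using fresh variables numbered from next_aux.
    Returns (new clauses, next unused variable number)."""
    if len(clause) <= 3:
        return [clause], next_aux
    out = [[clause[0], clause[1], next_aux]]
    for lit in clause[2:-2]:
        out.append([-next_aux, lit, next_aux + 1])
        next_aux += 1
    out.append([-next_aux, clause[-2], clause[-1]])
    return out, next_aux + 1

# A 5-literal clause over x1..x5, auxiliaries starting at variable 6
print(split_clause([1, 2, 3, 4, 5], 6)[0])
# [[1, 2, 6], [-6, 3, 7], [-7, 4, 5]]
```

Each original clause of length $k$ yields $k - 2$ clauses and $k - 3$ auxiliaries, so the blowup is linear.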

Clique. Given a graph $G$ and integer $k$, does $G$ contain a complete subgraph on $k$ vertices? 3-SAT $\leq_P$ Clique: for a formula with $m$ clauses, create a vertex for each literal occurrence and connect two vertices if they lie in different clauses and are not contradictory (neither is the negation of the other). The formula is satisfiable if and only if the resulting graph has a clique of size $m$.
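A sketch of this construction (vertex/edge encodings are illustrative choices: vertices are (clause index, literal) pairs):

```python
from itertools import combinations

def three_sat_to_clique(clauses):
    """Map a 3-CNF formula to a Clique instance: one vertex per literal
    occurrence; edges join compatible literals in different clauses.
    Returns (vertices, edges, target clique size = #clauses)."""
    vertices = [(i, lit) for i, clause in enumerate(clauses)
                for lit in clause]
    edges = {(u, v) for u, v in combinations(vertices, 2)
             if u[0] != v[0] and u[1] != -v[1]}
    return vertices, edges, len(clauses)

phi = [[1, 2, 3], [-1, -2, 3]]
V, E, k = three_sat_to_clique(phi)
print(len(V), k)              # 6 2
print(((0, 3), (1, 3)) in E)  # True: x3 appears in both clauses, compatible
print(((0, 1), (1, -1)) in E) # False: x1 and NOT x1 are contradictory
```

Picking one satisfied literal per clause gives $m$ pairwise-compatible vertices (a clique); conversely an $m$-clique selects one literal per clause with no contradictions, which extends to a satisfying assignment.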

Vertex Cover. Given a graph $G$ and integer $k$, is there a set of $k$ vertices touching every edge? Clique $\leq_P$ Vertex Cover via complementation: $G$ has a clique of size $k$ if and only if $\bar{G}$ has a vertex cover of size $|V| - k$.

Independent Set. Given a graph $G$ and integer $k$, is there a set $S$ of $k$ vertices with no edges between them? $S$ is independent in $G$ if and only if $V \setminus S$ is a vertex cover, so Independent Set and Vertex Cover are equivalent under simple reductions.

Hamiltonian Path/Cycle. Does a graph contain a path (cycle) visiting every vertex exactly once? 3-SAT $\leq_P$ Hamiltonian Path through a gadget construction encoding variable assignments and clause satisfaction.

Subset Sum. Given integers $a_1, \ldots, a_n$ and target $t$, is there a subset summing to $t$? 3-SAT $\leq_P$ Subset Sum by encoding variables and clauses as digits of carefully constructed numbers. Subset Sum is notable as a weakly NP-complete problem: it admits a pseudo-polynomial-time dynamic programming algorithm running in $O(nt)$.
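The $O(nt)$ dynamic program is short enough to sketch in full. It is polynomial in the *value* $t$, but $t$ can be exponential in the number of bits used to write it down, which is why this does not contradict NP-completeness:

```python
def subset_sum(nums, t):
    """Pseudo-polynomial DP: reachable[s] is True iff some subset of
    nums sums to s. O(n * t) time, O(t) space. Iterating s downward
    ensures each number is used at most once."""
    reachable = [True] + [False] * t
    for a in nums:
        for s in range(t, a - 1, -1):
            reachable[s] = reachable[s] or reachable[s - a]
    return reachable[t]

print(subset_sum([3, 34, 4, 12, 5, 2], 9))   # True: 4 + 5
print(subset_sum([3, 34, 4, 12, 5, 2], 30))  # False
```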

0/1 Knapsack. Given items with weights and values, a capacity $W$, and a target value $V$, can we select items with total weight $\leq W$ and total value $\geq V$? This generalizes Subset Sum and is also weakly NP-complete.
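Knapsack's pseudo-polynomial DP mirrors Subset Sum's, now maximizing value per capacity (solving the optimization version answers the decision version by comparing against $V$):

```python
def knapsack(weights, values, W):
    """Pseudo-polynomial DP over capacities: best[w] is the maximum
    value achievable with total weight <= w. O(n * W) time; downward
    iteration over w enforces the 0/1 (use-once) constraint."""
    best = [0] * (W + 1)
    for wt, val in zip(weights, values):
        for w in range(W, wt - 1, -1):
            best[w] = max(best[w], best[w - wt] + val)
    return best[W]

print(knapsack([1, 3, 4, 5], [1, 4, 5, 7], 7))  # 9: items of weight 3 and 4
```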

The Landscape of NP-Completeness

Ladner’s theorem (1975) shows that if $\text{P} \neq \text{NP}$, there exist problems in NP that are neither in P nor NP-complete — so-called NP-intermediate problems. Graph isomorphism and integer factoring are widely conjectured to be NP-intermediate.

The class NP-hard includes problems not necessarily in NP. The halting problem, for instance, is NP-hard (indeed undecidable) but not in NP. Optimization versions of NP-complete decision problems (e.g., finding a minimum vertex cover rather than deciding if one of size $k$ exists) are NP-hard but may not be in NP if the optimal value requires more than polynomial bits to represent.

Implications for Algorithm Design

NP-completeness does not mean a problem is unsolvable — it means we should not expect a polynomial-time exact algorithm. Practical responses include:

  • Approximation algorithms: find near-optimal solutions with provable guarantees.
  • Parameterized algorithms: solve problems in time $f(k) \cdot n^{O(1)}$ where $k$ is a problem parameter (fixed-parameter tractability).
  • Heuristics and metaheuristics: SAT solvers, genetic algorithms, and simulated annealing often work well in practice despite worst-case intractability.
  • Special cases: many NP-complete problems become polynomial on restricted inputs (e.g., 2-SAT is in P, graph coloring on interval graphs is in P).
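The 2-SAT special case shows how restricting structure restores tractability. A hedged sketch of the standard implication-graph algorithm (Aspvall-Plass-Tarjan): each clause (a OR b) contributes edges NOT-a → b and NOT-b → a, and the formula is satisfiable iff no variable shares a strongly connected component with its negation. Kosaraju's SCC algorithm keeps the whole thing linear time; the literal encoding matches the earlier SAT examples and is illustrative:

```python
def two_sat(n, clauses):
    """Polynomial-time 2-SAT: satisfiable iff no variable x_v is in the
    same SCC of the implication graph as NOT x_v. Literals are nonzero
    ints in [-n, n]; clauses are pairs (a, b) meaning (a OR b)."""
    def node(lit):  # map literal to a graph node index in [0, 2n)
        return 2 * (abs(lit) - 1) + (0 if lit > 0 else 1)

    N = 2 * n
    adj = [[] for _ in range(N)]   # implication edges
    radj = [[] for _ in range(N)]  # reversed edges, for Kosaraju pass 2
    for a, b in clauses:
        for u, v in ((-a, b), (-b, a)):  # (a OR b) == (NOT u -> v) pair
            adj[node(u)].append(node(v))
            radj[node(v)].append(node(u))

    # Kosaraju pass 1: iterative DFS recording finish order
    seen, order = [False] * N, []
    for start in range(N):
        if seen[start]:
            continue
        seen[start] = True
        stack = [(start, iter(adj[start]))]
        while stack:
            x, it = stack[-1]
            nxt = next(it, None)
            if nxt is None:
                order.append(x)
                stack.pop()
            elif not seen[nxt]:
                seen[nxt] = True
                stack.append((nxt, iter(adj[nxt])))

    # Pass 2: label SCCs on the reversed graph in reverse finish order
    comp, c = [-1] * N, 0
    for u in reversed(order):
        if comp[u] == -1:
            stack = [u]
            comp[u] = c
            while stack:
                x = stack.pop()
                for y in radj[x]:
                    if comp[y] == -1:
                        comp[y] = c
                        stack.append(y)
            c += 1
    return all(comp[node(v)] != comp[node(-v)] for v in range(1, n + 1))

print(two_sat(2, [(1, 2), (-1, 2), (-2, 1)]))  # True: x1 = x2 = True works
print(two_sat(1, [(1, 1), (-1, -1)]))          # False: forces x1 and NOT x1
```

Contrast with 3-SAT: with three literals per clause, a falsified pair of literals no longer *forces* the third, so the implication-graph trick breaks down, and the problem jumps to NP-complete.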

Connection to Machine Learning

Feature selection. Selecting the optimal subset of $k$ features from $n$ candidates to minimize prediction error is NP-hard (by reduction from problems such as Subset Sum or Set Cover, depending on the formulation). Practical methods use greedy forward/backward selection or $\ell_1$ regularization (Lasso) as a convex relaxation.
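Greedy forward selection can be sketched against a black-box error oracle. Everything here is hypothetical scaffolding: in practice `error` would be cross-validated model error, and the toy oracle below simply rewards the "true" features {0, 3}:

```python
def forward_select(n_features, k, error):
    """Greedy forward selection: repeatedly add the single feature that
    most reduces error(subset), stopping at k features. O(n * k) calls
    to the oracle -- a heuristic, with no optimality guarantee."""
    selected = []
    for _ in range(k):
        best = min(
            (f for f in range(n_features) if f not in selected),
            key=lambda f: error(selected + [f]),
        )
        selected.append(best)
    return selected

# Hypothetical toy oracle: error drops when features 0 and 3 are included,
# with a tiny penalty per extra feature.
def toy_error(subset):
    return 2.0 - 1.0 * (0 in subset) - 0.8 * (3 in subset) + 0.01 * len(subset)

print(forward_select(5, 2, toy_error))  # [0, 3]
```

Greedy selection can miss optimal subsets when features are only jointly informative, which is precisely the structure the NP-hardness proofs exploit.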

Neural network training. The problem of finding weights for a neural network that minimize training error is NP-hard in general. Blum and Rivest (1988) showed that training even a simple 3-node network is NP-complete. This hardness result underscores why gradient-based methods (which find local, not global, minima) are the practical approach — and why architectural choices, initialization schemes, and optimization landscapes matter so much in deep learning.

Constraint satisfaction. Many structured prediction tasks in ML (e.g., MAP inference in Markov random fields) are NP-hard. The theory of NP-completeness motivates the use of LP relaxations, message passing, and variational methods as tractable approximations.


Next: 15: Approximation Algorithms