13: Linear Programming

Formulation

A linear program (LP) is an optimization problem in which both the objective function and constraints are linear. In standard form, an LP is:

$$\text{maximize } \mathbf{c}^T \mathbf{x} \quad \text{subject to } A\mathbf{x} \leq \mathbf{b}, \quad \mathbf{x} \geq \mathbf{0}$$

where $\mathbf{x} \in \mathbb{R}^n$ is the vector of decision variables, $\mathbf{c} \in \mathbb{R}^n$ is the objective coefficient vector, $A \in \mathbb{R}^{m \times n}$ is the constraint matrix, and $\mathbf{b} \in \mathbb{R}^m$ is the constraint bound vector. Every inequality constraint $\mathbf{a}_i^T \mathbf{x} \leq b_i$ defines a half-space in $\mathbb{R}^n$.

Converting to slack form introduces a slack variable $s_i \geq 0$ for each inequality, yielding $A\mathbf{x} + \mathbf{s} = \mathbf{b}$ with $\mathbf{x}, \mathbf{s} \geq 0$. Minimization problems are converted by negating the objective. Equality constraints $\mathbf{a}^T\mathbf{x} = b$ become two inequalities, and unrestricted variables are split into the difference of two nonnegative variables. Thus standard form is fully general.
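The slack-form conversion is mechanical; a minimal sketch on a hypothetical two-constraint toy system (not from the text):

```python
# Each inequality a_i . x <= b_i gains one slack column, so the system
# becomes A x + s = b with s >= 0.

def to_slack_form(A, b):
    """Append one identity column per row of A."""
    m = len(A)
    A_slack = [row + [1 if i == j else 0 for j in range(m)]
               for i, row in enumerate(A)]
    return A_slack, b

A = [[1, 1], [2, 1]]   # x1 + x2 <= 4,  2x1 + x2 <= 5
b = [4, 5]
A_slack, _ = to_slack_form(A, b)
print(A_slack)  # [[1, 1, 1, 0], [2, 1, 0, 1]]
```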

Geometric Interpretation

The feasible region $\{\mathbf{x} : A\mathbf{x} \leq \mathbf{b}, \mathbf{x} \geq 0\}$ is the intersection of finitely many half-spaces, forming a convex polytope. This polytope may be empty (infeasible), bounded, or unbounded.

The objective function $\mathbf{c}^T\mathbf{x}$ defines a family of parallel hyperplanes. Maximizing the objective geometrically corresponds to translating this hyperplane as far as possible in the direction $\mathbf{c}$ while maintaining contact with the polytope.

Fundamental Theorem of Linear Programming. If an LP has a finite optimal value, then there exists an optimal solution at a vertex (extreme point) of the feasible polytope.

This theorem is what makes LP tractable: instead of searching over a continuous region, we need only examine vertices. A polytope in $\mathbb{R}^n$ defined by $m$ constraints has at most $\binom{m+n}{n}$ vertices, which is finite (though potentially exponential).
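The vertex-checking idea can be run directly on a tiny instance; a brute-force enumeration sketch (toy data of my own) that solves a 2-variable LP exactly with rational arithmetic:

```python
from itertools import combinations
from fractions import Fraction as F

# Maximize 3x + 2y subject to x + y <= 4, x + 3y <= 6, x, y >= 0.
# Nonnegativity is encoded as -x <= 0 and -y <= 0, so every vertex is
# the intersection of 2 of the 4 constraint lines: C(4, 2) candidates.
rows = [([F(1), F(1)], F(4)), ([F(1), F(3)], F(6)),
        ([F(-1), F(0)], F(0)), ([F(0), F(-1)], F(0))]
c = [F(3), F(2)]

def intersect(r1, r2):
    """Solve the 2x2 system where both constraints hold with equality."""
    (a1, a2), b1 = r1
    (d1, d2), b2 = r2
    det = a1 * d2 - a2 * d1
    if det == 0:
        return None                      # parallel lines: no vertex
    return ((b1 * d2 - a2 * b2) / det,   # Cramer's rule
            (a1 * b2 - b1 * d1) / det)

def feasible(p):
    return all(a1 * p[0] + a2 * p[1] <= b for (a1, a2), b in rows)

best = max((c[0] * p[0] + c[1] * p[1], p)
           for r1, r2 in combinations(rows, 2)
           if (p := intersect(r1, r2)) is not None and feasible(p))
print(best)  # (Fraction(12, 1), (Fraction(4, 1), Fraction(0, 1)))
```

The optimum 12 is attained at the vertex $(4, 0)$, as the fundamental theorem guarantees.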

The Simplex Method

The simplex method, introduced by Dantzig (1947), exploits the fundamental theorem by walking along edges of the polytope from vertex to vertex, always improving the objective.

A basic feasible solution (BFS) corresponds to a vertex of the polytope. It is determined by selecting $n$ linearly independent constraints to hold with equality (the basis), solving the resulting system, and verifying nonnegativity. At each step, the simplex method:

  1. Examines the reduced costs of non-basic variables: $\bar{c}_j = c_j - \mathbf{c}_B^T B^{-1} \mathbf{a}_j$, where $B$ is the basis matrix.
  2. If all $\bar{c}_j \leq 0$, the current BFS is optimal (for maximization).
  3. Otherwise, selects an entering variable $x_j$ with $\bar{c}_j > 0$ (e.g., the largest reduced cost).
  4. Performs a pivot: determines which basic variable leaves the basis using the minimum ratio test to maintain feasibility, then updates the basis.

Each pivot moves to an adjacent vertex with a strictly higher objective value (assuming nondegeneracy). Since the number of vertices is finite, the simplex method terminates. Degeneracy (multiple bases corresponding to the same vertex) can cause cycling, which is resolved by anti-cycling rules such as Bland’s rule.
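The pivot loop above can be condensed into a compact tableau implementation. A teaching sketch, not a production solver: it assumes $\mathbf{b} \geq 0$ (so the all-slack basis is feasible and Phase I is unnecessary), uses exact `Fraction` arithmetic, and applies Bland's rule for both the entering and leaving choices:

```python
from fractions import Fraction as F

def simplex(A, b, c):
    """Maximize c.x subject to A x <= b, x >= 0, assuming b >= 0."""
    m, n = len(A), len(c)
    # Tableau rows: [A_i | slack identity | b_i]; last row holds -c,
    # so optimality is reached when no entry of that row is negative.
    T = [[F(v) for v in A[i]] + [F(i == j) for j in range(m)] + [F(b[i])]
         for i in range(m)]
    T.append([-F(v) for v in c] + [F(0)] * (m + 1))
    basis = list(range(n, n + m))            # slacks start in the basis
    while True:
        # Bland's rule: entering column = smallest index with -c_j < 0.
        col = next((j for j in range(n + m) if T[m][j] < 0), None)
        if col is None:
            break                            # optimal
        # Minimum ratio test; ties broken by smallest basis index.
        cand = [(T[i][-1] / T[i][col], basis[i], i)
                for i in range(m) if T[i][col] > 0]
        if not cand:
            raise ValueError("LP is unbounded")
        _, _, row = min(cand)
        piv = T[row][col]
        T[row] = [v / piv for v in T[row]]
        for i in range(m + 1):               # eliminate col elsewhere
            if i != row and T[i][col] != 0:
                f = T[i][col]
                T[i] = [v - f * w for v, w in zip(T[i], T[row])]
        basis[row] = col
    x = [F(0)] * n
    for i, bi in enumerate(basis):
        if bi < n:
            x[bi] = T[i][-1]
    return T[m][-1], x                       # optimal value, optimal x

val, x = simplex([[1, 1], [1, 3]], [4, 6], [3, 2])
print(val, x)  # 12, [4, 0]
```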

Complexity. The simplex method visits at most $\binom{m+n}{n}$ vertices in the worst case, which is exponential. The Klee-Minty cube construction (1972) shows that certain pivot rules require exponentially many steps. However, in practice the simplex method is remarkably efficient, typically requiring on the order of $m$ to $3m$ pivots. Smoothed analysis by Spielman and Teng (2004) provides theoretical justification: under slight random perturbation of the input, the expected number of pivots is polynomial.

Polynomial-Time Algorithms

The ellipsoid method (Khachiyan, 1979) was the first polynomial-time algorithm for LP, running in $O(n^4 L)$ time where $L$ is the input bit length. It maintains an ellipsoid guaranteed to contain an optimal vertex and iteratively shrinks it.

Interior-point methods (Karmarkar, 1984) traverse the interior of the feasible polytope rather than its boundary. They achieve $O(n^{3.5} L)$ time and are competitive with (often faster than) simplex in practice for large-scale problems. Modern LP solvers use a combination of simplex and interior-point methods.
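In practice one calls a mature solver rather than implementing these algorithms. A sketch using SciPy's `linprog` (assuming SciPy is installed; `method="highs"` dispatches to the HiGHS simplex and interior-point codes); `linprog` minimizes, so the objective is negated to maximize:

```python
from scipy.optimize import linprog  # assumes SciPy is available

# Same toy LP as earlier examples: maximize 3x + 2y subject to
# x + y <= 4, x + 3y <= 6, x, y >= 0 (negate c because linprog minimizes).
res = linprog(c=[-3, -2], A_ub=[[1, 1], [1, 3]], b_ub=[4, 6],
              bounds=[(0, None), (0, None)], method="highs")
print(-res.fun, res.x)  # optimal value 12 at (4, 0)
```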

LP Duality

Every LP (the primal) has an associated dual LP. If the primal is:

$$\text{maximize } \mathbf{c}^T \mathbf{x} \quad \text{subject to } A\mathbf{x} \leq \mathbf{b}, \quad \mathbf{x} \geq \mathbf{0}$$

then the dual is:

$$\text{minimize } \mathbf{b}^T \mathbf{y} \quad \text{subject to } A^T \mathbf{y} \geq \mathbf{c}, \quad \mathbf{y} \geq \mathbf{0}$$

The dual has one variable $y_i$ for each primal constraint and one constraint for each primal variable. The dual of the dual is the primal.

Weak Duality Theorem. For any primal feasible $\mathbf{x}$ and dual feasible $\mathbf{y}$: $\mathbf{c}^T \mathbf{x} \leq \mathbf{b}^T \mathbf{y}$.

This follows immediately: $\mathbf{c}^T \mathbf{x} \leq (A^T \mathbf{y})^T \mathbf{x} = \mathbf{y}^T A \mathbf{x} \leq \mathbf{y}^T \mathbf{b}$, where the first inequality uses $\mathbf{c} \leq A^T\mathbf{y}$ and $\mathbf{x} \geq 0$, and the second uses $A\mathbf{x} \leq \mathbf{b}$ and $\mathbf{y} \geq 0$. Weak duality implies that any dual feasible solution provides an upper bound on the primal optimum.
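The bound is easy to check numerically; a sketch on a toy primal/dual pair (data of my own):

```python
from fractions import Fraction as F

# Primal: max 3x1 + 2x2  s.t.  x1 + x2 <= 4,  x1 + 3x2 <= 6,  x >= 0
# Dual:   min 4y1 + 6y2  s.t.  y1 + y2 >= 3,  y1 + 3y2 >= 2,  y >= 0
A = [[F(1), F(1)], [F(1), F(3)]]
b = [F(4), F(6)]
c = [F(3), F(2)]

def dot(u, v):
    return sum(a * w for a, w in zip(u, v))

x = [F(3), F(1)]   # primal feasible: 3 + 1 <= 4 and 3 + 3 <= 6
y = [F(3), F(0)]   # dual feasible:   3 + 0 >= 3 and 3 + 0 >= 2
assert all(dot(A[i], x) <= b[i] for i in range(2)) and min(x) >= 0
assert all(A[0][j] * y[0] + A[1][j] * y[1] >= c[j] for j in range(2))
print(dot(c, x), dot(b, y))  # 11 <= 12: the dual point bounds the primal
```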

Strong Duality Theorem. If the primal has an optimal solution $\mathbf{x}^*$, then the dual also has an optimal solution $\mathbf{y}^*$, and $\mathbf{c}^T \mathbf{x}^* = \mathbf{b}^T \mathbf{y}^*$.

Strong duality is proved constructively: the simplex method simultaneously produces primal and dual optimal solutions. The complementary slackness conditions provide the bridge: at optimality, for each $i$, either the $i$-th primal constraint is tight or $y_i^* = 0$, and for each $j$, either the $j$-th dual constraint is tight or $x_j^* = 0$.
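These conditions can be verified mechanically; a sketch checking claimed optima of a toy primal/dual pair (data and solutions worked by hand, not from the text):

```python
from fractions import Fraction as F

# Primal: max 3x1 + 2x2  s.t.  x1 + x2 <= 4,  x1 + 3x2 <= 6,  x >= 0
# Claimed optima: x* = (4, 0) and dual y* = (3, 0).
A, b, c = [[F(1), F(1)], [F(1), F(3)]], [F(4), F(6)], [F(3), F(2)]
xs, ys = [F(4), F(0)], [F(3), F(0)]

for i in range(2):  # tight primal constraint, or zero dual variable
    slack = b[i] - sum(A[i][j] * xs[j] for j in range(2))
    assert slack == 0 or ys[i] == 0
for j in range(2):  # tight dual constraint, or zero primal variable
    surplus = sum(A[i][j] * ys[i] for i in range(2)) - c[j]
    assert surplus == 0 or xs[j] == 0

# Strong duality: equal objective values certify joint optimality.
primal_val = sum(c[j] * xs[j] for j in range(2))
dual_val = sum(b[i] * ys[i] for i in range(2))
assert primal_val == dual_val
print(primal_val)  # 12 on both sides
```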

Applications

Linear programming is extraordinarily versatile. Network flow problems (shortest paths, maximum flows, minimum-cost flows) are all special cases of LP. The LP relaxation of integer programs provides bounds used in branch-and-bound algorithms. Scheduling, resource allocation, transportation, and diet problems all admit LP formulations.

Maximum flow as LP. The max-flow problem can be written: maximize $\sum_{v:(s,v)\in E} f_{sv}$ subject to $0 \leq f_{uv} \leq c_{uv}$ for all $(u,v) \in E$ and flow conservation at each internal vertex. This is a linear program, and its dual corresponds to the minimum cut problem — a concrete instance of LP duality matching max-flow min-cut.
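The duality can be seen concretely on a small network (example of my own): a feasible flow whose value equals some cut's capacity must be maximum, by weak duality.

```python
from itertools import combinations

cap = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1,
       ("a", "t"): 2, ("b", "t"): 3}
flow = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1,
        ("a", "t"): 2, ("b", "t"): 3}

# Feasibility: capacities respected, flow conserved at internal vertices.
assert all(0 <= f <= cap[e] for e, f in flow.items())
for v in ["a", "b"]:
    inflow = sum(f for (u, w), f in flow.items() if w == v)
    outflow = sum(f for (u, w), f in flow.items() if u == v)
    assert inflow == outflow

def cut_capacity(S):
    """Total capacity of edges leaving the source side S."""
    return sum(c for (u, v), c in cap.items() if u in S and v not in S)

# Enumerate every s-t cut: s plus any subset of the internal vertices.
cuts = [{"s"} | set(sub)
        for r in range(3) for sub in combinations(["a", "b"], r)]
min_cut = min(cut_capacity(S) for S in cuts)
value = sum(f for (u, v), f in flow.items() if u == "s")
print(value, min_cut)  # both 5: the cut certifies the flow is maximum
```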

Connection to Machine Learning

LP relaxations. Many combinatorial optimization problems arising in ML (e.g., MAP inference in graphical models) are formulated as integer linear programs. Solving the LP relaxation — dropping the integrality constraints — provides a lower bound and often a good approximate solution. Rounding the fractional LP solution to integers is a standard technique in approximation algorithms.
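A worked toy instance: the vertex-cover LP on a triangle (my own example). The vertex-cover LP is known to be half-integral, so scanning assignments in $\{0, \tfrac{1}{2}, 1\}$ finds the true LP optimum here, and rounding up every $x_v \geq \tfrac{1}{2}$ yields the classic 2-approximation:

```python
from fractions import Fraction as F
from itertools import product

# Relaxed vertex cover: minimize sum x_v subject to x_u + x_v >= 1 for
# every edge, 0 <= x_v <= 1 (integrality dropped).
edges = [(0, 1), (1, 2), (0, 2)]
vals = [F(0), F(1, 2), F(1)]  # half-integrality justifies this grid

lp_opt, lp_x = min(
    (sum(x), x)
    for x in product(vals, repeat=3)
    if all(x[u] + x[v] >= 1 for u, v in edges))

# Round: include every vertex with fractional value at least 1/2.
cover = [v for v in range(3) if lp_x[v] >= F(1, 2)]
assert all(u in cover or v in cover for u, v in edges)  # valid cover
print(lp_opt, cover)  # LP bound 3/2; cover of size 3 <= 2 * (3/2)
```

The fractional optimum $(\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2})$ has value $\tfrac{3}{2}$, strictly below the integer optimum of 2, illustrating both the lower bound and the rounding guarantee.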

Support vector machines. The hard-margin SVM seeks the maximum-margin hyperplane separating two classes. The optimization problem is:

$$\text{minimize } \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to } y_i(\mathbf{w}^T \mathbf{x}_i + b) \geq 1 \quad \forall i$$

This is a quadratic program (QP), not an LP, since the objective is quadratic. However, its dual is also a QP, and the KKT conditions mirror LP complementary slackness: only support vectors (points on the margin boundary) have nonzero dual variables. The soft-margin SVM introduces slack variables $\xi_i$ and a penalty $C\sum_i \xi_i$, adding linear terms. The entire SVM framework is deeply rooted in the duality theory that originates with linear programming.
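The KKT picture can be checked by hand on a tiny dataset; a sketch with data, $(\mathbf{w}, b)$, and dual variables $\alpha_i$ all worked out by me for this illustration (not library output):

```python
# Three 1-D points embedded in 2-D; the max-margin separator is x = 0.
X = [(-1.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
y = [-1, 1, 1]
w, b = (1.0, 0.0), 0.0       # claimed optimal hyperplane
alpha = [0.5, 0.5, 0.0]      # dual variables; zero away from the margin

margins = [yi * (w[0] * xi[0] + w[1] * xi[1] + b)
           for xi, yi in zip(X, y)]
assert all(m >= 1 for m in margins)                 # primal feasibility
# Stationarity: w = sum_i alpha_i y_i x_i and sum_i alpha_i y_i = 0.
assert sum(a * yi * xi[0] for a, yi, xi in zip(alpha, y, X)) == w[0]
assert sum(a * yi for a, yi in zip(alpha, y)) == 0
# Complementary slackness: alpha_i > 0 only where the margin is tight.
assert all(a == 0 or m == 1 for a, m in zip(alpha, margins))
print("support vectors:", [i for i, a in enumerate(alpha) if a > 0])
```

Only the two points on the margin boundary carry nonzero $\alpha_i$; the interior point at $x = 3$ could move freely without changing the solution, exactly as complementary slackness predicts.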


Next: 14: NP-Completeness