13: Linear Programming

Formulation

A linear program (LP) is an optimization problem in which both the objective function and constraints are linear. In standard form, an LP is:

$$\text{maximize } \mathbf{c}^T \mathbf{x} \quad \text{subject to } A\mathbf{x} \leq \mathbf{b}, \quad \mathbf{x} \geq \mathbf{0}$$

where $\mathbf{x} \in \mathbb{R}^n$ is the vector of decision variables, $\mathbf{c} \in \mathbb{R}^n$ is the objective coefficient vector, $A \in \mathbb{R}^{m \times n}$ is the constraint matrix, and $\mathbf{b} \in \mathbb{R}^m$ is the constraint bound vector. Every inequality constraint $\mathbf{a}_i^T \mathbf{x} \leq b_i$ defines a half-space in $\mathbb{R}^n$.

Converting to slack form introduces a slack variable $s_i \geq 0$ for each inequality, yielding $A\mathbf{x} + \mathbf{s} = \mathbf{b}$ with $\mathbf{x}, \mathbf{s} \geq 0$. Minimization problems are converted by negating the objective. Equality constraints $\mathbf{a}^T\mathbf{x} = b$ become two inequalities, and unrestricted variables are split into the difference of two nonnegative variables. Thus standard form is fully general.
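The slack-form conversion is mechanical; a minimal sketch on a hypothetical two-constraint toy system (not from the text):

```python
# Each inequality a_i . x <= b_i gains one slack column, so the system
# becomes A x + s = b with s >= 0.

def to_slack_form(A, b):
    """Append one identity column per row of A."""
    m = len(A)
    A_slack = [row + [1 if i == j else 0 for j in range(m)]
               for i, row in enumerate(A)]
    return A_slack, b

A = [[1, 1], [2, 1]]   # x1 + x2 <= 4,  2x1 + x2 <= 5
b = [4, 5]
A_slack, _ = to_slack_form(A, b)
print(A_slack)  # [[1, 1, 1, 0], [2, 1, 0, 1]]
```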

Geometric Interpretation

The feasible region $\{\mathbf{x} : A\mathbf{x} \leq \mathbf{b}, \mathbf{x} \geq 0\}$ is the intersection of finitely many half-spaces, forming a convex polytope. This polytope may be empty (infeasible), bounded, or unbounded.

The objective function $\mathbf{c}^T\mathbf{x}$ defines a family of parallel hyperplanes. Maximizing the objective geometrically corresponds to translating this hyperplane as far as possible in the direction $\mathbf{c}$ while maintaining contact with the polytope.

Fundamental Theorem of Linear Programming. If an LP has a finite optimal value, then there exists an optimal solution at a vertex (extreme point) of the feasible polytope.

This theorem is what makes LP tractable: instead of searching over a continuous region, we need only examine vertices. A polytope in $\mathbb{R}^n$ defined by $m$ constraints has at most $\binom{m+n}{n}$ vertices, which is finite (though potentially exponential).
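The vertex-checking idea can be run directly on a tiny instance; a brute-force enumeration sketch (toy data of my own) that solves a 2-variable LP exactly with rational arithmetic:

```python
from itertools import combinations
from fractions import Fraction as F

# Maximize 3x + 2y subject to x + y <= 4, x + 3y <= 6, x, y >= 0.
# Nonnegativity is encoded as -x <= 0 and -y <= 0, so every vertex is
# the intersection of 2 of the 4 constraint lines: C(4, 2) candidates.
rows = [([F(1), F(1)], F(4)), ([F(1), F(3)], F(6)),
        ([F(-1), F(0)], F(0)), ([F(0), F(-1)], F(0))]
c = [F(3), F(2)]

def intersect(r1, r2):
    """Solve the 2x2 system where both constraints hold with equality."""
    (a1, a2), b1 = r1
    (d1, d2), b2 = r2
    det = a1 * d2 - a2 * d1
    if det == 0:
        return None                      # parallel lines: no vertex
    return ((b1 * d2 - a2 * b2) / det,   # Cramer's rule
            (a1 * b2 - b1 * d1) / det)

def feasible(p):
    return all(a1 * p[0] + a2 * p[1] <= b for (a1, a2), b in rows)

best = max((c[0] * p[0] + c[1] * p[1], p)
           for r1, r2 in combinations(rows, 2)
           if (p := intersect(r1, r2)) is not None and feasible(p))
print(best)  # (Fraction(12, 1), (Fraction(4, 1), Fraction(0, 1)))
```

The optimum 12 is attained at the vertex $(4, 0)$, as the fundamental theorem guarantees.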

The Simplex Method

The simplex method, introduced by Dantzig (1947), exploits the fundamental theorem by walking along edges of the polytope from vertex to vertex, always improving the objective.

A basic feasible solution (BFS) corresponds to a vertex of the polytope. It is determined by selecting $n$ linearly independent constraints to hold with equality (the basis), solving the resulting system, and verifying nonnegativity. At each step, the simplex method:

  1. Examines the reduced costs of non-basic variables: $\bar{c}_j = c_j - \mathbf{c}_B^T B^{-1} \mathbf{a}_j$, where $B$ is the basis matrix.
  2. If all $\bar{c}_j \leq 0$, the current BFS is optimal (for maximization).
  3. Otherwise, selects an entering variable $x_j$ with $\bar{c}_j > 0$ (e.g., the largest reduced cost).
  4. Performs a pivot: determines which basic variable leaves the basis using the minimum ratio test to maintain feasibility, then updates the basis.

Each pivot moves to an adjacent vertex with a strictly higher objective value (assuming nondegeneracy). Since the number of vertices is finite, the simplex method terminates. Degeneracy (multiple bases corresponding to the same vertex) can cause cycling, which is resolved by anti-cycling rules such as Bland’s rule.
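The pivot loop above can be condensed into a compact tableau implementation. A teaching sketch, not a production solver: it assumes $\mathbf{b} \geq 0$ (so the all-slack basis is feasible and Phase I is unnecessary), uses exact `Fraction` arithmetic, and applies Bland's rule for both the entering and leaving choices:

```python
from fractions import Fraction as F

def simplex(A, b, c):
    """Maximize c.x subject to A x <= b, x >= 0, assuming b >= 0."""
    m, n = len(A), len(c)
    # Tableau rows: [A_i | slack identity | b_i]; last row holds -c,
    # so optimality is reached when no entry of that row is negative.
    T = [[F(v) for v in A[i]] + [F(i == j) for j in range(m)] + [F(b[i])]
         for i in range(m)]
    T.append([-F(v) for v in c] + [F(0)] * (m + 1))
    basis = list(range(n, n + m))            # slacks start in the basis
    while True:
        # Bland's rule: entering column = smallest index with -c_j < 0.
        col = next((j for j in range(n + m) if T[m][j] < 0), None)
        if col is None:
            break                            # optimal
        # Minimum ratio test; ties broken by smallest basis index.
        cand = [(T[i][-1] / T[i][col], basis[i], i)
                for i in range(m) if T[i][col] > 0]
        if not cand:
            raise ValueError("LP is unbounded")
        _, _, row = min(cand)
        piv = T[row][col]
        T[row] = [v / piv for v in T[row]]
        for i in range(m + 1):               # eliminate col elsewhere
            if i != row and T[i][col] != 0:
                f = T[i][col]
                T[i] = [v - f * w for v, w in zip(T[i], T[row])]
        basis[row] = col
    x = [F(0)] * n
    for i, bi in enumerate(basis):
        if bi < n:
            x[bi] = T[i][-1]
    return T[m][-1], x                       # optimal value, optimal x

val, x = simplex([[1, 1], [1, 3]], [4, 6], [3, 2])
print(val, x)  # 12, [4, 0]
```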

Complexity. The simplex method visits at most $\binom{m+n}{n}$ vertices in the worst case, which is exponential. The Klee-Minty cube construction (1972) shows that certain pivot rules require exponentially many steps. However, in practice the simplex method is remarkably efficient, typically requiring on the order of $m$ to $3m$ pivots. Smoothed analysis by Spielman and Teng (2004) provides theoretical justification: under slight random perturbation of the input, the expected number of pivots is polynomial.

Polynomial-Time Algorithms

The ellipsoid method (Khachiyan, 1979) was the first polynomial-time algorithm for LP, running in $O(n^4 L)$ time where $L$ is the input bit length. It maintains an ellipsoid guaranteed to contain an optimal vertex and iteratively shrinks it.

Interior-point methods (Karmarkar, 1984) traverse the interior of the feasible polytope rather than its boundary. They achieve $O(n^{3.5} L)$ time and are competitive with (often faster than) simplex in practice for large-scale problems. Modern LP solvers use a combination of simplex and interior-point methods.
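In practice one calls a mature solver rather than implementing these algorithms. A sketch using SciPy's `linprog` (assuming SciPy is installed; `method="highs"` dispatches to the HiGHS simplex and interior-point codes); `linprog` minimizes, so the objective is negated to maximize:

```python
from scipy.optimize import linprog  # assumes SciPy is available

# Same toy LP as earlier examples: maximize 3x + 2y subject to
# x + y <= 4, x + 3y <= 6, x, y >= 0 (negate c because linprog minimizes).
res = linprog(c=[-3, -2], A_ub=[[1, 1], [1, 3]], b_ub=[4, 6],
              bounds=[(0, None), (0, None)], method="highs")
print(-res.fun, res.x)  # optimal value 12 at (4, 0)
```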

LP Duality

Every LP (the primal) has an associated dual LP. If the primal is:

$$\text{maximize } \mathbf{c}^T \mathbf{x} \quad \text{subject to } A\mathbf{x} \leq \mathbf{b}, \quad \mathbf{x} \geq \mathbf{0}$$

then the dual is:

$$\text{minimize } \mathbf{b}^T \mathbf{y} \quad \text{subject to } A^T \mathbf{y} \geq \mathbf{c}, \quad \mathbf{y} \geq \mathbf{0}$$

The dual has one variable $y_i$ for each primal constraint and one constraint for each primal variable. The dual of the dual is the primal.

Weak Duality Theorem. For any primal feasible $\mathbf{x}$ and dual feasible $\mathbf{y}$: $\mathbf{c}^T \mathbf{x} \leq \mathbf{b}^T \mathbf{y}$.

This follows immediately: $\mathbf{c}^T \mathbf{x} \leq (A^T \mathbf{y})^T \mathbf{x} = \mathbf{y}^T A \mathbf{x} \leq \mathbf{y}^T \mathbf{b}$, where the first inequality uses $\mathbf{c} \leq A^T\mathbf{y}$ and $\mathbf{x} \geq 0$, and the second uses $A\mathbf{x} \leq \mathbf{b}$ and $\mathbf{y} \geq 0$. Weak duality implies that any dual feasible solution provides an upper bound on the primal optimum.
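The bound is easy to check numerically; a sketch on a toy primal/dual pair (data of my own):

```python
from fractions import Fraction as F

# Primal: max 3x1 + 2x2  s.t.  x1 + x2 <= 4,  x1 + 3x2 <= 6,  x >= 0
# Dual:   min 4y1 + 6y2  s.t.  y1 + y2 >= 3,  y1 + 3y2 >= 2,  y >= 0
A = [[F(1), F(1)], [F(1), F(3)]]
b = [F(4), F(6)]
c = [F(3), F(2)]

def dot(u, v):
    return sum(a * w for a, w in zip(u, v))

x = [F(3), F(1)]   # primal feasible: 3 + 1 <= 4 and 3 + 3 <= 6
y = [F(3), F(0)]   # dual feasible:   3 + 0 >= 3 and 3 + 0 >= 2
assert all(dot(A[i], x) <= b[i] for i in range(2)) and min(x) >= 0
assert all(A[0][j] * y[0] + A[1][j] * y[1] >= c[j] for j in range(2))
print(dot(c, x), dot(b, y))  # 11 <= 12: the dual point bounds the primal
```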

Strong Duality Theorem. If the primal has an optimal solution $\mathbf{x}^*$, then the dual also has an optimal solution $\mathbf{y}^*$, and $\mathbf{c}^T \mathbf{x}^* = \mathbf{b}^T \mathbf{y}^*$.

Strong duality is proved constructively: the simplex method simultaneously produces primal and dual optimal solutions. The complementary slackness conditions provide the bridge: at optimality, for each $i$, either the $i$-th primal constraint is tight or $y_i^* = 0$, and for each $j$, either the $j$-th dual constraint is tight or $x_j^* = 0$.
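These conditions can be verified mechanically; a sketch checking claimed optima of a toy primal/dual pair (data and solutions worked by hand, not from the text):

```python
from fractions import Fraction as F

# Primal: max 3x1 + 2x2  s.t.  x1 + x2 <= 4,  x1 + 3x2 <= 6,  x >= 0
# Claimed optima: x* = (4, 0) and dual y* = (3, 0).
A, b, c = [[F(1), F(1)], [F(1), F(3)]], [F(4), F(6)], [F(3), F(2)]
xs, ys = [F(4), F(0)], [F(3), F(0)]

for i in range(2):  # tight primal constraint, or zero dual variable
    slack = b[i] - sum(A[i][j] * xs[j] for j in range(2))
    assert slack == 0 or ys[i] == 0
for j in range(2):  # tight dual constraint, or zero primal variable
    surplus = sum(A[i][j] * ys[i] for i in range(2)) - c[j]
    assert surplus == 0 or xs[j] == 0

# Strong duality: equal objective values certify joint optimality.
primal_val = sum(c[j] * xs[j] for j in range(2))
dual_val = sum(b[i] * ys[i] for i in range(2))
assert primal_val == dual_val
print(primal_val)  # 12 on both sides
```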

Applications

Linear programming is extraordinarily versatile. Network flow problems (shortest paths, maximum flows, minimum-cost flows) are all special cases of LP. The LP relaxation of integer programs provides bounds used in branch-and-bound algorithms. Scheduling, resource allocation, transportation, and diet problems all admit LP formulations.

Maximum flow as LP. The max-flow problem can be written: maximize $\sum_{v:(s,v)\in E} f_{sv}$ subject to $0 \leq f_{uv} \leq c_{uv}$ for all $(u,v) \in E$ and flow conservation at each internal vertex. This is a linear program, and its dual corresponds to the minimum cut problem — a concrete instance of LP duality matching max-flow min-cut.
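The duality can be seen concretely on a small network (example of my own): a feasible flow whose value equals some cut's capacity must be maximum, by weak duality.

```python
from itertools import combinations

cap = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1,
       ("a", "t"): 2, ("b", "t"): 3}
flow = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1,
        ("a", "t"): 2, ("b", "t"): 3}

# Feasibility: capacities respected, flow conserved at internal vertices.
assert all(0 <= f <= cap[e] for e, f in flow.items())
for v in ["a", "b"]:
    inflow = sum(f for (u, w), f in flow.items() if w == v)
    outflow = sum(f for (u, w), f in flow.items() if u == v)
    assert inflow == outflow

def cut_capacity(S):
    """Total capacity of edges leaving the source side S."""
    return sum(c for (u, v), c in cap.items() if u in S and v not in S)

# Enumerate every s-t cut: s plus any subset of the internal vertices.
cuts = [{"s"} | set(sub)
        for r in range(3) for sub in combinations(["a", "b"], r)]
min_cut = min(cut_capacity(S) for S in cuts)
value = sum(f for (u, v), f in flow.items() if u == "s")
print(value, min_cut)  # both 5: the cut certifies the flow is maximum
```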

Connection to Machine Learning

LP relaxations. Many combinatorial optimization problems arising in ML (e.g., MAP inference in graphical models) are formulated as integer linear programs. Solving the LP relaxation — dropping the integrality constraints — provides a lower bound and often a good approximate solution. Rounding the fractional LP solution to integers is a standard technique in approximation algorithms.
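A worked toy instance: the vertex-cover LP on a triangle (my own example). The vertex-cover LP is known to be half-integral, so scanning assignments in $\{0, \tfrac{1}{2}, 1\}$ finds the true LP optimum here, and rounding up every $x_v \geq \tfrac{1}{2}$ yields the classic 2-approximation:

```python
from fractions import Fraction as F
from itertools import product

# Relaxed vertex cover: minimize sum x_v subject to x_u + x_v >= 1 for
# every edge, 0 <= x_v <= 1 (integrality dropped).
edges = [(0, 1), (1, 2), (0, 2)]
vals = [F(0), F(1, 2), F(1)]  # half-integrality justifies this grid

lp_opt, lp_x = min(
    (sum(x), x)
    for x in product(vals, repeat=3)
    if all(x[u] + x[v] >= 1 for u, v in edges))

# Round: include every vertex with fractional value at least 1/2.
cover = [v for v in range(3) if lp_x[v] >= F(1, 2)]
assert all(u in cover or v in cover for u, v in edges)  # valid cover
print(lp_opt, cover)  # LP bound 3/2; cover of size 3 <= 2 * (3/2)
```

The fractional optimum $(\tfrac{1}{2}, \tfrac{1}{2}, \tfrac{1}{2})$ has value $\tfrac{3}{2}$, strictly below the integer optimum of 2, illustrating both the lower bound and the rounding guarantee.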

Support vector machines. The hard-margin SVM seeks the maximum-margin hyperplane separating two classes. The optimization problem is:

$$\text{minimize } \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to } y_i(\mathbf{w}^T \mathbf{x}_i + b) \geq 1 \quad \forall i$$

This is a quadratic program (QP), not an LP, since the objective is quadratic. However, its dual is also a QP, and the KKT conditions mirror LP complementary slackness: only support vectors (points on the margin boundary) have nonzero dual variables. The soft-margin SVM introduces slack variables $\xi_i$ and a penalty $C\sum_i \xi_i$, adding linear terms. The entire SVM framework is deeply rooted in the duality theory that originates with linear programming.
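The KKT picture can be checked by hand on a tiny dataset; a sketch with data, $(\mathbf{w}, b)$, and dual variables $\alpha_i$ all worked out by me for this illustration (not library output):

```python
# Three 1-D points embedded in 2-D; the max-margin separator is x = 0.
X = [(-1.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
y = [-1, 1, 1]
w, b = (1.0, 0.0), 0.0       # claimed optimal hyperplane
alpha = [0.5, 0.5, 0.0]      # dual variables; zero away from the margin

margins = [yi * (w[0] * xi[0] + w[1] * xi[1] + b)
           for xi, yi in zip(X, y)]
assert all(m >= 1 for m in margins)                 # primal feasibility
# Stationarity: w = sum_i alpha_i y_i x_i and sum_i alpha_i y_i = 0.
assert sum(a * yi * xi[0] for a, yi, xi in zip(alpha, y, X)) == w[0]
assert sum(a * yi for a, yi in zip(alpha, y)) == 0
# Complementary slackness: alpha_i > 0 only where the margin is tight.
assert all(a == 0 or m == 1 for a, m in zip(alpha, margins))
print("support vectors:", [i for i, a in enumerate(alpha) if a > 0])
```

Only the two points on the margin boundary carry nonzero $\alpha_i$; the interior point at $x = 3$ could move freely without changing the solution, exactly as complementary slackness predicts.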


Next: 14: NP-Completeness