11: Dynamic Programming

Dynamic programming (DP) solves optimization problems by decomposing them into overlapping subproblems and storing intermediate results to avoid redundant computation. It is the standard approach for sequence alignment, shortest paths, and many combinatorial optimization problems that arise in ML pipelines and systems.


When to Use Dynamic Programming

A problem is amenable to DP when it has:

  1. Optimal substructure. The optimal solution to the problem contains optimal solutions to subproblems. If the shortest path from A to C goes through B, then the sub-path from A to B must itself be shortest.

  2. Overlapping subproblems. The same subproblems are solved multiple times in a naive recursive approach. Without memoization, the recursive Fibonacci algorithm computes F(n-2) twice, F(n-3) three times, etc., leading to exponential time.

DP trades space for time: store subproblem solutions in a table to avoid recomputation.


Implementation Strategies

Top-Down (Memoization)

Write the recursive solution, then cache results:

def fib(n, memo=None):
    # Avoid a mutable default argument: a shared dict would silently
    # persist the cache across unrelated calls.
    if memo is None:
        memo = {}
    if n <= 1: return n
    if n not in memo:
        memo[n] = fib(n-1, memo) + fib(n-2, memo)
    return memo[n]

Time: O(n), Space: O(n). The recursion tree is pruned: each subproblem is solved once.

Bottom-Up (Tabulation)

Fill a table iteratively from the base cases:

def fib(n):
    # Guard the base case: for n = 0 the table has length 1,
    # and dp[1] = 1 would raise an IndexError.
    if n == 0:
        return 0
    dp = [0] * (n + 1)
    dp[1] = 1
    for i in range(2, n + 1):
        dp[i] = dp[i-1] + dp[i-2]
    return dp[n]

Same time and space complexity, but avoids recursion overhead and stack depth limits. Bottom-up is generally preferred for production code.

Space optimization. When the recurrence only depends on a fixed number of previous entries, reduce space by keeping only those entries. Fibonacci needs only the last 2 values: O(1) space.
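For Fibonacci, the space-optimized version amounts to keeping a rolling pair of values (a minimal sketch):

```python
def fib(n):
    # Only the last two values are ever needed, so keep a rolling pair
    # instead of the full table: O(1) space.
    prev, curr = 0, 1
    for _ in range(n):
        prev, curr = curr, prev + curr
    return prev
```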


Classic Problems

Longest Common Subsequence (LCS)

Given sequences X = x_1 ... x_m and Y = y_1 ... y_n, find the length of their longest common subsequence.

Recurrence:

\text{LCS}(i, j) = \begin{cases} 0 & \text{if } i = 0 \text{ or } j = 0 \\ \text{LCS}(i-1, j-1) + 1 & \text{if } x_i = y_j \\ \max(\text{LCS}(i-1, j), \text{LCS}(i, j-1)) & \text{otherwise} \end{cases}

Time: O(mn), Space: O(mn) (reducible to O(min(m, n))).
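The recurrence translates directly into a tabulated implementation (a sketch; the function name is ours):

```python
def lcs_length(x, y):
    m, n = len(x), len(y)
    # dp[i][j] = LCS length of the prefixes x[:i] and y[:j];
    # row 0 and column 0 encode the empty-prefix base cases.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]
```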

ML connection. LCS and edit distance are used in NLP for string similarity, text alignment, and evaluation metrics (ROUGE-L uses LCS for summarization evaluation).

Edit Distance (Levenshtein Distance)

Minimum number of insertions, deletions, and substitutions to transform string X into string Y.

d(i, j) = \begin{cases} i & \text{if } j = 0 \\ j & \text{if } i = 0 \\ d(i-1, j-1) & \text{if } x_i = y_j \\ 1 + \min(d(i-1,j), d(i,j-1), d(i-1,j-1)) & \text{otherwise} \end{cases}

Time: O(mn). Used in spell checking, DNA sequence alignment, and fuzzy string matching.
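The same table-filling pattern gives edit distance; only the base cases and the cost term change (a sketch, with our naming):

```python
def edit_distance(x, y):
    m, n = len(x), len(y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    # Base cases: transforming to/from the empty string costs
    # one deletion or insertion per character.
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                # deletion, insertion, substitution
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[m][n]
```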

0/1 Knapsack

Given n items with weights w_i and values v_i, and a capacity W, maximize total value without exceeding capacity.

\text{OPT}(i, w) = \max(\text{OPT}(i-1, w), \; v_i + \text{OPT}(i-1, w - w_i))

where the second option (taking item i) is available only when w_i ≤ w.

Time: O(nW). This is pseudo-polynomial: polynomial in n and W but exponential in the number of bits needed to represent W. The knapsack problem is NP-hard, so no truly polynomial algorithm exists (unless P = NP).
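A space-optimized sketch that keeps only one row of the table (O(W) space); iterating capacities downward ensures each item is used at most once:

```python
def knapsack(weights, values, capacity):
    # dp[w] = best value achievable with capacity w using the items seen so far.
    dp = [0] * (capacity + 1)
    for wi, vi in zip(weights, values):
        # Downward iteration: dp[w - wi] still refers to the previous item's
        # row, so no item can be taken twice.
        for w in range(capacity, wi - 1, -1):
            dp[w] = max(dp[w], vi + dp[w - wi])
    return dp[capacity]
```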

Matrix Chain Multiplication

Given matrices A_1, ..., A_n with dimensions p_0 × p_1, ..., p_{n-1} × p_n, find the parenthesization that minimizes the total number of scalar multiplications.

m(i, j) = \min_{i \leq k < j} \{m(i, k) + m(k+1, j) + p_{i-1} p_k p_j\}

Time: O(n^3), Space: O(n^2). The optimal substructure: the best way to multiply A_i ... A_j must use an optimal split point k and optimal parenthesizations of both halves.
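Interval DP fills the table by increasing chain length, since m(i, j) depends only on strictly shorter intervals (a sketch; the function name is ours):

```python
def matrix_chain(p):
    # p has length n+1; matrix A_i has dimensions p[i-1] x p[i] (1-indexed).
    n = len(p) - 1
    # m[i][j] = minimum scalar multiplications to compute A_i ... A_j.
    m = [[0] * (n + 1) for _ in range(n + 1)]
    for length in range(2, n + 1):          # chain length
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i][j] = min(m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                          for k in range(i, j))
    return m[1][n]
```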


Shortest Paths

Single-Source: Bellman-Ford

For a graph with V vertices and E edges (possibly negative weights):

d[v] = \min_{(u,v) \in E} \{d[u] + w(u,v)\}

Iterate V - 1 times over all edges. Time: O(VE). Detects negative cycles (if any d[v] decreases on the V-th iteration).
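A minimal sketch of the relaxation loop, with the extra V-th pass for cycle detection (edge-list representation is our assumption):

```python
def bellman_ford(num_vertices, edges, source):
    # edges: list of (u, v, weight) triples; vertices numbered 0..num_vertices-1.
    INF = float("inf")
    d = [INF] * num_vertices
    d[source] = 0
    # V - 1 rounds of relaxing every edge.
    for _ in range(num_vertices - 1):
        for u, v, w in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
    # One extra pass: any further improvement implies a reachable negative cycle.
    for u, v, w in edges:
        if d[u] + w < d[v]:
            raise ValueError("negative cycle detected")
    return d
```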

All-Pairs: Floyd-Warshall

d^{(k)}[i][j] = \min(d^{(k-1)}[i][j], \; d^{(k-1)}[i][k] + d^{(k-1)}[k][j])

Time: O(V^3), Space: O(V^2). Considers all intermediate vertices k = 1, ..., V.
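Because iteration k only ever improves entries, the superscript can be dropped and the update done in place, giving the familiar triple loop (a sketch; adjacency-matrix input is our assumption):

```python
def floyd_warshall(dist):
    # dist: V x V matrix; dist[i][j] = direct edge weight,
    # float("inf") if no edge, 0 on the diagonal.
    V = len(dist)
    d = [row[:] for row in dist]  # copy so the input is not mutated
    for k in range(V):            # allow k as an intermediate vertex
        for i in range(V):
            for j in range(V):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```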


DP in Machine Learning

| Algorithm | DP Component | Application |
| --- | --- | --- |
| Viterbi | Most probable state sequence in HMMs | Speech recognition, POS tagging |
| CTC decoding | Best alignment between input and output | Speech-to-text |
| Beam search | Approximate sequence decoding | Text generation, machine translation |
| Needleman-Wunsch | Global sequence alignment | Bioinformatics, token alignment |
| Value iteration | Bellman equation for MDPs | Reinforcement learning |

The Viterbi algorithm finds the most likely hidden state sequence in a Hidden Markov Model. It is a DP over the trellis of states and time steps:

\delta_t(j) = \max_i [\delta_{t-1}(i) \cdot a_{ij}] \cdot b_j(o_t)

where \delta_t(j) is the probability of the most likely path ending in state j at time t, a_{ij} is the transition probability, and b_j(o_t) is the emission probability. Time: O(T \cdot S^2) where T is the sequence length and S is the number of states.
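A dictionary-based sketch of the trellis recursion with backpointers (parameter names and layout are our assumptions; production code would work in log space to avoid underflow on long sequences):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    # delta[j] = probability of the most likely path ending in state j so far.
    delta = {j: start_p[j] * emit_p[j][obs[0]] for j in states}
    backptr = []
    for o in obs[1:]:
        prev, delta, bp = delta, {}, {}
        for j in states:
            # Best predecessor state i for landing in j at this step.
            best_i = max(states, key=lambda i: prev[i] * trans_p[i][j])
            bp[j] = best_i
            delta[j] = prev[best_i] * trans_p[best_i][j] * emit_p[j][o]
        backptr.append(bp)
    # Recover the state sequence by following backpointers from the best end state.
    state = max(states, key=lambda j: delta[j])
    path = [state]
    for bp in reversed(backptr):
        state = bp[state]
        path.append(state)
    return path[::-1]
```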

Value iteration in reinforcement learning applies the Bellman optimality equation:

V^*(s) = \max_a \left[R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) V^*(s')\right]

This is DP over states: each iteration improves the value estimate by propagating information one step backward from successor states.
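A minimal sketch of synchronous value iteration on a tabular MDP (the data layout for P and R is our assumption):

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
    # P[s][a]: list of (next_state, prob) pairs; R[s][a]: immediate reward.
    V = {s: 0.0 for s in states}
    while True:
        # One Bellman backup per state: propagate values from successors.
        V_new = {
            s: max(R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                   for a in actions)
            for s in states
        }
        if max(abs(V_new[s] - V[s]) for s in states) < tol:
            return V_new
        V = V_new
```

The contraction property of the Bellman operator guarantees this loop converges geometrically (at rate gamma) to the unique fixed point V*.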


Complexity Analysis

| Problem | Time | Space | Type |
| --- | --- | --- | --- |
| Fibonacci | O(n) | O(1) | 1D, linear |
| LCS / Edit distance | O(mn) | O(min(m,n)) | 2D, quadratic |
| Knapsack | O(nW) | O(W) | Pseudo-polynomial |
| Matrix chain | O(n^3) | O(n^2) | Interval DP |
| Floyd-Warshall | O(V^3) | O(V^2) | All-pairs |
| Viterbi | O(TS^2) | O(TS) | Trellis DP |

Summary

Dynamic programming reduces exponential brute-force search to polynomial time by exploiting overlapping subproblems and optimal substructure. The key steps: define the state space, write the recurrence relation, determine the base cases, and choose top-down (memoization) or bottom-up (tabulation) implementation. In ML, DP appears in sequence decoding (Viterbi, CTC, beam search), reinforcement learning (value iteration), and evaluation metrics (ROUGE-L).