16: Randomized Algorithms
Introduction
A randomized algorithm uses random bits as part of its logic, making decisions based on coin flips rather than purely deterministic rules. Randomization often yields algorithms that are simpler, faster, or more elegant than their deterministic counterparts. The analysis of randomized algorithms relies on probability theory: we reason about expected running times, success probabilities, and concentration inequalities.
Las Vegas vs. Monte Carlo
Randomized algorithms fall into two categories:
Las Vegas algorithms always produce a correct answer; only the running time is random. The guarantee is: for every input, the expected running time is bounded, and the output is always correct. Randomized quicksort is the canonical example.
Monte Carlo algorithms always run in bounded time but may produce an incorrect answer with some probability. The guarantee is: for every input, the algorithm terminates in polynomial time and is correct with probability at least 1 − ε for some error parameter ε. Karger’s minimum cut algorithm is an example. Repeated independent runs reduce the error probability exponentially: k runs reduce the failure probability to ε^k.
Randomized Quicksort
Deterministic quicksort with a fixed pivot rule (e.g., always choosing the first element) runs in Θ(n²) time on adversarial inputs. Randomized quicksort selects a pivot uniformly at random from the current subarray.
Theorem. The expected number of comparisons made by randomized quicksort is O(n log n).
Proof. Let z_1 < z_2 < ⋯ < z_n be the input elements in sorted order. Define the indicator X_ij = 1 if z_i and z_j are ever compared. Two elements are compared if and only if one of them is chosen as a pivot before any element between them in sorted order. Among the j − i + 1 elements z_i, z_{i+1}, …, z_j, the probability that z_i or z_j is chosen first is 2/(j − i + 1). Therefore:

E[comparisons] = Σ_{i<j} 2/(j − i + 1) ≤ 2n·H_n = O(n log n),

where H_n = 1 + 1/2 + ⋯ + 1/n is the n-th harmonic number.
The expected time is O(n log n) regardless of the input distribution. This is a Las Vegas algorithm: the output is always a correctly sorted array.
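The algorithm can be sketched in a few lines of Python (an illustrative, non-in-place version; the function name is ours):

```python
import random

def randomized_quicksort(arr):
    """Las Vegas sort: the output is always correct; only the
    running time depends on the random pivot choices."""
    if len(arr) <= 1:
        return list(arr)
    pivot = random.choice(arr)  # pivot chosen uniformly at random
    less = [x for x in arr if x < pivot]
    equal = [x for x in arr if x == pivot]
    greater = [x for x in arr if x > pivot]
    return randomized_quicksort(less) + equal + randomized_quicksort(greater)
```

The classical in-place variant partitions around the pivot instead of allocating sublists; the expected O(n log n) comparison bound is the same.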
Karger’s Minimum Cut Algorithm
Given an undirected multigraph G = (V, E) with n vertices, Karger’s algorithm finds a global minimum cut using random edge contraction:
- While |V| > 2:
- Select an edge (u, v) ∈ E uniformly at random.
- Contract u and v: merge them into a single vertex, preserving all other edges (creating a multigraph; remove self-loops).
- Return the edges between the two remaining super-vertices as the cut.
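A compact Python sketch of the contraction procedure, using a union-find array to track contracted super-vertices (function name and default trial count are our choices):

```python
import math
import random

def karger_min_cut(edges, n, trials=None):
    """Karger's contraction algorithm on a multigraph given as a list of
    (u, v) pairs over vertices 0..n-1. Repeats enough independent trials
    that the true minimum cut survives with high probability."""
    if trials is None:
        trials = n * n * max(1, round(math.log(n)))
    best = float("inf")
    for _ in range(trials):
        parent = list(range(n))          # union-find over super-vertices

        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v

        components = n
        pool = list(edges)
        while components > 2 and pool:
            i = random.randrange(len(pool))
            u, v = pool[i]
            ru, rv = find(u), find(v)
            if ru == rv:                 # self-loop created by contraction: discard
                pool[i] = pool[-1]
                pool.pop()
                continue
            parent[ru] = rv              # contract: merge the two endpoints
            components -= 1
        # Edges whose endpoints lie in different super-vertices form the cut.
        cut_size = sum(1 for u, v in edges if find(u) != find(v))
        best = min(best, cut_size)
    return best
```

Discarding an edge whose endpoints were already merged is equivalent to removing self-loops, so each contraction still picks uniformly among the surviving multigraph edges.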
Theorem. The probability that Karger’s algorithm outputs a specific minimum cut is at least 2/(n(n − 1)).
Proof. Let C be a minimum cut of size k. Every vertex has degree at least k (otherwise its incident edges would form a smaller cut), so |E| ≥ nk/2. At the first step, the probability of contracting an edge of C is at most k/(nk/2) = 2/n. Conditioning on survival, after i contractions the graph has n − i vertices, each super-vertex still has degree at least k, and the probability of not contracting a cut edge at step i + 1 is at least 1 − 2/(n − i). The overall survival probability is:

∏_{i=0}^{n−3} (1 − 2/(n − i)) = ∏_{i=0}^{n−3} (n − i − 2)/(n − i) = 2/(n(n − 1)),

because the product telescopes.
Running the algorithm O(n² log n) times and returning the smallest cut found gives a correct answer with probability at least 1 − 1/n. With O(n²) time per contraction run, the total running time is O(n⁴ log n). The Karger–Stein improvement uses recursive contraction to achieve O(n² log³ n).
Hashing
Universal Hashing
A family H of hash functions from universe U to {0, 1, …, m − 1} is universal if for any two distinct keys x ≠ y:

Pr_{h ∈ H}[h(x) = h(y)] ≤ 1/m.
A classic construction for U = {0, 1, …, p − 1} with p prime: h_{a,b}(x) = ((ax + b) mod p) mod m, choosing a ∈ {1, …, p − 1} and b ∈ {0, …, p − 1} uniformly at random.
Theorem. With a universal hash family, the expected number of collisions with any fixed key in a hash table with n keys and m slots is at most n/m. Choosing m ≥ n gives O(1) expected lookup time.
Universal hashing eliminates the need for assumptions about the input distribution. No adversary can construct a bad input because the hash function is chosen randomly.
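The construction above is nearly a one-liner in Python. Here p = 2³¹ − 1 is a convenient Mersenne prime, assuming integer keys smaller than p (the function name is ours):

```python
import random

def make_universal_hash(m, p=2_147_483_647):
    """Draw h_{a,b}(x) = ((a*x + b) mod p) mod m uniformly from the
    universal family; p must be a prime exceeding every key."""
    a = random.randrange(1, p)   # a is nonzero
    b = random.randrange(0, p)
    return lambda x: ((a * x + b) % p) % m
```

Each call fixes one random function from the family; rehashing a table simply means drawing a fresh (a, b) pair.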
Bloom Filters
A Bloom filter is a space-efficient probabilistic data structure for approximate set membership. It uses a bit array A of m bits and k independent hash functions h_1, …, h_k. To insert element x, set A[h_i(x)] = 1 for all i. To query y, check whether A[h_i(y)] = 1 for all i; if any bit is 0, y is definitely not in the set.
False positives occur but false negatives do not. After inserting n elements, the false positive probability is approximately:

(1 − e^{−kn/m})^k.
This is minimized when k = (m/n) ln 2, giving a false positive rate of about (0.6185)^{m/n}. With 10 bits per element and k = 7 hash functions, the false positive rate is about 0.8%.
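A minimal Bloom filter sketch in Python. It derives the k hash functions by salting SHA-256, a common stand-in for k truly independent hashes; the class and method names are ours:

```python
import hashlib

class BloomFilter:
    """Approximate set membership: no false negatives,
    false positives with the probability derived above."""
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)   # m-bit array

    def _indices(self, item):
        # k hash values obtained by salting SHA-256 with the index i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for idx in self._indices(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item):
        return all((self.bits[idx // 8] >> (idx % 8)) & 1
                   for idx in self._indices(item))
```

Querying an inserted item always returns True; a query for an absent item returns True only with the small false positive probability given above.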
Bloom filters are ubiquitous in systems: web caches, database query optimization, network routing, and spell checkers all use them to avoid expensive lookups.
Randomized Rounding
Randomized rounding is a technique for converting fractional LP relaxation solutions into integer solutions with provable approximation guarantees.
Consider Set Cover: given the LP relaxation solution x*_S for each set S, include S in the integer solution independently with probability x*_S, and repeat for c ln n rounds for an appropriate constant c. The expected cost is O(log n) times the LP optimum, and a union bound argument shows that all elements are covered with high probability.
More precisely, for each element e, the probability that e is not covered in a single round is at most:

∏_{S ∋ e} (1 − x*_S) ≤ exp(−Σ_{S ∋ e} x*_S) ≤ 1/e,

since LP feasibility requires Σ_{S ∋ e} x*_S ≥ 1.
Choosing c large enough, so that the per-element failure probability after c ln n rounds is at most e^{−c ln n} = n^{−c}, and applying a union bound over all n elements gives the result. This technique connects the theory of LP relaxations to combinatorial optimization in a principled way.
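The rounding scheme can be sketched as follows, assuming the fractional LP solution x*_S is already given (computing it requires an LP solver, which we omit; names are ours):

```python
import math
import random

def round_set_cover(sets, x_lp, n_elements, c=3):
    """Randomized rounding for Set Cover: run c*ln(n) independent
    rounds, each including set S with probability x*_S, and take the
    union, so every element is covered with high probability."""
    rounds = max(1, int(c * math.log(max(n_elements, 2))))
    chosen = set()
    for _ in range(rounds):
        for s in range(len(sets)):
            if random.random() < x_lp[s]:
                chosen.add(s)            # include set s this round
    return chosen
```

The expected number of inclusions is (rounds) · Σ_S x*_S, matching the O(log n)-times-LP-optimum cost bound.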
Connection to Machine Learning
Randomization is fundamental to modern machine learning at every level.
Stochastic gradient descent. Rather than computing the full gradient over all N training examples, SGD samples a random mini-batch B and uses the average gradient over B, (1/|B|) Σ_{i ∈ B} ∇ℓ_i(θ), as an unbiased estimator. This reduces per-iteration cost from O(N) to O(|B|) while preserving convergence guarantees. The noise introduced by random sampling also serves as implicit regularization, helping escape sharp local minima.
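A single SGD step on a random mini-batch, as a sketch (the function name, and the toy quadratic loss used in the test, are our illustrative choices):

```python
import random

def sgd_step(theta, data, grad_fn, lr=0.1, batch_size=8):
    """One SGD step: average grad_fn over a random mini-batch,
    which is an unbiased estimate of the full-data gradient."""
    batch = random.sample(data, min(batch_size, len(data)))
    g = sum(grad_fn(theta, x) for x in batch) / len(batch)
    return theta - lr * g
```

For example, with grad_fn(theta, x) = 2·(theta − x), the gradient of the squared loss (theta − x)², repeated steps drive theta toward the mean of the data.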
Random feature maps. Rahimi and Recht (2007) showed that kernel methods can be approximated by random features: for a shift-invariant kernel k(x, y) = k(x − y), sample D random frequencies ω_1, …, ω_D from the kernel’s Fourier transform and approximate k(x, y) ≈ φ(x)ᵀφ(y), where φ maps inputs to D-dimensional random features. This converts a nonlinear kernel machine into a linear model on the D-dimensional features, reducing training from O(N²) or worse to time linear in N. This is a Monte Carlo approximation: accuracy improves with more random features.
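A pure-Python sketch for the Gaussian kernel k(x, y) = exp(−‖x − y‖²/2), whose Fourier transform is again Gaussian, so the frequencies are sampled from N(0, I) (function name is ours):

```python
import math
import random

def random_fourier_features(D, d):
    """phi(x)_i = sqrt(2/D) * cos(w_i . x + b_i) with w_i ~ N(0, I_d)
    and b_i ~ Uniform[0, 2*pi]; then phi(x) . phi(y) approximates
    the Gaussian kernel exp(-||x - y||^2 / 2)."""
    W = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(D)]
    b = [random.uniform(0.0, 2.0 * math.pi) for _ in range(D)]
    scale = math.sqrt(2.0 / D)

    def phi(x):
        return [scale * math.cos(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b_i)
                for w, b_i in zip(W, b)]

    return phi
```

The approximation error of the inner product decays as O(1/√D), the usual Monte Carlo rate.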
Dropout. During training, dropout randomly zeros out each neuron’s activation with probability p (typically p = 0.5 for hidden layers). This can be interpreted as training an exponential ensemble of 2^n sub-networks (where n is the number of neurons) simultaneously. At test time, all neurons are active with weights scaled by 1 − p. Dropout provides regularization: it prevents co-adaptation of neurons and approximates Bayesian model averaging.
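A sketch of a dropout layer in Python, matching the convention above (drop with probability p during training, scale by 1 − p at test time; inverted dropout, which instead scales by 1/(1 − p) during training, is the more common modern variant):

```python
import random

def dropout(activations, p=0.5, training=True):
    """Zero each activation independently with probability p during
    training; at test time keep all units and scale by (1 - p) so the
    expected pre-activation input matches training."""
    if not training:
        return [a * (1.0 - p) for a in activations]
    return [0.0 if random.random() < p else a for a in activations]
```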
Random forests. Breiman’s random forest (2001) combines two sources of randomness: bagging (training each tree on a bootstrap sample of the data) and random subspace selection (considering only a random subset of features at each split). These randomizations reduce correlation between trees, and the ensemble’s variance decreases roughly as σ²/T for T trees with average variance σ² (assuming low correlation). Random forests are a Las Vegas ensemble: each tree is deterministic given its random seed, and the forest’s prediction is always well-defined.
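A toy illustration of both randomizations, using depth-1 stumps in place of full trees (all names are ours; real implementations grow deep trees and choose split thresholds by impurity rather than the feature mean):

```python
import random
from collections import Counter

def train_stump(sample, n_features):
    """Fit a depth-1 stump on one bootstrap sample, splitting on a
    single randomly chosen feature at that feature's mean value
    (the 'random subspace' step, reduced to one feature)."""
    f = random.randrange(n_features)
    thresh = sum(x[f] for x, _ in sample) / len(sample)
    majority = lambda ys: Counter(ys).most_common(1)[0][0] if ys else 0
    y_left = majority([y for x, y in sample if x[f] <= thresh])
    y_right = majority([y for x, y in sample if x[f] > thresh])
    return lambda x: y_left if x[f] <= thresh else y_right

def random_forest(data, n_features, n_trees=101):
    """Bagging + random subspace: each stump trains on a bootstrap
    resample; the ensemble predicts by majority vote."""
    stumps = [train_stump([random.choice(data) for _ in data], n_features)
              for _ in range(n_trees)]
    return lambda x: Counter(s(x) for s in stumps).most_common(1)[0][0]
```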
This concludes the series on Data Structures and Algorithms.