# 8: Likelihood Ratio Tests
The likelihood ratio test (LRT) is a general-purpose method for comparing nested statistical models. It provides an optimal test for simple hypotheses (Neyman-Pearson lemma) and extends to composite hypotheses via Wilks’ theorem. LRTs are the foundation for model comparison in both classical statistics and modern ML.
## Simple Hypotheses: Neyman-Pearson Lemma
Consider testing:

- $H_0: \theta = \theta_0$
- $H_1: \theta = \theta_1$

where both $\theta_0$ and $\theta_1$ are completely specified (simple hypotheses).

**Neyman-Pearson Lemma.** The most powerful test at significance level $\alpha$ rejects $H_0$ when:

$$\Lambda(x) = \frac{L(\theta_1; x)}{L(\theta_0; x)} > k$$

where the threshold $k$ is chosen such that $P(\Lambda(X) > k \mid H_0) = \alpha$.
“Most powerful” means no other test at the same significance level has higher power (probability of rejecting $H_0$ when $H_1$ is true). The Neyman-Pearson lemma guarantees that the likelihood ratio is the optimal test statistic for simple-vs-simple hypothesis testing.
**Intuition.** The likelihood ratio measures how much more likely the data are under $H_1$ than under $H_0$. Large values indicate the data strongly favor $H_1$.
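As a concrete check, the ratio can be computed directly for two simple normal hypotheses; the data values below are illustrative:

```python
import math

def normal_loglik(x, mu, sigma):
    """Log-likelihood of the sample x under N(mu, sigma^2)."""
    n = len(x)
    return (-0.5 * n * math.log(2 * math.pi * sigma**2)
            - sum((xi - mu) ** 2 for xi in x) / (2 * sigma**2))

# Simple vs simple: H0: mu = 0 against H1: mu = 1, with sigma = 1 known.
x = [0.9, 1.1, 0.8, 1.3, 0.7]            # illustrative data drawn near mu = 1
log_ratio = normal_loglik(x, 1.0, 1.0) - normal_loglik(x, 0.0, 1.0)
Lambda = math.exp(log_ratio)             # L(theta_1; x) / L(theta_0; x)
# Lambda is well above 1 here: the data favor H1
```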
## Generalized Likelihood Ratio Test (GLRT)
For composite hypotheses (parameters not fully specified), we replace the simple likelihoods with maximized likelihoods:

$$\lambda(x) = \frac{\sup_{\theta \in \Theta_0} L(\theta; x)}{\sup_{\theta \in \Theta} L(\theta; x)} = \frac{L(\hat{\theta}_0; x)}{L(\hat{\theta}; x)}$$

where:

- $\hat{\theta}_0$ is the MLE under the null (restricted parameter space $\Theta_0$)
- $\hat{\theta}$ is the unrestricted MLE (full parameter space $\Theta$)
Since $\Theta_0 \subseteq \Theta$, the unrestricted MLE always achieves at least as high a likelihood, so $0 \le \lambda(x) \le 1$. Values of $\lambda$ near 0 indicate the null is a poor fit compared to the unrestricted model.
Reject $H_0$ when $\lambda(x) \le c$, or equivalently when $-2\log\lambda(x) \ge c'$.
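A minimal sketch of the GLRT for a normal mean with known variance (illustrative data; the restricted fit pins $\mu$ at $\mu_0$, the unrestricted fit uses the sample mean):

```python
import math

def normal_loglik(x, mu, sigma):
    """Log-likelihood of the sample x under N(mu, sigma^2)."""
    n = len(x)
    return (-0.5 * n * math.log(2 * math.pi * sigma**2)
            - sum((xi - mu) ** 2 for xi in x) / (2 * sigma**2))

x = [2.1, 1.7, 2.4, 1.9, 2.3]            # illustrative data
sigma = 1.0                               # known
mu0 = 1.5                                 # H0: mu = mu0 (restricted space)
mu_hat = sum(x) / len(x)                  # unrestricted MLE: the sample mean

log_lambda = normal_loglik(x, mu0, sigma) - normal_loglik(x, mu_hat, sigma)
lam = math.exp(log_lambda)                # lambda always lands in (0, 1]
stat = -2.0 * log_lambda                  # the statistic Wilks' theorem applies to
```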
## Wilks’ Theorem
**Theorem (Wilks, 1938).** Under $H_0$ and regularity conditions, as $n \to \infty$:

$$-2\log\lambda(X) \xrightarrow{d} \chi^2_d$$

where $d$ is the number of parameters constrained by $H_0$.

This is powerful: regardless of the specific distributions involved, the test statistic follows a chi-squared distribution with degrees of freedom equal to the difference in dimensionality between the full and null models. The p-value is:

$$p = P\left(\chi^2_d \ge -2\log\lambda_{\text{obs}}\right)$$
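For $d = 1$ the chi-squared survival function reduces to the complementary error function, so the p-value can be computed with the standard library alone (the statistic value below is illustrative):

```python
import math

def chi2_sf_1df(stat):
    """P(chi^2_1 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))

stat = 5.0                      # an illustrative observed -2 log lambda, d = 1
p_value = chi2_sf_1df(stat)     # roughly 0.025: reject at the 5% level
```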
## Examples
### Testing a Normal Mean
$H_0: \mu = \mu_0$ vs $H_1: \mu \ne \mu_0$ with known $\sigma^2$.

- Restricted MLE: $\mu = \mu_0$ (fixed)
- Unrestricted MLE: $\hat{\mu} = \bar{x}$

The statistic simplifies to:

$$-2\log\lambda = \frac{n(\bar{x} - \mu_0)^2}{\sigma^2} = z^2 \sim \chi^2_1$$

This recovers the z-test.
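The equivalence can be verified numerically (illustrative data; `neg2_log_lambda` is a helper defined here, not a library function):

```python
import math

def neg2_log_lambda(x, mu0, sigma):
    """GLRT statistic for H0: mu = mu0 in a normal model with known sigma."""
    n = len(x)
    xbar = sum(x) / n
    # Closed form after plugging in the two MLEs: n (xbar - mu0)^2 / sigma^2
    return n * (xbar - mu0) ** 2 / sigma**2

x = [5.2, 4.8, 5.5, 5.1, 4.9, 5.3]       # illustrative sample
stat = neg2_log_lambda(x, mu0=5.0, sigma=0.5)
z = (sum(x) / len(x) - 5.0) / (0.5 / math.sqrt(len(x)))
# -2 log lambda coincides exactly with the squared z statistic
assert math.isclose(stat, z**2)
```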
### Comparing Nested Regression Models
- Model 1 (null): $y = X_1\beta_1 + \varepsilon$ ($p_1$ parameters)
- Model 2 (full): $y = X_1\beta_1 + X_2\beta_2 + \varepsilon$ ($p_2$ parameters)

The LRT statistic satisfies $-2\log\lambda \sim \chi^2_{p_2 - p_1}$ under $H_0$. If $-2\log\lambda$ falls below the $\chi^2_{p_2 - p_1}$ critical value, the additional features do not significantly improve the model.
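A sketch comparing an intercept-only model against a simple linear regression. With Gaussian errors and $\sigma^2$ profiled out, the statistic reduces to $n\log(\text{RSS}_0/\text{RSS}_1)$; the data are illustrative:

```python
import math

def rss_intercept_only(y):
    """Residual sum of squares for the null model y = beta_0 + eps."""
    ybar = sum(y) / len(y)
    return sum((yi - ybar) ** 2 for yi in y)

def rss_simple_ols(x, y):
    """Residual sum of squares for the full model y = beta_0 + beta_1 x + eps."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    return sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))

x = [1, 2, 3, 4, 5, 6]                   # illustrative data
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]

rss0 = rss_intercept_only(y)
rss1 = rss_simple_ols(x, y)
# Gaussian errors, sigma^2 profiled out: -2 log lambda = n log(RSS_0 / RSS_1)
stat = len(y) * math.log(rss0 / rss1)    # ~ chi^2_1 under H0 (one constraint)
```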
### Testing Independence in Contingency Tables
$H_0$: two categorical variables are independent. The LRT statistic:

$$G^2 = 2\sum_{i,j} O_{ij} \log\frac{O_{ij}}{E_{ij}}$$

where $O_{ij}$ are observed counts and $E_{ij} = R_i C_j / n$ are expected counts under independence. Under $H_0$, $G^2 \sim \chi^2_{(r-1)(c-1)}$.
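The $G^2$ statistic for a small illustrative 2×2 table:

```python
import math

# Observed 2x2 contingency table (illustrative counts)
O = [[30, 10],
     [20, 40]]

row = [sum(r) for r in O]                # row totals R_i
col = [sum(c) for c in zip(*O)]          # column totals C_j
n = sum(row)

# Expected counts under independence: E_ij = R_i * C_j / n
E = [[row[i] * col[j] / n for j in range(2)] for i in range(2)]

G2 = 2 * sum(O[i][j] * math.log(O[i][j] / E[i][j])
             for i in range(2) for j in range(2))
# Compare G2 against chi^2 with (r-1)(c-1) = 1 degree of freedom
```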
## Connection to Information Criteria
The LRT compares two specific models. Information criteria extend this to model selection among non-nested models:
**AIC (Akaike Information Criterion):** $\text{AIC} = -2\log L(\hat{\theta}) + 2k$

**BIC (Bayesian Information Criterion):** $\text{BIC} = -2\log L(\hat{\theta}) + k\log n$

where $k$ is the number of parameters and $n$ is the sample size. Both penalize model complexity: AIC asymptotically selects the model that minimizes prediction error (KL divergence), while BIC selects the true model (if it is among the candidates) with probability approaching 1.
For nested models, the difference in AIC is related to the LRT: $\text{AIC}_{\text{null}} - \text{AIC}_{\text{full}} = -2\log\lambda - 2\,\Delta k$, where $\Delta k$ is the number of extra parameters in the full model. AIC adds a complexity penalty of 2 per parameter; the LRT adds none.
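The identity can be checked with hypothetical fitted log-likelihoods (the numbers below are illustrative, not from any real fit):

```python
import math

def aic(loglik, k):
    return -2 * loglik + 2 * k

def bic(loglik, k, n):
    return -2 * loglik + k * math.log(n)

# Hypothetical maximized log-likelihoods for two nested models
n = 100
ll_null, k_null = -210.0, 3
ll_full, k_full = -205.0, 5

lrt_stat = -2 * (ll_null - ll_full)      # -2 log lambda
delta_aic = aic(ll_null, k_null) - aic(ll_full, k_full)
# AIC difference = LRT statistic minus 2 per extra parameter
assert math.isclose(delta_aic, lrt_stat - 2 * (k_full - k_null))
```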
## Application to ML Model Comparison
The likelihood ratio framework extends to comparing ML models:
**Deviance.** For generalized linear models, the deviance is a likelihood ratio statistic comparing the fitted model to a saturated model (one parameter per observation). The difference in deviance between two nested models follows $\chi^2_d$, where $d$ is the number of additional parameters in the larger model.
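A minimal sketch for the Poisson case, where the deviance has a closed form; the counts are illustrative and assumed strictly positive so every saturated-model term is finite, and the intercept-only fit uses $\hat{\mu}_i = \bar{y}$:

```python
import math

def poisson_deviance(y, mu):
    """Poisson deviance: 2 * [loglik(saturated) - loglik(fitted model)]."""
    return 2 * sum(yi * math.log(yi / mi) - (yi - mi)
                   for yi, mi in zip(y, mu))

y = [3, 5, 2, 6, 4]                      # illustrative positive counts
mu_hat = [sum(y) / len(y)] * len(y)      # intercept-only fit: mu_i = ybar
D = poisson_deviance(y, mu_hat)          # nonnegative; 0 only for a perfect fit
```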
The tracker cost model uses a different evaluation paradigm: bootstrap confidence intervals on MAE rather than an LRT. This is because the comparison is between a lookup table (not a parametric model) and XGBoost, which are not nested. The bootstrap CIs ([3,314, 3,627] vs [3,623, 3,984]) provide the same kind of rigorous comparison without requiring nested model structure.
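A percentile-bootstrap CI on MAE can be sketched with the standard library alone (the error values, resample count, and helper name here are illustrative, not the tracker model's actual pipeline):

```python
import random
import statistics

def bootstrap_mae_ci(abs_errors, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean absolute error (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(abs_errors)
    maes = sorted(
        statistics.mean(rng.choices(abs_errors, k=n)) for _ in range(n_boot)
    )
    lo = maes[int((alpha / 2) * n_boot)]
    hi = maes[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Illustrative absolute prediction errors for one model
abs_errors = [abs(e) for e in [120, -80, 310, -45, 200, -150, 90, 60]]
lo, hi = bootstrap_mae_ci(abs_errors)
```

Two models are then compared by checking whether their MAE intervals overlap, exactly as with the lookup-table-vs-XGBoost comparison above.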
**Cross-validation vs LRT.** LRT requires nested models and correct specification. Cross-validation works for any two models (nested or not, parametric or not) and directly estimates prediction performance. For this reason, cross-validation is the standard comparison method in ML, while LRT is standard in parametric statistics.
## Summary
| Concept | Key Result |
|---|---|
| Neyman-Pearson | Likelihood ratio is the most powerful test for simple hypotheses |
| GLRT | $\lambda = L(\hat{\theta}_0)/L(\hat{\theta})$; reject when $\lambda$ is small |
| Wilks’ theorem | $-2\log\lambda \xrightarrow{d} \chi^2_d$ asymptotically |
| Degrees of freedom | $d$ = number of parameters constrained by $H_0$ |
| AIC/BIC | LRT + complexity penalty for non-nested model selection |
The likelihood ratio provides a principled framework for deciding whether a more complex model is justified by the data. Wilks’ theorem gives a universal asymptotic distribution, making LRTs applicable across a wide range of parametric models.