4: Method of Moments
The method of moments (MoM) is the oldest systematic approach to parameter estimation. The idea is simple: equate population moments to sample moments and solve for the parameters. It produces consistent estimators with minimal computational effort, though it generally sacrifices efficiency compared to maximum likelihood. For models where the MLE is difficult to compute, MoM estimators serve as useful starting points or standalone alternatives.
Population and Sample Moments
The $k$-th population moment of a random variable $X$ is:

$$\mu_k = \mathbb{E}[X^k]$$

The $k$-th central moment is $\mathbb{E}[(X - \mu_1)^k]$. The first moment $\mu_1$ is the mean; the second central moment is the variance.
The corresponding $k$-th sample moment from observations $X_1, \dots, X_n$ is:

$$\hat\mu_k = \frac{1}{n} \sum_{i=1}^{n} X_i^k$$
By the law of large numbers, $\hat\mu_k \xrightarrow{p} \mu_k$ for each $k$ (assuming the moment exists). This convergence is the foundation of the method.
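The convergence above is easy to check numerically. A minimal sketch (the helper name `sample_moment` is ours, not from the text), using a $N(2, 1)$ sample so that $\mathbb{E}[X] = 2$ and $\mathbb{E}[X^2] = \mu^2 + \sigma^2 = 5$:

```python
import numpy as np

def sample_moment(x, k):
    """k-th raw sample moment: (1/n) * sum of x_i^k."""
    return np.mean(np.asarray(x, dtype=float) ** k)

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=100_000)

# By the LLN these approach E[X] = 2 and E[X^2] = 5.
m1 = sample_moment(x, 1)
m2 = sample_moment(x, 2)
print(m1, m2)
```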
The Method
Suppose the distribution of $X$ depends on parameters $\theta = (\theta_1, \dots, \theta_d)$. The first $d$ population moments are functions of these parameters:

$$\mu_k = g_k(\theta), \qquad k = 1, \dots, d$$
The method of moments estimator $\hat\theta$ solves the system of $d$ equations:

$$\hat\mu_k = g_k(\hat\theta), \qquad k = 1, \dots, d$$
That is, we set each population moment equal to its sample counterpart and solve for $\hat\theta$.
Algorithm:
- Express the first $d$ population moments as functions of $\theta$: $\mu_k = g_k(\theta)$
- Replace each population moment $\mu_k$ with the sample moment $\hat\mu_k$
- Solve the resulting system of $d$ equations in $d$ unknowns
Examples
Normal distribution. For $X \sim N(\mu, \sigma^2)$, we have two parameters and need two moments:

$$\mu_1 = \mathbb{E}[X] = \mu, \qquad \mu_2 = \mathbb{E}[X^2] = \mu^2 + \sigma^2$$

Setting $\mu_1 = \hat\mu_1$ and $\mu_2 = \hat\mu_2$:

$$\hat\mu = \bar{X}, \qquad \hat\sigma^2 = \hat\mu_2 - \hat\mu_1^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2$$

Note this gives the biased variance estimator (dividing by $n$, not $n-1$). The MoM and MLE coincide here.
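The Normal case can be sketched in a few lines (the function name `mom_normal` is ours for illustration):

```python
import numpy as np

def mom_normal(x):
    """MoM estimates for N(mu, sigma^2): sample mean and the
    second central sample moment."""
    x = np.asarray(x, dtype=float)
    mu_hat = x.mean()
    sigma2_hat = np.mean((x - mu_hat) ** 2)  # divides by n, not n - 1
    return mu_hat, sigma2_hat

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=100_000)
mu_hat, sigma2_hat = mom_normal(x)
print(mu_hat, sigma2_hat)  # close to 5 and 4
```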
Gamma distribution. For $X \sim \text{Gamma}(\alpha, \beta)$ with shape $\alpha$ and rate $\beta$:

$$\mathbb{E}[X] = \frac{\alpha}{\beta}, \qquad \text{Var}(X) = \frac{\alpha}{\beta^2}$$

Solving: $\hat\alpha = \bar{X}^2 / \hat\sigma^2$ and $\hat\beta = \bar{X} / \hat\sigma^2$, where $\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2$.
The Gamma MLE has no closed form and requires numerical optimization, so MoM provides a convenient analytical alternative.
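A minimal sketch of the Gamma MoM estimator (the helper name `mom_gamma` is ours; the formulas assume the shape/rate parametrization above, while `numpy` draws with shape/scale, so rate $\beta = 1/\text{scale}$):

```python
import numpy as np

def mom_gamma(x):
    """MoM for Gamma(alpha, beta) with rate beta:
    E[X] = alpha/beta, Var[X] = alpha/beta^2."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    s2 = np.mean((x - m) ** 2)
    return m ** 2 / s2, m / s2  # (alpha_hat, beta_hat)

rng = np.random.default_rng(1)
# shape alpha = 3, scale = 1/2, i.e. rate beta = 2
x = rng.gamma(shape=3.0, scale=0.5, size=200_000)
alpha_hat, beta_hat = mom_gamma(x)
print(alpha_hat, beta_hat)  # close to 3 and 2
```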
Uniform distribution. For $X \sim \text{Uniform}(a, b)$, with $\mathbb{E}[X] = \frac{a+b}{2}$ and $\text{Var}(X) = \frac{(b-a)^2}{12}$:

$$\hat{a} = \bar{X} - \sqrt{3}\,\hat\sigma, \qquad \hat{b} = \bar{X} + \sqrt{3}\,\hat\sigma$$

A known deficiency: $\hat{a}$ can exceed $\min_i X_i$ and $\hat{b}$ can fall below $\max_i X_i$, producing estimates inconsistent with the observed data. The MLE ($\hat{a} = \min_i X_i$, $\hat{b} = \max_i X_i$) avoids this problem.
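The deficiency is easy to trigger with a skewed sample. A sketch (the helper name `mom_uniform` and the five data points are ours for illustration):

```python
import numpy as np

def mom_uniform(x):
    """MoM for Uniform(a, b): a_hat = xbar - sqrt(3)*s, b_hat = xbar + sqrt(3)*s."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    s = np.sqrt(np.mean((x - m) ** 2))
    return m - np.sqrt(3) * s, m + np.sqrt(3) * s

# Most mass near 1, one point near 0: the MoM lower endpoint
# lands above the smallest observation.
x = np.array([0.01, 0.9, 0.92, 0.95, 0.99])
a_hat, b_hat = mom_uniform(x)
print(a_hat, x.min())  # a_hat exceeds min(x) = 0.01
```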
Beta distribution. For $X \sim \text{Beta}(\alpha, \beta)$, with $\mathbb{E}[X] = \frac{\alpha}{\alpha+\beta}$ and $\text{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}$, solving the two moment equations gives

$$\hat\alpha = \bar{X}\left(\frac{\bar{X}(1-\bar{X})}{\hat\sigma^2} - 1\right), \qquad \hat\beta = (1-\bar{X})\left(\frac{\bar{X}(1-\bar{X})}{\hat\sigma^2} - 1\right)$$
Again, the MLE requires iterative methods, while MoM gives a closed-form initializer.
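A sketch of the Beta initializer (the helper name `mom_beta` is ours; the factor $\bar{X}(1-\bar{X})/\hat\sigma^2 - 1$ estimates $\alpha + \beta$):

```python
import numpy as np

def mom_beta(x):
    """MoM for Beta(alpha, beta) from the mean and variance formulas."""
    x = np.asarray(x, dtype=float)
    m = x.mean()
    v = np.mean((x - m) ** 2)
    t = m * (1 - m) / v - 1  # estimate of alpha + beta
    return m * t, (1 - m) * t

rng = np.random.default_rng(2)
x = rng.beta(2.0, 5.0, size=200_000)
alpha_hat, beta_hat = mom_beta(x)
print(alpha_hat, beta_hat)  # close to 2 and 5
```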
Properties
Consistency. MoM estimators are consistent under mild conditions. Since $\hat\mu_k \xrightarrow{p} \mu_k$ by the LLN, and the mapping from moments to parameters is continuous, the continuous mapping theorem gives $\hat\theta \xrightarrow{p} \theta$.
Asymptotic normality. By the CLT and the delta method, MoM estimators are asymptotically normal:

$$\sqrt{n}\,(\hat\theta - \theta) \xrightarrow{d} N(0, \Sigma)$$

where $\Sigma$ depends on the moments of $X$ up to order $2d$ and the Jacobian of the moment-to-parameter mapping.
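A simulation sketch of the delta-method variance, using an example not from the text: for $X \sim \text{Exponential}(\lambda)$ the MoM estimate is $\hat\lambda = 1/\bar{X}$, and with $g(m) = 1/m$, $g'(1/\lambda) = -\lambda^2$, $\text{Var}(X) = 1/\lambda^2$, the asymptotic variance is $\lambda^4 \cdot \lambda^{-2} = \lambda^2$:

```python
import numpy as np

rng = np.random.default_rng(4)
lam, n, reps = 2.0, 1_000, 4_000

# MoM for Exponential(lambda): E[X] = 1/lambda, so lambda_hat = 1/xbar.
xbar = rng.exponential(scale=1 / lam, size=(reps, n)).mean(axis=1)
z = np.sqrt(n) * (1.0 / xbar - lam)

# Delta method predicts z is approximately N(0, lam^2).
print(z.mean(), z.std())  # mean near 0, std near lam = 2
```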
Not generally efficient. The asymptotic variance is typically larger than the Cramér-Rao lower bound. MoM estimators use only the first $d$ moments, discarding information present in the full likelihood. The efficiency loss can be substantial.
MoM vs. MLE
| | Method of Moments | Maximum Likelihood |
|---|---|---|
| Computation | Solve moment equations (often closed-form) | Optimize likelihood (often iterative) |
| Efficiency | Generally inefficient | Asymptotically efficient |
| Robustness | Less sensitive to model misspecification | Can be sensitive to distributional assumptions |
| Existence | Always exists if moments exist | May not exist or may not be unique |
| Invariance | Not invariant to reparametrization | Invariant: MLE of $g(\theta)$ is $g(\hat\theta)$ |
For exponential families, the MLE equates the sample average of the sufficient statistic to its expectation under the model. When the sufficient statistics are themselves polynomial moments, as for the Normal, this is exactly the MoM system, so the two estimators coincide. When they are not (e.g., the Gamma, whose sufficient statistics include $\log X$), or outside the exponential family, the two methods diverge.
In practice, MoM is most useful when:
- The MLE has no closed form (Gamma, Beta, mixture models)
- A quick initial estimate is needed for iterative MLE algorithms
- Robustness to misspecification is more important than efficiency
Generalized Method of Moments
The generalized method of moments (GMM) extends MoM to settings with more moment conditions than parameters. Suppose we have $m > d$ moment conditions:

$$\mathbb{E}[g(X, \theta)] = 0, \qquad g(X, \theta) \in \mathbb{R}^m$$

With more equations than unknowns, an exact solution generally does not exist. GMM instead minimizes a quadratic form:

$$\hat\theta = \arg\min_\theta \; \bar{g}(\theta)^\top W\, \bar{g}(\theta), \qquad \bar{g}(\theta) = \frac{1}{n}\sum_{i=1}^n g(X_i, \theta)$$
where $W$ is a positive definite weighting matrix. The choice $W = \hat{S}^{-1}$ (the inverse of the estimated covariance of the moment conditions) yields the efficient GMM estimator, which achieves the smallest asymptotic variance among all GMM estimators.
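A minimal GMM sketch under illustrative assumptions: data from $N(\theta, 1)$ with known unit variance gives two moment conditions, $\mathbb{E}[X - \theta] = 0$ and $\mathbb{E}[X^2 - \theta^2 - 1] = 0$, for one parameter, and we use the identity weighting matrix rather than the efficient $\hat{S}^{-1}$:

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
x = rng.normal(loc=1.5, scale=1.0, size=20_000)  # true theta = 1.5

def gbar(theta):
    """Averaged moment conditions g_bar(theta) in R^2."""
    return np.array([np.mean(x - theta),
                     np.mean(x**2 - theta**2 - 1.0)])

W = np.eye(2)  # identity weighting; efficient GMM would use S_hat^{-1}

def Q(theta):
    g = gbar(theta)
    return g @ W @ g

theta_hat = minimize_scalar(Q, bounds=(-10, 10), method="bounded").x
print(theta_hat)  # close to 1.5
```

With two conditions and one unknown, neither condition is solved exactly; the quadratic form trades them off according to $W$.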
GMM is foundational in econometrics, where moment conditions arise naturally from economic theory (e.g., Euler equations, instrumental variables). The instrumental variables (IV) estimator is a special case of GMM.
Connection to ML. The idea of matching empirical expectations to model expectations appears throughout machine learning. Contrastive divergence training in restricted Boltzmann machines matches the data’s expected sufficient statistics to the model’s. Moment matching is also central to generative adversarial networks (the discriminator implicitly enforces moment conditions) and to kernel methods through maximum mean discrepancy (MMD), which compares all moments simultaneously in a reproducing kernel Hilbert space.
Summary
| Concept | Key Result |
|---|---|
| MoM estimator | Solve $\hat\mu_k = g_k(\hat\theta)$, $k = 1, \dots, d$, for $\hat\theta$ |
| Consistency | Follows from LLN + continuous mapping theorem |
| Efficiency | Generally less efficient than MLE |
| Best use case | Closed-form estimates when MLE is intractable |
| GMM | Handles overidentified models with $m > d$ moment conditions |