15: PCA in Finance
Principal Component Analysis is the SVD applied to data. In finance, it answers a fundamental question: what are the hidden factors driving asset returns? The math reveals that a few eigenvectors often explain most of the variance in hundreds of correlated assets.
The Setup
You have a returns matrix $R$ of shape $T \times N$:
- $T$ time periods (rows)
- $N$ assets (columns)
- Each entry $R_{ti}$ is the return of asset $i$ at time $t$
Assets are correlated. The goal: find uncorrelated factors that explain the variance.
PCA Algorithm
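- Center the data: Subtract the mean return from each column to get the centered matrix $\tilde{R}$
- Compute the covariance matrix: $\Sigma = \frac{1}{T-1} \tilde{R}^\top \tilde{R}$ (with $T$ the number of time periods)
- Eigendecompose: $\Sigma = V \Lambda V^\top$
  - Columns of $V$ are principal components (eigenvectors)
  - Diagonal of $\Lambda$ holds the eigenvalues (variance explained)
- Project data: $F = \tilde{R} V$ gives the factor scores, the time series of each component
Alternatively, use SVD directly: $\tilde{R} = U S V^\top$. The right singular vectors (columns of $V$) are the principal components, and the eigenvalues are the squared singular values divided by $T - 1$.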
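The steps above can be sketched in NumPy. This is a minimal illustration on synthetic returns, not a production implementation; the data-generating setup (one common "market" shock plus noise) and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic returns (assumption): T = 500 periods, N = 4 assets driven by one
# common "market" shock plus asset-specific noise.
T, N = 500, 4
market = rng.normal(0.0, 0.01, size=(T, 1))
R = market + rng.normal(0.0, 0.005, size=(T, N))

# Step 1: center each column.
R_centered = R - R.mean(axis=0)

# Step 2: sample covariance matrix (N x N).
Sigma = R_centered.T @ R_centered / (T - 1)

# Step 3: eigendecompose. eigh returns eigenvalues in ascending order,
# so reorder to put the largest-variance component first.
eigvals, V = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# Step 4: project the centered data onto the principal components.
F = R_centered @ V

# SVD route: singular values s relate to the eigenvalues by s**2 / (T - 1).
U, s, Vt = np.linalg.svd(R_centered, full_matrices=False)
eigvals_svd = s**2 / (T - 1)
```

Both routes give the same eigenvalues. The SVD route avoids forming the covariance matrix explicitly, which tends to be better conditioned numerically.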
Interpretation in Finance
Principal Components as Factors
Each principal component is a portfolio—a linear combination of assets. The first few PCs often have clear interpretations:
Equity markets (e.g., S&P 500 stocks):
- PC1: Market factor (all stocks move together)—explains ~50-70% of variance
- PC2: Often size or value (small vs. large, growth vs. value)
- PC3+: Sector or style factors
Fixed income (yield curve):
- PC1: Level (parallel shifts)—explains ~90% of variance
- PC2: Slope (steepening/flattening)
- PC3: Curvature (butterfly)
Variance Explained
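The eigenvalue $\lambda_k$ is the variance of the $k$-th principal component. The fraction of variance explained:
$$\frac{\lambda_k}{\sum_{j=1}^{N} \lambda_j}$$
The cumulative variance explained tells you how many PCs you need:
$$\frac{\sum_{j=1}^{k} \lambda_j}{\sum_{j=1}^{N} \lambda_j}$$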
If the first 3 PCs explain 80% of variance, the remaining dimensions together account for only 20% and are often treated as noise.
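As a small worked example, the two ratios can be computed directly from a vector of eigenvalues. The eigenvalues and the 80% target below are hypothetical numbers chosen for illustration.

```python
import numpy as np

# Hypothetical eigenvalues of a covariance matrix, sorted in descending order.
eigvals = np.array([6.0, 2.5, 1.0, 0.3, 0.2])

explained = eigvals / eigvals.sum()   # fraction of variance explained by each PC
cumulative = np.cumsum(explained)     # cumulative fraction

# Smallest number of PCs whose cumulative share reaches the target.
target = 0.80
k = int(np.argmax(cumulative >= target)) + 1   # k is 2 for these values
```

Here the first PC alone explains 60% and the first two explain 85%, so two components are enough to clear the 80% target.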
Dimensionality Reduction
Factor Models
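Instead of modeling $N$ correlated returns, model $k$ uncorrelated factors:
$$r_t = B f_t + \epsilon_t$$
where:
- $f_t$ is the $k$-dimensional factor return vector (the PCs)
- $B$ is the $N \times k$ factor loading matrix (the first $k$ columns of the eigenvector matrix $V$)
- $\epsilon_t$ is idiosyncratic noise
This is the statistical factor model. It reduces an $N \times N$ covariance matrix to:
$$\Sigma \approx B \Lambda_k B^\top + D$$
where $\Lambda_k$ is the diagonal matrix of the $k$ largest eigenvalues (the factor variances) and $D$ is diagonal (idiosyncratic variances).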
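A minimal sketch of building the factor-model covariance from a PCA, assuming synthetic returns driven by two common factors. The choice of estimating the idiosyncratic variances as the leftover diagonal is one simple option among several, and the names `B`, `Lambda_k`, and `D` are labels for this example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic returns (assumption): N = 6 assets driven by k = 2 common factors.
T, N, k = 1000, 6, 2
true_loadings = rng.normal(size=(N, k))
factors = rng.normal(0.0, 0.01, size=(T, k))
R = factors @ true_loadings.T + rng.normal(0.0, 0.002, size=(T, N))

Sigma = np.cov(R, rowvar=False)   # np.cov centers the columns itself

eigvals, V = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

B = V[:, :k]                      # N x k loading matrix
Lambda_k = np.diag(eigvals[:k])   # factor variances
common = B @ Lambda_k @ B.T       # covariance explained by the k factors

# Idiosyncratic variances: the variance the factors leave on the diagonal.
D = np.diag(np.diag(Sigma - common))
Sigma_model = common + D

# Parameter counts: full covariance vs. loadings plus factor and
# idiosyncratic variances.
n_full = N * (N + 1) // 2
n_factor = N * k + k + N
```

With only six assets the parameter saving is negligible (21 vs. 20), but the full count grows quadratically in the number of assets while the factor count grows linearly.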
Benefits
- Fewer parameters: Estimating $N(N+1)/2$ covariances vs. $Nk$ factor loadings (for $N = 500$ and $k = 5$, that is 125,250 vs. 2,500)
- More stable: Less overfitting to noise
- Interpretable: Factors often have economic meaning
Example: Yield Curve PCA
Consider monthly changes in Treasury yields at maturities: 3M, 6M, 1Y, 2Y, 3Y, 5Y, 7Y, 10Y, 20Y, 30Y.
Typical results:
| PC | Variance Explained | Interpretation |
|---|---|---|
| 1 | ~90% | Level (all yields move together) |
| 2 | ~8% | Slope (short vs. long end) |
| 3 | ~2% | Curvature (belly vs. wings) |
PC1 loadings: All positive, roughly equal—a parallel shift.
PC2 loadings: Negative for short maturities, positive for long—a steepening/flattening.
PC3 loadings: Positive at short and long ends, negative in the middle—a butterfly.
Three numbers describe 99%+ of the variance in yield curve movements!
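To see how such a pattern emerges, here is a sketch that simulates yield changes from assumed level, slope, and curvature shapes and then runs PCA on them. Everything here is synthetic: the loading shapes, the factor volatilities, and the noise level are illustrative assumptions, not market estimates, so the resulting percentages will not match the table exactly.

```python
import numpy as np

rng = np.random.default_rng(42)

# Maturities from the example, in years.
maturities = np.array([0.25, 0.5, 1, 2, 3, 5, 7, 10, 20, 30])
x = np.log(maturities)
x = (x - x.mean()) / np.abs(x - x.mean()).max()   # scaled to lie in [-1, 1]

# Assumed loading shapes (illustrative, not estimated from market data).
level = np.ones_like(x)
slope = x
curvature = x**2 - np.mean(x**2)
shapes = np.column_stack([level, slope, curvature])   # 10 x 3

# Assumed monthly factor volatilities in basis points, plus small noise.
T = 600
factor_vols = np.array([10.0, 3.0, 1.5])
shocks = rng.normal(size=(T, 3)) * factor_vols
dY = shocks @ shapes.T + rng.normal(0.0, 0.3, size=(T, len(maturities)))

# PCA on the covariance of yield changes.
eigvals, V = np.linalg.eigh(np.cov(dY, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]
explained = eigvals / eigvals.sum()

# Fix the sign so PC1 reads as "all yields up".
pc1 = V[:, 0] * np.sign(V[:, 0].sum())
```

In this simulation the first component dominates and its loadings all share one sign, which is the "parallel shift" pattern described above. Running the same PCA on real Treasury data would require a yield history, which is not included here.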
Risk Management Applications
Factor Risk Decomposition
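Portfolio variance decomposes by factor:
$$\sigma_p^2 = w^\top \Sigma w = \sum_{k=1}^{N} \lambda_k (w^\top v_k)^2$$
The term $\lambda_k (w^\top v_k)^2$ is the risk contribution from factor $k$, where $w$ is the vector of portfolio weights and $v_k$ is the $k$-th eigenvector.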
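A short sketch of this decomposition, using a hypothetical 3-asset covariance matrix and hypothetical portfolio weights. The per-factor contributions are each eigenvalue times the squared portfolio exposure to that eigenvector, and they should add up to the total portfolio variance.

```python
import numpy as np

# Hypothetical 3-asset covariance matrix and portfolio weights.
Sigma = np.array([
    [0.040, 0.018, 0.012],
    [0.018, 0.090, 0.015],
    [0.012, 0.015, 0.060],
])
w = np.array([0.5, 0.3, 0.2])

eigvals, V = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

exposures = V.T @ w                      # portfolio exposure to each factor
contributions = eigvals * exposures**2   # variance contributed by each factor
total_variance = w @ Sigma @ w

shares = contributions / total_variance  # fraction of risk from each factor
```

Because the eigenvectors are orthonormal, there are no cross terms: the contributions are non-negative and sum exactly to `total_variance`.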
Stress Testing
To stress test against a factor shock:
- Identify the relevant PC (e.g., PC1 for market crash)
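- Compute portfolio exposure: $\beta_k = w^\top v_k$, where $w$ is the vector of portfolio weights and $v_k$ is the chosen eigenvector
- Multiply by shock size: $\Delta P = \beta_k \, \delta$, where $\delta$ is the size of the shock to PC $k$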
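These three steps can be sketched as follows. The covariance matrix, the weights, and the choice of a 3-standard-deviation drop in PC1 are all hypothetical; sizing the shock in units of the component's standard deviation is one common convention, assumed here for illustration.

```python
import numpy as np

# Hypothetical 3-asset covariance matrix and portfolio weights.
Sigma = np.array([
    [0.040, 0.018, 0.012],
    [0.018, 0.090, 0.015],
    [0.012, 0.015, 0.060],
])
w = np.array([0.5, 0.3, 0.2])

eigvals, V = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

# Step 1: pick PC1 and orient it so a positive move lifts the assets.
v1 = V[:, 0] * np.sign(V[:, 0].sum())

# Step 2: portfolio exposure to PC1.
exposure = w @ v1

# Step 3: apply the shock (assumed: a 3-standard-deviation drop in PC1).
shock = -3.0 * np.sqrt(eigvals[0])
asset_moves = v1 * shock           # implied return of each asset
portfolio_pnl = exposure * shock   # equals w @ asset_moves
```

Orienting the eigenvector first matters: without a sign convention, the same "crash" scenario could show up as a gain.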
De-correlating Portfolios
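Transform returns to PC space: $F = \tilde{R} V$, where $\tilde{R}$ is the centered returns matrix and the columns of $V$ are the eigenvectors. Now: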
- Components are uncorrelated
- Variances are eigenvalues
- Risk budgeting becomes simple
Practical Considerations
Stationarity
PCA assumes the covariance structure is stable. In reality:
- Correlations spike during crises
- Factor structure can change over time
Solution: Rolling-window PCA, regime-switching models, or robust covariance estimators.
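As one illustration of the rolling-window idea, the sketch below tracks the share of variance explained by PC1 over trailing windows. The helper `rolling_pc1_share`, the window length, and the two-regime synthetic data are assumptions made for this example.

```python
import numpy as np

def rolling_pc1_share(R, window):
    """Share of variance explained by PC1 in each trailing window of rows."""
    shares = []
    for end in range(window, R.shape[0] + 1):
        block = R[end - window:end]
        eigvals = np.linalg.eigvalsh(np.cov(block, rowvar=False))
        shares.append(eigvals[-1] / eigvals.sum())   # eigvalsh sorts ascending
    return np.array(shares)

rng = np.random.default_rng(7)

# Synthetic data (assumption): a calm regime followed by a "crisis" regime in
# which the common shock is much larger, so correlations rise.
T_half, N = 250, 5
calm = rng.normal(0, 0.005, (T_half, 1)) + rng.normal(0, 0.01, (T_half, N))
crisis = rng.normal(0, 0.03, (T_half, 1)) + rng.normal(0, 0.01, (T_half, N))
R = np.vstack([calm, crisis])

shares = rolling_pc1_share(R, window=100)
```

A rising PC1 share in the later windows reflects the higher correlations of the simulated crisis regime. The window length trades responsiveness against estimation noise.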
Sign and Scale Ambiguity
Eigenvectors are determined only up to sign: $v$ and $-v$ are both valid. Scale is fixed by the convention of normalizing each eigenvector to unit length. Choose signs for interpretability (e.g., PC1 loadings all positive for “market”).
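One simple way to pin down the signs is to flip each eigenvector so that its entries sum to a non-negative number. The helper name `orient_components` and this particular rule are assumptions for illustration; other conventions (for example, making the largest-magnitude entry positive) work equally well.

```python
import numpy as np

def orient_components(V):
    """Flip each column so that its entries sum to a non-negative number."""
    signs = np.sign(V.sum(axis=0))
    signs[signs == 0] = 1.0   # leave a column unchanged if its entries sum to zero
    return V * signs

# Hypothetical orthonormal eigenvectors, stored as columns.
V = np.array([
    [-0.6,  0.8],
    [-0.8, -0.6],
])
V_fixed = orient_components(V)
```

The first column is flipped to (0.6, 0.8), while the second is left alone because its entries already sum to a positive number. Flipping a column does not change the variance it explains.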
Number of Factors
How many PCs to keep? Common approaches:
- Scree plot: Look for an “elbow” in eigenvalue decay
- Variance threshold: Keep enough for 80-90% cumulative variance
- Cross-validation: Test predictive performance
PCA vs. Factor Models
| | PCA | Economic Factor Models |
|---|---|---|
| Factors | Statistical (eigenvectors) | Pre-specified (market, size, value) |
| Orthogonality | Guaranteed | Not generally |
| Interpretability | Often unclear | Built-in |
| Estimation | Data-driven | Requires factor definitions |
PCA finds factors that maximize variance explained. Economic factors are chosen for interpretability. Often they overlap significantly (PC1 ≈ market factor).
Key Takeaways
- PCA = eigendecomposition of covariance: Finds uncorrelated directions of maximum variance
- Few factors explain most variance: Especially in correlated markets
- Interpretable in finance: Level/slope/curvature for rates, market/size/value for equities
- Enables dimensionality reduction: Model factors instead of assets
- Risk decomposes by factor: Understand where portfolio risk comes from
When you see 500 stocks moving in sync, you’re seeing PC1—the market factor. PCA reveals that behind the apparent complexity of financial markets, a handful of common factors drive most of the action. The rest is noise.