13: Singular Value Decomposition
Eigendecomposition is elegant, but it has a problem: it only works for square matrices, and even then, only when you have enough eigenvectors. Singular Value Decomposition (SVD) is the generalization that works for any matrix: rectangular, singular, whatever. Every matrix has an SVD.
The idea is beautiful. Every linear transformation can be broken into three simple steps: rotate, stretch, rotate. That's it. No matter how complicated the transformation, it's just two rotations with some scaling in between.
The Decomposition
Any $m \times n$ matrix $A$ can be written as:
$$A = U \Sigma V^T$$
where:
- $U$ is an $m \times m$ orthogonal matrix (rotation/reflection in the output space)
- $\Sigma$ is an $m \times n$ diagonal matrix (scaling along axes)
- $V$ is an $n \times n$ orthogonal matrix (rotation/reflection in the input space)
The diagonal entries of $\Sigma$ are called singular values, written $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$, where $r$ is the rank of $A$.
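To make the shapes concrete, here is a minimal NumPy sketch (the $4 \times 3$ matrix is just a hypothetical example). Note that `np.linalg.svd` returns the singular values as a 1-D array rather than as the full diagonal matrix $\Sigma$:

```python
import numpy as np

# A hypothetical 4x3 matrix, just to show the shapes involved.
A = np.arange(12, dtype=float).reshape(4, 3)

# full_matrices=True gives the "full" SVD: U is m x m, Vt is n x n.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

print(U.shape)   # (4, 4)  -- orthogonal, output space
print(s.shape)   # (3,)    -- singular values as a 1-D array
print(Vt.shape)  # (3, 3)  -- orthogonal, input space

# Rebuild the m x n diagonal Sigma and verify A = U @ Sigma @ Vt.
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, s)
print(np.allclose(A, U @ Sigma @ Vt))  # True
```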
Geometric Interpretation
Think of $A$ as a transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$. The SVD says you can decompose this into:
- $V^T$: Rotate the input space to align with the "natural axes" of $A$
- $\Sigma$: Stretch along these axes (the singular values tell you how much)
- $U$: Rotate the result into the output space
The columns of $V$ (called right singular vectors) are the directions in the input space that $A$ acts on most cleanly. The columns of $U$ (called left singular vectors) are where those directions end up in the output space.
Example: A $2 \times 2$ Matrix
Consider a $2 \times 2$ matrix $A$ that stretches by 3 along one direction and by 2 along another, where those directions are not the coordinate axes.
The SVD reveals:
- $V^T$ rotates to align with $A$'s "principal directions"
- $\Sigma$ stretches by $\sigma_1 = 3$ and $\sigma_2 = 2$
- $U$ rotates back to the output orientation
Even though $A$ looks like it mixes $x$ and $y$, the SVD finds the hidden coordinate system where $A$ is just scaling.
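A small NumPy sketch of this picture: build $A$ out of two rotations (the angles below are arbitrary, hypothetical choices) and a diagonal stretch by 3 and 2, then check that the SVD recovers those stretch factors:

```python
import numpy as np

def rotation(theta):
    """2x2 counterclockwise rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Hypothetical choice of angles -- any rotations would do.
U = rotation(np.pi / 6)
Vt = rotation(-np.pi / 4)
Sigma = np.diag([3.0, 2.0])   # stretch by 3 and 2 along the hidden axes

A = U @ Sigma @ Vt            # looks like it "mixes" x and y
print(A)

# The SVD recovers the hidden scaling factors.
print(np.linalg.svd(A, compute_uv=False))  # [3. 2.]
```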
Why Not Just Use Eigendecomposition?
Eigendecomposition ($A = PDP^{-1}$) requires:
- $A$ is square ($n \times n$)
- $A$ has $n$ linearly independent eigenvectors
SVD requires nothing. It works for:
- Rectangular matrices ($m \neq n$)
- Singular matrices (rank-deficient)
- Non-diagonalizable matrices
- Matrices with no real eigenvalues
Plus, $U$ and $V$ are orthogonal, so their inverses are just transposes: no need to compute $P^{-1}$.
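A quick NumPy illustration, using a couple of hypothetical matrices, of the SVD going through in cases where eigendecomposition fails or doesn't even apply:

```python
import numpy as np

# A rectangular matrix: eigendecomposition is not even defined,
# but the SVD exists.
A_rect = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])
print(np.linalg.svd(A_rect, compute_uv=False))

# A non-diagonalizable square matrix (a Jordan block): it has only
# one independent eigenvector, yet the SVD is perfectly fine.
A_jordan = np.array([[2.0, 1.0],
                     [0.0, 2.0]])
print(np.linalg.svd(A_jordan, compute_uv=False))
```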
Connection to Eigenvalues
There’s a beautiful relationship:
- The singular values of $A$ are the square roots of the eigenvalues of $A^T A$ (or $A A^T$)
- The right singular vectors (columns of $V$) are the eigenvectors of $A^T A$
- The left singular vectors (columns of $U$) are the eigenvectors of $A A^T$
This is why SVD always works: $A^T A$ is symmetric and positive semi-definite, so it always has real, non-negative eigenvalues and an orthonormal eigenbasis (by the Spectral Theorem).
Derivation sketch
Start with $A = U \Sigma V^T$. Then:
$$A^T A = (U \Sigma V^T)^T (U \Sigma V^T) = V \Sigma^T U^T U \Sigma V^T = V (\Sigma^T \Sigma) V^T$$
This is an eigendecomposition of $A^T A$! The eigenvalues are the $\sigma_i^2$ and the eigenvectors are the columns of $V$.
Computing the SVD
Algorithm:
- Compute $A^T A$ (an $n \times n$ symmetric matrix)
- Find its eigenvalues $\lambda_i$ and eigenvectors $v_i$
- The singular values are $\sigma_i = \sqrt{\lambda_i}$
- The columns of $V$ are the eigenvectors $v_i$
- Compute the columns of $U$ as $u_i = \frac{1}{\sigma_i} A v_i$ (for $\sigma_i \neq 0$)
(In practice, numerical libraries use more sophisticated algorithms, like the Golub-Reinsch algorithm, to avoid explicitly forming $A^T A$, which can be numerically unstable.)
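Here is a sketch of that naive recipe in NumPy, with a hypothetical $2 \times 2$ test matrix; it assumes $A$ has full rank so that every $\sigma_i$ is non-zero:

```python
import numpy as np

def naive_svd(A):
    """Textbook SVD via the eigendecomposition of A^T A.

    Fine for illustration; real libraries (e.g. np.linalg.svd) use more
    stable algorithms that never form A^T A explicitly.
    """
    # Symmetric eigendecomposition; eigh returns eigenvalues in ascending order.
    eigvals, V = np.linalg.eigh(A.T @ A)

    # Sort into descending order and clip tiny negatives from round-off.
    order = np.argsort(eigvals)[::-1]
    eigvals, V = eigvals[order], V[:, order]
    sigma = np.sqrt(np.clip(eigvals, 0.0, None))

    # u_i = A v_i / sigma_i for the non-zero singular values.
    nonzero = sigma > 1e-12
    U = (A @ V[:, nonzero]) / sigma[nonzero]
    return U, sigma, V.T

# Hypothetical test matrix.
A = np.array([[3.0, 0.0],
              [4.0, 5.0]])
U, sigma, Vt = naive_svd(A)
print(sigma)                                    # matches np.linalg.svd(A)[1]
print(np.allclose(A, U @ np.diag(sigma) @ Vt))  # True (A has full rank here)
```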
The Four Fundamental Subspaces
The SVD cleanly separates the four fundamental subspaces:
- Row space of $A$: spanned by the first $r$ columns of $V$ (where $r = \operatorname{rank}(A)$)
- Column space of $A$: spanned by the first $r$ columns of $U$
- Null space of $A$: spanned by the last $n - r$ columns of $V$ (the directions with zero singular values)
- Left null space of $A$: spanned by the last $m - r$ columns of $U$
The singular values tell you how "important" each direction is. Large singular values correspond to directions where $A$ stretches space significantly. Small (or zero) singular values correspond to directions where $A$ compresses or collapses space.
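The following NumPy sketch pulls the four subspaces out of the SVD of a hypothetical rank-deficient matrix (a $3 \times 4$ matrix whose third row is the sum of the first two):

```python
import numpy as np

# Hypothetical rank-deficient matrix: 3x4 with rank 2.
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 2.0],
              [1.0, 3.0, 1.0, 3.0]])
m, n = A.shape

U, s, Vt = np.linalg.svd(A)
r = np.sum(s > 1e-10)           # numerical rank
print(r)                        # 2

row_space  = Vt[:r].T           # first r columns of V
null_space = Vt[r:].T           # last n - r columns of V
col_space  = U[:, :r]           # first r columns of U
left_null  = U[:, r:]           # last m - r columns of U

# Sanity checks: A kills the null space, A^T kills the left null space.
print(np.allclose(A @ null_space, 0))
print(np.allclose(A.T @ left_null, 0))
```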
Applications
1. Low-Rank Approximation
Suppose you want the "best" rank-$k$ approximation to $A$. Take the SVD and keep only the $k$ largest singular values:
$$A_k = U_k \Sigma_k V_k^T$$
where $U_k$ and $V_k$ contain only the first $k$ columns, and $\Sigma_k$ is the $k \times k$ diagonal matrix of the $k$ largest singular values.
Eckart-Young Theorem: $A_k$ is the closest rank-$k$ matrix to $A$ in both the Frobenius and spectral norms.
This is the basis for data compression: you can approximate a huge matrix by storing only the top $k$ singular values and vectors.
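A minimal sketch of a truncated-SVD approximation in NumPy; the noisy rank-1 matrix below is a made-up example:

```python
import numpy as np

def best_rank_k(A, k):
    """Best rank-k approximation of A (Eckart-Young), via the truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

# Hypothetical example: a noisy version of a rank-1 matrix.
rng = np.random.default_rng(0)
A = np.outer(np.arange(1, 7), np.arange(1, 5)) + 0.01 * rng.standard_normal((6, 4))

A1 = best_rank_k(A, 1)
print(np.linalg.matrix_rank(A1))   # 1
print(np.linalg.norm(A - A1))      # small: the noise we threw away
```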
2. Principal Component Analysis (PCA)
In statistics and machine learning, PCA finds the directions of maximum variance in data. The algorithm:
- Center your data matrix $X$ (subtract the mean of each column)
- Compute the SVD: $X = U \Sigma V^T$
- The columns of $V$ are the principal components: the directions of maximum variance
The singular values tell you how much variance is explained by each component: the variance along component $i$ is proportional to $\sigma_i^2$.
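Here is a hedged NumPy sketch of PCA-via-SVD on synthetic data that is deliberately stretched along the direction $(1, 1)$, so the first principal component should come out close to that direction:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 200 points, stretched along the direction (1, 1).
X = rng.standard_normal((200, 2)) @ np.array([[3.0, 3.0], [0.5, -0.5]])

X_centered = X - X.mean(axis=0)           # step 1: center each column
U, s, Vt = np.linalg.svd(X_centered, full_matrices=False)

principal_components = Vt                 # rows of Vt = directions of max variance
explained_variance = s**2 / (len(X) - 1)  # variance along each component

print(principal_components[0])            # roughly +/-(0.707, 0.707)
print(explained_variance)
```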
3. Pseudoinverse
For non-square or singular matrices, the usual inverse doesn't exist. The Moore-Penrose pseudoinverse is:
$$A^+ = V \Sigma^+ U^T$$
where $\Sigma^+$ is formed by taking the reciprocal of each non-zero singular value (and transposing the result to be $n \times m$).
This gives you the "best" solution to $Ax = b$ in the least-squares sense when $A$ is not invertible.
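A short NumPy sketch using a hypothetical overdetermined system: build $A^+$ from the SVD and check that it matches `np.linalg.pinv` and gives the least-squares solution to $Ax = b$ (here $A$ has full column rank, so no singular values need to be dropped):

```python
import numpy as np

# Hypothetical overdetermined system: 3 equations, 2 unknowns.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Pseudoinverse from the SVD: A^+ = V Sigma^+ U^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

x = A_pinv @ b                                 # least-squares solution to Ax = b
print(x)
print(np.allclose(A_pinv, np.linalg.pinv(A)))  # matches NumPy's pinv
```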
4. Image Compression
Images are just matrices of pixel values. Take the SVD and keep only the top $k$ singular values. For an $m \times n$ grayscale image, you've reduced storage from $mn$ numbers to $k(m + n + 1)$ numbers (storing $U_k$, the $k$ singular values, and $V_k$).
For large images and small $k$, this is massive compression, and the image still looks recognizable because the largest singular values capture the "important" structure.
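To get a feel for the numbers, here is a sketch on a synthetic 512 by 512 "image" (a smooth pattern plus a little noise, purely hypothetical), keeping $k = 20$ singular values:

```python
import numpy as np

# A hypothetical 512x512 "image": a smooth pattern plus a little noise.
m = n = 512
x, y = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, m))
image = (np.sin(6 * x) * np.cos(4 * y)
         + 0.05 * np.random.default_rng(2).standard_normal((m, n)))

k = 20
U, s, Vt = np.linalg.svd(image, full_matrices=False)
compressed = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]

original_storage = m * n              # mn numbers
compressed_storage = k * (m + n + 1)  # k(m + n + 1) numbers
print(compressed_storage / original_storage)   # ~0.078
print(np.linalg.norm(image - compressed) / np.linalg.norm(image))
```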
Example: A $2 \times 2$ Matrix
Take a $2 \times 2$ matrix $A$, mapping $\mathbb{R}^2 \to \mathbb{R}^2$, and run the recipe from above.
Step 1: Compute $A^T A$ (a symmetric $2 \times 2$ matrix).
Step 2: Find the eigenvalues and eigenvectors of $A^T A$:
Characteristic polynomial: $\det(A^T A - \lambda I) = 0$
Eigenvalues: $\lambda_1 \geq \lambda_2 \geq 0$
Eigenvectors (normalized): $v_1$ and $v_2$, which become the columns of $V$
Step 3: Singular values: $\sigma_1 = \sqrt{\lambda_1}$, $\sigma_2 = \sqrt{\lambda_2}$
Step 4: Compute $u_i = \frac{1}{\sigma_i} A v_i$ to get the columns of $U$.
(I'll spare you the full calculation: it's tedious by hand, but straightforward.)
The result: $A = U \Sigma V^T$, where $\Sigma = \begin{pmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{pmatrix}$.
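Rather than grinding through the algebra by hand, here is the same computation done numerically for a hypothetical $2 \times 2$ matrix, letting `np.linalg.svd` do the tedious part:

```python
import numpy as np

# A hypothetical 2x2 matrix, just to make the steps above concrete.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

U, s, Vt = np.linalg.svd(A)
print(s)      # [3. 1.] -- the singular values
print(U)
print(Vt)

# The three factors really do multiply back to A.
print(np.allclose(A, U @ np.diag(s) @ Vt))   # True
```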
Why SVD Matters
SVD is arguably the most important matrix decomposition:
- Always exists (no restrictions on $A$)
- Numerically stable (orthogonal matrices have condition number 1)
- Reveals structure (rank, subspaces, importance of directions)
- Enables approximation (optimal low-rank approximations)
Eigendecomposition is cleaner for square matrices, but SVD is the tool that works on everything. It’s the Swiss Army knife of linear algebra.
Every matrix has a story: where it stretches, where it collapses, what structure it preserves. SVD tells you that story in the clearest possible way: rotate to the natural coordinates, scale by the singular values, rotate back. Once you see it, you can't unsee it.