13: Singular Value Decomposition

Eigendecomposition is elegant, but it has a problem: it only works for square matrices, and even then, only when you have enough eigenvectors. Singular Value Decomposition (SVD) is the generalization that works for any matrix: rectangular, singular, whatever. Every matrix has an SVD.

The idea is beautiful: every linear transformation can be broken into three simple steps. Rotate, stretch, rotate. That’s it. No matter how complicated the transformation, it’s just two rotations with some scaling in between.


The Decomposition

Any $m \times n$ matrix $A$ can be written as:

$$A = U\Sigma V^T$$

where:

  • $U$ is an $m \times m$ orthogonal matrix (rotation/reflection in the output space)
  • $\Sigma$ is an $m \times n$ diagonal matrix (scaling along axes)
  • $V$ is an $n \times n$ orthogonal matrix (rotation/reflection in the input space)

The diagonal entries of $\Sigma$ are called singular values, written $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$, where $r$ is the rank of $A$.
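
As a quick sanity check, here is a minimal NumPy sketch (assuming NumPy is available) that computes a full SVD of a rectangular matrix and verifies the three properties above:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))          # any rectangular matrix works

U, s, Vt = np.linalg.svd(A)              # full SVD: U is 4x4, Vt is 3x3

Sigma = np.zeros(A.shape)                # build the 4x3 diagonal Sigma from s
np.fill_diagonal(Sigma, s)

print(np.allclose(A, U @ Sigma @ Vt))    # True: A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(4)))   # True: U is orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(3))) # True: V is orthogonal
print(s)                                 # singular values, largest first
```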


Geometric Interpretation

Think of $A$ as a transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$. The SVD says you can decompose this into:

  1. $V^T$: Rotate the input space to align with the “natural axes” of $A$
  2. $\Sigma$: Stretch along these axes (singular values tell you how much)
  3. $U$: Rotate the result into the output space

The columns of $V$ (called right singular vectors) are the directions in the input space that $A$ acts on most cleanly. The columns of $U$ (called left singular vectors) are where those directions end up in the output space.
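
To see the three steps in action, here is a small sketch (NumPy assumed) that pushes a vector through $V^T$, then $\Sigma$, then $U$, and checks that the result matches applying $A$ directly:

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])
x = np.array([1., 0.])

U, s, Vt = np.linalg.svd(A)
Sigma = np.diag(s)

step1 = Vt @ x          # rotate the input into A's "natural axes"
step2 = Sigma @ step1   # stretch along those axes by the singular values
step3 = U @ step2       # rotate into the output space

print(np.allclose(step3, A @ x))   # True: rotate, stretch, rotate == A
```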

Example: $2 \times 2$ matrix

Consider a matrix that stretches by 3 along one direction and by 1 along another:

$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$$

The SVD reveals:

  • $V$ rotates to align with $A$'s “principal directions”
  • $\Sigma$ stretches by $\sigma_1 = 3$ and $\sigma_2 = 1$
  • $U$ rotates back to the output orientation

Even though $A$ looks like it mixes $x$ and $y$, the SVD finds the hidden coordinate system where $A$ is just scaling.
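
A quick numerical check of this example (NumPy assumed):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])

U, s, Vt = np.linalg.svd(A)
print(s)    # [3. 1.]  -- the hidden scaling factors
print(Vt)   # rows are +/-(1,1)/sqrt(2) and +/-(1,-1)/sqrt(2): the principal directions
```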


Why Not Just Use Eigendecomposition?

Eigendecomposition ($A = PDP^{-1}$) requires:

  1. $A$ is square ($n \times n$)
  2. $A$ has $n$ linearly independent eigenvectors

SVD requires nothing. It works for:

  • Rectangular matrices ($m \times n$)
  • Singular matrices (rank-deficient)
  • Non-diagonalizable matrices
  • Matrices with no real eigenvalues

Plus, $U$ and $V$ are orthogonal, so their inverses are just their transposes: no need to compute $P^{-1}$.


Connection to Eigenvalues

There’s a beautiful relationship:

  • The singular values of $A$ are the square roots of the eigenvalues of $A^TA$ (or $AA^T$)
  • The right singular vectors (columns of $V$) are the eigenvectors of $A^TA$
  • The left singular vectors (columns of $U$) are the eigenvectors of $AA^T$

This is why SVD always works: $A^TA$ is symmetric and positive semi-definite, so it always has real, non-negative eigenvalues and an orthonormal eigenbasis (by the Spectral Theorem).

Derivation sketch

Start with $A = U\Sigma V^T$. Then:

$$A^TA = (U\Sigma V^T)^T(U\Sigma V^T) = V\Sigma^T U^T U\Sigma V^T = V(\Sigma^T\Sigma)V^T$$

This is an eigendecomposition of $A^TA$! The eigenvalues are $\sigma_i^2$ and the eigenvectors are the columns of $V$.
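
The relationship is easy to confirm numerically; a short sketch (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))

# Singular values of A, largest first
s = np.linalg.svd(A, compute_uv=False)

# Eigenvalues of the symmetric matrix A^T A, sorted largest first
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]

print(np.allclose(s, np.sqrt(eigvals)))   # True: sigma_i = sqrt(lambda_i(A^T A))
```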


Computing the SVD

Algorithm:

  1. Compute $A^TA$ (an $n \times n$ symmetric matrix)
  2. Find its eigenvalues $\lambda_1, \ldots, \lambda_n$ and eigenvectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$
  3. The singular values are $\sigma_i = \sqrt{\lambda_i}$
  4. The columns of $V$ are the eigenvectors $\mathbf{v}_i$
  5. Compute the columns of $U$ as $\mathbf{u}_i = \frac{1}{\sigma_i}A\mathbf{v}_i$ (for $\sigma_i \neq 0$)

(In practice, numerical libraries use more sophisticated algorithms like the Golub-Reinsch algorithm to avoid explicitly forming $A^TA$, which can be numerically unstable.)
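
Here is a direct, deliberately naive NumPy sketch of the algorithm above: it forms $A^TA$, eigendecomposes it, and builds $U$ column by column. It assumes $A$ has full column rank (no zero singular values) and is fine for small examples, but, as noted, real libraries avoid this route.

```python
import numpy as np

def naive_svd(A):
    """Textbook SVD via the eigendecomposition of A^T A (full-column-rank A)."""
    eigvals, V = np.linalg.eigh(A.T @ A)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]            # sort descending
    eigvals, V = eigvals[order], V[:, order]
    sigma = np.sqrt(np.clip(eigvals, 0, None))   # sigma_i = sqrt(lambda_i)
    U = (A @ V) / sigma                          # u_i = A v_i / sigma_i, columnwise
    return U, sigma, V.T

A = np.array([[1., 2.],
              [2., 1.],
              [2., 2.]])
U, sigma, Vt = naive_svd(A)
print(sigma)                             # [sqrt(17), 1]
print(np.allclose(A, (U * sigma) @ Vt))  # True: A = U Sigma V^T
```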


The Four Fundamental Subspaces

The SVD cleanly separates the four fundamental subspaces:

  1. Row space of $A$: spanned by the first $r$ columns of $V$ (where $r = \text{rank}(A)$)
  2. Column space of $A$: spanned by the first $r$ columns of $U$
  3. Null space of $A$: spanned by the last $n - r$ columns of $V$ (singular values are zero)
  4. Left null space of $A$: spanned by the last $m - r$ columns of $U$

The singular values tell you how “important” each direction is. Large singular values correspond to directions where $A$ stretches space significantly. Small (or zero) singular values correspond to directions where $A$ compresses or collapses space.
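
A sketch of reading these subspaces off the SVD (NumPy assumed; the rank is taken as the number of singular values above a small tolerance):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])          # rank 1: the second row is twice the first

U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10))            # numerical rank

col_space  = U[:, :r]                 # column space of A
left_null  = U[:, r:]                 # left null space of A
row_space  = Vt[:r, :].T              # row space of A
null_space = Vt[r:, :].T              # null space of A

print(r)                                     # 1
print(np.allclose(A @ null_space, 0))        # True: A kills its null space
print(np.allclose(left_null.T @ A, 0))       # True
```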


Applications

1. Low-Rank Approximation

Suppose you want the “best” rank-$k$ approximation to $A$. Take the SVD and keep only the $k$ largest singular values:

$$A_k = U_k \Sigma_k V_k^T$$

where $U_k$ and $V_k$ contain only the first $k$ columns, and $\Sigma_k$ is the $k \times k$ diagonal matrix of the largest singular values.

Eckart-Young Theorem: $A_k$ is the closest rank-$k$ matrix to $A$ in both the Frobenius and spectral norms.

This is the basis for data compression: you can store a huge matrix by keeping only the top singular values and vectors.
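
A minimal sketch of rank-$k$ truncation (NumPy assumed), including a check that the spectral-norm error equals the first dropped singular value, as Eckart-Young predicts:

```python
import numpy as np

def truncated_svd(A, k):
    """Best rank-k approximation of A (Frobenius and spectral norms)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 6))
k = 2

A_k = truncated_svd(A, k)
s = np.linalg.svd(A, compute_uv=False)

# Spectral-norm error of the best rank-k approximation is sigma_{k+1}
print(np.isclose(np.linalg.norm(A - A_k, 2), s[k]))   # True
```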

2. Principal Component Analysis (PCA)

In statistics and machine learning, PCA finds the directions of maximum variance in data. The algorithm:

  1. Center your data matrix $X$ (subtract the column means)
  2. Compute the SVD: $X = U\Sigma V^T$
  3. The columns of $V$ are the principal components: the directions of maximum variance

The squared singular values $\sigma_i^2$ are proportional to the variance explained by each component.
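
A compact sketch of PCA via SVD (NumPy assumed; rows of $X$ are samples, columns are features, and the anisotropic test cloud is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic data: most variance along the first axis, very little along the third
X = rng.standard_normal((200, 3)) @ np.diag([3.0, 1.0, 0.1])

Xc = X - X.mean(axis=0)                       # 1. center each column
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt                               # rows are the principal components
explained_variance = s**2 / (X.shape[0] - 1)  # variance along each component

print(components[0])          # direction of maximum variance (close to +/- e1 here)
print(explained_variance)     # descending, roughly [9, 1, 0.01] for this data
```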

3. Pseudoinverse

For non-square or singular matrices, the usual inverse doesn’t exist. The Moore-Penrose pseudoinverse is:

$$A^+ = V\Sigma^+ U^T$$

where $\Sigma^+$ is formed by taking the reciprocal of each non-zero singular value and then transposing, so it has shape $n \times m$.

This gives you the “best” solution to $A\mathbf{x} = \mathbf{b}$ in the least-squares sense when $A$ is not invertible.
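
A sketch of building $A^+$ from the SVD and checking it against NumPy's built-in `pinv` and least-squares solver (NumPy assumed):

```python
import numpy as np

A = np.array([[1., 2.],
              [2., 1.],
              [2., 2.]])
b = np.array([1., 0., 1.])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
s_inv = np.where(s > 1e-10, 1.0 / s, 0.0)     # invert only the non-zero sigma_i
A_pinv = Vt.T @ np.diag(s_inv) @ U.T          # A+ = V Sigma+ U^T

x = A_pinv @ b                                # least-squares solution to Ax = b
print(np.allclose(A_pinv, np.linalg.pinv(A)))                    # True
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))      # True
```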

4. Image Compression

Images are just matrices of pixel values. Take the SVD and keep only the top $k$ singular values. For a grayscale image, you’ve reduced storage from $m \times n$ numbers to $k(m + n + 1)$ numbers (storing $U_k$, $\Sigma_k$, and $V_k$).

For large images and small $k$, this is massive compression, and the image still looks recognizable because the largest singular values capture the “important” structure.
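
A sketch of the storage arithmetic on a synthetic “image” (a plain NumPy array of smooth structure plus noise stands in for real pixel data; no image library assumed):

```python
import numpy as np

rng = np.random.default_rng(4)
# Synthetic 480x640 "image": smooth low-rank structure plus a little noise
img = np.add.outer(np.sin(np.linspace(0, 3, 480)),
                   np.cos(np.linspace(0, 5, 640)))
img += 0.05 * rng.standard_normal(img.shape)

k = 20
U, s, Vt = np.linalg.svd(img, full_matrices=False)
img_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

m, n = img.shape
print(m * n)                 # 307200 numbers for the raw image
print(k * (m + n + 1))       # 22420 numbers for the rank-20 version
print(np.linalg.norm(img - img_k) / np.linalg.norm(img))   # small relative error
```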


Example: $3 \times 2$ Matrix

Let $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \\ 2 & 2 \end{bmatrix}$. This maps $\mathbb{R}^2 \to \mathbb{R}^3$.

Step 1: Compute $A^TA$:

$$A^TA = \begin{bmatrix} 9 & 8 \\ 8 & 9 \end{bmatrix}$$

Step 2: Find eigenvalues and eigenvectors of $A^TA$:

Characteristic polynomial: $\det(A^TA - \lambda I) = (9-\lambda)^2 - 64 = \lambda^2 - 18\lambda + 17 = (\lambda - 17)(\lambda - 1)$

Eigenvalues: $\lambda_1 = 17$, $\lambda_2 = 1$

Eigenvectors (normalized):

$$\mathbf{v}_1 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad \mathbf{v}_2 = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -1 \end{bmatrix}$$

Step 3: Singular values: $\sigma_1 = \sqrt{17}$, $\sigma_2 = 1$

Step 4: Compute $U$:

$$\mathbf{u}_1 = \frac{1}{\sigma_1}A\mathbf{v}_1 = \frac{1}{\sqrt{17}} \cdot \frac{1}{\sqrt{2}}\begin{bmatrix} 3 \\ 3 \\ 4 \end{bmatrix} = \frac{1}{\sqrt{34}}\begin{bmatrix} 3 \\ 3 \\ 4 \end{bmatrix}$$

(I’ll spare you the full calculation: it’s tedious by hand, but straightforward.)

The result: $A = U\Sigma V^T$ where $\Sigma = \begin{bmatrix} \sqrt{17} & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$.
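
The hand computation can be checked numerically; a short sketch (NumPy assumed):

```python
import numpy as np

A = np.array([[1., 2.],
              [2., 1.],
              [2., 2.]])

U, s, Vt = np.linalg.svd(A)
print(s)                       # [4.1231..., 1.0], i.e. [sqrt(17), 1]
print(np.sqrt(17))             # 4.1231...
print(Vt)                      # rows are +/-(1,1)/sqrt(2) and +/-(1,-1)/sqrt(2)
print(U[:, 0] * np.sqrt(34))   # +/-[3, 3, 4], matching u_1 above
```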


Why SVD Matters

SVD is arguably the most important matrix decomposition:

  • Always exists (no restrictions on $A$)
  • Numerically stable (orthogonal matrices have condition number 1)
  • Reveals structure (rank, subspaces, importance of directions)
  • Enables approximation (optimal low-rank approximations)

Eigendecomposition is cleaner for square matrices, but SVD is the tool that works on everything. It’s the Swiss Army knife of linear algebra.


Every matrix has a story: where it stretches, where it collapses, what structure it preserves. SVD tells you that story in the clearest possible way: rotate to the natural coordinates, scale by singular values, rotate back. Once you see it, you can’t unsee it.