11: Diagonalization and Similarity

Diagonalization converts a complicated matrix into the simplest possible form: a diagonal matrix. When you can diagonalize a matrix $A$, you've found the "eigenvector coordinate system" where $A$ just stretches along axes. This makes everything easier: computing powers, solving differential equations, understanding long-term behavior. The key is finding enough eigenvectors to build a new basis.

What is Diagonalization?

(The Goal)

A square $n \times n$ matrix $A$ is diagonalizable if it can be written as:

$$A = PDP^{-1}$$

where:

  • $D$ is a diagonal matrix (eigenvalues on the diagonal)
  • $P$ is an invertible matrix (eigenvectors as columns)

Geometric meaning: In the eigenvector basis, $A$ is just a scaling transformation. The columns of $P$ define this special coordinate system, $D$ tells you the scaling factors, and $P^{-1}$ converts back.


(Why This Matters)

Once you have $A = PDP^{-1}$, everything becomes easier:

Powers:

$$A^k = PD^kP^{-1}$$

Computing $D^k$ is trivial: just raise each diagonal entry to the $k$-th power.

Matrix exponential:

$$e^{At} = Pe^{Dt}P^{-1}$$

where $e^{Dt}$ has $e^{\lambda_i t}$ on the diagonal.

Long-term behavior: The dominant eigenvalue (largest $|\lambda|$) determines whether $A^k \to 0$, explodes, or oscillates.


Eigenvalues and Eigenvectors

(Definition)

A scalar $\lambda$ is an eigenvalue of $A$ if there exists a nonzero vector $\mathbf{v}$ such that:

$$A\mathbf{v} = \lambda \mathbf{v}$$

The vector $\mathbf{v}$ is an eigenvector corresponding to $\lambda$.

Interpretation: Eigenvectors are the special directions where $A$ acts like pure scaling: no rotation, just stretch or compression by the factor $\lambda$.


(Finding Eigenvalues)

Rewrite $A\mathbf{v} = \lambda \mathbf{v}$ as:

$$(A - \lambda I)\mathbf{v} = \mathbf{0}$$

For a nontrivial solution to exist, $A - \lambda I$ must be singular:

$$\det(A - \lambda I) = 0$$

This is the characteristic equation. Expanding it gives a polynomial of degree $n$ in $\lambda$: the characteristic polynomial.


(Example: Finding Eigenvalues)

$$A = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}$$

Compute $\det(A - \lambda I)$:

$$\det\begin{bmatrix} 4-\lambda & 1 \\ 2 & 3-\lambda \end{bmatrix} = (4-\lambda)(3-\lambda) - 2 = \lambda^2 - 7\lambda + 10 = (\lambda - 5)(\lambda - 2) = 0$$

Eigenvalues: $\lambda_1 = 5$, $\lambda_2 = 2$.


(Finding Eigenvectors)

For each eigenvalue $\lambda$, solve $(A - \lambda I)\mathbf{v} = \mathbf{0}$ to find the corresponding eigenvector(s).

For $\lambda_1 = 5$:

$$(A - 5I)\mathbf{v} = \begin{bmatrix} -1 & 1 \\ 2 & -2 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \mathbf{0}$$

Row reduce: both rows give $-v_1 + v_2 = 0$, so $v_2 = v_1$.

Eigenvector: $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ (or any nonzero scalar multiple)

For $\lambda_2 = 2$:

$$(A - 2I)\mathbf{v} = \begin{bmatrix} 2 & 1 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = \mathbf{0}$$

Row reduce: $2v_1 + v_2 = 0$, so $v_2 = -2v_1$.

Eigenvector: $\mathbf{v}_2 = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$
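To sanity-check these hand computations, here is a minimal NumPy sketch (NumPy is our choice for illustration; `np.linalg.eig` returns unit-length eigenvectors, so expect scalar multiples of the vectors above, possibly in a different order):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# eig returns (eigenvalues, matrix whose COLUMNS are eigenvectors)
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)  # expect 5 and 2, in some order

# Verify the defining property A v = lambda v for each pair
for lam, v in zip(eigenvalues, eigenvectors.T):
    assert np.allclose(A @ v, lam * v)
```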


The Diagonalization Process

(Diagonalization Theorem)

An $n \times n$ matrix $A$ is diagonalizable if and only if $A$ has $n$ linearly independent eigenvectors.

When this holds:

  1. Let $P = [\mathbf{v}_1 \mid \mathbf{v}_2 \mid \cdots \mid \mathbf{v}_n]$ (eigenvectors as columns)
  2. Let $D = \text{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ (corresponding eigenvalues)

Then:

$$A = PDP^{-1} \quad \text{or equivalently} \quad AP = PD$$

(Why This Works)

The equation $AP = PD$ says:

$$A[\mathbf{v}_1 \mid \mathbf{v}_2 \mid \cdots \mid \mathbf{v}_n] = [\lambda_1\mathbf{v}_1 \mid \lambda_2\mathbf{v}_2 \mid \cdots \mid \lambda_n\mathbf{v}_n]$$

Column by column, this is exactly $A\mathbf{v}_i = \lambda_i\mathbf{v}_i$: the defining property of eigenvectors.

Multiplying on the right by $P^{-1}$ gives $A = PDP^{-1}$.


(Example: Diagonalizing a Matrix)

From before: $A = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}$ with:

  • $\lambda_1 = 5$, $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$
  • $\lambda_2 = 2$, $\mathbf{v}_2 = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$

Build $P$ and $D$:

$$P = \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix}, \quad D = \begin{bmatrix} 5 & 0 \\ 0 & 2 \end{bmatrix}$$

Compute $P^{-1}$:

$$P^{-1} = \frac{1}{-3}\begin{bmatrix} -2 & -1 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 2/3 & 1/3 \\ 1/3 & -1/3 \end{bmatrix}$$

Verify:

$$PDP^{-1} = \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix}\begin{bmatrix} 5 & 0 \\ 0 & 2 \end{bmatrix}\begin{bmatrix} 2/3 & 1/3 \\ 1/3 & -1/3 \end{bmatrix} = \begin{bmatrix} 5 & 2 \\ 5 & -4 \end{bmatrix}\begin{bmatrix} 2/3 & 1/3 \\ 1/3 & -1/3 \end{bmatrix} = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix} = A \quad \checkmark$$
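The same verification in NumPy, building $P$ and $D$ by hand from the eigenpairs above (a quick sketch, not a general routine):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
P = np.array([[1.0,  1.0],
              [1.0, -2.0]])   # eigenvectors as columns
D = np.diag([5.0, 2.0])       # matching eigenvalues

# Both identities should hold: AP = PD and A = P D P^{-1}
assert np.allclose(A @ P, P @ D)
assert np.allclose(P @ D @ np.linalg.inv(P), A)
```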

Similar Matrices

(Definition)

Two matrices $A$ and $B$ are similar if there exists an invertible matrix $P$ such that:

$$B = P^{-1}AP$$

Interpretation: Similar matrices represent the same linear transformation in different bases. They have the same intrinsic properties but different coordinate representations.


(Properties Preserved by Similarity)

If $A$ and $B$ are similar, they share:

  1. Determinant: $\det(B) = \det(A)$
  2. Trace: $\text{tr}(B) = \text{tr}(A)$ (sum of diagonal entries)
  3. Eigenvalues: same characteristic polynomial, same eigenvalues (with multiplicity)
  4. Rank: $\text{rank}(B) = \text{rank}(A)$
  5. Invertibility: $A$ invertible $\iff$ $B$ invertible

Why? All of these properties are basis-independent: they depend only on the transformation itself, not on the coordinate system.
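A quick numerical illustration of these invariants, using the running example and a hand-picked invertible $P$ (the specific $P$ here is just an assumption for the demo; any invertible matrix works):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
P = np.array([[2.0, 1.0],
              [1.0, 1.0]])      # any invertible matrix works
B = np.linalg.inv(P) @ A @ P    # B is similar to A

assert np.isclose(np.linalg.det(B), np.linalg.det(A))
assert np.isclose(np.trace(B), np.trace(A))
assert np.allclose(np.sort(np.linalg.eigvals(B)),
                   np.sort(np.linalg.eigvals(A)))
assert np.linalg.matrix_rank(B) == np.linalg.matrix_rank(A)
```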


(Diagonalization as Similarity)

When $A = PDP^{-1}$, we're saying $A$ is similar to a diagonal matrix $D$.

Key insight: Diagonalizable matrices are exactly those that are similar to diagonal matrices. The “nicest” matrices are those you can represent diagonally in some basis.


When Is a Matrix Diagonalizable?

(Sufficient Condition: Distinct Eigenvalues)

Theorem: If an $n \times n$ matrix has $n$ distinct eigenvalues, it is diagonalizable.

Why? Eigenvectors corresponding to different eigenvalues are automatically linearly independent, so $n$ distinct eigenvalues give you $n$ independent eigenvectors.

Example: $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$ has eigenvalues $3$ and $-1$ (distinct), so it is diagonalizable.


(Repeated Eigenvalues: It Depends)

When eigenvalues repeat, diagonalizability depends on whether there are enough eigenvectors.

Algebraic vs Geometric Multiplicity:

For an eigenvalue $\lambda$:

  • Algebraic multiplicity: how many times $\lambda$ appears as a root of the characteristic polynomial
  • Geometric multiplicity: $\dim(\ker(A - \lambda I))$, the number of linearly independent eigenvectors for $\lambda$

Key fact: the geometric multiplicity never exceeds the algebraic multiplicity.

Diagonalizability condition: $A$ is diagonalizable if and only if, for every eigenvalue, the geometric multiplicity equals the algebraic multiplicity.


(Example: Diagonalizable with Repeated Eigenvalue)

$$A = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}$$

Eigenvalue $\lambda = 2$ has algebraic multiplicity 2.

Check geometric multiplicity:

$$A - 2I = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

$\dim(\ker(A - 2I)) = 2$ (the first two standard basis vectors are eigenvectors).

Since geometric = algebraic multiplicity for every eigenvalue, $A$ is diagonalizable (it's already diagonal!).


(Example: Not Diagonalizable)

$$A = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}$$

Eigenvalue: $\lambda = 2$ with algebraic multiplicity 2.

$$A - 2I = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$$

$\dim(\ker(A - 2I)) = 1$: only one independent eigenvector, $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$.

Since geometric multiplicity (1) < algebraic multiplicity (2), $A$ is not diagonalizable.
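The multiplicity test is easy to run numerically: the geometric multiplicity of $\lambda$ is $n - \text{rank}(A - \lambda I)$. A minimal sketch for the example above (rounding eigenvalues to group numerically equal roots is an ad hoc assumption; near-defective matrices make this fragile in floating point):

```python
import numpy as np
from collections import Counter

A = np.array([[2.0, 1.0],
              [0.0, 2.0]])
n = A.shape[0]

# Algebraic multiplicity: count repeated roots of the characteristic polynomial
algebraic = Counter(np.round(np.linalg.eigvals(A), 8))

for lam, alg_mult in algebraic.items():
    # Geometric multiplicity = dim ker(A - lambda I) = n - rank(A - lambda I)
    geo_mult = n - np.linalg.matrix_rank(A - lam * np.eye(n))
    print(f"lambda = {lam}: algebraic {alg_mult}, geometric {geo_mult}")
    # Here: algebraic 2, geometric 1 -> not diagonalizable
```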


Computing Powers of Matrices

(The Power Formula)

If $A = PDP^{-1}$, then:

$$A^k = PD^kP^{-1}$$

where $D^k = \text{diag}(\lambda_1^k, \lambda_2^k, \ldots, \lambda_n^k)$.

Why this works:

$$A^2 = (PDP^{-1})(PDP^{-1}) = PD(P^{-1}P)DP^{-1} = PD^2P^{-1}$$

By induction, $A^k = PD^kP^{-1}$.


(Example)

Compute $A^{10}$ for $A = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}$.

We found:

$$P = \begin{bmatrix} 1 & 1 \\ 1 & -2 \end{bmatrix}, \quad D = \begin{bmatrix} 5 & 0 \\ 0 & 2 \end{bmatrix}, \quad P^{-1} = \begin{bmatrix} 2/3 & 1/3 \\ 1/3 & -1/3 \end{bmatrix}$$

$$D^{10} = \begin{bmatrix} 5^{10} & 0 \\ 0 & 2^{10} \end{bmatrix} = \begin{bmatrix} 9765625 & 0 \\ 0 & 1024 \end{bmatrix}$$

$$A^{10} = PD^{10}P^{-1}$$

Computing this is far easier than multiplying $A$ by itself ten times!
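A sketch comparing the two routes in NumPy; `np.linalg.matrix_power` plays the role of repeated multiplication:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
P = np.array([[1.0,  1.0],
              [1.0, -2.0]])
D = np.diag([5.0, 2.0])

# Diagonalization route: only the diagonal entries get raised to the 10th power
A10_diag = P @ np.diag(np.diag(D) ** 10) @ np.linalg.inv(P)

# Direct route, for comparison
A10_direct = np.linalg.matrix_power(A, 10)

assert np.allclose(A10_diag, A10_direct)
```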


(Long-Term Behavior)

As $k \to \infty$, $A^k$ is dominated by the largest eigenvalue (in absolute value).

  • If $|\lambda_{\max}| < 1$: $A^k \to 0$ (everything decays)
  • If $|\lambda_{\max}| = 1$: bounded behavior (might oscillate)
  • If $|\lambda_{\max}| > 1$: $A^k$ explodes (growth along the dominant eigenvector)

Application: in Markov chains, Leslie models, and discrete dynamical systems, eigenvalues control the long-term fate.
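This dominance is exactly what power iteration exploits: repeatedly applying $A$ and normalizing converges, for a generic starting vector, to the dominant eigenvector. A minimal sketch on the running example:

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

x = np.array([1.0, 0.0])          # generic starting vector
for _ in range(50):
    x = A @ x
    x = x / np.linalg.norm(x)     # normalize to avoid overflow

# x now points along the dominant eigenvector [1, 1]/sqrt(2), and the
# Rayleigh quotient x.(Ax) estimates the dominant eigenvalue (5)
print(x, x @ A @ x)
```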


Change of Basis Perspective

(What PP Does)

The matrix $P$ is the change-of-basis matrix between the standard basis and the eigenvector basis: its columns are the eigenvectors written in standard coordinates, so $P$ converts eigenvector coordinates to standard coordinates, and $P^{-1}$ converts the other way.

In the eigenvector basis:

  • Coordinates: $[\mathbf{x}]_{\mathcal{B}} = P^{-1}\mathbf{x}$
  • Transformation: $[A]_{\mathcal{B}} = D$ (diagonal!)
  • The transformation is just scaling along each eigenvector direction

In the standard basis:

  • Coordinates: $\mathbf{x}$
  • Transformation: $A$
  • The transformation looks complicated because we're using the "wrong" coordinates

The diagonalization formula:

$$A = PDP^{-1}$$

can be read as three steps (traced in code below):

  1. $P^{-1}$: convert from the standard basis to the eigenvector basis
  2. $D$: apply the simple diagonal transformation
  3. $P$: convert back to the standard basis
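Here is the three-step pipeline in NumPy, reusing $P$ and $D$ from the running example (a sketch; the test vector is arbitrary):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
P = np.array([[1.0,  1.0],
              [1.0, -2.0]])
D = np.diag([5.0, 2.0])

x = np.array([3.0, -1.0])     # any vector, in standard coordinates

x_eig = np.linalg.inv(P) @ x  # 1. convert to eigenvector coordinates
scaled = D @ x_eig            # 2. scale each coordinate by its eigenvalue
result = P @ scaled           # 3. convert back to standard coordinates

assert np.allclose(result, A @ x)
```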

Applications

(Differential Equations)

The system $\frac{d\mathbf{x}}{dt} = A\mathbf{x}$ has solution:

$$\mathbf{x}(t) = e^{At}\mathbf{x}(0)$$

If $A = PDP^{-1}$:

$$e^{At} = Pe^{Dt}P^{-1} = P\begin{bmatrix} e^{\lambda_1 t} & & \\ & \ddots & \\ & & e^{\lambda_n t} \end{bmatrix}P^{-1}$$

The eigenvectors give you the "modes" of the system, and the eigenvalues tell you whether each mode grows or decays.
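A sketch checking the eigendecomposition formula for $e^{At}$ against a reference implementation (this assumes SciPy is available; `scipy.linalg.expm` computes the matrix exponential directly):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
P = np.array([[1.0,  1.0],
              [1.0, -2.0]])
lams = np.array([5.0, 2.0])   # eigenvalues matching P's columns
t = 0.1

# e^{At} via the eigendecomposition: exponentiate only the diagonal
eAt = P @ np.diag(np.exp(lams * t)) @ np.linalg.inv(P)

assert np.allclose(eAt, expm(A * t))
```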


(Fibonacci Numbers)

The Fibonacci recurrence $F_{n+1} = F_n + F_{n-1}$ can be written as:

$$\begin{bmatrix} F_{n+1} \\ F_n \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} F_n \\ F_{n-1} \end{bmatrix}$$

Diagonalizing $A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$ gives eigenvalues $\phi = \frac{1+\sqrt{5}}{2}$ (the golden ratio) and $\hat{\phi} = \frac{1-\sqrt{5}}{2}$.

This leads to Binet's formula:

$$F_n = \frac{\phi^n - \hat{\phi}^n}{\sqrt{5}}$$
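A quick sketch checking Binet's formula against the recurrence (rounding absorbs floating-point error; exactness eventually degrades for large $n$):

```python
import math

phi = (1 + math.sqrt(5)) / 2       # golden ratio
phi_hat = (1 - math.sqrt(5)) / 2

def fib_binet(n: int) -> int:
    return round((phi**n - phi_hat**n) / math.sqrt(5))

# Cross-check against F_{n+1} = F_n + F_{n-1}, starting F_0 = 0, F_1 = 1
a, b = 0, 1
for n in range(20):
    assert fib_binet(n) == a
    a, b = b, a + b
```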

Summary: The Diagonalization Playbook

To diagonalize an $n \times n$ matrix $A$ (sketched in code after this list):

  1. Find eigenvalues: solve $\det(A - \lambda I) = 0$
  2. Find eigenvectors: for each $\lambda$, solve $(A - \lambda I)\mathbf{v} = \mathbf{0}$
  3. Check linear independence: you need $n$ independent eigenvectors
  4. Build $P$ and $D$:
    • $P = [\mathbf{v}_1 \mid \cdots \mid \mathbf{v}_n]$ (eigenvectors as columns)
    • $D = \text{diag}(\lambda_1, \ldots, \lambda_n)$ (eigenvalues in matching order)
  5. Verify: $AP = PD$
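A minimal numerical sketch of the whole playbook (the rank tolerance is an assumption, and floating point makes the independence test fragile for nearly defective matrices):

```python
import numpy as np

def diagonalize(A, tol=1e-10):
    """Return (P, D) with A = P D P^(-1), or None if A is not diagonalizable."""
    eigenvalues, eigenvectors = np.linalg.eig(A)            # steps 1-2
    n = A.shape[0]
    if np.linalg.matrix_rank(eigenvectors, tol=tol) < n:    # step 3
        return None             # defective: fewer than n independent eigenvectors
    P = eigenvectors            # step 4: eigenvectors as columns...
    D = np.diag(eigenvalues)    # ...and matching eigenvalues on the diagonal
    assert np.allclose(A @ P, P @ D)                        # step 5: AP = PD
    return P, D

print(diagonalize(np.array([[4.0, 1.0], [2.0, 3.0]])))  # diagonalizable
print(diagonalize(np.array([[2.0, 1.0], [0.0, 2.0]])))  # Jordan block -> None
```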

If you can't find $n$ independent eigenvectors, the matrix is not diagonalizable, but you might still use the Jordan normal form (a nearly diagonal form with 1's above some diagonal entries).

Diagonalization reveals the coordinate system where a transformation is simplest: just scaling, no mixing. It's the key to understanding matrix powers, exponentials, and the long-term behavior of linear systems.