Eigendecomposition
In this post, we are going to take a closer look at the eigendecomposition: a factorization that represents a square matrix as a product of three constituent matrices. To build a better understanding, I will accompany the theory with specific examples.
Table of Contents
- Overview
- Definition
- Specific Examples
a. Projection Matrix
b. Permutation Matrix
c. Symmetric Matrix
- Eigenvectors and Diagonalization
- Geometric View of A = VΔV^-1
- Cayley-Hamilton Theorem
Overview
Eigenvectors and eigenvalues are the two ingredients that constitute the eigendecomposition of a square matrix. Does every square matrix have an eigenvalue and a corresponding eigenvector, then? It depends on the field we are working over: some real-valued matrices may not have real eigenvalues at all. Nevertheless, the eigendecomposition has a particularly simple expression for a class of matrices often used in multivariate analysis, such as correlation, covariance, or cross-product matrices. The eigendecomposition of this type of matrix is important in statistics because it is used to find the maximum (or minimum) of functions involving these matrices. For example, principal component analysis is obtained from the eigendecomposition of a covariance matrix.
Definition
A square matrix A has eigenvalue λ with corresponding (non-zero) eigenvector v if the following equation holds:
Av = λv (1)
In this equation A is an (N x N) matrix, v is an N-dimensional vector, and λ is a scalar over some field, real or complex. In simple words, this equation looks for a vector v such that, when it is linearly transformed by the matrix A, only its norm (and possibly its sign) changes, not its direction. A different perspective is that we are seeking vectors v such that, when multiplied by the square matrix A, the resulting vector is parallel to v. In this case, the vector v is called an eigenvector and λ is called an eigenvalue. It is worth mentioning that these two terms are matrix-specific.
Observe that if the matrix A is singular, then every non-zero vector in its nullspace is an eigenvector, since we can satisfy equation (1) by setting λ to 0.
Suppose that v is in nullspace of A. Then:
Av = 0 = λv : λ = 0 (2)
But this is a trivial case which does not tell us much at the end of the day. We are specifically looking for non-trivial eigenvalues.
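To see this behaviour numerically, here is a minimal sketch, assuming NumPy is available and using an arbitrarily chosen singular matrix:

```python
import numpy as np

# A singular (2 x 2) matrix, chosen only for illustration: its rows are linearly dependent.
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                     # one eigenvalue is 0 (up to floating-point error)

# The eigenvector paired with the zero eigenvalue lies in the nullspace of A, i.e. Av = 0.
v = eigenvectors[:, np.argmin(np.abs(eigenvalues))]
print(A @ v)                           # approximately [0, 0]
```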
To better understand the concept, let's work through a concrete example. Suppose that A is the (2 x 2) matrix

A = | 2  3 |
    | 2  1 |

This matrix has two eigenvalues and corresponding eigenvectors:
λ1 = 4, v1 = [3, 2]^T (3)
λ2 = -1, v2 = [-1, 1]^T (4)
As we can easily verify, for expression (3) Av outputs the vector [12, 8]^T, which equals λv = 4 x [3, 2]^T. If we check expression (4), Av generates [1, -1]^T, which is indeed equal to λv = -1 x [-1, 1]^T.
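Here is a quick numerical check of expressions (3) and (4), a minimal sketch assuming NumPy is available:

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])

# Expression (3): lambda1 = 4 with v1 = [3, 2]^T
v1 = np.array([3.0, 2.0])
print(A @ v1, 4 * v1)                  # [12.  8.] [12.  8.]

# Expression (4): lambda2 = -1 with v2 = [-1, 1]^T
v2 = np.array([-1.0, 1.0])
print(A @ v2, -1 * v2)                 # [ 1. -1.] [ 1. -1.]

# np.linalg.eig recovers the same eigenvalues (eigenvectors come back unit-normalized).
print(np.linalg.eig(A)[0])             # [ 4. -1.]
```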
We can also interpret the result geometrically. For this purpose, let's concentrate on expression (3): below you can see two vectors, the red one corresponding to the initial vector v and the blue one corresponding to Av.
Here we see visually that the two vectors are parallel to each other. They are not required to point in exactly the same direction, though; the initial vector and the transformed vector may point in opposite directions. See the visualization of expression (4) below:
Specific Examples
In this section, we are going to examine different types of matrices and the behaviour of their eigenvalues and eigenvectors.
1. Projection Matrix
Let the projection matrix be P. If a vector v already lies on the plane it is going to be projected onto, then the following equation always holds:
Pv = v (5)
In this case the projection matrix P has eigenvalue λ = 1 for every vector v lying on the plane determined by P; furthermore, each of these vectors is an eigenvector of P. Let's see it geometrically:
What if v is perpendicular to the plane? In that case, Pv yields the zero vector, which means that equation (1) holds with λ = 0:
Suppose vector v is perpendicular to the plane, then:
Pv = 0
By the equation (1) we have
Pv = λv
We get λ = 0, a trivial eigenvalue in this case.
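As a small numerical sketch of both cases, assuming NumPy and an arbitrarily chosen plane in R^3 spanned by the columns of a matrix B (the projection is built with the standard formula P = B(B^T B)^-1 B^T):

```python
import numpy as np

# Two linearly independent vectors spanning a plane in R^3 (an arbitrary choice).
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# Orthogonal projection onto the column space of B: P = B (B^T B)^-1 B^T
P = B @ np.linalg.inv(B.T @ B) @ B.T

print(np.round(np.linalg.eigvals(P), 6))         # two eigenvalues 1, one eigenvalue 0

# A vector already on the plane is left unchanged (eigenvalue 1) ...
v_on_plane = B @ np.array([2.0, -1.0])
print(np.allclose(P @ v_on_plane, v_on_plane))   # True

# ... while a vector perpendicular to the plane is mapped to the zero vector (eigenvalue 0).
v_perp = np.array([1.0, 1.0, -1.0])              # orthogonal to both columns of B
print(np.allclose(P @ v_perp, 0))                # True
```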
2. Permutation Matrix
A permutation matrix is a square binary matrix that has exactly one entry of 1 in each row and each column and 0s elsewhere. Let's take the (2 x 2) permutation matrix

A = | 0  1 |
    | 1  0 |

This (2 x 2) permutation matrix has 2 eigenvalues and corresponding eigenvectors:
λ1 = 1, v1 = [1, 1]^T (6)
λ2 = -1, v2 = [-1, 1]^T (7)
Given a square matrix A, how can we find its eigenvalues and corresponding eigenvectors? To answer this question, let's review equation (1) again and apply some algebraic operations to it:
Av = λv (8)
(A - λI)v = 0 (9)
Where I is an (N x N) identity matrix. Observe that for equation (9) to have a non-trivial solution, the matrix (A - λI) must be singular, which further means that:
det(A - λI) = 0 (10)
Equation (10) is called the characteristic equation of matrix A. The steps to find the eigenvalues and eigenvectors of a given matrix then reduce to the following:
1. Solve the characteristic equation to find λ [eigenvalue]
2. By row elimination, find v [eigenvector]
For the permutation matrix above, the characteristic equation gives:
det(A - λI) = 0
λ^2 - 1 = 0: λ1 = 1, λ2 = -1
We have accomplished step 1; now let's use equation (9) to find the corresponding eigenvectors:
(A - λI)v = 0 (11)
For λ1 = 1: v = [1, 1]^T (12)
For λ2 = -1: v = [-1, 1]^T (13)
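The same two steps can be reproduced numerically. The sketch below, assuming NumPy, extracts the characteristic polynomial of the permutation matrix, finds its roots, and then the eigenvectors:

```python
import numpy as np

# The (2 x 2) permutation matrix from above.
A = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# Step 1: np.poly returns the coefficients of the characteristic polynomial of A,
# here lambda^2 - 1, and np.roots gives its roots, i.e. the eigenvalues.
coeffs = np.poly(A)
print(coeffs)                    # [ 1.  0. -1.]
print(np.roots(coeffs))          # [ 1. -1.] (order may vary)

# Step 2: np.linalg.eig returns the eigenvectors as columns (unit-normalized).
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)               # [ 1. -1.]
print(eigenvectors)              # columns proportional to [1, 1]^T and [-1, 1]^T
```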
We can make three important observations at this point:
1. For any square matrix A of size n x n, there exist n eigenvalues, though some of them may be repeated or complex. To see this, we need to focus on the characteristic equation (10).
det(A - λI) = 0
Indeed, det(A - λI) can be viewed as a function of λ whose result is a polynomial of degree n in λ. With this view, the characteristic equation can be rewritten as follows:
(λ - λ1)(λ - λ2)...(λ - λn) = 0
where λ1, …, λn, the roots of the equation, are the eigenvalues.
2. The trace of the square matrix A equals the sum of its eigenvalues.
3. The product of the eigenvalues equals the determinant of the matrix A. (Both observations are checked numerically in the sketch below.)
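Here is that check, a minimal sketch assuming NumPy and using the matrix A from the "Definition" section (eigenvalues 4 and -1):

```python
import numpy as np

# The matrix A from the "Definition" section, with eigenvalues 4 and -1.
A = np.array([[2.0, 3.0],
              [2.0, 1.0]])

eigenvalues = np.linalg.eigvals(A)

# Observation 2: the trace equals the sum of the eigenvalues.
print(np.trace(A), eigenvalues.sum())          # 3.0  3.0 (up to floating-point error)

# Observation 3: the determinant equals the product of the eigenvalues.
print(np.linalg.det(A), eigenvalues.prod())    # both -4.0 (up to floating-point error)
```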
3. Symmetric Matrix
In the previous section, we saw a special kind of symmetric matrix, the permutation matrix, and at the end we found the eigenvalues and eigenvectors of that matrix A as follows:
λ1 = 1: v1 = [1, 1]^T
λ2 = -1: v2 = [-1, 1]^T
Now, let us look at a slightly different example. Let the matrix B be

B = | 3  1 |
    | 1  3 |

Then, by following the two steps discussed above, we can find its eigenvalues and associated eigenvectors. The characteristic equation gives:
det(B - λI) = 0: (3 - λ)^2 - 1 = 0
λ1 = 4: v1 = [1, 1]^T
λ2 = 2: v2 = [-1, 1]^T
Observe that matrix B is nothing but A + 3I, and it has the same eigenvectors; the eigenvalues are simply incremented by 3.
More formally, if an (n x n) matrix A has eigenvalues λ1, λ2, …, λn, then B = A + cI, where c is a scalar, shares the same eigenvectors as A, and its eigenvalues become λ1 + c, λ2 + c, …, λn + c.
To see why, let the original matrix be A and B = A + cI. For any eigenvalue λ and eigenvector v of the matrix A, the following holds:
Av = λv
Then, if we perform the same calculation on B:
Bv = (A + cI)v = Av + cv
Since Av = λv, plugging this into the equation above gives:
Bv = (λ + c)v
We clearly see that B has exactly the same eigenvectors as A, with each eigenvalue shifted by c.
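Here is a short sketch of this shift property, assuming NumPy and using the permutation matrix A with c = 3:

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])       # the permutation matrix, eigenvalues 1 and -1
c = 3.0
B = A + c * np.eye(2)            # B = A + 3I

eigvals_A, eigvecs_A = np.linalg.eig(A)
print(eigvals_A)                 # [ 1. -1.]
print(np.linalg.eigvals(B))      # [ 4.  2.]  -> each eigenvalue shifted by c

# B has exactly the same eigenvectors: Bv = (lambda + c) v for every eigenpair of A.
for lam, v in zip(eigvals_A, eigvecs_A.T):
    print(np.allclose(B @ v, (lam + c) * v))   # True, True
```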
Eigenvectors and Diagonalization
In this section, we are going to see how eigenvectors are used in diagonalization. Diagonalizable matrices are a special kind of matrix that can be represented as a product of three matrices of the form:
A = PDP^-1 (14)
Where D is a diagonal matrix and P is an invertible matrix. The same structure can be generated using eigenvectors. Suppose that A is an (n x n) matrix with n linearly independent eigenvectors v1, v2, …, vn and eigenvalues λ1, λ2, …, λn. Then, let V be the matrix whose columns are the eigenvectors of A:
V = [v1 v2 ... vn] (15)
If we perform the operation AV we will get:
AV = A[v1 v2 ... vn] = [Av1 Av2 ... Avn] = [λ1v1 λ2v2 ... λnvn] (16)
[λ1v1 λ2v2 ... λnvn] = VΔ (17)
Where Δ is a diagonal matrix whose diagonal entries are the eigenvalues: Δ = diag(λ1, λ2, …, λn).
Combining equations (16) and (17), we get:
AV = VΔ (18)
A = VΔV^-1 (19)
By equation (19) we can infer why the determinant of the matrix A is the product of its eigenvalues. It is quite easy to show; let's start by taking the determinant of both sides of the equation:
det(A) = det(VΔV^-1). Then, by the multiplicative property of the determinant, we can write the above equation as:
det(A) = det(V)det(Δ)det(V^-1). Further, since det(V^-1) is equal to 1/det(V), the above equation reduces to:
det(A) = det(Δ). Since Δ is a diagonal matrix, its determinant equals the product of its diagonal entries, which are the eigenvalues.
One significant rule for constructing the matrix V is that, in order for V to be invertible, the eigenvectors placed in its columns must be linearly independent. In practice, they are chosen as orthonormal vectors whenever possible, since this gives V several additional properties we can make use of. Before moving on to these additional features, let's get a better understanding of equation (19).
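As a quick sanity check of equation (19) before that, here is a minimal sketch assuming NumPy, using the (2 x 2) matrix A from the "Definition" section and building V and Δ both by hand and from np.linalg.eig:

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])

# V holds the eigenvectors as columns, Delta the matching eigenvalues on its diagonal.
V = np.array([[3.0, -1.0],
              [2.0,  1.0]])
Delta = np.diag([4.0, -1.0])

# Equation (19): A is recovered as V Delta V^-1.
print(np.allclose(A, V @ Delta @ np.linalg.inv(V)))                 # True

# The same factorization with the (unit-normalized) eigenvectors from np.linalg.eig.
vals, vecs = np.linalg.eig(A)
print(np.allclose(A, vecs @ np.diag(vals) @ np.linalg.inv(vecs)))   # True
```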
Geometric View of A = VΔV^-1
It is important to understand what A = VΔV^-1 means geometrically. Let x be a vector in R^N. Then, observe what exactly happens to the vector x under the following linear transformation:
Ax = VΔV^-1x (20)
The components of the vector x can be viewed as the coordinates of x with respect to the standard basis. By multiplying x by V^-1 we get exactly the same vector in n-dimensional space, but this time its coordinates are specified with respect to the columns of V.
Change of basis review: we need to find the coordinates x' of x with respect to the new basis given by the columns of V. Note that the columns of V form a basis of n-dimensional space, since they are n linearly independent vectors. Writing
Ix = Vx'
and solving, we get:
x' = V^-1x
Then, multiplying the vector V^-1x by the matrix Δ scales each of its coordinates by the corresponding diagonal entry of Δ. This operation simply dilates the coordinates.
Finally, multiplying the vector ΔV^-1x by the matrix V converts the coordinates back to the original basis, in this case the standard basis.
Let us take an example to grasp the entire process better. Again consider the matrix A

A = | 2  3 |
    | 2  1 |
In the ”Definition” section we saw that matrix A has eigenvectors [3, 2]^T and [-1, 1]^T with eigenvalues λ1 = 4 and λ2 = -1, respectively. Let's take a vector x in R², x = [5, 5]^T. Initially, x can be represented in the 2D plane as:
Now, let us construct the matrix V. We construct it simply by putting the eigenvectors into its columns:

V = | 3  -1 |
    | 2   1 |

Further, the inverse of this matrix is:

V^-1 = (1/5) |  1  1 |
             | -2  3 |

Lastly, we need to construct the matrix Δ by putting the eigenvalues on the diagonal and setting all other entries to 0. The important rule we need to obey here is that the eigenvalues must be placed in Δ in the same order as their corresponding eigenvectors in V:

Δ = | 4   0 |
    | 0  -1 |
As the first step, we multiply the vector x by the inverse of V:
V^-1x = [2, 1]^T (21)
As we discussed above, this transformation corresponds to a change of basis: before the transformation, the coordinates of the vector x were given with respect to the standard basis. After the transformation, we see exactly the same vector in the 2D plane; the only change is that its coordinates (which form the vector V^-1x) are now written with respect to the columns of V.
Next, we scale the resulting vector (which is [2, 1]^T) using the matrix Δ. As discussed above, this operation scales each component of the vector by the corresponding diagonal entry of Δ, which is indeed an eigenvalue.
ΔV^-1x = [8, -1]^T (22)
After this operation, the vector is distorted; it loses its initial orientation. With respect to the basis V its coordinates are now [8, -1]. Since this basis consists of the vectors [3, 2]^T and [-1, 1]^T, we can conclude that the vector points to [25, 15] with respect to the standard basis, which was the initial basis.
This coordinate is what we get after the last step: multiplying [8, -1]^T by the matrix V. As we discussed previously, this operation converts the coordinates of the transformed vector from the basis V back to the initial basis, in this case the standard basis.
VΔV^-1x = [25, 15]^T (23)
Observe further that this result is exactly what we get by the operation:
Ax = [25, 15]^T (24)
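The whole three-step pipeline can be reproduced in a few lines, assuming NumPy:

```python
import numpy as np

V = np.array([[3.0, -1.0],
              [2.0,  1.0]])
Delta = np.diag([4.0, -1.0])
x = np.array([5.0, 5.0])

x_in_V = np.linalg.inv(V) @ x    # change of basis (21):            [2.  1.]
scaled = Delta @ x_in_V          # scaling by the eigenvalues (22): [ 8. -1.]
result = V @ scaled              # back to the standard basis (23): [25. 15.]
print(x_in_V, scaled, result)

# A single multiplication by A gives the same result (24).
A = np.array([[2.0, 3.0],
              [2.0, 1.0]])
print(A @ x)                     # [25. 15.]
```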
The overall result is an anisotropic scaling along the n eigenvector directions of an (n x n) matrix. This concludes our discussion of the geometric interpretation of diagonalization with eigenvalues.
Cayley-Hamilton Theorem
Let A be any square matrix with characteristic polynomial f(λ) = det(A - λI). Then f(A) evaluates to the zero matrix.
Although the Cayley-Hamilton theorem applies to any square matrix, it is easy to prove for diagonalizable matrices. Suppose that A is a diagonalizable matrix. Then:
f(A) = Vf(Δ)V^-1 (25)
Applying a polynomial to a diagonalizable matrix is equivalent to applying it only to the diagonal part. Since each diagonal entry of Δ is an eigenvalue of A and therefore a root of f, f(Δ) is the zero matrix, which makes f(A) the zero matrix at the end. In other words, the Cayley-Hamilton theorem states that every square matrix is a root of its own characteristic equation.
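Here is a quick numerical check on the (2 x 2) matrix A from earlier, assuming NumPy (np.poly returns the coefficients of the characteristic polynomial of a square matrix):

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])

# np.poly gives the coefficients of the characteristic polynomial of A,
# here lambda^2 - 3*lambda - 4 (same roots as det(A - lambda*I) = 0).
print(np.poly(A))                       # [ 1. -3. -4.]

# Evaluate the polynomial at the matrix itself: A^2 - 3A - 4I.
f_of_A = A @ A - 3 * A - 4 * np.eye(2)
print(f_of_A)                           # the zero matrix (up to floating-point error)
```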
One interesting consequence of the Cayley-Hamilton theorem is that the inverse of an invertible (n x n) matrix can be represented by a polynomial of degree (n - 1) in the matrix itself. Let's see the reason. Suppose A is an (n x n) matrix. Then the characteristic polynomial, given by the formula det(A - λI), is an n-degree polynomial of the form:

f(λ) = det(A - λI) = cn λ^n + ... + c1 λ + c0

where, by setting λ = 0, the constant term is c0 = det(A).
By the Cayley-Hamilton theorem, every square matrix is a root of its characteristic equation. This means that:

cn A^n + ... + c1 A + c0 I = 0
Further, if we move the c0 I term from the left-hand side to the right-hand side and factor A out of the remaining terms, we get:

A (cn A^(n-1) + ... + c1 I) = -c0 I

A^-1 = -(1/c0) (cn A^(n-1) + ... + c1 I)

which is an (n - 1) degree polynomial in A and is valid whenever c0 = det(A) ≠ 0, i.e. whenever A is invertible.
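For the concrete (2 x 2) matrix A used throughout this post, the characteristic polynomial is λ^2 - 3λ - 4, and the recipe above gives its inverse directly; a minimal sketch assuming NumPy:

```python
import numpy as np

A = np.array([[2.0, 3.0],
              [2.0, 1.0]])

# Characteristic polynomial: lambda^2 - 3*lambda - 4. By Cayley-Hamilton:
#   A^2 - 3A - 4I = 0   =>   A(A - 3I) = 4I   =>   A^-1 = (A - 3I) / 4
A_inv = (A - 3 * np.eye(2)) / 4.0

print(np.allclose(A_inv, np.linalg.inv(A)))   # True
```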
Thank you for reading, hope you enjoyed!