Chapter 27 Singular Value Decomposition
Singular Value Decomposition (SVD) is another type of matrix decomposition. Unlike eigendecomposition, which requires a square matrix, SVD can decompose any rectangular matrix. This is often more useful in practice, because the matrices that represent data are usually rectangular.
For any \(m \times n\) matrix \(\mathbf{A}\), both \(\mathbf{A^{\top} A}\) and \(\mathbf{A A^{\top}}\) are symmetric. Therefore, they have \(n\) and \(m\) **orthogonal** eigenvectors, respectively. The proof is simple:
Suppose we have a \(2 \times 2\) symmetric matrix, \(\mathbf{A}\), with two distinct eigenvalues (\(\lambda_1, \lambda_2\)) and two corresponding eigenvectors (\(\mathbf{v}_1\) and \(\mathbf{v}_2\)). By definition,
\[ \begin{aligned} & \mathbf{A} \mathbf{v}_1=\lambda_1 \mathbf{v}_1, \\ & \mathbf{A} \mathbf{v}_2=\lambda_2 \mathbf{v}_2. \\ \end{aligned} \] Let’s multiply (inner product) the first one with \(\mathbf{v}_2^{\top}\):
\[ \mathbf{v}_2^{\top}\mathbf{A} \mathbf{v}_1=\lambda_1 \mathbf{v}_2^{\top} \mathbf{v}_1 \] And, the second one with \(\mathbf{v}_1^{\top}\)
\[ \mathbf{v}_1^{\top}\mathbf{A} \mathbf{v}_2=\lambda_2 \mathbf{v}_1^{\top} \mathbf{v}_2 \] If we take the transpose of both sides of \(\mathbf{v}_2^{\top}\mathbf{A} \mathbf{v}_1=\lambda_1 \mathbf{v}_2^{\top} \mathbf{v}_1\) and use the symmetry of \(\mathbf{A}\) (\(\mathbf{A^{\top}}=\mathbf{A}\)), it becomes
\[ \mathbf{v}_1^{\top}\mathbf{A} \mathbf{v}_2=\lambda_1 \mathbf{v}_1^{\top} \mathbf{v}_2 \] And, subtract these last two:
\[ \begin{aligned} &\mathbf{v}_1^{\top}\mathbf{A} \mathbf{v}_2=\lambda_2 \mathbf{v}_1^{\top} \mathbf{v}_2 \\ & \mathbf{v}_1^{\top}\mathbf{A} \mathbf{v}_2=\lambda_1 \mathbf{v}_1^{\top} \mathbf{v}_2 \\ & \hline 0=\left(\lambda_2 - \lambda_1\right) \mathbf{v}_1^{\top} \mathbf{v}_2 \end{aligned} \] Since \(\lambda_1\) and \(\lambda_2\) are distinct, \(\lambda_2- \lambda_1\) cannot be zero. Therefore, \(\mathbf{v}_1^{\top} \mathbf{v}_2 = 0\). As we saw in Chapter 15, the dot product of two vectors can be expressed geometrically as
\[ \begin{aligned} a \cdot b=\|a\|\|b\| \cos (\theta),\\ \cos (\theta)=\frac{a \cdot b}{\|a\|\|b\|} \end{aligned} \] Since neither eigenvector is the zero vector, \(\cos (\theta)\) has to be zero for \(\mathbf{v}_1^{\top} \mathbf{v}_2 = 0\). And since \(\cos (90^{\circ})=0\), the two vectors are orthogonal.
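We can check this result numerically with a small symmetric matrix picked only for illustration:
S <- matrix(c(4, 1, 1, 3), 2, 2)   # a small symmetric matrix
e <- eigen(S)
# The dot product of the two eigenvectors is (numerically) zero
t(e$vectors[, 1]) %*% e$vectors[, 2]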
We start with the following eigendecompositions of \(\mathbf{A^{\top}A}\) and \(\mathbf{A A^{\top}}\):
\[ \begin{aligned} \mathbf{A^{\top} A =V D V^{\top}} \\ \mathbf{A A^{\top} =U D^{\prime} U^{\top}} \end{aligned} \]
where \(\mathbf{V}\) is an \(n \times n\) orthogonal matrix consisting of the eigenvectors of \(\mathbf{A}^{\top}\mathbf{A}\), and \(\mathbf{D}\) is an \(n \times n\) diagonal matrix with the eigenvalues of \(\mathbf{A^{\top} A}\) on the diagonal. In the same way, for the decomposition of \(\mathbf{A A^{\top}}\), \(\mathbf{U}\) is an \(m \times m\) orthogonal matrix consisting of the eigenvectors of \(\mathbf{A A^{\top}}\), and \(\mathbf{D^{\prime}}\) is an \(m \times m\) diagonal matrix with the eigenvalues of \(\mathbf{A A^{\top}}\) on the diagonal.
It turns out that \(\mathbf{D}\) and \(\mathbf{D^{\prime}}\) have the same non-zero diagonal entries except that the order might be different.
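A quick numerical sketch of this fact, using the same matrix as in the example later in this chapter (the only claim here is that the non-zero eigenvalues coincide):
set.seed(104)
A <- matrix(sample(100, 12), 3, 4)
# A'A is 4 x 4 and AA' is 3 x 3, but their non-zero eigenvalues match
round(eigen(t(A) %*% A)$values, 5)
round(eigen(A %*% t(A))$values, 5)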
We can write SVD for any real \(m \times n\) matrix as
\[ \mathbf{A=U \Sigma V^{\top}} \]
where \(\mathbf{U}\) is an \(m \times m\) orthogonal matrix whose columns are the eigenvectors of \(\mathbf{A A^{\top}}\), \(\mathbf{V}\) is an \(n \times n\) orthogonal matrix whose columns are the eigenvectors of \(\mathbf{A^{\top} A}\), and \(\mathbf{\Sigma}\) is an \(m \times n\) diagonal matrix of the form:
\[ \mathbf{\Sigma}=\left(\begin{array}{ccc} \sigma_{1} & & \\ & \ddots & \\ & & \sigma_{n} \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{array}\right) \] with \(\sigma_{1} \geq \sigma_{2} \geq \cdots \geq \sigma_{n} \geq 0\). The number of non-zero singular values is equal to \(\operatorname{rank}(\mathbf{A})\). In \(\mathbf{\Sigma}\) above, \(\sigma_{1}, \ldots, \sigma_{n}\) are the square roots of the eigenvalues of \(\mathbf{A^{\top} A}\). They are called the singular values of \(\mathbf{A}\).
One important point is that, although \(\mathbf{U}\) in \(\mathbf{U \Sigma V^{\top}}\) is \(m \times m\), when it is multiplied by \(\mathbf{\Sigma}\), only its first \(n\) columns matter because of the zero rows in \(\mathbf{\Sigma}\). Hence, we can keep only those columns of \(\mathbf{U}\) that are not zeroed out by that multiplication. When we keep only the first \(n\) columns of \(\mathbf{U}\), the result is called the “Economy SVD”, \(\mathbf{A}=\mathbf{\hat{U} \hat{\Sigma} V^{\top}}\), where \(\mathbf{\hat{U}}\) is \(m \times n\) and \(\mathbf{\hat{\Sigma}}\) and \(\mathbf{V}\) are \(n \times n\).
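A minimal sketch of the difference in R, using a made-up tall \(4 \times 3\) matrix (R’s svd() returns the economy factors by default; asking for nu = nrow(B) and nv = ncol(B) gives the full ones):
B <- matrix(rnorm(12), 4, 3)      # a tall matrix, m = 4 > n = 3, for illustration

full <- svd(B, nu = nrow(B), nv = ncol(B))
dim(full$u)                       # 4 x 4: full U

econ <- svd(B)                    # economy version (R's default)
dim(econ$u)                       # 4 x 3: only the first n columns of U

# Both reconstruct B; the full version needs the 4 x 3 Sigma with a zero row
max(abs(econ$u %*% diag(econ$d) %*% t(econ$v) - B))
max(abs(full$u %*% rbind(diag(full$d), 0) %*% t(full$v) - B))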
The singular value decomposition is very useful when our basic goal is to “solve” the system \(\mathbf{A} x=b\) for all matrices \(\mathbf{A}\) and vectors \(b\) with a numerically stable algorithm. Some important applications of the SVD include computing the pseudoinverse, matrix approximation, and determining the rank, range, and null space of a matrix. We will see some of these in the following chapters.
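For example, the pseudoinverse can be computed directly from the SVD as \(\mathbf{A^{+}=V \Sigma^{+} U^{\top}}\), where \(\mathbf{\Sigma^{+}}\) inverts the non-zero singular values. A minimal sketch (the function name pinv and the tolerance tol below are ours, not from the text):
# Pseudoinverse via SVD: A+ = V Sigma^+ U'  (illustrative sketch)
pinv <- function(A, tol = 1e-10) {
  s <- svd(A)
  keep <- s$d > tol * max(s$d)               # drop (near-)zero singular values
  s$v[, keep, drop = FALSE] %*%
    diag(1 / s$d[keep], sum(keep)) %*%
    t(s$u[, keep, drop = FALSE])
}

# Least-squares "solution" of Ax = b for a made-up tall system
A1 <- matrix(c(1, 2, 3, 4, 5, 6), 3, 2)
b1 <- c(1, 0, 2)
pinv(A1) %*% b1        # should agree with MASS::ginv(A1) %*% b1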
Here is an example:
set.seed(104)
A <- matrix(sample(100, 12), 3, 4)
A
##      [,1] [,2] [,3] [,4]
## [1,]   77   24   32   78
## [2,]   67   61   39   96
## [3,]   34   94   42   28
svda <- svd(A)
svda
## $d
## [1] 199.83933  70.03623  16.09872
##
## $u
##            [,1]       [,2]       [,3]
## [1,] -0.5515235  0.5259321 -0.6474699
## [2,] -0.6841400  0.1588989  0.7118312
## [3,] -0.4772571 -0.8355517 -0.2721747
##
## $v
##            [,1]       [,2]       [,3]
## [1,] -0.5230774  0.3246068 -0.7091515
## [2,] -0.4995577 -0.8028224  0.1427447
## [3,] -0.3221338 -0.1722864 -0.2726277
## [4,] -0.6107880  0.4694933  0.6343518
# Singular values = sqrt(eigenvalues of t(A) %*% A)
ev <- eigen(t(A) %*% A)$values
round(sqrt(ev), 5)
## [1] 199.83933  70.03623  16.09872   0.00000
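We can also check that the columns of u and v match the eigenvectors of \(\mathbf{A A^{\top}}\) and \(\mathbf{A^{\top} A}\). A quick sketch, comparing in absolute value since eigenvectors are determined only up to a sign flip:
max(abs(abs(svda$u) - abs(eigen(A %*% t(A))$vectors)))
max(abs(abs(svda$v) - abs(eigen(t(A) %*% A)$vectors[, 1:3])))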
Note that svd() returns the “Economy SVD”, using only the non-zero singular values and their respective singular vectors.
Ar <- svda$u %*% diag(svda$d) %*% t(svda$v)
Ar
##      [,1] [,2] [,3] [,4]
## [1,]   77   24   32   78
## [2,]   67   61   39   96
## [3,]   34   94   42   28
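The same decomposition also gives low-rank approximations of \(\mathbf{A}\): keeping only the largest \(k\) singular values produces the best rank-\(k\) approximation. A short sketch using svda from above (the object names k and A2 are ours):
# Rank-2 approximation: keep only the two largest singular values
k <- 2
A2 <- svda$u[, 1:k] %*% diag(svda$d[1:k]) %*% t(svda$v[, 1:k])
round(A2, 2)
# The Frobenius-norm error equals the dropped singular value (16.09872 here)
sqrt(sum((A - A2)^2))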
As we use SVD in the following chapters, its usefulness will become more apparent.