Singular Value Decomposition
Definition
Theorem (Singular Value Decomposition)
Given a matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\), there exist orthogonal square matrices:
\[ \mathbf{U} = \left[ \begin{array}{ccc} \mathbf{u}_1, \ldots, \mathbf{u}_m \end{array} \right] \in \mathbb{R}^{m \times m}, \qquad \mathbf{V} = \left[ \begin{array}{ccc} \mathbf{v}_1, \ldots, \mathbf{v}_n \end{array} \right] \in \mathbb{R}^{n \times n}, \]
such that:
\[ \mathbf{U}^T \mathbf{A} \mathbf{V} = \boldsymbol{\Sigma} = \text{diag}(\sigma_1, \ldots, \sigma_p) \in \mathbb{R}^{m \times n}, \qquad p = \min \left\{ m, n \right\}, \]
where \(\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_p \geq 0\). Equivalently:
\[ \mathbf{A} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^T. \]
The number of non-zero singular values is equal to the rank of \(\mathbf{A}\). The columns of \(\mathbf{U}\) and the columns of \(\mathbf{V}\) are called left-singular vectors and right-singular vectors of \(\mathbf{A}\), respectively. They form two sets of orthonormal bases \(\mathbf{u}_1, \ldots, \mathbf{u}_m\) and \(\mathbf{v}_1, \ldots, \mathbf{v}_n\).
For a square matrix \(\mathbf{A}\), if the singular values are sorted so that those with value zero occupy the highest-numbered columns, the SVD can be written as:
\[ \mathbf{A} = \sum^{r}_{i = 1} \sigma_i \mathbf{u}_i \mathbf{v}^T_i, \]
where \(r\) is the rank of \(\mathbf{A}\).
Separate visualizations of the SVD are required depending on whether \(\mathbf{A}\) has more rows or columns. For 3-by-2 and 2-by-3 examples, the factors have the shapes:
\[ \mathbf{A}_{3 \times 2} = \mathbf{U}_{3 \times 3} \left[ \begin{array}{cc} \sigma_1 & 0 \\ 0 & \sigma_2 \\ 0 & 0 \end{array} \right] \mathbf{V}^T_{2 \times 2}, \qquad \mathbf{A}_{2 \times 3} = \mathbf{U}_{2 \times 2} \left[ \begin{array}{ccc} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \end{array} \right] \mathbf{V}^T_{3 \times 3}. \]
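As a quick numerical illustration of the definition, the following NumPy sketch (an added example, not part of the original notes; the random test matrix is arbitrary) computes the full SVD of a 3-by-2 matrix and checks the factor shapes, the orthogonality of \(\mathbf{U}\) and \(\mathbf{V}\), and the reconstruction \(\mathbf{A} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^T\):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 2))            # a 3-by-2 example (more rows than columns)

# Full SVD: U is 3x3, Vt (= V^T) is 2x2, s holds the p = min(m, n) = 2 singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Embed the singular values into a 3x2 "diagonal" matrix Sigma.
Sigma = np.zeros_like(A)
Sigma[:len(s), :len(s)] = np.diag(s)

print(U.shape, Sigma.shape, Vt.shape)      # (3, 3) (3, 2) (2, 2)
print(np.allclose(A, U @ Sigma @ Vt))      # True: A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(3)))     # True: U is orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(2)))   # True: V is orthogonal
```

The 2-by-3 case is analogous: `np.linalg.svd` then returns a 2-by-2 `U` and a 3-by-3 `Vt`, and \(\boldsymbol{\Sigma}\) has an extra column of zeros instead of an extra row.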
Properties
Property 1
If \(\mathbf{U}^T \mathbf{A} \mathbf{V} = \boldsymbol{\Sigma}\) is the SVD of \(\mathbf{A} \in \mathbb{R}^{m \times n}\) and \(m \geq n\), then for \(i = 1, \ldots, n\):
\[ \mathbf{A} \mathbf{v}_i = \sigma_i \mathbf{u}_i, \qquad \mathbf{A}^T \mathbf{u}_i = \sigma_i \mathbf{v}_i. \]
The geometry behind this result is that the singular values of \(\mathbf{A}\) are the lengths of the semiaxes of the hyperellipsoid \(E\) defined by \(E = \left\{ \mathbf{A} \mathbf{x}: \ ||\mathbf{x}||_2 = 1 \right\}\); the semiaxis directions are given by the \(\mathbf{u}_i\). From the property:
\[ \mathbf{A}^T \mathbf{A} \mathbf{v}_i = \sigma^2_i \mathbf{v}_i, \qquad \mathbf{A} \mathbf{A}^T \mathbf{u}_i = \sigma^2_i \mathbf{u}_i, \]
for \(i = 1, \ldots, n\). This shows that there is an intimate connection between the SVD of \(\mathbf{A}\) and the eigensystems of the symmetric matrices \(\mathbf{A}^T \mathbf{A}\) and \(\mathbf{A} \mathbf{A}^T\).
If \(\mathbf{M}\) is a symmetric matrix with eigenvalues \(\lambda_1, \ldots, \lambda_n\) and orthonormal eigenvectors \(\mathbf{u}_1, \ldots, \mathbf{u}_n\), i.e., \(\mathbf{M} = \mathbf{U} \boldsymbol{\Lambda} \mathbf{U}^T\), where \(\mathbf{U} = \left[ \begin{array}{ccc} \mathbf{u}_1, \ldots, \mathbf{u}_n \end{array}\right]\), then \(\mathbf{M} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^T\), where \(\sigma_i = |\lambda_i|\) and \(\mathbf{v}_i = \text{sign}(\lambda_i) \mathbf{u}_i\).
The eigenvalues of \(\mathbf{A}^T \mathbf{A}\) are \(\sigma^2_1, \ldots, \sigma^2_n\) and the corresponding eigenvectors are \(\mathbf{v}_1, \ldots, \mathbf{v}_n\):
\[ \mathbf{A}^T \mathbf{A} = \mathbf{V} \boldsymbol{\Sigma}^T \boldsymbol{\Sigma} \mathbf{V}^T = \mathbf{V} \, \text{diag}(\sigma^2_1, \ldots, \sigma^2_n) \, \mathbf{V}^T. \]
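A short NumPy sketch (added for illustration; the random test matrix and its size are arbitrary) that verifies \(\mathbf{A}\mathbf{v}_i = \sigma_i \mathbf{u}_i\), \(\mathbf{A}^T\mathbf{u}_i = \sigma_i \mathbf{v}_i\), and the eigenvalue connection numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 3
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=True)
V = Vt.T

for i in range(n):
    # A v_i = sigma_i u_i  and  A^T u_i = sigma_i v_i
    assert np.allclose(A @ V[:, i], s[i] * U[:, i])
    assert np.allclose(A.T @ U[:, i], s[i] * V[:, i])

# Eigenvalues of A^T A are the squared singular values (sorted descending here).
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]
assert np.allclose(eigvals, s**2)
```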
Property 2
If \(\mathbf{A} \in \mathbb{R}^{m \times n}\), then:
\[ ||\mathbf{A}||_2 = \sigma_1, \qquad ||\mathbf{A}||_F = \sqrt{\sigma^2_1 + \ldots + \sigma^2_p}, \]
where \(p = \min \left\{ m, n \right\}\).
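These norm identities are easy to confirm with NumPy (an added sketch; the matrix size and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))
s = np.linalg.svd(A, compute_uv=False)     # singular values only, descending

assert np.isclose(np.linalg.norm(A, 2), s[0])                        # ||A||_2 = sigma_1
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2)))   # ||A||_F
```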
If \(\mathbf{A}\) is perturbed by a matrix \(\mathbf{E}\), then no singular value can move by more than \(||\mathbf{E}||_2\). The following property identifies two useful instances of this result.
Property 3
If \(\mathbf{A} \in \mathbb{R}^{m \times n}\) and \(\mathbf{E} \in \mathbb{R}^{m \times n}\), then:
\[ \sigma_{\max}(\mathbf{A} + \mathbf{E}) \leq \sigma_{\max}(\mathbf{A}) + ||\mathbf{E}||_2, \qquad \sigma_{\min}(\mathbf{A} + \mathbf{E}) \geq \sigma_{\min}(\mathbf{A}) - ||\mathbf{E}||_2. \]
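A numerical spot-check of the perturbation bound (an added sketch; the matrices and the perturbation scale are arbitrary), confirming that no singular value moves by more than \(||\mathbf{E}||_2\):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 4))
E = 1e-3 * rng.standard_normal((5, 4))     # small perturbation

s_A = np.linalg.svd(A, compute_uv=False)
s_AE = np.linalg.svd(A + E, compute_uv=False)

# Each singular value moves by at most ||E||_2 (small slack for round-off).
assert np.all(np.abs(s_AE - s_A) <= np.linalg.norm(E, 2) + 1e-12)
```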
Applications
Pseudo-Inverse
Given:
\[ \mathbf{A} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^T, \]
the pseudo-inverse of \(\mathbf{A}\), denoted as \(\mathbf{A}^\dagger\), can be computed as:
\[ \mathbf{A}^\dagger = \mathbf{V} \boldsymbol{\Sigma}^\ddagger \mathbf{U}^T, \]
where \(\ddagger\) implies taking the reciprocal of every non-zero diagonal element and transposing the result. See Least Squares for more information.
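A minimal NumPy sketch of this construction (an added example; the tolerance used to decide which singular values count as non-zero is a common but arbitrary choice), compared against `np.linalg.pinv`:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD: U is 5x3, Vt is 3x3

# Invert every singular value that is numerically non-zero, leave the rest at zero.
tol = max(A.shape) * np.finfo(A.dtype).eps * s[0]
s_inv = np.where(s > tol, 1.0 / s, 0.0)

A_pinv = Vt.T @ np.diag(s_inv) @ U.T               # A^+ = V Sigma^(double-dagger) U^T
assert np.allclose(A_pinv, np.linalg.pinv(A))
```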
Nearest Orthogonal Matrix
Given a real square matrix \(\mathbf{A}\), the orthogonal matrix \(\mathbf{O}\) closest to \(\mathbf{A}\) is \(\mathbf{O} = \mathbf{U} \mathbf{V}^T\), where closeness of fit is measured by the Frobenius norm of \(\mathbf{O} - \mathbf{A}\). The orthogonal matrix has the decomposition \(\mathbf{O} = \mathbf{U} \mathbf{I} \mathbf{V}^T\), so if \(\mathbf{A} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^T\), forming the product \(\mathbf{O} = \mathbf{U} \mathbf{V}^T\) amounts to replacing the singular values with ones.
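A small NumPy sketch (added for illustration; the random matrices are arbitrary) that forms \(\mathbf{O} = \mathbf{U}\mathbf{V}^T\) and sanity-checks it against another orthogonal matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 3))

U, _, Vt = np.linalg.svd(A)
O = U @ Vt                                   # nearest orthogonal matrix to A

assert np.allclose(O @ O.T, np.eye(3))       # O is orthogonal
# A random orthogonal matrix Q should never beat O in Frobenius distance to A.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
assert np.linalg.norm(O - A, 'fro') <= np.linalg.norm(Q - A, 'fro') + 1e-12
```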
Kabsch Algorithm
See Kabsch Algorithm.
Low-Rank Matrix Approximation
Given a full rank matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\), we can use SVD to find a "simpler" matrix \(\tilde{\mathbf{A}}\) that "approximates" \(\mathbf{A}\) well.
- To define the criterion for a "simple" matrix, we can use the rank of the matrix: \[ \text{rank}(\tilde{\mathbf{A}}) = r \ll \text{rank}(\mathbf{A}). \]
- To define the criterion for "approximation", we can use the \(\ell_2\) norm and a tolerance \(\epsilon\): \[ ||\tilde{\mathbf{A}} - \mathbf{A}||_2 < \epsilon. \]
\(\tilde{\mathbf{A}}\) is called a low-rank matrix. The low-rank matrix approximation can then be cast as an optimization problem:
\[ \min_{\tilde{\mathbf{A}}} ||\tilde{\mathbf{A}} - \mathbf{A}||_2 \quad \text{subject to} \quad \text{rank}(\tilde{\mathbf{A}}) = r, \]
which is a least squares problem with a constraint.
Theorem. Let \(\mathbf{A} = \sum^n_{i = 1} \sigma_i \mathbf{u}_i \mathbf{v}^T_i = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}^T\) be the SVD of the \(m \times n\) matrix \(\mathbf{A}\) with \(m \geq n\), where \(\mathbf{U} = \left[ \begin{array}{ccc} \mathbf{u}_1, \ldots, \mathbf{u}_m \end{array} \right]\), \(\mathbf{V} = \left[ \begin{array}{ccc} \mathbf{v}_1, \ldots, \mathbf{v}_n \end{array} \right]\), and \(\boldsymbol{\Sigma} = \text{diag}(\sigma_1, \ldots, \sigma_n)\) with \(\sigma_1 \geq \ldots \geq \sigma_n\). Then the rank-\(r\) matrix \(\tilde{\mathbf{A}}\) closest to \(\mathbf{A}\) in the \(\ell_2\) norm is given by \(\tilde{\mathbf{A}} = \sum^{r}_{i = 1} \sigma_i \mathbf{u}_i \mathbf{v}^T_i = \mathbf{U} \boldsymbol{\Sigma}_r \mathbf{V}^T\), where \(\boldsymbol{\Sigma}_r = \text{diag}(\sigma_1, \ldots, \sigma_r, \mathbf{0})\). Furthermore, \(||\tilde{\mathbf{A}} - \mathbf{A}||_2 = \sigma_{r + 1}\).
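The truncated SVD of the theorem is straightforward to form with NumPy; the sketch below (an added example with arbitrary sizes) keeps the \(r\) largest singular values and checks that the 2-norm error equals \(\sigma_{r+1}\):

```python
import numpy as np

rng = np.random.default_rng(6)
m, n, r = 8, 6, 2
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-r truncation: keep only the r largest singular values.
A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

assert np.linalg.matrix_rank(A_r) == r
# The 2-norm error equals the first discarded singular value, sigma_{r+1}.
assert np.isclose(np.linalg.norm(A_r - A, 2), s[r])
```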
An example of low-rank matrix approximation is image compression. See Compression via Low-Rank Matrix Approximation.