Matrix Calculus
Definition
Partial derivatives involving vectors and matrices can be arranged in two common layouts. Consider a vector function \(\mathbf{y}\) and a vector \(\mathbf{x}\):
- Numerator layout arranges the result according to \(\mathbf{y}\) and \(\mathbf{x}^T\): rows follow the components of \(\mathbf{y}\), columns follow the components of \(\mathbf{x}\).
- Denominator layout arranges the result according to \(\mathbf{y}^T\) and \(\mathbf{x}\): rows follow the components of \(\mathbf{x}\), columns follow the components of \(\mathbf{y}\).
In general, to convert a result from one layout to the other, take its transpose.
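
As a rough numerical illustration of the two layouts, the sketch below approximates a vector-by-vector derivative with central differences and obtains the denominator-layout result by transposing the numerator-layout one. The function `f`, the point `x0`, and the helper names are hypothetical choices made for this example only.

```python
def jacobian_numerator(f, x, h=1e-6):
    """Central-difference Jacobian in numerator layout.

    Entry (i, j) approximates d y_i / d x_j, so the result is m x n
    for a function mapping n inputs to m outputs.
    """
    m, n = len(f(x)), len(x)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        x_plus = list(x)
        x_plus[j] += h
        x_minus = list(x)
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(m):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return jac


def transpose(mat):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*mat)]


def f(x):
    # Hypothetical example with m = 3 outputs and n = 2 inputs.
    x1, x2 = x
    return [x1 * x2, x1 + x2, x2 ** 2]


x0 = [2.0, 3.0]
num = jacobian_numerator(f, x0)  # numerator layout: 3 x 2
den = transpose(num)             # denominator layout: 2 x 3

print(len(num), "x", len(num[0]))
print(len(den), "x", len(den[0]))
```

At `x0 = [2, 3]` the exact numerator-layout Jacobian of this `f` has rows `[3, 2]`, `[1, 1]`, and `[0, 6]`; the finite-difference values should agree up to rounding.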
Layouts
Type | Numerator Layout | Denominator Layout |
---|---|---|
Vector-by-Scalar. Consider a scalar \(x\) and a column vector \(\mathbf{y} = \left[\begin{array}{cccc}y_1 & y_2 & \ldots & y_m\end{array}\right]^T\) | The derivative is an \(m \times 1\) column vector: \(\frac{\partial \mathbf{y}}{\partial x} = \left[\begin{array}{c}\frac{\partial y_1}{\partial x} \\ \frac{\partial y_2}{\partial x} \\ \vdots \\ \frac{\partial y_m}{\partial x}\end{array}\right]\) | The derivative is a \(1 \times m\) row vector: \(\frac{\partial \mathbf{y}}{\partial x} = \left[\begin{array}{cccc}\frac{\partial y_1}{\partial x} & \frac{\partial y_2}{\partial x} & \ldots & \frac{\partial y_m}{\partial x}\end{array}\right]\) |
Scalar-by-Vector. Consider a scalar \(y\) and a column vector \(\mathbf{x} = \left[\begin{array}{cccc}x_1 & x_2 & \ldots & x_n\end{array}\right]^T\) | The derivative is a \(1 \times n\) row vector: \(\frac{\partial y}{\partial \mathbf{x}} = \left[\begin{array}{cccc}\frac{\partial y}{\partial x_1} & \frac{\partial y}{\partial x_2} & \ldots & \frac{\partial y}{\partial x_n}\end{array}\right]\) | The derivative is an \(n \times 1\) column vector: \(\frac{\partial y}{\partial \mathbf{x}} = \left[\begin{array}{c}\frac{\partial y}{\partial x_1} \\ \frac{\partial y}{\partial x_2} \\ \vdots \\ \frac{\partial y}{\partial x_n}\end{array}\right]\) |
Vector-by-Vector. Consider column vectors \(\mathbf{y} = \left[\begin{array}{cccc}y_1 & y_2 & \ldots & y_m\end{array}\right]^T\) and \(\mathbf{x} = \left[\begin{array}{cccc}x_1 & x_2 & \ldots & x_n\end{array}\right]^T\) | The derivative is an \(m \times n\) matrix: \(\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \left[\begin{array}{cccc}\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \ldots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \ldots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \ldots & \frac{\partial y_m}{\partial x_n}\end{array}\right]\) | The derivative is an \(n \times m\) matrix: \(\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \left[\begin{array}{cccc}\frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \ldots & \frac{\partial y_m}{\partial x_1} \\ \frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \ldots & \frac{\partial y_m}{\partial x_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \ldots & \frac{\partial y_m}{\partial x_n}\end{array}\right]\) |
Scalar-by-Matrix. Consider a scalar function \(y\) and a \(p \times q\) matrix \(\mathbf{X}\) | The derivative is a \(q \times p\) matrix: \(\frac{\partial y}{\partial \mathbf{X}} = \left[\begin{array}{cccc}\frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \ldots & \frac{\partial y}{\partial x_{p1}} \\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \ldots & \frac{\partial y}{\partial x_{p2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{1q}} & \frac{\partial y}{\partial x_{2q}} & \ldots & \frac{\partial y}{\partial x_{pq}}\end{array}\right]\) | The derivative is a \(p \times q\) matrix: \(\frac{\partial y}{\partial \mathbf{X}} = \left[\begin{array}{cccc}\frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \ldots & \frac{\partial y}{\partial x_{1q}} \\ \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \ldots & \frac{\partial y}{\partial x_{2q}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{p1}} & \frac{\partial y}{\partial x_{p2}} & \ldots & \frac{\partial y}{\partial x_{pq}}\end{array}\right]\) |
Matrix-by-Scalar. Consider an \(m \times n\) matrix function \(\mathbf{Y}\) and a scalar \(x\) | The derivative is an \(m \times n\) matrix: \(\frac{\partial \mathbf{Y}}{\partial x} = \left[\begin{array}{cccc}\frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \ldots & \frac{\partial y_{1n}}{\partial x} \\ \frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \ldots & \frac{\partial y_{2n}}{\partial x} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{m1}}{\partial x} & \frac{\partial y_{m2}}{\partial x} & \ldots & \frac{\partial y_{mn}}{\partial x}\end{array}\right]\) | The derivative is an \(n \times m\) matrix: \(\frac{\partial \mathbf{Y}}{\partial x} = \left[\begin{array}{cccc}\frac{\partial y_{11}}{\partial x} & \frac{\partial y_{21}}{\partial x} & \ldots & \frac{\partial y_{m1}}{\partial x} \\ \frac{\partial y_{12}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \ldots & \frac{\partial y_{m2}}{\partial x} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{1n}}{\partial x} & \frac{\partial y_{2n}}{\partial x} & \ldots & \frac{\partial y_{mn}}{\partial x}\end{array}\right]\) |
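
The scalar-by-matrix row can be checked numerically with a small sketch. It assumes a hypothetical scalar function \(y = \mathbf{a}^T \mathbf{X} \mathbf{b}\), for which \(\partial y / \partial x_{ij} = a_i b_j\); the vectors `a`, `b`, the matrix `X0`, and the helper names are illustrative and not taken from the table.

```python
def grad_denominator(f, X, h=1e-6):
    """Central-difference derivative of a scalar f with respect to a matrix X.

    The result has the same p x q shape as X (denominator layout):
    entry (i, j) approximates d y / d x_ij.
    """
    p, q = len(X), len(X[0])
    grad = [[0.0] * q for _ in range(p)]
    for i in range(p):
        for j in range(q):
            X_plus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus = [row[:] for row in X]
            X_minus[i][j] -= h
            grad[i][j] = (f(X_plus) - f(X_minus)) / (2 * h)
    return grad


a = [1.0, 2.0]        # length p = 2
b = [3.0, 4.0, 5.0]   # length q = 3


def y(X):
    # y = a^T X b = sum over i, j of a_i * x_ij * b_j
    return sum(a[i] * X[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))


X0 = [[0.5, -1.0, 2.0],
      [1.5, 0.0, -0.5]]

den = grad_denominator(y, X0)        # p x q, same shape as X
num = [list(r) for r in zip(*den)]   # q x p, numerator layout

print(len(den), "x", len(den[0]))
print(len(num), "x", len(num[0]))
```

Under these assumptions the denominator-layout result should match \(\mathbf{a}\mathbf{b}^T\) and the numerator-layout result its transpose, \(\mathbf{b}\mathbf{a}^T\).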
Useful Identities
Type | Identity | Notes |
---|---|---|
Vector-by-Vector (numerator) | \(\begin{alignat*}{2}&\frac{\partial (\mathbf{A} \mathbf{x})}{\partial \mathbf{x}} = \mathbf{A}, \\&\mathbf{x}^T \mathbf{y} = x_1 y_1 + \ldots + x_n y_n, \\&\frac{\partial (\mathbf{y}^T \mathbf{x} )}{\partial \mathbf{x}} = \frac{\partial (\mathbf{x}^T \mathbf{y})}{\partial \mathbf{x}} &&=\left[\begin{array}{ccc}\partial (\mathbf{x}^T \mathbf{y}) / \partial x_1 & \ldots & \partial (\mathbf{x}^T \mathbf{y}) / \partial x_n\end{array}\right] \\& &&=\left[\begin{array}{ccc}y_1 & \ldots & y_n\end{array}\right] \\ & &&= \mathbf{y}^T.\end{alignat*}\) | Here \(\mathbf{A}\) and \(\mathbf{y}\) do not depend on \(\mathbf{x}\). In denominator layout the results are \(\mathbf{A}^T\) and \(\mathbf{y}\), respectively. |
Scalar-by-Vector | \(\begin{align*}\frac{\partial (\mathbf{x}^T \mathbf{A} \mathbf{x})}{\partial \mathbf{x}} &=\left[\begin{array}{ccc}\partial (\mathbf{x}^T \mathbf{A} \mathbf{x}) / \partial x_1 &\ldots &\partial (\mathbf{x}^T \mathbf{A} \mathbf{x}) / \partial x_n\end{array}\right] \\&= \left[\begin{array}{ccc}\sum_j x_j A_{1j} + \sum_i x_i A_{i1} & \ldots &\sum_j x_j A_{nj} + \sum_i x_i A_{in}\end{array}\right] \\&= \left[\begin{array}{ccc}\sum_j x_j A_{1j} & \ldots & \sum_j x_j A_{nj}\end{array}\right] + \left[\begin{array}{ccc}\sum_i x_i A_{i1} & \ldots & \sum_i x_i A_{in}\end{array}\right] \\&= \mathbf{x}^T \mathbf{A}^T + \mathbf{x}^T \mathbf{A}.\end{align*}\) | If \(\mathbf{A}\) is symmetric, then \(\mathbf{A} = \mathbf{A}^T\), which gives \(\frac{\partial (\mathbf{x}^T \mathbf{A} \mathbf{x})}{\partial \mathbf{x}} = 2 \mathbf{x}^T \mathbf{A}.\) This is again in numerator layout. In denominator layout the result is \(2 \mathbf{A}^T \mathbf{x} = 2 \mathbf{A} \mathbf{x}\). |
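
The identities above can be sanity-checked numerically. The following is a minimal sketch using central differences in plain Python; the matrix `A`, the point `x0`, and all helper names are hypothetical, and `A` is deliberately non-symmetric so that both terms \(\mathbf{x}^T \mathbf{A}^T\) and \(\mathbf{x}^T \mathbf{A}\) contribute.

```python
def matvec(A, x):
    """Return the matrix-vector product A x as a list."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def quad_form(A, x):
    """Return the scalar x^T A x."""
    return sum(x_i * y_i for x_i, y_i in zip(x, matvec(A, x)))


def jacobian_numerator(f, x, h=1e-6):
    """Central-difference Jacobian; entry (i, j) approximates d f_i / d x_j."""
    cols = []
    for j in range(len(x)):
        x_plus = list(x)
        x_plus[j] += h
        x_minus = list(x)
        x_minus[j] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        cols.append([(fp - fm) / (2 * h) for fp, fm in zip(f_plus, f_minus)])
    return [list(row) for row in zip(*cols)]  # turn columns into rows


A = [[1.0, 2.0],
     [3.0, 4.0]]   # non-symmetric on purpose
x0 = [0.5, -1.5]

# d(Ax)/dx in numerator layout should reproduce A.
jac_Ax = jacobian_numerator(lambda x: matvec(A, x), x0)

# d(x^T A x)/dx in numerator layout is a 1 x n row vector.
grad_quad = jacobian_numerator(lambda x: [quad_form(A, x)], x0)[0]

# Closed form from the table: x^T A^T + x^T A.
analytic = [sum(x0[i] * (A[k][i] + A[i][k]) for i in range(len(x0)))
            for k in range(len(x0))]

print(jac_Ax)
print(grad_quad, analytic)
```

For these particular values the closed form evaluates to `[-6.5, -9.5]`, and the finite-difference row vector should agree up to rounding.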