Linear Recursive Least Squares
Problem Statement
Suppose we have an estimate \(\hat{\mathbf{x}}\) after \((k - 1)\) measurements, and we obtain a new measurement \(y_k\) at time \(k\). If we obtain measurements sequentially and want to update our estimate of the constant \(\mathbf{x}\) with each new measurement, we need to augment the observation matrix \(\mathbf{H} \in \mathbb{R}^{k \times n}\) and completely recompute the estimate \(\hat{\mathbf{x}}\). Recursive least squares avoids this recomputation.
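As a concrete illustration of the growing batch problem, here is a minimal sketch (the dimensions, noise level, and `x_true` are made up for the example) in which the full least-squares problem is re-solved from scratch every time a new measurement row is appended:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3                                         # dimension of the constant vector x
x_true = rng.normal(size=n)                   # hypothetical true parameter vector

H_rows, y_rows = [], []
for k in range(1, 7):
    H_k = rng.normal(size=(1, n))             # new observation row at time k
    y_k = H_k @ x_true + 0.1 * rng.normal(size=1)   # noisy scalar measurement
    H_rows.append(H_k)
    y_rows.append(y_k)

    # Batch approach: rebuild the full (k x n) observation matrix and
    # re-solve the entire least-squares problem after every measurement.
    H = np.vstack(H_rows)
    y = np.concatenate(y_rows)
    x_hat, *_ = np.linalg.lstsq(H, y, rcond=None)
    print(k, x_hat)
```

The cost of each refit grows with \(k\); the recursive formulation below replaces it with a constant-cost update.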
A linear recursive least squares estimator can be written as:
\[
\begin{align}
y_k &= \mathbf{H}_k \mathbf{x} + v_k \\
\hat{\mathbf{x}}_k &= \hat{\mathbf{x}}_{k - 1} + \mathbf{K}_{k} \left( y_k - \mathbf{H}_k \hat{\mathbf{x}}_{k - 1} \right),
\end{align}
\]
where \(\mathbf{K}_k\) is the estimator gain matrix. Here, we compute \(\hat{\mathbf{x}}_k\) from the previous estimate \(\hat{\mathbf{x}}_{k - 1}\) and the new measurement \(y_k\). The correction term \(y_k - \mathbf{H}_k \hat{\mathbf{x}}_{k - 1}\) applied to the previous estimate is called the residual or innovation.
Solution to Linear Recursive Least Squares
The estimation error after \(k\) measurements can be defined as:
\[
\boldsymbol{\epsilon}_{\mathbf{x}, k} = \mathbf{x} - \hat{\mathbf{x}}_k.
\]
Then the mean or the expectation of the estimation error is:
\[
\begin{align}
\mathbb{E} \left[
\boldsymbol{\epsilon}_{\mathbf{x}, k} \right] &= \mathbb{E} \left[ \mathbf{x} - \hat{\mathbf{x}}_k \right] \\
&= \mathbb{E} \left[ \mathbf{x} - \hat{\mathbf{x}}_{k - 1} - \mathbf{K}_k \left( y_k - \mathbf{H}_k \hat{\mathbf{x}}_{k - 1} \right) \right] \\
&= \mathbb{E} \left[ \boldsymbol{\epsilon}_{\mathbf{x}, k - 1} - \mathbf{K}_k \left(\mathbf{H}_k \mathbf{x} + v_k - \mathbf{H}_k \hat{\mathbf{x}}_{k - 1} \right) \right] \\
&= \mathbb{E}\left[ \boldsymbol{\epsilon}_{\mathbf{x}, k - 1} - \mathbf{K}_k \mathbf{H}_k( \mathbf{x} - \hat{\mathbf{x}}_{k - 1}) - \mathbf{K}_k v_k \right] \\
&= \left( \mathbf{I} - \mathbf{K}_k \mathbf{H}_k \right) \mathbb{E}(\boldsymbol{\epsilon}_{\mathbf{x}, k-1}) - \mathbf{K}_k \mathbb{E}(v_k).
\end{align}
\]
Unbiased Estimator
If the measurement noise \(v_k\) is zero mean for all \(k\), and if the initial estimate is set to the expected value of \(\mathbf{x}\) (i.e., \(\hat{\mathbf{x}}_0 = \mathbb{E}\left[ \mathbf{x} \right]\)), then both terms on the right-hand side above vanish and, by induction, \(\mathbb{E}\left[\boldsymbol{\epsilon}_{\mathbf{x}, k}\right] = \mathbf{0}\) for all \(k\): the expected value of \(\hat{\mathbf{x}}_k\) equals the expected value of \(\mathbf{x}\). Such an estimator is called unbiased.
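A quick Monte Carlo check of this property is sketched below. The dimensions, the fixed observation row, and the (deliberately arbitrary, non-optimal) gain are all assumptions for the example; unbiasedness only requires zero-mean noise and \(\hat{\mathbf{x}}_0 = \mathbb{E}[\mathbf{x}]\), not an optimal gain.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_trials, n_steps = 2, 20000, 5
H = np.array([[1.0, 2.0]])          # fixed observation row (1 x n), assumed
K = np.array([[0.2], [0.1]])        # arbitrary fixed gain (n x 1), not the optimal one

err_sum = np.zeros(n)
for _ in range(n_trials):
    x = rng.normal(size=n)          # random constant vector with E[x] = 0
    x_hat = np.zeros(n)             # initial estimate set to E[x]
    for _ in range(n_steps):
        y = H @ x + rng.normal(scale=0.5, size=1)   # zero-mean measurement noise
        x_hat = x_hat + K @ (y - H @ x_hat)         # recursive update
    err_sum += x - x_hat

print(err_sum / n_trials)           # close to zero: the estimation error has zero mean
```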
The cost function can be defined as the sum of the variances of the estimation errors at time \(k\):
\[
\begin{align}
\mathbf{J}_k &= \mathbb{E} \left[(x_1 - \hat{x}_{1, k})^2 \right] + \cdots + \mathbb{E} \left[(x_n - \hat{x}_{n, k})^2 \right] \\
&= \mathbb{E} \left[\epsilon^2_{x1, k} + \cdots + \epsilon^2_{xn, k} \right] \\
&= \mathbb{E} \left[\boldsymbol{\epsilon}^T_{\mathbf{x}, k} \boldsymbol{\epsilon}_{\mathbf{x}, k} \right] \\
&= \mathbb{E} \left[ \text{Tr} \left(\boldsymbol{\epsilon}_{\mathbf{x}, k} \boldsymbol{\epsilon}^T_{\mathbf{x}, k} \right) \right] \\
&= \text{Tr} \, \mathbf{P}_k,
\end{align}
\]
where \(\mathbf{P}_k\) is the estimation-error covariance at time \(k\) and can be found as follows:
\[
\begin{align}
\mathbf{P}_k &= \mathbb{E} \left[\boldsymbol{\epsilon}_{\mathbf{x}, k} \boldsymbol{\epsilon}^T_{\mathbf{x}, k} \right] \\
&= \mathbb{E} \left[ \left( \mathbf{x} - \hat{\mathbf{x}}_k \right) \left( \mathbf{x} - \hat{\mathbf{x}}_k \right)^T \right] \\
&= \mathbb{E} \left[ \left(\mathbf{x} - \mathbf{\hat{x}}_{k - 1} - \mathbf{K}_k (\mathbf{H}_k \mathbf{x} + v_k - \mathbf{H}_k \hat{\mathbf{x}}_{k - 1}) \right) \left( \cdots \right)^T \right] \\
&= \mathbb{E} \left[ \left( \boldsymbol{\epsilon}_{\mathbf{x}, k - 1} - \mathbf{K}_k \mathbf{H}_k \boldsymbol{\epsilon}_{\mathbf{x}, k - 1} - \mathbf{K}_k v_k \right) \left( \cdots \right)^T \right] \\
&= \mathbb{E} \left[ \left( (\mathbf{I} - \mathbf{K}_k \mathbf{H}_k)\boldsymbol{\epsilon}_{\mathbf{x}, k - 1} - \mathbf{K}_k v_k \right) \left( \cdots \right)^T \right] \\
&= \left( \mathbf{I} - \mathbf{K}_k \mathbf{H}_k \right) \mathbb{E} \left[ \boldsymbol{\epsilon}_{\mathbf{x}, k - 1} \boldsymbol{\epsilon}^T_{\mathbf{x}, k - 1} \right] \left( \mathbf{I} - \mathbf{K}_k \mathbf{H}_k \right)^T \\
&\quad - \mathbf{K}_k \mathbb{E} \left[v_k \boldsymbol{\epsilon}^T_{\mathbf{x}, k - 1} \right] \left( \mathbf{I} - \mathbf{K}_k \mathbf{H}_k \right)^T \\
&\quad - \left( \mathbf{I} - \mathbf{K}_k \mathbf{H}_k \right) \mathbb{E} \left[ \boldsymbol{\epsilon}_{\mathbf{x}, k - 1} v^T_k \ \right] \mathbf{K}^T_k \\
&\quad + \mathbf{K}_k \mathbb{E}\left[v_k v^T_k \right] \mathbf{K}^T_k.
\end{align}
\]
The estimation error at time \(k - 1\), \(\boldsymbol{\epsilon}_{\mathbf{x}, k - 1}\), is independent of the measurement noise at time \(k\), \(v_k\), and the noise is zero mean, so the cross terms vanish:
\[
\mathbb{E}\left[ v_k \boldsymbol{\epsilon}^T_{\mathbf{x}, k - 1} \right] = \mathbb{E}\left[ v_k \right] \mathbb{E} \left[ \boldsymbol{\epsilon}^T_{\mathbf{x}, k - 1} \right] = \mathbf{0}.
\]
Hence, the estimation-error covariance is:
\[
\mathbf{P}_k = \left( \mathbf{I} - \mathbf{K}_k \mathbf{H}_k \right) \mathbf{P}_{k - 1} \left( \mathbf{I} - \mathbf{K}_k \mathbf{H}_k \right)^T + \mathbf{K}_k \mathbf{R}_k \mathbf{K}^T_k,
\]
where \(\mathbf{R}_k\) is the covariance of \(v_k\).
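The following sketch checks this covariance propagation numerically for a single step. The prior covariance \(\mathbf{P}_{k-1}\), the gain, \(\mathbf{H}_k\), and \(\mathbf{R}_k\) are all assumed values; note that this form of the update holds for any gain, not only the optimal one.

```python
import numpy as np

rng = np.random.default_rng(2)
n, n_samples = 2, 500_000
H = np.array([[1.0, 0.5]])                      # assumed observation row
K = np.array([[0.4], [0.2]])                    # assumed (not necessarily optimal) gain
R = np.array([[0.3]])                           # assumed measurement noise covariance
P_prev = np.array([[1.0, 0.2], [0.2, 2.0]])     # assumed prior error covariance

I = np.eye(n)
P_formula = (I - K @ H) @ P_prev @ (I - K @ H).T + K @ R @ K.T

# Monte Carlo: sample eps_{k-1} ~ N(0, P_prev) and v_k ~ N(0, R), propagate
# eps_k = (I - K H) eps_{k-1} - K v_k, and estimate its covariance empirically.
eps_prev = rng.multivariate_normal(np.zeros(n), P_prev, size=n_samples)
v = rng.normal(scale=np.sqrt(R[0, 0]), size=(n_samples, 1))
eps = eps_prev @ (I - K @ H).T - v @ K.T
print(P_formula)
print(np.cov(eps, rowvar=False))                # agrees up to Monte Carlo error
```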
The optimality condition, obtained by setting the derivative of \(\mathbf{J}_k = \text{Tr} \, \mathbf{P}_k\) with respect to \(\mathbf{K}_k\) to zero, is:
\[
\frac{ \partial \mathbf{J}_k}{ \partial \mathbf{K}_k} = 2 \left( \mathbf{I} - \mathbf{K}_k \mathbf{H}_k \right) \mathbf{P}_{k - 1} \left( - \mathbf{H}^T_k \right) + 2 \mathbf{K}_k \mathbf{R}_k = \mathbf{0},
\]
since:
\[
\frac{\partial \text{Tr}(\mathbf{A} \mathbf{B} \mathbf{A}^T)}{\partial \mathbf{A}} = 2 \mathbf{A} \mathbf{B},
\]
given that \(\mathbf{B}\) is symmetric. Solving the optimality condition for \(\mathbf{K}_k\) yields \(\mathbf{K}_k \left( \mathbf{H}_k \mathbf{P}_{k - 1} \mathbf{H}^T_k + \mathbf{R}_k \right) = \mathbf{P}_{k - 1} \mathbf{H}^T_k\). Hence, the recursive least squares update equations are:
\[
\begin{align}
\mathbf{K}_k &= \mathbf{P}_{k - 1} \mathbf{H}^T_k \left( \mathbf{H}_k \mathbf{P}_{k - 1} \mathbf{H}^T_k + \mathbf{R}_k \right)^{-1} \\
\hat{\mathbf{x}}_k &= \hat{\mathbf{x}}_{k - 1} + \mathbf{K}_k \left(y_k - \mathbf{H}_k \hat{\mathbf{x}}_{k - 1} \right) \\
\mathbf{P}_k &= \left( \mathbf{I} - \mathbf{K}_k \mathbf{H}_k \right) \mathbf{P}_{k - 1} \left( \mathbf{I} - \mathbf{K}_k \mathbf{H}_k \right)^T + \mathbf{K}_k \mathbf{R}_k \mathbf{K}^T_k \\
&= \left( \mathbf{I} - \mathbf{K}_k \mathbf{H}_k \right) \mathbf{P}_{k - 1},
\end{align}
\]
where the final, simplified form of the covariance update holds only when \(\mathbf{K}_k\) is the optimal gain given above.
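Putting the three update equations together, here is a minimal sketch of the recursive estimator (the dimensions, noise level, and the large initial \(\mathbf{P}_0\) are assumptions for the example). With a nearly non-informative \(\mathbf{P}_0\), the recursive estimate should agree closely with the batch least-squares solution computed from all measurements at once.

```python
import numpy as np

def rls_update(x_hat, P, H_k, y_k, R_k):
    """One recursive least-squares step for a single measurement block."""
    K = P @ H_k.T @ np.linalg.inv(H_k @ P @ H_k.T + R_k)        # gain
    x_hat = x_hat + K @ (y_k - H_k @ x_hat)                      # estimate update
    I = np.eye(P.shape[0])
    P = (I - K @ H_k) @ P @ (I - K @ H_k).T + K @ R_k @ K.T      # covariance (Joseph form)
    return x_hat, P

rng = np.random.default_rng(3)
n, m = 3, 50
x_true = rng.normal(size=n)                    # hypothetical constant to estimate
H = rng.normal(size=(m, n))                    # one scalar measurement row per step
R = np.array([[0.1]])                          # measurement noise covariance
y = H @ x_true + rng.normal(scale=np.sqrt(0.1), size=m)

x_hat = np.zeros(n)                            # x_hat_0 = E[x]
P = 1e4 * np.eye(n)                            # large P_0: weak prior
for k in range(m):
    x_hat, P = rls_update(x_hat, P, H[k:k + 1], y[k:k + 1], R)

x_batch, *_ = np.linalg.lstsq(H, y, rcond=None)
print(x_hat)                                   # recursive estimate
print(x_batch)                                 # batch estimate, nearly identical
```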