Weighted Non-Linear Least Squares
Problem Statement
Consider a weighted averaging problem similar to the previous example. Let \(x \in \mathbb{R}\) be a constant hidden variable to be estimated and let \(z_i\) be multiple measurements of it. Assume the measurements are independent, each perturbed by zero-mean Gaussian noise \(\epsilon_i \sim \mathcal{N}(0, \sigma_i)\), where \(\sigma_i\) is the noise variance. Each measurement has its own non-linear measurement function:
\[
z_i = h_i(x) + \epsilon_i.
\]
Solving with a Factor Graph
Non-Linear to Linear Least Squares
An estimator can be obtained by minimizing the weighted sum of squared residuals:
\[
\hat{x} = \arg \min_{x} \sum^{n}_{i = 1} ||h_i(x) - z_i||^2_{\sigma_i}.
\]
We can replace \(h_i(x)\) with an affine approximation by linearizing the non-linear function about a point \(x_0\):
\[
h_i(x) \approx H_i \Delta x + h_i(x_0),
\]
where \(H_i = \left. \frac{\partial h_i(x)}{\partial x} \right|_{x = x_0}\) and \(\Delta x = x - x_0\).
Then:
\[
\begin{align}
\Delta x &= \arg \min_{\Delta x} \sum^n_{i = 1} || H_i \Delta x + h_i(x_0) - z_i ||^2_{\sigma_i}, \\
\hat{x} &= x_0 + \Delta x.
\end{align}
\]
Whitening each term by \(\sigma_i^{-1/2}\) and stacking yields a standard linear least-squares problem:
\[
\begin{align}
\mathbf{L} &= \left[
\begin{array}{cccc}
\frac{H_1}{\sqrt{\sigma_1}} & \frac{H_2}{\sqrt{\sigma_2}} & \cdots & \frac{H_n}{\sqrt{\sigma_n}}
\end{array}
\right]^T, \\
\mathbf{y} &= \left[
\begin{array}{cccc}
\frac{z_1 - h_1(x_0)}{\sqrt{\sigma_1}} & \frac{z_2 - h_2(x_0)}{\sqrt{\sigma_2}} & \cdots & \frac{z_n - h_n(x_0)}{\sqrt{\sigma_n}}
\end{array}
\right]^T, \\
\Delta x &= \mathbf{L}^{\dagger} \mathbf{y},
\end{align}
\]
where \(\mathbf{L}^{\dagger} = (\mathbf{L}^T \mathbf{L})^{-1} \mathbf{L}^T\) is the Moore-Penrose pseudoinverse.
Note that \(H_i\) is the negative derivative of the residual \(z_i - h_i(x)\) and is not constant: whenever \(\hat{x}\) changes, the \(H_i\) must be re-evaluated at the new linearization point.
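Numerically, the linear step \(\Delta x = \mathbf{L}^{\dagger} \mathbf{y}\) need not form the pseudoinverse explicitly. A minimal sketch assuming NumPy (the helper name solve_linear_step is introduced here for illustration):

```python
import numpy as np

def solve_linear_step(L: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Solve min_dx ||L @ dx - y||^2, i.e. dx = pinv(L) @ y."""
    # lstsq uses an SVD-based solver, which is more stable than
    # explicitly forming (L^T L)^{-1} L^T for ill-conditioned L.
    dx, *_ = np.linalg.lstsq(L, y, rcond=None)
    return dx
```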
Algorithm: Non-Linear Least Squares
\(\quad\)Input: initial guess \(\mathbf{x}_0\), measurements \(\mathbf{z}_i\), noise variances \(\sigma_i\)
\(\quad\)Result: \(\hat{\mathbf{x}}\)
\(\quad\)Initialize: \(\hat{\mathbf{x}} = \mathbf{x}_0\)
\(\quad\)Repeat
\(\quad\quad\) \(\mathbf{H}_i = \left. \frac{\partial h_i(\mathbf{x})}{\partial \mathbf{x}} \right|_{\mathbf{x} = \hat{\mathbf{x}}}\)
\(\quad\quad\) \(\Delta \mathbf{x} = \arg \min_{\Delta \mathbf{x}} \sum^n_{i = 1} || \mathbf{H}_i \Delta \mathbf{x} + h_i(\hat{\mathbf{x}}) - \mathbf{z}_i ||^2_{\sigma_i}\) (run linear least squares)
\(\quad\quad\) \(\hat{\mathbf{x}} = \hat{\mathbf{x}} + \Delta \mathbf{x}\)
\(\quad\)until convergence.
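A minimal Gauss-Newton sketch of this loop, assuming NumPy. The names gauss_newton, h_funcs, and jac_funcs are introduced here; each \(h_i\) and its Jacobian are supplied as callables on the state array:

```python
import numpy as np

def gauss_newton(h_funcs, jac_funcs, z, sigmas, x0, tol=1e-6, max_iters=50):
    """Gauss-Newton for min_x sum_i ||h_i(x) - z_i||^2_{sigma_i}.

    h_funcs, jac_funcs: per-measurement models h_i and Jacobians dh_i/dx,
    z: measurements, sigmas: noise variances, x0: initial guess.
    """
    x_hat = np.atleast_1d(np.asarray(x0, dtype=float))
    weights = 1.0 / np.sqrt(np.asarray(sigmas, dtype=float))  # sigma_i^{-1/2}
    for _ in range(max_iters):
        # Whitened stacked Jacobian L and residual y at the current estimate.
        L = np.vstack([w * np.atleast_2d(J(x_hat))
                       for w, J in zip(weights, jac_funcs)])
        y = np.hstack([w * (zi - h(x_hat))
                       for w, zi, h in zip(weights, z, h_funcs)])
        dx, *_ = np.linalg.lstsq(L, y, rcond=None)  # dx = pinv(L) @ y
        x_hat = x_hat + dx
        if np.linalg.norm(dx) < tol:  # negligible update: converged
            break
    return x_hat
```

Stopping on a small \(\|\Delta \mathbf{x}\|\) is one common convergence test; a threshold on the change in the weighted cost works as well.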
Numerical Example
Let \(x\) be a scalar state to be estimated, with true value \(x = 200\).
Consider two noisy measurements of the hidden variable:
\[
\begin{align}
z_1 &= -7800.52, \quad (\text{noiseless} = -7795) \\
z_2 &= 605.79, \quad (\text{noiseless} = 605).
\end{align}
\]
Let the non-linear measurement functions be:
\[
\begin{alignat}{2}
h_1(x) &= 0.05 (x + 10)^2 - 10000, \quad &&\sigma_1 = 100 \\
h_2(x) &= 3x + 5, \quad &&\sigma_2 = 1.
\end{alignat}
\]
The Jacobian is:
\[
\mathbf{J} =
\left[
\begin{array}{c}
\frac{\partial h_1}{\partial x} \\
\frac{\partial h_2}{\partial x}
\end{array}
\right] =
\left[
\begin{array}{c}
0.1 (x + 10) \\
3
\end{array}
\right].
\]
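As a quick sanity check (a sketch in Python; h1, h2, jac1, and jac2 are names introduced here), the analytic Jacobian entries can be compared against a central finite difference:

```python
# Measurement functions and their derivatives from the example.
h1 = lambda x: 0.05 * (x + 10.0) ** 2 - 10000.0
h2 = lambda x: 3.0 * x + 5.0
jac1 = lambda x: 0.1 * (x + 10.0)
jac2 = lambda x: 3.0

# Central finite-difference check of dh1/dx at x = 0 (expected: 1.0).
eps = 1e-6
fd = (h1(eps) - h1(-eps)) / (2 * eps)
assert abs(fd - jac1(0.0)) < 1e-4
```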
Iteration 1
Start with estimate \(\hat{x} = 0\). We have:
\[
\begin{align}
\mathbf{L} &=
\left[
\begin{array}{c}
\sigma^{-1/2}_1 \left. \frac{\partial h_1}{\partial x} \right|_{x = \hat{x}} \\
\sigma^{-1/2}_2 \left. \frac{\partial h_2}{\partial x} \right|_{x = \hat{x}} \\
\end{array}
\right]
=
\left[
\begin{array}{c}
0.1 \\
3.0
\end{array}
\right] \\
\mathbf{y} &=
\left[
\begin{array}{c}
\sigma^{-1/2}_1 (z_1 - h_1(\hat{x})) \\
\sigma^{-1/2}_2 (z_2 - h_2(\hat{x})) \\
\end{array}
\right]
\approx
\left[
\begin{array}{c}
219.448 \\
600.79
\end{array}
\right] \\
\Delta x &= \mathbf{L}^{\dagger} \mathbf{y} \approx 202.48.
\end{align}
\]
Iteration 2
Start with estimate \(\hat{x} \approx 202.48\). We have:
\[
\begin{align}
\mathbf{L} &=
\left[
\begin{array}{c}
\sigma^{-1/2}_1 \left. \frac{\partial h_1}{\partial x} \right|_{x = \hat{x}} \\
\sigma^{-1/2}_2 \left. \frac{\partial h_2}{\partial x} \right|_{x = \hat{x}} \\
\end{array}
\right]
=
\left[
\begin{array}{c}
2.12 \\
3.0
\end{array}
\right] \\
\mathbf{y} &=
\left[
\begin{array}{c}
\sigma^{-1/2}_1 (z_1 - h_1(\hat{x})) \\
\sigma^{-1/2}_2 (z_2 - h_2(\hat{x})) \\
\end{array}
\right]
\approx
\left[
\begin{array}{c}
-5.78 \\
-6.64
\end{array}
\right] \\
\Delta x &= \mathbf{L}^{\dagger} \mathbf{y} \approx -2.38.
\end{align}
\]
Iteration 3
Start with estimate \(\hat{x} = 202.48 - 2.38 = 200.1\). We have:
\[
\begin{align}
\mathbf{L} &=
\left[
\begin{array}{c}
\sigma^{-1/2}_1 \left. \frac{\partial h_1}{\partial x} \right|_{x = \hat{x}} \\
\sigma^{-1/2}_2 \left. \frac{\partial h_2}{\partial x} \right|_{x = \hat{x}} \\
\end{array}
\right]
=
\left[
\begin{array}{c}
2.10 \\
3.0
\end{array}
\right] \\
\mathbf{y} &=
\left[
\begin{array}{c}
\sigma^{-1/2}_1 (z_1 - h_1(\hat{x})) \\
\sigma^{-1/2}_2 (z_2 - h_2(\hat{x})) \\
\end{array}
\right]
\approx
\left[
\begin{array}{c}
-0.74 \\
0.51
\end{array}
\right] \\
\Delta x &= \mathbf{L}^{\dagger} \mathbf{y} \approx -0.003.
\end{align}
\]
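The update is now negligible, so the iteration has converged to \(\hat{x} \approx 200.09\), close to the true value \(x = 200\). The three iterations can be reproduced with a short self-contained script (a sketch assuming NumPy); the printed updates match the hand computation up to rounding:

```python
import numpy as np

h = [lambda x: 0.05 * (x + 10.0) ** 2 - 10000.0,   # h_1
     lambda x: 3.0 * x + 5.0]                      # h_2
jac = [lambda x: 0.1 * (x + 10.0),                 # dh_1/dx
       lambda x: 3.0]                              # dh_2/dx
z = [-7800.52, 605.79]
w = [1.0 / np.sqrt(100.0), 1.0 / np.sqrt(1.0)]     # sigma_i^{-1/2}

x_hat = 0.0
for it in range(1, 4):
    L = np.array([[w[i] * jac[i](x_hat)] for i in range(2)])
    y = np.array([w[i] * (z[i] - h[i](x_hat)) for i in range(2)])
    dx = np.linalg.lstsq(L, y, rcond=None)[0][0]   # dx = pinv(L) @ y
    x_hat += dx
    print(f"iteration {it}: dx = {dx:+.3f}, x_hat = {x_hat:.3f}")
# Expected (up to rounding): dx ~ +202.48, -2.39, -0.003; x_hat -> ~200.09
```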
Notes
- The non-linear optimization presented so far is the Gauss-Newton routine. In general, any technique from the literature for improving Gauss-Newton can be applied here. Other methods are also often applied to highly non-linear systems (e.g., Levenberg-Marquardt, Powell's dogleg); a SciPy sketch follows after this list.
- Convergence is not guaranteed: the initial guess must be "close enough" to a minimum for the iteration to converge.
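For comparison, the same example can be handed to an off-the-shelf solver. A sketch assuming SciPy, minimizing the whitened residuals \((h_i(x) - z_i)/\sqrt{\sigma_i}\):

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(x):
    """Whitened residuals (h_i(x) - z_i) / sqrt(sigma_i)."""
    return np.array([
        (0.05 * (x[0] + 10.0) ** 2 - 10000.0 + 7800.52) / 10.0,
        (3.0 * x[0] + 5.0 - 605.79) / 1.0,
    ])

# method="lm" selects Levenberg-Marquardt; "dogbox" is a dogleg-type
# alternative offered by the same interface.
sol = least_squares(residuals, x0=[0.0], method="lm")
print(sol.x)  # expected: ~[200.09]
```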