Linear Weighted Least Squares
Problem Statement
Consider the problem of estimating a constant but unknown vector \(\mathbf{x} \in \mathbb{R}^n\) from \(m\) measurements \(\mathbf{y} \in \mathbb{R}^m\), where \(m > n\). Assume that the measurement model is linear:
\[
\mathbf{y} = \mathbf{H} \mathbf{x} + \boldsymbol{v} \label{least_squares_main},
\]
where \(\mathbf{H} \in \mathbb{R}^{m \times n}\) is the linear observation matrix and \(\boldsymbol{v} \in \mathbb{R}^m\) is the measurement noise. Assume that the noise is zero-mean, independent across measurements, and has known variances:
\[
\mathbb{E}\left[v_i\right] = 0, \quad \mathbb{E}\left[v^2_i\right] = \sigma^2_i, \quad \mathbb{E}\left[v_i v_j\right] = 0 \;\; (i \neq j), \quad i = 1, \ldots, m.
\]
Since the noise components are independent, the measurement covariance matrix is diagonal:
\[
\begin{align}
\mathbf{R} &= \mathbb{E}\left[\boldsymbol{v} \boldsymbol{v}^T\right] \\
&= \left[
\begin{array}{ccc}
\sigma^2_1 & \ldots & 0 \\
\vdots & \ddots & \vdots \\
0 & \ldots & \sigma^2_m
\end{array}
\right]
\end{align}
\]
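As a concrete illustration, the following minimal NumPy sketch simulates this measurement model: it draws a hypothetical observation matrix \(\mathbf{H}\), adds independent zero-mean noise with per-measurement standard deviations, and forms the diagonal covariance \(\mathbf{R}\). All names and values (`x_true`, `sigmas`, the dimensions) are illustrative choices, not part of the problem statement.

```python
import numpy as np

rng = np.random.default_rng(42)
m, n = 20, 3                                  # m measurements, n unknowns (m > n)

x_true = np.array([1.0, -2.0, 0.5])           # the constant but unknown vector
H = rng.standard_normal((m, n))               # linear observation matrix
sigmas = rng.uniform(0.1, 1.0, size=m)        # per-measurement noise std deviations

# Independent, zero-mean noise; covariance is diagonal with entries sigma_i^2.
v = rng.normal(0.0, sigmas)
y = H @ x_true + v
R = np.diag(sigmas**2)
```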
The measurement residual is defined as:
\[
\boldsymbol{\epsilon}_{\mathbf{y}} = \mathbf{y} - \mathbf{H} \hat{\mathbf{x}},
\]
where \(\hat{\mathbf{x}}\) is the "best" estimate of \(\mathbf{x}\).
Solution to Linear Weighted Least Squares
We define the cost function as the sum of squared residuals, each weighted by the inverse of its measurement variance:
\[
\begin{align}
\mathbf{J} &= \boldsymbol{\epsilon}^T_{\mathbf{y}} \mathbf{R}^{-1} \boldsymbol{\epsilon}_{\mathbf{y}} \\
&=
\left[
\begin{array}{ccc}
\epsilon_{y1} & \ldots & \epsilon_{ym}
\end{array}
\right]
\left[
\begin{array}{ccc}
1/\sigma^2_1 & \ldots & 0 \\
\vdots & \ddots & \vdots \\
0 & \ldots & 1/\sigma^2_m
\end{array}
\right]
\left[
\begin{array}{c}
\epsilon_{y1} \\ \vdots \\ \epsilon_{ym}
\end{array}
\right] \\
&= \epsilon^2_{y1} / \sigma^2_1 + \ldots + \epsilon^2_{ym} / \sigma^2_{m}.
\end{align}
\]
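A short numerical check of this expansion, using the same kind of synthetic setup as above (all values here are hypothetical): the matrix form \(\boldsymbol{\epsilon}^T_{\mathbf{y}} \mathbf{R}^{-1} \boldsymbol{\epsilon}_{\mathbf{y}}\) and the per-measurement sum \(\sum_i \epsilon^2_{yi} / \sigma^2_i\) agree.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 2
H = rng.standard_normal((m, n))
sigmas = np.array([0.2, 0.5, 1.0, 0.3, 0.8, 0.4])
y = H @ np.array([1.0, -1.0]) + rng.normal(0.0, sigmas)
x_hat = np.array([0.9, -1.1])                 # an arbitrary candidate estimate

eps = y - H @ x_hat                           # measurement residual
R_inv = np.diag(1.0 / sigmas**2)

J_matrix = eps @ R_inv @ eps                  # eps^T R^-1 eps
J_sum = np.sum(eps**2 / sigmas**2)            # sum of eps_i^2 / sigma_i^2
assert np.isclose(J_matrix, J_sum)
```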
Hence, the weighted least squares problem can be formulated as:
\[
\begin{align}
\arg \min_{\hat{\mathbf{x}}} \quad || \mathbf{H} \hat{\mathbf{x}} - \mathbf{y} ||^2_{\mathbf{R}^{-1}} = \arg \min_{\hat{\mathbf{x}}} \quad \sum^m_{i = 1} \frac{\left( \mathbf{h}^T_i \hat{\mathbf{x}} - y_i \right)^2}{\sigma^2_i},
\end{align}
\]
where \(|| \mathbf{z} ||^2_{\mathbf{R}^{-1}} = \mathbf{z}^T \mathbf{R}^{-1} \mathbf{z}\) and \(\mathbf{h}^T_i\) is the \(i\)-th row of \(\mathbf{H}\).
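One common way to treat this formulation numerically is to "whiten" the problem: dividing the \(i\)-th row of \(\mathbf{H}\) and the \(i\)-th entry of \(\mathbf{y}\) by \(\sigma_i\) turns the weighted problem into an ordinary least squares problem. The sketch below shows this approach under assumed synthetic data, using `np.linalg.lstsq` on the whitened system; it is one possible implementation, not the only one.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 12, 3
H = rng.standard_normal((m, n))
sigmas = rng.uniform(0.2, 1.5, size=m)
y = H @ np.array([2.0, 0.0, -1.0]) + rng.normal(0.0, sigmas)

# Whitening: scale each measurement equation by 1/sigma_i so every residual
# has unit variance, then solve an ordinary least-squares problem.
H_w = H / sigmas[:, None]
y_w = y / sigmas

x_hat, *_ = np.linalg.lstsq(H_w, y_w, rcond=None)
```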
Expanding the objective function in terms of the optimization variable:
\[
\begin{align}
\mathbf{J} &= \boldsymbol{\epsilon}^T_{\mathbf{y}} \mathbf{R}^{-1} \boldsymbol{\epsilon}_{\mathbf{y}} \\
&= (\mathbf{y} - \mathbf{H} \hat{\mathbf{x}})^T \mathbf{R}^{-1} (\mathbf{y} - \mathbf{H} \hat{\mathbf{x}}) \\
&= (\mathbf{y}^T \mathbf{R}^{-1} - \hat{\mathbf{x}}^T \mathbf{H}^T \mathbf{R}^{-1}) (\mathbf{y} - \mathbf{H} \hat{\mathbf{x}}) \\
&= \mathbf{y}^T \mathbf{R}^{-1} \mathbf{y} - \mathbf{y}^T \mathbf{R}^{-1} \mathbf{H} \hat{\mathbf{x}} - \hat{\mathbf{x}}^T \mathbf{H}^T \mathbf{R}^{-1} \mathbf{y} +
\hat{\mathbf{x}}^T \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H} \hat{\mathbf{x}}.
\end{align}
\]
The optimization variable is \(\hat{\mathbf{x}}\), and \(\mathbf{J}\) is a convex quadratic function of \(\hat{\mathbf{x}}\) (provided \(\mathbf{H}\) has full column rank, so that \(\mathbf{H}^T \mathbf{R}^{-1} \mathbf{H}\) is positive definite). Hence, setting the gradient to zero gives the optimality condition:
\[
\nabla_{\hat{\mathbf{x}}} \mathbf{J} = -2\mathbf{y}^T \mathbf{R}^{-1} \mathbf{H} + 2 \hat{\mathbf{x}}^T \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H} = 0.
\]
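As a sanity check on this gradient expression (a sketch with hypothetical values, not part of the derivation), one can compare the analytic gradient \(2 \mathbf{H}^T \mathbf{R}^{-1} (\mathbf{H} \hat{\mathbf{x}} - \mathbf{y})\), the column-vector transpose of the row form above, against a central finite-difference approximation:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 8, 3
H = rng.standard_normal((m, n))
sigmas = rng.uniform(0.5, 2.0, size=m)
R_inv = np.diag(1.0 / sigmas**2)
y = rng.standard_normal(m)
x_hat = rng.standard_normal(n)                # arbitrary evaluation point

def J(x):
    eps = y - H @ x
    return eps @ R_inv @ eps                  # weighted sum of squared residuals

# Analytic gradient (column-vector form of the optimality condition).
grad_analytic = 2.0 * H.T @ R_inv @ (H @ x_hat - y)

# Central finite differences as an independent check.
h = 1e-6
grad_fd = np.array([(J(x_hat + h * e) - J(x_hat - h * e)) / (2 * h)
                    for e in np.eye(n)])
assert np.allclose(grad_analytic, grad_fd, atol=1e-5)
```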
Solving for \(\hat{\mathbf{x}}\) yields:
\[
\hat{\mathbf{x}} = \left( \mathbf{H}^T \mathbf{R}^{-1} \mathbf{H} \right)^{-1} \mathbf{H}^T \mathbf{R}^{-1} \mathbf{y}.
\]
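The closed-form estimator translates directly into code. Below is a minimal sketch (the function name `weighted_least_squares` is an illustrative choice): it forms the normal equations \(\mathbf{H}^T \mathbf{R}^{-1} \mathbf{H} \, \hat{\mathbf{x}} = \mathbf{H}^T \mathbf{R}^{-1} \mathbf{y}\) and solves them with `np.linalg.solve` rather than computing the matrix inverse explicitly.

```python
import numpy as np

def weighted_least_squares(H, y, sigmas):
    """Weighted least-squares estimate x_hat = (H^T R^-1 H)^-1 H^T R^-1 y,
    with R = diag(sigma_i^2). Solves the normal equations directly."""
    R_inv = np.diag(1.0 / np.asarray(sigmas) ** 2)
    A = H.T @ R_inv @ H                       # n x n normal-equations matrix
    b = H.T @ R_inv @ y                       # right-hand side
    return np.linalg.solve(A, b)

# Example: recover a known vector from noisy measurements.
rng = np.random.default_rng(7)
m, n = 50, 3
x_true = np.array([1.0, -2.0, 0.5])
H = rng.standard_normal((m, n))
sigmas = rng.uniform(0.1, 1.0, size=m)
y = H @ x_true + rng.normal(0.0, sigmas)

x_hat = weighted_least_squares(H, y, sigmas)
print(x_hat)                                  # should be close to x_true
```

Note that measurements with small \(\sigma_i\) contribute more to the estimate, which is exactly the effect of the \(\mathbf{R}^{-1}\) weighting.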