Least Squares
Problem Statement
Consider a set of \(k\) data points:
\[
z_1, z_2, \ldots, z_k,
\]
where \(x\) is an independent variable to be estimated and \(z_i\) is a dependent variable whose value can be obtained by an observation or a measurement. We have a measurement model:
\[
z_i = h_i(x) + w_i, \quad i = 1, \ldots, k,
\]
where \(w_i\) is the measurement noise. The goal of the least squares method is to find an estimator, \(\hat{x}\), that "best" fits the measurement data. The fit of a model to a data point is measured by its residual or error, defined as the difference between the observed value of the dependent variable and the value predicted by the model:
\[
r_i = z_i - h_i(\hat{x}).
\]
The least squares method finds the optimal parameter values by minimizing an objective function defined as the sum of the squared residuals:
\[
J(\hat{x}) = \sum_{i=1}^{k} r_i^2 = \sum_{i=1}^{k} \left(z_i - h_i(\hat{x})\right)^2.
\]
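As a minimal sketch (not part of the original text), the snippet below evaluates this objective for a made-up scalar model \(h_i(x) = x\,t_i\); the arrays `t` and `z` are hypothetical example data.

```python
import numpy as np

# Hypothetical measurement functions h_i(x) = x * t_i, for illustration only.
t = np.array([0.5, 1.0, 1.5, 2.0])
z = np.array([1.1, 1.9, 3.2, 3.9])   # made-up measurements z_i

def objective(x_hat):
    residuals = z - x_hat * t        # r_i = z_i - h_i(x_hat)
    return np.sum(residuals ** 2)    # J(x_hat): sum of squared residuals

print(objective(2.0))                # objective value at the candidate x_hat = 2
```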
Solving Least Squares
The least squares estimate is the solution to the optimization problem:
\[
\hat{x} = \arg\min_{x} \sum_{i=1}^{k} \left(z_i - h_i(x)\right)^2.
\]
In general, this is a nonlinear least squares problem. If the functions \(h_i\) are linear in \(x\), the problem reduces to the ordinary (linear) least squares problem.
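To illustrate the linear case, here is a sketch using NumPy. The design matrix `A`, the true parameter `x_true`, and the noise level are assumed example values, and solving the normal equations \(A^\top A \hat{x} = A^\top z\) is one standard way (among several) to compute the estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear measurement model: z_i = a_i^T x + w_i.
k, n = 20, 3                                 # k measurements, n unknowns
A = rng.normal(size=(k, n))                  # rows a_i^T of the assumed linear model
x_true = np.array([1.0, -2.0, 0.5])          # made-up true parameter
z = A @ x_true + 0.1 * rng.normal(size=k)    # noisy measurements

# Least squares estimate: minimize sum_i (z_i - a_i^T x)^2
# by solving the normal equations A^T A x = A^T z.
x_hat = np.linalg.solve(A.T @ A, A.T @ z)

# Equivalent (and numerically preferable) library call.
x_hat_lstsq, *_ = np.linalg.lstsq(A, z, rcond=None)

print(x_hat)         # close to x_true
print(x_hat_lstsq)   # same estimate
```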
Relationship to MLE
Note that so far there is no distributional assumption about the measurement noises \(w_i\). If the noises are independent and identically distributed zero-mean Gaussian random variables such that:
\[
w_i \sim \mathcal{N}(0, \sigma^2),
\]
then the least squares estimator coincides with the MLE. In this case:
\[
z_i \sim \mathcal{N}\!\left(h_i(x), \sigma^2\right).
\]
The likelihood function of \(x\) is then:
\[
L(x) = \prod_{i=1}^{k} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{\left(z_i - h_i(x)\right)^2}{2\sigma^2}\right) = c \exp\!\left(-\frac{1}{2\sigma^2} \sum_{i=1}^{k} \left(z_i - h_i(x)\right)^2\right),
\]
where \(c\) is some constant. Maximizing this likelihood over \(x\) is therefore equivalent to minimizing \(\sum_{i=1}^{k} \left(z_i - h_i(x)\right)^2\), which is exactly the least squares objective.
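The sketch below checks this equivalence numerically on a made-up nonlinear model \(h_i(x) = e^{-x t_i}\): minimizing the Gaussian negative log-likelihood and minimizing the sum of squared residuals yield the same estimate. The model, data, and noise level are assumptions for illustration only.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# Hypothetical model h_i(x) = exp(-x * t_i) with i.i.d. Gaussian noise.
t = np.linspace(0.0, 2.0, 30)
x_true, sigma = 1.3, 0.05
z = np.exp(-x_true * t) + sigma * rng.normal(size=t.size)

def sse(x):
    # Least squares objective: sum of squared residuals.
    return np.sum((z - np.exp(-x * t)) ** 2)

def neg_log_likelihood(x):
    # Gaussian negative log-likelihood with constants dropped; it differs
    # from the least squares objective only by the factor 1 / (2 * sigma**2).
    return sse(x) / (2.0 * sigma ** 2)

x_ls = minimize(sse, x0=np.array([1.0])).x
x_mle = minimize(neg_log_likelihood, x0=np.array([1.0])).x
print(x_ls, x_mle)   # the two estimates agree up to solver tolerance
```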
Parametric Model and Alternative Formulation
In the context of regression analysis or data fitting, we can formulate the problem differently. Consider a set of \(k\) data points:
\[
(x_1, y_1), (x_2, y_2), \ldots, (x_k, y_k).
\]
Assume that the model function has the form \(f(x_i, \boldsymbol{\beta})\), where the \(m\) adjustable parameters are held in the vector \(\boldsymbol{\beta}\). The goal of the least squares method here is to find the parameter values for the model that "best" fit the data. The residual is:
\[
r_i = y_i - f(x_i, \boldsymbol{\beta}).
\]
The least squares method minimizes the objective function defined as the sum of the squared residuals:
\[
S(\boldsymbol{\beta}) = \sum_{i=1}^{k} r_i^2.
\]
The minimum of the objective function is found by setting the gradient to zero. Since the model contains \(m\) parameters, there are \(m\) gradient equations:
\[
\frac{\partial S}{\partial \beta_j} = 2 \sum_{i=1}^{k} r_i \frac{\partial r_i}{\partial \beta_j} = -2 \sum_{i=1}^{k} r_i \frac{\partial f(x_i, \boldsymbol{\beta})}{\partial \beta_j} = 0, \quad j = 1, \ldots, m.
\]
This formulation is particularly useful for polynomial fitting, where \(f(x_i, \boldsymbol{\beta})\) is a polynomial in \(x_i\) that is linear in the coefficients \(\boldsymbol{\beta}\), so the gradient equations reduce to a linear system.
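For instance, a quadratic fit can be written this way. The sketch below uses made-up data, solves the resulting normal equations directly, and cross-checks the coefficients with `np.polyfit`.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data generated from a quadratic with additive noise.
x = np.linspace(-1.0, 1.0, 25)
y = 0.5 - 1.5 * x + 2.0 * x ** 2 + 0.1 * rng.normal(size=x.size)

# Fit f(x, beta) = beta_0 + beta_1 x + beta_2 x^2 by least squares.
# Setting the m = 3 gradient equations to zero yields the normal equations
# (X^T X) beta = X^T y, where X is the Vandermonde-style design matrix.
X = np.vander(x, N=3, increasing=True)      # columns: 1, x, x^2
beta = np.linalg.solve(X.T @ X, X.T @ y)

# Same fit via the library routine (coefficients in decreasing-degree order).
beta_polyfit = np.polyfit(x, y, deg=2)

print(beta)                 # approximately [0.5, -1.5, 2.0]
print(beta_polyfit[::-1])   # same values, reordered to increasing degree
```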