Least Squares

Problem Statement

Consider a set of data points:

$$y_i, \quad i = 1, 2, \ldots, m,$$

where $x \in \mathbb{R}^n$ is an independent variable to be estimated and $y_i \in \mathbb{R}$ is a dependent variable whose value can be obtained by an observation or a measurement. We have a measurement model:

$$y_i = f_i(x) + v_i,$$

where $v_i$ is the measurement noise. The goal of the least squares method is to find an estimator, $\hat{x}$, that "best" fits the measurement data. The fit of a model to a data point is measured by its residual or error, defined as the difference between the observed value of the dependent variable and the value predicted by the model:

$$r_i = y_i - f_i(\hat{x}).$$

The least squares method finds the optimal parameter values by minimizing an objective function defined as the sum of the squared residuals:

$$S(\hat{x}) = \sum_{i=1}^{m} r_i^2.$$
Solving Least Squares

The solution to the least squares problem is the solution to the optimization problem:

$$\hat{x} = \operatorname*{arg\,min}_{x} \sum_{i=1}^{m} \left( y_i - f_i(x) \right)^2.$$

This is the nonlinear least squares problem. If the functions $f_i$ are linear in $x$, then the problem is the ordinary or linear least squares problem, which admits a closed-form solution via the normal equations.
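
As a concrete sketch (the matrix $A$, the data, and the noise level below are illustrative assumptions, not part of the text above), the linear case $f_i(x) = a_i^\top x$ can be solved with `numpy.linalg.lstsq` or directly through the normal equations:

```python
import numpy as np

# Hypothetical linear measurement model y = A x + v, where each
# row a_i of A defines f_i(x) = a_i @ x (all values assumed here).
rng = np.random.default_rng(0)
m, n = 100, 3                               # m measurements, n unknowns
A = rng.normal(size=(m, n))                 # known measurement matrix
x_true = np.array([1.0, -2.0, 0.5])
y = A @ x_true + 0.1 * rng.normal(size=m)   # noisy measurements

# Least squares estimate: minimizes the sum of squared residuals
# S(x) = ||y - A x||^2.
x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

# Equivalent closed form via the normal equations (A^T A) x = A^T y.
x_normal = np.linalg.solve(A.T @ A, A.T @ y)

print(x_hat)      # close to x_true
print(x_normal)   # same solution, up to numerical precision
```

For nonlinear $f_i$, iterative solvers such as `scipy.optimize.least_squares` (Gauss-Newton or Levenberg-Marquardt style methods) are typically used instead.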

Relationship to MLE

Note here that there is no assumption about the measurement noises $v_i$. If the measurement noises are independent and identically distributed zero-mean Gaussian random variables such that:

$$v_i \sim \mathcal{N}(0, \sigma^2),$$

then the least squares estimator coincides with the MLE under these assumptions. In this case:

$$y_i \sim \mathcal{N}\left( f_i(x), \sigma^2 \right).$$
The likelihood function of $x$ is then:

$$L(x) = C \exp\left( -\frac{1}{2\sigma^2} \sum_{i=1}^{m} \left( y_i - f_i(x) \right)^2 \right),$$

where $C$ is some constant. Maximizing $L(x)$ is therefore equivalent to minimizing the sum of squared residuals, which is exactly the least squares objective.
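
The equivalence can also be checked numerically. In the sketch below, the exponential model, noise level, and starting point are all made-up assumptions; the least squares fit and the direct maximization of the Gaussian likelihood return the same estimate:

```python
import numpy as np
from scipy.optimize import least_squares, minimize

# Illustrative nonlinear measurement model f_i(x) = exp(-x * t_i)
# with i.i.d. zero-mean Gaussian noise of known sigma (all assumed).
rng = np.random.default_rng(1)
t = np.linspace(0.0, 4.0, 50)
x_true, sigma = 0.8, 0.05
y = np.exp(-x_true * t) + sigma * rng.normal(size=t.size)

def residuals(x):
    return y - np.exp(-x[0] * t)

# Least squares: minimize the sum of squared residuals.
ls = least_squares(residuals, x0=[0.1])

# MLE: maximize the Gaussian likelihood, i.e. minimize the negative
# log-likelihood, which differs from the least squares objective only
# by the constant C and the positive factor 1 / (2 sigma^2).
def neg_log_likelihood(x):
    r = residuals(x)
    return 0.5 * np.sum(r**2) / sigma**2

mle = minimize(neg_log_likelihood, x0=[0.1])

print(ls.x, mle.x)  # the two estimates agree up to solver tolerance
```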

Parametric Model and Alternative Formulation

In the context of regression analysis or data fitting, we can formulate the problem differently. Consider a set of data points:

$$(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m).$$

Assume that the model function has the form $\hat{y} = f(x, \beta)$, where the adjustable parameters are held in the vector $\beta = (\beta_1, \beta_2, \ldots, \beta_n)$. The goal of the least squares method here is to find the parameter values for the model that "best" fit the data. The residual is:

$$r_i = y_i - f(x_i, \beta).$$

The least squares method minimizes an objective function defined as the sum of the squared residuals:

$$S(\beta) = \sum_{i=1}^{m} r_i^2.$$
The minimum of the objective function is found by setting the gradient to zero. Since the model contains $n$ parameters, there are $n$ gradient equations:

$$\frac{\partial S}{\partial \beta_j} = 2 \sum_{i=1}^{m} r_i \frac{\partial r_i}{\partial \beta_j} = -2 \sum_{i=1}^{m} r_i \frac{\partial f(x_i, \beta)}{\partial \beta_j} = 0, \quad j = 1, \ldots, n.$$
This formulation is particularly useful for polynomial fitting, where $f(x, \beta)$ is linear in the parameters and the gradient equations reduce to a linear system (the normal equations).
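
For instance, fitting a quadratic $f(x, \beta) = \beta_1 + \beta_2 x + \beta_3 x^2$ reduces the $n = 3$ gradient equations to the normal equations; a minimal sketch with made-up data (all values below are illustrative):

```python
import numpy as np

# Made-up data roughly following y = 1 + 2x - 0.5x^2 plus noise.
rng = np.random.default_rng(2)
x = np.linspace(-3.0, 3.0, 40)
y = 1.0 + 2.0 * x - 0.5 * x**2 + 0.2 * rng.normal(size=x.size)

# Design (Vandermonde) matrix with columns 1, x, x^2, so the model
# f(x, beta) = X @ beta is linear in the parameters.
X = np.vander(x, N=3, increasing=True)

# The gradient equations dS/dbeta_j = 0 become the normal equations
# (X^T X) beta = X^T y, a 3x3 linear system.
beta = np.linalg.solve(X.T @ X, X.T @ y)

print(beta)  # approximately [1.0, 2.0, -0.5]
```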