Gauss-Newton Method
General Descent Methods
The basic paradigm of descent methods is as follows:
Algorithm: Descent Methods
\(\quad\) Choose an initial solution \(\mathbf{x}^0\).
\(\quad\) Choose a descent direction \(\mathbf{d}^0\).
\(\quad\) Choose a step size \(\alpha_0\).
\(\quad\) Update the solution \(\mathbf{x}^1 = \mathbf{x}^0 + \alpha_0 \mathbf{d}^0\).
\(\quad\) If a stopping criterion is met, stop; otherwise, re-iterate with the current solution.
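As a rough sketch of this paradigm (an added illustration, not part of the original notes), the Python snippet below uses the negative gradient as the descent direction and a fixed step size; both choices, as well as the function names, are assumptions made only for illustration.

```python
import numpy as np

def descent_method(grad, x0, alpha=0.1, tol=1e-6, max_iter=1000):
    """Generic descent loop: x^{k+1} = x^k + alpha_k * d^k."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        d = -grad(x)                   # choose a descent direction (here: steepest descent)
        if np.linalg.norm(d) < tol:    # stopping criterion: gradient close to zero
            break
        x = x + alpha * d              # fixed step size alpha_k = alpha
    return x

# usage: minimize f(x) = ||x||^2, whose gradient is 2x
x_min = descent_method(lambda x: 2 * x, x0=[3.0, -2.0])
```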
Newton's Method
Let \(\mathbf{x}^k\) be the current iterate and consider the second-order Taylor series of \(f(\mathbf{x})\) at \(\mathbf{x}^k\), denoted \(g(\mathbf{x})\):
\[ g(\mathbf{x}) = f(\mathbf{x}^k) + \nabla f(\mathbf{x}^k)^T (\mathbf{x} - \mathbf{x}^k) + \frac{1}{2} (\mathbf{x} - \mathbf{x}^k)^T \nabla^2 f(\mathbf{x}^k) (\mathbf{x} - \mathbf{x}^k). \]
In Newton's method, we choose the next iterate as the minimizer of the approximate function \(g(\mathbf{x})\). Setting \(\nabla g(\mathbf{x}) = \mathbf{0}\) yields the linear system:
\[ \nabla^2 f(\mathbf{x}^k) (\mathbf{x} - \mathbf{x}^k) = -\nabla f(\mathbf{x}^k). \]
If the Hessian is non-singular, the above system has a unique solution. The update for the next iteration is then:
\[ \mathbf{x}^{k+1} = \mathbf{x}^k - \left[ \nabla^2 f(\mathbf{x}^k) \right]^{-1} \nabla f(\mathbf{x}^k). \]
In this case, the improving direction is
\[ \mathbf{d}^k = -\left[ \nabla^2 f(\mathbf{x}^k) \right]^{-1} \nabla f(\mathbf{x}^k), \]
with a step size of \(\alpha_k = 1\).
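A minimal sketch of this update in Python is shown below (an added illustration, assuming callables `grad` and `hess` for the gradient and Hessian); it solves the Newton system rather than forming the Hessian inverse explicitly, which is the numerically preferable way to compute the same step.

```python
import numpy as np

def newton_step(grad, hess, x):
    """One Newton iteration: solve hess(x) d = -grad(x), then x <- x + d (alpha = 1)."""
    g = grad(x)
    H = hess(x)
    d = np.linalg.solve(H, -g)   # Newton direction; requires a non-singular Hessian
    return x + d                 # unit step size
```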
Newton's Method Single Iteration Example
Let \(\mathbf{x} = \left[x_1, x_2 \right]^T\). We are interested in the optimization problem:
\[ \min_{\mathbf{x}} \; f(\mathbf{x}) = (x_1 + 1)^4 + (x_2 + 1)^4 + x_1 x_2. \]
To solve it with Newton's method:
- Choose initial conditions \(\mathbf{x}^0 = \left[0, 1 \right]^T\). The initial cost is \(f(\mathbf{x}^0) = 17.0\).
- The gradient is
  \[ \nabla f(\mathbf{x}) = \left[ \begin{array}{c} 4(x_1 + 1)^3 + x_2 \\ x_1 + 4(x_2 + 1)^3 \end{array} \right], \]
  and the Hessian is
  \[ \nabla^2 f(\mathbf{x}) = \left[ \begin{array}{cc} 12(x_1 + 1)^2 & 1 \\ 1 & 12(x_2 + 1)^2 \end{array} \right]. \]
  Evaluating at \(\mathbf{x}^0\), we get \(\nabla f(\mathbf{x}^0) = \left[5, 32 \right]^T\) and \(\nabla^2 f(\mathbf{x}^0) = \left[\begin{array}{cc}12 & 1 \\ 1 & 48 \end{array} \right]\).
- The next iterate is then \(\mathbf{x}^1 = \left[ -0.3617, 0.3409 \right]^T\) and \(f(\mathbf{x}^1) = 3.2755\).
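As a numerical check, the sketch below (an added illustration, using the objective stated above) reproduces this single Newton iteration.

```python
import numpy as np

f    = lambda x: (x[0] + 1)**4 + (x[1] + 1)**4 + x[0] * x[1]
grad = lambda x: np.array([4 * (x[0] + 1)**3 + x[1],
                           x[0] + 4 * (x[1] + 1)**3])
hess = lambda x: np.array([[12 * (x[0] + 1)**2, 1.0],
                           [1.0, 12 * (x[1] + 1)**2]])

x0 = np.array([0.0, 1.0])
x1 = x0 + np.linalg.solve(hess(x0), -grad(x0))   # Newton step with alpha = 1
print(x1, f(x1))   # approximately [-0.3617, 0.3409], f(x1) ≈ 3.275
```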
Newton's Method Characteristics
- If we start close to a local minimum and the Hessian is positive definite, Newton's method converges quadratically.
- In general, Newton's method is not guaranteed to converge; the Newton direction may not be an improving direction at all. For example, if we start from \(\mathbf{x}^0 = \left[ -1, 1 \right]^T\) with \(f(\mathbf{x}^0) = 15\), the next iterate is \(\mathbf{x}^1 = \left[ -2, 18 \right]^T\) with \(f(\mathbf{x}^1) = 130286\).
- If the Hessian is singular, the Newton system has no unique solution and the method breaks down; if it is ill-conditioned, the computed step can be highly inaccurate. A simple safeguard is sketched after this list.
- Computing the gradient and the Hessian (and solving the Newton system) is computationally expensive, especially in high dimensions.
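Because of these issues, practical implementations often safeguard the Newton step. The sketch below (an added illustration, not the method described in these notes) shows one simple possibility: it falls back to a small gradient step when the Newton system is singular or when the Newton direction is not a descent direction.

```python
import numpy as np

def guarded_newton_step(grad, hess, x, alpha=0.1):
    """Newton step with a basic safeguard (one possible remedy among many)."""
    g = grad(x)
    H = hess(x)
    try:
        d = np.linalg.solve(H, -g)     # Newton direction
    except np.linalg.LinAlgError:      # singular Hessian: no unique Newton step
        return x - alpha * g           # fall back to a small gradient step
    if g @ d >= 0:                     # not a descent direction (Hessian not positive definite)
        return x - alpha * g
    return x + d
```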