Tutorial 10 Unconstrained optimization Conjugate gradients


  1. Tutorial 10: Unconstrained optimization, Conjugate gradients

  2. Method of Conjugate Gradients. Suppose that we want to minimize the quadratic function

  $f(x) = \tfrac{1}{2} x^T Q x - b^T x$,

where Q is an n×n symmetric positive definite matrix and x has n components. As we well know, the minimum x* is the solution to the linear system

  $Q x = b$.

Solving this system explicitly (Newton's method) requires about O(n^3) operations and O(n^2) memory, which is very expensive.
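For a sense of what this "expensive" baseline is, here is a minimal NumPy sketch of the direct solve (the matrix Q and right-hand side b are illustrative, not from the tutorial):

```python
import numpy as np

n = 100
# Illustrative symmetric positive definite Q and right-hand side b.
A = np.random.rand(n, n)
Q = A @ A.T + n * np.eye(n)      # SPD by construction
b = np.random.rand(n)

# Direct solution of Q x* = b: roughly O(n^3) work, plus O(n^2) memory
# just to store Q -- the cost that conjugate gradients tries to avoid.
x_star = np.linalg.solve(Q, b)
```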

  3. Conjugate Gradients 2. We now consider an alternative solution method that does not require inverting Q, but needs only the gradient of f (just like steepest descent, but better), evaluated at n different points x_1, …, x_n. [Figure: two search paths, labeled "Gradient" and "Conjugate Gradient".]

  4. Conjugate Gradients 3. Consider, for example, the case n = 3, in which the variable x in f(x) is a three-dimensional vector. The quadratic function f(x) is then constant over 3D ellipsoids, called isosurfaces, centered at the minimum x*. How can we start from a point x_0 on one of these ellipsoids and reach x* by a finite sequence of one-dimensional searches? In steepest descent, for poorly conditioned Hessians, the successive orthogonal directions produce many small steps, which leads to slow convergence.

  5. Conjugate Gradients: Spherical Case. In the spherical case, the very first step in the direction of the (negative) gradient takes us to x* right away. Suppose, however, that we cannot afford to compute this special direction, but can only compute some direction p_1 orthogonal to p_0 (there is an (n−1)-dimensional space of such directions!) and reach the minimum of f(x) along it. In that case, n steps will take us to the center x* of the sphere, since the coordinate of the minimum along each of the n directions is independent of the others.

  6. Conjugate Gradients: Elliptical Case. Any set of orthogonal directions, with a line search along each direction, will lead to the minimum for spherical isosurfaces. Given an arbitrary set of ellipsoidal isosurfaces, there is a one-to-one mapping with a spherical system: if $Q = U \Sigma U^T$ is the SVD of the symmetric positive definite matrix Q, then we can write

  $\tfrac{1}{2} x^T Q x = \tfrac{1}{2} y^T y$, where $y = \Sigma^{1/2} U^T x$.

  7. Elliptical Case 2. Consequently, there must be a condition for the original problem (in terms of Q) that is equivalent to orthogonality for the spherical problem. Two directions y_i and y_j are orthogonal in the spherical context if

  $y_i^T y_j = 0$, where $y = \Sigma^{1/2} U^T x$.    (1)

What does this translate into in terms of the directions x_i and x_j for the ellipsoidal problem? We have

  $y_i^T y_j = x_i^T U \Sigma^{1/2} \Sigma^{1/2} U^T x_j = x_i^T Q x_j = 0$.    (2)

This condition is called Q-conjugacy, or Q-orthogonality: if equation (2) holds, then x_i and x_j are said to be Q-conjugate or Q-orthogonal to each other, or simply "conjugate".
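This equivalence is easy to verify numerically (a sketch; Q and the two directions are random illustrative values, and the eigendecomposition stands in for the SVD, with which it coincides for SPD matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)          # symmetric positive definite

# Eigendecomposition (= SVD for SPD matrices): Q = U diag(s) U^T
s, U = np.linalg.eigh(Q)
T = np.diag(np.sqrt(s)) @ U.T        # the map y = Sigma^{1/2} U^T x

xi, xj = rng.standard_normal(n), rng.standard_normal(n)
yi, yj = T @ xi, T @ xj

# y_i^T y_j equals x_i^T Q x_j, up to rounding
print(yi @ yj, xi @ Q @ xj)
```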

  8. Elliptical Case 3. In summary, if we can find n directions p_0, …, p_{n−1} that are mutually conjugate, i.e. satisfy (2), and if we do a line minimization along each direction p_i, we reach the minimum in at most n steps. Such an algorithm is called "Conjugate Directions (CD)". The special case we will consider derives the construction of these directions from the local gradients, thus giving birth to "Conjugate Gradients". Of course, we cannot use the transformation (1) in the algorithm, because Σ and especially U^T are too large to compute and store. So, for computational efficiency, we need a method that generates n conjugate directions without an SVD or other complex processing of the Hessian Q.

  9. Hestenes-Stiefel Procedure. The directions are generated as follows:

  $p_0 = -g_0$,
  $x_{k+1} = x_k + \alpha_k p_k$, where $\alpha_k = \arg\min_\alpha f(x_k + \alpha p_k)$,
  $g_{k+1} = \nabla f(x_{k+1})$,
  $p_{k+1} = -g_{k+1} + \sum_{j=0}^{k} \frac{g_{k+1}^T Q p_j}{p_j^T Q p_j} \, p_j$.

• The first step is like NSD with an optimal line search.
• Once we have the new solution, the gradient is evaluated there.
• The next search direction is built by taking the current gradient vector and Q-orthogonalizing it with all previous directions.
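A minimal NumPy sketch of this procedure for the quadratic case (assuming an explicit SPD matrix Q; on a quadratic the optimal step has the closed form $\alpha_k = -g_k^T p_k / p_k^T Q p_k$):

```python
import numpy as np

def conjugate_gradients(Q, b, x0):
    """Minimize 0.5 x^T Q x - b^T x by the Hestenes-Stiefel procedure."""
    x = x0.astype(float)
    g = Q @ x - b                    # gradient of the quadratic
    p = -g                           # first step: steepest descent direction
    dirs = []                        # previous directions, for conjugation
    for _ in range(len(b)):
        if np.linalg.norm(g) < 1e-12:            # already at the minimum
            break
        alpha = -(g @ p) / (p @ Q @ p)           # exact line search
        x = x + alpha * p
        g = Q @ x - b                            # gradient at the new point
        dirs.append(p)
        # Q-orthogonalize the new gradient against all previous directions
        p = -g + sum(((g @ Q @ d) / (d @ Q @ d)) * d for d in dirs)
    return x

# Usage: the result agrees with the direct solve of Q x = b.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
Q = A @ A.T + 5 * np.eye(5)
b = rng.standard_normal(5)
print(np.allclose(conjugate_gradients(Q, b, np.zeros(5)),
                  np.linalg.solve(Q, b)))
```

Keeping the whole list of previous directions mirrors the full Gram-Schmidt sum on this slide; the next slides show that only the j = k term is actually needed.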

  10. Hestenes-Stiefel Procedure 2. Let us check that the directions are indeed Q-conjugate. First, it is easy to see that p_1 and p_0 are conjugate, since the coefficient of p_0 was chosen precisely to cancel $p_1^T Q p_0$. Now assume that p_0, …, p_k are already mutually conjugate, and let us verify that p_{k+1} is conjugate to each of them, i.e. for arbitrary j ≤ k:

  $p_{k+1}^T Q p_j = -g_{k+1}^T Q p_j + \frac{g_{k+1}^T Q p_j}{p_j^T Q p_j} \, p_j^T Q p_j = 0$,

where all the other terms of the sum vanish by the induction hypothesis. One can see that the vectors p_k are found by a generalization of Gram-Schmidt that produces conjugate rather than orthogonal vectors. In practical cases, it can be worthwhile to dismiss all the coefficients $\frac{g_{k+1}^T Q p_j}{p_j^T Q p_j}$ except for j = k. For the sake of simplicity, we will assume that this is the case.

  11. Removing the Hessian. In the described algorithm the expression for p_{k+1} contains the Hessian Q, which is too large. We now show that p_{k+1} can be rewritten in terms of the gradient values g_k and g_{k+1} only. To this end, we notice that

  $g_{k+1} = g_k + \alpha_k Q p_k$, or $Q p_k = \frac{g_{k+1} - g_k}{\alpha_k}$.

Proof: since $g_k = Q x_k - b$, we have

  $g_{k+1} - g_k = Q (x_{k+1} - x_k) = \alpha_k Q p_k$,

so that Q p_k is available from two consecutive gradients and the step length alone.
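A quick numerical check of this identity (a sketch with illustrative random values; nothing here depends on how the step α is chosen):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))
Q = A @ A.T + 4 * np.eye(4)           # SPD Hessian of the quadratic
b = rng.standard_normal(4)

x_k = rng.standard_normal(4)
p_k = rng.standard_normal(4)
alpha = 0.37                           # any step length works
x_next = x_k + alpha * p_k

g_k = Q @ x_k - b
g_next = Q @ x_next - b

# (g_{k+1} - g_k) / alpha reproduces Q p_k
print(np.allclose((g_next - g_k) / alpha, Q @ p_k))
```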

  12. Removing the Hessian 2. We can therefore write

  $p_{k+1} = -g_{k+1} + \frac{g_{k+1}^T (g_{k+1} - g_k)}{p_k^T (g_{k+1} - g_k)} \, p_k$,

and Q has disappeared (the factor $1/\alpha_k$ cancels between numerator and denominator). This expression for p_{k+1} can be further simplified by noticing that

  $g_{k+1}^T p_k = 0$,

because the line along p_k is tangent to an isosurface at x_{k+1}, while the gradient g_{k+1} is orthogonal to the isosurface at x_{k+1}. Similarly, $g_k^T p_{k-1} = 0$. Then, using $p_k = -g_k + \gamma_{k-1} p_{k-1}$, the denominator becomes

  $p_k^T (g_{k+1} - g_k) = -p_k^T g_k = (g_k - \gamma_{k-1} p_{k-1})^T g_k = g_k^T g_k$.

  13. Polak-Ribière formula. In conclusion, we obtain the Polak-Ribière formula

  $p_{k+1} = -g_{k+1} + \frac{g_{k+1}^T (g_{k+1} - g_k)}{g_k^T g_k} \, p_k$.

  14. General Case. When the function f(x) is arbitrary, the same algorithm can be used, but n iterations will not suffice, since the Hessian, which was constant for the quadratic case, is now a function of x_k. Strictly speaking, we then lose conjugacy, since p_k and p_{k+1} are associated with different Hessians. That is the reason why it is worthwhile to keep conjugacy only between p_{k+1} and p_k, exactly as the Polak-Ribière formula does. However, as the algorithm approaches the minimum x*, the quadratic approximation becomes more and more valid, and a few cycles of n iterations each will achieve convergence (a code sketch follows below).
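Here is a minimal sketch of the resulting nonlinear method (assuming scipy.optimize.minimize_scalar for the one-dimensional line search; the function name pr_conjugate_gradients and the restart period are illustrative, not from the tutorial):

```python
import numpy as np
from scipy.optimize import minimize_scalar

def pr_conjugate_gradients(f, grad, x0, cycles=10):
    """Polak-Ribiere nonlinear CG, restarted every n iterations."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    for _ in range(cycles):
        g = grad(x)
        p = -g                                    # restart: steepest descent
        for _ in range(n):
            alpha = minimize_scalar(lambda a: f(x + a * p)).x  # line search
            x = x + alpha * p
            g_new = grad(x)
            if np.linalg.norm(g_new) < 1e-10:
                return x
            beta = g_new @ (g_new - g) / (g @ g)  # Polak-Ribiere coefficient
            p = -g_new + beta * p
            g = g_new
    return x

# Usage on the example of the next slide: f(x, y) = (x-1)^2 + 2 (y-1)^2
f = lambda v: (v[0] - 1) ** 2 + 2 * (v[1] - 1) ** 2
grad = lambda v: np.array([2 * (v[0] - 1), 4 * (v[1] - 1)])
print(pr_conjugate_gradients(f, grad, np.zeros(2)))   # approx (1, 1)
```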

  15. Conjugate gradients: example. Consider the elliptic function $f(x, y) = (x-1)^2 + 2(y-1)^2$ and find the first three terms of its Taylor expansion about the origin. Then find the first step of steepest descent from (0, 0). [Figure: isolines of f, with the direction −f′(0) drawn from the origin.]
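The slide's worked solution is not preserved in the transcript; the computation, which is easy to verify, goes as follows:

```latex
% Taylor terms at the origin: value, gradient, Hessian
f(0,0) = 3, \qquad
\nabla f(0,0) = \begin{pmatrix} -2 \\ -4 \end{pmatrix}, \qquad
H = \begin{pmatrix} 2 & 0 \\ 0 & 4 \end{pmatrix},
% and indeed f(x,y) = 3 + (-2,-4)(x,y)^T + \tfrac12 (x,y) H (x,y)^T exactly.

% Steepest descent from (0,0): p_0 = -\nabla f(0,0) = (2, 4)^T.
% Along this line, f(2\alpha, 4\alpha) = 36\alpha^2 - 20\alpha + 3,
% minimized at \alpha^* = 5/18, hence
x_1 = \tfrac{5}{18} \begin{pmatrix} 2 \\ 4 \end{pmatrix}
    = \begin{pmatrix} 5/9 \\ 10/9 \end{pmatrix}.
```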

  16. Conjugate gradients: example (continued). [Figure only; the surviving labels are −f′(0), 1, 1.]
