
# Primal Methods


##### Presentation Transcript

1. Primal Methods

2. Primal Methods: By a primal method of solution we mean a search method that works on the original problem directly by searching through the feasible region for the optimal solution. Methods that work on an approximation of the original problem are often referred to as “transformation methods”. Each point in the process is feasible (theoretically) and the value of the objective function constantly decreases. Given n variables and m constraints, primal methods can be devised that work in spaces of dimension n-m, n, m, or n+m. In other words, a large variety exists.

3. Advantages of Primal Methods: Primal methods possess three significant advantages (Luenberger):
   1) Since each point generated in the search process is feasible, if the process is terminated before reaching the solution, the terminating point is feasible. Thus, the final point is feasible and probably nearly optimal.
   2) Often it can be guaranteed that if they generate a convergent sequence, then the limit point of that sequence must be at least a local constrained minimum.
   3) Most primal methods do not rely on a special problem structure, such as convexity, and hence these methods are applicable to general nonlinear programming problems.

   Furthermore, their convergence rates are competitive with other methods, and particularly for linear constraints, they are often among the most efficient.

4. Disadvantages of Primal Methods: Primal methods are not without disadvantages:
   - They require a (Phase I) procedure to obtain an initial feasible point.
   - They are all plagued, particularly for problems with nonlinear constraints, with computational difficulties arising from the necessity to remain within the feasible region as the method progresses.
   - Some methods can fail to converge for problems with inequality constraints (!) unless elaborate precautions are taken.

5. Some Typical Primal Algorithm Classes: The following classes of algorithms are typically noted under primal methods:
   - Feasible direction methods, which search only in directions that are always feasible (for example, Zoutendijk’s feasible direction method).
   - Active set methods, which partition inequality constraints into two groups of active and inactive constraints. Constraints treated as inactive are essentially ignored.
   - Gradient projection methods, which project the negative gradient of the objective onto the constraint surface.
   - (Generalized) reduced gradient methods, which partition the problem variables into basic and non-basic variables.

6. Active Sets

7. Dividing the Constraint Set: Constrained optimization can be made much more efficient if you know which constraints are active and which are inactive. Mathematically, an active constraint holds as an equality (!). Considering only the active constraints leads to a family of constrained optimization algorithms that can be classified as “active set methods”.

8. Active Set Methods: The idea underlying active set methods is to partition inequality constraints into two groups: those that are active and those that are inactive. The constraints treated as inactive are essentially ignored. Clearly, if the active constraints at the solution were known, then the original problem could be replaced by a corresponding problem having equality constraints only. Alternatively, suppose we guess an active set and solve the equality problem. If all constraints and optimality conditions are then satisfied, we have found the correct solution.

9. Basic Active Set Method: The idea behind active set methods is to define at each step of the algorithm a set of constraints, termed the working set, that is to be treated as the active set. Active set methods consist of two components: 1) determine a current working set that is a subset of the active set, and 2) move on the surface defined by the working set to an improved solution. This surface is often referred to as the working surface. The direction of movement is generally determined by first or second order approximations of the functions.

10. Basic Active Set Algorithm: The basic active set algorithm is as follows:
   - Start with a given working set and begin minimizing over the corresponding working surface.
   - If new constraint boundaries are encountered, they may be added to the working set, but no constraints are dropped from the working set.
   - Finally, a point is obtained that minimizes the objective function with respect to the current working set of constraints.
   - For this point, optimality criteria are checked, and if it is deemed “optimal”, the solution has been found.
   - Otherwise, one or more constraints (typically those with negative Lagrange multipliers) are dropped from the working set and the whole procedure is restarted with this new working set.

   Many variations are possible. Specific examples: the Gradient Projection algorithm and the (Generalized) Reduced Gradient algorithm.
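The loop above can be sketched for one concrete case. The following is a minimal illustration, not the slides' own implementation: it assumes a strictly convex quadratic objective `0.5*x@Q@x + q@x`, linear inequality constraints `A@x <= b`, a feasible starting point, and linearly independent constraints in the working set. The function name `gradient_projection_qp` and all variable names are hypothetical. It uses the projected negative gradient as the search direction and Lagrange multiplier estimates to decide which constraint to drop.

```python
import numpy as np

def gradient_projection_qp(Q, q, A, b, x0, tol=1e-10, max_iter=100):
    """Active-set sketch: minimize 0.5*x@Q@x + q@x subject to A@x <= b.

    Assumes Q is positive definite, x0 is feasible, and the rows of A in the
    working set stay linearly independent (no degeneracy handling).
    """
    x = np.asarray(x0, dtype=float)
    # Initial working set: constraints that are active at the starting point.
    W = [i for i in range(len(b)) if abs(A[i] @ x - b[i]) <= 1e-9]
    for _ in range(max_iter):
        g = Q @ x + q                       # gradient of the objective
        if W:
            Aq = A[W]                       # rows of the working set
            u = np.linalg.solve(Aq @ Aq.T, Aq @ g)
            lam = -u                        # Lagrange multiplier estimates
            d = -(g - Aq.T @ u)             # projected negative gradient
        else:
            lam = np.array([])
            d = -g
        if np.linalg.norm(d) <= tol:
            # Minimum on the working surface: check optimality.
            if lam.size == 0 or lam.min() >= -tol:
                return x, W
            W.pop(int(np.argmin(lam)))      # drop the most negative multiplier
            continue
        alpha = -(g @ d) / (d @ Q @ d)      # exact minimizer along d (quadratic f)
        blocking = None
        for i in range(len(b)):             # shorten the step at the first new boundary
            if i in W:
                continue
            rate = A[i] @ d
            if rate > tol:
                step = max((b[i] - A[i] @ x) / rate, 0.0)
                if step <= alpha:
                    alpha, blocking = step, i
        x = x + alpha * d
        if blocking is not None:
            W.append(blocking)              # newly encountered constraint joins the working set
    return x, W

# Hypothetical example: minimize (x1-2)^2 + (x2-2)^2 with x1 + x2 <= 2, x1 >= 0, x2 >= 0.
Q = 2.0 * np.eye(2)
q = np.array([-4.0, -4.0])
A = np.array([[1.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
b = np.array([2.0, 0.0, 0.0])
x_opt, working_set = gradient_projection_qp(Q, q, A, b, [0.0, 0.0])
```

Starting from the origin, the sketch first drops a bound constraint, moves until it meets `x1 + x2 <= 2`, and finishes on that constraint. In degenerate problems this simple drop rule can cycle, which is one reason practical codes add safeguards.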

11. Some Problems With Active Set Methods: Deciding numerically whether a constraint is exactly active can cause problems. Also, the calculated Lagrange multipliers may not be accurate if we are just a bit off the exact optimum. In practice, constraints are dropped from the working set using various criteria before an exact minimum on the working surface is found. For many algorithms, convergence cannot be guaranteed and jamming may occur in (very) rare cases. Even so, active set methods with various refinements are often very effective.
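A small illustration of why activity is usually tested against a tolerance rather than exactly. The constraint, the point, and the tolerance value below are assumptions chosen only for demonstration.

```python
def is_active(g_value, tol=1e-8):
    """Treat an inequality constraint g(x) <= 0 as active when |g(x)| <= tol."""
    return abs(g_value) <= tol

# Hypothetical constraint g(x) = x1 + x2 - 0.3 <= 0, evaluated at a point on its boundary.
x1, x2 = 0.1, 0.2
g_value = x1 + x2 - 0.3             # floating-point round-off leaves a tiny nonzero value

exact_test = (g_value == 0.0)       # False: the exact comparison misses the active constraint
tolerant_test = is_active(g_value)  # True: the tolerance-based test classifies it as active
```

The choice of `tol` is itself a judgment call: too small and active constraints are missed, too large and nearly active constraints are pulled into the working set.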

12. Feasible Direction Methods

13. Basic Algorithm: Each iteration in a feasible direction method consists of (1) selecting a feasible direction and (2) a constrained line search.

14. (Simplified) Zoutendijk Method: One of the earliest proposals for a feasible direction method uses a linear programming subproblem. Consider

   $$\min f(x) \quad \text{subject to} \quad a_1^T x \le b_1, \; \ldots, \; a_m^T x \le b_m.$$

   Given a feasible point $x_k$, let $I$ be the set of indices representing active constraints, that is, $a_i^T x_k = b_i$ for $i \in I$. The direction vector $d_k$ is then chosen as the solution to the linear program

   $$\text{minimize } \nabla f(x_k)\, d \quad \text{subject to} \quad a_i^T d \le 0, \; i \in I, \qquad \sum_{i=1}^{n} |d_i| = 1 \;\; \text{(normalizing constraint)},$$

   where $d = (d_1, d_2, \ldots, d_n)$. The constraints assure that vectors of the form $x_k + \alpha d$ will be feasible for sufficiently small $\alpha > 0$, and subject to these conditions, $d$ is chosen to line up as closely as possible with the negative gradient of $f$. This will result in the locally best direction in which to proceed. The overall procedure progresses by generating feasible directions in this manner, and moving along them to decrease the objective.

15. Feasible Descent Directions: Basic problem: $\min f(x)$ subject to $g_i(x) \le 0$ with $i = 1, \ldots, m$. Now think of a direction vector $d$ that is both descending and feasible: a “descent direction” (= reducing $f(x)$) and a “feasible direction” (= reducing $g(x)$ = increasing feasibility). If $d$ reduces $f(x)$, then the following holds: $\nabla f(x)^T d < 0$. If $d$ increases feasibility of $g_i(x)$, then the following holds: $\nabla g_i(x)^T d < 0$. Given that you know $d$, you now need to know how far to go along $d$: $x_{k+1} = x_k + \alpha_k d_k$.

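16. Finding the direction vector (an LP problem): The following condition expresses the value of $\alpha_k$:

   $$\alpha_k = \max \{ \nabla f(x)^T d, \; \nabla g_j(x)^T d \text{ for each } j \in I \},$$

   where $I$ is the set of active constraints. (This $\alpha_k$ measures the worst directional derivative along $d$; it is not the step size used in the line search.) Note that $\alpha_k < 0$ MUST hold if you want both a reduction in $f(x)$ and an increase in feasibility (remember that feasibility requires $g(x) \le 0$, thus lower $g(x)$ is better). The best $\alpha_k$ is the lowest valued (most negative) $\alpha_k$, thus the problem now becomes:

   $$\text{minimize } \alpha \quad \text{subject to} \quad \nabla f(x)^T d \le \alpha, \quad \nabla g_j(x)^T d \le \alpha \text{ for each } j \in I, \quad -1 \le d_i \le 1 \text{ for } i = 1, \ldots, n.$$

   This linear programming problem has n+1 variables (the n elements of the vector $d$ plus the scalar $\alpha$).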
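The direction-finding LP above can be handed to any LP solver. The sketch below uses SciPy's `linprog` as one possible choice; the helper name `feasible_descent_direction` and the example gradients are assumptions for illustration only. The LP variables are stacked as `(d_1, ..., d_n, alpha)`.

```python
import numpy as np
from scipy.optimize import linprog

def feasible_descent_direction(grad_f, active_grads):
    """Solve: minimize alpha  s.t.  grad_f@d <= alpha, grad_g_j@d <= alpha, -1 <= d_i <= 1."""
    grad_f = np.asarray(grad_f, dtype=float)
    n = grad_f.size
    rows = [grad_f] + [np.asarray(g, dtype=float) for g in active_grads]
    # Each row reads (gradient)@d - alpha <= 0.
    A_ub = np.hstack([np.vstack(rows), -np.ones((len(rows), 1))])
    b_ub = np.zeros(len(rows))
    cost = np.zeros(n + 1)
    cost[-1] = 1.0                                   # objective: minimize alpha
    bounds = [(-1.0, 1.0)] * n + [(None, None)]      # alpha is a free variable
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n], res.x[-1]

# Hypothetical data: f(x) = (x1-3)^2 + (x2-2)^2 and g(x) = x1^2 + x2^2 - 4 <= 0,
# evaluated at x = (2, 0), where g is active.
grad_f = [-2.0, -4.0]
grad_g = [4.0, 0.0]
d, alpha = feasible_descent_direction(grad_f, [grad_g])
```

A strictly negative `alpha` indicates that `d` both decreases the objective and moves into the feasible region to first order; an `alpha` of zero means no such direction exists at the current point.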

17. Next step: Constrained Line Search: The idea behind feasible direction methods is to take steps through the feasible region of the form $x_{k+1} = x_k + \alpha_k d_k$, where $d_k$ is a direction vector and $\alpha_k$ is a nonnegative scalar. Given that we have $d_k$, next we need to know how far to move along $d_k$. The scalar $\alpha_k$ is chosen to minimize the objective function $f$ with the restriction that the point $x_{k+1}$ and the line segment joining $x_k$ and $x_{k+1}$ be feasible.

   IMPORTANT: Note that while moving along $d_k$, we may encounter constraints that were inactive but now become active. Thus, we do need to do a constrained line search to find the maximum step size $\alpha_u$.

   Approach in textbook:
   - Determine the maximum step size based on the bounds of the variables.
   - If all constraints are feasible at the variable bounds, take this maximum step size as the step size.
   - Otherwise, search along $d_k$ until you find the first constraint that causes infeasibility.
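A minimal sketch of such a constrained line search, assuming linear constraints `A@x <= b` (so the maximum step has a closed form) and an objective that is unimodal along the search direction (so a golden-section search is adequate). The function names and the `alpha_cap` safeguard are hypothetical.

```python
import numpy as np

def max_feasible_step(A, b, x, d, tol=1e-12):
    """Largest alpha >= 0 such that A@(x + alpha*d) <= b (np.inf if nothing blocks)."""
    slack = b - A @ x
    rate = A @ d
    blocking = rate > tol                  # only constraints we are moving toward
    if not np.any(blocking):
        return np.inf
    return float(np.min(slack[blocking] / rate[blocking]))

def golden_section(phi, lo, hi, iters=60):
    """Minimize a unimodal function phi on [lo, hi]."""
    ratio = (np.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - ratio * (b - a)
    d = a + ratio * (b - a)
    for _ in range(iters):
        if phi(c) < phi(d):
            b, d = d, c                    # minimum lies in [a, d]
            c = b - ratio * (b - a)
        else:
            a, c = c, d                    # minimum lies in [c, b]
            d = a + ratio * (b - a)
    return 0.5 * (a + b)

def constrained_line_search(f, A, b, x, d, alpha_cap=1e3):
    """Return (alpha, alpha_max): the minimizer of f(x + alpha*d) on [0, alpha_max]."""
    alpha_max = min(max_feasible_step(A, b, x, d), alpha_cap)
    alpha = golden_section(lambda t: f(x + t * d), 0.0, alpha_max)
    return alpha, alpha_max

# Hypothetical example: the unconstrained minimizer along d is at alpha = 3,
# but the constraint x1 + x2 <= 2 stops the search at alpha = 2.
f = lambda x: (x[0] - 3.0) ** 2 + (x[1] - 2.0) ** 2
A = np.array([[1.0, 1.0]])
b = np.array([2.0])
alpha, alpha_max = constrained_line_search(f, A, b, np.zeros(2), np.array([1.0, 0.0]))
```

When the returned `alpha` coincides with `alpha_max`, a previously inactive constraint has become active and would be added to the working set in an active set method.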

18. Major Shortcomings: Two major shortcomings of feasible direction methods require modification of the methods in most cases:
   1) For general problems, there may not exist any feasible direction. For example, with a nonlinear equality constraint, no straight line from a feasible point stays on the curved constraint surface. In such cases, either relax the definition of feasibility and allow points to deviate, or introduce the concept of moving along curves rather than straight lines.
   2) Feasible direction methods can be subject to jamming, a.k.a. zigzagging, that is, the sequence of points does not converge to a constrained local minimum. In Zoutendijk's method, this can happen because the method for finding a feasible direction changes if another constraint becomes active.

19. Gradient Projection Methods

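20. Basic Problem Formulation: Gradient projection started from the nonlinear optimization problem with linear constraints: $\min f(x)$ subject to $a_r^T x \le b_r$ and $a_s^T x = b_s$, where $r$ indexes the inequality constraints and $s$ the equality constraints.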

21. Gradient Projection Methods: The gradient projection method is motivated by the ordinary method of steepest descent for unconstrained problems. Fundamental concept: the negative gradient of the objective function is projected onto the working surface (defined by a subset of the active constraints) in order to define the direction of movement. The major task is to calculate the projection matrix ($P$) and the subsequent feasible direction vector $d$.

22. Feasible Direction Vector and Projection Matrix
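For linear constraints, the standard expressions (as in Luenberger's treatment) can be written as follows. The symbols $A_q$ and $\lambda$ are notation assumed here: $A_q$ collects the rows $a_i^T$ of the constraints in the working set and is assumed to have full row rank, and $\nabla f(x_k)$ is taken as a column vector.

```latex
P = I - A_q^T \left( A_q A_q^T \right)^{-1} A_q ,
\qquad
d = -P \, \nabla f(x_k) ,
\qquad
\lambda = -\left( A_q A_q^T \right)^{-1} A_q \, \nabla f(x_k)
```

Since $A_q P = 0$, the direction satisfies $A_q d = 0$, so a step along $d$ keeps every working-set constraint active. If $d \neq 0$, it is a descent direction on the working surface. If $d = 0$ and every multiplier $\lambda_i$ belonging to an inequality constraint is nonnegative, the first-order optimality conditions hold; otherwise a constraint with a negative $\lambda_i$ is a candidate for removal from the working set.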

23. Nonlinear Constraints: For the general case of $\min f(x)$ s.t. $h(x) = 0$, $g(x) \le 0$, the basic idea is that at a feasible point $x_k$ one determines the active constraints and projects the negative gradient of $f$ onto the subspace tangent to the surface determined by these constraints. This vector (if nonzero) determines the direction for the next step. However, this vector is in general not a feasible direction since the working surface may be curved. Therefore, it may not be possible to move along this projected negative gradient to obtain the next point.

24. Overcoming Curvature Difficulties: What is typically done to overcome the problem of curvature and loss of feasibility is to search along a curve on the constraint surface, the direction of the search being defined by the projected negative gradient. A new point is found as follows: First, a move is made along the projected negative gradient to a point y. Then a move is made in the direction perpendicular to the tangent plane at the original point to a nearby feasible point on the working surface. Once this point is found, the value of the objective function is determined. This is repeated with various y's until a feasible point is found that satisfies the descent criteria for improvement relative to the original point.
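The "move back to the surface" step can be sketched with one common scheme: repeated corrections perpendicular to the tangent plane at the original point, using the constraint Jacobian evaluated there. This is a hedged illustration for equality constraints $h(x) = 0$; the function name `return_to_surface` and the example constraint are assumptions.

```python
import numpy as np

def return_to_surface(h, N, y, tol=1e-12, max_iter=50):
    """Pull y back onto h(x) = 0 by moving perpendicular to the tangent plane at x_k.

    N is the Jacobian of h at the ORIGINAL point x_k (rows are constraint gradients);
    it is held fixed, so each update is y <- y - N^T (N N^T)^{-1} h(y).
    """
    y = np.asarray(y, dtype=float).copy()
    NNt = N @ N.T
    for _ in range(max_iter):
        residual = np.atleast_1d(h(y))
        if np.linalg.norm(residual) <= tol:
            return y
        y = y - N.T @ np.linalg.solve(NNt, residual)
    raise RuntimeError("no feasible point found; try a shorter move along the projected gradient")

# Hypothetical example: unit circle h(x) = x1^2 + x2^2 - 1 = 0, original point x_k = (1, 0).
h = lambda x: np.array([x[0] ** 2 + x[1] ** 2 - 1.0])
x_k = np.array([1.0, 0.0])
N = np.array([[2.0 * x_k[0], 2.0 * x_k[1]]])   # gradient of h at x_k
tangent = np.array([0.0, 1.0])                 # direction in the tangent plane at x_k
y_trial = x_k + 0.3 * tangent                  # leaves the circle because it is curved
y_feasible = return_to_surface(h, N, y_trial)
```

If the correction fails to converge, or the corrected point does not improve the objective, the trial step along the projected gradient is shortened and the process repeated, which is the source of the extra interpolations mentioned in the slides.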

25. Difficulties and Complexities: The movement away from the feasible region and then coming back introduces difficulties that require a series of interpolations and nonlinear equation solutions for their resolution, because: 1) you first have to get back into the feasible region, and 2) next, you have to find a point on the active set of constraints. Thus, a satisfactory gradient projection method is quite complex. Computing the projection matrix for nonlinear constraints is also more time consuming than for linear constraints. Nevertheless, the gradient projection method has been successfully implemented and found to be effective (your book says otherwise). But all the extra features needed to maintain feasibility require skill.

26. (Generalized) Reduced Gradient Method

27. Reduced Gradient Method: The reduced gradient method is closely related to the simplex method for LP because variables are split into basic and non-basic groups. From a theoretical viewpoint, the method behaves very much like the gradient projection method. Like the gradient projection method, it can be regarded as a steepest descent method applied on the surface defined by the active constraints. The reduced gradient method seems to be better than gradient projection methods.

28. Dependent and Independent Variables: Consider $\min f(x)$ s.t. $Ax = b$, $x \ge 0$. Partition the variables into two groups $x = (y, z)$, where $y$ has dimension m and $z$ has dimension n-m. This partition is formed such that all variables in $y$ are strictly positive. Now, the original problem can be expressed as: $\min f(y, z)$ s.t. $By + Cz = b$, $y \ge 0$, $z \ge 0$ (with, of course, $A = [B, C]$).

   The key notion is that if $z$ is specified (independent variables), then $y$ (the dependent variables) can be uniquely solved for. NOTE: $y$ and $z$ are dependent. Because of this dependency, if we move $z$ along the line $z + \alpha \Delta z$, then $y$ will have to move along a corresponding line $y + \alpha \Delta y$.

   Dependent variables $y$ are also referred to as basic variables. Independent variables $z$ are also referred to as non-basic variables.

29. The Reduced Gradient
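For the linearly constrained problem $\min f(y, z)$ s.t. $By + Cz = b$, the reduced gradient follows from eliminating $y$. The derivation below is a standard one, written with column-vector gradients; the symbol $r$ matches the reduced gradient used in the later slides, and $B$ is assumed nonsingular.

```latex
y = B^{-1} (b - C z)
\quad\Longrightarrow\quad
\Delta y = -B^{-1} C \, \Delta z ,
\qquad
r = \nabla_z f(y, z) - C^T B^{-T} \nabla_y f(y, z)
```

Here $r$ is the gradient of $f(y(z), z)$ with respect to the independent variables $z$ alone, obtained by the chain rule. A move $\Delta z$ in the independent variables changes the objective to first order by $r^T \Delta z$, while the matching $\Delta y$ keeps $By + Cz = b$ satisfied.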

30. Generalized Reduced Gradient: The generalized reduced gradient (GRG) method solves nonlinear programming problems in the standard form: minimize $f(x)$ subject to $h(x) = 0$, $a \le x \le b$, where $h(x)$ is of dimension m. The GRG algorithm works similarly to the reduced gradient method for linear constraints. However, it is also plagued by the same problems as gradient projection methods regarding maintaining feasibility. Well-known implementation: GRG2 software.

31. Movement Basics: The basic idea in GRG is to search with the $z$ variables along the reduced gradient $r$ for improvement of the objective function and to use the $y$ variables to maintain feasibility.
   - If some $z_i$ is at one of its bounds (see Eq. 5.62), then, depending on the sign of the reduced gradient component $r_i$, set the search direction for that variable to $d_i = 0$. Reason: you do not want to violate a variable’s bound. Thus, that variable is fixed by not allowing it to change ($d_i = 0$ means it now has no effect on $f(x)$).
   - Search $x_{k+1} = x_k + \alpha d$ with $d = [d_y, d_z]$ (column vector).
   - If the constraints are linear, then the new point is (automatically) feasible. See the derivation on page 177; constraints and objective function are combined in the reduced gradient.
   - If the constraint(s) are nonlinear, you have to adjust $y$ with some $\Delta y$ to get back to feasibility. Different techniques exist, but basically it is equivalent to an unconstrained optimization problem that minimizes constraint violation.
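One of the possible techniques for adjusting $y$ is a Newton iteration on the constraint equations with $z$ held fixed. The sketch below is a hedged illustration of that idea, not the procedure of any particular GRG code; the names `restore_feasibility` and `jac_y` and the example constraint are assumptions.

```python
import numpy as np

def restore_feasibility(h, jac_y, y, z, tol=1e-10, max_iter=20):
    """Newton iteration on the dependent variables y so that h(y, z) = 0, with z fixed.

    jac_y(y, z) must return the (square, nonsingular) Jacobian of h with respect to y.
    """
    y = np.asarray(y, dtype=float).copy()
    for _ in range(max_iter):
        residual = h(y, z)
        if np.linalg.norm(residual) <= tol:
            return y
        y = y - np.linalg.solve(jac_y(y, z), residual)
    raise RuntimeError("Newton correction failed; a shorter step in z is needed")

# Hypothetical example: one constraint h(y, z) = y^2 + z^2 - 1 = 0.
h = lambda y, z: np.array([y[0] ** 2 + z[0] ** 2 - 1.0])
jac_y = lambda y, z: np.array([[2.0 * y[0]]])
z_new = np.array([0.3])                        # independent variable after the search step
y_old = np.array([1.0])                        # dependent variable before the correction
y_new = restore_feasibility(h, jac_y, y_old, z_new)
```

Only the dependent variables move during the correction, so the progress already made in the independent variables is preserved.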

32. Changing Basis: Picking a basis is sometimes poorly discussed in textbooks. Some literally only say “pick a set z and y”. Your textbook provides a method based on Gaussian elimination on pages 180-181 that is done every iteration. Other (“recent”) implementations favor changing the basis only when a basic variable reaches zero (or, equivalently, its upper or lower bound), since this saves recomputation of $B^{-1}$. Thus, if a dependent variable ($y$) becomes zero, then this zero-valued dependent (basic) variable is declared independent and one of the strictly positive independent variables is made dependent. Either way, this is analogous to an LP pivot operation.

33. Reduced Gradient Algorithm (one implementation): Pick a set of independent ($z$) and dependent ($y$) variables, then:
   1. Let $\Delta z_i = -r_i$ if $r_i < 0$ or $z_i > 0$. Otherwise let $\Delta z_i = 0$.
   2. If $\Delta z = 0$, then stop because the current point is the solution. Otherwise, find $\Delta y = -B^{-1} C \Delta z$.
   3. Find $\alpha_1$, $\alpha_2$, $\alpha_3$ achieving, respectively:
      - $\max \{ \alpha_1 : y + \alpha_1 \Delta y \ge 0 \}$
      - $\max \{ \alpha_2 : z + \alpha_2 \Delta z \ge 0 \}$
      - $\min \{ f(x + \alpha_3 \Delta x) : 0 \le \alpha_3 \le \alpha_1, \; 0 \le \alpha_3 \le \alpha_2 \}$
   4. Let $x = x + \alpha_3 \Delta x$. If $\alpha_3 < \alpha_1$, return to (1). Otherwise, declare the vanishing variable in the dependent set ($y$) independent and declare a strictly positive variable in the independent set ($z$) dependent (pivot operation). Update $B$ and $C$.

   Note that your book has a slightly different implementation on page 181!
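The steps above can be sketched in code under simplifying assumptions: a convex quadratic objective `0.5*x@Q@x + q@x` (so the line minimization in step 3 has a closed form), constraints `A@x = b` with `x >= 0`, a feasible starting point, and a starting basis whose variables are strictly positive. The function names and the rule for choosing the entering variable are assumptions, and no anti-jamming safeguards are included.

```python
import numpy as np

def ratio_test(v, dv, tol):
    """Largest alpha with v + alpha*dv >= 0, plus the index of the blocking entry."""
    best, idx = np.inf, None
    for i in range(v.size):
        if dv[i] < -tol:
            step = max(v[i], 0.0) / -dv[i]
            if step < best:
                best, idx = step, i
    return best, idx

def reduced_gradient_qp(Q, q, A, x0, basic, tol=1e-9, max_iter=200):
    """Reduced gradient sketch for min 0.5*x@Q@x + q@x  s.t.  A@x = b, x >= 0."""
    x = np.asarray(x0, dtype=float).copy()
    n = x.size
    basic = list(basic)
    for _ in range(max_iter):
        nonbasic = [j for j in range(n) if j not in basic]
        B, C = A[:, basic], A[:, nonbasic]
        g = Q @ x + q
        # Reduced gradient with respect to the independent variables z.
        r = g[nonbasic] - C.T @ np.linalg.solve(B.T, g[basic])
        z = x[nonbasic]
        dz = np.where((r < 0) | (z > tol), -r, 0.0)          # step 1
        if np.linalg.norm(dz) <= tol:                        # step 2: stop test
            return x, basic
        dy = -np.linalg.solve(B, C @ dz)
        dx = np.zeros(n)
        dx[basic], dx[nonbasic] = dy, dz
        a1, leave = ratio_test(x[basic], dy, tol)            # step 3
        a2, _ = ratio_test(z, dz, tol)
        curvature = dx @ Q @ dx
        a_free = -(g @ dx) / curvature if curvature > tol else np.inf
        a3 = min(a_free, a1, a2)
        if not np.isfinite(a3):
            raise ValueError("objective is unbounded along the search direction")
        x = x + a3 * dx                                      # step 4
        x[np.abs(x) < tol] = 0.0                             # clean up round-off at the bounds
        if a3 >= a1:
            # A basic variable vanished: pivot it out for a strictly positive nonbasic one.
            T = np.linalg.solve(B, C)
            z = x[nonbasic]
            ok = [k for k in range(len(nonbasic)) if z[k] > tol and abs(T[leave, k]) > tol]
            if not ok:
                raise ValueError("degenerate point: no admissible pivot")
            basic[leave] = nonbasic[max(ok, key=lambda k: z[k])]
    return x, basic

# Hypothetical example: minimize (x1-1)^2 + (x2-2)^2 with x1 + x2 + x3 = 2, x >= 0,
# where x3 is a slack variable and the initial basis is {x3}.
Q = np.diag([2.0, 2.0, 0.0])
q = np.array([-2.0, -4.0, 0.0])
A = np.array([[1.0, 1.0, 1.0]])
x_opt, final_basis = reduced_gradient_qp(Q, q, A, [0.0, 0.0, 2.0], basic=[2])
```

In this example the slack variable reaches zero on the first step, which triggers the pivot: the slack becomes independent and `x2` becomes the dependent variable.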

34. Comments: The GRG method can be quite complex. Also note that inequality constraints have to be converted to equalities first through slack and surplus variables. GRG2 is a very well-known implementation.