
Interior Point Optimization Methods in Support Vector Machines Training






Presentation Transcript


  1. Interior Point Optimization Methods in Support Vector Machines Training Part 3: Primal-Dual Optimization Methods and Neural Network Training Theodore Trafalis E-mail: trafalis@ecn.ou.edu ANNIE’99, St. Louis, Missouri, U.S.A., Nov. 7, 1999

  2. Outline • Objectives • Artificial Neural Networks • Neural Network Training as a Mathematical Programming Problem • A Nonlinear Primal-Dual Technique • A Stochastic Variant • An Incremental Primal-Dual Method • Primal-Dual Path Following Algorithms for QP

  3. Objectives

  4. Artificial Neural Networks [Figure: a three-layer feedforward network with inputs xp1, …, xpn(1), hidden-layer weights vij, output-layer weights wjk, and outputs z1, …, zn(3); inset: a single neuron with inputs a, b, c and weights w1, w2, w3 computing o = f(w1a + w2b + w3c), with activation f(x) = tanh(x).]
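
To make the neuron computation concrete, here is a minimal Python sketch of the single-neuron forward pass from the figure (the input values and weights are arbitrary illustrations):

```python
import numpy as np

def neuron(inputs, weights):
    """Single neuron: weighted sum of inputs passed through tanh."""
    return np.tanh(np.dot(weights, inputs))

# o = f(w1*a + w2*b + w3*c) with f(x) = tanh(x)
a, b, c = 0.5, -1.0, 2.0
w = np.array([0.3, 0.8, -0.5])
o = neuron(np.array([a, b, c]), w)
print(o)  # tanh(0.3*0.5 + 0.8*(-1.0) + (-0.5)*2.0) = tanh(-1.65)
```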

  5. Neural Network Training as a Mathematical Programming Problem

  6. Constraints on the Weights g(v,w) • To avoid saturation of the neurons (network paralysis), we restrict the weights to a bounded region. • The constraints are in blocks with respect to the pattern index p. • The error minimization problem can therefore be decomposed.
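
The slide does not specify the region; a common concrete choice is a box constraint on each weight, as in this hedged sketch (the bound 5.0 is an arbitrary illustration):

```python
import numpy as np

# Restricting each weight to [-w_max, w_max] keeps tanh units away
# from their flat saturated regions (bound is an illustrative choice).
w_max = 5.0

def project_to_box(weights, bound=w_max):
    """Project a weight vector onto the feasible box [-bound, bound]^n."""
    return np.clip(weights, -bound, bound)

w = np.array([7.2, -0.4, -9.1])
print(project_to_box(w))  # [ 5.  -0.4 -5. ]
```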

  7. A Nonlinear Primal-Dual Technique • Consider the general nonlinear programming problem min f(x) s.t. h(x) = 0, g(x) ≥ 0 (NLP), where f: ℝn → ℝ, h: ℝn → ℝm, and g: ℝn → ℝp. • The Lagrangian associated with (NLP) is L(x,y,z) = f(x) + yTh(x) − zTg(x), where y ∈ ℝm and z ∈ ℝp are the Lagrange multipliers.

  8. KKT Optimality Conditions • The Karush-Kuhn-Tucker (KKT) conditions are (with slacks s ≥ 0 for the inequalities) ∇f(x) + ∇h(x)Ty − ∇g(x)Tz = 0, h(x) = 0, g(x) − s = 0, ZSe = 0, s ≥ 0, z ≥ 0, where Z = diag(z), S = diag(s), and e is the vector of ones. • To ensure adherence to the central path, we use the perturbed KKT complementary slackness conditions ZSe = μe, with μ > 0.
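
A minimal sketch of the perturbed KKT residual F(v) implied by these conditions; the callables grad_f, h, jac_h, g, jac_g are assumed problem-supplied interfaces, not code from the talk:

```python
import numpy as np

# Residual of the perturbed KKT system for
#   min f(x)  s.t. h(x) = 0,  g(x) >= 0,
# with slacks s >= 0 (g(x) - s = 0) and multipliers y, z >= 0.
def kkt_residual(x, y, z, s, mu, grad_f, h, jac_h, g, jac_g):
    r_dual = grad_f(x) + jac_h(x).T @ y - jac_g(x).T @ z  # stationarity of L
    r_h = h(x)                                            # equality feasibility
    r_g = g(x) - s                                        # inequality feasibility
    r_comp = z * s - mu                                   # perturbed ZSe = mu*e
    return np.concatenate([r_dual, r_h, r_g, r_comp])
```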

  9. Adherence to the Central Trajectory [Figure: iterates from x0 approaching x*.] • When Newton’s method is used to solve the unperturbed KKT system, the linearized complementarity equation is ZΔs + SΔz = −ZSe. • If a component si or zi becomes zero, it will remain at zero in the following iterations. • If the current iterate approaches the boundary, it gets trapped by that boundary; the perturbation μe keeps the iterates in the strict interior.

  10. Solving the KKT Conditions & the Algorithm • Consider vk = (xk, yk, sk, zk) and Δvk = (Δxk, Δyk, Δsk, Δzk). Newton’s method solves J(vk) Δvk = −F(vk) (S), where F collects the perturbed KKT residuals and J is its Jacobian. • NLPD Algorithm: 1. Initialization. 2. Solve the linear system of equations (S). 3. Calculate step lengths. 4. Update the current point. 5. If the stopping criterion is satisfied, STOP. Otherwise, update μ and go to step 2.
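
As an illustration of the NLPD loop, here is a self-contained sketch on a toy problem with one inequality constraint and no equalities (the problem data, the 0.95 fraction-to-boundary factor, and the μ-reduction factor 0.2 are all illustrative assumptions, not values from the talk):

```python
import numpy as np

# Toy problem: min (x1-1)^2 + (x2-2)^2  s.t.  g(x) = x1 + x2 >= 1,
# solved via Newton steps on the perturbed KKT system with a
# numerically differentiated Jacobian.
def F(v, mu):
    x, z, s = v[:2], v[2], v[3]
    grad_f = 2 * (x - np.array([1.0, 2.0]))
    grad_g = np.array([1.0, 1.0])
    return np.array([
        grad_f[0] - grad_g[0] * z,   # stationarity
        grad_f[1] - grad_g[1] * z,
        x[0] + x[1] - 1.0 - s,       # g(x) - s = 0
        z * s - mu,                  # perturbed complementarity
    ])

def num_jacobian(v, mu, eps=1e-7):
    J = np.zeros((4, 4))
    for j in range(4):
        e = np.zeros(4); e[j] = eps
        J[:, j] = (F(v + e, mu) - F(v - e, mu)) / (2 * eps)
    return J

v, mu = np.array([0.0, 0.0, 1.0, 1.0]), 1.0   # (x1, x2, z, s), z, s > 0
for _ in range(50):
    dv = np.linalg.solve(num_jacobian(v, mu), -F(v, mu))
    # step length: keep z and s strictly positive (fraction to boundary)
    alpha = 1.0
    for i in (2, 3):
        if dv[i] < 0:
            alpha = min(alpha, -0.95 * v[i] / dv[i])
    v = v + alpha * dv
    mu *= 0.2                                  # drive mu toward zero
    if np.linalg.norm(F(v, 0.0)) < 1e-8:
        break
print(v[:2])  # the unconstrained minimum (1, 2) already satisfies g >= 0
```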

  11. Hessian Calculation • For convex problems, ∇x²L(x,y,z) is positive definite and J(vk) is nonsingular; ∇x²L(x,y,z) is calculated by central differences. • For nonconvex problems, ∇x²L(x,y,z) is generally indefinite and J(vk) might become singular. We approximate ∇x²L(x,y,z) by a positive definite matrix H using the recursive formula H(k+1) = H(k) + ∇xL(x,y,z)(∇xL(x,y,z))T. • The update is based on the Recursive Prediction Error Method (RPEM) (Soderstrom and Stoica, 1989; Davidon, 1976). • Testing used the Hock and Schittkowski database of constrained nonlinear programming problems (Hock and Schittkowski, 1981). • Comparisons were made with Breitfeld and Shanno’s Modified Barrier Algorithm (Breitfeld and Shanno, 1994).
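
A sketch of the slide's rank-one update; starting from a positive definite H(0) such as the identity, every H(k) stays positive definite (the RPEM update in the talk may include additional forgetting or scaling factors):

```python
import numpy as np

def update_hessian(H, grad_L):
    """Rank-one update H <- H + grad_L grad_L^T (preserves definiteness)."""
    return H + np.outer(grad_L, grad_L)

H = np.eye(3)                      # positive definite start
g = np.array([0.5, -1.0, 2.0])     # gradient of the Lagrangian at v_k
H = update_hessian(H, g)
print(np.linalg.eigvalsh(H) > 0)   # [ True  True  True ]
```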

  12. A Stochastic Variant of NLPD • We add random noise to the objective function. • This induces a perturbation on the direction of move. • A “bad” (uphill) move is accepted with a certain probability; one possible interpretation is sketched below.
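
Since the slide's formulas are not reproduced in this transcript, the following is only one plausible reading, using Gaussian noise on the step and a Metropolis-style acceptance test; the noise scale and the temperature T are assumptions, not values from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_step(x, direction, f, T=1.0, noise=0.1):
    """Perturb the move with noise; accept uphill moves with prob exp(-delta/T)."""
    candidate = x + direction + noise * rng.standard_normal(x.shape)
    delta = f(candidate) - f(x)
    if delta <= 0 or rng.random() < np.exp(-delta / T):
        return candidate          # accept (always, if it improves f)
    return x                      # reject the bad move

f = lambda x: np.sum(x**2)
x = np.array([2.0, -1.0])
x = stochastic_step(x, -0.1 * 2 * x, f)   # gradient-descent direction
print(x)
```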

  13. An Incremental Primal-Dual Algorithm • Consider problems whose objective and constraints decompose into L blocks: min Σl fl(x) s.t. hl(x) = 0, gl(x) ≥ 0, l = 1, …, L. • Applications: general least-squares problems and the artificial neural network training problem.

  14. Example: Unconstrained Case • Consider the following unconstrained minimization problem: min f(x) = f1(x) + f2(x) + f3(x), where x ∈ ℝ and f1(x) = x², f2(x) = (0.75x + 5)², f3(x) = (1.5x − 5)².
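
A hedged sketch of an incremental scheme on this example, cycling through the three components with a decaying step size; plain incremental gradient is used here as a simplified stand-in for the full INCNLPD updates:

```python
import numpy as np

# f(x) = x^2 + (0.75x + 5)^2 + (1.5x - 5)^2, so f'(x) = 7.625x - 7.5
# and the exact minimizer is x* = 7.5 / 7.625 ~ 0.9836.
grads = [
    lambda x: 2 * x,                      # f1'(x)
    lambda x: 2 * 0.75 * (0.75 * x + 5),  # f2'(x)
    lambda x: 2 * 1.5 * (1.5 * x - 5),    # f3'(x)
]

x = 0.0
for t in range(1, 2001):
    step = 1.0 / (7.625 * t)              # decaying step; 7.625 = f''(x)
    for grad in grads:                    # one pass over the components
        x -= step * grad(x)
print(x)  # approaches the exact minimizer ~0.9836
```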

  15. Example: (cont’d)

  16. The Algorithm • From a starting point v1(0), the sequence (v1(t), …, vL+1(t)), t = 0, 1, …, is generated, where vl(t) is calculated by performing one Newton step toward the solution of the KKT conditions of the subproblem min fl(x) s.t. hl(x) = 0, gl(x) ≥ 0. • INCNLPD Algorithm: 1. Initialization: l = 1, t = 0. 2. Solve the linear system of equations (Sl). 3. Calculate step lengths. 4. Update the current point. 5. If the stopping criterion is satisfied, STOP. Otherwise, if l ≤ L + 1, set l = l + 1 and go to step 2; if l > L + 1, set t = t + 1, v1(t+1) = vL+1(t), l = 1, update μ, and go to step 2.

  17. Algorithm Convergence • Local convergence of the algorithm can be shown (Trafalis and Couellan 1997, paper submitted to SIAM Journal on Optimization, under revision): starting from a neighborhood of the optimal solution, the sequence of iterates generated by INCNLPD converges q-linearly to that solution. • Motivations: the algorithm is suitable for online applications, leads to savings in memory space, and leads to a better fit of the data for some applications.

  18. Primal-Dual Path Following Algorithms for QP • The problem we are concerned with is a convex quadratic program. • Inequalities are converted into equalities by introducing slack variables.

  19. The dual of this problem is

  20. Central Path

  21. As μ goes to zero, the central path converges to an optimal solution of both the primal and the dual problems. • A primal-dual path following algorithm is an iterative process that starts from a point in the feasible region; at each iteration it estimates a value of μ representing a point on the central path that is, in some sense, closer to the optimal solution than the current point, • and then attempts to step toward this central-path point, making sure that the new point remains in the strict interior of the appropriate orthant [Vanderbei 1998].

  22. Suppose we have already decided the value of μ. Let (x, y, z, s, g, t) be the current point in the orthant, and let (x + Δx, …, t + Δt) denote the corresponding point on the central path. Then we have

  23. A predictor-corrector algorithm is used to solve this system. • First, we solve the above system after dropping the μ and Δ terms from the right-hand side of the equations (the predictor step); an estimate of the target value of μ is then made. • The μ and Δ terms are reinstated on the right-hand side using the current estimates, and the resulting system is solved again for the Δ variables. • This second step is called the corrector step, and the resulting step directions are used to move to a new point in the primal-dual space. As this procedure shows, the system of equations must be solved twice at each step.
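
The following sketch implements one predictor-corrector scheme of this kind (after Vanderbei/Mehrotra) on a small QP min ½xᵀQx + cᵀx s.t. Ax ≥ b; the toy data, the 0.9995 step damping, and the cubic μ heuristic follow common practice and are not taken from the talk:

```python
import numpy as np

# QP data: minimize (x1-1)^2 + (x2-2.5)^2 over three half-planes,
# with slacks w = A x - b >= 0 and duals z >= 0.
Q = np.array([[2.0, 0.0], [0.0, 2.0]])
c = np.array([-2.0, -5.0])
A = np.array([[1.0, -2.0], [-1.0, -2.0], [-1.0, 2.0]])
b = np.array([-2.0, -6.0, -2.0])
n, m = 2, 3

def solve_newton(x, w, z, rhs_comp):
    """Solve the full (unreduced) KKT system for (dx, dw, dz)."""
    K = np.zeros((n + 2 * m, n + 2 * m))
    K[:n, :n] = Q;              K[:n, n + m:] = -A.T
    K[n:n + m, :n] = A;         K[n:n + m, n:n + m] = -np.eye(m)
    K[n + m:, n:n + m] = np.diag(z)
    K[n + m:, n + m:] = np.diag(w)
    rhs = np.concatenate([-(Q @ x + c - A.T @ z), -(A @ x - w - b), rhs_comp])
    d = np.linalg.solve(K, rhs)
    return d[:n], d[n:n + m], d[n + m:]

def max_step(v, dv):
    """Largest alpha in (0, 1] keeping v + alpha*dv strictly positive."""
    mask = dv < 0
    return min(1.0, 0.9995 * np.min(-v[mask] / dv[mask])) if mask.any() else 1.0

x, w, z = np.zeros(n), np.ones(m), np.ones(m)
for _ in range(30):
    # predictor: drop mu and the Delta terms from the right-hand side
    dx, dw, dz = solve_newton(x, w, z, -w * z)
    a = min(max_step(w, dw), max_step(z, dz))
    mu = ((w + a * dw) @ (z + a * dz) / (w @ z)) ** 3 * (w @ z) / m
    # corrector: reinstate mu and the Delta*Delta term, solve again
    dx, dw, dz = solve_newton(x, w, z, mu - w * z - dw * dz)
    a = min(max_step(w, dw), max_step(z, dz))
    x, w, z = x + a * dx, w + a * dw, z + a * dz
    if w @ z < 1e-10:
        break
print(x)  # ~ [1.4 1.7]
```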

  24. Predictor-Corrector Method [Figure: from the iterate xk, a predicting direction and a centering/correcting direction combine to reach xk+1, following the path of centers from μk to μk+1.]

  25. The drawback of this method is that the system of equations must be solved twice in each iteration. The system is a large, sparse, indefinite linear system; it can be converted into a symmetric system by negating certain rows and rearranging rows and columns [Vanderbei 1998].

  26. The following systematic elimination procedure is applied to the above system (Vanderbei, 1998). We use the pivot elements −ST⁻¹ and −G⁻¹Z to solve for Δt and Δg. After eliminating Δt and Δg, we get the following system of equations.

  27. By using S⁻¹T and GZ⁻¹ as pivot elements, we get the following system of equations, called the reduced KKT system.

  28. In order to start the algorithm, we need to provide initial values for all the variables. Vanderbei (1998) recommends the following procedure to start the algorithm. First, we solve the following system to find initial values of x and y. • Then, other variables are set as follows

  29. μ is updated by the following formula. • αp and αd are the step lengths for the primal and dual variables; each is capped at 1. The following formulas are used to compute them (a sketch of the ratio test appears below).
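
A sketch of one common ratio test for such step lengths; which of the slide's variables carry positivity constraints is not fully recoverable from the transcript, so here w and t stand in for the nonnegative primal variables, and the damping factor 0.9 is an illustrative choice:

```python
import numpy as np

def step_length(vars_, deltas, r=0.9):
    """Largest alpha <= 1 (damped by r) keeping all variables positive."""
    alpha = 1.0
    for v, dv in zip(vars_, deltas):
        neg = dv < 0
        if neg.any():
            alpha = min(alpha, r * np.min(-v[neg] / dv[neg]))
    return alpha

w, t = np.array([1.0, 2.0]), np.array([0.5])
dw, dt = np.array([-0.8, 1.0]), np.array([-1.0])
alpha_p = step_length([w, t], [dw, dt])
print(alpha_p)  # min(1, 0.9 * min(1.0/0.8, 0.5/1.0)) = 0.45
```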

  30. At the end of each iteration, the current solution is updated by using the following formulas

  31. Conclusions • An incremental primal-dual technique has been developed for problems with special decomposition properties. The algorithm, its implementation, and its convergence results are provided in (Trafalis and Couellan 1997). • A stochastic primal-dual technique has been proposed (Trafalis and Couellan 1997, paper submitted to Journal of Global Optimization, under revision). Computational results show that it outperforms the deterministic approach. • A primal-dual path following algorithm for QP was developed.
