
4. Method of Steepest Descent



  1. 4. Method of Steepest Descent • There are two problems associated with Wiener filtering in practical applications. • The matrix inversion operation is difficult to implement. • The correlation matrix R and cross-correlation vector p may not be easy to estimate. • To overcome the first problem, we may compute the Wiener solution iteratively. • Consider an optimization problem: minimize the cost J(w) over the weight vector w. • The simplest procedure for solving the optimization problem iteratively is called the method of steepest descent.

  2. Observation: the gradient ∇J(w0) points in the direction of largest slope of J at w0, so moving against the gradient decreases J most rapidly.

  3. Method of steepest descent (SD): • Initial guess: w(0). • Compute the gradient vector ∇J(w(i)). • Update w: w(i+1) = w(i) − μ∇J(w(i)). • Repeat the process (i = i+1). • The parameter μ is called the step size. It controls the rate of convergence.
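
A minimal sketch of this loop in Python/NumPy is given below. The quadratic cost, the matrix A, the vector b, the starting point, and the step size are illustrative choices, not values taken from the slides.

```python
# Minimal sketch of the steepest-descent loop described on this slide.
import numpy as np

def steepest_descent(grad, w0, mu=0.1, n_iter=100):
    """Iterate w(i+1) = w(i) - mu * grad(w(i))."""
    w = np.asarray(w0, dtype=float)
    trajectory = [w.copy()]
    for _ in range(n_iter):
        w = w - mu * grad(w)          # move against the gradient
        trajectory.append(w.copy())
    return w, np.array(trajectory)

# Illustrative example: minimize J(w) = 0.5*w^T A w - b^T w (a simple quadratic).
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
grad_J = lambda w: A @ w - b
w_min, traj = steepest_descent(grad_J, w0=[0.0, 0.0], mu=0.2, n_iter=200)
print(w_min, np.linalg.solve(A, b))   # SD estimate vs. exact minimizer
```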

  4. Graphical interpretation:

  5. Project to the xy plane:

  6. Recall that for the Wiener filtering problem the cost J(w) is quadratic in w. • Thus, we can use the SD method to solve the Wiener filtering problem. • For real signals, we have J(w) = σd2 − 2pTw + wTRw and ∇J(w) = 2(Rw − p).

  7. For complex signals, we have J(w) = σd2 − pHw − wHp + wHRw, and again ∇J(w) = 2(Rw − p). • With the factor of 2 absorbed into the step size, the update equation for the SD method is w(i+1) = w(i) + μ[p − Rw(i)]. • The SD method is a recursive algorithm; it is therefore subject to the possibility of instability.
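
The recursion above can be sketched in a few lines of NumPy. The matrix R and vector p here are made-up illustrative values; in practice they are the input correlation matrix and the cross-correlation with the desired response.

```python
# Sketch of the SD recursion for the Wiener problem:
#   w(i+1) = w(i) + mu * (p - R w(i))   (factor of 2 absorbed into mu)
import numpy as np

def sd_wiener(R, p, mu, n_iter=500, w0=None):
    w = np.zeros(len(p)) if w0 is None else np.asarray(w0, dtype=float)
    for _ in range(n_iter):
        w = w + mu * (p - R @ w)      # move opposite to the gradient 2(Rw - p)
    return w

R = np.array([[1.0, 0.5], [0.5, 1.0]])   # assumed correlation matrix
p = np.array([0.5, 0.2])                 # assumed cross-correlation vector
w_sd = sd_wiener(R, p, mu=0.1)
print(w_sd, np.linalg.solve(R, p))       # should agree with the Wiener solution
```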

  8. Let c(i) = w(i) − wopt. • The weight update equation can then be written as c(i+1) = (I − μR)c(i). • Since R is the correlation matrix, it has the eigendecomposition R = QΛQH (QHQ = I). • Let v(i) = QHc(i). Then v(i+1) = (I − μΛ)v(i).

  9. For the k-th component of v(i), we have vk(i+1) = (1 − μλk)vk(i), i.e., vk(i) = (1 − μλk)^i vk(0). • Thus, for vk(i) to converge, it is necessary that |1 − μλk| < 1. • Since all eigenvalues are nonnegative, this means 0 < μλk < 2 for every k. • To ensure every mode is convergent, we have 0 < μ < 2/λmax.

  10. Thus, if the step size satisfies the condition, v(i) → 0 and w(i) → wopt as i → ∞. • The time constant (measuring the convergence speed) of the k-th mode is τk = −1/ln(1 − μλk) ≈ 1/(μλk) for small μλk. • Since v(i) = QH[w(i) − wopt], we have w(i) = wopt + Qv(i) = wopt + Σk qk vk(0)(1 − μλk)^i.
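
The step-size bound and per-mode time constants can be checked numerically. The sketch below reuses the illustrative R from the earlier example and uses the approximation τk ≈ 1/(μλk), valid for small μλk.

```python
# Numerical check of the convergence conditions on slides 8-10:
# each mode evolves as v_k(i) = (1 - mu*lambda_k)^i v_k(0),
# so we need |1 - mu*lambda_k| < 1, i.e. 0 < mu < 2/lambda_max.
import numpy as np

R = np.array([[1.0, 0.5], [0.5, 1.0]])   # illustrative correlation matrix
eigvals, Q = np.linalg.eigh(R)           # R = Q diag(lambda) Q^H
mu_max = 2.0 / eigvals.max()
print("eigenvalues:", eigvals, "  step-size bound: 0 < mu <", mu_max)

mu = 0.1
modes = 1.0 - mu * eigvals               # per-mode geometric ratios (1 - mu*lambda_k)
tau = 1.0 / (mu * eigvals)               # approximate time constants for small mu*lambda_k
print("mode ratios:", modes)
print("approx. time constants:", tau)
```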

  11. For the i-th component of w(i), we have a sum of modes, wi(i) = wopt,i + Σk qik vk(0)(1 − μλk)^i, each decaying at its own rate. • Let τa be the time constant of wi(i). Then −1/ln(1 − μλmax) ≤ τa ≤ −1/ln(1 − μλmin). • As we can see, the convergence speed is limited by λmin. However, we can adjust the step size such that the mode corresponding to λmax converges fast. • We conclude that the factor controlling the rate of convergence is the eigenvalue spread (λmax/λmin). The smaller the eigenvalue spread, the faster the convergence rate we can achieve.

  12. The MSE can be analyzed similarly: J[w(i)] = Jmin + Σk λk |vk(0)|^2 (1 − μλk)^(2i). • If the step size is properly chosen, J[w(i)] → Jmin as i → ∞. • The curve obtained by plotting J[w(i)] versus i is called the learning curve. The time constant associated with the k-th mode of the learning curve is τk,mse = −1/[2 ln(1 − μλk)] ≈ 1/(2μλk).
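
A learning curve can be generated directly from the recursion. The values of σd2, R, and p below are assumed for illustration, and J(w) uses the real-signal quadratic form quoted on slide 6.

```python
# Sketch of a learning curve J[w(i)] for the SD recursion.
import numpy as np

R = np.array([[1.0, 0.5], [0.5, 1.0]])            # illustrative values
p = np.array([0.5, 0.2])
sigma_d2 = 1.0                                    # assumed variance of the desired signal
J = lambda w: sigma_d2 - 2 * p @ w + w @ R @ w    # quadratic cost for real signals

mu, w = 0.1, np.zeros(2)
learning_curve = []
for i in range(200):
    learning_curve.append(J(w))
    w = w + mu * (p - R @ w)

J_min = J(np.linalg.solve(R, p))
print(learning_curve[0], learning_curve[-1], J_min)   # J[w(i)] decays toward J_min
```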

  13. Eigenvectors/eigenvalues of R: • J(w) is a paraboloid. If we cut the paraboloid with planes parallel to the w plane [J(w) = constant], we obtain concentric ellipses. • Let c = w − wopt. Then J − Jmin = cHRc and ∇J = 2Rc. Note that ∇J is normal to the ellipse cHRc = constant. A principal axis of the ellipsoid passes through the origin (c = 0) and is normal to the ellipsoid there, so if cp is a principal axis, it must satisfy Rcp = λcp for some λ. • Thus, the eigenvectors of R define the principal axes of the error surface.
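
The claim that the eigenvectors of R are the principal axes can be verified numerically: on a principal axis the gradient 2Rcp must be parallel to cp. The matrix R is the same illustrative one used earlier.

```python
# Check: for each eigenvector q of R, the gradient of c^T R c at c = q
# is parallel to q (i.e. R q = lambda q), so q is a principal axis.
import numpy as np

R = np.array([[1.0, 0.5], [0.5, 1.0]])
eigvals, Q = np.linalg.eigh(R)
for lam, q in zip(eigvals, Q.T):
    grad = 2 * R @ q                        # gradient of c^T R c at c = q
    print(np.allclose(grad, 2 * lam * q))   # True: gradient is parallel to q
```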

  14. Geometrical interpretation (figure: contours of the error surface in the c0–c1 plane, with principal axes along the eigenvector directions v0 and v1).

  15. The eigenvalues of R give the second derivative (curvature) of the error surface w.r.t. the principal axes of the contours J = constant (what does this mean?). • Thus, the larger the eigenvalue spread, the more elongated the shape of the ellipsoid. • Note that ∂2J/∂vk2 = 2λk. • If we translate and rotate the coordinates of w (to v), the components of the weight vector are decoupled. As a matter of fact, we can then use a different step size for each mode, which gives the fastest convergence rate.

  16. Recall the weight update equation vk(i+1) = (1 − μλk)vk(i). • Let rk = (1 − μλk). Thus, for the k-th mode, the convergence condition is −1 < rk < 1.
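
The decoupling idea of slides 15–16 can be sketched as follows: rotate to v = QH(w − wopt) and apply a separate step size per mode. With the (illustrative) choice μk = 1/λk, every mode has rk = 0 and convergence takes a single step.

```python
# Sketch of per-mode step sizes in the rotated coordinates v = Q^H (w - w_opt),
# where each component obeys v_k(i+1) = (1 - mu_k*lambda_k) v_k(i).
import numpy as np

R = np.array([[1.0, 0.5], [0.5, 1.0]])   # illustrative values
p = np.array([0.5, 0.2])
w_opt = np.linalg.solve(R, p)
eigvals, Q = np.linalg.eigh(R)

mu_k = 1.0 / eigvals                     # per-mode step sizes: r_k = 1 - mu_k*lambda_k = 0
v = Q.T @ (np.zeros(2) - w_opt)          # start from w(0) = 0
v = (1.0 - mu_k * eigvals) * v           # one decoupled update per mode
w = w_opt + Q @ v
print(w, w_opt)                          # converges in a single step when mu_k = 1/lambda_k
```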

  17. Weight convergence (figure: underdamped vs. overdamped weight trajectories).

  18. Example: identification of AR parameters, with desired response d(n) = u(n).
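
A sketch of an experiment in the spirit of this example is given below: generate an AR(2) process, estimate R and p from the data with d(n) = u(n) and tap inputs [u(n−1), u(n−2)], and run SD. The AR coefficients, noise level, and step size are assumed values, not necessarily those used in the original slides.

```python
# Hedged sketch: identify AR(2) parameters with steepest descent.
import numpy as np

rng = np.random.default_rng(0)
a1, a2 = -0.1, -0.8              # assumed AR(2) model: u(n) + a1*u(n-1) + a2*u(n-2) = v(n)
N = 50_000
v = rng.normal(0.0, 1.0, N)      # white driving noise
u = np.zeros(N)
for n in range(2, N):
    u[n] = -a1 * u[n - 1] - a2 * u[n - 2] + v[n]

# Estimate R and p from data (sample averages), then run SD.
x = np.stack([u[1:-1], u[:-2]])  # tap-input vectors [u(n-1), u(n-2)]
d = u[2:]                        # desired response d(n) = u(n)
R = x @ x.T / d.size
p = x @ d / d.size

mu, w = 0.05, np.zeros(2)
for _ in range(1000):
    w = w + mu * (p - R @ w)
print(w)                         # should approach [-a1, -a2] = [0.1, 0.8]
```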

  19. Convergence trajectory:

  20. Four eigenvalue spreads:

  21. (continued):

  22. In terms of w(n):

  23. Newton’s method • Newton’s method is primarily a method for finding the zeros of an equation. • Finding the minimum of a function g(x) means solving the equation g’(x) = 0. This leads to the search algorithm x(i+1) = x(i) − g’(x(i))/g’’(x(i)).

  24. For the Wiener filtering problem, we have ∇J(w) = 2(Rw − p) and ∇2J(w) = 2R. • Thus, Newton’s method is w(i+1) = w(i) − R^-1[Rw(i) − p] = wopt, reaching the optimum in one step. • As shown, Newton’s method does not proceed in the gradient direction. Introducing the step size μ, we have w(i+1) = w(i) + μR^-1[p − Rw(i)], or, equivalently, w(i+1) = (1 − μ)w(i) + μwopt.
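
The Newton recursion for the quadratic Wiener cost can be sketched as below, reusing the illustrative R and p from earlier; with μ = 1 it reaches wopt in one step, and for 0 < μ < 2 it converges as stated on the next slide.

```python
# Sketch of the Newton recursion for the quadratic Wiener cost:
#   w(i+1) = w(i) + mu * R^{-1} (p - R w(i)) = (1 - mu)*w(i) + mu*w_opt
import numpy as np

R = np.array([[1.0, 0.5], [0.5, 1.0]])    # illustrative values
p = np.array([0.5, 0.2])
R_inv = np.linalg.inv(R)

def newton_step(w, mu=1.0):
    return w + mu * R_inv @ (p - R @ w)   # same as (1 - mu)*w + mu*w_opt

w = np.zeros(2)
w = newton_step(w, mu=1.0)
print(w, np.linalg.solve(R, p))           # identical after a single step when mu = 1
```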

  25. Convergence properties: w(i+1) − wopt = (1 − μ)[w(i) − wopt]. • Thus, Newton’s method will converge if |1 − μ| < 1, i.e., 0 < μ < 2. • Properties: • Convergence of Newton’s method is the same for every mode and does not depend on the eigenvalue spread of R. • The computation is more intensive (it requires R^-1). • For nonquadratic cost functions, Newton’s method can easily become unstable.

  26. Question: if we know R^-1, we can directly find wopt; why would we use Newton’s method at all? • Reason: • An exact R^-1 may not be necessary. Efficient methods can be applied to find an approximation of R^-1, which is especially useful when the input statistics are time-varying. • In general, the straightforward Newton’s method is seldom used; only the concept is adopted.
