
Mathematical Methods for Minimization

Presentation Transcript


  1. Mathematical Methods for Minimization Computing Across the Sciences

  2. Why Minimization? Many applications in science involve maximizing or minimizing a function of one or more variables. For example, social scientists often maximize profit; physical scientists often minimize some kind of energy.

  3. Why Minimization? • Example (Lennard-Jones): Consider two atoms that are a distance x apart. A common model for the energy of these two atoms is the Lennard-Jones energy: E(x) = 4ε [ (σ/x)^12 − (σ/x)^6 ], where ε and σ are parameter values specific to the two atoms in question. For example, for two argon atoms, typically σ = 3.42 (in units of Angstroms) and ε = 0.01 (in units of eV). Here’s a graph of this function.

  4. Why Minimization? One useful feature of this energy function is the location of its minimum. In-class Exercise (calculus review): Use techniques from calculus to find the location of this minimum.

  5. Why Minimization? Solution to In-class Exercise: We first compute the derivative of E with respect to x: E'(x) = 4ε [ −12 σ^12/x^13 + 6 σ^6/x^7 ], and then we set this derivative to zero, to find x = 2^(1/6) σ = 3.838820205. (I should really just call this 3.84, given the number of significant figures in our parameters, but for future discussion of accuracy and convergence, let's imagine that all of these figures are significant).
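
The algebra can be double-checked with Mathematica's Solve command (which slide 9 mentions was used for this example); a minimal sketch, assuming the argon parameters from slide 3:

    sigma = 342/100; eps = 1/100;  (* exact rationals, so Solve returns exact roots *)
    energy[x_] := 4 eps ((sigma/x)^12 - (sigma/x)^6);
    N[Solve[D[energy[x], x] == 0 && x > 0, x], 10]
    (* the one positive real root is x = 2^(1/6) sigma, about 3.838820205 *)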

  6. Why Minimization? Now, imagine that Energy had been a very complicated function, or a function of thousands of variables (rather than just x). Then a calculation like the one we did above would have been hopeless: the derivative of Energy would have been very complicated, and even on a computer, it would be impossible to find the exact x where the derivative equals zero. We would like a general algorithm that will let a computer minimize a function, no matter how complicated the function or how many variables it involves.

  7. Continuous vs. Discrete Minimization The Lennard-Jones example above is "continuous", because the variable x can take on a continuous range of values (all positive real numbers). There are also "discrete" minimization problems, where x can only sensibly take on a discrete set of values. Example (Traveling salesperson): A salesperson must visit cities #1, 2, 3, 4, 5, 6 in some order, and would like to minimize the total distance traveled. This is a minimization problem: the input variable "x" represents any ordering of the numbers 1 through 6, and the output "Energy" is the total distance traveled for that ordering. It's a discrete problem, because the set of orderings of 1 through 6 is a discrete space (there are 6! such orderings), not a continuous space like "all positive real numbers" in the Lennard-Jones example.
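
As an aside, a problem this small can even be minimized by brute force, trying every ordering; a sketch in Mathematica, with six hypothetical random city locations standing in for the unspecified actual cities:

    SeedRandom[1]; cities = RandomReal[{0, 10}, {6, 2}];  (* hypothetical coordinates *)
    tourLength[p_] := Total[EuclideanDistance @@@ Partition[cities[[p]], 2, 1]];
    First[SortBy[Permutations[Range[6]], tourLength]]     (* checks all 6! = 720 orderings *)

Of course, this brute-force approach breaks down quickly, since for n cities there are n! orderings.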

  8. Continuous vs. Discrete Minimization For discrete minimization problems, calculus techniques such as we used in the Lennard-Jones example will not work, since the idea of a derivative only works nicely in a continuous space. Many of the science applications we will look at in this course will be continuous minimization problems, and many of the minimization algorithms we study will be calculus-based and thus only work for continuous problems. However, we will see some important discrete minimization problems, e.g., sequence alignment in the bioinformatics module, and some of our minimization algorithms can be used for both continuous and discrete problems.

  9. Bisection (1-variable) Bisection is an algorithm that looks for solutions to an equation g(x) = 0. Unlike algebraic methods (like Mathematica's Solve command in the Lennard-Jones example), bisection is a numerical method, i.e., it does not find an exact solution x, only an approximation to the exact x (with as many digits of accuracy as we request). On the positive side, bisection can be used even for complicated functions g, when algebraic methods are likely to fail. In the Lennard-Jones example above, the function we want to set to zero is the derivative of the energy: g(x) = E'(x) = 4ε [ −12 σ^12/x^13 + 6 σ^6/x^7 ].

  10. Bisection (1-variable) If we plot this, we see it crosses zero around the value 3.838820205 that we found before:

  11. Bisection (1-variable) Bisection is a simple idea for finding this value 3.838820205... numerically. Begin with two values a,b with g(a)<0 and g(b)>0, e.g., a=3.6, b=4.0. For a continuous function g, there must be some x between a and b with g(x)=0.

  12. Bisection (1-variable) To zero in on this solution, we consider c, the midpoint of a and b (here, c=3.8). Since g(c)<0, we know that a solution x lies between c and b, so we can replace a by c and continue. (If g(c) had been >0, we would have replaced b by c).
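
A minimal sketch of this bisection loop in Mathematica, with g the Lennard-Jones derivative from slide 9 and a tolerance chosen just for illustration:

    sigma = 3.42; eps = 0.01;
    g[x_] := 4 eps (-12 sigma^12/x^13 + 6 sigma^6/x^7);  (* g = E' *)
    bisect[a0_, b0_, tol_] := Module[{a = a0, b = b0, c},
      While[b - a > tol,
       c = (a + b)/2.;
       If[g[c] < 0, a = c, b = c]];  (* keep the half where g still changes sign *)
      (a + b)/2.];
    bisect[3.6, 4.0, 10.^-8]  (* approaches 3.8388202... *)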

  13. Bisection (1-variable) This is obviously a simple algorithm and is guaranteed to converge to a solution, as long as we begin with a,b so that g(a)<0 and g(b)>0. However, it is a relatively slow algorithm: each step only halves the interval [a,b], so it takes roughly three or four steps to gain each additional decimal digit of accuracy. It is also not easy to extend this idea to problems with many variables.

  14. Newton’s Method (1-variable) Like bisection, Newton's Method (sometimes called Newton-Raphson) is a numerical method for solving g(x) = 0, but it is a more sophisticated method that uses more information about the function g. As with bisection, keep in mind that when using Newton's Method to minimize a function f, you take the function g to be f', so that you are really solving the equation f'(x) = 0.

  15. Newton’s Method (1-variable) The idea of Newton's Method is:
  • Pick an initial guess for the x where g(x) = 0. Call this x0.
  • Find the tangent line to the graph of g at the point x0. Follow this tangent line to where it crosses zero. That is your next guess for the x where g(x) = 0. Call this x1.
  • Continue this process to compute x2, x3, ... They should approach the exact x where g(x) = 0.

  16. Newton’s Method (1-variable) Here's an example using the Lennard-Jones function (g is the derivative of the Lennard-Jones energy, as in the bisection example). We choose x0 = 4.0, and then by following tangent lines, we find that x1 is around 3.72, x2 is around 3.805, etc.

  17. Newton’s Method (1-variable) In-class Exercise (Newton's Method formula): Say we want to solve g(x)=0 by Newton's Method, and we have a "current guess" xn. (a) Find the equation of the tangent line to the curve y=g(x) at x=xn, written in the form y=mx+b. (b) Find the x value where this tangent line crosses the x-axis (i.e., where y=0). This x value is the "next guess" xn+1. Answer (derivation on next page):

  18. Newton’s Method (1-variable) (a) The slope of the tangent line to the curve y=g(x) at x=xn is g'(xn). The tangent line passes through the point ( xn , g(xn) ). So, using the point-slope form of the equation of a line, the equation of the tangent line is: y − g(xn) = g'(xn)(x − xn). Writing this in y = m x + b form, we find: y = g'(xn) x + [ g(xn) − g'(xn) xn ]. (b) Setting y to 0, we have the equation: 0 = g'(xn) x + g(xn) − g'(xn) xn.

  19. Newton’s Method (1-variable) Solving for x, we find: x = xn − g(xn)/g'(xn). So, the next guess is: xn+1 = xn − g(xn)/g'(xn).

  20. Newton’s Method (1-variable) Beginning with the initial guess x0, we apply this formula to find x1, then again to find x2, etc., until the x values appear to converge to a steady solution. Here's Mathematica code that does 5 iterations of this procedure:
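
A minimal sketch of such an iteration (the names g, gp, and newtonStep are ours; gp is g', worked out by hand):

    sigma = 3.42; eps = 0.01;
    g[x_]  := 4 eps (-12 sigma^12/x^13 + 6 sigma^6/x^7);   (* g = E' *)
    gp[x_] := 4 eps (156 sigma^12/x^14 - 42 sigma^6/x^8);  (* g' = E'' *)
    newtonStep[x_] := x - g[x]/gp[x];                      (* the formula from slide 19 *)
    NestList[newtonStep, 4.0, 5]                           (* x0 = 4.0, then x1, ..., x5 *)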

  21. Newton’s Method (1-variable) And here is its output: the guesses move from x0 = 4.0 through x1 ≈ 3.72 and x2 ≈ 3.805, and by x4 and x5 they agree to 3.8388…, closing in on 3.838820205.

  22. Newton’s Method: Another perspective Here's another geometric way to think about using Newton's method to minimize a function E. Each step of Newton's method tries to solve E' = 0 by approximating the graph of E' by a "best-fit" line (the tangent line). This means that each step of Newton's method is approximating the graph of E by a best-fit quadratic function or parabola (since the derivative of a quadratic function is a linear function). The bottom of this best-fit parabola is our "next guess".

  23. Newton’s Method: Convergence With any iterative algorithm such as Newton's Method, there is the question "When am I done?" A common heuristic is to stop the iteration when consecutive values resemble each other closely. For example, in the above Mathematica computation, x4 and x5 both start with 3.8388..., so we might stop the iteration and say that the solution is 3.8388, correct to 4 decimal places. This would be correct, since we found earlier that the exact answer is 3.838820205… This is a reasonable strategy, but be aware that it is not perfect. It's possible (but unlikely) that x6 could be very different from x4 and x5. Also, you might care more about how close g(x5) is to 0 than about how close x4 is to x5, since after all, the goal was to find where g(x)=0. It's possible that x4 and x5 need to agree to within 20 decimal places before g(x5) is within 2 decimal places of zero. It depends on what g is.

  24. Newton’s Method: Convergence Finally, it is helpful to know how quickly an algorithm is likely to converge to the solution. The field of "numerical analysis" (beyond the scope of our course) addresses such questions. For example, a theorem in numerical analysis tells us that Newton's Method has "quadratic convergence" once the guesses xn get close enough to the true solution x. What does that mean? Once we get close to the true solution, the error | xn − x | will be approximately squared at the next guess: | xn+1 − x | ≈ C | xn − x |^2 for some constant C.

  25. Newton’s Method: Convergence This is considered fast convergence; it means that the number of accurate digits in our guesses roughly doubles at each step! For example, in the Mathematica computation above, x2 had one decimal place correct (3.8), x3 had two places correct (3.83), x4 had four places correct (3.8388), and x5 had at least nine places correct (3.838820205). Keep in mind, however, that this fast convergence only kicks in once you are close to the solution, and many problems can arise before you get to that point, as we will soon see. For a precise statement of the quadratic convergence theorem, or for more on convergence of iterative algorithms in general, please see a numerical analysis reference, such as [1] or [2].

  26. Newton’s Method

  27. Difficulties with Newton’s Method Newton's Method works very well when your initial guess is close to the exact solution, but it can go astray if not, as in this example, which uses the same g as before but a different initial guess.

  28. Difficulties with Newton’s Method When there is more than one solution to g(x)=0, Newton's Method can converge to any of these solutions, in a somewhat unpredictable manner. Here's an example of this with a new function g(x) = -x^3 + 4x + 1: Where do we end up if we run Newton’s Method with x0 = 0.8?

  29. Difficulties with Newton’s Method Here’s what happens when you try it: Now what about if you start at x0 = 0.825?

  30. Difficulties with Newton’s Method We get a different answer when we try it: Now what about if you start at x0 = 0.85?

  31. Difficulties with Newton’s Method We get yet another answer when we try it:
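
The same comparison can be scripted; a sketch, assuming ten iterations are enough for each run to settle (they are, for these three starts):

    g[x_]  := -x^3 + 4 x + 1;
    gp[x_] := -3 x^2 + 4;
    run[x0_] := Last[NestList[# - g[#]/gp[#] &, x0, 10]];
    {run[0.8], run[0.825], run[0.85]}  (* three nearby starts land on three different roots *)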

  32. Difficulties with Newton’s Method It could even get trapped in an infinite loop, as shown in the following example for the function g(x) = x/(x^2+3): starting from x0 = 1, each step jumps from 1 to -1 and back again, forever.
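
A quick check of this cycle (g' here computed by the quotient rule):

    g[x_]  := x/(x^2 + 3);
    gp[x_] := (3 - x^2)/(x^2 + 3)^2;
    NestList[# - g[#]/gp[#] &, 1.0, 6]  (* -> {1., -1., 1., -1., 1., -1., 1.}: it never settles *)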

  33. More on 1-variable minimization Now we can solve f'(x)=0 by bisection or Newton's Method, which means we can find local minima of f(x). This is just one of many approaches to minimizing f(x). For example, Mathematica has a command FindMinimum that will attempt to minimize f(x) by a combination of Newton's Method and other ideas. Here we'll use FindMinimum to search for the minimum of the Lennard-Jones energy. Note that we have to give FindMinimum an initial guess, just like we did with our home-grown Newton's Method. It finds the minimum at x=3.8388..., and even reports the minimum value of the energy, here Energy = -0.01.
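
A sketch of the call, using the argon parameters from slide 3:

    sigma = 3.42; eps = 0.01;
    energy[x_] := 4 eps ((sigma/x)^12 - (sigma/x)^6);
    FindMinimum[energy[x], {x, 4.0}]  (* -> {-0.01, {x -> 3.83882...}} *)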

  34. More on 1-variable minimization Just like there is a major issue with Newton's Method (or bisection) finding multiple solutions to f '(x) = 0 depending on the initial guess, there is a major issue with any minimization algorithm finding multiple local minima of f(x). For example, consider the molecule ethane:

  35. More on 1-variable minimization The dihedral angle φ is the angle between the planes C-C-HA and C-C-HB. When φ=0, the hydrogens HA and HB are torsionally aligned, so that their respective C-H bonds repel each other, raising the overall molecular energy. At other values of φ, these bonds are torsionally staggered, reducing this repulsion and thus lowering the molecular energy. This contribution to the molecular energy is often modeled in molecular mechanics by a term E = a cos(3φ) (where a is some constant, which we take to be 1 here for simplicity).

  36. More on 1-variable minimization When we seek local minima of this function with FindMinimum, the solution depends on our initial guess:
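
A sketch of such an experiment, with a = 1 as on the previous slide and two initial guesses chosen for illustration:

    torsion[phi_] := Cos[3 phi];           (* the term from slide 35, with a = 1 *)
    FindMinimum[torsion[phi], {phi, 0.5}]  (* lands near phi = Pi/3 *)
    FindMinimum[torsion[phi], {phi, 2.5}]  (* lands near phi = Pi, a different local minimum *)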

  37. Multidimensional minimization What if we want to minimize a function of more than one variable, like f(x,y), f(x,y,z), etc.? Many minimization algorithms have multivariable versions. For example, imagine two argon atoms and one krypton atom on a line (and in that order). Let the left Ar atom be at (-x,0), the center Ar atom be at (0,0), and the right atom (Kr) be at (y,0). Using standard Lennard-Jones parameters, the total energy is the sum of the three pairwise Lennard-Jones terms: E(x,y) = E_ArAr(x) + E_ArKr(y) + E_ArKr(x+y), where each term has the form 4ε [ (σ/r)^12 − (σ/r)^6 ] with ε and σ appropriate to that pair of atoms.

  38. Multidimensional minimization The graph of a function of two variables is a surface, with the x and y axes denoting the input variables (here, appropriately called x and y) and the z axis representing the value of the function (here, the energy E).

  39. Multidimensional minimization Sometimes a more useful picture is the "contour plot" of the function, which is like a topographic map of a mountain, showing curves of constant height: Here we can see a shallow "well" around x=y=3.8. What algorithm could we use to locate this well precisely?

  40. Multidimensional minimization There is a version of Newton's Method that works for two or more variables. Geometrically, at each step it finds the "best-fit" quadratic function (or paraboloid) for the graph, and makes the bottom of that paraboloid its next guess. Implementing this algorithm requires some knowledge of linear algebra, so we will not spell out the details here. It enjoys the same rapid quadratic convergence close to the solution as in the 1-variable case, but also suffers from the same potential problems (possibility of non-convergence, or of convergence to different local minima, for poor initial guesses).
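
For the curious, Mathematica can handle the linear algebra for us, so one step of the multivariable method can be written quite compactly; a sketch, where f is an expression in the given variables:

    newtonStep[f_, vars_, pt_] := Module[{grad, hess},
      grad = Grad[f, vars] /. Thread[vars -> pt];    (* gradient vector at pt *)
      hess = D[f, {vars, 2}] /. Thread[vars -> pt];  (* Hessian matrix at pt *)
      pt - LinearSolve[hess, grad]];                 (* bottom of the best-fit paraboloid *)
    newtonStep[(x - 1)^2 + x y + 2 y^2, {x, y}, {3., 2.}]

For a quadratic function like the one in the last line, a single step lands exactly on the minimum, which is just the geometric picture described above.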

  41. Multidimensional minimization There are also many other algorithms in use. One popular one is called "the method of steepest descent". Geometrically, we imagine we are standing at some point on the surface and we find the direction that leads most steeply downhill and follow that direction until we stop descending. Then we reassess what direction leads most steeply downhill, and continue until we converge to the local minimum. Again, implementing this algorithm requires some multivariable calculus (e.g., the gradient vector, which points in the direction of steepest ascent), so we will not spell out the details. It is an appealingly simple idea, but can be surprisingly slow for some functions.
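
A sketch of the idea in its simplest form, which replaces the "follow until we stop descending" line search with a small fixed step eta (both eta and the step count here are illustrative choices):

    descend[f_, vars_, pt0_, eta_, n_] := NestList[
       (# - eta (Grad[f, vars] /. Thread[vars -> #])) &, pt0, n];
    Last[descend[(x - 1)^2 + x y + 2 y^2, {x, y}, {3., 2.}, 0.1, 100]]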

  42. Multidimensional minimization Many other minimization algorithms exist, combining these and other ideas. For example, Mathematica's FindMinimum command can be used for functions of two or more variables:
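
A sketch for the two-atom-coordinate energy from slide 37 (the Ar-Kr parameter values below are placeholders, not actual tabulated values):

    lj[r_, s_, e_] := 4 e ((s/r)^12 - (s/r)^6);
    sAr = 3.42; eAr = 0.01; sKr = 3.5; eKr = 0.01;  (* Ar-Kr values are hypothetical *)
    energy[x_, y_] := lj[x, sAr, eAr] + lj[y, sKr, eKr] + lj[x + y, sKr, eKr];
    FindMinimum[energy[x, y], {x, 4.0}, {y, 4.0}]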

  43. Multidimensional minimization For functions of three or more variables, the graph of the energy requires more than three dimensions, and the nice 2-dimensional contour plots no longer apply. However, the basic minimization algorithms can still be used (although it requires more intuition to come up with good initial guesses when the energy cannot be easily plotted). For example, we could release our two Ar and one Kr atoms from their "line prison" and allow them to roam anywhere on a plane. We could choose coordinates so that the argon atoms are still at (0,0) and (-x,0), but now the krypton atom could be at any (y,z). Then the Lennard-Jones energy is: E(x,y,z) = E_ArAr(x) + E_ArKr(sqrt(y^2+z^2)) + E_ArKr(sqrt((x+y)^2+z^2)), with pairwise terms as before.
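
And a sketch of the corresponding call for this in-plane version (again with placeholder Ar-Kr parameters, and a small nonzero starting z so the search is not stuck on the symmetry axis):

    lj[r_, s_, e_] := 4 e ((s/r)^12 - (s/r)^6);
    sAr = 3.42; eAr = 0.01; sKr = 3.5; eKr = 0.01;  (* Ar-Kr values are hypothetical *)
    energy[x_, y_, z_] := lj[x, sAr, eAr] + lj[Sqrt[y^2 + z^2], sKr, eKr] +
       lj[Sqrt[(x + y)^2 + z^2], sKr, eKr];
    FindMinimum[energy[x, y, z], {x, 4.0}, {y, 4.0}, {z, 0.5}]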

  44. Multidimensional minimization

  45. References
  [1] W. Cheney and D. Kincaid, Numerical Mathematics and Computing, 4th edition, Brooks/Cole Publishing Company (1999).
  [2] J. Stoer and R. Bulirsch, Introduction to Numerical Analysis, Springer-Verlag (1979).
