Zeroing in on the Implicit Function Theorem

The Implicit Function Theorem for several equations in several unknowns.

So where do we stand?
• Solving a system of m equations in n unknowns is equivalent to finding the “zeros” of a vector-valued function from $\mathbb{R}^n \to \mathbb{R}^m$.
• When n > m, such a system will “typically” have infinitely many solutions. In “nice” cases, the solution will be a function from $\mathbb{R}^{n-m} \to \mathbb{R}^m$.

• Solving linear systems is easy; we are interested in the non-linear case.
• We will ordinarily not be able to solve a system “globally.” But under reasonable conditions, there will be a solution function in a (possibly small!) region around a single “known” solution.

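For a Single Equation in Two Unknowns
[Figure: a contour line in the xy-plane passing through (a,b), with a small box around (a,b) in which the contour is the graph y = g(x).]
In a small box around (a,b), we hope to find g(x). And we can, provided that the partial derivative with respect to y is continuous near (a,b) and non-zero at (a,b).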

Start with a point (a,b) on the contour line, where the partial with respect to y is not 0:
$$D = \frac{\partial f}{\partial y}(a,b) \neq 0.$$
Make the box around (a,b) small enough so that all of the y-partials in this box are “close” to D.

Fix x. For this x, construct the function
$$\varphi_x(y) = y - \frac{f(x,y)}{D},$$
where $D = \frac{\partial f}{\partial y}(a,b) \neq 0$ is the y-partial at the starting point (a,b) on the contour line.
Iterate $\varphi_x(y)$. What happens?

A Whole Family of Quasi-Newton Functions
Remember that (a,b) is a “known” solution to f(x,y) = 0, and $D = \frac{\partial f}{\partial y}(a,b)$.
There are a whole bunch of these functions, one for each x value:
$$\varphi_x(y) = y - \frac{f(x,y)}{D}.$$

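Remember that (a,b) is a “known” solution to f(x,y) = 0, and $D = \frac{\partial f}{\partial y}(a,b)$.
If x = a, then we have
$$\varphi_a(y) = y - \frac{f(a,y)}{D}, \qquad \varphi_a(b) = b, \qquad \varphi_a'(b) = 1 - \frac{1}{D}\,\frac{\partial f}{\partial y}(a,b) = 0.$$
The “best of all possible worlds” Leibniz method!
If x ≠ a, then we have
$$\varphi_x(y) = y - \frac{f(x,y)}{D}, \qquad \varphi_x'(y) = 1 - \frac{1}{D}\,\frac{\partial f}{\partial y}(x,y).$$
The “pretty good” Quasi-Newton method!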
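To see what iterating one of these functions does in practice, here is a minimal sketch. It assumes the quasi-Newton form phi_x(y) = y - f(x,y)/D with D frozen at the known solution, and it uses the unit circle f(x,y) = x² + y² - 1 with known solution (a,b) = (0,1) purely as an illustrative example; neither the example equation nor the function names come from the slides.

```python
import math

def f(x, y):
    # Illustrative equation (an assumption, not from the slides): the unit circle.
    return x * x + y * y - 1.0

a, b = 0.0, 1.0   # known solution: f(a, b) = 0
D = 2.0 * b       # y-partial of f at (a, b), since df/dy = 2y

def phi(x, y):
    # Quasi-Newton map for a fixed x; D stays frozen at the known solution.
    return y - f(x, y) / D

def iterate(x, y0=b, steps=50):
    # Start at the known y-value b and apply phi_x repeatedly.
    y = y0
    for _ in range(steps):
        y = phi(x, y)
    return y

for x in (0.0, 0.1, 0.3):
    # Compare the limit of the iteration with the explicit solution sqrt(1 - x^2).
    print(x, iterate(x), math.sqrt(1.0 - x * x))
```

For x = a the iteration never moves off b; for nearby x it settles on the point of the circle directly above x, which is the value g(x) we are hoping for.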

“Pretty good” quasi-Newton’s method
If we choose D near enough to f’(p) so that |Q’(p)| < ½, iterated maps will converge in a neighborhood of p. (Here Q(y) = y - f(y)/D is the quasi-Newton iteration function for a root p of f.)
What are the issues?
• We have to make sure the iterated maps converge. How do we do this?
How does that work in our case? If we make sure that the partials of f with respect to y are all near enough to D to guarantee that $|\varphi_x'(y)| < \tfrac12$ for all (x,y) in the square, then the iterated maps will converge.

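The Role of the Derivative
If we have a function $f: [a,b] \to \mathbb{R}$ which is differentiable on (a,b) and continuous on [a,b], the Mean Value Theorem says that there is some c in (a,b) such that
$$f(b) - f(a) = f'(c)\,(b - a).$$
The same holds between any two points u, v in [a,b], so:
• If $|f'(x)| < k < 1$ for all x in [a,b], then $|f(u) - f(v)| \le k\,|u - v|$. Distances contract by a factor of k!
• Likewise, if $|f'(x)| > k > 1$ for all x in [a,b], then $|f(u) - f(v)| \ge k\,|u - v|$. Distances expand by a factor of k!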

If p is a fixed point of f in [a,b], and $|f'(x)| < k < 1$ for all x in [a,b], then $|f(x) - f(p)| \le k\,|x - p|$. But f(p) = p, so
$$|f(x) - p| \le k\,|x - p|.$$
So f moves other points closer to p by a factor of k! Each time we re-apply f, the next iterate is even closer to p. (Attracting fixed point!)
Likewise, if $|f'(x)| > k > 1$ for all x in [a,b], f moves other points farther and farther away from p. (Repelling fixed point!)
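The contraction estimate can be checked numerically. The sketch below uses f(x) = cos(x) on [0,1] as an illustrative example (an assumption, not taken from the slides): there |f'(x)| = |sin x| is at most sin(1), about 0.8415, so k = 0.85 works, and f maps [0,1] into itself.

```python
import math

def f(x):
    # Illustrative map (an assumption): cos sends [0, 1] into itself.
    return math.cos(x)

k = 0.85  # upper bound for |f'(x)| = |sin x| on [0, 1], since sin(1) is about 0.8415

# Approximate the fixed point p = f(p) by iterating many times.
p = 0.5
for _ in range(500):
    p = f(p)

def contraction_ratios(x0, steps=20):
    # For each iterate x, record |f(x) - p| / |x - p|; each ratio should stay below k.
    ratios = []
    x = x0
    for _ in range(steps):
        fx = f(x)
        ratios.append(abs(fx - p) / abs(x - p))
        x = fx
    return ratios

ratios = contraction_ratios(0.0)
print("fixed point:", p)
print("largest observed ratio:", max(ratios))
```

Every observed ratio stays below k, which is what the Mean Value Theorem argument predicts for an attracting fixed point.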

What are the issues?
• Not so obvious . . . we have to work a bit to make sure we get the right fixed point. (We don’t leave the box!)

Systems of Equations and Differentiability
A vector-valued function f of several variables is differentiable at a vector p if, in some small neighborhood of p, the graph of f “looks a lot like” an affine function. That is, there is a linear transformation Df(p) so that for all z “close” to p,
$$f(z) \approx f(p) + Df(p)\,(z - p),$$
where Df(p) is the Jacobian matrix made up of all the partial derivatives of f.
Suppose that f(p) = 0. When can we solve f(z) = 0 in some neighborhood of p?

For z “close” to p,
$$f(z) \approx f(p) + Df(p)\,(z - p).$$
Since f(p) = 0,
$$f(z) \approx Df(p)\,(z - p).$$
When can we solve f(z) = 0 in some neighborhood of p? Answer: Whenever we can solve
$$Df(p)\,(z - p) = 0,$$
because the existence of a solution depends on the geometry of the function.

When can we do it?
$Df(p)\,(z - p) = 0$ is a linear system. We understand linear systems extremely well. We can solve for the variables y1, y2, and y3 in terms of x1 and x2 if and only if the 3 × 3 sub-matrix of Df(p) made up of the partials with respect to y1, y2, and y3 is invertible.

A bit of notation
• To simplify things, we will write our vector-valued function $F : \mathbb{R}^{n+m} \to \mathbb{R}^n$ (n equations, one for each unknown we solve for).
• We will write our “input” variables as concatenations of n-vectors and m-vectors, e.g. (y,x) = (y1, y2, . . ., yn, x1, x2, . . ., xm).
• So when we solve F(y,x) = 0 we will be solving for the y-variables in terms of the x-variables.

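The System
We can solve for the variables y1, y2, . . ., yn in terms of x1, x2, . . ., xm if and only if the n × n sub-matrix of partials with respect to the y-variables,
$$\left[\frac{\partial F_i}{\partial y_j}\right]_{i,j=1}^{n},$$
is invertible.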
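For a linear system the condition can be seen directly. Writing the system in block form as A_y y + A_x x = 0 (the names A_y and A_x are hypothetical, introduced only for this illustration), we get y = -A_y^{-1} A_x x exactly when the y-block A_y is invertible. A minimal sketch for two equations, using Cramer's rule for the 2 × 2 block:

```python
def solve_y_in_terms_of_x(A_y, A_x, x):
    """Solve A_y y + A_x x = 0 for y, where A_y is 2x2 and A_x is 2xm."""
    (p, q), (r, s) = A_y
    det = p * s - q * r
    if det == 0:
        # The y-block is singular: y is not determined by x.
        raise ValueError("y-block is not invertible")
    # Right-hand side of A_y y = -A_x x.
    rhs = [-sum(c * xi for c, xi in zip(row, x)) for row in A_x]
    # Cramer's rule for the 2x2 system.
    y1 = (rhs[0] * s - q * rhs[1]) / det
    y2 = (p * rhs[1] - rhs[0] * r) / det
    return [y1, y2]

# Two equations in the unknowns (y1, y2, x1):
#   2*y1 + 1*y2 + 1*x1 = 0
#   1*y1 + 1*y2 + 3*x1 = 0
A_y = [[2, 1], [1, 1]]
A_x = [[1], [3]]
print(solve_y_in_terms_of_x(A_y, A_x, [1]))
```

If the y-block is replaced by a singular matrix such as [[1, 2], [2, 4]], the function raises an error, mirroring the "if and only if" in the statement above.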

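So the Condition We Need is . . .
. . . the invertibility of the matrix
$$D = \left[\frac{\partial F_i}{\partial y_j}\right]_{i,j=1}^{n}$$
of y-partials of F, evaluated at the known solution. We will refer to the inverse of the matrix D as $D^{-1}$.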

The Implicit Function Theorem
• $F : \mathbb{R}^{n+m} \to \mathbb{R}^n$ has continuous partials.
• Suppose $b \in \mathbb{R}^n$ and $a \in \mathbb{R}^m$ with F(b,a) = 0.
• The n × n matrix that corresponds to the y-partials of F at (b,a) (denoted by D) is invertible.
Then “near” a there exists a unique function g(x) with g(a) = b such that F(g(x),x) = 0; moreover g(x) is continuous.

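What Function Do We Iterate?
The 2-variable case. Fix x. For this x, construct the function
$$\varphi_x(y) = y - \frac{f(x,y)}{D}, \qquad \text{where } D = \frac{\partial f}{\partial y}(a,b).$$
Iterate $\varphi_x(y)$.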

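What Function Do We Iterate? Multi-variable Parallel?
The 2-variable case. Fix x. For this x, construct the function
$$\varphi_x(y) = y - \frac{f(x,y)}{D}, \qquad \text{where } D = \frac{\partial f}{\partial y}(a,b).$$
Iterate $\varphi_x(y)$.
The n-variable case. Fix x. For this x, construct the function
$$\varphi_x(y) = y - D^{-1}F(y,x),$$
where D is the n × n matrix of y-partials of F at (b,a). Iterate $\varphi_x(y)$.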
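Here is a small sketch of the multi-variable iteration, assuming it takes the form phi_x(y) = y - D^{-1} F(y,x), where D is the matrix of y-partials at the known solution. The system F below (two equations, y in R², x in R) and its known solution (b,a) = ((1,1), 0) are made up for illustration and do not come from the slides.

```python
def F(y, x):
    # Illustrative system (an assumption): two equations in (y1, y2, x).
    y1, y2 = y
    return (y1 * y1 + y2 - 2.0 - x,
            y1 + y2 * y2 - 2.0 + x)

b = (1.0, 1.0)   # known solution in y
a = 0.0          # known solution in x: F(b, a) = (0, 0)

# y-partials of F at (b, a): D = [[2, 1], [1, 2]], so D^{-1} = (1/3) [[2, -1], [-1, 2]].
D_INV = ((2.0 / 3.0, -1.0 / 3.0),
         (-1.0 / 3.0, 2.0 / 3.0))

def phi(x, y):
    # Quasi-Newton map: y - D^{-1} F(y, x), with D frozen at the known solution.
    F1, F2 = F(y, x)
    return (y[0] - (D_INV[0][0] * F1 + D_INV[0][1] * F2),
            y[1] - (D_INV[1][0] * F1 + D_INV[1][1] * F2))

def g(x, steps=100):
    # Start from the known y-value b and iterate phi_x.
    y = b
    for _ in range(steps):
        y = phi(x, y)
    return y

y = g(0.1)
print("g(0.1) =", y)
print("residual F(g(0.1), 0.1) =", F(y, 0.1))
```

Only the fixed matrix D^{-1} is used at every step, so no new derivative has to be computed or inverted as x changes; that is the sense in which the method is "quasi"-Newton.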

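[Figure: a ball of radius r around (b,a) in $\mathbb{R}^{n+m}$, in which the partials of F are all close to the partials at (b,a); a ball of radius t around a in $\mathbb{R}^m$; and the point g(x) in $\mathbb{R}^n$ with F(g(x),x) = 0. We want g continuous and unique.]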


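Notation: Let dF(y,x) denote the n × n submatrix made up of all the y partial derivatives of F at (y,x), so that D = dF(b,a), and let $\varphi_x(y) = y - D^{-1}F(y,x)$.
Step I: Since D is invertible, $\|D^{-1}\| \neq 0$. We choose r as follows: use continuity of the partials of F to choose r small enough so that for all $(y,x) \in B_r(b,a)$,
$$\|dF(y,x) - D\| < \frac{1}{2\,\|D^{-1}\|}.$$
Then
$$\|D\varphi_x(y)\| = \|I - D^{-1}\,dF(y,x)\| = \|D^{-1}\,(D - dF(y,x))\| \le \|D^{-1}\|\,\|D - dF(y,x)\| < \tfrac12.$$
So, by the Multivariable Mean Value Theorem,
$$\|\varphi_x(y_1) - \varphi_x(y_2)\| \le \tfrac12\,\|y_1 - y_2\|$$
whenever $(y_1,x)$ and $(y_2,x)$ lie in $B_r(b,a)$!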

There are two (highly non-trivial) steps in the proof:
• In Step 1 we choose the radius r of the ball in $\mathbb{R}^{n+m}$ so that the partials in the ball are all “close” to the (known) partials at (b,a). This makes $\varphi_x$ contract distances on the ball, forcing convergence to a fixed point. (Uses continuity of the partial derivatives.)
• In Step 2 we choose the radius t around a “small” so as to guarantee that our iterates stay in the ball. In other words, our “initial guess” for the quasi-Newton’s method is good enough to guarantee convergence to the “correct” root.
• The two together guarantee that we can find a value g(x) which solves the equation. We then map x to g(x).
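The role of the second step can be illustrated numerically. The sketch below again uses the unit circle f(x,y) = x² + y² - 1 with known solution (a,b) = (0,1) as a made-up example, and the specific radii r = 0.4 and t = 0.5 are choices for this example only. On the box |y - b| ≤ r the iteration function has derivative 1 - y, which is at most 0.4 in absolute value, so the first step holds; the code then checks whether the iterates stay in the box.

```python
def f(x, y):
    # Illustrative equation (an assumption): the unit circle.
    return x * x + y * y - 1.0

a, b = 0.0, 1.0   # known solution
D = 2.0 * b       # y-partial of f at (a, b)
r = 0.4           # box half-height: |phi_x'(y)| = |1 - y| <= 0.4 < 1/2 when |y - b| <= r
t = 0.5           # box half-width chosen for this example

def iterates_stay_in_box(x, steps=60):
    # Iterate phi_x(y) = y - f(x, y) / D from y = b and report whether we ever leave the box.
    y = b
    for _ in range(steps):
        y = y - f(x, y) / D
        if abs(y - b) > r:
            return False, y
    return True, y

for x in (0.0, 0.25, 0.5, 0.9):
    stayed, y = iterates_stay_in_box(x)
    print(x, "within t" if abs(x - a) <= t else "outside t", stayed, y)
```

For |x - a| ≤ t the iterates stay in the box and converge to a solution of f(x,y) = 0; for x = 0.9, which lies outside the chosen t, the very first iterate already leaves the box, so the contraction estimate from the first step no longer applies.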

Since this is the central idea of the proof, I will reiterate it:
• In Step 1 we make sure that iterated maps on our “pretty good” quasi-Newton function actually converge.
• In Step 2, by making sure that they don’t “leave the box,” we make sure the iterated maps converge to the fixed point we were aiming for. That is, they don’t march off to some other fixed point outside the region.

Final thoughts
The Implicit Function Theorem is a DEEP theorem of Mathematics.
Fixed point methods show up in many other contexts. For instance:
• They underlie many numerical approximation techniques.
• They are used in other theoretical contexts, such as in Picard’s Iteration proof of the fundamental theorem of differential equations.
The iteration functions frequently take the form of a “quasi-Newton’s method.”
