This lecture by Prof. Tom Overbye, featuring guest Dr. Hao Zhu, explores the Least Squares method, a fundamental technique for approximating solutions to overdetermined systems in electrical engineering. It covers the historical development of the method, its application in power systems (specifically state estimation), and the significance of choosing the norm criterion. The lecture emphasizes the minimization of errors in overdetermined equations and the conditions under which unique solutions exist. Essential for students and professionals working with large-scale electrical systems.
ECE 530 – Analysis Techniques for Large-Scale Electrical Systems Lecture 19: Least Squares Prof. Tom Overbye Dept. of Electrical and Computer Engineering University of Illinois at Urbana-Champaign overbye@illinois.edu Special Guest Lecture by Dr. Hao Zhu
Announcements • HW 6 is due Thursday November 7
Least Squares • So far we have considered the solution of Ax = b in which A is a square matrix; as long as A is nonsingular there is a single solution • That is, we have the same number of equations (m) as unknowns (n) • Many problems are overdetermined, in which there are more equations than unknowns (m > n) • Overdetermined systems are usually inconsistent, meaning no value of x exactly satisfies all the equations • Underdetermined systems have more unknowns than equations (m < n); they never have a unique solution but are usually consistent
Method of Least Squares • The least squares method is a solution approach for determining an approximate solution to an overdetermined system • If the system is inconsistent, then not all of the equations can be exactly satisfied • For each equation, the difference between its right-hand side and the value produced by the estimated solution is known as the error • Least squares seeks to minimize the sum of the squares of the errors • Weighted least squares allows different weights for the individual equations
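The idea fits in a few lines of code. The sketch below uses made-up data (the line-fitting setup, A, and b are illustrative assumptions, not from the lecture) and NumPy's built-in least-squares solver:

```python
import numpy as np

# Illustrative data (not from the lecture): fit a line y = c0 + c1*t
# to three points, giving m = 3 equations in n = 2 unknowns.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

# Least squares picks x to minimize the sum of squared errors ||Ax - b||_2^2
x, res_ss, rank, svals = np.linalg.lstsq(A, b, rcond=None)
errors = A @ x - b              # per-equation errors
print(x, errors, np.sum(errors**2))
```

Weighted least squares would simply scale each equation (each row of A and entry of b) by its weight before solving.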
Least Squares Solution History • The method of least squares developed from trying to estimate actual values from a number of measurements • Several persons in the 1700s, starting with Roger Cotes in 1722, presented methods for reducing model error by combining multiple measurements • Legendre presented a formal description of the method in 1805; evidently Gauss claimed he did it in 1795 • The method is widely used in power systems, with state estimation the best known application, dating from Fred Schweppe's work in 1970 • State estimation is covered in ECE 573
Least Squares and Sparsity • In many contexts least squares is applied to problems that are not sparse, for example, using a number of measurements to optimally determine a few values • Regression analysis is a common example, in which a line or other curve is fit to potentially many points • Each measurement impacts each model value • In the classic power system application of state estimation the system is sparse, with measurements only directly influencing a few states • Power system analysis classes have tended to focus on solution methods aimed at sparse systems; we'll consider both sparse and nonsparse solution methods
Least Squares Problem • Consider the overdetermined linear system Ax = b, with A an m × n matrix, x an n-vector of unknowns, and b an m-vector, where m > n; written out row by row, (a^i)^T x = b_i for i = 1, …, m
Least Squares Solution • We write (a^i)^T for row i of A; that is, a^i is a column vector • Here m ≥ n, and the solution we are seeking is the x that minimizes ||Ax − b||_p, where p denotes some norm • Since an overdetermined system usually has no exact solution, the best we can do is determine an x that minimizes the desired norm
Example 1: Choice of p • We discuss the choice of p in terms of a specific example • Consider an equation Ax = b with A a 3 × 1 matrix (hence three equations and one unknown) • We consider three possible choices for p:
Example 1: Choice of p • (i) p = 1: minimize the sum of the absolute errors • (ii) p = 2: minimize the sum of the squared errors • (iii) p = ∞: minimize the largest absolute error
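The example's specific A and b are not reproduced above, so as an illustration assume A = [1 1 1]^T, i.e., three measurements of a single unknown x, with assumed values of b. A brute-force grid search then shows how the minimizer changes with p:

```python
import numpy as np

# Assumed illustrative setup: A = [1, 1, 1]^T, so Ax - b has entries x - b_i.
b = np.array([1.0, 2.0, 6.0])
xs = np.linspace(0.0, 7.0, 70001)          # candidate values of the unknown x
r = np.abs(xs[None, :] - b[:, None])       # |x - b_i| for every candidate

print("p=1  :", xs[np.argmin(np.sum(r, axis=0))])      # median(b)   = 2.0
print("p=2  :", xs[np.argmin(np.sum(r**2, axis=0))])   # mean(b)     = 3.0
print("p=inf:", xs[np.argmin(np.max(r, axis=0))])      # midrange    = 3.5
```

For this single-unknown setup, p = 1 picks the median of the measurements, p = 2 the mean, and p = ∞ the midrange, which illustrates why the three norms generally give different answers.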
The Least Squares Problem • In general, ||Ax − b||_p is not differentiable in x for p = 1 or p = ∞ • The choice of p = 2 has become well established, given its least-squares fit interpretation • We next motivate the choice of p = 2 by first considering the least-squares problem: minimize ||Ax − b||_2^2 over x
The Least Squares Problem • The problem is tractable for 2 major reasons: (i) the function f(x) = ||Ax − b||_2^2 is differentiable in x; and
The Least Squares Problem (ii) the 2-norm is preserved under orthogonal transformations: ||Qy||_2 = ||y||_2 for any vector y, with Q an arbitrary orthogonal matrix; that is, Q satisfies Q^T Q = I
The Least Squares Problem • We introduce next the basic underlying assumption: A is full rank, i.e., the columns of A constitute a set of linearly independent vectors • This assumption implies that the rank of A is n, because n ≤ m since we are dealing with an overdetermined system • Fact: the least-squares solution x* satisfies the normal equations A^T A x* = A^T b
Proof of Fact • Since by definition the least-squares solution x* minimizes f(x) = ||Ax − b||_2^2 = (Ax − b)^T (Ax − b), at the optimum the derivative of this function vanishes: ∇f(x*) = 2 A^T (A x* − b) = 0, i.e., A^T A x* = A^T b
Implications • The underlying assumption is that A is full rank • Therefore, the fact that A^T A is positive definite (p.d.) follows from considering any x ≠ 0 and evaluating x^T (A^T A) x = ||Ax||_2^2 > 0 (Ax ≠ 0 because the columns of A are independent), which is the definition of a p.d. matrix • We use the shorthand A^T A > 0 for A^T A being a symmetric, positive definite matrix
Implications • The underlying assumption that A is full rank, and therefore that A^T A is p.d., implies that there exists a unique least-squares solution x* = (A^T A)^{-1} A^T b • Note: we use the inverse in a conceptual, rather than a computational, sense • The formulation A^T A x = A^T b is known as the normal equations, with the solution conceptually straightforward
Implications • An important implication of positive definiteness is that we can factor A^T A as A^T A = G^T G, with G upper triangular, since A^T A > 0 • The expression A^T A = G^T G is called the Cholesky factorization of the symmetric positive definite matrix A^T A
Least Squares Solution Algorithm
Step 1: Compute the lower triangular part of A^T A
Step 2: Obtain the Cholesky factorization A^T A = G^T G
Step 3: Compute c = A^T b
Step 4: Solve for y using forward substitution in G^T y = c, then for x using backward substitution in G x = y
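A minimal sketch of these four steps using SciPy (the data are illustrative assumptions; a production solver would also exploit sparsity, as discussed next):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

# Illustrative overdetermined system: m = 4 equations, n = 2 unknowns.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

AtA = A.T @ A                    # Step 1: form A^T A (symmetric p.d. if A full rank)
G = cholesky(AtA, lower=False)   # Step 2: A^T A = G^T G, with G upper triangular
c = A.T @ b                      # Step 3: form A^T b
y = solve_triangular(G.T, c, lower=True)   # Step 4a: forward substitution, G^T y = c
x = solve_triangular(G, y, lower=False)    # Step 4b: backward substitution, G x = y
print(x)
print(np.linalg.lstsq(A, b, rcond=None)[0])  # cross-check against the library solver
```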
Practical Considerations • The two key problems that arise in practice with the triangularization procedure are: (i) while A may be sparse, A^T A is much less sparse and consequently requires more computing resources for the solution; (ii) A^T A may be numerically less well-conditioned than A • We must deal with both of these problems
Example 2: Loss of Sparsity • Assume the B matrix for a network couples each bus only to its immediate neighbors • Then, in B^T B, second neighbors are now connected! But large networks are still sparse, just not as sparse
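The fill-in is easy to demonstrate. The sketch below assumes, purely for illustration, a tridiagonal B (each node coupled only to its immediate neighbors) and counts nonzeros before and after forming B^T B:

```python
import scipy.sparse as sp

# Assumed tridiagonal B: each of the n nodes couples only to its neighbors.
n = 10
B = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
BtB = (B.T @ B).tocsr()

# B has at most 3 nonzeros per row; B^T B is pentadiagonal, with up to 5,
# because second neighbors become directly connected.
print(B.nnz, BtB.nnz)
```

The amount of fill grows with the density of second-neighbor connections, but for large, lightly meshed networks B^T B remains sparse, just less so.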
Numerical Conditioning • To understand the point about numerical ill-conditioning, we need to introduce some terminology • We define the (2-)norm of a matrix B to be ||B||_2 = max_{x ≠ 0} ||Bx||_2 / ||x||_2
Numerical Conditioning • Equivalently, ||B||_2 = sqrt(λ_max), where each eigenvalue λ_i of B^T B is a root of the polynomial det(B^T B − λI) = 0 • In words, the 2-norm of B is the square root of the largest eigenvalue of B^T B
Numerical Conditioning • The conditioning number of a matrix B is defined as κ(B) = ||B||_2 ||B^{-1}||_2 • A well-conditioned matrix has a small value of κ(B), close to 1; the larger the value of κ(B), the more pronounced the ill-conditioning
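A quick numerical check (the 2 × 2 matrix below is an assumed, nearly singular example, not from the lecture):

```python
import numpy as np

# kappa(B) = ||B||_2 * ||B^{-1}||_2 = largest/smallest singular value.
B = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
svals = np.linalg.svd(B, compute_uv=False)
print(svals[0] / svals[-1])   # kappa from the singular values
print(np.linalg.cond(B, 2))   # the same value via numpy

# For the 2-norm, kappa(B^T B) = kappa(B)^2, which previews why forming
# the normal equations can badly degrade conditioning.
print(np.linalg.cond(B.T @ B, 2))
```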
Numerical Conditioning • The ill-conditioned nature of A^T A may severely impact the accuracy of the computed solution • We illustrate the fact that an ill-conditioned matrix A^T A results in highly sensitive solutions of least-squares problems with the following example:
Example 3: Ill-Conditioned A^T A • Consider the example's matrix A and form the product A^T A
Example 3: Ill-Conditioned A^T A • We consider a “noise” in A, represented by a small perturbation matrix dA
Example 3: Ill-Conditioned A^T A • The noise leads to an error E in the computation of A^T A, i.e., (A + dA)^T (A + dA) = A^T A + E • Assume that there is no noise in b, that is, db = 0
Example 3: Ill-Conditioned A^T A • The resulting error in solving the normal equations is independent of db, since it is caused purely by dA • Let x be the true solution of the normal equations, so the solution of A^T A x = A^T b is x = [1 0]^T
Example 3: Ill-Conditioned A^T A • Let x' be the solution of the system with the error arising due to dA, i.e., the solution of (A^T A + E) x' = A^T b • Therefore x' = (A^T A + E)^{-1} A^T b
Example 3: Ill-Conditioned A^T A • This implies an error dx = x' − x in the solution • Therefore the relative error is ||dx||_2 / ||x||_2 • Now, the conditioning number of A^T A is very large for this example
Example 3: Ill-Conditioned A^T A • Since κ(A^T A) = κ(A)^2, the product of the conditioning number and the relative perturbation is large • Thus the conditioning number is a major contributor to the error in the computation of x • In other words, the sensitivity of the solution to any error, be it from data entry or of a numerical nature, is very dependent on the conditioning number
What can be done? • Introduce a regularization term into the LS cost: minimize ||Ax − b||_2^2 + λ ||x||_2^2 • Ridge regression (l2-norm regularization) • At the optimum, the derivative vanishes, giving (A^T A + λI) x* = A^T b • The solution involves a different inverse matrix, (A^T A + λI)^{-1}, improving the conditioning
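A sketch of the ridge solve (the matrix A below is an assumed ill-conditioned example in the spirit of Example 3, not the slide's actual data):

```python
import numpy as np

# Assumed ill-conditioned A: its two columns are nearly linearly dependent.
A = np.array([[1.0, 1.0],
              [1e-4, 0.0],
              [0.0, 1e-4]])
b = A @ np.array([1.0, 0.0])     # exact data, so the true solution is [1, 0]
lam = 1e-6                       # illustrative regularization level
n = A.shape[1]

x_ls = np.linalg.solve(A.T @ A, A.T @ b)                       # plain LS
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)  # ridge

# Ridge trades a small bias in x for a much better conditioning number.
print(np.linalg.cond(A.T @ A), np.linalg.cond(A.T @ A + lam * np.eye(n)))
print(x_ls, x_ridge)
```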
Example 4: Ridge regression • Recalling the matrix A and the product A^T A of Example 3 • Compare the ridge regression solution with the chosen λ versus the true solution x = [1 0]^T
Example 4: Ridge regression • Now include the noise matrix dA and the resulting error E • Compare the ridge regression solution with the chosen λ versus the plain LS solution
Regularization • Can be used for solving underdetermined systems too • The level of regularization matters, as illustrated in the sketch below! • Large λ: better conditioning number, but a less accurate solution • Small λ: close to LS, but little improvement in conditioning • Recent trend: sparsity regularization using the l1 norm
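To make the tradeoff concrete, here is a small λ sweep on the same assumed ill-conditioned A used above (illustrative values only):

```python
import numpy as np

# Assumed ill-conditioned A with exact data so the true solution is [1, 0].
A = np.array([[1.0, 1.0],
              [1e-4, 0.0],
              [0.0, 1e-4]])
b = A @ np.array([1.0, 0.0])

# Larger lam improves the conditioning number but biases x away from [1, 0];
# smaller lam stays close to LS but leaves the system ill-conditioned.
for lam in [0.0, 1e-8, 1e-6, 1e-2, 1.0]:
    M = A.T @ A + lam * np.eye(2)
    x = np.linalg.solve(M, A.T @ b)
    print(f"lam={lam:g}  cond={np.linalg.cond(M):.2e}  x={x}")
```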
The Least Squares Problem • With this background we proceed to the typical schemes in use for solving least-squares problems, all along paying adequate attention to the numerical aspects of the solution approach • If the matrix is full (dense), then often the best solution approach is to use a singular value decomposition (SVD) to form a matrix known as the pseudo-inverse of the matrix • We'll cover this later, after first considering the sparse problem • We first review some fundamental building blocks and then present the key results useful for the sparse matrices common in state estimation
Householder Matrices and Vectors • Consider the n × n matrix P = I − 2 v v^T / (v^T v), where the nonzero vector v is called a Householder vector • Note that the definition of P in terms of the vector v implies the following properties for P: Symmetry: P = P^T; Orthonormality: P^T P = I
Householder Matrices and Vectors • Let x ∈ R^n be an arbitrary vector; then P x = x − 2 (v^T x / v^T v) v • Now, suppose we want P x to be a multiple of e1, the first unit vector; P x is a linear combination of the x and v vectors • Then v must be a linear combination of x and e1, and we write v = x + α e1, so that
Householder Matrices and Vectors • v^T x = x^T x + α x_1 and v^T v = x^T x + 2 α x_1 + α^2 • Therefore, P x = (1 − 2 (x^T x + α x_1) / (x^T x + 2 α x_1 + α^2)) x − 2 α (v^T x / v^T v) e1
Householder Matrices and Vectors • For the coefficient of x to vanish, we require that 2 (x^T x + α x_1) = x^T x + 2 α x_1 + α^2, or α^2 = x^T x, so that α = ±||x||_2 • Consequently v = x ± ||x||_2 e1, so that P x = ∓||x||_2 e1 • Thus the determination of v is straightforward
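The construction is easy to code. The sketch below follows the derivation above, choosing the sign of α to match x_1 so the addition avoids cancellation (a standard refinement, assumed here rather than taken from the slides):

```python
import numpy as np

# Householder construction: v = x + alpha*e1 with alpha = ±||x||_2
# makes P x a multiple of e1.
def householder(x):
    alpha = np.copysign(np.linalg.norm(x), x[0])  # sign choice avoids cancellation
    v = x.copy()
    v[0] += alpha                                  # v = x + alpha*e1
    P = np.eye(len(x)) - 2.0 * np.outer(v, v) / (v @ v)
    return P, v

x = np.array([3.0, 1.0, 5.0, 1.0])                # illustrative vector, ||x||_2 = 6
P, v = householder(x)
print(P @ x)     # ~[-6, 0, 0, 0]: a multiple of e1, as the derivation predicts
print(P @ P.T)   # ~identity: P is symmetric and orthogonal
```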
Example 5: Construction of P • Assume we are given a specific vector x • Then α = ±||x||_2 and v = x + α e1 follow directly
Example 5: Construction of P • Then P = I − 2 v v^T / (v^T v) • It follows then that P x = ∓||x||_2 e1, a multiple of e1 as desired