This lecture by Prof. Tom Overbye, featuring guest Dr. Hao Zhu, explores the Least Squares method, a fundamental technique for approximating solutions to overdetermined systems in electrical engineering. It covers the historical development of the method, its application in power systems (specifically state estimation), and the significance of choosing the norm criterion. The lecture emphasizes the minimization of errors in overdetermined equations and the conditions under which unique solutions exist. Essential for students and professionals working with large-scale electrical systems.
ECE 530 – Analysis Techniques for Large-Scale Electrical Systems Lecture 19: Least Squares Prof. Tom Overbye Dept. of Electrical and Computer Engineering University of Illinois at Urbana-Champaign overbye@illinois.edu Special Guest Lecture by Dr. Hao Zhu
Announcements • HW 6 is due Thursday November 7
Least Squares • So far we have considered the solution of Ax = b in which A is a square matrix; as long as A is nonsingular there is a single solution • That is, we have the same number of equations (m) as unknowns (n) • Many problems are overdetermined, in which there are more equations than unknowns (m > n) • Overdetermined systems are usually inconsistent, meaning no value of x exactly satisfies all the equations • Underdetermined systems have more unknowns than equations (m < n); they never have a unique solution but are usually consistent
Method of Least Squares • The least squares method is a solution approach for determining an approximate solution to an overdetermined system • If the system is inconsistent, then not all of the equations can be exactly satisfied • For each equation, the difference between its right-hand side and the value produced by the estimated solution is known as the error • Least squares seeks to minimize the sum of the squares of the errors • Weighted least squares allows different weights for the individual equations
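The idea fits in a few lines of code. The sketch below uses made-up data (the line-fitting setup, A, and b are illustrative assumptions, not from the lecture) and NumPy's built-in least-squares solver:

```python
import numpy as np

# Illustrative data (not from the lecture): fit a line y = c0 + c1*t
# to three points, giving m = 3 equations in n = 2 unknowns.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

# Least squares picks x to minimize the sum of squared errors ||Ax - b||_2^2
x, res_ss, rank, svals = np.linalg.lstsq(A, b, rcond=None)
errors = A @ x - b              # per-equation errors
print(x, errors, np.sum(errors**2))
```

Weighted least squares would simply scale each equation (each row of A and entry of b) by its weight before solving.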
Least Squares Solution History • The method of least squares developed from trying to estimate actual values from a number of measurements • Several persons in the 1700s, starting with Roger Cotes in 1722, presented methods for reducing model error by combining multiple measurements • Legendre presented a formal description of the method in 1805; evidently Gauss claimed he did it in 1795 • The method is widely used in power systems, with state estimation the best known application, dating from Fred Schweppe's work in 1970 • State estimation is covered in ECE 573
Least Squares and Sparsity • In many contexts least squares is applied to problems that are not sparse, for example, using a number of measurements to optimally determine a few values • Regression analysis is a common example, in which a line or other curve is fit to potentially many points • Each measurement impacts each model value • In the classic power system application of state estimation the system is sparse, with measurements only directly influencing a few states • Power system analysis classes have tended to focus on solution methods aimed at sparse systems; we'll consider both sparse and nonsparse solution methods
Least Squares Problem • Consider the overdetermined linear system Ax = b, with A an m × n matrix, x an n-vector of unknowns, and b an m-vector, where m > n; written out row by row, (a^i)^T x = b_i for i = 1, …, m
Least Squares Solution • We write (a^i)^T for row i of A; that is, a^i is a column vector • Here m ≥ n, and the solution we are seeking is the x that minimizes ||Ax − b||_p, where p denotes some norm • Since an overdetermined system usually has no exact solution, the best we can do is determine an x that minimizes the desired norm
Example 1: Choice of p • We discuss the choice of p in terms of a specific example • Consider an equation Ax = b with A a 3 × 1 matrix (hence three equations and one unknown) • We consider three possible choices for p:
Example 1: Choice of p • (i) p = 1: minimize the sum of the absolute errors • (ii) p = 2: minimize the sum of the squared errors • (iii) p = ∞: minimize the largest absolute error
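The example's specific A and b are not reproduced above, so as an illustration assume A = [1 1 1]^T, i.e., three measurements of a single unknown x, with assumed values of b. A brute-force grid search then shows how the minimizer changes with p:

```python
import numpy as np

# Assumed illustrative setup: A = [1, 1, 1]^T, so Ax - b has entries x - b_i.
b = np.array([1.0, 2.0, 6.0])
xs = np.linspace(0.0, 7.0, 70001)          # candidate values of the unknown x
r = np.abs(xs[None, :] - b[:, None])       # |x - b_i| for every candidate

print("p=1  :", xs[np.argmin(np.sum(r, axis=0))])      # median(b)   = 2.0
print("p=2  :", xs[np.argmin(np.sum(r**2, axis=0))])   # mean(b)     = 3.0
print("p=inf:", xs[np.argmin(np.max(r, axis=0))])      # midrange    = 3.5
```

For this single-unknown setup, p = 1 picks the median of the measurements, p = 2 the mean, and p = ∞ the midrange, which illustrates why the three norms generally give different answers.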
The Least Squares Problem • In general, ||Ax − b||_p is not differentiable in x for p = 1 or p = ∞ • The choice of p = 2 has become well established, given its least-squares fit interpretation • We next motivate the choice of p = 2 by first considering the least-squares problem: minimize ||Ax − b||_2^2 over x
The Least Squares Problem • The problem is tractable for 2 major reasons: (i) the function f(x) = ||Ax − b||_2^2 is differentiable in x; and
The Least Squares Problem (ii) the 2-norm is preserved under orthogonal transformations: ||Qy||_2 = ||y||_2 for any vector y, with Q an arbitrary orthogonal matrix; that is, Q satisfies Q^T Q = I
The Least Squares Problem • We introduce next the basic underlying assumption: A is full rank, i.e., the columns of A constitute a set of linearly independent vectors • This assumption implies that the rank of A is n, because n ≤ m since we are dealing with an overdetermined system • Fact: the least-squares solution x* satisfies the normal equations A^T A x* = A^T b
Proof of Fact • Since by definition the least-squares solution x* minimizes f(x) = ||Ax − b||_2^2 = (Ax − b)^T (Ax − b), at the optimum the derivative of this function vanishes: ∇f(x*) = 2 A^T (A x* − b) = 0, i.e., A^T A x* = A^T b
Implications • The underlying assumption is that A is full rank • Therefore, the fact that A^T A is positive definite (p.d.) follows from considering any x ≠ 0 and evaluating x^T (A^T A) x = ||Ax||_2^2 > 0 (Ax ≠ 0 because the columns of A are independent), which is the definition of a p.d. matrix • We use the shorthand A^T A > 0 for A^T A being a symmetric, positive definite matrix
Implications • The underlying assumption that A is full rank, and therefore that A^T A is p.d., implies that there exists a unique least-squares solution x* = (A^T A)^{-1} A^T b • Note: we use the inverse in a conceptual, rather than a computational, sense • The formulation A^T A x = A^T b is known as the normal equations, with the solution conceptually straightforward
Implications • An important implication of positive definiteness is that we can factor A^T A as A^T A = G^T G, with G upper triangular, since A^T A > 0 • The expression A^T A = G^T G is called the Cholesky factorization of the symmetric positive definite matrix A^T A
Least Squares Solution Algorithm
Step 1: Compute the lower triangular part of A^T A
Step 2: Obtain the Cholesky factorization A^T A = G^T G
Step 3: Compute c = A^T b
Step 4: Solve for y using forward substitution in G^T y = c, then for x using backward substitution in G x = y
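A minimal sketch of these four steps using SciPy (the data are illustrative assumptions; a production solver would also exploit sparsity, as discussed next):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

# Illustrative overdetermined system: m = 4 equations, n = 2 unknowns.
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0, 4.0])

AtA = A.T @ A                    # Step 1: form A^T A (symmetric p.d. if A full rank)
G = cholesky(AtA, lower=False)   # Step 2: A^T A = G^T G, with G upper triangular
c = A.T @ b                      # Step 3: form A^T b
y = solve_triangular(G.T, c, lower=True)   # Step 4a: forward substitution, G^T y = c
x = solve_triangular(G, y, lower=False)    # Step 4b: backward substitution, G x = y
print(x)
print(np.linalg.lstsq(A, b, rcond=None)[0])  # cross-check against the library solver
```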
Practical Considerations • The two key problems that arise in practice with the triangularization procedure are: (i) while A may be sparse, A^T A is much less sparse and consequently requires more computing resources for the solution; (ii) A^T A may be numerically less well-conditioned than A • We must deal with both of these problems
Example 2: Loss of Sparsity • Assume the B matrix for a network couples each bus only to its immediate neighbors • Then, in B^T B, second neighbors are now connected! But large networks are still sparse, just not as sparse
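The fill-in is easy to demonstrate. The sketch below assumes, purely for illustration, a tridiagonal B (each node coupled only to its immediate neighbors) and counts nonzeros before and after forming B^T B:

```python
import scipy.sparse as sp

# Assumed tridiagonal B: each of the n nodes couples only to its neighbors.
n = 10
B = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
BtB = (B.T @ B).tocsr()

# B has at most 3 nonzeros per row; B^T B is pentadiagonal, with up to 5,
# because second neighbors become directly connected.
print(B.nnz, BtB.nnz)
```

The amount of fill grows with the density of second-neighbor connections, but for large, lightly meshed networks B^T B remains sparse, just less so.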
Numerical Conditioning • To understand the point about numerical ill-conditioning, we need to introduce some terminology • We define the (2-)norm of a matrix B to be ||B||_2 = max_{x ≠ 0} ||Bx||_2 / ||x||_2
Numerical Conditioning • Equivalently, ||B||_2 = sqrt(λ_max), where each eigenvalue λ_i of B^T B is a root of the polynomial det(B^T B − λI) = 0 • In words, the 2-norm of B is the square root of the largest eigenvalue of B^T B
Numerical Conditioning • The conditioning number of a matrix B is defined as κ(B) = ||B||_2 ||B^{-1}||_2 • A well-conditioned matrix has a small value of κ(B), close to 1; the larger the value of κ(B), the more pronounced the ill-conditioning
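A quick numerical check (the 2 × 2 matrix below is an assumed, nearly singular example, not from the lecture):

```python
import numpy as np

# kappa(B) = ||B||_2 * ||B^{-1}||_2 = largest/smallest singular value.
B = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
svals = np.linalg.svd(B, compute_uv=False)
print(svals[0] / svals[-1])   # kappa from the singular values
print(np.linalg.cond(B, 2))   # the same value via numpy

# For the 2-norm, kappa(B^T B) = kappa(B)^2, which previews why forming
# the normal equations can badly degrade conditioning.
print(np.linalg.cond(B.T @ B, 2))
```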
Numerical Conditioning • The ill-conditioned nature of A^T A may severely impact the accuracy of the computed solution • We illustrate the fact that an ill-conditioned matrix A^T A results in highly sensitive solutions of least-squares problems with the following example:
Example 3: Ill-Conditioned A^T A • Consider the example's matrix A and form the product A^T A
Example 3: Ill-Conditioned A^T A • We consider a “noise” in A, represented by a small perturbation matrix dA
Example 3: Ill-Conditioned A^T A • The noise leads to an error E in the computation of A^T A, i.e., (A + dA)^T (A + dA) = A^T A + E • Assume that there is no noise in b, that is, db = 0
Example 3: Ill-Conditioned A^T A • The resulting error in solving the normal equations is independent of db, since it is caused purely by dA • Let x be the true solution of the normal equations, so the solution of A^T A x = A^T b is x = [1 0]^T
Example 3: Ill-Conditioned A^T A • Let x' be the solution of the system with the error arising due to dA, i.e., the solution of (A^T A + E) x' = A^T b • Therefore x' = (A^T A + E)^{-1} A^T b
Example 3: Ill-Conditioned A^T A • This implies an error dx = x' − x in the solution • Therefore the relative error is ||dx||_2 / ||x||_2 • Now, the conditioning number of A^T A is very large for this example
Example 3: Ill-Conditioned A^T A • Since κ(A^T A) = κ(A)^2, the product of the conditioning number and the relative perturbation is large • Thus the conditioning number is a major contributor to the error in the computation of x • In other words, the sensitivity of the solution to any error, be it from data entry or of a numerical nature, is very dependent on the conditioning number
What can be done? • Introduce a regularization term into the LS cost: minimize ||Ax − b||_2^2 + λ ||x||_2^2 • Ridge regression (l2-norm regularization) • At the optimum, the derivative vanishes, giving (A^T A + λI) x* = A^T b • The solution involves a different inverse matrix, (A^T A + λI)^{-1}, improving the conditioning
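A sketch of the ridge solve (the matrix A below is an assumed ill-conditioned example in the spirit of Example 3, not the slide's actual data):

```python
import numpy as np

# Assumed ill-conditioned A: its two columns are nearly linearly dependent.
A = np.array([[1.0, 1.0],
              [1e-4, 0.0],
              [0.0, 1e-4]])
b = A @ np.array([1.0, 0.0])     # exact data, so the true solution is [1, 0]
lam = 1e-6                       # illustrative regularization level
n = A.shape[1]

x_ls = np.linalg.solve(A.T @ A, A.T @ b)                       # plain LS
x_ridge = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)  # ridge

# Ridge trades a small bias in x for a much better conditioning number.
print(np.linalg.cond(A.T @ A), np.linalg.cond(A.T @ A + lam * np.eye(n)))
print(x_ls, x_ridge)
```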
Example 4: Ridge regression • Recalling the matrix A and the product A^T A of Example 3 • Compare the ridge regression solution with the chosen λ versus the true solution x = [1 0]^T
Example 4: Ridge regression • Now include the noise matrix dA and the resulting error E • Compare the ridge regression solution with the chosen λ versus the plain LS solution
Regularization • Can be used for solving underdetermined systems too • The level of regularization matters, as illustrated in the sketch below! • Large λ: better conditioning number, but a less accurate solution • Small λ: close to LS, but little improvement in conditioning • Recent trend: sparsity regularization using the l1 norm
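To make the tradeoff concrete, here is a small λ sweep on the same assumed ill-conditioned A used above (illustrative values only):

```python
import numpy as np

# Assumed ill-conditioned A with exact data so the true solution is [1, 0].
A = np.array([[1.0, 1.0],
              [1e-4, 0.0],
              [0.0, 1e-4]])
b = A @ np.array([1.0, 0.0])

# Larger lam improves the conditioning number but biases x away from [1, 0];
# smaller lam stays close to LS but leaves the system ill-conditioned.
for lam in [0.0, 1e-8, 1e-6, 1e-2, 1.0]:
    M = A.T @ A + lam * np.eye(2)
    x = np.linalg.solve(M, A.T @ b)
    print(f"lam={lam:g}  cond={np.linalg.cond(M):.2e}  x={x}")
```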
The Least Squares Problem • With this background we proceed to the typical schemes in use for solving least-squares problems, all along paying adequate attention to the numerical aspects of the solution approach • If the matrix is full (dense), then often the best solution approach is to use a singular value decomposition (SVD) to form a matrix known as the pseudo-inverse of the matrix • We'll cover this later, after first considering the sparse problem • We first review some fundamental building blocks and then present the key results useful for the sparse matrices common in state estimation
Householder Matrices and Vectors • Consider the n × n matrix P = I − 2 v v^T / (v^T v), where the nonzero vector v is called a Householder vector • Note that the definition of P in terms of the vector v implies the following properties for P: Symmetry: P = P^T; Orthonormality: P^T P = I
Householder Matrices and Vectors • Let x ∈ R^n be an arbitrary vector; then P x = x − 2 (v^T x / v^T v) v • Now, suppose we want P x to be a multiple of e1, the first unit vector; P x is a linear combination of the x and v vectors • Then v must be a linear combination of x and e1, and we write v = x + α e1, so that
Householder Matrices and Vectors • v^T x = x^T x + α x_1 and v^T v = x^T x + 2 α x_1 + α^2 • Therefore, P x = (1 − 2 (x^T x + α x_1) / (x^T x + 2 α x_1 + α^2)) x − 2 α (v^T x / v^T v) e1
Householder Matrices and Vectors • For the coefficient of x to vanish, we require that 2 (x^T x + α x_1) = x^T x + 2 α x_1 + α^2, or α^2 = x^T x, so that α = ±||x||_2 • Consequently v = x ± ||x||_2 e1, so that P x = ∓||x||_2 e1 • Thus the determination of v is straightforward
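The construction is easy to code. The sketch below follows the derivation above, choosing the sign of α to match x_1 so the addition avoids cancellation (a standard refinement, assumed here rather than taken from the slides):

```python
import numpy as np

# Householder construction: v = x + alpha*e1 with alpha = ±||x||_2
# makes P x a multiple of e1.
def householder(x):
    alpha = np.copysign(np.linalg.norm(x), x[0])  # sign choice avoids cancellation
    v = x.copy()
    v[0] += alpha                                  # v = x + alpha*e1
    P = np.eye(len(x)) - 2.0 * np.outer(v, v) / (v @ v)
    return P, v

x = np.array([3.0, 1.0, 5.0, 1.0])                # illustrative vector, ||x||_2 = 6
P, v = householder(x)
print(P @ x)     # ~[-6, 0, 0, 0]: a multiple of e1, as the derivation predicts
print(P @ P.T)   # ~identity: P is symmetric and orthogonal
```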
Example 5: Construction of P • Assume we are given a specific vector x • Then α = ±||x||_2 and v = x + α e1 follow directly
Example 5: Construction of P • Then P = I − 2 v v^T / (v^T v) • It follows then that P x = ∓||x||_2 e1, a multiple of e1 as desired