Multivariate Statistics

Multivariate Statistics: Matrix Algebra I. W. M. van der Veld, University of Amsterdam

Overview • Introduction • Definitions • Special names • Matrix transposition • Matrix addition • Matrix multiplication

Introduction • The mathematics in which multivariate analysis is cast is matrix algebra. • We will present enough matrix algebra to describe the operations needed for the multivariate analyses discussed in this course. In addition, this basic understanding is necessary for the more advanced courses of the Research Master. • Basically, all we need are a few basic tricks, at least at first. Let us summarize them, so that you will have some idea of what is coming and, more importantly, of why these topics must be mastered.

Introduction • Our point of departure is always a multivariate data matrix with a certain number, n, of rows for the individual observation units, and a certain number, m, of columns for the variables. • In most applications of multivariate analysis, we shall not be interested in variable means. They are of interest, of course, in each study, but multivariate analysis focuses instead on variances and covariances. Therefore, the data matrix will in general be transformed into a matrix whose columns have zero means, that is, whose entries are deviations from the column mean. • This matrix of deviation scores, which still has n rows and m columns, is the basis for the variance-covariance matrix, which has m rows and m columns. For a variable i, the variance is defined as $\sum x_i^2/n$, whereas for two variables i and j the covariance is defined as $\sum x_i x_j/n$, with $x_i$ and $x_j$ taken as deviations from the mean and the sum running over the n observations. Variances and covariances are collected in the variance-covariance matrix: the number in row i, column i (on the diagonal) gives the variance of variable i, while the number in row i, column j (i ≠ j) gives the covariance between variables i and j, which is the same number as in row j, column i. • An often useful transformation is to standardize the data matrix: we first take deviations from the mean for each column, then divide each deviation by the standard deviation of that column. As a result, the values in each column have zero mean and unit variance. • The standardized data matrix is then the basis for calculating a correlation matrix, which is nothing but a variance-covariance matrix for standardized variables. On the diagonal of this matrix we therefore find values equal to unity. In the other cells we find correlations: in row i, column j, we find the correlation coefficient $r_{ij} = \sum x_i x_j/(n\sigma_i\sigma_j)$, where $\sigma_i$ and $\sigma_j$ are the standard deviations of variables i and j.
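The definitions above can be traced numerically. The following is a minimal Python sketch, assuming a small made-up data matrix (4 observations, 2 variables) stored as a list of rows and the divisor n used in the text. It computes the deviation scores, the m by m variance-covariance matrix, and the correlation matrix, the latter both from the covariances and as the variance-covariance matrix of the standardized data. The helper name `cross_products_over_n` is hypothetical.

```python
from math import sqrt

# Made-up data matrix X: n = 4 observation units (rows), m = 2 variables (columns).
X = [
    [2.0, 1.0],
    [4.0, 3.0],
    [6.0, 2.0],
    [8.0, 6.0],
]
n, m = len(X), len(X[0])

# Deviations from the column means: every column of D has zero mean.
means = [sum(row[j] for row in X) / n for j in range(m)]
D = [[row[j] - means[j] for j in range(m)] for row in X]

def cross_products_over_n(M):
    """Return the m x m matrix whose (i, j) element is sum_k M[k][i] * M[k][j] / n."""
    rows, cols = len(M), len(M[0])
    return [[sum(M[k][i] * M[k][j] for k in range(rows)) / rows
             for j in range(cols)] for i in range(cols)]

# Variance-covariance matrix: variances on the diagonal, covariances elsewhere.
S = cross_products_over_n(D)

# Standardize: divide each deviation by the standard deviation of its column.
sd = [sqrt(S[j][j]) for j in range(m)]
Z = [[D[k][j] / sd[j] for j in range(m)] for k in range(n)]

# Correlation matrix, computed two ways.
R = [[S[i][j] / (sd[i] * sd[j]) for j in range(m)] for i in range(m)]
R_from_Z = cross_products_over_n(Z)

print(S)  # [[5.0, 3.5], [3.5, 3.5]]
```

Because `R` and `R_from_Z` agree, the sketch also illustrates that a correlation matrix is the variance-covariance matrix of standardized variables; here $r_{12} = 3.5/\sqrt{5 \cdot 3.5} = \sqrt{0.7} \approx 0.837$.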

Introduction • Very often we shall need a variable that is a linear compound of the initial variables. A linear compound is simply a variable whose values are obtained by a weighted addition of the values of the original variables. For example, with two initial variables $x_1$ and $x_2$, the values of the compound are defined as $y = w_1x_1 + w_2x_2$, where $w_1$ and $w_2$ are weights. A linear compound could also be called a weighted sum. • For some techniques of multivariate analysis, we need to be able to solve simultaneous equations. Doing so usually requires a computational routine called matrix inversion. • Multivariate analysis nearly always comes down to finding a minimum or a maximum of some sort. A typical example is to find a linear compound of some variables that has maximum correlation with some other variable (multiple correlation), or to find a linear compound of the observed scores that has maximum variance (principal component analysis). Therefore, among our stock of basic tricks, we need to include procedures for finding extreme values of functions. • In addition, we shall often need to find maxima (or minima) of functions where the procedure is restricted by certain side conditions. For instance, given two sets of variables, we may be required to find a linear compound of the first set and another of the second set such that the correlation between these two compounds is maximal. This task can be reformulated as follows: find the two compounds such that the covariance between them is maximal, given that both compounds have unit variance. • Very often in multivariate analysis, a maximization procedure under certain side conditions takes on a very specific and recognizable form, namely, finding the eigenvectors and eigenvalues of a given matrix.
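A linear compound is a weighted sum computed separately for every observation unit. A minimal sketch, assuming made-up data (3 observations, 2 variables) and hypothetical weights $w_1 = 0.5$ and $w_2 = 2.0$; the function name `linear_compound` is also hypothetical.

```python
# Made-up data: each row holds the values of x1 and x2 for one observation unit.
X = [
    [1.0, 2.0],
    [3.0, 0.0],
    [2.0, 5.0],
]
w = [0.5, 2.0]  # hypothetical weights w1, w2

def linear_compound(X, w):
    """Return y_k = w1 * x_k1 + w2 * x_k2 + ... for every row k of X."""
    return [sum(wj * xkj for wj, xkj in zip(w, row)) for row in X]

y = linear_compound(X, w)  # one compound score per observation
```

For example, the third observation (2, 5) gets $y = 0.5 \cdot 2 + 2.0 \cdot 5 = 11.0$.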

Definitions • For multivariate statistics the most important matrix is the data matrix. [Figure: a data file shown as a data matrix] • The data matrix has a certain number, n, of rows for the individual observation units, and a certain number, m, of columns for the variables.

Definitions • In general, a matrix has dimension n by m: n rows and m columns. • The convention is to denote matrices by boldface uppercase letters. • The first subscript of a matrix element ($x_{ij}$) refers to the row and the second subscript refers to the column. • It is important to remember this convention when performing matrix algebra.
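The row-first convention carries over directly to code. A minimal Python sketch, assuming a matrix stored as a list of rows; note that Python counts from 0 whereas the subscripts in the text count from 1, so the hypothetical helper `element` converts between the two.

```python
# A 2 x 3 matrix stored as a list of rows; entry "ij" holds x_ij for readability.
X = [
    [11, 12, 13],
    [21, 22, 23],
]
n = len(X)      # number of rows
m = len(X[0])   # number of columns

def element(X, i, j):
    """Return x_ij: first subscript = row i, second = column j (both counted from 1)."""
    return X[i - 1][j - 1]
```

With this layout, `element(X, 2, 3)` returns 23, the element in row 2, column 3.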

Definitions • A vector is a special type of matrix that has only one row (called a row vector) or one column (called a column vector). Below, a is a column vector while b is a row vector. • The convention is to denote vectors by boldface lowercase letters.

Definitions • A scalar is a matrix with only one row and one column. • The convention is to denote scalars by italicized, lower case letters (e.g., x).

Special names • If n = m, the matrix is called a square matrix. • The data matrix is normally not square, but the variance-covariance matrix is, as are many other matrices. • Matrix A is square but matrix B is not square.

Special names • A symmetric matrix is a square matrix in which $x_{ij} = x_{ji}$ for all i and j. • The data matrix is normally not symmetric, but the variance-covariance matrix is. • Matrix A is symmetric; matrix B is not symmetric.
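Both definitions translate into simple checks. A minimal sketch, assuming matrices stored as lists of rows; the helper names and the example matrices are made up for illustration.

```python
def is_square(A):
    """True if A has as many rows as columns (n = m)."""
    return all(len(row) == len(A) for row in A)

def is_symmetric(A):
    """True if A is square and a_ij == a_ji for all i and j."""
    if not is_square(A):
        return False
    n = len(A)
    # Comparing the elements above the diagonal with those below it is enough.
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(i + 1, n))

S = [[5, 2, 1],
     [2, 4, 3],
     [1, 3, 6]]   # square and symmetric
N = [[1, 2],
     [3, 4]]      # square but not symmetric
B = [[1, 2],
     [3, 4],
     [5, 6]]      # 3 x 2: not square, hence not symmetric
```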

Special names • A diagonal matrix is a symmetric matrix in which all the off-diagonal elements are 0. • The data matrix is normally not diagonal, and neither is the variance-covariance matrix. The variance matrix, which holds only the variances on its diagonal and zeros elsewhere, is diagonal. • These matrices are often denoted by D; matrix D is diagonal.

Special names • An identity matrix is a diagonal matrix with 1s and only 1s on the diagonal; it is also sometimes called the unity matrix. • This is a useful matrix in matrix algebra. • The convention is to denote the identity matrix by I.
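The diagonal and identity matrices can be sketched the same way. A minimal example, assuming matrices stored as lists of rows and a made-up 2 by 2 variance-covariance matrix; `variance_matrix` is a hypothetical helper that keeps only the variances, giving the diagonal variance matrix mentioned above.

```python
def is_diagonal(A):
    """True if A is square and every off-diagonal element is 0."""
    n = len(A)
    if any(len(row) != n for row in A):
        return False
    return all(A[i][j] == 0 for i in range(n) for j in range(n) if i != j)

def identity(n):
    """n x n identity matrix I: 1s on the diagonal, 0s elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def variance_matrix(S):
    """Keep the diagonal (the variances) of S and set all covariances to 0."""
    n = len(S)
    return [[S[i][j] if i == j else 0 for j in range(n)] for i in range(n)]

S = [[5.0, 3.5],
     [3.5, 3.5]]          # made-up variance-covariance matrix: not diagonal
V = variance_matrix(S)    # diagonal
I3 = identity(3)          # diagonal, with only 1s on the diagonal
```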

Special names • A unit vector is a vector containing only 1s (also called a summing vector or a vector of ones). • This is a useful vector in matrix algebra. • The convention is to denote the unit vector by u.
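One reason the unit vector is useful: multiplying it as a row vector by a column vector x (the vector product defined later in this lecture) adds up the elements of x, so $u'x/n$ is the mean. A minimal sketch, assuming vectors stored as flat Python lists and made-up data; the helper names are hypothetical.

```python
def unit_vector(n):
    """Vector u of length n containing only 1s."""
    return [1] * n

def row_times_column(a, b):
    """Product of a row vector a and a column vector b: sum of a_k * b_k."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")
    return sum(ak * bk for ak, bk in zip(a, b))

x = [3, 5, 10, 2]                # made-up scores on one variable
u = unit_vector(len(x))
total = row_times_column(u, x)   # u'x: the sum of the elements of x
mean = total / len(x)            # u'x / n: the mean of x
```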

Matrix transposition • Matrix transposition is a useful transformation with many purposes. • The transpose of a matrix is denoted by a prime (A’) or a superscript t or T ($A^t$ or $A^T$). • What does it do? The first row of a matrix becomes the first column of the transpose, the second row of the matrix becomes the second column of the transpose, etc.
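Transposition can be sketched in one line, assuming matrices stored as lists of rows: Python's built-in `zip(*A)` walks over the columns of `A`, and each column becomes a row of the result. The function name `transpose` and the example matrices are made up for illustration.

```python
def transpose(A):
    """Row i of A becomes column i of the result, so an n x m matrix becomes m x n."""
    return [list(col) for col in zip(*A)]

A = [[1, 2, 3],
     [4, 5, 6]]                        # 2 x 3
At = transpose(A)                      # 3 x 2: [[1, 4], [2, 5], [3, 6]]

row_vector = [[7, 8, 9]]               # 1 x 3
column_vector = transpose(row_vector)  # 3 x 1: [[7], [8], [9]]

S = [[5, 2],
     [2, 4]]                           # symmetric: transposing leaves it unchanged
```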

Matrix transposition • What is the transpose of A? And what are the dimensions of A’? • The transpose of a square matrix is a square matrix. • What type of special matrix is this matrix? • What is the transpose of this matrix? • The transpose of a symmetric matrix is simply the original matrix.

Matrix transposition • The transpose of a row vector will be a column vector, and the transpose of a column vector will be a row vector.

Matrix addition • To add two matrices: • they must both have the same number of rows, and • they must both have the same number of columns. • The elements of the two matrices are simply added together, element by element, to produce the result. • That is, for R = A + B, $r_{ij} = a_{ij} + b_{ij}$.
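The element-by-element rule, including the dimension requirement, in a minimal sketch; it assumes matrices stored as lists of rows, and the helper names and example values are made up for illustration.

```python
def same_dimensions(A, B):
    """True if A and B have the same number of rows and the same number of columns."""
    return len(A) == len(B) and all(len(ra) == len(rb) for ra, rb in zip(A, B))

def add(A, B):
    """R = A + B with r_ij = a_ij + b_ij."""
    if not same_dimensions(A, B):
        raise ValueError("matrices must have the same number of rows and columns")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def subtract(A, B):
    """R = A - B with r_ij = a_ij - b_ij."""
    if not same_dimensions(A, B):
        raise ValueError("matrices must have the same number of rows and columns")
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, 2, 3], [4, 5, 6]]
B = [[10, 20, 30], [40, 50, 60]]
R = add(A, B)         # [[11, 22, 33], [44, 55, 66]]
Dif = subtract(B, A)  # [[9, 18, 27], [36, 45, 54]]
```

Adding a 1 by 2 matrix to a 2 by 1 matrix raises a `ValueError`, since the dimensions differ.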

Matrix addition • Matrix subtraction works in the same way, except that elements are subtracted instead of added. • What is the result of this subtraction? • What is the result of this addition?

Matrix addition • Rules for matrix addition and subtraction: • A + B = B + A Commutative • (A + B) + C = A + (B + C) Associative • (A + B)’ = A’ + B’
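The three rules can be checked on an example. A minimal sketch with made-up 2 by 3 matrices stored as lists of rows; a single example illustrates the rules but does not prove them.

```python
def add(A, B):
    """Element-by-element sum of two matrices with equal dimensions."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transpose(A):
    """Rows of A become columns of the result."""
    return [list(col) for col in zip(*A)]

# Made-up 2 x 3 matrices (all three share the same dimensions).
A = [[1, 2, 3], [4, 5, 6]]
B = [[0, -1, 2], [3, 0, 1]]
C = [[7, 7, -7], [2, 2, 2]]

commutative = add(A, B) == add(B, A)                   # A + B = B + A
associative = add(add(A, B), C) == add(A, add(B, C))   # (A + B) + C = A + (B + C)
transpose_of_sum = transpose(add(A, B)) == add(transpose(A), transpose(B))  # (A + B)' = A' + B'
```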

Matrix multiplication • Multiplication between a scalar and a matrix (a vector being a special case). • Each element of the product is simply the scalar multiplied by the corresponding element of the matrix. • That is, for P = xA, $p_{ij} = x a_{ij}$ for all i and j. • The product Ax is also defined and equals xA; that is, scalar multiplication is commutative.
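A minimal sketch of multiplication by a scalar, assuming matrices (and vectors) stored as lists of rows; the function name and values are made up for illustration.

```python
def scalar_multiply(x, A):
    """P = xA with p_ij = x * a_ij for all i and j."""
    return [[x * a for a in row] for row in A]

A = [[1, -2],
     [0, 3]]
P = scalar_multiply(3, A)            # [[3, -6], [0, 9]]
v = scalar_multiply(2, [[1, 2, 3]])  # a row vector: [[2, 4, 6]]
```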

Matrix multiplication • Multiplication between two vectors. • To perform this, the row vector must have as many columns as the column vector has rows. • The product is simply the sum of the first row vector element multiplied by the first column vector element, plus the second row vector element multiplied by the second column vector element, plus the product of the third elements, etc. • In algebra, if p = ab, where a is a row vector and b is a column vector, each with m elements, then $p = \sum_{k=1}^{m} a_k b_k$.
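The sum-of-products rule in a minimal sketch, assuming a 1 by m row vector stored as `[[a1, ..., am]]` and an m by 1 column vector stored as `[[b1], ..., [bm]]`; the function name and values are made up for illustration.

```python
def row_times_column(a, b):
    """p = ab for a 1 x m row vector a and an m x 1 column vector b."""
    row = a[0]                    # the m elements of the row vector
    col = [r[0] for r in b]       # the m elements of the column vector
    if len(row) != len(col):
        raise ValueError("row vector needs as many columns as the column vector has rows")
    return sum(ak * bk for ak, bk in zip(row, col))

a = [[1, 2, 3]]        # 1 x 3
b = [[4], [5], [6]]    # 3 x 1
p = row_times_column(a, b)   # 1*4 + 2*5 + 3*6 = 32, a scalar
```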

Matrix multiplication • Multiplication between two matrices. • This is similar to the multiplication of two vectors. • Specifically, in the expression P = AB, $p_{ij} = \mathbf{a}_{i\bullet}\mathbf{b}_{\bullet j} = \sum_k a_{ik} b_{kj}$, where $\mathbf{a}_{i\bullet}$ is the ith row vector of matrix A and $\mathbf{b}_{\bullet j}$ is the jth column vector of matrix B.

Matrix multiplication • Summary of multiplication procedure.

Matrix multiplication • For matrix multiplication to be legal, the first matrix must have as many columns as the second matrix has rows. This, of course, is the requirement for multiplying a row vector by a column vector. • The resulting matrix will have as many rows as the first matrix and as many columns as the second matrix. • In the example, A had 2 rows and 3 columns while B had 3 rows and 2 columns; the matrix multiplication was therefore defined, resulting in a matrix with 2 rows and 2 columns. • Or in general: • if A has dimension $n_a$ by $m_a$ and B has dimension $n_b$ by $m_b$, • then the product P = AB is defined if $m_a = n_b$, • and the dimension of P is $n_a$ by $m_b$.
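The procedure and the dimension rule can be combined in one function. A minimal sketch, assuming matrices stored as lists of rows; the values of A (2 by 3) and B (3 by 2) are made up, since the slide's own example matrices are not reproduced here, and the helper names are hypothetical.

```python
def dims(A):
    """Return (number of rows, number of columns)."""
    return len(A), len(A[0])

def matmul(A, B):
    """P = AB with p_ij = sum_k a_ik * b_kj; defined only if m_a == n_b."""
    na, ma = dims(A)
    nb, mb = dims(B)
    if ma != nb:
        raise ValueError(f"not defined: [{na}x{ma}][{nb}x{mb}]")
    # Row i of A times column j of B gives element (i, j) of the na x mb result.
    return [[sum(A[i][k] * B[k][j] for k in range(ma)) for j in range(mb)]
            for i in range(na)]

A = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]         # 3 x 2
P = matmul(A, B)       # 2 x 2: [[58, 64], [139, 154]]
Q = matmul(B, A)       # 3 x 3: also defined, but a different matrix
```

Calling `matmul` on a [1x2] and a [3x1] matrix raises a `ValueError`, matching the dimension rule.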

Matrix multiplication • Rules for matrix and vector multiplication: • AB ≠ BA Not commutative (in general) • A(BC) = (AB)C Associative • A(B + C) = AB + AC Distributive • (B + C)A = BA + CA • (AB)’ = B’A’ • (ABC)’ = C’B’A’ • Rules for scalar multiplication: • xA = Ax Commutative • x(A + B) = xA + xB Distributive • x(AB) = (xA)B = A(xB) Associative
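The matrix rules can be checked on small made-up 2 by 2 matrices. This is a minimal sketch, not a proof: one example can show that AB and BA differ, but it only illustrates the identities that do hold. It also shows why the order reverses under transposition: here (AB)’ differs from A’B’.

```python
def matmul(A, B):
    """Matrix product; columns of A must equal rows of B."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    """Rows of A become columns of the result."""
    return [list(col) for col in zip(*A)]

def add(A, B):
    """Element-by-element sum."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 1]]
T = transpose

checks = {
    "AB != BA": matmul(A, B) != matmul(B, A),
    "A(BC) = (AB)C": matmul(A, matmul(B, C)) == matmul(matmul(A, B), C),
    "A(B+C) = AB + AC": matmul(A, add(B, C)) == add(matmul(A, B), matmul(A, C)),
    "(B+C)A = BA + CA": matmul(add(B, C), A) == add(matmul(B, A), matmul(C, A)),
    "(AB)' = B'A'": T(matmul(A, B)) == matmul(T(B), T(A)),
    "(AB)' != A'B' here": T(matmul(A, B)) != matmul(T(A), T(B)),
    "(ABC)' = C'B'A'": T(matmul(matmul(A, B), C)) == matmul(matmul(T(C), T(B)), T(A)),
}
```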

Matrix multiplication • What is the product of: Not possible: [1x2][3x1], because the first matrix has 2 columns but the second matrix has 3 rows.

Matrix multiplication • What is the product of: Not defined: [3x2] by [1x3], because the first matrix has 2 columns but the second matrix has only 1 row.

Matrix multiplication • Matrix division. • For simple numbers, division can be reduced to multiplication by the reciprocal of the divisor: • 32 divided by 4 is the same as • 32 multiplied by ¼, or • 32 multiplied by $4^{-1}$, • where $4^{-1}$ is defined by the general equality $a^{-1}a = 1$. • When working with matrices, we shall adopt the latter idea and therefore not use the term division at all; instead, we take multiplication by an inverse matrix as the equivalent of division. • However, computing the inverse matrix is quite complex, and it will be discussed next time.
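The idea of replacing division by multiplication with an inverse can be sketched with exact fractions. General matrix inversion is left for next time, so this minimal example only uses the special case of a diagonal matrix with nonzero diagonal elements, whose inverse (a standard fact) simply has the reciprocals on its diagonal; the matrix values and helper names are made up for illustration.

```python
from fractions import Fraction

# Scalars: 32 divided by 4 equals 32 multiplied by 4^{-1}, where 4^{-1} * 4 = 1.
four_inv = Fraction(1, 4)
quotient = 32 * four_inv

def matmul(A, B):
    """Matrix product; columns of A must equal rows of B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def diagonal_inverse(D):
    """Inverse of a diagonal matrix with nonzero diagonal: reciprocals on the diagonal."""
    n = len(D)
    return [[Fraction(1, D[i][i]) if i == j else 0 for j in range(n)] for i in range(n)]

D = [[2, 0, 0],
     [0, 5, 0],
     [0, 0, 4]]
D_inv = diagonal_inverse(D)
product = matmul(D_inv, D)   # the matrix analogue of a^{-1}a = 1: the identity matrix I
```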
