
Matrix Factorization & Principal Component Analysis


Presentation Transcript


  1. Matrix Factorization & Principal Component Analysis. Bamshad Mobasher, DePaul University

  2. Principal Component Analysis
  • PCA is a widely used data compression and dimensionality reduction technique
  • PCA takes a data matrix A of n objects by p variables, which may be correlated, and summarizes it by uncorrelated axes (principal components or principal axes) that are linear combinations of the original p variables
  • The first k components display most of the variance among objects
  • The remaining components can be discarded, resulting in a lower-dimensional representation of the data that still captures most of the relevant information
  • PCA is computed by determining the eigenvectors and eigenvalues of the covariance matrix
  • Recall: the covariance of two random variables is their tendency to vary together
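A minimal sketch of this pipeline, using scikit-learn's off-the-shelf PCA on a made-up data matrix (the toy data below is only for illustration):

  import numpy as np
  from sklearn.decomposition import PCA

  # Made-up data: n = 100 objects described by p = 5 correlated variables
  rng = np.random.default_rng(0)
  latent = rng.normal(size=(100, 2))
  A = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(100, 5))

  pca = PCA(n_components=2)                 # keep the first k = 2 principal components
  scores = pca.fit_transform(A)             # coordinates of each object on the new axes
  print(pca.explained_variance_ratio_)      # fraction of total variance captured by each PC

Because the toy data has only two underlying directions, the first two components capture nearly all of the variance and the remaining components can be discarded.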

  3. Principal Component Analysis (PCA)
  • The covariance of variables i and j is cov(i, j) = 1/(n−1) Σ_m (X_mi − X̄_i)(X_mj − X̄_j), where X_mi is the value of variable i in object m, X̄_i is the mean of variable i, and the sum runs over all n objects
  • Notes:
  • For a variable X, cov(X, X) = var(X)
  • For independent variables X and Y, cov(X, Y) = 0
  • The covariance matrix is a matrix C with elements C_ij = cov(i, j)
  • The covariance matrix is square and symmetric
  • For independent variables, the covariance matrix is diagonal, with the variances along the diagonal and zeros in the off-diagonal elements
  • To calculate the covariance matrix from a dataset, first center the data by subtracting the mean of each variable, then compute C = 1/(n−1) (A^T A), where A is the centered data matrix
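The centering step and the matrix form of the covariance can be written out directly in NumPy; the small data matrix below is made up, and the result is checked against numpy.cov:

  import numpy as np

  # Made-up data: n = 4 objects (rows), p = 3 variables (columns)
  X = np.array([[2.0,  4.1, 0.9],
                [3.0,  6.2, 1.1],
                [4.0,  7.9, 0.8],
                [5.0, 10.1, 1.2]])

  A = X - X.mean(axis=0)                    # center each variable on its mean
  C = (A.T @ A) / (len(X) - 1)              # p x p sample covariance matrix

  print(np.allclose(C, np.cov(X, rowvar=False)))   # True: matches NumPy's estimator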

  4. Geometric Interpretation of PCA
  (Figure: a two-variable data cloud with the rotated principal axes PC 1 and PC 2)
  • The goal is to rotate the axes of the p-dimensional space to new positions (principal axes) that have the following properties:
  • ordered such that principal axis 1 has the highest variance, axis 2 has the next highest variance, ..., and axis p has the lowest variance
  • covariance among each pair of the principal axes is zero (the principal axes are uncorrelated)
  Note: each principal axis is a linear combination of the original two variables
  Credit: Loretta Battaglia, Southern Illinois University

  5. Covariance Matrix - Example
  (Slide shows a small numeric example: the original data matrix X; the centered data matrix A, obtained by subtracting each column mean; and the resulting covariance matrix Cov(X) = 1/(n−1) A^T A)

  6. Eigenvalues and Eigenvectors
  • Finding the principal axes involves finding the eigenvalues and eigenvectors of the covariance matrix C (computed from the centered data matrix A as C = 1/(n−1) A^T A)
  • eigenvalues are values λ such that C z = λ z (the vectors z are special vectors called eigenvectors)
  • This can be re-written as: (C − λI) z = 0
  • So, eigenvalues can be found by solving the characteristic equation: det(C − λI) = 0
  • The eigenvalues λ1, λ2, ..., λp are the variances of the coordinates on each principal component axis
  • the sum of all p eigenvalues equals the trace of C (the sum of the variances of the original variables)
  • The eigenvectors of the covariance matrix are the axes of maximum variance
  • a good approximation of the full matrix can be computed using only a subset of the eigenvectors and eigenvalues
  • the eigenvalues are truncated below some threshold; then the data is reprojected onto the remaining r eigenvectors to get a rank-r approximation
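A sketch of this eigen-decomposition in NumPy, on the covariance matrix of some made-up data, verifying the trace property and the defining equation C z = λ z:

  import numpy as np

  rng = np.random.default_rng(1)
  X = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 3))   # made-up correlated data
  A = X - X.mean(axis=0)
  C = (A.T @ A) / (len(X) - 1)                             # covariance matrix

  lam, Z = np.linalg.eigh(C)                # eigenvalues/eigenvectors of the symmetric matrix C
  order = np.argsort(lam)[::-1]             # sort so that lambda_1 >= lambda_2 >= ... >= lambda_p
  lam, Z = lam[order], Z[:, order]

  print(np.isclose(lam.sum(), np.trace(C)))     # sum of eigenvalues equals the trace of C
  print(np.allclose(C @ Z, Z * lam))            # C z_i = lambda_i z_i for every eigenvector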

  7. Eigenvalues and Eigenvectors
  For the covariance matrix C of the previous example, the eigenvalues are:
  λ1 = 73.718, λ2 = 0.384, λ3 = 0.298
  Note: λ1 + λ2 + λ3 = 74.4 = trace of C (the sum of the variances on the diagonal)
  (Slide also shows Z, the matrix of the corresponding eigenvectors)

  8. Reduced Dimension Space
  • The coordinates of each object on the kth principal axis, known as the scores on PC k, are computed as U = XZ, where U is the n x k matrix of PC scores, X is the n x p centered data matrix, and Z is the p x k matrix of eigenvectors
  • The variance of the scores on each PC axis is equal to the corresponding eigenvalue for that axis
  • the eigenvalue represents the variance displayed ("explained" or "extracted") by the kth axis
  • the sum of the first k eigenvalues is the variance explained by the k-dimensional reduced matrix
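A short sketch of the projection U = XZ on made-up data, confirming that the variance of the scores on each axis equals the corresponding eigenvalue:

  import numpy as np

  rng = np.random.default_rng(2)
  X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))  # made-up raw data, n x p
  Xc = X - X.mean(axis=0)                                  # centered data matrix
  C = (Xc.T @ Xc) / (len(X) - 1)

  lam, Z = np.linalg.eigh(C)
  order = np.argsort(lam)[::-1]
  lam, Z = lam[order], Z[:, order]

  k = 2
  U = Xc @ Z[:, :k]                          # scores: coordinates on the first k principal axes
  print(np.allclose(U.var(axis=0, ddof=1), lam[:k]))   # variance of each score column = eigenvalue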

  9. Reduced Dimension Space
  • So, to generate the data in the new space:
  • RowFeatureVector: the matrix with the eigenvectors in the columns, transposed so that the eigenvectors are now in the rows, with the most significant eigenvector at the top
  • RowZeroMeanData: the mean-adjusted data transposed, i.e. the data items are in the columns, with each row holding a separate dimension
  FinalData = RowFeatureVector x RowZeroMeanData

  10. Reduced Dimension Space
  For the earlier example, the scores are U = Z^T A^T (the transpose of the score matrix XZ from slide 8, with one object per column; values shown on the slide)
  Taking only the top k = 1 principal component: U = Z_k^T A^T, a one-dimensional representation of the data

  11. Matrix Decomposition
  • Matrix D is m x n
  • e.g., a ratings matrix with m customers and n items
  • e.g., a term-document matrix with m terms and n documents
  • Typically:
  • D is sparse, e.g., less than 1% of entries have ratings
  • n is large, e.g., 18000 movies (Netflix), millions of docs, etc.
  • So finding matches to less popular items will be difficult
  • Basic Idea: compress the columns (items) into a lower-dimensional representation
  Credit: Based on lecture notes from Padhraic Smyth, University of California, Irvine

  12. Singular Value Decomposition (SVD)
  D = U S V^T, where D is m x n, U is m x n, S is n x n, and V^T is n x n
  • rows of V^T are eigenvectors of D^T D = basis functions
  • S is diagonal, with d_ii = sqrt(λ_i) (the ith eigenvalue of D^T D)
  • rows of U are coefficients for the basis functions in V^T
  • (here we assumed that m > n, and rank(D) = n)
  Credit: Based on lecture notes from Padhraic Smyth, University of California, Irvine
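A minimal NumPy sketch of the factorization on a made-up m x n matrix, checking that the factors reproduce D and that the singular values are square roots of the eigenvalues of D^T D:

  import numpy as np

  rng = np.random.default_rng(3)
  D = rng.normal(size=(6, 4))                        # made-up m x n matrix with m > n

  U, s, Vt = np.linalg.svd(D, full_matrices=False)   # U: m x n, s: n singular values, Vt: n x n
  print(np.allclose(D, U @ np.diag(s) @ Vt))         # the factors reproduce D

  evals = np.linalg.eigvalsh(D.T @ D)[::-1]          # eigenvalues of D^T D, largest first
  print(np.allclose(s, np.sqrt(evals)))              # d_ii = sqrt(lambda_i)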

  13. SVD Example
  Data D =
    10 20 10
     2  5  2
     8 17  7
     9 20 10
    12 22 11
  Credit: Based on lecture notes from Padhraic Smyth, University of California, Irvine

  14. SVD Example
  Data D = (as in slide 13)
  Note the pattern in the data above: the center column values are typically about twice the 1st and 3rd column values:
  • So there is redundancy in the columns, i.e., the column values are correlated
  Credit: Based on lecture notes from Padhraic Smyth, University of California, Irvine

  15. SVD Example
  Data D =
    10 20 10
     2  5  2
     8 17  7
     9 20 10
    12 22 11
  D = U S V^T, where
  U =
    0.50  0.14 -0.19
    0.12 -0.35  0.07
    0.41 -0.54  0.66
    0.49 -0.35 -0.67
    0.56  0.66  0.27
  S =
    48.6  0    0
     0    1.5  0
     0    0    1.2
  V^T =
    0.41  0.82  0.40
    0.73 -0.56  0.41
    0.55  0.12 -0.82
  Credit: Based on lecture notes from Padhraic Smyth, University of California, Irvine

  16. SVD Example
  D = U S V^T, with the same data and factors as on the previous slide
  Note that the first singular value (48.6) is much larger than the others
  Credit: Based on lecture notes from Padhraic Smyth, University of California, Irvine

  17. SVD Example
  D = U S V^T, with the same data and factors as above
  Note that the first singular value is much larger than the others
  The first basis function (or eigenvector) carries most of the information and it "discovers" the pattern of column dependence
  Credit: Based on lecture notes from Padhraic Smyth, University of California, Irvine

  18. Rows in D = weighted sums of basis vectors
  1st row of D = [10 20 10]
  Since D = U S V^T, then D[0,:] = U[0,:] * S * V^T = [24.5 0.2 -0.22] * V^T
  V^T =
    0.41  0.82  0.40
    0.73 -0.56  0.41
    0.55  0.12 -0.82
  • So D[0,:] = 24.5 v1 + 0.2 v2 - 0.22 v3, where v1, v2, v3 are the rows of V^T and are our basis vectors
  Thus, [24.5, 0.2, -0.22] are the weights that characterize row 1 of D
  In general, the ith row of U*S is the set of weights for the ith row of D
  Credit: Based on lecture notes from Padhraic Smyth, University of California, Irvine
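The same arithmetic can be replayed in NumPy with the rounded factors printed on the slides; row 0 of D comes back as a weighted sum of the rows of V^T (up to rounding):

  import numpy as np

  # Factors as given on the slides (rounded there to two decimals)
  U = np.array([[0.50,  0.14, -0.19],
                [0.12, -0.35,  0.07],
                [0.41, -0.54,  0.66],
                [0.49, -0.35, -0.67],
                [0.56,  0.66,  0.27]])
  s = np.array([48.6, 1.5, 1.2])
  Vt = np.array([[0.41,  0.82,  0.40],
                 [0.73, -0.56,  0.41],
                 [0.55,  0.12, -0.82]])

  weights = U[0] * s                 # weights for row 0 of D: about [24.5, 0.2, -0.22]
  print(weights)
  print(weights @ Vt)                # weighted sum of the rows of Vt: about [10, 20, 10]
  print((U * s) @ Vt)                # every row of D is recovered the same way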

  19. Summary of SVD Representation
  D = U S V^T
  • Data matrix D: rows = data vectors
  • V^T matrix: rows = our basis functions
  • U*S matrix: rows = weights for the rows of D
  Credit: Based on lecture notes from Padhraic Smyth, University of California, Irvine

  20. How do we compute U, S, and V?
  • SVD decomposition is a standard eigenvector/eigenvalue problem
  • the eigenvectors of D^T D = the columns of V (the rows of V^T)
  • the eigenvectors of D D^T = the columns of U
  • the diagonal elements of S are the square roots of the eigenvalues of D^T D
  • => finding U, S, V is equivalent to finding the eigenvectors of D^T D
  • Solving eigenvalue problems is equivalent to solving a set of linear equations; the time complexity is O(m n^2 + n^3)
  Credit: Based on lecture notes from Padhraic Smyth, University of California, Irvine
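A sketch of this route in NumPy on a made-up matrix: take the eigenvectors of D^T D as V, the square roots of its eigenvalues as the diagonal of S, and recover U from D V = U S:

  import numpy as np

  rng = np.random.default_rng(4)
  D = rng.normal(size=(6, 3))                  # made-up m x n matrix

  lam, V = np.linalg.eigh(D.T @ D)             # eigenvectors of D^T D give V
  order = np.argsort(lam)[::-1]
  lam, V = lam[order], V[:, order]

  s = np.sqrt(lam)                             # singular values
  U = (D @ V) / s                              # columns of U, since D V = U S

  print(np.allclose(D, U @ np.diag(s) @ V.T))  # same factorization as np.linalg.svd
                                               # (columns may differ by a sign flip)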

  21. Matrix Approximation with SVD
  D ≈ U S V^T, where D is m x n, U is m x k, S is k x k, and V^T is k x n
  • columns of V are the first k eigenvectors of D^T D
  • S is diagonal, containing the k largest singular values (square roots of the k largest eigenvalues)
  • rows of U are coefficients in the reduced-dimension V-space
  This approximation gives the best rank-k approximation to the matrix D in a least-squares sense (this is also known as principal components analysis)
  Credit: Based on lecture notes from Padhraic Smyth, University of California, Irvine
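A sketch of the rank-k approximation on a made-up, nearly low-rank matrix; keeping only the top k singular triples already reproduces D to within the noise level:

  import numpy as np

  rng = np.random.default_rng(5)
  # Made-up matrix with (approximately) rank-2 structure plus a little noise
  D = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 8)) + 0.01 * rng.normal(size=(20, 8))

  U, s, Vt = np.linalg.svd(D, full_matrices=False)
  k = 2
  D_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]        # best rank-k approximation (least squares)

  print(np.linalg.norm(D - D_k) / np.linalg.norm(D)) # small relative error: most structure kept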
