Data Mining: Data

Data Mining: Data • Lecture Notes for Chapter 2 • Introduction to PCA (Principal Component Analysis)

What is PCA? • Stands for “Principal Component Analysis” • A useful technique in many applications, such as face recognition, image compression, and finding patterns in high-dimensional data • Before starting this topic, you should know the following background: • Standard deviation • Covariance • Eigenvectors • Eigenvalues (elementary linear algebra)

What is PCA? • “It is a way of identifying patterns in data and expressing the data in such a way as to highlight their similarities and differences” • PCA is a powerful tool for analyzing data • Finding patterns in the data (feature extraction): “principal” in “principal component” means major, i.e. the direction that carries the most information (the largest variance) • Reducing the number of dimensions without much loss of information (data reduction, noise rejection, visualization, data compression, etc.)

Application of PCA • A bivariate (two-variable) data set

Tutorial by Example • Step 1: Get some data

Tutorial by Example • Step 2: Make a data set whose mean is zero • Compute the mean and standard deviation of each dimension, then subtract that dimension's mean from each of its values
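Steps 1 and 2 can be sketched in plain Python (standard library only). The four points below are hypothetical values chosen for illustration, not the data from the slides, and the helper names (`mean`, `sample_std`, `center`) are likewise illustrative. The standard deviation uses the n - 1 (sample) denominator and is computed only for reference: this step subtracts the mean but does not divide by the standard deviation. Dividing by it as well (standardizing) is a common variant when the dimensions are measured on different scales.

```python
# Hypothetical 2-D data set (illustrative values, not the slide's data)
xs = [18.0, 2.0, 7.0, 13.0]
ys = [16.0, 4.0, 14.0, 6.0]

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sample_std(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = mean(values)
    return (sum((v - m) ** 2 for v in values) / (len(values) - 1)) ** 0.5

def center(values):
    """Subtract the mean so that the returned list has mean zero."""
    m = mean(values)
    return [v - m for v in values]

print(mean(xs), mean(ys))  # 10.0 10.0
x_adj = center(xs)
y_adj = center(ys)
print(x_adj)  # [8.0, -8.0, -3.0, 3.0]
print(y_adj)  # [6.0, -6.0, 4.0, -4.0]
```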

Tutorial by Example

Tutorial by Example • Step 3: Calculate the covariance matrix (see PCATutorial.pdf) • Since the data is two-dimensional, the covariance matrix is 2x2 • What do you notice?
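A minimal covariance-matrix sketch for Step 3, using hypothetical data (illustrative values, not the slide's). The sample convention with an n - 1 denominator is assumed here. Entry (i, j) is the covariance between dimensions i and j, so the diagonal holds the variances and the matrix is symmetric; a positive off-diagonal value means the two dimensions tend to increase together.

```python
def covariance(a, b):
    """Sample covariance of two equally long lists (n - 1 denominator)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)

def covariance_matrix(columns):
    """Entry [i][j] is the covariance between dimension i and dimension j."""
    return [[covariance(ci, cj) for cj in columns] for ci in columns]

# Hypothetical 2-D data set (illustrative values, not the slide's data)
xs = [18.0, 2.0, 7.0, 13.0]
ys = [16.0, 4.0, 14.0, 6.0]

C = covariance_matrix([xs, ys])
for row in C:
    print([round(v, 3) for v in row])
# [48.667, 24.0]
# [24.0, 34.667]
```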

Tutorial by Example • Step 4: Calculate the eigenvectors and eigenvalues of the covariance matrix
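For a symmetric 2x2 matrix the eigenpairs have a closed form, which keeps this sketch dependency-free. For C = [[a, b], [b, d]] the eigenvalues are (a + d)/2 ± sqrt(((a - d)/2)^2 + b^2), and when b ≠ 0 the vector (b, λ - a) is an eigenvector for λ. The matrix below is a hypothetical covariance matrix and the function name is illustrative; the sign of each eigenvector is arbitrary. For larger matrices a numerical routine such as NumPy's `numpy.linalg.eigh` would normally be used instead.

```python
import math

def eig_sym_2x2(m):
    """Eigenvalues and unit eigenvectors of a symmetric 2x2 matrix [[a, b], [b, d]].

    Returns (values, vectors) with the largest eigenvalue first.
    """
    a, b, d = m[0][0], m[0][1], m[1][1]
    mid = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    values = [mid + radius, mid - radius]
    if b == 0.0:
        # Already diagonal: the coordinate axes are the eigenvectors
        if a >= d:
            return values, [[1.0, 0.0], [0.0, 1.0]]
        return values, [[0.0, 1.0], [1.0, 0.0]]
    vectors = []
    for lam in values:
        # (b, lam - a) satisfies (a - lam) * x + b * y = 0, the first row of (C - lam*I)v = 0
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        vectors.append([vx / norm, vy / norm])
    return values, vectors

# Hypothetical covariance matrix (illustrative values)
C = [[146 / 3, 24.0], [24.0, 104 / 3]]
values, vectors = eig_sym_2x2(C)
print([round(v, 3) for v in values])                # [66.667, 16.667]
print([[round(c, 3) for c in v] for v in vectors])  # [[0.8, 0.6], [0.6, -0.8]]
```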

Tutorial by Example • Step 5: Choose components and form a feature vector • The eigenvector with the highest eigenvalue is the principal component of the data set • The principal component from the example • You can decide to ignore the components of lesser significance, but you do lose some information • If their eigenvalues are small, you don't lose much • If you leave out some components, the final data set will have fewer dimensions (features) than the original
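One common way to quantify how much is lost by ignoring components is the fraction of total variance the kept components retain: each eigenvalue of the covariance matrix is the variance of the data along its eigenvector, and the eigenvalues sum to the total variance (the trace of the matrix). A small sketch, with hypothetical eigenvalues and an illustrative function name:

```python
def variance_retained(eigenvalues, k):
    """Fraction of the total variance kept by the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)

# Hypothetical eigenvalues of a 2x2 covariance matrix
eigenvalues = [200 / 3, 50 / 3]
print(round(variance_retained(eigenvalues, 1), 3))  # 0.8 (dropping one component keeps 80%)
print(variance_retained(eigenvalues, 2))            # 1.0 (keeping both loses nothing)
```

The smaller the dropped eigenvalues are relative to the total, the closer this fraction is to 1, which is the sense in which "you don't lose much".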

Tutorial by Example • After ordering the eigenvectors by eigenvalue (highest to lowest), place them as the columns of a feature vector: FeatureVector = (eig1 eig2 eig3 … eign) • In this example we have two eigenvectors • So we have two choices: • Form a feature vector with both eigenvectors • Leave out the smaller, less significant component and keep only a single column
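Ordering the eigenvectors and keeping the top k can be sketched as below. The eigenpairs are hypothetical and deliberately listed in ascending order of eigenvalue to show the sort; the function name is illustrative. Each returned vector corresponds to one column of FeatureVector, so k = 2 matches the first choice on the slide and k = 1 the second.

```python
def feature_vector(eigenvalues, eigenvectors, k):
    """Return the k eigenvectors with the largest eigenvalues, highest first.

    Each returned vector is one column of FeatureVector.
    """
    order = sorted(range(len(eigenvalues)), key=lambda i: eigenvalues[i], reverse=True)
    return [eigenvectors[i] for i in order[:k]]

# Hypothetical eigenpairs, listed in ascending order of eigenvalue on purpose
eigenvalues = [50 / 3, 200 / 3]
eigenvectors = [[0.6, -0.8], [0.8, 0.6]]

print(feature_vector(eigenvalues, eigenvectors, 2))  # [[0.8, 0.6], [0.6, -0.8]] (both eigenvectors)
print(feature_vector(eigenvalues, eigenvectors, 1))  # [[0.8, 0.6]] (principal component only)
```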

Tutorial by Example • Step 6: Deriving the new data set
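A minimal sketch of Step 6, assuming the usual formulation in which each mean-adjusted point is expressed in the basis of the chosen eigenvectors: the new coordinates are dot products with each column of the feature vector (equivalently, the transposed feature vector multiplied by the transposed mean-adjusted data). The points and unit eigenvectors below are hypothetical, and the function names `project` and `reconstruct` are illustrative. Mapping the new data back (before adding the mean) recovers the mean-adjusted points exactly only when every component is kept.

```python
def project(centered_points, components):
    """New coordinates of each mean-adjusted point: one dot product per chosen eigenvector."""
    return [[sum(c * p for c, p in zip(comp, point)) for comp in components]
            for point in centered_points]

def reconstruct(scores, components):
    """Map projected points back to the original (mean-adjusted) axes."""
    dim = len(components[0])
    return [[sum(s * comp[d] for s, comp in zip(score, components)) for d in range(dim)]
            for score in scores]

# Hypothetical mean-adjusted points and unit eigenvectors (highest eigenvalue first)
centered = [[8.0, 6.0], [-8.0, -6.0], [-3.0, 4.0], [3.0, -4.0]]
components = [[0.8, 0.6], [0.6, -0.8]]

full = project(centered, components)          # both components: still 2 dimensions
reduced = project(centered, components[:1])   # principal component only: 1 dimension
approx = reconstruct(reduced, components[:1]) # back on the original axes, minus the dropped part
```

For these hypothetical values the principal-component scores are approximately 10, -10, 0 and 0, so the last two points collapse to the origin when reconstructed from one component: that is the information lost by leaving out the smaller eigenvalue.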
