
DATA MINING from data to information Ronald Westra Dep. Mathematics Knowledge Engineering


Presentation Transcript


  1. DATA MINING from data to information Ronald Westra Dep. Mathematics Knowledge Engineering Maastricht University

  2. PART 2 Exploratory Data Analysis

  3. VISUALISING AND EXPLORING DATA-SPACE Data Mining Lecture II [Chapter 3 from Principles of Data Mining by Hand, Mannila, Smyth]

  4. Observe that the spatial extent appears different in each dimension. Also observe that in this case the set is almost one-dimensional. Can we project the set so that the spatial extent in one dimension is maximal?

  5. Data X: n rows of p fields; the data vectors are the rows of X. STEP 1: Subtract the average value from the dataset X: mean-centered data. The spatial extent of this cloud of points can be measured by the variances in the dataset X; these are the diagonal entries of the scatter matrix V = XᵀX (for mean-centered data this is proportional to the covariance matrix). The projection of the dataset X onto a direction a is y = Xa. The spatial extent in direction a is the variance of the projected dataset y, i.e. σ_a² = yᵀy = (Xa)ᵀ(Xa) = aᵀXᵀXa = aᵀVa. We now want to maximize this extent σ_a² over all possible vectors a (why?).
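A minimal sketch of this step in NumPy, using a synthetic data set of my own choosing (the names X, Xc, V, a are illustrative, not from the lecture): mean-center the data, form V = XᵀX, and check that the variance of the projection y = Xa equals aᵀVa.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n points in p dimensions (illustrative only).
n, p = 200, 3
X = rng.normal(size=(n, p)) @ np.array([[3.0, 0.0, 0.0],
                                        [0.5, 1.0, 0.0],
                                        [0.2, 0.1, 0.3]])

# STEP 1: subtract the mean -> mean-centered data.
Xc = X - X.mean(axis=0)

# Scatter matrix V = Xc^T Xc (proportional to the covariance matrix).
V = Xc.T @ Xc

# Projection onto a unit direction a, and its variance sigma_a^2.
a = np.array([1.0, 2.0, 0.5])
a /= np.linalg.norm(a)
y = Xc @ a
print(y @ y)      # sigma_a^2 computed directly from the projection
print(a @ V @ a)  # the same value, computed as a^T V a
```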

  6. STEP 2: Maximize σ_a² = aᵀVa over all possible vectors a. This is unbounded, just like maximizing x² over x, so we restrict the length of the vector a to 1: aᵀa – 1 = 0. So we have: maximize aᵀVa subject to aᵀa – 1 = 0. This can be solved with the method of Lagrange multipliers: maximize f(x) subject to g(x) = 0 → d/dx{ f(x) – λ g(x) } = 0. For our case this means: d/da{ aᵀVa – λ(aᵀa – 1) } = 0 → 2Va – 2λa = 0 → Va = λa. This means that we are looking for the eigenvectors and eigenvalues of the scatter matrix V = XᵀX.
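A follow-up sketch under the same assumptions (NumPy, synthetic data): solving Va = λa numerically, the eigenvectors of V are the principal axes, and the variance along the leading eigenvector equals the largest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data, then mean-centered (illustrative only).
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0],
                                          [1.5, 0.3]])
Xc = X - X.mean(axis=0)
V = Xc.T @ Xc

# V a = lambda a: eigenvectors = principal axes, eigenvalues = variances along them.
eigvals, eigvecs = np.linalg.eigh(V)   # eigh because V is symmetric
order = np.argsort(eigvals)[::-1]      # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

a1 = eigvecs[:, 0]                     # principal axis 1 (unit vector)
print(a1 @ V @ a1)                     # variance along principal axis 1 ...
print(eigvals[0])                      # ... equals the largest eigenvalue
```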

  7. [Figure: the data cloud with its MEAN marked]

  8. [Figure: the data cloud with Principal axis 1 and Principal axis 2 drawn through the MEAN]

  9. EXAMPLE of PCA

  10. Astronomical application: PCs for elliptical galaxies. Rotating to the principal components in BT – Σ space improves the Faber-Jackson relation as a distance indicator (Dressler et al., 1987).

  11. Astronomical application: eigenspectra (Karhunen–Loève transform); Connolly et al., 1995.

  12. [Figure: panels labelled 1 pc, 2 pc, 3 pc, and 4 pc]
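A sketch of a closely related computation (my own illustration on synthetic data, not the eigenspectra of Connolly et al.): reconstructing a data set from its first k principal components for k = 1, 2, 3, 4; the reconstruction error shrinks as more components are kept.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data in 4 dimensions (illustrative only).
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))
mean = X.mean(axis=0)
Xc = X - mean

# Principal axes from the scatter matrix V = Xc^T Xc.
eigvals, eigvecs = np.linalg.eigh(Xc.T @ Xc)
eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]  # leading axes first

# Reconstruction using the first k principal components.
for k in (1, 2, 3, 4):
    A = eigvecs[:, :k]                  # p x k matrix of leading axes
    X_rec = mean + (Xc @ A) @ A.T       # project onto k axes, map back
    err = np.linalg.norm(X - X_rec)     # error shrinks as k grows
    print(f"{k} PC(s): reconstruction error = {err:.2f}")
```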
