Multivariate Statistics
Harry R. Erwin, PhD
School of Computing and Technology, University of Sunderland
Resources
• Everitt, B.S., and G. Dunn (2001) Applied Multivariate Data Analysis, London: Arnold.
• Everitt, B.S. (2005) An R and S-PLUS® Companion to Multivariate Analysis, London: Springer.
Introduction
• Most statistical data sets are multivariate.
• Sometimes it’s useful to study a variable in isolation, but usually you need to examine all the variables to understand the data.
• The next few lectures are the core of this module.
• We will examine the description, exploration, and analysis of multivariate data.
Multivariate Data
• The natural form of multivariate data is a table or data frame.
• Kinds of data
  • Unordered categorical variables (nominal data)
  • Ordinal data (ordered, but not measured)
  • Interval data (measured on a scale)
  • Ratio data (numerical, with a defined ‘zero’)
• Missing values are common.
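As a rough illustration of the kinds of data listed above, here is a minimal R sketch of a data frame holding one variable of each kind. The data set, its column names, and its values are all hypothetical and chosen only for illustration.

```r
# Hypothetical example data: names and values are illustrative only.
patients <- data.frame(
  # Nominal: unordered categories
  blood_group = factor(c("A", "O", "B", "O")),
  # Ordinal: categories with an order but no measured spacing
  pain = factor(c("mild", "severe", "none", "mild"),
                levels = c("none", "mild", "severe"),
                ordered = TRUE),
  # Interval: measured, but 0 degrees Celsius is not "no temperature";
  # one value is missing (NA)
  temp_c = c(36.8, 38.1, NA, 37.2),
  # Ratio: measured, with a true zero
  weight_kg = c(70.5, 82.0, 64.3, 91.2)
)

str(patients)               # shows the type of each column
colSums(is.na(patients))    # counts missing values per variable
```

Ordered factors support comparisons such as `patients$pain[1] < patients$pain[2]`, whereas unordered factors do not.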
Handling Missing Data
• Ignore it (drop the incomplete cases).
  • The results are often biased.
• Fill in plausible values.
  • Known as imputation
  • An advanced topic
• Be aware that this is a problem area.
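The two options above can be sketched in R as follows. The data are made up, and mean imputation is used here only because it is the simplest thing to show: it is an assumption of this sketch, not a method the lecture recommends, and it understates the variance of the imputed variable.

```r
# Hypothetical data with one missing value in each column.
scores <- data.frame(
  x = c(1, 2, NA, 4),
  y = c(10, NA, 30, 40)
)

# Option 1: ignore incomplete rows (complete-case analysis).
complete <- na.omit(scores)
nrow(complete)    # only the rows with no NA remain

# Option 2: naive mean imputation (illustrative only).
imputed <- scores
for (v in names(imputed)) {
  missing <- is.na(imputed[[v]])
  imputed[[v]][missing] <- mean(imputed[[v]], na.rm = TRUE)
}
imputed
```

Dropping rows discards half of this small data set, which is one reason the complete-case approach can mislead.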
Summary Statistics
• Means: generated by mean
• Variances: generated by var
• Covariances: also generated by var
• Correlation coefficients: generated by cor
• Distances: generated by dist
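A minimal sketch of these summary statistics on a small, made-up data frame. One assumption to flag: in current versions of R, calling mean directly on a whole data frame no longer returns the column means, so the sketch uses colMeans instead.

```r
# Hypothetical data: four cases, two numeric variables.
x <- data.frame(
  height = c(150, 160, 170, 180),
  weight = c(50, 60, 65, 85)
)

m <- colMeans(x)   # mean of each variable
S <- var(x)        # covariance matrix: variances on the diagonal,
                   # covariances off the diagonal
R <- cor(x)        # correlation matrix
D <- dist(x)       # Euclidean distances between cases (rows)

m
S
R
D
```

Note that dist works on the raw scales, so variables with large units dominate the distances; standardising first, for example with scale(x), is a common choice.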
Aims
• Data exploration (data mining)
  • Looking for non-random patterns and structures
  • Visual and graphical displays
• Confirmatory analysis (later in the module)
  • Statistical testing
Looking at Multivariate Data
• Scatterplots (demonstration)
• “The convex hull of bivariate data” (demonstration)
• Chi-plot (demonstration)
• Bivariate boxplot (demonstration)
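The demonstrations themselves are not reproduced in these notes. As a stand-in, here is a minimal base R sketch of the first two displays, using simulated data (an assumption of this sketch). The chi-plot and bivariate boxplot are not part of base R, so they are left out.

```r
# Simulated bivariate data (illustrative only).
set.seed(1)
x <- rnorm(50)
y <- 0.6 * x + rnorm(50, sd = 0.8)

# Scatterplot.
plot(x, y, main = "Scatterplot with convex hull")

# Convex hull: chull() returns the indices of the points on the hull.
hull <- chull(x, y)
polygon(x[hull], y[hull], border = "red")

# One use of the hull: compare the correlation with and without
# the extreme points that lie on it.
cor(x, y)
cor(x[-hull], y[-hull])
```

Removing the hull points gives a rough check on how much a few outlying observations drive the correlation.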
More Multivariate Graphics
• Bivariate densities (demonstration)
• Other variables in a scatterplot (demonstration)
• Scatterplot matrix (demonstration of pairs)
• 3-D plots (demonstration)
• Conditioning plots and trellis graphics (demonstration)
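A hedged sketch of these displays using only functions that ship with R, applied to the built-in iris data (the choice of data set is an assumption of this sketch, not taken from the lecture). The density step relies on the MASS package, which is included in most R installations, and is skipped if MASS is unavailable.

```r
num <- iris[, 1:4]   # the four numeric measurements

# Scatterplot matrix, with species shown by colour.
pairs(num, col = as.integer(iris$Species))

# A third variable in a scatterplot, shown as circle size.
symbols(iris$Sepal.Length, iris$Sepal.Width,
        circles = iris$Petal.Length, inches = 0.15,
        xlab = "Sepal length", ylab = "Sepal width")

# Conditioning plot: one panel per species.
coplot(Sepal.Width ~ Sepal.Length | Species, data = iris)

# Bivariate density estimate drawn as a 3-D surface.
if (requireNamespace("MASS", quietly = TRUE)) {
  dens <- MASS::kde2d(iris$Sepal.Length, iris$Sepal.Width, n = 30)
  persp(dens$x, dens$y, dens$z, theta = 30, phi = 30,
        xlab = "Sepal length", ylab = "Sepal width", zlab = "Density")
}
```

The lattice package offers trellis versions of these plots; coplot is used here because it needs no extra package.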
Summary
• Most statistical data are multivariate.
• Most multivariate data have structure.
• Detecting that structure is what data mining is all about.
• Most data mining involves data visualisation and graphing, nothing more.
• Most of your conclusions from data mining will be obvious, once you see them!
• And you really don’t need to learn very much statistics to be good at multivariate data analysis.