
Exploratory Data Analysis and Multi-variate Strategies

Exploratory Data Analysis and Multi-variate Strategies. Simon French, simon.french@warwick.ac.uk. Aims of session: understand the value of ‘looking at’ your data rather than analysing it; be aware of terminology and basic ideas in exploratory data analysis and multivariate analysis.




Presentation Transcript


  1. Exploratory Data Analysis and Multi-variate Strategies
     Simon French, simon.french@warwick.ac.uk

  2. Aims of Session
     • Understand the value of ‘looking at’ your data rather than analysing it
     • Be aware of terminology and basic ideas in:
       • Exploratory Data Analysis
       • Multivariate Analysis
       • Data Mining
     • Ideas of data presentation and visualisation

  3. Exploring and Visualising Data
     • EDA
       • Tables, charts, plots
       • Look for patterns or something interesting in 2 or 3 dimensions
       • Simple presentations of data
     • Multivariate Analysis
       • Factor analysis, cluster analysis, etc.
       • Identify patterns or something interesting in more than a few dimensions
     • Data mining
       • Automatic/computer search for patterns in (parts of) large data sets
     In all cases anything you find needs checking in all sorts of ways.

  4. Cynefin and statistics
     [Diagram labels: Unique events; exploratory analyses; Repeatable events; Events?; Estimation and confirmatory analysis]

  5. Cynefin and statistics
     [Diagram labels: Unique events; Repeatable events; Events?]
     Actually you need exploratory statistics here: outliers, residual analysis, simple model checking

  6. Exploratory analyses
     • Look at the data
       • In any, repeat any, analysis, look at the data
       • It is too easy for data to pass from web questionnaire to Excel to SPSS to analysis without your looking at the data
     • Simple plots and tables
       • Tables – do not think them ‘simple’ to construct!
       • Histograms, boxplots, scatterplots, …
       • Useful in presenting results too
       • Generally easy to produce with Excel or SPSS
         • If you know what you are trying to achieve
     • References
       • A.S. Ehrenberg (1986b). "Reading a Table: an Example." Applied Statistics, 35 (3), 237-44.
       • M. Chapman and B. Mahon (1986). Plain Figures. London, Her Majesty's Stationery Office.
       • J. W. Tukey (1977). Exploratory Data Analysis. Addison-Wesley.
       • The exploratory data analysis chapter in most statistics texts.
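     As a rough illustration of "look at the data" before any analysis, the sketch below computes the five numbers behind a boxplot and flags values beyond the usual 1.5 × IQR fences. The function names and the questionnaire figures are hypothetical, and the quartile method is simply the Python default, not anything the slides prescribe.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical questionnaire answers (hours per week).
# The 150 looks like a keying error for 15 and would distort any mean.
hours = [12, 15, 14, 13, 16, 15, 14, 13, 15, 14, 150]

print(five_number_summary(hours))
print(flag_outliers(hours))
```

     A flagged value is only a prompt to go back to the source record; whether it is an error or a genuine extreme still needs a human look.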

  7. Tables and Charts
     • Clarify in titles and notes
       • What the data are and where they come from
       • Units
     • 2 or 3 ideas can be shown/explored in a table or chart … no more
     • Do not make over ‘busy’
       • Plot plain x’s, not dustbin icons, for data on waste!
     • Do not introduce spurious features
       • E.g. number the data and accidentally introduce a ranking
     • Watch for cognitive aspects
       • Appropriate scales
       • Appropriate number of significant figures
       • In tables: put important variation down the columns
       • Use of colour
         • Red-green may be read as bad (‘stop’) and good (‘go’), or may not be distinguishable by colour-blind readers
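     The point about significant figures can be made concrete with a small helper. This is a minimal sketch; `round_sig` is a hypothetical name, and the default of two figures is an assumption chosen for illustration, since the slide does not say how many are appropriate.

```python
import math


def round_sig(value, figures=2):
    """Round value to the given number of significant figures."""
    if value == 0:
        return 0
    # Position of the leading digit: 2 for 473.8, -2 for 0.0123.
    exponent = math.floor(math.log10(abs(value)))
    return round(value, figures - 1 - exponent)


# Hypothetical table cells before tidying.
for raw in [123456, 473.82, 0.0123456]:
    print(raw, "->", round_sig(raw))
```

     Rounding every cell the same way makes the variation down a column much easier to read than a run of eight-digit values.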

  8. Regression and Factor Analysis as exploratory analyses • Often (usually!!!) data is multi-dimensional • It is difficult to see the key trends and variations by eye • Regression and factor analyses reduce dimensions to the ‘significant’ ones

  9. Regression Analysis
     [Scatter plot of data points]
     • Describe the cloud of data points
     • 16 (x, y) points = 32 numbers

  10. Regression Analysis
     [Scatter plot of data points with fitted line]
     • Describe the cloud of data points
     • Regression line: y = mx + c, plus standard deviation
     • 3 numbers … trend, base case, and spread
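     The reduction from 32 numbers to 3 can be sketched as an ordinary least-squares fit. The data below are made up, `fit_line` is a hypothetical name, and taking the "standard deviation" to mean the residual standard deviation (with n − 2 in the denominator) is an assumption; the slide does not say which spread is meant.

```python
import math


def fit_line(xs, ys):
    """Least-squares fit of y = m*x + c; returns (m, c, s).

    m is the trend, c the base case (value at x = 0), and s the
    residual standard deviation, i.e. the spread about the line.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = mean_y - m * mean_x
    residuals = [y - (m * x + c) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return m, c, s


# 16 hypothetical (x, y) points: 32 numbers in, 3 numbers out.
xs = list(range(1, 17))
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 7.3, 7.9, 9.2,
      9.8, 11.1, 12.4, 12.8, 14.1, 15.2, 15.9, 17.0]

m, c, s = fit_line(xs, ys)
print(f"trend m = {m:.3f}, base case c = {c:.3f}, spread s = {s:.3f}")
```

     The residuals computed inside `fit_line` are also the natural thing to plot when checking whether a straight line is a fair summary at all.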

  11. Factor Analysis
     [Scatter plot of data points]
     • Describe the cloud of data points
     • 16 (x, y) points = 32 numbers

  12. Factor Analysis
     [Scatter plot of data points with line of greatest variation]
     • Describe the cloud of data points
     • Project each point onto line of greatest variation: 16 numbers
     • Keeps each item separate in summary
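     Projecting onto the line of greatest variation is what the first principal component does, so the sketch below uses that as a stand-in for the picture on the slide; a full factor analysis involves more than this. The function name and sample points are hypothetical. In two dimensions the direction has a closed form, so no linear-algebra library is needed.

```python
import math


def project_onto_main_axis(points):
    """Return (direction, scores) for the line of greatest variation.

    direction is a unit vector (ux, uy); scores holds one number per
    point: its signed position along that line, measured from the mean.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    # Angle of the major axis of the 2x2 scatter matrix.
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(angle), math.sin(angle)
    scores = [(x - mean_x) * ux + (y - mean_y) * uy for x, y in points]
    return (ux, uy), scores


# Hypothetical points: each (x, y) pair becomes a single score.
cloud = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.3), (4.0, 3.8), (5.0, 5.1)]
direction, scores = project_onto_main_axis(cloud)
print(direction)
print(scores)
```

     Unlike the regression summary, every item keeps its own number, so individual cases can still be compared and ranked along the main axis.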

  13. Data Mining
     • Huge data sets, many dimensions, very many inhomogeneous objects
       • Biological/genetic data
       • Large scale longitudinal population studies
       • Loyalty cards
       • …
     • Computer searches for patterns (often conditional patterns in parts of the data)
     • Beware: seek and you will find … SO CHECK!
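     "Seek and you will find" can be demonstrated with pure noise. The sketch below searches many random columns for the one that best correlates with a random target; all names, sizes, and the seed are arbitrary assumptions for illustration.

```python
import math
import random


def correlation(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def best_spurious_match(n_rows=20, n_columns=200, seed=1):
    """Search unrelated noise columns for the strongest correlation.

    Every column is independent of the target by construction, so any
    'pattern' found here is an artefact of searching.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_columns):
        column = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(correlation(target, column)))
    return best


print(f"strongest |r| found in pure noise: {best_spurious_match():.2f}")
```

     With only 20 rows and 200 candidate columns, the best match usually looks impressive even though nothing real is there. Re-testing a found pattern on data that played no part in the search is one way of doing the checking the slide asks for.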
