
Exploring Data with EDA and Multivariate Analysis Techniques

This session emphasizes the value of exploratory data analysis (EDA) and its role in understanding data. Participants will learn the concepts and terminology of EDA, multivariate analysis, and data mining. The session covers techniques for visualizing data through tables, charts, and plots, and for identifying patterns across dimensions. It stresses the importance of verifying findings from data exploration, checking for outliers, and constructing effective presentations. Regression and factor analysis are also discussed as exploratory methods.

Presentation Transcript


  1. Exploratory Data Analysis and Multi-variate Strategies • Simon French • simon.french@warwick.ac.uk

  2. Aims of Session • Understand the value of ‘looking at’ your data rather than analysing it • Be aware of terminology and basic ideas in: • Exploratory Data Analysis • Multivariate Analysis • Data Mining • Ideas of data presentation and visualisation

  3. Exploring and Visualising Data • EDA • Tables, charts, plots • Look for patterns or something interesting in 2 or 3 dimensions • Simple presentations of data • Multivariate Analysis • Factor Analysis, cluster analysis, etc • Identify patterns or something interesting in more than a few dimensions • Data mining • Automatic/Computer search for patterns in (parts of) large data sets. In all cases anything you find needs checking in all sorts of ways

  4. Cynefin and statistics • Unique events • Exploratory analyses • Repeatable events • Events? • Estimation and confirmatory analysis

  5. Cynefin and statistics • Unique events • Actually you need exploratory statistics here: outliers, residual analysis, simple model checking • Repeatable events • Events?

  6. Exploratory analyses • Look at the data • In any, repeat any, analysis, look at the data • It is too easy for data to pass from web questionnaire to Excel to SPSS to analysis without your looking at the data. • Simple plots and tables • Tables – do not think them ‘simple’ to construct! • Histograms, Boxplots, Scatterplots, … • Useful in presenting results too • Generally easy to produce with Excel or SPSS • If you know what you are trying to achieve • References • A.S. Ehrenberg (1986b), "Reading a Table: an Example," Applied Statistics, 35 (3), 237-44 • M. Chapman and B. Mahon (1986). Plain Figures. Edn. London, Her Majesty's Stationery Office. • J. W. Tukey (1977). Exploratory Data Analysis. Edn., Addison-Wesley. • The exploratory data analysis chapter in most statistics texts.
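As a rough illustration of the "look at the data" step, the sketch below computes the five-number summary that underlies a boxplot and flags values outside Tukey's fences. The `responses` list is made-up questionnaire data, and the 1.5 multiplier is the conventional default rather than anything the slides specify.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(values)
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


def tukey_outliers(values, k=1.5):
    """Return the values lying more than k interquartile ranges outside the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical data: the 150 is the kind of keying error that slips
# through when nobody looks at the raw values.
responses = [12, 15, 14, 13, 16, 15, 14, 150, 13, 15]

print(five_number_summary(responses))
print(tukey_outliers(responses))
```

A flagged value is a prompt to go back to the source record, not an automatic deletion.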

  7. Tables and Charts • Clarify in titles and notes • What the data are and where they come from • Units • 2 or 3 ideas can be shown/explored in a table or chart … no more • Do not make them over ‘busy’ • x’s not dustbins for data on waste! • Do not introduce spurious features • E.g. number the data and accidentally introduce a ranking • Watch for cognitive aspects • Appropriate scales • Appropriate number of significant figures • In tables: put important variation down the columns • Use of colour • red-green: bad (‘stop’) and good (‘go’), or just colour blind
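The point about significant figures can be sketched in a few lines. The helper `round_sig` and the choice of two figures are illustrative assumptions; the slide only asks for an "appropriate" number.

```python
import math


def round_sig(value, figures=2):
    """Round value to the given number of significant figures."""
    if value == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(value)))
    return round(value, figures - 1 - exponent)


# Hypothetical table column: raw values next to their rounded versions.
raw = [1234.5, 87.3, 0.04567]
for value in raw:
    print(value, "->", round_sig(value))
```

Rounded values are usually easier to compare by eye down a column, which is the reason for putting the important variation there.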

  8. Regression and Factor Analysis as exploratory analyses • Often (usually!!!) data is multi-dimensional • It is difficult to see the key trends and variations by eye • Regression and factor analyses reduce dimensions to the ‘significant’ ones

  9. Regression Analysis • [Figure: scatter plot of data points marked with x’s] • Describe the cloud of data points • 16 (x,y) points = 32 numbers

  10. Regression Analysis • [Figure: scatter plot of data points marked with x’s, with a fitted line] • Describe the cloud of data points • Regression line: y = mx + c, plus standard deviation • 3 numbers … trend, base case, and spread
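The reduction from 32 numbers to 3 can be sketched with an ordinary least-squares fit. The 16-point `cloud` is invented for illustration, and taking "standard deviation" to mean the residual standard deviation (with an n - 2 divisor) is an assumption about what the slide intends.

```python
import math


def fit_line(points):
    """Least-squares fit of y = m*x + c; returns (m, c, residual standard deviation)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    m = sxy / sxx                     # trend
    c = mean_y - m * mean_x           # base case (value at x = 0)
    residuals = [y - (m * x + c) for x, y in points]
    sd = math.sqrt(sum(r * r for r in residuals) / (n - 2))  # spread about the line
    return m, c, sd


# Hypothetical cloud: 16 points scattered about the line y = 0.5x + 2.
cloud = [(x, 0.5 * x + 2 + 0.3 * (-1) ** x) for x in range(16)]

m, c, sd = fit_line(cloud)
print(f"trend={m:.3f}  base case={c:.3f}  spread={sd:.3f}")
```

Looking at the `residuals` list itself, rather than only its standard deviation, is the residual analysis mentioned on the Cynefin slide.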

  11. Factor Analysis • [Figure: scatter plot of data points marked with x’s] • Describe the cloud of data points • 16 (x,y) points = 32 numbers

  12. Factor Analysis • [Figure: scatter plot of data points marked with x’s, with the line of greatest variation] • Describe the cloud of data points • Project each point onto line of greatest variation: 16 numbers • Keeps each item separate in summary
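The projection described on this slide can be sketched as follows. Strictly, projecting onto the line of greatest variation is the first principal component; a full factor analysis fits a different model, so treat this as an illustration of the geometric idea only. The `cloud` data and the function name are invented for the example.

```python
import math


def principal_axis_scores(points):
    """Project 2-D points onto their line of greatest variation.

    Returns the unit direction of that line and one score per point.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    # Angle of the major axis of the 2x2 scatter matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    scores = [(x - mean_x) * ux + (y - mean_y) * uy for x, y in points]
    return (ux, uy), scores


# Hypothetical cloud of 16 points (32 numbers in all).
cloud = [(x, 0.5 * x + 2 + 0.3 * (-1) ** x) for x in range(16)]

direction, cloud_scores = principal_axis_scores(cloud)
print(direction)
print(len(cloud_scores), "scores, one per point")
```

Unlike the regression summary, every point keeps its own number, so individual items can still be ranked or compared.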

  13. Data Mining • Huge data sets, many dimensions, very many inhomogeneous objects • Biological/genetic data • Large scale longitudinal population studies • Loyalty cards • … • Computer searches for patterns (often conditional patterns in parts of the data) • Beware: seek and you will find … SO CHECK!!!!
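The closing warning can be demonstrated with pure noise. The sketch below searches 40 unrelated random columns for the most strongly correlated pair, then re-checks that pair on rows that were held back from the search. The sizes, the seed, and the half-and-half split are arbitrary choices for illustration, not a method taken from the slides.

```python
import itertools
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    sab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    saa = sum((x - mean_a) ** 2 for x in a)
    sbb = sum((y - mean_b) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def strongest_pair(columns):
    """Return (i, j, r) for the pair of columns with the largest |r|."""
    pairs = itertools.combinations(range(len(columns)), 2)
    return max(
        ((i, j, pearson(columns[i], columns[j])) for i, j in pairs),
        key=lambda found: abs(found[2]),
    )


rng = random.Random(0)  # fixed seed so the run is repeatable
columns = [[rng.gauss(0, 1) for _ in range(40)] for _ in range(40)]

explore = [col[:20] for col in columns]   # rows used for the search
holdout = [col[20:] for col in columns]   # rows kept back for checking

i, j, r_found = strongest_pair(explore)
r_check = pearson(holdout[i], holdout[j])
print(f"columns {i} and {j}: r = {r_found:.2f} in the searched rows")
print(f"same pair on held-back rows: r = {r_check:.2f}")
```

With 780 pairs to choose from, the search turns up a sizeable correlation even though every column is independent noise, and the held-back rows are what reveal whether it means anything.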
