Analysing Data

goldy

Presentation Transcript


  1. Analysing Data Module I3 Session 1

  2. Contents of this Module • Session 1: Review of concepts from Basic Level • Sessions 2 & 3: Graphical summaries for quantitative data • Sessions 4 & 5: Numerical summaries for quantitative data • Sessions 6 & 7: Processing single and multiple variables • Sessions 8 & 9: Risks and return periods • Sessions 10 to 12: Tables for frequencies and other statistics • Session 13: Introducing statistics packages • Sessions 14 to 16: Coping with common complications • Sessions 17 & 18: Group project • Sessions 19 & 20: Presentation and evaluation

  3. Modules at Basic and Intermediate levels • Module B2 • From the data to the report • Intermediate level • Module I1 – collecting data (follows from B1) • Modules I2 to I4 follow from B2 • Module I2 – organising the data • Module I3 – analysing the data • Module I4 – presenting results

  4. Module Objectives Successful students will be able to: • Use descriptive analysis tools • to answer practical questions. • Produce descriptive statistical analyses • including summary statistics, tables and graphs. • Interpret common summary statistics, • particularly measures of variation. • Produce summary statistics in a range of ways • to suit different types of user. • Suggest ways of coping with common complications when analysing survey data. • Work constructively in a team • to produce an analysis on time. • Evaluate team skills of themselves and others.

  5. Pre-requisites - computing Assumes Basic Level or equivalent • Mainly use of Excel: • Importance of having data in list format • Pivot tables and pivot graphs • Calculations to produce percentages and proportions from frequencies • Familiarity with SSC-Stat add-in • Some familiarity with Word and PowerPoint • Though mainly needed for Module I4

  6. Pre-requisites - statistics Assumes Basic Level or equivalent • What is statistics? • Module B2 Session 3 • Use of CAST • Interactive statistics textbook • Types of data and appropriate summaries • Categorical and numerical • Enthusiasm and no fear • Module B2 showed statistics was logical and not so difficult?

  7. Session Overview • Activity 1: Introduction • This PowerPoint presentation • Activities 2 to 5: Practical • CAST • Excel for Tables • Dot plots in CAST and Excel • The objectives of an analysis • Activity 6: Summary of key ideas • This presentation continued

  8. Learning objective • Answer questions expected of students who have taken module B2.

  9. Practical work • You use CAST • At basic level • To review tables and also dot plots • You use Excel • To produce and edit pivot tables • And to produce dot plots • You view demonstrations • To remind you of Excel • And also statistics • Then the key ideas are discussed • and some of the case studies are re-introduced

  10. This is a problem-based course • Examples are used throughout: • Skills and tools are introduced to solve problems • Survey of Principles of Official Statistics • Used extensively in B2 • Also useful to remember the principles • Are countries applying them yet? • Rice survey data • Used in B2 to illustrate many ideas • and produced in the paddy (simulation) game in I1 • Tanzania and Swaziland agriculture data • Large surveys • Will be used again in this module

  11. CAST • CAST is an electronic textbook • It was used extensively in Module B2 • It covers key topics in an interactive way. • Some from the course • Others related to but not covered by the course • As the course progresses students are expected to • Work independently more and more • Read around • Use books to enrich the course materials

  12. Editing a pivot table How did you do? Was it easy? What questions do you have?

  13. Rice Survey Case Study - objectives Overall objectives are: • To estimate the total production in the district • To examine the relationship with inputs

  14. Analyses corresponding to simple objectives

  15. More complicated objectives • Objectives require analysis of a single column or variable • Some variables are categorical • Others are numerical • Objectives require analysis of multiple variables

  16. Using Excel effectively • Dot plots are not on Excel’s menus • Dot plots are not in Excel’s help • But you decided to do dot plots in Excel! • You therefore need to understand them better • So you can construct them yourself • And this understanding is good anyway • And helps with effective data analysis • It is an example • Of you controlling the software • And not being limited by it • That applies to all software

  17. Jittered dot plots in CAST and Excel • Figure panels: CAST and Excel • Rainfall data: 608, 746, 767, ….. 1395, 1425, 1482

  18. Jittered dot plots in CAST and Excel • Figure panels: CAST and Excel • Why are the vertical heights different in the 2 cases?
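
One likely answer to the question on slide 18 is that jitter is a random vertical offset added only to separate overlapping points, so two tools (or two redraws) will generally place the same value at different heights while the horizontal positions stay fixed. The sketch below illustrates this under stated assumptions: the function name `jitter`, the offset range and the seeds are hypothetical, and only the six rainfall values printed on slide 17 are used (the elided middle values are not reproduced).

```python
import random

# Only the six values shown on slide 17; the elided values are left out.
rainfall = [608, 746, 767, 1395, 1425, 1482]


def jitter(values, height=1.0, seed=None):
    """Pair each value with a random vertical offset between 0 and height."""
    rng = random.Random(seed)
    return [(v, rng.uniform(0, height)) for v in values]


# Two "plots" of the same data, standing in for two different tools.
plot_a = jitter(rainfall, seed=1)
plot_b = jitter(rainfall, seed=2)

# Horizontal position is the data value; vertical position is arbitrary.
for (x, ya), (_, yb) in zip(plot_a, plot_b):
    print(f"{x:5d}  height A = {ya:.2f}  height B = {yb:.2f}")
```

The heights carry no information, which is why they need not agree between the two plots.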

  19. Excel for analysis and training • Excel is not designed as a training resource • Unlike CAST – that is all CAST is for • Excel is to support • data organisation • and analysis • But we used it also to support training • With dot plots • And stem and leaf plots • Neither of which are in the Excel menus

  20. Data exploration • Before and during formal analysis • For all variables • But particularly for numerical variables • That are treated extensively in this module • Review data exploration from Module B2

  21. Dot plots - yield by variety • Outliers (typing errors) are clear, but only because of the 2nd variable • They are not outliers overall
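
The point on slide 21 can be sketched in a few lines: a value may sit comfortably inside the overall range yet be far from the other values of its own variety. The yields, the variety names `TRAD` and `NEW`, and the flagging rule (more than 5 median absolute deviations from the median) are all assumptions made for illustration; they are not the rice survey data or the method used in the course.

```python
from statistics import median

# Hypothetical yields by variety; each group contains one typing error.
yields = {
    "TRAD": [22, 25, 27, 24, 26, 56],
    "NEW": [52, 55, 58, 54, 57, 24],
}


def unusual(values, k=5):
    """Return values more than k median absolute deviations from the median."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return [v for v in values if abs(v - med) > k * mad]


# Ignoring variety: both odd values lie inside the overall spread.
overall = [v for values in yields.values() for v in values]
overall_flags = unusual(overall)

# Using variety as the 2nd variable: the odd values stand out.
group_flags = {variety: unusual(values) for variety, values in yields.items()}

print("Flagged overall:", overall_flags)
print("Flagged by variety:", group_flags)
```

With these made-up numbers nothing is flagged overall, while one value per variety is flagged once the data are split.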

  22. EDA is a continuous process • EDA (exploratory data analysis) is effectively a continuation of the data checking process • The example on the previous slide shows • how some oddities only become clear once the analysis is undertaken • This continues into the formal analysis • where it involves looking at the “residuals” • They are the unexplained variation • As discussed in Module B2 Session 3! • So analysis is not just a set of rules • It is a thoughtful process • Where you become the data detective!
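
As a minimal illustration of "residuals" as unexplained variation, the sketch below takes the simplest possible model, in which each variety is described by its own mean, and computes residual = observed value minus fitted value. The data and variety names are hypothetical, and using the group mean as the fitted value is an assumption chosen for simplicity.

```python
# Hypothetical yields by variety (not the rice survey data).
yields = {
    "TRAD": [22, 25, 27, 24, 26, 56],
    "NEW": [52, 55, 58, 54, 57, 24],
}

residuals = {}
for variety, values in yields.items():
    fitted = sum(values) / len(values)  # the model: one mean per variety
    residuals[variety] = [v - fitted for v in values]

# The largest residual in each group points at the value to investigate.
worst = {
    variety: max(zip(values, residuals[variety]), key=lambda pair: abs(pair[1]))
    for variety, values in yields.items()
}

for variety, (value, resid) in worst.items():
    print(f"{variety}: value {value} has residual {resid:+.1f}")
```

Residuals from a group mean always sum to zero within the group, so it is their size and pattern, not their total, that the data detective examines.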

  23. Swaziland data was for checking

  24. Investigating the column called Presence What does 0 mean? Why are there blanks? Next steps: 1. Look at the questionnaire 2. Select these records You are becoming detectives!

  25. Codes for the column • Seems clear enough • Zeros and blanks still a puzzle

  26. Selecting the blank records • Missing also • Too young and all the same • Crop code not recognised • Areas too large • i.e. serious problems with the whole record

  27. Dot plot of area by Presence • Odd crop areas were ALL associated with odd codes for the column PRESENCE • It was found to be a data transfer problem with one byte missing in these records
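
The detective steps on slides 24 to 27 (select the records with puzzling codes, then look at their other fields) can be sketched as below. Everything in the example is hypothetical: the field names, the values, and in particular the set of valid Presence codes, which the slides do not list.

```python
# Hypothetical records; field names and values are invented for illustration.
records = [
    {"id": 1, "presence": "1", "age": 41, "crop_code": 11, "area": 0.6},
    {"id": 2, "presence": "2", "age": 35, "crop_code": 12, "area": 1.2},
    {"id": 3, "presence": "", "age": 2, "crop_code": 97, "area": 830.0},
    {"id": 4, "presence": "1", "age": 58, "crop_code": 11, "area": 0.9},
    {"id": 5, "presence": "0", "age": 2, "crop_code": 98, "area": 615.0},
    {"id": 6, "presence": "", "age": 2, "crop_code": 97, "area": 770.0},
]

VALID_PRESENCE = {"1", "2"}  # assumed code list, not taken from the questionnaire

# Step 1: select the blank records, then all records with an odd code.
blank = [r for r in records if r["presence"].strip() == ""]
odd = [r for r in records if r["presence"].strip() not in VALID_PRESENCE]

# Step 2: compare crop area across Presence codes.
max_area = {}
for r in records:
    key = r["presence"].strip() or "(blank)"
    max_area[key] = max(max_area.get(key, 0.0), r["area"])

print("Blank ids:", [r["id"] for r in blank])
print("Odd ids:", [r["id"] for r in odd])
print("Largest area by Presence code:", max_area)
```

In this made-up data the implausible areas occur only under the odd codes, which is the kind of pattern that pointed to a whole-record problem in the Swaziland example.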

  28. Tanzania agriculture survey This is the variable we wish to explore. It is a value between 0 and 100

  29. The data in Excel The variable to explore before analysis

  30. How to explore this value • Try a pivot table • a powerful feature in Excel • used previously on categorical data • Used here for a numerical variable

  31. Some results

  32. Drilling down: an example • Make the 6 corresponding to 2% the active cell • Then double click to give the detail • 4 of these values are from the same village, so same enumerator
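
Outside Excel, the same two steps (a frequency table of a numerical variable, then drilling down into one cell) might look like the sketch below. The records are invented so that they mimic the pattern described on slide 32; they are not the Tanzania data, and the village labels are hypothetical.

```python
from collections import Counter

# Hypothetical (village, percentage) records, invented for illustration.
records = [
    ("A", 0), ("A", 10), ("B", 2), ("B", 2), ("B", 2), ("B", 2),
    ("C", 2), ("D", 2), ("C", 50), ("D", 100), ("A", 25), ("C", 50),
]

# Step 1: the "pivot table", a count of records for each distinct value.
freq = Counter(pct for _, pct in records)
for pct in sorted(freq):
    print(f"{pct:3d}%  {freq[pct]}")

# Step 2: "drill down" into one cell by listing the records behind the count.
detail = [(village, pct) for village, pct in records if pct == 2]
by_village = Counter(village for village, _ in detail)
print("Records at 2% by village:", dict(by_village))
```

Counting by village within the selected cell is what reveals that most of the suspicious values share a single source.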

  33. Are you now ready for module I3? To continue to build skills for data analysis
