
Aggregate Data and Statistics



Presentation Transcript


  1. Aggregate Data and Statistics. Wendy Watkins, Carleton University; Chuck Humphrey, University of Alberta; Statistics Canada Data Liberation Initiative

  2. Outline • What are aggregate data? • Why aggregate? • How to aggregate? • Computing exercise

  3. What are aggregate data? Let’s start with the relationship between statistics and data.

  4. Statistics and Data Data: • numeric files created and organized for analysis • require processing • not ready for display Statistics: • numeric facts/figures • created from data, i.e., already processed • presentation-ready

  5. Statistics and Data

  6. Statistics and Data

  7. Statistics and Data In short, statistics are created from data and represent summaries of the detail observed in the data.
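To make the distinction concrete, here is a minimal Python sketch. It assumes a handful of invented respondent records (the field names `region`, `sex`, and `smoker` are hypothetical, not taken from the survey behind the slides): the list of records plays the role of data, and the single number computed from it is a statistic.

```python
# Hypothetical microdata: one record per respondent (the "data").
# Field names and values are invented for illustration.
records = [
    {"region": "Ontario", "sex": "Female", "smoker": True},
    {"region": "Ontario", "sex": "Male", "smoker": False},
    {"region": "Quebec", "sex": "Male", "smoker": True},
    {"region": "Quebec", "sex": "Female", "smoker": True},
    {"region": "Alberta", "sex": "Male", "smoker": False},
]

def count_smokers(rows):
    """Reduce the detailed records to one summary figure (a "statistic")."""
    return sum(1 for row in rows if row["smoker"])

total_smokers = count_smokers(records)
print(total_smokers)  # 3
```

The records need processing before they say anything; the result, 3, is presentation-ready but no longer shows who the individual respondents were.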

  8. What is aggregation? Building on this example, let’s explore aggregation. We see a table with the number of smokers summarized over categories for age, education, sex, geography, and different time points.

  9. [Figure: the smokers table annotated with the labels Categories of Periods, Statistics, Categories of Sex, and Categories of Region.] Age and Education are in the background and display totals.

  10. What is aggregation? Aggregation involves tabulating a summary statistic across all of the categories or levels of a set of variables.

  11. The summary statistic The summary statistic in this example is the total number of smokers.
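The tabulation step can be sketched in Python with `collections.Counter`, counting smokers within each combination of two of the variables. The records and the choice of region and sex as the two variables are assumptions made for illustration; the real table uses five variables.

```python
from collections import Counter

# Hypothetical respondent records: (region, sex, smoker). Values are invented.
records = [
    ("Ontario", "Female", True),
    ("Ontario", "Male", False),
    ("Quebec", "Male", True),
    ("Quebec", "Female", True),
    ("Quebec", "Male", True),
]

def tabulate_smokers(rows):
    """Count smokers (the summary statistic) within each (region, sex) combination."""
    table = Counter()
    for region, sex, smoker in rows:
        if smoker:
            table[(region, sex)] += 1
    return table

table = tabulate_smokers(records)
for (region, sex), count in sorted(table.items()):
    print(region, sex, count)
# Ontario Female 1
# Quebec Female 1
# Quebec Male 2
```

Combinations with no smokers, such as Ontario/Male here, are simply absent from the `Counter`; looking them up returns 0, which matches an empty cell in the printed table.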

  12. Variables and categories The variables and their categories are: • Region (11): Canada and the ten provinces • Age (5): Total, 15-19, 20-44, 45-64, 65+ • Sex (3): Total, Female, Male • Education (4): Total, Some secondary or less, Secondary graduate or more, Not stated • Periods (5): 1985, 1989, 1991, 1994-95, 1996-97

  13. Variables and categories The tabulation consists of determining the combinations of all categories across variables and then counting the number of smokers within each of these combinations: 11 × 5 × 3 × 4 × 5 = 3,300 category combinations.
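The count of combinations is the product of the number of categories in each variable. A short Python sketch using `itertools.product` enumerates them from the category lists on the previous slide; the ten provinces are written as placeholders ("Province 1" and so on) rather than named, since only their number matters here.

```python
from itertools import product
from math import prod

# Categories from the smoking example; the ten provinces are placeholders.
categories = {
    "Region": ["Canada"] + [f"Province {i}" for i in range(1, 11)],
    "Age": ["Total", "15-19", "20-44", "45-64", "65+"],
    "Sex": ["Total", "Female", "Male"],
    "Education": ["Total", "Some secondary or less",
                  "Secondary graduate or more", "Not stated"],
    "Period": ["1985", "1989", "1991", "1994-95", "1996-97"],
}

# Each cell of the full tabulation is one combination of categories.
combinations = list(product(*categories.values()))

print(len(combinations))                          # 3300
print(prod(len(c) for c in categories.values()))  # 3300, i.e. 11 x 5 x 3 x 4 x 5
print(combinations[0])  # ('Canada', 'Total', 'Total', 'Total', '1985')
```

Note that the "Total" categories count as categories in their own right, which is why Age contributes 5 rather than 4 and Sex 3 rather than 2.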

  14. Tabulating or aggregating One might wonder whether there is a difference between tabulating and aggregating. Usually, they are the same thing.

  15. Tabulating = aggregating In creating tables from data, the variables are arranged in various combinations along the columns and the rows.

  16. Tabulating = aggregating Placing multiple variables along the columns or rows is called nesting. Tables may have variables nested on both the columns and rows.
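Nesting one variable within another amounts to pairing every category of the outer variable with every category of the inner one. The Python sketch below builds the column headings for Sex nested within Periods, using the categories listed earlier; the two-line text header is only an illustrative layout, not how any particular table software renders it.

```python
from itertools import product

periods = ["1985", "1989", "1991", "1994-95", "1996-97"]
sexes = ["Total", "Female", "Male"]

# Sex nested within Periods: each period heading spans one column per sex
# category, so the column headings are the ordered pairs (period, sex).
nested_columns = list(product(periods, sexes))

print(len(nested_columns))  # 15
print(nested_columns[:3])   # [('1985', 'Total'), ('1985', 'Female'), ('1985', 'Male')]

# Two header lines: the outer variable on top, the inner variable below it.
top = " ".join(f"{p:<8}" if s == sexes[0] else " " * 8 for p, s in nested_columns)
bottom = " ".join(f"{s:<8}" for _, s in nested_columns)
print(top)
print(bottom)
```

Swapping the arguments to `product` would nest Periods within Sex instead; the same 15 cells appear, only in a different order.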

  17. Categories of Sex nested within Periods

  18. Categories of Education nested within Sex; Categories of Sex nested within Region

  19. A quick summary Up to this point, we have noted that • statistics are created from data • aggregations consist of tabulating statistics within the categories of selected variables • variables may be nested within columns and rows to display these tabulations

  20. What are aggregate data? Q: What is the difference between an aggregation or tabulation and aggregate data? A: The display of the aggregation (that is, the structure of the tabulated output).

  21. Statistical data structure A statistical data structure is a fixed, two-dimensional matrix with the variables in the columns and cases in the rows. [Figure: a grid with variables V1 to V7 as columns and Case 1 to Case 7 as rows.]

  22. Statistical data structure Aggregate data require the same type of statistical data structure. With aggregate data, the variables are the cells of a tabulation while the cases are the categories of one or more of the table’s dimensions.

  23. What are aggregate data? Here is an example of a tabulation of three variables. One variable consists of five levels and will be used to represent the cases in an aggregate data file. The other two variables make up a two-way table and the cells in this table will be the variables in the aggregate data file.
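The shape of the resulting aggregate data file follows directly from this description. A minimal Python sketch, assuming the sizes used in the slides that follow (five levels for Dimension 3, and a two-way table of three rows and two columns), lists the cases and the variable names; the `R<row>C<col>` naming mirrors the cell labels in the slides.

```python
# Sizes of the example: Dimension 3 has five levels; Dimensions 1 and 2 form
# a two-way table with three rows (R1-R3) and two columns (C1, C2).
N_LEVELS, N_ROWS, N_COLS = 5, 3, 2

# Cases of the aggregate data file: one per level of Dimension 3.
cases = list(range(1, N_LEVELS + 1))

# Variables: one per cell of the two-way table, named R<row>C<col>.
# Column-major order (down C1, then down C2) follows the order in the slides.
variables = [f"R{r}C{c}" for c in range(1, N_COLS + 1) for r in range(1, N_ROWS + 1)]

print(cases)      # [1, 2, 3, 4, 5]
print(variables)  # ['R1C1', 'R2C1', 'R3C1', 'R1C2', 'R2C2', 'R3C2']
print(f"{len(cases)} cases x {len(variables)} variables")  # 5 cases x 6 variables
```

The 30 cells of the three-way tabulation (5 × 3 × 2) become a 5 × 6 matrix: no values are lost, they are only rearranged.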

  24. From tabulation to data [Figure: a three-way tabulation drawn as five stacked layers, one for each level (1 to 5) of Dimension 3. In each layer, Dimensions 1 and 2 form a two-way table with three rows and two columns, whose cells are labelled R1C1, R2C1, R3C1, R1C2, R2C2, and R3C2.]

  25. From tabulation to data Start with dimension 3 and convert this dimension into the rows or cases of an aggregate data file.

  26. From tabulation to data [Figure: the same three-way tabulation of Dimensions 1, 2, and 3.]

  27. From tabulation to data [Figure: the stacked two-way tables of Dimensions 1 and 2.]

  28. From tabulation to data [Figure: the new data structure, with the five levels of Dimension 3 (1 to 5) as its rows.]

  29. From tabulation to data Working with the six cells of the tabulation for level 1 of dimension 3, place these six cells as six variables in the new data structure.
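The step for a single level can be sketched in Python as a function that reads the six cells of one layer into six named variables. The counts below are invented, and storing the layer as a nested list indexed `layer[row][col]` is an assumption; `layer_to_variables` is a hypothetical helper, not part of any library.

```python
# Hypothetical counts for level 1 of Dimension 3: a 3 x 2 table stored as
# layer[row][col]. The numbers are invented for illustration.
level_1 = [
    [12, 7],   # row 1: R1C1, R1C2
    [30, 18],  # row 2: R2C1, R2C2
    [9, 4],    # row 3: R3C1, R3C2
]

def layer_to_variables(layer):
    """Turn the cells of one two-way layer into named variables (one case).

    Cells are read down the first column, then down the second, matching
    the order R1C1, R2C1, R3C1, R1C2, R2C2, R3C2 used in the slides.
    """
    n_rows, n_cols = len(layer), len(layer[0])
    return {
        f"R{r + 1}C{c + 1}": layer[r][c]
        for c in range(n_cols)
        for r in range(n_rows)
    }

case_1 = layer_to_variables(level_1)
print(case_1)
# {'R1C1': 12, 'R2C1': 30, 'R3C1': 9, 'R1C2': 7, 'R2C2': 18, 'R3C2': 4}
```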

  30. From tabulation to data [Figure: cell R1C1 of the level 1 layer is picked out of the table of Dimensions 1 and 2.]

  31. From tabulation to data [Figure: R1C1 appears as the first variable of the data structure, whose rows are levels 1 to 5 of Dimension 3.]

  32. From tabulation to data [Figure: cell R2C1 of the level 1 layer is picked out.]

  33. From tabulation to data [Figure: the data structure now has variables R1C1 and R2C1.]

  34. From tabulation to data [Figure: cell R3C1 of the level 1 layer is picked out.]

  35. From tabulation to data [Figure: the data structure now has variables R1C1, R2C1, and R3C1.]

  36. From tabulation to data [Figure: cell R1C2 of the level 1 layer is picked out.]

  37. From tabulation to data [Figure: the data structure now has variables R1C1, R2C1, R3C1, and R1C2.]

  38. From tabulation to data [Figure: cell R2C2 of the level 1 layer is picked out.]

  39. From tabulation to data [Figure: the data structure now has variables R1C1, R2C1, R3C1, R1C2, and R2C2.]

  40. From tabulation to data [Figure: cell R3C2 of the level 1 layer is picked out.]

  41. From tabulation to data [Figure: the data structure now has all six variables, R1C1, R2C1, R3C1, R1C2, R2C2, and R3C2, filled in for level 1.]

  42. From tabulation to data [Figure: the tabulation with the level 1 layer removed; four layers remain.]

  43. From tabulation to data Repeat this for level 2 of dimension 3.

  44. From tabulation to data [Figure: the four remaining layers of the tabulation.]

  45. From tabulation to data [Figure: rows 1 and 2 of the data structure now hold values for all six variables, R1C1, R2C1, R3C1, R1C2, R2C2, and R3C2.]

  46. From tabulation to data [Figure: the tabulation with the level 2 layer also removed; three layers remain.]

  47. From tabulation to data Now for level 3 of dimension 3, etc.
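Carrying the same step through all five levels of Dimension 3 produces the complete aggregate data file. The Python sketch below assumes the tabulation is held as nested lists indexed level, row, column, with invented counts, and writes the result as CSV with the standard `csv` module. The `level` column name, the helper `tabulation_to_rows`, and the CSV output are choices made for illustration, not part of the original presentation.

```python
import csv
import io

# Hypothetical three-way tabulation: tabulation[level][row][col], five levels
# of Dimension 3, each a 3 x 2 table. Counts are invented.
tabulation = [
    [[12, 7], [30, 18], [9, 4]],
    [[10, 6], [28, 20], [11, 5]],
    [[14, 9], [25, 16], [8, 3]],
    [[13, 8], [27, 19], [10, 6]],
    [[11, 5], [29, 17], [7, 2]],
]

def tabulation_to_rows(tab):
    """Convert a level x row x column tabulation into an aggregate data matrix.

    Each level of the outer dimension becomes one case (row); each cell of
    the inner two-way table becomes one variable (column), read column-major.
    """
    n_rows, n_cols = len(tab[0]), len(tab[0][0])
    header = ["level"] + [f"R{r + 1}C{c + 1}" for c in range(n_cols) for r in range(n_rows)]
    rows = []
    for level, layer in enumerate(tab, start=1):
        values = [layer[r][c] for c in range(n_cols) for r in range(n_rows)]
        rows.append([level] + values)
    return header, rows

header, rows = tabulation_to_rows(tabulation)

# Write the aggregate data file as CSV text, one line per case.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(header)
writer.writerows(rows)
print(buffer.getvalue())
```

The first two lines printed are `level,R1C1,R2C1,R3C1,R1C2,R2C2,R3C2` and `1,12,30,9,7,18,4`. Because every case now has the same six variables, the file has the fixed two-dimensional shape of a statistical data structure and can be read by ordinary analysis software.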

  48. From tabulation to data [Figure: the three remaining layers of the tabulation.]

  49. From tabulation to data [Figure: rows 1 to 3 of the data structure now hold values for all six variables.]
