
The Winter Institute on Statistical Literacy for Librarians


Presentation Transcript


  1. The Winter Institute on Statistical Literacy for Librarians Demystifying statistics for the practitioner Anna Bombak, Chuck Humphrey, Larry Laliberte, David Sulz, and Amanda Wakaruk February 20-22, 2014

  2. Outline • Introductions • A framework for understanding statistics • Statistics shaped by geography • Official statistics: national • Official statistics: international • Non-official statistics • Applying what you have learned Day 1 Day 2 Day 3

  3. Introductions: your backgrounds • Please introduce yourself • Your name • Your institutional affiliation • Your job responsibilities

  4. Introductions: your backgrounds • Three-fourths are from academic libraries. The split in earlier Institutes was closer to 50/50. • The largest group, with 10, is from universities other than the U of A. • The second largest group, with 5, is from the U of A. • Chart: Academic: Other universities (10), U. of Alberta (5); Non-academic: Government (3), Public / Special (2) * Work affiliation of one participant unknown

  5. Introductions: your backgrounds • Geographically, 10 of you are from outside Alberta. • Nine are from five other provinces: two ON, two BC, two NB, two SK, one MB. • Eleven are from the Edmonton region. • This is the first year that we have a participant from South Africa. Welcome! • Chart: Alberta (11), Outside Alberta (9), S.A. (1)

  6. Uses of quantitative evidence • To provide a description • This typically entails answering the question about the scale or scope of something observable and its characteristics. • To make a comparison • This usually involves establishing the degree of similarity or dissimilarity among observables. • To identify a relationship • This method looks at the correlation among characteristics of observables, that is, how are things related?

  7. Statistics are ubiquitous “Statistics are generated today about nearly every activity on the planet. Never before have we had so much statistical information about the world in which we live. Why is this type of information so abundant? For one thing, statistics have become a form of currency in today’s information society. Through computing technology, society has become very proficient in calculating statistics from the vast quantities of data that are collected. As a result, our lives involve daily transactions revolving around some use of statistical information.” Data Basics, page 1.1

  8. Statistics: what are we talking about? • Statistics and data are related but different

  9. How statistics and data differ • Statistics: numeric facts and figures; derived from data, i.e., already processed; need definitions and classifications; presentation-ready; published • Data: numeric files; created and organized for analysis or processing; require processing; need detailed documentation; not display-ready; disseminated, not published
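The difference between data and a statistic can be sketched in a few lines of Python. The records below are invented for illustration and do not come from any survey named in this presentation; they stand in for the data, and the single percentage computed from them is the statistic.

```python
# Hypothetical microdata: one record per respondent (invented values).
records = [
    {"id": 1, "region": "East", "smoker": True},
    {"id": 2, "region": "East", "smoker": False},
    {"id": 3, "region": "West", "smoker": False},
    {"id": 4, "region": "West", "smoker": True},
    {"id": 5, "region": "West", "smoker": False},
]


def percent_smokers(rows):
    """Process the data into one presentation-ready figure."""
    if not rows:
        raise ValueError("no records to summarize")
    smokers = sum(1 for row in rows if row["smoker"])
    return round(100 * smokers / len(rows), 1)


statistic = percent_smokers(records)
print(f"{statistic}% of respondents smoke")
```

The list of records needs documentation before anyone can use it, while the printed figure can be published as it stands, provided its definitions travel with it.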

  10. A statistic isn’t real without data • A ‘real’ statistic requires a data source. If the publisher of a statistic can’t tell you the data source behind it, you should question whether the statistic is ‘real.’ After all, people do make up statistics. • Notorious example: In an interview on the December 19, 2010 episode of CBS’ 60 Minutes, Meredith Whitney claimed that 50 to 100 “sizable” cities and counties in the U.S. would default on billions of dollars of municipal bonds. Her estimate sparked a mini-panic on the bond market. She refused to release the report behind these predictions on the grounds that her research is proprietary. Bloomberg revealed on February 1, 2011 that she “doesn’t have any numbers to back up her assertions -- she pulled the numbers out of thin air.”

  11. A statistic isn’t real without data • Some people draw incorrect generalizations from statistics. • Notorious example: Approximately two years ago, during the Republican Party presidential primaries, Rick Santorum claimed on television and on the campaign trail that “62 percent of kids who enter college with some kind of faith commitment leave without it.” Stephen Colbert suggested that this statistic had to be taken “on faith.” Jonathan Hill reported that “Studies using comparable data from recent cohorts of young people (for example, the National Longitudinal Survey of Youth 1997, the National Longitudinal Study of Adolescent Health, and the National Study of Youth and Religion) have found virtually no overall differences on most measures of identity, practice, and belief between those who [go] to college and those who do not.”

  12. A statistic isn’t real without data • A statistic may have been derived from poor quality data and, consequently, may be of limited value. Nevertheless, it remains a ‘real’ statistic. • The desire is to have quality statistics that are derived from quality data. • Notorious example: A long-standing debate erupted over a Lancet article published in 2004 that estimated the number of civilian deaths in Iraq in the 18 months following the invasion to be around 98,000. The Iraq Body Count project compiled a database of reported civilian deaths showing between 11,000 and 13,000 deaths in this same period. The UK government embraced statistics from the Iraq Ministry of Health, which reported 3,853 civilian deaths and 15,517 injuries over six months in 2004.

  13. Statistics Canada’s quality criteria • Statistics Canada uses the following criteria to define quality statistics or statistics “fit for use” • Relevance: addresses issues of importance to users • Accuracy: the degree to which it describes what it was designed to measure • Timeliness: the delay between when the information was collected and when it is made available • Accessibility: the ease with which the information can be obtained by users • Interpretability: access to metadata that facilitates interpretation and use • Coherence: the fit with other statistical information through the use of standard concepts, classifications and target populations

  14. Statistics are about definitions

  15. Concepts and definitions • Six dimensions or variables in this table • Geography: Region • Time: Periods • Social content: Smokers, Education, Age, Sex • The cells in the table are the estimated number of smokers.

  16. Statistics are about definitions! Statistics are dependent on definitions. You may think of statistics as numbers, but the numbers represent measurements or observations based on specific definitions. Tables, a common tool for displaying statistics, are structured around geography, time and content based on the attributes of the unit of observation. These properties all depend on definitions.
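As a rough illustration of how the dimensions of a table define its cells, the sketch below counts smokers in each combination of the chosen dimensions. The respondent records and region names are invented, and the counts are raw tallies rather than the weighted estimates a statistical agency would publish.

```python
from collections import Counter

# Hypothetical respondent records (invented); each key is a defined concept.
respondents = [
    {"region": "Atlantic", "period": "1994-1995", "sex": "Female", "smoker": True},
    {"region": "Atlantic", "period": "1994-1995", "sex": "Female", "smoker": True},
    {"region": "Atlantic", "period": "1994-1995", "sex": "Male", "smoker": False},
    {"region": "Atlantic", "period": "1996-1997", "sex": "Male", "smoker": True},
    {"region": "Prairies", "period": "1994-1995", "sex": "Female", "smoker": False},
    {"region": "Prairies", "period": "1996-1997", "sex": "Male", "smoker": True},
]


def smoker_table(rows, dimensions):
    """Count smokers in each cell defined by the chosen dimensions."""
    cells = Counter()
    for row in rows:
        if row["smoker"]:
            cells[tuple(row[d] for d in dimensions)] += 1
    return cells


table = smoker_table(respondents, ("region", "period", "sex"))
for cell, count in sorted(table.items()):
    print(cell, count)

# Dropping dimensions gives a different view of the same data.
by_sex = smoker_table(respondents, ("sex",))
```

Changing the tuple of dimensions changes the view of the data, which is why a table cannot be read without knowing how each dimension is defined.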

  17. Statistics are about definitions! • Consider the following example from the Canadian Census on the data behind statistics about visible minorities. This table displays the size of the visible minority population in Canada from the 2006 Census. Visible Minority Groups (15), Generation Status (4), Age Groups (9) and Sex (3) for the Population 15 Years and Over of Canada, Provinces, Territories, Census Metropolitan Areas and Census Agglomerations, 2006 Census - 20% Sample Data

  18. Statistics are about definitions! • How is visible minority status identified in the Census? Are aboriginals among the visible minority in Canada? What is the definition of visible minority?

  19. Statistics involve classifications • Classifications • Sex: Total, Male, Female • Periods: 1994-1995, 1996-1997

  20. Statistics involve classifications Some classifications are based on standards while others are based on convention or practice. For example, Standard Geography classifications

  21. Statistics involve classifications • The definitions that shape statistics specify the metric of the data they summarize (for example, Canadian dollars) or the categories used to classify things if a statistic represents counts or frequencies. In this latter case, classification systems are used to identify categories of membership in a concept’s definition. • Examples of standard classifications include the North American Industry Classification System (NAICS), the National Occupational Classification for Statistics (NOC-S) and the International Classification of Diseases (ICD). • Look at these examples and describe the coding systems used.
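One way to describe a coding system is to show how a code nests inside broader categories. The sketch below assumes the hierarchical structure of NAICS, in which each additional digit narrows the category; the function name and the sample code are illustrative only, and no industry titles are looked up.

```python
# Digit lengths and level names of the NAICS hierarchy. The six-digit level
# is country-specific, so "national industry" is used here as a generic label.
NAICS_LEVELS = [
    (2, "sector"),
    (3, "subsector"),
    (4, "industry group"),
    (5, "industry"),
    (6, "national industry"),
]


def naics_hierarchy(code):
    """Split a NAICS code into the prefix that identifies each level."""
    if not (code.isdigit() and 2 <= len(code) <= 6):
        raise ValueError("expected a NAICS code of two to six digits")
    return {
        name: code[:digits]
        for digits, name in NAICS_LEVELS
        if len(code) >= digits
    }


# "111110" is used only as an example of a six-digit code.
for level, prefix in naics_hierarchy("111110").items():
    print(f"{level}: {prefix}")
```

Because the levels are nested, a statistic published for a two-digit sector can be compared with the sum of the more detailed categories that share its prefix.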

  22. Statistics are presentation ready • Tables and charts (or graphs) are typically used to display many statistics at once. You will find statistics sprinkled in text as part of a narrative describing some phenomenon; but tables and charts are the primary methods of organizing and presenting statistics.

  23. A quick review • To this point, we have established that: • Statistics are ‘real’ only if they are derived from data; • Statistics are dependent on definitions of the concepts they summarize; • Statistics that represent counts of things in the data employ classification systems, which are based either on standards or convention; and • Statistics are typically organized for display using tables or charts.

  24. Pre-institute Homework • You received two news articles in your pre-Institute readings. One was from The Globe and Mail about charitable giving by Canadians. Seven questions were asked about this story: • What source is cited as showing Canadians as “a nation of Scrooges?” • What statistics were reported from the Generosity Index for the percentage of giving by Americans and Canadians? • Why does McKenna claim that these statistics are not comparable? • McKenna makes reference to a Statistics Canada survey as a “more comprehensive survey.” Is the name of this survey provided? • Do the statistics from this Statistics Canada source show the percentage of givers and the amounts going up, staying the same, or going down between 2007 and 2010? • What two other sources for statistics on charitable giving does McKenna cite for international comparison? • Is there enough information provided for each of these sources that you would be able to locate the data for each source? If yes, what information is provided?

  25. Pre-institute Homework • The Fraser Institute’s annual Generosity Index [Sources] • Americans: 1.33 percent and Canadians: 0.64 percent [Concept] • Focuses almost exclusively on tax-deductible giving. There is a lot of giving for which a tax receipt is not given in Canada. [Definitions] • No: Canada Survey of Giving, Volunteering and Participating, 2010; 94% come from Charitable giving by Canadians by Martin Turcotte in April 2012 article in Canadian Social Trends [Source] • b. Staying the same (Table 1 from the CST article by Turcotte) • a. 2013 World Giving Index, based on Gallup World Poll data: personal giving b. Committee Encouraging Corporate Philanthropy: 60 multinational corporations • a. Yes: title and source b. Yes: source

  26. Pre-institute Homework • The second reading was from The Vancouver Sun about insolvency and the debt level that becomes the breaking point. Seven questions were asked about this story: • What is the average non-mortgage consumer debt carried by British Columbians? • What is the source cited by Yaffe for statistics on insolvencies? • What percentage of British Columbians became insolvent in 2012 according to Industry Canada? • What was given as the cause of insolvency in Vancouver and the Fraser Valley by Sands & Associates? • What options are reported to have been used by over a quarter of those in the 31 to 54 year old age group to address their credit overextension? • What amount was reported as the overall breaking point of debt when a person tended to pursue the insolvency option? • In what type of business is Sands & Associates?

  27. Pre-institute Homework • $38,682 • Sands & Associates, BC’s largest bankruptcy trustee : 2013 BC CONSUMER DEBT STUDY REPORT ON FINDINGS • 3.2 percent : Annual Consumer Insolvency Rates by Province and Economic Region • Overextended on a home mortgage. • Take out payday loans or apply for even more credit • $25,000 and $50,000 owed • Bankruptcy trustee

  28. Being a critical user of statistics • Who published this statistic? • Can you name the producer or distributor of the data? • Does the publisher identify a data source for this statistic? • Do you have enough information to cite this statistic? • What view of the data is shown in this statistic? • What level of geography is shown? • What time period is shown? • What social characteristics are shown? • Why was this view shown?

  29. Being a critical user of statistics • What concepts are represented in this statistic? • Are definitions provided with the statistic for geography, time or the social characteristics? • Was a standard classification system used for the categories of the statistic? • Can you identify a data source for the statistic? • Is there enough information provided with the statistic to find its data source? • Is there a name for the data source? • Is there a distributor for the data source?

  30. Critique a statistical table • To practice critiquing statistical tables, we will use a table published by Statistics Canada about the average undergraduate tuition fees for full-time students by field of study. • Use “Check List for Critiquing a Statistical Table” to evaluate this table. • “Tips for Reading a Statistical Table” is offered to highlight the features that a table should provide and how this information can help in interpreting the table.

  31. Data as a focus • Statistics: numeric facts and figures; derived from data, i.e., already processed; need definitions and classifications; presentation-ready; published • Data: numeric files; created and organized for analysis or processing; require processing; need detailed documentation; not display-ready; disseminated, not published

  32. WHERE ARE THE DATA!

  33. Microdata

  34. Microdata record layout

  35. Microdata record dictionary
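To make the roles of a record layout and a record dictionary concrete, here is a minimal sketch that assumes a fixed-width microdata file. The layout, the codes and the sample record are all hypothetical and are not taken from the files shown on these slides.

```python
# Hypothetical record layout: variable name -> (start column, width), 1-based.
LAYOUT = {"age": (1, 2), "sex": (3, 1), "smoker": (4, 1)}

# Hypothetical record dictionary (codebook): coded value -> label.
DICTIONARY = {
    "sex": {"1": "Male", "2": "Female"},
    "smoker": {"1": "Yes", "2": "No"},
}


def decode(line):
    """Cut one fixed-width record into variables and label the coded values."""
    record = {}
    for name, (start, width) in LAYOUT.items():
        raw = line[start - 1 : start - 1 + width]
        # Variables without dictionary entries (such as age) keep their raw value.
        record[name] = DICTIONARY.get(name, {}).get(raw, raw)
    return record


print(decode("3421"))
```

Without the layout, the string "3421" is uninterpretable; without the dictionary, the codes have no meaning. This is why data need detailed documentation.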

  36. What about data? • While we are not focusing our attention on data in this workshop, it is helpful to understand some basics about the origins of data, especially since statistics are derived from data. As we will see later, having a good understanding of data can greatly help in the search for statistics. • There are three generic methods by which data are produced. One will find statistics generated from the data arising out of all of these methods.

  37. Methods producing data

  38. Lifecycle production of data • The production of data across these three methods happens through a lifecycle process. Understanding the basics of the lifecycle process in which statistics are derived from data can help in the search for statistics.

  39. Life cycle of survey statistics (diagram: stages numbered 1 to 9; label: Access to Information)

  40. Life cycle of survey statistics (diagram: stages numbered 1 to 9; label: Preserving Information)

  41. Life cycle applied to health statistics (diagram: stages numbered 1 to 9; label: Health Information Roadmap Initiative)

  42. Life cycle applied to health statistics (diagram: stages numbered 1 to 9; label: Health Information Roadmap Initiative)

  43. Reconstructing statistics • One way to see the relationship between statistics and the data from which they were derived is to reconstruct statistics that someone else has produced from data that are publicly accessible.

  44. Reconstructing statistics (diagram: stages numbered 1 to 9; label: Health Information Roadmap Initiative)

  45. Reconstructing statistics • The statistics that we will reconstruct are reported in “Health Facts from the 1994 National Population Health Survey,” Canadian Social Trends, Spring 1996, pp. 24-27. • The steps we will follow are: • identify the characteristics of the respondents in the article; • identify the data source; • locate these characteristics in the data documentation; • find the original questions used to collect the data; • retrieve the data; and • run an analysis to reproduce the statistics.
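The final step, running an analysis to reproduce a published figure, might look roughly like the sketch below. It assumes the retrieved file carries a survey weight for each respondent; the variable names, weights and published value are invented for illustration and are not taken from the National Population Health Survey.

```python
# Hypothetical extract: each respondent has a survey weight (invented values).
extract = [
    {"weight": 1200, "has_characteristic": True},
    {"weight": 800, "has_characteristic": False},
    {"weight": 1000, "has_characteristic": False},
    {"weight": 1000, "has_characteristic": True},
]


def weighted_percent(rows, flag, weight="weight"):
    """Estimate the population percentage with a characteristic using weights."""
    total = sum(row[weight] for row in rows)
    if total == 0:
        raise ValueError("weights sum to zero")
    hits = sum(row[weight] for row in rows if row[flag])
    return 100 * hits / total


def matches_published(computed, published, tolerance=0.5):
    """Allow for rounding in the published figure when comparing."""
    return abs(computed - published) <= tolerance


estimate = weighted_percent(extract, "has_characteristic")
published_value = 55.0  # hypothetical figure standing in for the article's
print(estimate, matches_published(estimate, published_value))
```

If the reconstructed figure falls outside the tolerance, the usual suspects are a different definition of the characteristic, a different population, or a missing weight, which sends you back to the documentation steps above.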
