Exploring Statistics: Real Data Insights & Analysis
Delve into statistical procedures for learning from real data sets, powering critical thinking and knowledge gathering. Understand variability, population sampling, frequency distributions, percentiles, and graphing for data interpretation.
Presentation Transcript
Chapter 1 Why Statistics?
Learning can result from: • Critical thinking • Asking an authority • Religious experience • However, collecting DATA is the surest way to learn about the world
Data in the sciences are messy • At first glance, data often look like an incoherent jumble of numbers • How do we make sense of data? Statistical procedures are tools for learning about the world by Learning from Data.
Real Data! • To help you understand the power and usefulness of statistics, we will explore two real and interesting data sets • “The Smoking Study” • “The Maternity Study”
The Smoking Study • From the University of Wisconsin Center for Tobacco Research and Intervention • 608 participants provided data on smoking, addiction, withdrawal, and how best to quit smoking • The full data set is provided on the CD; a description of the data collected is provided in the appendices of the book
The Maternity Study • From the Wisconsin Maternity Leave and Health Project • 244 families provided data on marital satisfaction, child-rearing styles, and other household events • The full data set is provided on the CD; a description of the data collected is provided in the appendices of the book
Variability • Why are data messy? • Consider a concrete example: Depression scores (“CESD”) for participants in the Smoking Study • Some participants (each has a different ID number) have CESD scores of 0, while others have scores of 2, 11 or 7, or some other value • These data are messy in that the scores are different from one another • Variability is the statistical term for the degree to which scores (such as the depression scores) differ from one another.
Sources of Variability • It is easy to see that depression scores are variable, but why? • Individual differences – some people are more depressed than others; some people have difficulty reading and understanding the questions on the test; some people answer the questions more honestly than others • Procedure – differences in the ways the data were collected • Conditions or treatments – the conditions that are imposed on the participants of the study
Populations and Samples • Statistical Population – a collection or set of measurements of a variable that share some common characteristic • Sample – a subset of measurements from a population • Random sample – a sample selected such that every score in the population has an equal chance of being included
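A simple random sample can be drawn with the standard-library method `random.Random.sample`, which selects items without replacement so that every item has the same chance of being chosen. This is a minimal sketch: it assumes participants are identified by hypothetical ID numbers 1 to 608 (the study's actual ID scheme is not described here), and the function name `simple_random_sample` is illustrative only.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement; every item is equally likely to be chosen."""
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical participant IDs 1..608, for illustration only
participant_ids = list(range(1, 609))
sample_ids = simple_random_sample(participant_ids, 60, seed=1)
print(sorted(sample_ids))
```

Unlike simply taking the first 60 participants, this procedure gives every participant the same chance of ending up in the sample.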
Chapter 2 Frequency Distributions and Percentiles
Variability (revisited) • Collecting Data means measuring a variable • Those measurements differ (vary) from one another • One way to organize and summarize a set of measurements is to construct a frequency distribution • These methods can be applied to both populations and samples
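A frequency distribution lists each distinct score value with the number of times it occurs. The sketch below uses `collections.Counter` and adds relative frequency and cumulative frequency columns, a common table layout; the function name and the scores are hypothetical, not the actual YRSMK data.

```python
from collections import Counter

def frequency_distribution(scores):
    """Return rows of (value, frequency, relative frequency, cumulative frequency)."""
    counts = Counter(scores)
    n = len(scores)
    rows = []
    cumulative = 0
    for value in sorted(counts):      # list score values from lowest to highest
        freq = counts[value]
        cumulative += freq            # number of scores at or below this value
        rows.append((value, freq, freq / n, cumulative))
    return rows

# Hypothetical years-smoking scores, for illustration only
scores = [10, 12, 10, 25, 12, 10, 30, 25]
for value, freq, rel, cum in frequency_distribution(scores):
    print(f"{value:>3}  f={freq}  rel={rel:.3f}  cum={cum}")
```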
Example YRSMK – Number of Years Smoking Daily From the First 60 Participants in the Smoking Study
A Better Summary? YRSMK – Number of Years Smoking Daily From the First 60 Participants in the Smoking Study
Percentiles • We have been focusing on distributions rather than individual scores • Sometimes, individual scores are of great importance • Computing percentiles when n = 608: • The 50th percentile is the “middle” score. It is the 304th sorted score (608 * 0.50 = 304). • For the 32nd percentile, 608 * 0.32 = 194.56, so it is the 195th sorted score.
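The two worked examples above can be reproduced with a short function. This sketch assumes the nearest-rank rule, position = ceil(n * p / 100), which matches both examples (304 and 195); it also assumes p is a whole-number percent from 1 to 100. The function names are hypothetical.

```python
def percentile_position(n, p):
    """1-based position in the sorted scores of the p-th percentile (nearest-rank rule).

    Computes ceil(n * p / 100) with integer arithmetic, so no floating-point
    rounding error can shift the position; p is assumed to be a whole number.
    """
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    return (n * p + 99) // 100

def percentile(scores, p):
    """Score at the p-th percentile: sort, then pick the score at that position."""
    ordered = sorted(scores)
    return ordered[percentile_position(len(ordered), p) - 1]  # convert to a 0-based index

print(percentile_position(608, 50))  # 304
print(percentile_position(608, 32))  # 195
```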
Percentile Rank • The percentile rank of a score is the percent (the proportion times 100) of the measurements in the distribution below that score value • Computing percentile rank for YRSMK (n = 608): • Sort the variable and call the sorted version YRSMK_sorted • Count the scores below the value of interest and divide by 608 • 50 scores are below 9, and 50/608 = 0.082, so 9 is at about the 8th percentile • 246 scores are below 21, and 246/608 = 0.4046, so 21 is at about the 40th percentile
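Because the scores are already sorted, counting how many fall below a value is a binary search: `bisect.bisect_left` returns the index of the first score that is not less than the value, which equals the count of scores strictly below it. A minimal sketch, with a hypothetical function name and made-up scores standing in for YRSMK_sorted:

```python
from bisect import bisect_left

def percentile_rank(sorted_scores, value):
    """Percent of scores strictly below `value`; `sorted_scores` must be ascending."""
    below = bisect_left(sorted_scores, value)  # count of scores less than value
    return 100 * below / len(sorted_scores)

# Hypothetical sorted scores, for illustration only
yrsmk_sorted = [3, 5, 5, 9, 12, 21, 21, 30]
print(percentile_rank(yrsmk_sorted, 9))   # 37.5
print(percentile_rank(yrsmk_sorted, 21))  # 62.5
```

With the real data, the same computation for 9 would give 100 * 50 / 608, or about 8.2.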
Graphing Distributions • Graphing distributions is a very valuable tool for highlighting features of the data • Shape • Range • Central Tendency • Variability
Shape • We classify the shape of distributions in three ways: • Symmetry – is one half a mirror image of the other half? • Skew – are there high/low frequencies of low/high scores? • Modality – how many humps or modes?
Symmetry • Is one half of the distribution a mirror image of the other (along a vertical axis)? • Three examples of symmetrical distributions:
Skew • Negative – low frequencies of low values and high frequencies of high values • Positive – high frequencies of low values and low frequencies of high values
Modality • How many humps (or modes)? • Unimodal (one hump) • Bimodal (two humps)
Characterizing Shape • Asymmetric, negatively skewed, bimodal • Asymmetric, positively skewed, unimodal
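Two of these shape questions can be checked mechanically on a list of frequencies for equally spaced score values. The sketch below is a simplification under that assumption: symmetry means the list reads the same reversed, and a hump is a run of equal frequencies higher than both neighbours. Small bumps caused by noise also count as humps here, so judging modality from real data still calls for judgment. Both function names and the example frequencies are hypothetical.

```python
def is_symmetric(freqs):
    """True if the frequency list is its own mirror image (reads the same reversed)."""
    return list(freqs) == list(freqs)[::-1]

def count_modes(freqs):
    """Count humps: runs of equal frequencies higher than both neighbouring runs."""
    # Collapse plateaus (consecutive equal frequencies) so a flat-topped hump counts once
    collapsed = [f for i, f in enumerate(freqs) if i == 0 or f != freqs[i - 1]]
    humps = 0
    for i, f in enumerate(collapsed):
        left = collapsed[i - 1] if i > 0 else 0  # frequency is 0 outside the observed range
        right = collapsed[i + 1] if i + 1 < len(collapsed) else 0
        if f > left and f > right:
            humps += 1
    return humps

# Hypothetical frequencies for a negatively skewed, bimodal distribution
freqs = [1, 2, 5, 3, 6, 9, 4]
print(is_symmetric(freqs), count_modes(freqs))  # False 2
```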
Central Tendency and Variability • In addition to shape, distributions differ in terms of: • Central tendency – scores near the center of the distribution; where the scores “tend” to be • Variability – the degree to which scores differ from one another; the “spread” of the scores
Comparing Distributions • It is very useful to be able to compare and contrast distributions (name their similarities and differences) • Distributions can differ in shape, central tendency, and variability
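A rough numeric comparison of two distributions can pair a middle score (for central tendency) with the range (for variability). This sketch assumes the middle score follows the 50th-percentile rule from the Percentiles slide, position ceil(n / 2) in the sorted scores, and takes range as highest minus lowest score. The function name and both groups of scores are hypothetical.

```python
def summarize(scores):
    """Return (middle score, range) for a non-empty list of scores."""
    ordered = sorted(scores)
    middle = ordered[(len(ordered) + 1) // 2 - 1]  # ceil(n / 2) as a 1-based position
    spread = ordered[-1] - ordered[0]              # range: highest minus lowest score
    return middle, spread

# Two hypothetical groups, for illustration only
group_a = [4, 6, 7, 7, 8, 9, 10]
group_b = [1, 5, 9, 12, 15, 20, 25]
for name, scores in [("A", group_a), ("B", group_b)]:
    middle, spread = summarize(scores)
    print(f"group {name}: middle score = {middle}, range = {spread}")
```

Here group B has both a higher middle score and a much larger spread than group A; shape differences still need a graph.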
Comparing Distributions How do these distributions differ?