Exploring Statistics: Real Data Insights & Analysis

Chapter 1 Why Statistics?

Learning can result from: • Critical thinking • Asking an authority • Religious experience • However, collecting DATA is the surest way to learn about the world

Data in the Sciences are messy • At first glance, data often look like an incoherent jumble of numbers • How do we make sense of data? • Statistical procedures are tools for learning about the world by learning from data.

Real Data! • To help you understand the power and usefulness of statistics, we will explore two real and interesting data sets • “The Smoking Study” • “The Maternity Study”

The Smoking Study • From the University of Wisconsin Center for Tobacco Research and Intervention • 608 participants provided data on smoking, addiction, withdrawal, and how best to quit smoking • The full data set is provided on the CD; a description of the data collected is provided in the appendices of the book

The Maternity Study • From the Wisconsin Maternity Leave and Health Project • 244 families provided data on marital satisfaction, child-rearing styles, and other household events • The full data set is provided on the CD; a description of the data collected is provided in the appendices of the book

Variability • Why are data messy? • Consider a concrete example: Depression scores (“CESD”) for participants in the Smoking Study • Some participants (each has a different ID number) have CESD scores of 0, while others have scores of 2, 11 or 7, or some other value • These data are messy in that the scores are different from one another • Variability is the statistical term for the degree to which scores (such as the depression scores) differ from one another.

Sources of Variability • It is easy to see that depression scores are variable, but why? • Individual differences • Some people are more depressed than others • Some people have difficulty reading and understanding the questions on the test • Some people answer the questions more honestly than others • Procedure • Differences in the ways the data were collected • Conditions or Treatments • The conditions that are imposed on the participants of the study

Populations and Samples • Statistical Population – a collection or set of measurements of a variable that share some common characteristic • Sample – a subset of measurements from a population • Random sample – a sample selected such that every score in the population has an equal chance of being included
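The definition of a random sample can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `draw_random_sample` and the population values are hypothetical, and the standard-library `random` module stands in for whatever selection procedure a real study would use.

```python
import random

def draw_random_sample(population, n, seed=None):
    """Draw n measurements without replacement; every measurement
    in the population has the same chance of being selected."""
    rng = random.Random(seed)  # seed only makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical population of 20 measurements (not data from either study).
population = list(range(20))
sample = draw_random_sample(population, 5, seed=1)
print(sample)
```

The sample is a subset of the population, which matches the definition of a sample given above.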

Chapter 2 Frequency Distributions and Percentiles

Variability (revisited) • Collecting Data means measuring a variable • Those measurements differ (vary) from one another • One way to organize and summarize a set of measurements is to construct a frequency distribution • These methods can be applied to both populations and samples
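As a rough sketch of how a frequency distribution is built, the following Python counts how often each value occurs and reports it as a relative frequency as well. The function name `frequency_distribution` and the `years` values are assumptions made up for illustration; they are not the YRSMK data from the Smoking Study.

```python
from collections import Counter

def frequency_distribution(scores):
    """Return (value, frequency, relative frequency) rows, sorted by value."""
    counts = Counter(scores)
    n = len(scores)
    return [(value, counts[value], counts[value] / n) for value in sorted(counts)]

# Hypothetical years-of-smoking values, for illustration only.
years = [5, 12, 12, 20, 5, 12, 30, 20]
table = frequency_distribution(years)
for value, freq, rel_freq in table:
    print(value, freq, rel_freq)
```

The same function works whether `scores` holds a whole population or only a sample.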

Example YRSMK – Number of Years Smoking Daily From the First 60 Participants in the Smoking Study

A Better Summary? YRSMK – Number of Years Smoking Daily From the First 60 Participants in the Smoking Study

Graphing Distributions

Percentiles • We have been focusing on distributions rather than individual scores • Sometimes, individual scores are of great importance • Computing Percentiles, when n=608 • The 50th percentile is the “middle” score. It is the 304th sorted score (608 × 0.50 = 304). • For the 32nd percentile, 608 × 0.32 = 194.56, which rounds up to the 195th sorted score.
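The position arithmetic on this slide can be checked with a short Python sketch. It assumes the rule implied by the two examples (multiply n by the proportion and round up to the next whole position); other textbooks and software use different percentile conventions, so treat this as one possible reading. The function names are hypothetical.

```python
import math

def percentile_position(n, p):
    """1-based position of the p-th percentile among n sorted scores,
    rounding up as in the slide's examples (an assumed convention)."""
    return max(1, math.ceil(n * p / 100))

def percentile_score(scores, p):
    """Score found at the p-th percentile position of the sorted scores."""
    ordered = sorted(scores)
    return ordered[percentile_position(len(ordered), p) - 1]

print(percentile_position(608, 50))  # position of the 50th percentile
print(percentile_position(608, 32))  # position of the 32nd percentile
```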

Percentile Rank • The percentile rank of a score is the percent (the proportion times 100) of the measurements in the distribution below that score value • Computing percentile rank for YRSMK: • Sort the variable; call the result YRSMK_sorted • The percentile rank of 9 is 50/608 = 0.082, or about 8%, so 9 is at the 8th percentile • The percentile rank of 21 is 246/608 = 0.405, or about 40%, so 21 is at the 40th percentile
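The definition above translates directly into code: count the measurements below the score, divide by n, and multiply by 100. The sketch below is illustrative only; `percentile_rank` is a hypothetical helper and `scores` is placeholder data, not the YRSMK variable.

```python
def percentile_rank(scores, value):
    """Percent of measurements in scores that fall below value."""
    below = sum(1 for s in scores if s < value)
    return 100 * below / len(scores)

# Placeholder data: ten made-up scores.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
rank_of_4 = percentile_rank(scores, 4)
print(rank_of_4)
```

Applied to the slide's figures, 50 scores below 9 out of 608 gives 100 × 50/608, or roughly 8.2.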

Graphing Distributions • Graphing distributions is a very valuable tool for highlighting features of the data • Shape • Range • Central Tendency • Variability

Shape • We classify the shape of distributions in three ways: • Symmetry – is one half a mirror image of the other half? • Skew – are there high/low frequencies of low/high scores? • Modality – how many humps or modes?

Symmetry • Is one half of the distribution a mirror image of the other (along a vertical axis)? • Three examples of symmetrical distributions:

Skew • Negative – low frequencies of low values and high frequencies of high values • Positive – high frequencies of low values and low frequencies of high values
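One way to put a number on the direction of skew is the moment-based skewness coefficient, which is positive for positively skewed data and negative for negatively skewed data. The slides describe skew only in terms of frequencies, so this coefficient is an added technique, not the deck's own method; the function name and the data are made up for illustration.

```python
def skewness(scores):
    """Moment-based skewness: third central moment divided by the
    second central moment raised to the power 1.5."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    return m3 / m2 ** 1.5

# Illustrative data only.
positive = [1, 1, 1, 2, 2, 3, 10]    # many low values, few high values
negative = [10, 10, 10, 9, 9, 8, 1]  # many high values, few low values
symmetric = [1, 2, 3, 4, 5]

print(skewness(positive), skewness(negative), skewness(symmetric))
```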

Modality • How many humps (or modes)? • Unimodal • Bimodal

Characterizing Shape • Asymmetric, negatively skewed, bimodal • Asymmetric, positively skewed, unimodal

Central Tendency and Variability • In addition to shape, distributions differ in terms of: • Central Tendency - scores near the center of the distributions; where the scores “tend” to be • Variability – the degree to which scores differ from one another; the “spread” of the scores
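To make the distinction concrete, the sketch below compares two made-up sets of scores that share the same center but differ in spread. The choice of mean and median for central tendency and the sample standard deviation for variability is an assumption for illustration; the slide does not name specific measures. It uses only Python's standard `statistics` module.

```python
import statistics

# Two hypothetical sets of scores with the same center and different spread.
tight = [4, 5, 5, 5, 6]
wide = [1, 3, 5, 7, 9]

for name, scores in (("tight", tight), ("wide", wide)):
    center_mean = statistics.mean(scores)   # central tendency
    center_median = statistics.median(scores)
    spread = statistics.stdev(scores)       # variability (sample SD)
    print(name, center_mean, center_median, round(spread, 2))
```

Both sets are centered on the same value, yet the scores in `wide` differ from one another far more than those in `tight`.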

Comparing Distributions • It is very useful to be able to compare and contrast distributions, that is, to name their similarities and differences • Distributions can differ in terms of shape, central tendency, and variability

Comparing Distributions How do these distributions differ?
