Data Analysis for Description Research Methods for Public Administrators Dr. Gail Johnson Dr. G. Johnson, www.ResearchDemystified.org
Simple But Concrete • The Children’s Defense Fund reports on each day in America: • Four children are killed by abuse or neglect • Five children or teens commit suicide • Eight children or teens are killed by firearms • Seventy-five babies die before their 1st birthday • http://www.childrensdefense.org/child-research-data-publications/each-day-in-america.html Dr. G. Johnson, www.ResearchDemystified.org
Simple But Concrete • A million seconds = 11 ½ days • A billion seconds= 32 years • A trillion seconds= 32,000 years Dr. G. Johnson, www.ResearchDemystified.org
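The conversions on this slide can be checked with a few lines of arithmetic. This is a minimal sketch, assuming an average year of 365.25 days (the slide does not say which year length it uses):

```python
# Convert large counts of seconds into more familiar units.
SECONDS_PER_DAY = 60 * 60 * 24
DAYS_PER_YEAR = 365.25  # assumption: average year length, including leap years


def seconds_to_days(seconds):
    return seconds / SECONDS_PER_DAY


def seconds_to_years(seconds):
    return seconds / (SECONDS_PER_DAY * DAYS_PER_YEAR)


million_days = seconds_to_days(1_000_000)
billion_years = seconds_to_years(1_000_000_000)
trillion_years = seconds_to_years(1_000_000_000_000)

print(f"A million seconds is about {million_days:.1f} days")
print(f"A billion seconds is about {billion_years:.0f} years")
print(f"A trillion seconds is about {trillion_years:,.0f} years")
```

The results round to the slide's figures: roughly 11 ½ days, 32 years, and 32,000 years.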
Simple But Concrete • A $700 billion bailout translates into a $2,333 IOU from every person in the U.S. • Or, using a different metric, it comes to $45 per week for each person in the U.S. • Going one step further, it comes out to $6 a day • Framing: are you willing to pay $6 a day to have a functioning financial system? • Read more: http://www.time.com/time/business/article/0,8599,1870699,00.html#ixzz0aqek0mRZ
Going Too Far? • Six dollars a day is also 25 cents an hour, or less than half a penny a minute. • Framing: Would you be willing to pay less than half a penny a minute? • Key Point: Does the comparison point make a difference in what you would be willing to pay? • Read more: http://www.time.com/time/business/article/0,8599,1870699,00.html#ixzz0aqf9HSQ9 Dr. G. Johnson, www.ResearchDemystified.org
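The reframing of the bailout is simple division by different units of time. This sketch assumes a U.S. population of 300 million, which is the figure implied by the $2,333 per person and is not stated in the slides:

```python
BAILOUT = 700_000_000_000   # $700 billion
POPULATION = 300_000_000    # assumption: implied by the $2,333 per-person figure

per_person = BAILOUT / POPULATION   # one-time IOU per person
per_week = per_person / 52          # spread over one year of weeks
per_day = per_person / 365          # spread over one year of days
per_hour = per_day / 24
per_minute = per_hour / 60

for label, amount in [
    ("per person", per_person),
    ("per week", per_week),
    ("per day", per_day),
    ("per hour", per_hour),
    ("per minute", per_minute),
]:
    print(f"{label:>10}: ${amount:,.4f}")
```

The same total looks very different depending on the unit chosen, which is the slide's point about framing.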
Common Descriptive Analysis • Counts: how many • Decennial census • Percents • Women earned 77% of what men earned in 2006, up from 59% in 1970 • Parts of a whole • Percents (75%) and proportions (.75 or three-quarters) Dr. G. Johnson, www.ResearchDemystified.org
Common Descriptive Analysis • But be mindful of “bigger pie” distortions when working with percents and proportions • If the pie grows much faster than the slice, the slice will appear relatively smaller as a percent even though it still grew • Best example is budget deficit as a percent of the GDP: if GDP grows much faster than the budget deficit, it will appear smaller even though it has also grown. Dr. G. Johnson, www.ResearchDemystified.org
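The "bigger pie" distortion is easy to see with numbers. The figures below are hypothetical, chosen only to illustrate the effect; they are not actual deficit or GDP data:

```python
def share_percent(part, whole):
    """Return the part as a percent of the whole."""
    return part * 100 / whole


# Hypothetical values in billions of dollars
deficit_then, gdp_then = 100, 1_000
deficit_now, gdp_now = 150, 2_000

share_then = share_percent(deficit_then, gdp_then)
share_now = share_percent(deficit_now, gdp_now)

print(f"Deficit grew from {deficit_then} to {deficit_now}")
print(f"Share of GDP fell from {share_then:.1f}% to {share_now:.1f}%")
```

The slice grew by half, but because the pie doubled, the slice's share of the pie shrank.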
Common Descriptive Analysis • Rates: number of occurrences that are standardized • Deaths of infants per 100,000 births • Crop yields per acre • Crime rates • Rates provide an apples-to-apples comparison between places of different size or populations Dr. G. Johnson, www.ResearchDemystified.org
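A rate standardizes a count by the size of the population it came from. The two cities below are hypothetical and serve only to show why raw counts can mislead:

```python
def rate_per(occurrences, population, per=100_000):
    """Occurrences per `per` people."""
    return occurrences * per / population


# Hypothetical cities: (reported crimes, population)
city_a = (5_000, 1_000_000)
city_b = (600, 50_000)

rate_a = rate_per(*city_a)
rate_b = rate_per(*city_b)

print(f"City A: {city_a[0]:,} crimes, {rate_a:,.0f} per 100,000 residents")
print(f"City B: {city_b[0]:,} crimes, {rate_b:,.0f} per 100,000 residents")
```

City A has far more crimes in total, but City B has the higher crime rate once population is taken into account.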
Common Descriptive Analysis • Ratio: numbers presented in relationship to each other • Student to teacher ratio: 15:1 • Divide number of students by the number of teachers • 1,500 students and 45 teachers equals a 33 to 1 student to teacher ratio (1,500 divided by 45) Dr. G. Johnson, www.ResearchDemystified.org
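The student to teacher calculation on this slide, written out as a short sketch (the function name is only illustrative):

```python
def ratio_to_one(numerator, denominator):
    """Express numerator:denominator as 'x to 1'."""
    return numerator / denominator


students_per_teacher = ratio_to_one(1_500, 45)
print(f"{students_per_teacher:.0f} to 1")
```

The exact quotient is 33.3, which the slide rounds to 33 to 1.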
Common Descriptive Analysis • Rates of change • Percentage change from one time period to the other • For example: The budget increased 23% from FY 2006 to FY 2007. • Three Steps: • Divide newest data by oldest data • Subtract 1 • Multiply by 100 to get the percentage change
Common Descriptive Analysis • Rates of change: applied • What was the rate of change in the 1992 budget deficit as compared to 1980? • Divide the 1992 budget deficit ($290 billion) by the 1980 budget deficit ($73.8 billion) = 3.93 • 3.93 - 1 = 2.93 • 2.93 x 100 = 293 percent • The budget deficit in current dollars (meaning not adjusted for inflation) increased 293 percent.
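The three steps for a percentage change can be sketched as a small function and applied to the deficit figures from the slide:

```python
def percent_change(oldest, newest):
    """Divide newest by oldest, subtract 1, multiply by 100."""
    return (newest / oldest - 1) * 100


# Budget deficit in billions of current dollars, from the slide
deficit_1980 = 73.8
deficit_1992 = 290

change = percent_change(deficit_1980, deficit_1992)
print(f"The deficit increased {change:.0f} percent")
```

The unrounded result is about 292.95 percent, which matches the 293 percent on the slide.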
Common Descriptive Analysis • Frequency Distributions • Number and percents of a single variable Dr. G. Johnson, www.ResearchDemystified.org
In The News: Women Now Are Majority of College Graduates Dr. G. Johnson, www.ResearchDemystified.org
Interpretation? • How would you interpret these percentages in the comparative trend analysis? • Are you surprised by the changes over time? • Why or why not? Dr. G. Johnson, www.ResearchDemystified.org
Frequency and Percent Distributions • Survey data: analyzed by distributions • How many men and women are in the program? • Distribution of Respondents by Gender: • Male: 100 (33%) • Female: 200 (67%) • Total: 300
Frequency and Percent Distributions • How many men and women are in the program? Write-up: Of the 300 people in this program, 67% are women and 33% are men. Dr. G. Johnson, www.ResearchDemystified.org
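A frequency and percent distribution like this one can be produced by counting each category. The sketch below rebuilds the slide's 100 men and 200 women as a list of raw responses and uses `collections.Counter` from the Python standard library:

```python
from collections import Counter

# Raw survey responses matching the slide: 100 men and 200 women
responses = ["Male"] * 100 + ["Female"] * 200

counts = Counter(responses)
total = sum(counts.values())
percents = {gender: round(n * 100 / total) for gender, n in counts.items()}

for gender in counts:
    print(f"{gender}: {counts[gender]} ({percents[gender]}%)")
print(f"Total: {total}")
```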
Different Analysis Tools For Different Situations • Frequency/percent distributions make sense when working with nominal and ordinal data • But frequency/percent distributions for interval/ratio data can result in a ridiculously long table that is impossible to interpret • If I ask 500 people how many years they lived in an area, I can get a wide range of answers. • For this type of data, I would then look at means, medians, modes to describe that variable.
Describing Distributions • Central tendency • Means, Medians, Modes • How similar are the characteristics? • Example: Use when we want to describe the similarity of the ages of a group of people. • Dispersion • Range, standard deviation • How dissimilar are the characteristics? • Example: how much variation in the ages? Dr. G. Johnson, www.ResearchDemystified.org
Measures of Central Tendency • The 3-Ms: Mode, Median, Mean. • Mode: most frequent response. • Median: mid-point of the distribution • Mean: arithmetic average.
Basic Concepts Revisited • Levels of Measurement • Nominal Level Data: names, categories • E.g. Gender, religion, state, country • Ordinal Level Data: data with an order, going from low to high • E.g. Highest educational degree, income categories, agree-disagree scales • Interval Level Data: numbers with equal intervals but no true zero point • E.g. IQ scores, GRE scores, temperature in Fahrenheit or Celsius • Ratio Level Data: real numbers with a true zero point • E.g. Age, weight, income
Which Measure of Central Tendency to Use? Depends on the type of data you have: • Nominal data: mode • Ordinal data: mode and median • Interval/ratio: mode, median and mean Dr. G. Johnson, www.ResearchDemystified.org
For Interval Or Ratio Data: Which One To Use? • Concept of the Normal Distribution, also called the bell-shaped curve • In a normal distribution, the mean, median and mode should be very similar • Use mean if distribution is normal • Use median if distribution is not normal
Normal Distribution: Bell-Shaped Curve • [Figure: bell-shaped curve with the mean labeled] • http://en.wikipedia.org/wiki/Normal_distribution
Office contributions • $10, $1, $.50, $.25, $.25. • The mean is $2.40 (add up and divide by 5) • The median is $.50 (the mid-point of this distribution) • The mode is $.25 (the most frequently reported contribution) • Best description of contributions is median.
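The three measures for the office contributions can be computed with Python's standard `statistics` module. A minimal sketch using the five amounts from the slide:

```python
import statistics

# Office contributions in dollars, from the slide
contributions = [10.00, 1.00, 0.50, 0.25, 0.25]

mean = statistics.mean(contributions)
median = statistics.median(contributions)
mode = statistics.mode(contributions)

print(f"Mean:   ${mean:.2f}")
print(f"Median: ${median:.2f}")
print(f"Mode:   ${mode:.2f}")
```

The single $10 contribution pulls the mean well above what most people gave, which is why the median is the better description here.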
Salaries • Assume that you had 11 teachers. 10 teachers earned $21,000 per year and one earned $1,000,000. • What would be the best measure to describe this data? Dr. G. Johnson, www.ResearchDemystified.org
Salaries • The average salary would be $110,000. • The median and mode are both $21,000. • The curve would be positively skewed, i.e. the mean is higher than the mode and median • The median would do the best job at describing the center of the salaries
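To see how much one extreme value moves the mean, the salary example can be computed with and without the $1,000,000 earner. This sketch uses the figures from the slide and the standard `statistics` module:

```python
import statistics

# Ten teachers at $21,000 and one at $1,000,000, from the slide
salaries = [21_000] * 10 + [1_000_000]

mean_all = statistics.mean(salaries)
median_all = statistics.median(salaries)
mode_all = statistics.mode(salaries)

# Drop the last entry (the $1,000,000 salary) and recompute the mean
mean_without_outlier = statistics.mean(salaries[:-1])

print(f"Mean:   ${mean_all:,.0f}")
print(f"Median: ${median_all:,.0f}")
print(f"Mode:   ${mode_all:,.0f}")
print(f"Mean without the $1,000,000 salary: ${mean_without_outlier:,.0f}")
```

Removing one salary out of eleven drops the mean from $110,000 to $21,000, while the median does not move at all.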
Skewed Data • Negative skew: The mass of the distribution is concentrated on the right of the figure. It has relatively few low values. The distribution is said to be left-skewed. • Positive skew: The mass of the distribution is concentrated on the left of the figure. It has relatively few high values. The distribution is said to be right-skewed. The $1 million salary pulls the average up. • Wikipedia: http://en.wikipedia.org/wiki/Skewness
Skewed Distributions: Negative and Positive • [Figure: negatively and positively skewed distribution curves] • http://en.wikipedia.org/wiki/File:Skewness_Statistics.svg
Using Means With Survey Data? • Survey data is typically coded using numbers: • Gender: Male is coded 1 • Female is coded 2 • It is faster and less error-prone to code variables using numbers • But the computer could treat these as numbers and will compute a mean if asked • How would you interpret a mean for gender of 1.6? Or a mean for religion of 2.8?
Do Not Use Means With Nominal Data • Gender (and religion) are nominal variables and should only be reported in terms of distributions: • Frequency distribution: 10 men and 12 women • Percentage distribution: 45% men and 55% women Dr. G. Johnson, www.ResearchDemystified.org
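One way to see why a mean is meaningless for nominal data is that it changes when the arbitrary codes are swapped, while the distribution does not. The sketch below uses the slide's 10 men and 12 women; the swapped coding is a hypothetical alternative added for illustration:

```python
import statistics
from collections import Counter

# Slide's coding: Male = 1, Female = 2
coded = [1] * 10 + [2] * 12
# Same people with the codes swapped (hypothetical): Male = 2, Female = 1
recoded = [2] * 10 + [1] * 12

mean_coded = statistics.mean(coded)
mean_recoded = statistics.mean(recoded)
print(f"Mean with Male=1, Female=2: {mean_coded:.2f}")
print(f"Mean with Male=2, Female=1: {mean_recoded:.2f}")

# The frequency and percentage distributions do not depend on the codes
labels = {1: "men", 2: "women"}
counts = Counter(labels[code] for code in coded)
total = sum(counts.values())
percents = {group: round(n * 100 / total) for group, n in counts.items()}
print(counts, percents)
```

The two means differ even though nothing about the group changed; the counts and percentages are the only stable description.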
Using Means With Survey Data? • Scales (very satisfied<->very dissatisfied) are ordinal scales • But they are coded into the computer using numbers • 5 for very satisfied<->1 for very dissatisfied • The computer will compute a mean if asked: • The mean was 3.8 for job satisfaction. • The mean satisfaction with faculty performance was 4.2 on a scale from 1-5 • Grade-point averages are an example of means based on an ordinal scale (A-F, scale of 0-4)
Using Means With Ordinal Data? • There is disagreement in the field, partly based on academic discipline, about whether to use means with ordinal data. • Things like GPA or faculty ratings are often shown as means • It is often helpful for researchers to look at the means initially when working with a lot of data: researchers are looking for unusually high or low means. • It is also true that sometimes it is easier to show the means than the percentage distribution for every variable
Using Means With Ordinal Data? • But most people are more familiar with polling results, which report percent distributions. • We tend to see something like 55% report supporting cap and trade legislation rather than a mean of 3.4 on a scale of 5 (for) to 1 (against). • The decision about whether means or percent distributions are used to report ordinal data should reflect audience preference and ease of audience understanding. • Not an ideological stance Dr. G. Johnson, www.ResearchDemystified.org
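The same ordinal responses can be reported either way. The response counts below are hypothetical, constructed so that they reproduce the slide's illustrative figures of 55% support and a mean of 3.4:

```python
# Hypothetical responses: score -> number of respondents (5 = for, 1 = against)
responses = {5: 35, 4: 20, 3: 10, 2: 20, 1: 15}

total = sum(responses.values())

# Option 1: report a mean score
mean_score = sum(score * n for score, n in responses.items()) / total

# Option 2: report the percent who chose 4 or 5 (treated here as "support")
supporters = sum(n for score, n in responses.items() if score >= 4)
percent_support = supporters * 100 / total

print(f"Mean score: {mean_score:.1f} on a scale of 1 to 5")
print(f"Percent supporting: {percent_support:.0f}%")
```

Both numbers describe the same data; which one to present depends on what the audience will find easier to read.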
Measures of Dispersion • Used with Interval and Ratio Data • Simple Description: The Range • Reported salaries ranged from $21,000 to $1,000,000 • Ages in the group ranged from 18 to 32 • Standard Deviation • Measures the dispersion in terms of the distance from the mean • Small standard deviation: not much dispersion • Large standard deviation: lots of dispersion
Standard Deviation • Normal Distribution: Bell-shaped curve • About 68% of the values fall within 1 standard deviation of the mean • About 95% of the values fall within 2 standard deviations of the mean
Normal Distribution • [Figure: normal curve with the mean at the center, standard deviations marked on either side, and 95% of the distribution highlighted]
Applying the Standard Deviation • Average test score = 60. • The standard deviation is 10. • Therefore, if the scores are normally distributed, about 95% of the scores are between 40 and 80. • Calculation: two standard deviations = 2 x 10 = 20 • 60 + 20 = 80 • 60 - 20 = 40.
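The test score example can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This sketch assumes the scores follow a normal distribution, which the 68% and 95% figures depend on:

```python
from statistics import NormalDist

mean_score = 60
std_dev = 10


def band(mean, sd, k):
    """Return the interval from k standard deviations below to k above the mean."""
    return mean - k * sd, mean + k * sd


low, high = band(mean_score, std_dev, 2)
print(f"Two standard deviations: {low} to {high}")

# Share of a normal distribution that falls inside each band
scores = NormalDist(mu=mean_score, sigma=std_dev)
share_1sd = scores.cdf(70) - scores.cdf(50)
share_2sd = scores.cdf(high) - scores.cdf(low)
print(f"Within 1 standard deviation:  {share_1sd:.1%}")
print(f"Within 2 standard deviations: {share_2sd:.1%}")
```

The exact shares are about 68.3% and 95.4%, which is why the 68% and 95% figures are usually described as approximate.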
Standard Deviation with Means • The Standard Deviation is used with interval/ratio level data • Typically, standard deviations are presented with means so the reader can tell whether there is a lot or a little variation in the distribution. • Note: the standard deviation is sometimes used in other statistical calculations, such as z-scores and confidence intervals Dr. G. Johnson, www.ResearchDemystified.org
Describing Two Variables Simultaneously • Cross-tabulations (cross tabs, contingency tables) • Used when working with nominal and ordinal data • It provides great detail Dr. G. Johnson, www.ResearchDemystified.org
Describing Two Variables Simultaneously Detail about the race and gender of the 233 people in the workplace: Dr. G. Johnson, www.ResearchDemystified.org
Describing Race and Gender • Write-up: Of the 233 employees, the greatest proportion are white women (31%) followed by white men (21%). Fifteen percent of the employees are black men and 11% are black women, and 14% are men of other race identity and 6% are women of other race identity. Dr. G. Johnson, www.ResearchDemystified.org
Describing Two Variables Simultaneously • Comparison of Means • Used when one variable is nominal or ordinal, and the second variable is interval/ratio level of measurement. • Examples: • Men in the MPA program have a GPA of 3.2 as compared to 3.0 for women. • The mean overall citizen satisfaction score is 4.2 this year as compared to 3.5 last year. • Mean salary for women was $35,000 as compared to $38,000 for men last year.
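A comparison of means groups the interval/ratio variable by the categories of the nominal variable and averages within each group. The individual GPAs below are hypothetical, chosen only so that the group means match the 3.2 and 3.0 in the first example:

```python
import statistics

# Hypothetical records: (gender, GPA)
records = [
    ("men", 3.0), ("men", 3.2), ("men", 3.4),
    ("women", 2.8), ("women", 3.0), ("women", 3.2),
]

# Collect the GPAs for each category of the nominal variable
by_group = {}
for group, gpa in records:
    by_group.setdefault(group, []).append(gpa)

# Compute one mean per group
group_means = {
    group: round(statistics.mean(values), 1)
    for group, values in by_group.items()
}

for group, mean_gpa in group_means.items():
    print(f"Mean GPA for {group}: {mean_gpa}")
```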
Key Points • These simple descriptive analysis techniques can be effective: • They illuminate, provide feedback, inform and might persuade. • The math is generally straightforward. • Descriptive data is generally easy for many people to understand as compared to more complex statistics (stay tuned). • Complex statistics are not inherently better!
The Tough Question • If descriptive data is distorted, it tends to be in the way things are being counted and measured. • The math is usually correct. • Example: The federal debt is often presented just in terms of percent of debt held by the public, but the total debt includes money borrowed from other government funds. • As a result, the debt looks smaller than it actually is.
The Tough Question • If descriptive data is distorted, it tends to be in the way things are being counted and measured. The math is usually correct. • Example: Health insurance profits look different when calculated as a percent of corporate revenue than when calculated as a percent of all spending on health care. • They will look smaller when presented as a percent of all health care spending, which is larger than just corporate insurance revenue.
The Tough Question • Always ask: what exactly is being measured and counted? • Consider whether there are other ways of counting and other ways of doing the analysis that might yield different results (or create different perceptions). • Do the choices reflect a political agenda? Dr. G. Johnson, www.ResearchDemystified.org
Creative Commons • This powerpoint is meant to be used and shared with attribution • Please provide feedback • If you make changes, please share freely and send me a copy of changes: • Johnsong62@gmail.com • Visit www.creativecommons.org for more information