Chapter 8 A Statistics Primer

Chapter 8A Statistics Primer Winston Jackson and Norine Verberg Methods: Doing Social Research, 4e

Level of Measurement • Measures can be designed to have a higher, more complex level or a more basic, rudimentary level • Influenced by how the variable is conceptualized • Gender: can have only two categories (males and females) • Age: can be age (year of birth) or age groups (age categories, e.g., adolescent, young adult, etc.) • Influences choice of statistical analysis • Shown on Table 8.11 on page 222 • Related to measurement error (Chapter 13) © 2007 Pearson Education Canada

Three Levels of Measurement Nominal: involves no underlying continuum; assignment of numeric values arbitrary • Examples: religious affiliation, gender, etc Ordinal: implies an underlying continuum; values are ordered but intervals are not equal. • Examples: community size, Likert items, etc. Ratio: involves an underlying continuum; numeric values assigned reflect equal intervals; zero point aligned with true zero. • Examples: weight, age in years, % minority © 2007 Pearson Education Canada

Example of Ordinal Level Measure The population of the place I considered my hometown when growing up was: Rural area 1 town under 5,000 2 5,000 to 19,999 3 20,000 to 99,999 4 100,000 to 999,999 5 1,000,000 or over 6 © 2007 Pearson Education Canada

Examples of Ratio Level Measures In the following items, circle a number to indicate the extent to which you agree or disagree with each statement. I would quit my present job if I won $1,000,000 through a lottery. Strongly Disagree 1 2 3 4 5 6 7 8 9 Strongly Agree I would be satisfied if my child followed the same type of career as I have. Strongly Somewhat Neither Agree Somewhat Strongly Disagree Disagree nor Disagree Agree Agree 1 2 3 4 5 © 2007 Pearson Education Canada

Describing an Individual Variable • Statistics provide ways to describe and compare sets of observations (e.g., income levels, infant mortality, morbidity, crime, etc.) • Two common ways of describing a distribution (a set of scores in a data set) • Measures of central tendency • Measures of dispersion © 2007 Pearson Education Canada

Median • The midpoint • Used to describe central tendency of ordinal level data • Calculated by ordering a set of values and then using the middlemost value (in cases of two middle values, calculate the mean of the two values). • Often used when a data set has extreme cases © 2007 Pearson Education Canada

Mode • The most frequently occurring value • Used to describe central tendency of nominal level data (gender, religion, nationality) TABLE 8.8 DISTRIBUTION OF RESPONDENTS BY COUNTRY COUNTRY NUMBER PERCENT Canada 65 34.9 ←mode New Zealand 58 31.2 Australia 63 33.9 TOTAL 186 100.0 © 2007 Pearson Education Canada

Measures of Dispersion • Indicates dispersion or variability of values • Are scores close together or spread out? • Three common measures of dispersion: • Range • Standard deviation • Variance © 2007 Pearson Education Canada

The standard deviation measures the average amount of deviation from the mean value of the variable The variance is the standard deviation squared Standard Deviation and Variance © 2007 Pearson Education Canada

Table 8.10 Computation of Standard Deviation, Beth’s Grades Note: The “N – 1” term is used when sampling procedures have been used. When population values are used the denominator is “N.” SPSS uses N – 1” in calculating the standard deviation in the DESCRIPTIVES procedure. © 2007 Pearson Education Canada

Standardizing Data • Standardizing data facilitates making comparisons between units of different size • Also, can standardize data to create variables that have similar variability (Z scores) • Standardization of data is commonly done • Several methods of standardizing data: proportions, percentages, percentage change, rates, ratios © 2007 Pearson Education Canada

Proportions • A proportion represents the part of 1 that some element represents. Proportion female = Number female Total persons Proportion female = 31,216 58,520 Proportion female = .53 The females represent .53 of the population © 2007 Pearson Education Canada

Percentage • A percentage represents how often something happens per 100 times • A proportion may be converted to a percentage by multiplying by 100 • Females constitute 53% of the population © 2007 Pearson Education Canada

Percentage Change • Percentage change is a measure of how much something has changed over a given time period. • Percentage change is: Time 2 – Time 1 x 100 Time 1 • Example: percentage change in number of women in selected occupations (Table 8.13, p. 223) © 2007 Pearson Education Canada

Rates • Rates represent the frequency of an event for a standard-sized unit. Divorce rates, suicide rates, crime rates are examples. • So if we had 104 suicides in a population of 757,465 the suicide rate per 100,000 would be calculated as follows: SR = 104 x 100,000 = 13.73 757,465 There are 13.73 suicides per 100,000 © 2007 Pearson Education Canada

Ratios • A ratio represents a comparison of one thing to another. • So if there are 200 burglaries per 100,000 in the U.S. and 57 per 100,000 in Canada, the U.S./Canadian burglary ratio is: US Burglary Rate = 200 = 3.51 Canadian Burglary Rate 57 © 2007 Pearson Education Canada

Normal Distribution • Much data in the social and physical world are “normally distributed”; this means that there will be a few low values, many more clustered toward the middle, and a few high values. • Normal distributions: • symmetrical, bell-shaped curve • mean, mode, and median will be similar • 68.28% of cases ± 1 standard deviation of mean • 95.46% of cases ± 2 standard deviations of mean © 2007 Pearson Education Canada

Areas Under the Normal Curve • Can determine what proportion of cases fall between two values or above/below a value Steps: • Draw normal curve, marking mean and SD, and including lines to represent problem • Calculate Z score(s) for the problem • Look up value on Table 8.17, page 230 • Solve problem. Recall that .5 of cases fall above the mean, and .5 below the mean • Convert proportion to percentage, if needed © 2007 Pearson Education Canada

Other Distributions • Not all variables are normally distributed • Bimodal: two overlapping normally distributed plots • weight (females will have lower average rates) • Leptokurtic: little variability • distribution appears tall and peaked • Platykurtic: great deal of variability • distribution appears flat and wide • Having a normal distribution is important for doing tests of statistical significance (Table 8.18) © 2007 Pearson Education Canada

Describing Relationships Among Variables Involves three important steps: • Decide which variable is to be treated as dependent variable and independent variable • Decide on the appropriate procedure for examining the relationship • Perform the analysis © 2007 Pearson Education Canada

Methods Selection of statistical method depends upon the level of measurement of the dependent and independent variables • Contingency tables: Crosstabs • Comparing means: means analysis • Correlational analysis: correlation © 2007 Pearson Education Canada

Contingency Tables: Crosstabs • A contingency table cross-classifies cases on two or more variables to show the relation between an independent and dependent variable • Uses a nominal dependent variable and an ordinal or nominal independent variable A standard table looks like the one on the following slide. © 2007 Pearson Education Canada

Table 8.19 Plans to Attend University by Size of Home Community If a test of significance is appropriate for the table, the value for the raw Chi-Square value (which will be introduced in Chapter 9), the degrees of freedom, whether the test is one- or two-tailed, and the probability level should be indicated. © 2007 Pearson Education Canada

Rules for Constructing a Contingency Table • In table titles, name the dependent variable first • Place dependent variable on vertical plane • Place independent variable on horizontal plane • Use variable labels that are clear • Run percentages toward the independent variable • Report percentages to one decimal point • Report statistical test results below table • Interpret the table by comparing categories of the independent variable • Minimize categories in control tables © 2007 Pearson Education Canada

Comparing Means: Means • Used when dependent variable is ratio • Comparison to categories of independent variable (nominal or ordinal) • Both t-test and ANOVA may be used (Chapter 9) Presentation may be as shown on the following slide. © 2007 Pearson Education Canada

Table 8.22 Mean Income by Gender If a test of significance is appropriate for the table, the values for the t-test or F-test value, the degrees of freedom, whether the test is one- or two-tailed, and the probability level should be indicated. © 2007 Pearson Education Canada

Correlational Analysis: Correlation • Correlational analysis is a procedure for measuring how closely two ratio level variables co-vary together • Basis for more advanced procedures: partial correlations, multiple correlations, regression, factor analysis, path analysis and canonical analysis • Advantage: can analyze many variables (multivariate analysis) simultaneously • Relies on having ratio level measures © 2007 Pearson Education Canada

Two Basic Concerns • What is the equation that describes the relation between two variables? • What is the strength of the relation between the two? Two visual estimations procedures • The linear equation: Y = a + bX • Correlation coefficient: r © 2007 Pearson Education Canada

The Linear Equation • The linear equation, Y = a + bX, describes the relation between the two variables • Components: Y - dependent variable (e.g., starting salary) X - independent variable (e.g., years of post-secondary education) a - the constant, which indicates where the regression line intersects the Y-axis b - the slope of the regression line © 2007 Pearson Education Canada

Step 2: Insert a straight regression line From the regression line one can estimate how much one has to change the independent variable in order to produce a unit change in the dependent variable A. The Linear Equation (cont’d) © 2007 Pearson Education Canada

Step 3: Observe where the regression line crosses the Y axis; this represents the constant in the equation (a = 1.33 on Figure 8.6) Step 4: Draw a line parallel to the X axis and one parallel to the Y axis to form a right-angled triangle Measure the lines; divide the horizontal distance into the vertical distance to compute the b value (72/91 = 0.79) A. The Linear Equation (cont’d) © 2007 Pearson Education Canada

Step 5: If the slope of the regression is such that it is lower on the right-hand side, the b coefficient is negative, meaning the more X, the less Y. If the slope is negative, use a minus sign in your equation Y= a–bX Step 6: Write the equation: Y = 1.33 + 0.79(X) The above formula is our visually estimated equation between the two variables Equation used to predict the value of a Y variable given a value of the X variable Done in regression analysis A. The Linear Equation (cont’d) © 2007 Pearson Education Canada

B. Correlation Coefficient: A Visual Estimation Procedure • Goal: to develop a sense of what correlations of different magnitudes look like • Correlation coefficient (r) is a measure of the strength of the association between two variables • Vary from +1 to –1 • Perfect correlations are rare • Usually presented by a decimal point, as in .98, .56, –.32 • Negative correlations ~ negative slope © 2007 Pearson Education Canada

B. Correlation Coefficient (cont’d) Figure 8.9 graphs 8 relationships • Graphing allows you to visually estimate the strength of the association • The closer the plotted points are to the regression line (e.g., Plots 1 and 2), the higher the correlation (.99 and .85) • Greater spread (e.g., Plots 3 and 4) ~ lower correlation (.53 and .36) • Would be difficult to draw regression line if r < .36 © 2007 Pearson Education Canada

B. Correlation Coefficient (cont’d) • Plots 5 and 6: curvilinear: not linear, hence r = 0 • Procedure not appropriate for curvilinear relations • Plots 7 and 8: problem plots: deviant cases • This is one of the reasons it is important to plot relationships; extreme values indicate a non-linear relationship, therefore linear regression procedure are not appropriate for studying these relationships © 2007 Pearson Education Canada

Calculating the Correlation Coefficient • The estimation of the correlation coefficient takes two kinds of variability into account: • Variations around the regression line • Variations around the mean of Y r2 = 1 – variations around regression variations around mean of Y • Can calculate (see p. 245); computer programs used by most researchers today © 2007 Pearson Education Canada

Other Correlation Procedures • Spearman Correlation • Appropriate measure of association for ordinal level variables • Partial Correlation • Measures the strength of association between two variables while simultaneously controlling for the effects of one or more additional variables • Also varies from +1 to –1 • Commonly used by social researchers © 2007 Pearson Education Canada

Chapter 8 A Statistics Primer

Chapter 8 A Statistics Primer

Presentation Transcript

A Pepper Primer

A Primer

Chapter 5 A Primer of Molecular Biology

Descriptive Statistics Primer

Chapter 8, part A

Chapter 8 A Statistics Primer

CHAPTER 6 Bond Primer

Chapter Three - A Networking Primer

A Primer

Subjectivity – A Primer

A primer on statistics

Statistics Class 8

A Primer

Chapter 9 : A Primer on Medical Malpractice

Statistics Primer

AP Statistics Chapter 8 Practice Problems

§8. Bose Statistics and Fermi Statistics

Digital Media Primer Chapter 8

AP Statistics Chapter 8

Inferential Statistics (Part IV) Chapter 8 Significance Using Inferential Statistics

Chapter 8 – part A

CHAPTER 8 — Unit A