Bivariate Statistics (continued) & Multivariate Statistics)

Bivariate Statistics (continued) & Multivariate Statistics) Chapter 11

Today’s Topics • Bivariate Statistics (Measures of Association) • Trivariate Statistics (control variables, partials) • Constructing tables & interpreting descriptive statistics • Different Styles of Presentation • Preparation for Lab activities and work on Final Assignment

Recall (Lecture 2) *Types of variables* • independent variable (cause) • dependent variable (effect) • intervening variable • (occurs between the independent and the dependent variable temporally) • control variable • (temporal occurance varies, illustrations later today)

Causal Relationships • proposed for testing (NOT like assumptions) • 5 characteristics of causal hypothesis(p.128) • at least 2 variables • cause-effect relationship (cause must come before effect) • can be expressed as prediction • logically linked to research question+ a theory • falsifiable

- + X X Y Y Types of Correlations & Causal Relationships between Two Variables X=independent variable Y=dependent variable • Positive Correlation (Direct relationship) • when X increases Y increases or vice versa • Negative Correlation (Indirect or inverse relationship) • when X increases Y decreases or vice versa • Independence • no relationship (null hypothesis) • Co-variation • vary together ( a type of association but not necessarily causal)

Interpreting a Relationship between two variables • Do the patterns in the tables mean that there is a relationship between the two variables? • If there is a relationship, how strong is it? Are the results statistically significant? Are the results meaningful ? • Methods for studying relationships • Create Cross-tabulations & percentaged tables • Create Graphs, charts (like scattergrams or plots) • Use a set of statistics called measures of association

Using Graphs to show relations (royal wedding and electricity use)

Five Common Measures of Association between Two Variables

General Idea of Statistical Significance • In general English ‘significance’ means important or meaningful but this is NOT how the term is used in statistics • Tests of statistical significance show you how likely a result is due to chance.

Interpreting Statistical Significance Levels • “The most common level, used to mean something is good enough to be believed, is .95. This means that the finding has a 95% chance of being true. However, this value is also used in a misleading way. • No statistical package will show you "95%" or ".95" to indicate this level. • Instead it will show you ".05," meaning that the finding has a five percent (.05) chance of not being true, which is the converse of a 95% chance of being true. • To find the significance level, subtract the number shown from one. For example, a value of ".01" means that there is a 99% (1-.01=.99) chance of it being true.”

Example Source: http://www.surveysystem.com/signif.htm

Discussion of Example • “Refer to the preceeding table. The chi (pronounced kie like pie) squares at the bottom of the table show two rows of numbers. • The top row numbers of 0.07 and 24.4 are the chi square statistics themselves. The meaning of these statistics may be ignored for the purposes of this article. The second row contains values .795 and .001. These are the significance levels.”

Discussion • “In this table, there is probably no difference in purchases of gasoline X by people in the city center and the suburbs, because the probability is .795 (i.e., there is only a 20.5% chance that the difference is true). • In contrast the high significance level for type of vehicle (.001 or 99.9%) indicates there is almost certainly a true difference in purchases of Brand X by owners of different vehicles in the population from which the sample was drawn. “

Correlation Coefficient • statistics which can help to describe data sets which contain variables measured at the interval and ratio levels. • measures of association between two (or more) variables. • tests whether a relationship exists between two variables. • indicates both the strength of the association and its direction (direct or inverse). • Pearson product-moment correlation coefficient, written as r, can describe a linear relationship between two variables.

Correlation Coefficient • Is there a relationship between: • the budget of the police department and the crime rate? • the hours of batting practice and a player's batting average? • The value of r can range from 0.0, indicating no relationship between the two variables, to positive or negative 1.0, indicating a strong linear relationship between the two variables.

Reading and Constructing TablesSome Important Factors in Interpretation of Tables • percentages vs. “raw” frequencies, • need to know absolute number of cases (N=) • Importance of direction of calculation of percentages • (for bivariate and multivariate statistics) • collapsing categories (recall treatment of missing cases) • Standardization (using rates or ratios) to summarize data • Introducing a third variable (“intervening” or “control variable”) to see what effect this has on apparent relationship between two variables

Using Measures of Central Tendancy & Standardization to Study Earnings by Sex Babbie, E. (1995). The practice of social research Belmont, CA: Wadsworth

In his book on reading statistical tables, Say it with Figures, Hans Zeisel presents the following data: Automobile Accidents by Sex ------------------------------------------ Per Cent Accident Free Women 68% (6,950) Men 56% (7,080) ------------------------------------------ Automobile Accidents by Sex and Distance Driven ---------------------------------------------------------------------------- Distance Under 10,000 km 10,000 km & Over Per Cent Per Cent Accident Free Accident Free Women 75% 55% (5,035) (1,915) Men 75% 51% (2,070) (5,010) ---------------------------------------------------------------------------- Women in this study have fewer accidents than men because women tend to drive shorter distances than men, and people who drive shorter distances tend to have fewer accidents

Discussion of other ways to present same data & talk about observations • styles of presentation of cross-tabulations • Conventions & ease of interpretation of different ways of calculating percentages and presenting bivariate relations • Examination of frequency distributions of univariate statistics • Introduction of third variable • Examples of ways of describing the relationships in words

Table 1a: Experience of Accidents in the Past Year by Gender of Driver (with column frequencies & percentages for independent variable) In this study 68% of the women had been accident free as compared to 56% of the men. Are women better drivers than men?

Table 1b: Automobile Accidents by Gender of Driver (with row frequencies & percentages for independent variable)

Commentary • Based on findings from this survey female drivers are more likely than men to have been accident-free in the past year. The majority of women in the sample (68%) had not had accidents in the past year. A smaller percentage of male drivers (56%) declared they had been accident free in the past year.

Table 2a: Experience of Car Accidents by Gender of Driver (with row frequencies & percentages for dependent variable)

Table 2b: Experience of Car Accidents by Gender of Driver (with column frequencies & percentages for dependent variable)

Comments on Tables 2a & 2b • Table 2a and 2b present finding from this survey which indicate that 63% of drivers who experienced at least one car accident in the past year were male and only 37% were female. • Does this mean men are more likely to experience car accidents than women?

Table 3a: Experience of Car Accidents by Gender of Driver and Distance Driven Annually

Comments on Table 3a • If we control for “distance driven annually” the differences between the percentage of accident free drivers by gender disappears entirely for people who drive shorter distances annually. In the case of people who drive longer distances annually (10,000 km or more) there is only a small difference between men and women. • Overall, people who drive shorter distances are more likely to be accident-free than people who drive longer distances, regardless of gender.

continued • People who drove less than 10,000 km annually had the same likelihood of being accident free regardless of gender. Three quarters (75%) of female drivers who drive less than 10,000 km annually remained accident-free and three quarters of male drivers (75%),

continued • Female drivers who drove longer distances were only slightly more likely than male drivers to have remained accident-free: 55% of the females who drove 10,000k m annually or more had remained accident-free as compared to 51% of male drivers who drove 10,000km or more.

Table 3b: Experience of car accident(s) by Gender of Driver and Distance Driven Annually (row percentages—but a different breakdown)

Comments on Table 3B • In this study male drivers who drove longer distances comprise 55% of drivers who had experienced at least one accident in the past year. What would you say about other patterns in this table concerning the trends related to the three variables? (Having at least one accident last year, gender and distance driven?)

Table 1a: Frequency Distribution of Gender of Drivers • Males represent a slightly higher proportion of the respondents (55%) than females (45%) in the sample survey.

Table 1b: Frequency Distribution of Drivers with and without accidents in the past year • The majority of drivers (61%) had not had any accidents in the past year.

Table1c: Frequency Distribution of Distance Driven Annually • Over half of the drivers in the sample (59%) drive 10,000 km or over annually.

Table 2c : Experience of Car Accident(s) in Past Year by Distance Driven Annually (row percentages) • Driving longer distances annually increased the likelihood of experiencing at least one car accident in the past year. Almost three quarters of drivers who experienced at least one accident in the past year drove longer distances.

More comments on 2c • Driving shorter distances decreased the likelihood of having had a car accident in the past year. However there were more people in the sample who drove longer distances (13391 in the category “10,000 km or more’ as compared to 9473 who drove less than 10,000 km annually.) The percentage of drivers who drove longer distances was larger (61% of the drivers in the survey drove 10,000 km or more.)

Table 2d: Distance Driven Annually by Gender (Column percentages)

Comments on Table 2d • Women drivers in the sample drove shorter distances annually than men. Two-thirds of women in the sample (66%) drove less than 10,000 km annually, while 78% of men in the sample drove 10,000 km or more.

Table 2e: Distance Driven Annually by Gender (row percentages)

Table 3: Experience of car accident(s) by Gender of Driver and Distance Driven Annually (column percentages)

Table 3: Experience of Car Accidents by Gender of Driver and Distance Driven Annually

Comments on 2e • The majority of drivers who drove less than 10,000 annually were women (71%). Almost three quarters (74%) of drivers who drove 10,000 were male.

Comparison with Ziezel’s Tables Automobile Accidents by Sex ------------------------------------------ Per Cent Accident Free Women 68% (6,950) Men 56% (7,080) ------------------------------------------ Automobile Accidents by Sex and Distance Driven ---------------------------------------------------------------------------- Distance Under 10,000 km 10,000 km & Over Per Cent Per Cent Accident Free Accident Free Women 75% 55% (5,035) (1,915) Men 75% 51% (2,070) (5,010) ---------------------------------------------------------------------------- Women in this study have fewer accidents than men because women tend to drive shorter distances than men, and people who drive less frequently tend to have shorter distances.

Ways of Presenting of Percentaged Tables • Table 1. Percentage in support of strike by type of school • Percent supporting • Type of School Strike • Secondary 60% • (800) • Elementary 30% • (1000) • __________________________________________________________ • N = 1800 Dependent Variable Independent Variable Serial Number Descriptive Caption Variable One category of dichotomous dependent variable Categoriess Marginals for independent variable Total Sample

Percentaged Tables (cont’d) Dependent Variable Independent Variable • Table 2. Percentage who support strike by type of school and sex • Sex • Female Per cent Male Per cent • Type of Schoolsupporting strikesupporting strike • Secondary 60% 60% • (400) (400) • Elementary 30% 30% • (900) (100) • __________________________________________________________ Female = .30 : Male = .30 N = 1800 Control variable Categories of control variable Control variable

General portal to SPSS tutorials online: http://www.spsstools.net/spss.htm crosstabs (two variables) and ‘layering’ (introducing variables) http://calcnet.mth.cmich.edu/org/spss/V16_materials/Video_Clips_v16/14crosstabs/14crosstabs.swf Using SPSS to ConstructTables

Example of Raw Data Partials & Reading Marginals Regan, T. (1985). In search of sobriety: Identifying factors contributing to the recovery from alcoholism. Kentville, NS.

Research Question: Is TRUST is related to RACE? TRUST = dependent variable RACE = independent variable Source: http://www.ssric.org/trd/text/chapter8 Another Example

At first glance, RACE differences appear to be very important (overall, 58% of those surveyed said people cannot be trusted, but the epsilon statistic -- the difference between the highest and lowest percentage -- is 22). Also note that few Respondents said “Depends” – most had a definite opinion here. Comments

“Let’s do some recoding: RACE should be recoded into a different variable called RACER (Race Recoded). Whites and Blacks will stay the same, but Other is eliminated by recoding it as missing. Let’s also recode TRUST into a different variable called TRUSTR to eliminate the “Depends” category. Don’t forget to create new value labels after you recode. Now run the crosstabs for TRUSTR and RACER. Your output should look like Figure 8-4. “ (See http://www.ssric.org/trd/text/chapter8 for details) Recoding

Bivariate Statistics (continued) & Multivariate Statistics)