Review for Final Examination

COMM 550X Final Examination, May 12, 11 am-1 pm

Practice for the Mid-Term
• Multiple choice portion of the test: There will be 50 multiple choice questions chosen at random from this pool of possible test questions. Each item will be worth 1 point.
• SPSS DATA ANALYSIS: You will be tested in SPSS on bivariate correlation, multiple regression, and MANOVA/discriminant analysis. The questions will use the data sets statelevel.sav and NationsoftheWorldModified.sav. The questions will have point values as follows: bivariate correlation, 10 points; multiple regression, 18 points; MANOVA/discriminant analysis, 22 points.

Sample Test Question for Bivariate Correlation (8 points) • Using the NationsoftheWorldModified.sav data set, test the hypothesis that there is a significant positive association between a country’s civil liberties score and the annual number of peace demonstrations in that country. Set your significance level (alpha) at .05. Report the obtained value of the test statistic, the N, the df, and the probability level, and state whether or not you can reject the null hypothesis of no association between the two variables.

Testing the Hypothesis
• You have been asked to see if there is a significant association between two variables. For tests where both variables are interval level or better and no causal relationship between the two is implied, the appropriate test statistic to compute is the bivariate correlation. You are looking for a significant value of Pearson’s r, the correlation coefficient.
• In SPSS Data Editor, open the NationsoftheWorldModified.sav data file.
• Go to Analyze/Correlate/Bivariate and put the two variables, civil liberties score and number of peaceful political demonstrations, into the Variables window.
• Select a one-tailed test (you do this because you have made a prediction about the direction of the relationship, that it will be positive) and flag significant correlations.
• Under Correlation Coefficients select Pearson and click OK.
• Compare your output to the next slide.

SPSS Output for Bivariate Correlation • You only get a small amount of output for bivariate correlation. Note the correlation coefficient (.077), the sample size (N = 112), and the significance level (.208). The df is equal to N - 2 for Pearson’s r, so here df = 110. • Before you did the test, you set your significance level (alpha) to .05, so p (the probability level) needed to be smaller than .05 for you to reject the null hypothesis. But your obtained value of Pearson’s r has a significance level of .208. Consequently, you cannot reject the null hypothesis, and you are not able to confirm your research hypothesis that there is a significant positive association between a country’s civil liberties score and the number of its peaceful political demonstrations.
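The decision rule on this slide can also be checked by hand. The following is a minimal Python sketch, not SPSS output: it converts the reported r and N into a t statistic with df = N - 2 and integrates the t density numerically to get the one-tailed probability. The function names are hypothetical, and because r is rounded to .077 the result only approximates the .208 that SPSS prints.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t), using Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3
    return upper if t >= 0 else 1 - upper

def pearson_one_tailed(r, n):
    """Return (df, t, p) for a one-tailed test that predicts a positive r."""
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    return df, t, t_upper_tail(t, df)

# Values reported on the slide: r = .077, N = 112
df, t, p = pearson_one_tailed(r=0.077, n=112)
reject_null = p < 0.05  # alpha = .05
print(df, round(t, 2), round(p, 2), reject_null)
```

Because the probability is well above .05, the sketch reaches the same decision as the slide: the null hypothesis is not rejected.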

Writing up your Result • “Bivariate correlation analysis was performed to test the hypothesis that a country’s civil liberties score was positively associated with its number of peaceful political demonstrations. The obtained value of Pearson’s r was .077 (N = 112, df = 110, p = .208, one-tailed test), which was not significant. Consequently, we cannot reject the null hypothesis that there is no association between a country’s civil liberties score and its number of peaceful political demonstrations, and our research hypothesis was not confirmed.” (Note: if the significance level had fallen below .05, then you would have confirmed your research hypothesis only if the sign of the association between the two variables was positive, as predicted, that is, if the obtained correlation coefficient was positive.)

Sample Test Question for Multiple Regression • You are asked to test the hypothesis that a country’s score on the civil liberties index is a function of a linear combination of three variables: (1) percentage of seats in the lower legislative house held by the largest party, (2) percentage of the work force who are women, and (3) percentage of the voting age population who voted in the last election. You believe that these variables are of importance in the order listed above. Further, you expect that the sign of the first predictor, percentage of seats, will be negative, and the signs of the other two predictors will be positive. Test the hypothesis and then write an equation for predicting the score of a new case on the civil liberties index based on the three variables. Set your significance level (alpha) to .05. Report the test statistic, N, df, and obtained probability level, and all other statistics appropriate to determining whether or not you have used the procedure correctly. State whether or not your data support rejecting the null hypothesis that civil liberties is unrelated to the three variables, and whether they confirm your research hypothesis.

Testing the Hypothesis • To test this hypothesis, you need a procedure which looks at the relationship between a single, interval or better level variable on the one hand and multiple interval level or better predictors on the other. This is multiple regression. Since your theory has given you a reason to order the importance of your predictors ahead of time, you choose a hierarchical regression analysis where you enter the variables into the regression equation in the order of their presumed importance.

SPSS Procedure for Multiple Regression
• Download the NationsoftheWorldModified.sav data file.
• Go to Analyze/Regression/Linear.
• Move civil liberties score into the Dependent box.
• Now we are going to enter variables one at a time, in the order predicted by our theory. Move your first-to-enter variable, percentage of seats in the lower legislative house held by the largest party, into the Independent box and click Next.
• Move your second-to-enter variable, percentage of the work force who are women, into the Independent box and click Next.
• Finally, move your third-to-enter variable, percentage of the voting age population who voted in the last election, into the Independent box. DON’T click Next again.
• Make sure the Enter option is selected under Method.
• Under Statistics, select Estimates, Confidence Intervals, Model Fit, R squared change, Descriptives, Part and Partial Correlation, and Collinearity Statistics, and click Continue.
• Under Options, check Include Constant in the Equation, click Continue and then OK. You are doing this so you will be able to write the equation for predicting new cases’ civil liberties scores from raw scores on the predictor variables.
• Compare results to next slides.

SPSS Output: The Variables and their Order of Entry • Look for this box to make sure you have done the hierarchical regression form of multiple regression and that your variables have been entered in the order predicted by your theory

The Regression Model Summary Table • Next, look for your model summary. Note that there are three models examined, and the notes a, b, and c tell which of your predictors are in each model. Note that Model 1, with only the percent of seats in the lower legislative house variable entered, was significant (F = 52.544, p < .001), and when the percentage of labor force who are women variable was added in Model 2, the increase in R square, the percent of variance accounted for, was significant (F = 6.346, p < .014). Thus the two-variable model is significantly correlated with Y. Note that Model 3 didn’t change R square significantly (p = .471); that is, it didn’t improve prediction significantly, so you really don’t need the third predictor, percent of voting age population who voted in the last election. You choose Model 2.

Regression Statistics; R and R Square • Note the statistics for the model you have chosen, Model 2. The multiple correlation R between civil liberties score and the two predictors is .659. The proportion of variance in the civil liberties score accounted for by the combination of the two variables (R square) is .435.

Overall Significance of the Regression Equation • Look in the ANOVA table to get the overall F value for the model you have chosen. The F (2, 81) value for the two-variable combination of percent of seats held by largest party and percent of labor force who are women is 31.158, p < .001.
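The overall F can be cross-checked against R square with the standard identity F = (R² / k) / ((1 - R²) / df residual). The sketch below is a minimal Python illustration using only the rounded values reported on the slides; the function name is hypothetical, and the small gap from 31.158 comes from rounding R square to three decimals.

```python
def overall_f(r_square, n_predictors, df_residual):
    """Overall F = (R^2 / k) / ((1 - R^2) / df_residual)."""
    return (r_square / n_predictors) / ((1 - r_square) / df_residual)

R = 0.659         # multiple correlation for Model 2 (from the slides)
R_SQUARE = 0.435  # R square for Model 2 (from the slides)

f_value = overall_f(R_SQUARE, n_predictors=2, df_residual=81)

print(round(R ** 2, 3))   # should be close to the reported .435
print(round(f_value, 2))  # should be close to the reported 31.158
```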

Standardized and Unstandardized Coefficients; Multicollinearity • Continue to examine your output. Note the standardized and unstandardized coefficients. You can use the unstandardized coefficients to write the regression equation Y = 6.194 - .039 (percent of seats held by largest party) + .034 (percent of labor force who are women). You can use the standardized coefficients to compare the relative contributions of percent of seats and percent of women (-.620 and .210, respectively), and note that both standardized coefficients were significantly different from zero. Note also that the sign of the standardized coefficient for percentage of seats was negative, as predicted by your theory, and that the sign of the other variable was positive, as predicted. You can also report your tolerance and VIF statistics, which suggest that multicollinearity was not a problem (tolerance is 1.0, and VIF is nowhere near 10).
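To see how the unstandardized coefficients are used to score a new case, here is a minimal Python sketch. The coefficients are the ones reported on the slide; the function names and the example country (50 percent of seats, 40 percent women in the labor force) are hypothetical and only illustrate the arithmetic. The second function uses the standard relationship VIF = 1 / tolerance.

```python
# Unstandardized coefficients for Model 2, as reported on the slide
INTERCEPT = 6.194
B_SEATS = -0.039  # percent of seats held by largest party
B_WOMEN = 0.034   # percent of labor force who are women

def predict_civil_liberties(pct_seats_largest_party, pct_women_in_labor_force):
    """Predicted civil liberties score from raw predictor scores."""
    return (INTERCEPT
            + B_SEATS * pct_seats_largest_party
            + B_WOMEN * pct_women_in_labor_force)

def vif_from_tolerance(tolerance):
    """VIF is the reciprocal of tolerance."""
    return 1.0 / tolerance

# Hypothetical new case, not taken from the data set
example_score = predict_civil_liberties(50.0, 40.0)
print(round(example_score, 3))   # 6.194 - 1.95 + 1.36
print(vif_from_tolerance(1.0))   # tolerance of 1.0 implies VIF of 1.0
```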

Writing up Your Multiple Regression Results • “To test the hypothesis that a country’s civil liberties score was significantly related to a linear combination of the percentage of seats in the lower legislative house held by the largest party, the percentage of women in the labor force, and the percentage of the voting age population who voted in the last election, a multiple regression analysis was conducted. It was expected that the variable ‘percentage of seats held by the largest party’ would be negatively correlated with civil liberties score and the other two variables positively related. Results of the regression analysis indicated that a two-variable model which included percentage of seats in the lower legislative house held by the largest party and percentage of women in the labor force was significantly correlated with civil liberties scores (F (2, 81) = 31.158, p < .001). Addition of the third variable to the predictive model did not significantly increase the amount of variance accounted for in civil liberties score (F = .525, p = .471). The two-variable combination accounted for approximately 43.5% of the variance in civil liberties score. (continued on next slide)

Writing up Your Multiple Regression Results, cont’d • The best fitting regression equation for predicting civil liberties score from the two variables was civil liberties score = 6.194 -.039 percent of seats held by largest party + .034 percent of labor force who are women. Significant standardized coefficients (βs) were obtained for the two variables (-.620 for percent of seats held by the largest party and .210 for percentage of women in the labor force), indicating that countries with higher scores on civil liberties would be likely to have a smaller percentage of seats in the lower legislative house held by the largest party and a larger percentage of women in the labor force, as predicted. Tolerance and VIF for the two-variable model were both equal to 1.0, indicating that multicollinearity was not an issue. Thus we can say that partial support for the hypothesis was obtained.”

Sample Test Question for Discriminant Analysis • Now we are going to test the following hypothesis: Southern and non-Southern states differ significantly on a combination of two types of traffic fatality: restrained and unrestrained motor vehicle accidents, such that Southern states will have a significantly higher value on the combined indicators than non-Southern states.

Testing the Hypothesis • Both discriminant analysis and MANOVA can be used in the case where you have two or more interval or better level predictors (DVs in the usage of MANOVA) and a nominal level grouping variable (IV in the usage of MANOVA). In this case we have a nominal level grouping variable (Southern/non-Southern) and interval level (actually ratio level) DVs or discriminating variables (traffic fatality variables). • We are going to use discriminant analysis to do the MANOVA, which (1) will give the identical result in the case where there are only two groups (two levels of the grouping variable) and (2) will let us practice doing discriminant analysis and evaluating the efficacy of the discriminant function. We are going to be looking for a significant value of Wilks’ lambda as an indicator of significant differences and support for the hypothesis. It is also necessary for the signs of the discriminant function coefficients to be in the same direction as that predicted for the two variables (a positive relationship with “Southernness”).

SPSS Procedure for Discriminant Analysis
• Download the file statelevel.sav.
• In SPSS Data Editor, open the data file statelevel.sav.
• Go to Analyze/Classify/Discriminant.
• In the Group box put South (dummy) and set the maximum and minimum values to 1 and 0, respectively.
• In the Independents box, put restrained motor vehicle deaths per 100k and unrestrained motor vehicle deaths per 100k.
• Make sure that the Enter Independents Together button is checked.
• Under Statistics, check Means, Univariate ANOVAs, Box’s M, and Unstandardized function coefficients, and click Continue.
• Under Classify, select Summary Table and Territorial Map, click Continue, and then OK.
• Compare your output to the next few slides.

Examining Your SPSS Output: Group Means • First, look at the group means. Note that the means are in the expected direction with levels of the two vehicle death variables higher in the South than in the non-South. Univariate F tests show that the differences are significant for both of the variables. So you have significant differences in the expected direction on both of your variables considered separately

Box’s M Test for Equality of Group Covariances, and Significance of Wilks’ Lambda Overall Test • Next, look at your Box’s M test for the equality of group covariances. Box’s M is not significant, which means you have met one of the assumptions of MANOVA, that the group covariances for the levels of the grouping variable are equal. Now look at the value of Wilks’ lambda, and assess it for significance. Wilks’ lambda equals .783 and is significant by the Chi-square test. If we interpret this significant value of Wilks’ lambda in a MANOVA-like way, we have confirmed the hypothesis that Southern and non-Southern states differ significantly on the combination of the two motor vehicle predictors. (If we were interpreting this in a discriminant analysis type of way, we would say that the combination of two types of traffic related fatalities left .783 of the variance in Southern state-ness “unexplained”.) Wilks’ lambda is one of those measures you want to be close to zero, so this result is statistically significant, but not all that impressive.
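The Chi-square test of Wilks’ lambda can be approximated by hand with Bartlett’s formula, Chi-square = -(N - 1 - (p + g) / 2) ln(lambda), with df = p(g - 1). The Python sketch below is only an illustration under stated assumptions: N = 50 is inferred from the univariate F (1, 48) tests reported in the write-up, the function names are hypothetical, and lambda is rounded to .783, so the result will differ slightly from the 11.478 that SPSS prints. For df = 2 the upper-tail probability has the exact closed form exp(-x / 2).

```python
import math

def bartlett_chi_square(wilks_lambda, n_cases, n_predictors, n_groups):
    """Bartlett's Chi-square approximation for Wilks' lambda, with its df."""
    multiplier = n_cases - 1 - (n_predictors + n_groups) / 2
    chi_square = -multiplier * math.log(wilks_lambda)
    df = n_predictors * (n_groups - 1)
    return chi_square, df

def chi_square_upper_tail_df2(chi_square):
    """P(X > chi_square) for df = 2 ONLY, where it equals exp(-x / 2)."""
    return math.exp(-chi_square / 2)

# lambda = .783 from the slide; N = 50 is an assumption (see lead-in)
chi_square, df = bartlett_chi_square(0.783, n_cases=50,
                                     n_predictors=2, n_groups=2)
p_value = chi_square_upper_tail_df2(chi_square)

print(round(chi_square, 2), df, round(p_value, 3))
```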

The Canonical Correlation • From your printout you will also want to report the canonical correlation between the combination of the two traffic fatality variables and South/non-South, which is .465. This represents the correlation of the grouping variable (South/non-South) with the new canonical variable formed by weighting the two original predictors (traffic fatalities belted and unbelted) by the weights from the discriminant function. You would use these weights to classify new cases as to South/non-South, but you don’t usually report the equation for classifying new cases in the write-up when you are using MANOVA or discriminant analysis to test a hypothesis about group differences.
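With only one discriminant function (two groups), the canonical correlation and Wilks’ lambda are two views of the same quantity: lambda = 1 - (canonical correlation)². The short Python sketch below checks that the two reported values agree; the function name is hypothetical, and the inputs are the rounded values from the slides.

```python
def wilks_lambda_from_canonical_r(canonical_r):
    """For a single discriminant function, lambda = 1 - Rc squared."""
    return 1 - canonical_r ** 2

CANONICAL_R = 0.465            # reported canonical correlation
REPORTED_WILKS_LAMBDA = 0.783  # reported Wilks' lambda

explained = CANONICAL_R ** 2   # share of group variance explained
implied_lambda = wilks_lambda_from_canonical_r(CANONICAL_R)

print(round(explained, 3), round(implied_lambda, 3))
```

The squared canonical correlation is the share of variance in group membership that the discriminant function accounts for, and lambda is the share left unexplained.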

Discriminant Function Coefficients, Group Means on Functions • You would report the standardized discriminant function coefficients to show the relative contribution of each of the two predictors, which in this case are about equal, and both positively associated with the discriminant function, as required for support of your hypothesis. Then you would report the group means (centroids) on the discriminant function, which show that the South is highly positively correlated with it (i.e., being a Southern state is highly correlated with higher vehicle deaths) and the non-South is negatively correlated with it.

Classification Results • Finally, you would report the re-classification results (that is, the results of using the discriminant function coefficients to create a new, canonical variable out of the old predictors and using this new variable to re-classify cases as to South or non-South) and the most frequently occurring misclassifications; e.g., 78% of the cases were correctly re-classified based on the discriminant function. Proportionally, slightly more errors were made re-classifying the Southern than the non-Southern cases.

Writing up your Discriminant Analysis Result • “A discriminant analysis was conducted to perform a multivariate analysis of variance test of the hypothesis that Southern states differ from non-Southern states on a linear combination of two types of traffic fatality, restrained motor vehicle accidents and unrestrained motor vehicle accidents, such that Southern states will have a significantly higher value on the combined indicators than non-Southern states. The obtained value of Wilks’ lambda, .783, was significant at p < .003 (Chi-square = 11.478, df = 2, Box’s M = 1.912, n.s.). The canonical correlation between the grouping variable and the new canonical variable composed of the two predictors was .465. Significant univariate differences of means between Southern and non-Southern states were also obtained for restrained motor vehicle accidents (F (1, 48) = 6.567, p < .014) and unrestrained motor vehicle accidents (F (1, 48) = 5.664, p < .021). Mean differences were in the expected direction: means for restrained motor vehicle accidents were 10.65 for Southern states and 8.62 for non-Southern states; means for unrestrained motor vehicle accidents were 10.69 for Southern states and 7.54 for non-Southern states.

Writing up Your Discriminant Analysis Result, cont’d • Table 1 presents the standardized discriminant function coefficients. Higher scores on the discriminant function corresponded to higher traffic fatality rates for both of the discriminating variables. Table 2 presents the group centroids on the discriminant function; the Southern states group had a high, positive centroid with respect to the function, corresponding to higher rates of traffic fatalities. Table 3 presents the results of the re-classification analysis, which shows that the discriminant function was successful in reclassifying 78% of the cases.”
