
Extension

The General Linear Model with Categorical Predictors. Regression can handle different types of predictors, and in the social sciences we are often interested in differences between groups. For now we will concern ourselves with the case of two independent groups.


Presentation Transcript


  1. Extension The General Linear Model with Categorical Predictors

  2. Extension • Regression can actually handle different types of predictors, and in the social sciences we are often interested in differences between groups • For now we will concern ourselves with the two independent groups case • E.g. gender, Republican vs. Democrat, etc.

  3. Dummy coding • There are different ways to code categorical data for regression; in general, to represent a categorical variable you need k-1 coded variables • k = number of categories/groups • Dummy coding involves using zeros and ones to identify group membership, and since we only have two groups, one group will be coded 0 (the reference group) and the other 1
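
A minimal R sketch of the k-1 rule (the variable name party and its values are made up for illustration): a three-level factor needs two coded variables, and R's model.matrix() builds them automatically using dummy (treatment) coding.

    # hypothetical grouping variable with k = 3 categories
    party <- factor(c("dem", "rep", "ind", "dem", "rep"))

    # an intercept column plus k - 1 = 2 dummy columns;
    # the first level ("dem") serves as the reference group
    model.matrix(~ party)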

  4. Dummy coding • Example data:

       Group   Outcome
         0        3
         0        5
         0        7
         0        2
         0        3
         1        6
         1        7
         1        7
         1        8
         1        9

  • The thing to note at this point is that we have a simple bivariate correlation/simple regression setting
  • The correlation between group and the DV is .76
  • This is sometimes referred to as the point-biserial correlation (rpb) because of the categorical variable
  • However, don’t be fooled: it is calculated exactly the same way as the Pearson correlation, i.e. you treat that 0/1 grouping variable like any other variable in calculating the correlation coefficient
  • However, the sign is arbitrary, since either group could have been coded 1 or 0, and so that needs to be noted
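
A minimal R sketch of this example (the object name dat is mine): entering the ten observations from the table and applying the ordinary Pearson correlation to the 0/1 variable, which gives the point-biserial correlation.

    # the data from the table above
    dat <- data.frame(
      Group   = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 1),
      Outcome = c(3, 5, 7, 2, 3, 6, 7, 7, 8, 9)
    )

    # the usual Pearson formula applied to the 0/1 grouping variable
    cor(dat$Group, dat$Outcome)   # about .76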

  5. Example • Graphical display (scatterplot of Outcome by Group with the fitted regression line) • The R-square is .76² = .577 • The regression equation is Ypred = 4 + 3.4(Group)
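
The corresponding fit in R, reusing dat from the sketch above (the object name fit is mine):

    # regress the outcome on the 0/1 dummy
    fit <- lm(Outcome ~ Group, data = dat)
    coef(fit)                  # (Intercept) 4.0, Group 3.4
    summary(fit)$r.squared     # about .577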

  6. Example Look closely at the descriptive output compared to the coefficients. What do you see?
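
One way to get those descriptives in R, reusing dat from above:

    # per-group means of the outcome
    tapply(dat$Outcome, dat$Group, mean)   # group 0: 4.0, group 1: 7.4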

  7. The constant • Note again our regression equation • Recall the definition for the slope and constant • First the constant: what does “when X = 0” mean here in this setting? • It means we are in the 0 group • What is that predicted value? • Ypred = 4 + 3.4(0) = 4 • That is the group’s mean • The constant here is thus the reference group’s mean
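
The same prediction in R, reusing fit from above:

    # predicted value for the reference (0) group is the intercept,
    # i.e. that group's mean
    predict(fit, newdata = data.frame(Group = 0))   # 4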

  8. The coefficient • Now think about the slope • What does a ‘1 unit change in X’ mean in this setting? • It means we go from one group to the other • Based on that coefficient, what does the slope represent in this case (i.e. can you derive that coefficient from the descriptive stats in some way?) • The coefficient is the difference between means
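
Checking that against the descriptives in R, reusing dat and fit from above:

    # the difference between the group means equals the dummy-coded slope
    diff(tapply(dat$Outcome, dat$Group, mean))   # 7.4 - 4.0 = 3.4
    coef(fit)["Group"]                           # 3.4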

  9. The regression line • The regression line covers the values represented • i.e. 0 and 1, for the two groups • It passes through each of their means • Using least squares regression, the regression line always passes through the mean of X and the mean of Y, though the mean of X here is nonsensical • The constant (if we are using dummy coding) is the mean for the zero (reference) group • The coefficient is the difference between means
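
A quick check in R, reusing dat and fit from above: the fitted line evaluated at the mean of X (here 0.5, not a meaningful group value) returns the mean of Y.

    predict(fit, newdata = data.frame(Group = mean(dat$Group)))   # 5.7
    mean(dat$Outcome)                                             # 5.7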

  10. Comparison to the t-test • Furthermore, the previous gives the same results we would have gotten via a t-test, to which we are about to turn • However, you can now see it is not a distinct procedure from regression: it is a linear model of some outcome predicted by a grouping variable

       Two Sample t-test

       data:  Outcome by Group
       t = 3.3024, df = 8, p-value = 0.01082
       95 percent confidence interval:
         1.025823  5.774177
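
A minimal R sketch of the same comparison, reusing dat and fit from above. Note that t.test() here subtracts the group 1 mean from the group 0 mean, so its t statistic comes out as -3.3024; as noted earlier, the sign is arbitrary.

    # classic pooled-variance two-sample t-test
    t.test(Outcome ~ factor(Group), data = dat, var.equal = TRUE)

    # the t statistic for the dummy-coded slope is the same test
    coef(summary(fit))["Group", "t value"]   # 3.3024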

  11. Summary • Understanding the basics regarding the general linear model can go a long way toward one’s ability to understand any analysis • It not only holds here specifically but is utilized in more complex univariate and multivariate analyses, and even in some nonlinear situations (e.g. logistic regression), where we use ‘generalized’ linear models • Y = Xb + e • For properly specified models, linear models provide reasonable fits and an intuitive understanding relative to more complex approaches.
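
For this example, the X in Y = Xb + e is just a column of ones (for the constant) plus the 0/1 dummy; reusing fit from above, R shows it directly:

    # the design matrix: an intercept column of ones and the Group dummy
    model.matrix(fit)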
