# Lecture 27


##### Presentation Transcript

1. Lecture 27 • Polynomial Terms for Curvature • Categorical Variables

2. Polynomial Terms for Curvature • To model a curved relationship between $Y$ and $X$, we can add squared (and cubic or higher-order) terms as explanatory variables. • Fit as a multiple regression with two explanatory variables, $X$ and $X^2$: $E(Y \mid X) = \beta_0 + \beta_1 X + \beta_2 X^2$. • The coefficients are not directly interpretable: the change in the mean of $Y$ associated with a one-unit increase in $X$ depends on $X$. • To test whether the multiple regression model with $X$ and $X^2$ as predictors provides better predictions than the model with just $X$, use the p-value of the t-test on the $X^2$ coefficient (the null hypothesis is that $X^2$ has a zero coefficient). • Plot residuals vs. $X$ to determine whether the quadratic model is appropriate. If there is still a pattern in the mean, try a cubic model with $X$, $X^2$ and $X^3$.
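
As a rough illustration of the t-test on the squared term, the sketch below fits $Y$ on $X$ and $X^2$ by least squares with NumPy and computes the t-statistic for the $X^2$ coefficient. The data are synthetic and the helper name `fit_ols` is hypothetical; the lecture itself does this in JMP.

```python
import numpy as np

def fit_ols(design, y):
    """Least-squares fit; returns coefficients, standard errors, t-statistics."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    dof = design.shape[0] - design.shape[1]      # n minus number of coefficients
    sigma2 = resid @ resid / dof                 # estimate of the residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)
    se = np.sqrt(np.diag(cov))
    return coef, se, coef / se

# Synthetic data (made up for illustration): a curved mean plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 1.5 * x - 0.3 * x**2 + rng.normal(scale=0.5, size=x.size)

# Design matrix columns: intercept, X, X^2.
design = np.column_stack([np.ones_like(x), x, x**2])
coef, se, t_stats = fit_ols(design, y)

print("coefficients:", coef)
print("t-statistic for the X^2 coefficient:", t_stats[2])
```

A two-sided p-value would come from a t distribution with $n - 3$ degrees of freedom, for example via `scipy.stats.t.sf` if SciPy is available.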

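3. Regression Model for Fast Food Chain Data • Interactions and polynomial terms can be combined in a multiple regression model. • For the fast food chain data, we consider the model $E(\text{revenue} \mid \text{income}, \text{age}) = \beta_0 + \beta_1\,\text{income} + \beta_2\,\text{age} + \beta_3\,\text{income}^2 + \beta_4\,\text{age}^2 + \beta_5\,(\text{income} \times \text{age})$. • This is called a second-order model because it includes all squares and interactions of the original explanatory variables.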
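
To make the second-order model concrete, here is a sketch of its design matrix for two explanatory variables. The function name `second_order_design` and the sample values are hypothetical, and the column order is a choice made here rather than something taken from fastfoodchain.jmp.

```python
import numpy as np

def second_order_design(x1, x2):
    """Columns: intercept, x1, x2, x1^2, x2^2, x1*x2."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

# Made-up values standing in for income and age.
income = [20.0, 25.0, 30.0]
age = [8.0, 10.0, 12.0]

design = second_order_design(income, age)
print(design.shape)
```

With two original variables the model has six coefficients: one intercept, two linear terms, two squares and one interaction.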

4. fastfoodchain.jmp results • Strong evidence of a quadratic relationship between revenue and age and between revenue and income. Moderate evidence of an interaction between age and income.

5. Categorical variables • Categorical (nominal) variables: Variables that define group membership, e.g., sex (male/female), color (blue/green/red), county (Bucks County, Chester County, Delaware County, Philadelphia County). • Categorical variables can be incorporated into regression through dummy variables. • We will look at categorical variables that have two categories.

6. Sex discrimination revisited • At the beginning of the class, in case study 1.2, we examined data from a sex discrimination case. • There was strong evidence that male clerks are paid more than female hires. • But the bank’s defense lawyers say that this is because males have higher education and experience, i.e., there are omitted confounding variables.

7. Multiple regression model for sex discrimination • Let’s look at controlling for education level first. • To examine the bank’s claim, we want to look at the mean salary of men with a given education level and compare it to the mean salary of women with the same education level. • How do we incorporate a categorical explanatory variable into multiple regression? Dummy variables.

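8. Dummy variables • Define a dummy variable for sex, $\text{sexdummy}$, that equals 1 for one sex and 0 for the other (for example, 1 for men and 0 for women). • Multiple regression model: $E(\text{salary} \mid \text{educ}, \text{sexdummy}) = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{sexdummy}$. • $\beta_2$, the coefficient on the dummy variable for sex, is the difference in mean earnings between the populations of men and women with the same education levels.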
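
A minimal sketch of a dummy-variable regression, assuming the coding 1 = male and 0 = female and using made-up, noise-free data so the fitted coefficients are easy to check. The array names and salary figures are hypothetical and are not the bank data.

```python
import numpy as np

# Made-up, noise-free data (not the bank data from the case study).
educ = np.array([8, 10, 12, 14, 8, 10, 12, 16], dtype=float)
sex = np.array(["M", "M", "M", "M", "F", "F", "F", "F"])

# Dummy variable: 1 for men, 0 for women (coding assumed for this sketch).
sexdummy = (sex == "M").astype(float)
salary = 4000 + 100 * educ + 500 * sexdummy

# Design matrix columns: intercept, education, sex dummy.
design = np.column_stack([np.ones_like(educ), educ, sexdummy])
b0, b1, b2 = np.linalg.lstsq(design, salary, rcond=None)[0]

print(f"intercept={b0:.1f}, educ slope={b1:.1f}, sex coefficient={b2:.1f}")
```

Because the data were built with a gap of 500 at every education level, the coefficient on the dummy recovers that gap.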

9. Categorical variables in JMP • To color and mark the points by a categorical variable such as Sex, click the red triangle to the left of the first column and select Color or Mark by Column. Select Set Marker by Value to use a different marker for each value of the column.

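10. Parallel Regression Lines • The model $E(\text{salary} \mid \text{educ}, \text{sexdummy}) = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{sexdummy}$ implies that $E(\text{salary} \mid \text{educ}, \text{sexdummy} = 0) = \beta_0 + \beta_1\,\text{educ}$ and $E(\text{salary} \mid \text{educ}, \text{sexdummy} = 1) = (\beta_0 + \beta_2) + \beta_1\,\text{educ}$. • The regression lines for males and females as education varies are parallel: same slope $\beta_1$, intercepts differing by $\beta_2$. • There is no interaction between sex and education.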

11. Plot produced by JMP version 5 in Fit Model output that shows the parallel regression lines and the actual observations.

12. Interactions with Dummy Variables • The parallel regression lines model assumes that the difference between men’s and women’s mean salaries at a fixed level of education is the same for all levels of education. • There might be an interaction between sex and education: the difference between men and women might depend on the level of education.

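13. Interaction Model • Multiple regression model that allows for interaction between sex and education: $E(\text{salary} \mid \text{educ}, \text{sexdummy}) = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{sexdummy} + \beta_3\,(\text{sexdummy} \times \text{educ})$. • To add the interaction in JMP, create a new column sexdummy*educ: right-click on the column, select Formula, and use the formula sexdummy*educ. • The difference in mean salary between the two sexes at the same education level is $\beta_2 + \beta_3\,\text{educ}$ (the sexdummy = 1 group minus the sexdummy = 0 group), so it depends on the education level.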
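
The interaction column can also be sketched outside JMP. The example below builds the product column, fits the model by least squares, and evaluates the estimated sex gap at two education levels. The data are made up and noise-free, the coding 1 = male is assumed, and the helper `sex_gap` is hypothetical.

```python
import numpy as np

educ = np.array([8, 10, 12, 16, 8, 10, 12, 16], dtype=float)
sexdummy = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)  # 1 = male (assumed coding)

# Made-up, noise-free salaries: slope 100 for women, 100 + 40 for men.
salary = 4000 + 100 * educ + 200 * sexdummy + 40 * sexdummy * educ

# Design matrix columns: intercept, educ, sexdummy, sexdummy*educ.
design = np.column_stack([np.ones_like(educ), educ, sexdummy, sexdummy * educ])
b0, b1, b2, b3 = np.linalg.lstsq(design, salary, rcond=None)[0]

def sex_gap(educ_level):
    """Estimated mean salary difference (sexdummy = 1 minus sexdummy = 0) at one education level."""
    return b2 + b3 * educ_level

print(sex_gap(8), sex_gap(16))
```

The two printed gaps differ because the fitted lines for the two groups have different slopes.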

14. The model with one continuous explanatory variable, one categorical variable and an interaction is called the separate regression lines model because the regression lines of $Y$ on the continuous explanatory variable for the two levels of the dummy variable are “separate,” neither coincident nor parallel.

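15. Multiple regression with education, experience and sex • We can easily control for both education and experience in the sex discrimination case by adding them both to the multiple regression. A model without interactions is: $E(\text{salary} \mid \text{educ}, \text{experience}, \text{sexdummy}) = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{experience} + \beta_3\,\text{sexdummy}$. • Note that $\beta_3$ is the difference between the mean salaries of males and females of the same education and experience level.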
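
To see why controlling for education and experience matters, the sketch below compares the unadjusted difference in group means with the dummy coefficient from the multiple regression. The data are made up and noise-free, constructed so that men have more education and experience; the coding 1 = male and the names `exper`, `raw_gap` and `adjusted_gap` are assumptions for illustration.

```python
import numpy as np

# Made-up, noise-free data in which men have more education and experience.
educ = np.array([12, 14, 16, 16, 8, 10, 12, 12], dtype=float)
exper = np.array([60, 80, 100, 120, 20, 40, 60, 30], dtype=float)
sexdummy = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)  # 1 = male (assumed coding)
salary = 3000 + 150 * educ + 5 * exper + 300 * sexdummy

# Unadjusted comparison: difference in group means, ignoring the other variables.
raw_gap = salary[sexdummy == 1].mean() - salary[sexdummy == 0].mean()

# Adjusted comparison: coefficient on the dummy in the multiple regression.
design = np.column_stack([np.ones_like(educ), educ, exper, sexdummy])
coef = np.linalg.lstsq(design, salary, rcond=None)[0]
adjusted_gap = coef[3]

print(f"raw gap={raw_gap:.1f}, adjusted gap={adjusted_gap:.1f}")
```

In this constructed example the raw gap is larger than the adjusted gap because part of it reflects the differences in education and experience between the groups.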