This guide explains the use of dummy (indicator) variables in regression analysis, focusing on how to incorporate qualitative variables such as the color of a used car, and on how to model quadratic relationships when predicting the profitability of restaurant locations. Using indicator variables, we predict auction prices for cars based on color categories; using a second-order model, we predict gross revenue for new restaurant locations based on household income and the age of children. The examples include the regression equations, show how qualitative data can influence results, and show how multicollinearity limits the interpretation of coefficients.
Outline • When X’s are Dummy variables • EXAMPLE 1: USED CARS • EXAMPLE 2: RESTAURANT LOCATION • Modeling a quadratic relationship • Restaurant Example
Qualitative Independent Variables
• In many real-life situations one or more independent variables are qualitative.
• Qualitative variables are included in a regression model via indicator variables.
• An indicator variable (I) can assume one of two values, “zero” or “one”. Examples:
I = 1 if a degree earned is in Finance; 0 if a degree earned is not in Finance
I = 1 if the temperature was below 50°; 0 if the temperature was 50° or more
I = 1 if the first of two conditions is met; 0 if the second of two conditions is met
I = 1 if data were collected before 1980; 0 if data were collected after 1980
Example 1
• The dealer believes that color is a variable that affects a car’s price.
• Three color categories are considered:
• White
• Silver
• Other colors
• Note: Color is a qualitative variable.
I1 = 1 if the color is white; 0 if the color is not white
I2 = 1 if the color is silver; 0 if the color is not silver
And what about “Other colors”? Set I1 = 0 and I2 = 0.
To represent a qualitative variable that has m possible categories (levels), we must create m-1 indicator variables.
• Solution
• The proposed model is y = b0 + b1(Odometer) + b2 I1 + b3 I2 + e
[Figure: the data, with observations marked White car, Silver color, and Other color]
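The m-1 rule above can be sketched in a few lines of Python. This is only an illustration of the encoding, not the dealer's data: the function names `encode_color` and `design_row` and the odometer readings are made up for the example.

```python
def encode_color(color):
    """Return (I1, I2) for the three color categories of Example 1.

    I1 = 1 for white, I2 = 1 for silver. Any other color is the
    baseline category and is encoded as (0, 0), so three categories
    need only two indicator variables.
    """
    c = color.strip().lower()
    i1 = 1 if c == "white" else 0
    i2 = 1 if c == "silver" else 0
    return i1, i2


def design_row(odometer, color):
    """Build one row [1, Odometer, I1, I2] of the regression design matrix."""
    i1, i2 = encode_color(color)
    return [1.0, float(odometer), i1, i2]


# Hypothetical cars, used only to show the encoding.
for odometer, color in [(30000, "White"), (40000, "Silver"), (50000, "Red")]:
    print(color, design_row(odometer, color))
```

Adding a third indicator for “Other colors” would make the three indicators sum to the intercept column, which is why only m-1 of them are used.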
There is insufficient evidence to infer that a white color car and a car of “Other color” sell for a different auction price. There is sufficient evidence to infer that a silver color car sells for a larger price than a car of the “Other color” category.
From Excel we get the regression equation
PRICE = 6350 - .0278(ODOMETER) + 45.2 I1 + 148 I2
• For one additional mile the auction price decreases by 2.78 cents.
• A white car sells, on the average, for $45.2 more than a car of the “Other color” category.
• A silver color car sells, on the average, for $148 more than a car of the “Other color” category.
The equation for a car of silver color: Price = 6350 - .0278(Odometer) + 45.2(0) + 148(1) = 6498 - .0278(Odometer)
The equation for a car of white color: Price = 6350 - .0278(Odometer) + 45.2(1) + 148(0) = 6395.2 - .0278(Odometer)
The equation for a car of the “Other color” category: Price = 6350 - .0278(Odometer) + 45.2(0) + 148(0) = 6350 - .0278(Odometer)
[Figure: Price versus Odometer, showing the three parallel lines above]
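As a quick check of the three equations, the fitted model can be evaluated directly. The coefficients below are the ones reported from Excel; the function name `predicted_price` and the 40,000-mile odometer reading are assumptions chosen only for illustration.

```python
# Coefficients of the fitted equation reported above.
B0 = 6350.0        # intercept ("Other color" baseline)
B_ODOMETER = -0.0278
B_WHITE = 45.2     # coefficient of I1
B_SILVER = 148.0   # coefficient of I2


def predicted_price(odometer, color):
    """Predicted auction price for a given odometer reading and color."""
    c = color.strip().lower()
    i1 = 1 if c == "white" else 0
    i2 = 1 if c == "silver" else 0
    return B0 + B_ODOMETER * odometer + B_WHITE * i1 + B_SILVER * i2


# Hypothetical odometer reading; only the color changes between cars.
for color in ["other", "white", "silver"]:
    print(color, round(predicted_price(40000, color), 2))
```

At any fixed odometer reading the predictions differ only by the indicator coefficients, which is why the three lines are parallel.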
Example 2: Location for a new restaurant • A fast food restaurant chain tries to identify new locations that are likely to be profitable. • The primary market for such restaurants is middle-income adults and their children (between the ages of 5 and 12). • Which regression model should be proposed to predict the profitability of new locations?
• Solution
• The dependent variable will be Gross Revenue.
• There are quadratic relationships between Revenue and each predictor variable. Why?
• Members of middle-class families are more likely to visit a fast food restaurant than members of poor or wealthy families.
• Families with very young or older kids will not visit the restaurant as frequently as families with kids in the mid-range ages.
[Figures: Revenue versus Income and Revenue versus Age, each with Low, Middle, High on the horizontal axis]
Revenue = b0 + b1 Income + b2 Age + b3 Income^2 + b4 Age^2 + b5 (Income)(Age) + e
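A minimal sketch of how the terms of this second-order model are built from the two predictors. The function names and the input values are illustrative assumptions; no coefficients are supplied here because they come from fitting the data.

```python
def second_order_terms(income, age):
    """Return [1, Income, Age, Income^2, Age^2, Income*Age] for one area.

    These are the regressors that multiply b0..b5 in the proposed model.
    """
    return [1.0, income, age, income ** 2, age ** 2, income * age]


def revenue(coefficients, income, age):
    """Evaluate b0 + b1*Income + ... + b5*Income*Age for given b0..b5."""
    terms = second_order_terms(income, age)
    if len(coefficients) != len(terms):
        raise ValueError("expected six coefficients, b0 through b5")
    return sum(b * t for b, t in zip(coefficients, terms))


# Hypothetical area: income of 25 (in $1000s) and mean child age of 8.
print(second_order_terms(25.0, 8.0))
```

A negative coefficient on a squared term gives the curve that rises and then falls, matching the idea that the middle of the income and age ranges produces the highest revenue.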
Example 2 • To verify the validity of the model proposed in Example 19.1, 25 areas with fast food restaurants were randomly selected. • Data collected included (see Xm19-02.xls): • Previous year’s annual gross sales. • Mean annual household income. • Mean age of children.
The model can be used to make predictions. However, do not interpret the coefficients or test them. Multicollinearity is a problem! In Excel: Tools > Data Analysis > Correlation
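The same correlation check can be sketched outside Excel. The income and age values below are made up for illustration (they are not the Xm19-02 data), and `pearson` is a hand-written helper rather than a library call.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up predictor values, for illustration only.
income = [15.0, 20.0, 25.0, 30.0, 35.0, 40.0]
age = [6.0, 11.0, 8.0, 5.0, 12.0, 9.0]

columns = {
    "Income": income,
    "Age": age,
    "Income^2": [v ** 2 for v in income],
    "Age^2": [v ** 2 for v in age],
    "Income*Age": [i * a for i, a in zip(income, age)],
}

# Print the correlation of every pair of predictors.
names = list(columns)
for i, first in enumerate(names):
    for second in names[i + 1:]:
        r = pearson(columns[first], columns[second])
        print(f"{first:>10} vs {second:<10} r = {r:.3f}")
```

With values like these, a predictor and its own square are almost perfectly correlated, which is the kind of pattern the correlation table is meant to reveal.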
Regression results of the modified model: multicollinearity is not a problem anymore.