
Categorical Variables

Independent (X) variables that take on only a limited number of values are termed categorical variables, dummy variables, or indicator variables.

bridie

Presentation Transcript


  1. Categorical Variables

  2. Categorical Variables • Independent (X) variables that take on only a limited number of values are termed categorical variables, dummy variables, or indicator variables. • Common examples are: time periods in which there is a price surge or bubble; months of the year; days of the week; gender; educational level.

  3. Salary Data Example

  4. Model • Salary is the dependent variable that is to be estimated. • Executive is a categorical variable that has only two values: 0 – the employee is not an executive; 1 – the employee is an executive. • A linear regression model may be constructed: Salary = a + b * Executive

  5. Executive Model Results • Model: Salary = a + b * Executive • Result: highly significant • Average salary for non-executives: $37,514 • Average salary for executives: $90,601
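
With a single 0/1 variable, the least-squares intercept a equals the average of the "0" group and a + b equals the average of the "1" group. The sketch below checks this on a small made-up data set (the numbers are hypothetical, not the presentation's salary data) and then applies the same identity to the two averages reported on the slide. The presentation does not state a and b directly, so the last two values are derived from the averages, not quoted.

```python
from statistics import mean

# Hypothetical toy data, NOT the presentation's salary data set.
executive = [0, 0, 0, 1, 1]
salary = [30_000, 40_000, 50_000, 80_000, 100_000]

# Closed-form ordinary least squares for Salary = a + b * Executive.
x_bar, y_bar = mean(executive), mean(salary)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(executive, salary))
s_xx = sum((x - x_bar) ** 2 for x in executive)
b = s_xy / s_xx
a = y_bar - b * x_bar

# Group averages computed directly, for comparison with a and a + b.
mean_non_exec = mean(s for s, e in zip(salary, executive) if e == 0)
mean_exec = mean(s for s, e in zip(salary, executive) if e == 1)
print(a, mean_non_exec)      # intercept vs. non-executive average
print(a + b, mean_exec)      # intercept + slope vs. executive average

# Same identity applied to the averages reported on the slide.
a_reported = 37_514
b_reported = 90_601 - 37_514
print(a_reported, b_reported)
```

Read this way, the coefficient b is simply the salary premium associated with being an executive.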

  6. Categorical Variables Effect • The effect of a categorical variable is to add an additional constant amount to the y-intercept for the subset of points included in the category. • Graphically, it creates a separate regression line for each category. The slope of the line is constant but the y-intercepts vary.
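
The "separate lines with a common slope" picture needs a second, numerical X variable. As a minimal sketch, assume a hypothetical `years` variable (not part of the presentation) and made-up, noise-free salaries, and fit Salary = a + b * Executive + c * years with NumPy. The two categories then share the slope c and differ only in intercept: a for non-executives, a + b for executives.

```python
import numpy as np

# Hypothetical data: `years` is an illustrative numerical variable,
# and the salaries are generated exactly from known coefficients.
years = np.array([1, 3, 5, 7, 2, 4, 6, 8], dtype=float)
executive = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
salary = 30_000 + 50_000 * executive + 2_000 * years

# Design matrix: intercept column, dummy variable, numerical variable.
X = np.column_stack([np.ones_like(years), executive, years])
(a, b, c), *_ = np.linalg.lstsq(X, salary, rcond=None)

# One regression line per category: same slope, shifted intercept.
intercept_non_exec = a
intercept_exec = a + b
print(f"non-executive line: Salary = {intercept_non_exec:,.0f} + {c:,.0f} * years")
print(f"executive line:     Salary = {intercept_exec:,.0f} + {c:,.0f} * years")
```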

  7. Executive Variable Effect

  8. Gender Model Results • Model: Salary = a + b * Gender • Result: not significant • Average salary for Gender = 0: $52,126 • Average salary for Gender = 1: $43,646

  9. Gender Variable Effect

  10. Education Model Results • Model: Salary = a + b * Education • Result: significant • Average salary for Education = 0: $11,656 • Additional value per year of education: $8,448

  11. Education Variable Effect

  12. Education Variable • The Education variable has multiple values: 0, 2, 4, 6, 8. This variable was used directly in the regression estimation. Although it appeared categorical, it was in fact used as a numerical variable. • Implicit in using any explanatory variable directly in a linear regression is the assumption that its effect increases or decreases linearly. For the education variable, this would mean that the effect on Salary of having a two-year degree would be exactly ½ of the effect of having a four-year degree. • This linearity may be questionable.

  13. Education Results • Linearity would imply, incorrectly, that dropping out after three years of college would cost “only” $8,448 in salary compared with finishing one’s Bachelor’s degree. • But since a degree is either earned (“1”) or not (“0”), a better approach is to treat each degree level as a separate categorical variable.
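
The linearity assumption can be made concrete with the two numbers reported for the Education model ($11,656 at Education = 0 and $8,448 per additional year). The helper name `predicted_salary` below is just an illustrative label, not something from the presentation.

```python
# Coefficients as reported on the Education Model Results slide.
INTERCEPT = 11_656   # average salary at Education = 0
PER_YEAR = 8_448     # additional salary per year of education


def predicted_salary(years):
    """Salary predicted by the linear model Salary = a + b * Education."""
    return INTERCEPT + PER_YEAR * years


# Under linearity, a two-year degree is worth exactly half a four-year degree.
effect_two_year = predicted_salary(2) - predicted_salary(0)    # 16896
effect_four_year = predicted_salary(4) - predicted_salary(0)   # 33792
print(effect_two_year / effect_four_year)                      # 0.5

# Leaving after three years "costs" only one year's increment.
loss_from_dropping_out = predicted_salary(4) - predicted_salary(3)
print(loss_from_dropping_out)                                  # 8448
```

The model has no way to express that the fourth year, which completes the degree, might matter more than the third.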

  14. Constructed Education Categories • If the linearity of a limited-value variable is questionable, then the variable may be better modeled by constructing a series of indicator or dummy variables that each represent exactly one value: Education0, Education2, Education4, Education6, Education8. In this way, the effect of each level can be considered independently. • This technique is frequently used with time variables, e.g. months. One should not implicitly assume that the monthly effect in December (12) is 12 times as large as the monthly effect in January (1).
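
A minimal sketch of constructing the indicator variables with NumPy, using made-up salaries (not the presentation's data). One modeling choice here is an assumption of this sketch: all five dummies are used without an intercept, so each coefficient is simply the average salary at that level. If an intercept is kept instead, one dummy has to be dropped as the baseline, because the five dummies always sum to 1 and would be perfectly collinear with the intercept column.

```python
import numpy as np

# Hypothetical data: two employees at each education level.
education = np.array([0, 0, 2, 2, 4, 4, 6, 6, 8, 8])
salary = np.array([20, 24, 30, 34, 50, 54, 60, 64, 90, 94], dtype=float) * 1_000

# One 0/1 column per level: Education0, Education2, ..., Education8.
levels = np.array([0, 2, 4, 6, 8])
dummies = (education[:, None] == levels[None, :]).astype(float)

# Regression on the dummies alone (no intercept column).
coefs, *_ = np.linalg.lstsq(dummies, salary, rcond=None)
for level, coef in zip(levels, coefs):
    print(f"Education{level}: {coef:,.0f}")
```

Each level now gets its own estimate, so the step from one level to the next no longer has to be the same size.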

  15. Education Results

  16. Results Discussion • The previous results illustrate some effects, but also obscure others. • The results show that having less than a Bachelor’s degree has a significant $30,000 effect on average salary at this company. • The lack of statistical significance for the Bachelor’s degree and Master’s degree variables obscures the fact that it is unreasonable to use these variables alone: doing so would conflate the salaries of Ph.D.s with the salaries of those with no college, even though it is clear that at this company the two groups earn nowhere near the same salary.

  17. A Peek Ahead
