
Categorical Data Analysis


corin

Presentation Transcript


  1. Categorical Data Analysis Week 2

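  2. Binary Response Models
     • binary and binomial responses
       • binary: y takes the value 0 or 1
       • binomial: y is the number of “successes” in n “trials”
     • distributions
       • Bernoulli: $\Pr(y) = p^{y}(1-p)^{1-y}$, $y \in \{0, 1\}$
       • Binomial: $\Pr(y) = \binom{n}{y}\, p^{y}(1-p)^{n-y}$, $y = 0, 1, \dots, n$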

  3. Transformational Approach
     • linear probability model
     • use grouped data (events/trials): $p_i = y_i / n_i$
     • “identity” link: $\eta = p$
     • linear predictor: $\eta = x'\beta$
     • problem: predictions can fall outside [0,1]

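  4. The Logit Model
     • logit transformation: $\operatorname{logit}(p) = \log\frac{p}{1-p} = x'\beta$
     • inverse logit: $p = \frac{\exp(x'\beta)}{1+\exp(x'\beta)}$
     • ensures that p is in [0,1] for all values of x and $\beta$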

  5. The Logit Model
     • odds and odds ratios are the key to understanding and interpreting this model
     • the log odds transformation is a “stretching” transformation that maps probabilities to the real line
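The "stretching" idea can be illustrated with a few lines of Python. This is a minimal sketch, not part of the original slides; the function names `odds`, `logit`, and `inv_logit` are chosen here for illustration.

```python
import math

def odds(p):
    """Odds of success for a probability p in (0, 1)."""
    return p / (1.0 - p)

def logit(p):
    """Log odds: maps (0, 1) onto the whole real line."""
    return math.log(odds(p))

def inv_logit(eta):
    """Inverse logit: maps any real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Probabilities near 0 or 1 are stretched far from zero on the logit scale,
# and the inverse logit recovers the original probability.
for p in (0.01, 0.1, 0.5, 0.9, 0.99):
    eta = logit(p)
    print(f"p={p:<5} odds={odds(p):8.4f} logit={eta:8.4f} back={inv_logit(eta):.2f}")
```

Whatever value the linear predictor takes, `inv_logit` returns a number strictly between 0 and 1, which is the property the linear probability model lacks.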

  6. Odds and Probabilities

  7. Probabilities and Log Odds

  8. The Logit Transformation • properties of logit linear

  9. Odds, Odds Ratios, and Relative Risk
     • odds of “success” is the ratio: $\text{odds} = \frac{p}{1-p}$
     • consider two groups with success probabilities: $p_1$ and $p_2$
     • odds ratio (OR) is a measure of the odds of success in group 1 relative to group 2: $\mathrm{OR} = \frac{p_1/(1-p_1)}{p_2/(1-p_2)}$

  10. Odds Ratio
     • 2 × 2 table: Y (0, 1) by X (0, 1)
     • OR is the cross-product ratio (compare x = 1 group to x = 0 group)
     • odds of y = 1 are 4 times as high when x = 1 as when x = 0

  11. Odds Ratio
     • equivalent interpretation
       • odds of y = 1 when x = 0 are 0.225 times the odds when x = 1
       • odds of y = 1 are lower by a proportion of 1 − 0.225 = 0.775 when x = 0 than when x = 1
       • odds of y = 1 are 77.5% lower when x = 0 than when x = 1

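  12. Log Odds Ratios
     • consider the model: $\operatorname{logit}(p) = \beta_0 + \beta_1 D$
     • D is a dummy variable coded 1 if group 1 and 0 otherwise
     • group 1: $\operatorname{logit}(p_1) = \beta_0 + \beta_1$
     • group 2: $\operatorname{logit}(p_2) = \beta_0$
     • LOR: $\beta_1$; OR: $\exp(\beta_1)$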

  13. Relative Risk
     • similar to OR, but works with rates
     • relative risk or rate ratio (RR) is the rate in group 1 relative to group 2: $\mathrm{RR} = p_1 / p_2$
     • OR → RR as p → 0
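The link between the two measures follows from writing the odds ratio in terms of the group probabilities. The short derivation below is a sketch; it assumes $p_1$ and $p_2$ denote the success probabilities (rates) in groups 1 and 2, and the numerical values are illustrative only.

```latex
\begin{align*}
\mathrm{OR} &= \frac{p_1/(1-p_1)}{p_2/(1-p_2)}
             = \frac{p_1}{p_2}\cdot\frac{1-p_2}{1-p_1}
             = \mathrm{RR}\cdot\frac{1-p_2}{1-p_1} \\[4pt]
\text{rare outcome: }   & p_1 = 0.02,\ p_2 = 0.01:\quad \mathrm{RR} = 2,\quad \mathrm{OR} = 2\cdot\tfrac{0.99}{0.98} \approx 2.02 \\
\text{common outcome: } & p_1 = 0.6,\ p_2 = 0.3:\quad \mathrm{RR} = 2,\quad \mathrm{OR} = 2\cdot\tfrac{0.7}{0.4} = 3.5
\end{align*}
```

The correction factor $(1-p_2)/(1-p_1)$ approaches 1 as both probabilities approach 0, so the OR approximates the RR only when the outcome is rare.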

  14. Tutorial: odds and odds ratios • consider the following data

  15. Tutorial: odds and odds ratios
     • read table:
         clear
         input educ psex f
         0 0 873
         0 1 1190
         1 0 533
         1 1 1208
         end
         label define edlev 0 "HS or less" 1 "Col or more"
         label val educ edlev
         label var educ education

  16. Tutorial: odds and odds ratios
     • compute odds:
         tabodds psex educ [fw=f]
     • verify by hand

  17. Tutorial: odds and odds ratios
     • compute odds ratios:
         tabodds psex educ [fw=f], or
     • verify by hand
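The "verify by hand" step can be sketched in Python using the four cell counts from the tutorial table. This is an illustration rather than the course's own solution; the names `counts` and `group_odds` are introduced here.

```python
# Cell frequencies f from the tutorial table, keyed by (educ, psex).
counts = {(0, 0): 873, (0, 1): 1190, (1, 0): 533, (1, 1): 1208}

def group_odds(educ):
    """Odds of psex = 1 versus psex = 0 within one education group."""
    return counts[(educ, 1)] / counts[(educ, 0)]

odds_low = group_odds(0)    # "HS or less"
odds_high = group_odds(1)   # "Col or more"

# The odds ratio is the cross-product ratio of the 2 x 2 table.
odds_ratio = odds_high / odds_low

print(f"odds (HS or less):  {odds_low:.4f}")
print(f"odds (Col or more): {odds_high:.4f}")
print(f"odds ratio:         {odds_ratio:.4f}")
```

The odds come out at roughly 1.36 and 2.27, giving an odds ratio of about 1.66 for the higher education group relative to the lower one.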

  18. Tutorial: odds and odds ratios
     • stat facts: variances of functions
       • used in statistical significance tests and in forming confidence intervals
     • basic rule for variances of linear transformations
       • if g(x) = a + bx is a linear function of x, then $\operatorname{var}[g(x)] = b^{2}\operatorname{var}(x)$
       • this is a trivial case of the delta method applied to a single variable
     • the delta method for the variance of a nonlinear function g(x) of a single variable is $\operatorname{var}[g(x)] \approx \left[g'(x)\right]^{2}\operatorname{var}(x)$

  19. Tutorial: odds and odds ratios
     • stat facts: variances of odds and odds ratios
       • we can use the delta method to find the variance of the odds and the odds ratios
       • from the asymptotic (large sample theory) perspective it is best to work with log odds and log odds ratios
       • the log odds ratio converges to normality at a faster rate than the odds ratio, so statistical tests may be more appropriate on log odds ratios (nonlinear functions of p)

  20. Tutorial: odds and odds ratios
     • stat facts:
       • the log odds ratio is the difference in the log odds for two groups
       • groups are independent
       • variance of a difference is the sum of the variances
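These facts combine into a standard error for the log odds ratio. The sketch below applies the delta method to the tutorial counts, assuming the usual binomial variance p(1 − p)/n for a sample proportion; the function name and the 1.96 critical value for an approximate 95% interval are choices made here, not taken from the slides.

```python
import math

# Tutorial counts per education group: (psex = 1, psex = 0).
counts = {0: (1190, 873), 1: (1208, 533)}

def log_odds_and_var(successes, failures):
    """Log odds and its delta-method variance for one group."""
    n = successes + failures
    p = successes / n
    var_p = p * (1.0 - p) / n          # binomial variance of the proportion
    deriv = 1.0 / (p * (1.0 - p))      # derivative of log(p / (1 - p))
    return math.log(p / (1.0 - p)), deriv ** 2 * var_p

lo0, v0 = log_odds_and_var(*counts[0])
lo1, v1 = log_odds_and_var(*counts[1])

# Independent groups: the variance of the difference is the sum of variances.
lor = lo1 - lo0
se_lor = math.sqrt(v0 + v1)

# Build the interval on the log scale, then exponentiate the endpoints.
z = 1.96
ci = (math.exp(lor - z * se_lor), math.exp(lor + z * se_lor))
print(f"LOR = {lor:.4f}, SE = {se_lor:.4f}")
print(f"OR  = {math.exp(lor):.4f}, approx. 95% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
```

For each group the delta-method variance simplifies to 1/successes + 1/failures, so the variance of the log odds ratio is the sum of the reciprocals of the four cell counts.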

  21. Tutorial: odds and odds ratios
     • data structures: grouped or individual level
     • note:
       • use frequency weights to handle grouped data
       • or we could “expand” the data by the frequency weights, resulting in individual-level data
       • model results from either data structure are the same
     • expand the data and verify the following results
         expand f

  22. Tutorial: odds and odds ratios
     • statistical modeling
       • logit model (glm):
           glm psex educ [fw=f], f(b) eform
       • logit model (logit):
           logit psex educ [fw=f], or

  23. Tutorial: odds and odds ratios • statistical modeling (#1) • logit model (glm):

  24. Tutorial: odds and odds ratios
     • statistical modeling (#2)
       • some ideas from alternative normalizations
       • what parameters will this model produce?
       • what is the interpretation of the “constant”?
           gen cons = 1
           glm psex cons educ [fw=f], nocons f(b) eform

  25. Tutorial: odds and odds ratios • statistical modeling (#2)

  26. Tutorial: odds and odds ratios
     • statistical modeling (#3)
       • what parameters does this model produce?
       • how do you interpret them?
           gen lowed = educ == 0
           gen hied = educ == 1
           glm psex lowed hied [fw=f], nocons f(b) eform

  27. Tutorial: odds and odds ratios
     • statistical modeling (#3)
       • are these odds ratios?
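One way to reason about the question is to write out what the no-constant model with both dummies implies. In the sketch below, the coefficient symbols $\gamma_1$ and $\gamma_2$ are notation assumed here for the `lowed` and `hied` coefficients; the counts come from the tutorial table.

```latex
\begin{align*}
\operatorname{logit}(p) &= \gamma_1\,\mathrm{lowed} + \gamma_2\,\mathrm{hied} \\
\mathrm{lowed} = 1:\quad \exp(\gamma_1) &= \frac{p_{\mathrm{low}}}{1-p_{\mathrm{low}}} = \frac{1190}{873} \approx 1.363 \\
\mathrm{hied} = 1:\quad \exp(\gamma_2) &= \frac{p_{\mathrm{high}}}{1-p_{\mathrm{high}}} = \frac{1208}{533} \approx 2.266 \\
\exp(\gamma_2 - \gamma_1) &= \frac{1208/533}{1190/873} \approx 1.663
\end{align*}
```

With two parameters for two groups the model reproduces the observed group odds exactly, so under this parameterization each exponentiated coefficient is a group-specific odds; an odds ratio appears only as the ratio of the two.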

  28. Tutorial: prediction
     • fitted probabilities (after most recent model)
         predict p, mu
         tab educ [fw=f], sum(p) nostandard nofreq

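  29. Probit Model
     • inverse probit is the CDF for a standard normal variable: $p = \Phi(x'\beta) = \int_{-\infty}^{x'\beta} \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}\, dz$
     • link function: $\Phi^{-1}(p) = x'\beta$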

  30. Probit Transformation

  31. Interpretation
     • probit coefficients
       • interpreted on the scale of a standard normal variable (no log odds ratio interpretation)
       • “scaled” versions of logit coefficients
     • probit models
       • more common in certain disciplines (economics)
       • analogy with linear regression (normal latent variable)
       • more easily extended to multivariate distributions

  32. Example: Grouped Data
     • Swedish mortality data revisited
       • logit model
       • probit model

  33. Swedish Historical Mortality Data • predictions

  34. Programming
     • Stata: generalized linear model (glm)
         glm y A2 A3 P2, family(b n) link(probit)
         glm y A2 A3 P2, family(b n) link(logit)
     • idea of glm is to make the model linear in the link
     • old days: Iteratively Reweighted Least Squares
     • now: Fisher scoring, Newton-Raphson
     • both approaches yield MLEs

  35. Generalized Linear Models
     • applies to a broad class of models
     • iterative fitting (repeated updating), except for the linear model
     • update parameters, weights W, and predicted values μ
     • models differ in terms of W and μ and in assumptions about the distribution of y
     • common distributions for y include: normal, binomial, and Poisson
     • common links include: identity, logit, probit, and log
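The repeated-updating idea can be sketched with a small iteratively reweighted least squares loop for a grouped binomial logit model with one covariate. This is an illustrative implementation under stated assumptions, not Stata's algorithm: the function name `irls_logit` is invented here, the data are the grouped tutorial counts (events and trials by `educ`), and the 2 × 2 weighted least squares system is solved in closed form.

```python
import math

def irls_logit(x, y, n, max_iter=25, tol=1e-10):
    """Fit logit(p) = b0 + b1 * x to grouped data (y events out of n trials)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi, ni in zip(x, y, n):
            eta = b0 + b1 * xi                      # linear predictor
            p = 1.0 / (1.0 + math.exp(-eta))        # inverse logit
            w = ni * p * (1.0 - p)                  # working weight W
            z = eta + (yi - ni * p) / w             # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Weighted least squares of z on (1, x): solve the 2 x 2 normal equations.
        det = s_w * s_wxx - s_wx ** 2
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1

# Grouped tutorial data: educ = 0 and educ = 1.
educ = [0, 1]
events = [1190, 1208]    # psex = 1
trials = [2063, 1741]    # psex = 0 plus psex = 1

b0, b1 = irls_logit(educ, events, trials)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, exp(b1) = {math.exp(b1):.4f}")
```

Each pass updates the weights, the working response, and the coefficients, and for the logit link this coincides with Fisher scoring. On these data the exponentiated slope matches the hand-computed odds ratio.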

  36. Latent Variable Approach
     • example: insect mortality
       • suppose a researcher exposes insects to dosage levels (u) of an insecticide and observes whether the “subject” lives or dies at that dosage
       • the response is expected to depend on the insect’s tolerance (c) to that dosage level
       • the insect dies if u > c and survives if u < c
       • tolerance is not observed (survival is observed)

  37. Latent Variables
     • u and c are continuous latent variables
     • examples:
       • women’s employment: u is the market wage and c is the reservation wage
       • migration: u is the benefit of moving and c is the cost of moving
     • observed outcome y = 1 or y = 0 reveals the individual’s preference, which is assumed to maximize a rational individual’s utility function

  38. Latent Variables
     • assume linear utility and criterion functions
     • over-parameterization = identification problem
       • we can identify differences in components but not the separate components

  39. Latent Variables
     • constraints:
     • then:
     • where F(.) is the CDF of ε
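The argument on the latent-variable slides can be sketched as a short derivation. The notation below is an assumption made for illustration, since the slide equations were not preserved: $\beta^{u}, \beta^{c}$ are the coefficients and $\varepsilon^{u}, \varepsilon^{c}$ the errors of the linear functions for u and c.

```latex
\begin{align*}
u &= x'\beta^{u} + \varepsilon^{u}, \qquad c = x'\beta^{c} + \varepsilon^{c} \\
y &= 1 \ \text{if } u > c, \qquad y = 0 \ \text{otherwise} \\
\Pr(y = 1 \mid x) &= \Pr(u - c > 0)
   = \Pr\!\left(\varepsilon^{c} - \varepsilon^{u} < x'(\beta^{u} - \beta^{c})\right)
   = F(x'\beta) \\
\text{with}\quad \beta &= \beta^{u} - \beta^{c}, \qquad \varepsilon = \varepsilon^{c} - \varepsilon^{u}, \qquad F(\cdot) \text{ the CDF of } \varepsilon
\end{align*}
```

Only the difference $\beta^{u} - \beta^{c}$ enters the probability, which is why the separate components cannot be identified from a binary outcome.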

  40. Latent Variables and Standardization
     • need to standardize the mean and variance of ε
       • binary dependent variables lack inherent scales
       • the magnitude of β is defined only in reference to the mean and variance of ε, which are unknown
     • redefine ε to a common standard: $\varepsilon^{*} = \frac{\varepsilon - a}{b}$
       • where a and b are two chosen constants

  41. Standardization for Logit and Probit Models • standardization implies • F*() is the cdf of ε* • location a and scale b need to be fixed • setting • and

  42. Standardization for Logit and Probit Models
     • distribution of ε is standardized
       • standard normal → probit
       • standard logistic → logit
     • both distributions have a mean of 0
     • variances differ (1 for the standard normal, $\pi^{2}/3$ for the standard logistic)
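The difference in scale is why probit coefficients look like shrunken logit coefficients. The Python sketch below compares the standard logistic CDF with a normal CDF whose argument is divided by the logistic standard deviation π/√3; matching the variances this way is one common convention, chosen here for illustration, and the helper names are invented.

```python
import math
from statistics import NormalDist

std_normal = NormalDist(0.0, 1.0)

def logistic_cdf(e):
    """CDF of the standard logistic distribution (mean 0, variance pi^2 / 3)."""
    return 1.0 / (1.0 + math.exp(-e))

# Standard deviation of the standard logistic distribution.
logistic_sd = math.pi / math.sqrt(3.0)
print(f"logistic sd = {logistic_sd:.4f}")

# Dividing the index by the logistic sd puts it on the standard normal scale;
# the two response curves then stay close over the whole range.
max_gap = 0.0
for eta in (-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0):
    p_logit = logistic_cdf(eta)
    p_probit = std_normal.cdf(eta / logistic_sd)
    max_gap = max(max_gap, abs(p_logit - p_probit))
    print(f"eta={eta:5.1f}  logit p={p_logit:.4f}  rescaled probit p={p_probit:.4f}")

print(f"largest gap on this grid = {max_gap:.4f}")
```

Because the two curves nearly coincide after rescaling, logit and probit fits to the same data usually give similar predicted probabilities while the coefficients differ by roughly the ratio of the two scales.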

  43. Extending the Latent Variable Approach
     • observed y is a dichotomous (binary) 0/1 variable
     • continuous latent variable: $y^{*} = x'\beta + \varepsilon$
       • linear predictor + residual
     • observed outcome: $y = 1$ if $y^{*} > 0$, and $y = 0$ otherwise

  44. Notation
     • conditional means of latent variables obtained from index function: $\eta = E(y^{*} \mid x) = x'\beta$
     • obtain probabilities from inverse link functions
       • logit model: $p = \frac{\exp(\eta)}{1+\exp(\eta)}$
       • probit model: $p = \Phi(\eta)$

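  45. ML
     • likelihood function: $L = \prod_{i} p_i^{y_i}(1-p_i)^{n_i - y_i}$
       • where $n_i = 1$ if data are binary
     • log-likelihood function: $\log L = \sum_{i} \left[ y_i \log p_i + (n_i - y_i)\log(1-p_i) \right]$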

  46. Assessing Models • definitions: • L null model (intercept only): • L saturated model (a parameter for each cell): • L current model: • grouped data (events/trials) • deviance (likelihood ratio statistic)

  47. Deviance
     • grouped data:
       • if cell sizes are reasonably large, deviance is distributed as chi-square
     • individual-level data: $L_f = 1$ and $\log L_f = 0$
       • deviance is not a “fit” statistic

  48. Deviance
     • deviance is like a residual sum of squares
       • larger values indicate poorer models
       • larger models have smaller deviance
     • deviance for the more constrained model (Model 1)
     • deviance for the less constrained model (Model 2)
     • assume that Model 1 is a constrained version of Model 2

  49. Difference in Deviance
     • evaluate competing “nested” models using a likelihood ratio statistic
     • model chi-square is a special case
     • SAS, Stata, R, etc. report different statistics
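A likelihood ratio comparison of nested models can be sketched with the grouped tutorial data. The example below compares the intercept-only model with the model that includes `educ`; it assumes the binomial log-likelihood kernel (the constant binomial coefficient is dropped because it cancels in the difference), and the variable names are illustrative.

```python
import math

# (events, trials) for educ = 0 and educ = 1 from the tutorial table.
groups = [(1190, 2063), (1208, 1741)]

def log_lik(probs):
    """Binomial log-likelihood kernel for one fitted probability per group."""
    return sum(y * math.log(p) + (n - y) * math.log(1.0 - p)
               for (y, n), p in zip(groups, probs))

# Model 1 (constrained): one common probability for both groups.
total_y = sum(y for y, _ in groups)
total_n = sum(n for _, n in groups)
ll_null = log_lik([total_y / total_n] * len(groups))

# Model 2 (less constrained): a separate probability for each group, which is
# what the logit model with educ fits on these data.
ll_educ = log_lik([y / n for y, n in groups])

# Difference in deviance = likelihood ratio chi-square, here with 1 df.
lr_chi2 = 2.0 * (ll_educ - ll_null)
p_value = math.erfc(math.sqrt(lr_chi2 / 2.0))   # chi-square tail area for 1 df
print(f"LR chi-square (1 df) = {lr_chi2:.2f}, p-value = {p_value:.3g}")
```

Because the intercept-only model is the constrained one here, this difference in deviance is the model chi-square mentioned on the slide.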

  50. Other Fit Statistics
     • BIC & AIC (useful for non-nested models)
     • basic idea of IC: penalize log L for the number of parameters (AIC/BIC) and/or the size of the sample (BIC): $\mathrm{IC} = -2\log L + 2\,s\,df_m$
       • AIC: s = 1
       • BIC: s = ½ log n (n is the sample size)
       • $df_m$ is the number of model parameters
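The penalty idea can be sketched in Python. The general form used below, IC = −2 log L + 2·s·df, is a reconstruction that reproduces the usual AIC with s = 1 and the usual BIC with s = ½ log n; it is applied to the tutorial data under the assumption that n is the number of individuals (3804), and the function name is invented here.

```python
import math

# (events, trials) for educ = 0 and educ = 1 from the tutorial table.
groups = [(1190, 2063), (1208, 1741)]
n_obs = sum(n for _, n in groups)   # individuals, used as the BIC sample size

def log_lik(probs):
    """Individual-level Bernoulli log-likelihood, summed within groups."""
    return sum(y * math.log(p) + (n - y) * math.log(1.0 - p)
               for (y, n), p in zip(groups, probs))

def info_criterion(log_l, df_m, s):
    """Penalized fit: -2 log L plus 2 * s for each model parameter."""
    return -2.0 * log_l + 2.0 * s * df_m

p_all = sum(y for y, _ in groups) / n_obs
models = {
    "intercept only": (log_lik([p_all, p_all]), 1),
    "educ": (log_lik([y / n for y, n in groups]), 2),
}

results = {}
for name, (ll, df_m) in models.items():
    aic = info_criterion(ll, df_m, s=1.0)
    bic = info_criterion(ll, df_m, s=0.5 * math.log(n_obs))
    results[name] = (aic, bic)
    print(f"{name:15s} logL = {ll:10.3f}  AIC = {aic:9.3f}  BIC = {bic:9.3f}")
```

Smaller values are preferred under both criteria. BIC charges log n per parameter instead of 2, so it penalizes the extra `educ` parameter more heavily in a sample of this size.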
