230 likes | 422 Vues
This article explores the concept of limited dependent variables, focusing on situations where dependent outcomes can only be measured in a restricted format, often dichotomous (e.g., war/no war). We discuss various statistical models used to analyze these variables, including Linear Probability Models (LPM), Logistic Regression, Probit Models, and their assumptions. We'll delve into how these models are estimated using maximum likelihood, the significance of likelihood ratio tests, and the implications of violating model assumptions. Practical applications using Stata are also presented to illustrate these concepts.
E N D
Limited Dependent Variables • Often there are occasions where we are interested in explaining a dependent variable that has only limited measurement • Frequently it is even dichotomous.
Examples • War(1) vs. no War(0) • Vote vs. no vote • Regime change vs. no change
These are often Probability Models • E.g. • Power disparity leads to war: • Where Yt is the occurrence (or not) of war, and Xt is a measure of power disparity • We call this a Linear Probability Model
Problems with LPM Regression • OLS in this case is called the Linear Probability Model • Running regression produces some problems • Errors are not distributed normally • Errors are heteroskedastic • Predicted Ys can be outside the 0.0-1. bounds required for probability
Logistic Model • We need a model that produces true probabilities • The Logit, or cumulative logistic distribution offers one approach. • This produces a sigmoid curve. • Look at equation under 2 conditions: • Xi = +∞ • Xi = -∞
Probability Ratio • Note that • Where
Log Odds Ratio • The logit is the log of the odds ratio, and is given by: • This model gives us a coefficient that may be interpreted as a change in the weighted odds of the dependent variable
Estimation of Model • We estimate this with maximum likelihood • The significance tests are z statistics • We can generate a Pseudo R2 which is an attempt to measure the percent of variation of the underlying logit function explained by the independent variables • We test the full model with the Likelihood Ratio test (LR), which has a χ2 distribution with k degrees of freedom
Neural Networks • The alternate formulation is representative of a single-layer perceptron in an artificial neural network.
Probit • If we can assume that the dependent variable is actually the result of an underlying (and immeasurable) propensity or utility, we can use the cumulative normal probability function to estimate a Probit model • Also, more appropriate if the categories (or their propensities) are likely to be normally distributed • It looks just like a logit model in practice
The Cumulative Normal Density Function • The normal distribution is given by: • The Cumulative Normal Density Function is:
The Standard Normal CDF • We assume that there is an underlying threshold value (Ii) that if the case exceeds will be a 1, and 0 otherwise. • We can standardize and estimate this as
Probit estimates • Again, maximum likelihood estimation • Again, a Pseudo R2 • Again, a LR ratio with k degrees of freedom
Assumptions of Models • All Y’s are in {0,1} set • They are statistically independent • No multicollinearity • The P(Yi=1) is normal density for probit, and logistic function for logit
Ordered Probit • If the dependent variable can take on ordinal levels, we can extend the dichotomous Probit model to an n-chotomous, or ordered, Probit model • It simply has several threshold values estimated • Ordered logit works much the same way
Multinomial Logit • If our dependent variable takes on different values, but they are nominal, this is a multinomial logit model
Some additional info • The Modal category is good benchmark • Present % correctly predicted • This can be calculated and presented. • This, when compared to the modal category, gives us a good indication of fit.
Stata • Use Leadership Change data • (1992 cross section)1992-Stata
Test different models • Dependent variable Leadership change • Examine distribution tables ledchan1 • Independent variables • Try different • Try corr and then (pwcorr)
Try the following regress ledchan1 grwthgdp hlthexp illit_f polity2 logit ledchan1 grwthgdp hlthexp illit_f polity2 logistic ledchan1 grwthgdp hlthexp illit_f polity2 probit ledchan1 grwthgdp hlthexp illit_f polity2 ologit ledchan1 grwthgdp hlthexp illit_f polity2 oprobit ledchan1 grwthgdp hlthexp illit_f polity2 mlogit ledchan1 grwthgdp hlthexp illit_f polity2 tobit ledchan1 grwthgdp hlthexp illit_f polity2, ul ll
Tobit • Assumes a 0 value, and then a scale • E.g., the decision to incarcerate • 0 or 1 • (Imprison or not) • If Imprison, than for how many years?
Other models • This leads to many other models • Count models & Poisson regression • Duration/Survival/hazard models • Censoring and truncation models • Selection bias models