Limited Dependent Variables: LPM, Probit and Logit



  1. Limited Dependent Variables: LPM, Probit and Logit P(y = 1|x) = G(β0 + xβ)

  2. Limited Dependent Variables (LDV) • An LDV is broadly defined as a dependent variable whose range of values is substantially restricted. • Participation percentage in a retirement plan must lie between 0 and 100% • The number of times a person has been arrested: 0, 1, 2, … • Did a person smoke marijuana or not (1 = yes, 0 = no)?

  3. LDV, cont. • Many economic variables are limited in some way, often because they must be positive • Wages • Housing prices • Nominal interest rates • If a strictly positive variable takes on many values, a special econometric model is rarely necessary

  4. Binary Dependent Variables • Binary dependent variables are generally coded 1 or 0. • Recall the linear probability model, which can be written as P(y = 1|x) = β0 + xβ • A drawback of the linear probability model is that predicted values are not constrained to lie between 0 and 1 • An alternative is to model the probability as a function G(β0 + xβ), where 0 < G(z) < 1
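A minimal sketch of this drawback, under a made-up data-generating process: OLS on a binary y (the LPM) happily produces fitted values outside [0, 1]. The sample size, coefficients and seed below are all hypothetical.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
# Hypothetical true model: P(y=1|x) = Phi(0.5 + 1.5x)
y = (rng.uniform(size=n) < norm.cdf(0.5 + 1.5 * x)).astype(float)

X = np.column_stack([np.ones(n), x])           # add an intercept column
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS = linear probability model
fitted = X @ b_ols

# With enough spread in x, some fitted "probabilities" leave [0, 1]
print("share of fitted values outside [0, 1]:",
      np.mean((fitted < 0) | (fitted > 1)))
```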

  5. The Probit Model • One choice for G(z) is the standard normal cumulative distribution function (cdf) • G(z) = Φ(z) ≡ ∫_{−∞}^{z} φ(v) dv, where φ is the standard normal density • φ(z) = (2π)^(−1/2) exp(−z²/2) • This case is referred to as a probit model • This choice of G ensures that P(y = 1|x) will always be between 0 and 1 • Since the model is nonlinear, it cannot be estimated by our usual methods; instead we use maximum likelihood estimation

  6. The Logit Model • Another common choice for G(z) is the logistic function, which is the cdf of a standard logistic random variable • G(z) = exp(z)/[1 + exp(z)] = Λ(z) • This case is referred to as a logit model, or sometimes as logistic regression • Both functions have similar shapes: they are increasing in z, most quickly around z = 0
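A short sketch of the two link functions side by side, using SciPy's standard normal cdf for Φ and the logistic formula above for Λ; the grid of z values is arbitrary.

```python
import numpy as np
from scipy.stats import norm

def G_probit(z):
    return norm.cdf(z)                   # Phi(z), the standard normal cdf

def G_logit(z):
    return np.exp(z) / (1 + np.exp(z))   # Lambda(z), the logistic cdf

z = np.linspace(-4, 4, 9)
print(np.round(G_probit(z), 3))   # strictly between 0 and 1, increasing in z
print(np.round(G_logit(z), 3))    # same shape, slightly fatter tails
```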

  7. Probits and Logits • Both the probit and logit models are nonlinear and require maximum likelihood estimation • There is no compelling reason to prefer one over the other • Traditionally the logit was seen more often, mainly because the logistic function leads to a more easily computed model • Today the probit is also easy to compute with standard packages, so it has become more popular

  8. Interpretation of Probits and Logits (vs LPM) • In general we care about the effect of x on P(y = 1|x), that is, we care about ∂p/∂x • For the linear case, this is easily computed as the coefficient on x • P(y = 1|x1) = β0 + βx1 and ∂p/∂x1 = β • Suppose β = .05 and x1 is a variable measured in dollars. How do we interpret the coefficient on x1? (A one-dollar increase in x1 raises the probability that y = 1 by .05, i.e., by 5 percentage points.)

  9. Interpretation of Probits and Logits (continuous variables) • For the nonlinear probit and logit models with continuous explanatory variables: • ∂p(x)/∂xj = g(β0 + xβ)βj, where g(z) = dG/dz • This derivative shows that the relative effects of any two continuous explanatory variables do not depend on x: the ratio of the partial effects for xj and xh is • [∂p(x)/∂xj] / [∂p(x)/∂xh] = [g(β0 + xβ)βj] / [g(β0 + xβ)βh] = βj/βh
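A hedged sketch of these partial effects evaluated at the sample means, with made-up coefficients and means; g is φ for the probit and dΛ/dz = Λ(z)[1 − Λ(z)] for the logit. Note how the ratio of any two effects reduces to βj/βh regardless of x.

```python
import numpy as np
from scipy.stats import norm

beta = np.array([0.5, 1.0, -2.0])   # [b0, b1, b2], hypothetical estimates
xbar = np.array([1.0, 0.2, 0.4])    # [1, mean of x1, mean of x2], made up
z = xbar @ beta

g_probit = norm.pdf(z)                       # g = phi for the probit
g_logit = np.exp(z) / (1 + np.exp(z)) ** 2   # g = dLambda/dz for the logit

print("probit partial effects at the means:", g_probit * beta[1:])
print("logit partial effects at the means:", g_logit * beta[1:])
print("ratio b1/b2 (same for both models):", beta[1] / beta[2])
```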

  10. Interpretation of Probits and Logits (binary variables) • Consider evaluating the effect of a job training program on employment (y = 1 if the person is employed, zero otherwise; x = 1 if the person went through a job training program, zero otherwise). Note: ignore the endogeneity of the job training program for now • In a linear model, we just look at the coefficient on x to measure the effect of the job training program • y = β0 + βx • In the probit case, we need to evaluate • G(β0 + β) − G(β0) • In the logit case, we need to evaluate • exp(β0 + β)/[1 + exp(β0 + β)] − exp(β0)/[1 + exp(β0)] • In both cases the interpretation is the same: the change in the probability of employment when x switches from 0 to 1
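A short sketch of this discrete change for both models, with hypothetical coefficient values standing in for estimates:

```python
import numpy as np
from scipy.stats import norm

b0, b = -0.3, 0.8   # hypothetical intercept and program coefficient

def logit_cdf(z):
    return np.exp(z) / (1 + np.exp(z))

# Change in P(employed) when x goes from 0 (no training) to 1 (training)
print("probit:", norm.cdf(b0 + b) - norm.cdf(b0))
print("logit:", logit_cdf(b0 + b) - logit_cdf(b0))
```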

  11. Interpretation (cont.) • Note that knowing the sign of β is enough to know whether the effect is positive or negative, but to find the magnitude of the effect you have to evaluate the expressions on the previous slide • With continuous x's, to compare the magnitude of effects you need to calculate the derivatives, say at the means • It is therefore incorrect to simply compare the coefficients across the three models (LPM, logit and probit) • You can, however, compare the sign and significance (based on a standard t test) of the coefficients • A rule of thumb is to divide probit estimates by 2.5 and logit estimates by 4 to make them comparable to LPM estimates.

  12. Interpretation (cont.) • The biggest difference between the LPM and the logit/probit models is that the LPM assumes a constant marginal effect, whereas in the logit/probit the marginal effect depends on the specific values of the x variables

  13. Maximum Likelihood Estimation (MLE) • Because of the nonlinear nature of these models, OLS is no longer appropriate • The idea behind ML is to provide a way of choosing an asymptotically efficient estimator for a set of parameters

  14. Some intuition on ML* • Suppose you have a coin and you want to estimate the probability (call it “p”) that it lands on Heads. You are told that this probability is either 1/2 or 9/10. You toss the coin 10 times and obtain the following sequence: HTHHHTHTHH (7 heads out of 10). What is the maximum likelihood estimate of p? *Credit to Daniele Passerman for this example

  15. Intuition on ML, cont. • If the true value of p were 1/2, what is the probability (the likelihood) of observing the sequence that we actually observed? What is the likelihood if p = 9/10? Which of these two values of p is more likely? • The calculations are simple: we just need the formula for the probability of observing a given number of successes in a sequence of n Bernoulli trials where the probability of success equals p. Let yi = 1 if the outcome of trial i was a success and 0 otherwise. The probability of observing k successes in n trials is • P(k) = C(n, k) p^k (1 − p)^(n−k)

  16. Intuition on ML, cont. • Plugging in our two candidate values, we get: • P(7 heads in 10 tosses | p = 1/2) = C(10, 7)(1/2)^7(1/2)^3 = 120/1024 ≈ .117 • P(7 heads in 10 tosses | p = 9/10) = C(10, 7)(9/10)^7(1/10)^3 ≈ .057
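A quick check of these two likelihoods in Python, using only the standard library:

```python
from math import comb

def likelihood(p, k=7, n=10):
    # Binomial probability of k successes in n Bernoulli(p) trials
    return comb(n, k) * p**k * (1 - p)**(n - k)

for p in (0.5, 0.9):
    print(f"p = {p}: likelihood = {likelihood(p):.4f}")
# p = 0.5 gives ~0.1172, p = 0.9 gives ~0.0574, so the MLE is p = 1/2
```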

  17. Intuition on ML, cont. • Thus, it is more likely that the true probability is p = 1/2. This is our MLE. • This simple example shows what we need for ML estimation: • A function telling us the probability of observing what we actually observe, for a given value of the parameters of interest (the likelihood function) • The parameter space (in our example, just the two values 1/2 and 9/10) • Then, all we have to do is find the value in the parameter space that maximizes the likelihood function

  18. MLE • Suppose we have a random sample of size n. To obtain the MLE, conditional on the explanatory variables, we need the density of yi given xi: • f(y|xi; β) = [G(xiβ)]^y [1 − G(xiβ)]^(1−y), y = 0, 1 • When y = 1, this reduces to G(xiβ) • When y = 0, it reduces to 1 − G(xiβ) • The log-likelihood function for observation i is a function of the parameters and the data (xi, yi) and is obtained by taking the log of the density above: • ℓi(β) = yi log[G(xiβ)] + (1 − yi) log[1 − G(xiβ)]
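A minimal sketch of this per-observation log likelihood for the probit case (G = Φ); the single observation used at the end is made up:

```python
import numpy as np
from scipy.stats import norm

def loglik_i(beta, x_i, y_i):
    # l_i(b) = y_i log G(x_i b) + (1 - y_i) log[1 - G(x_i b)], probit G
    G = norm.cdf(x_i @ beta)
    return y_i * np.log(G) + (1 - y_i) * np.log(1 - G)

# One made-up observation: intercept plus a single regressor x = 0.3, y = 1
print(loglik_i(np.array([0.5, 1.0]), np.array([1.0, 0.3]), 1))
```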

  19. MLE • Because G(·) is strictly between 0 and 1 for logit and probit models, ℓi(β) is well defined for all values of β. • The log-likelihood for a sample of size n is obtained by summing ℓi(β) across all observations: • L(β) = Σi ℓi(β) • We find the β that maximizes this function • Stata Example 8-1
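The slides point to a Stata example; as a stand-in, here is a sketch that maximizes the probit log likelihood on simulated data by minimizing its negative with scipy.optimize (the data-generating values are made up):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 1.0])                     # hypothetical true values
y = (rng.uniform(size=n) < norm.cdf(X @ beta_true)).astype(float)

def neg_loglik(beta):
    G = np.clip(norm.cdf(X @ beta), 1e-10, 1 - 1e-10)  # guard against log(0)
    return -np.sum(y * np.log(G) + (1 - y) * np.log(1 - G))

res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print("beta_hat:", res.x)                 # should be close to beta_true
print("maximized log likelihood:", -res.fun)
```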

  20. The Likelihood Ratio Test • Unlike the LPM, where we can compute F statistics or LM statistics to test exclusion restrictions, we need a new type of test • Maximum likelihood estimation always produces a log-likelihood, L • Because the MLE maximizes the log-likelihood function, dropping variables generally leads to a smaller log-likelihood • Just as in an F test, estimate the restricted and unrestricted models, then form • LR = 2(Lur − Lr) ~ χ²_q, where q is the number of restrictions • This statistic is always nonnegative • Stata Example 8-1, cont.
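A small sketch of the test itself; the two log-likelihood values and q below are hypothetical numbers standing in for fitted models:

```python
from scipy.stats import chi2

# Hypothetical log likelihoods from an unrestricted and a restricted model
L_ur, L_r, q = -480.2, -484.9, 2   # q = number of excluded variables

LR = 2 * (L_ur - L_r)
print("LR statistic:", LR, " p-value:", chi2.sf(LR, df=q))
```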

  21. Goodness of Fit • Unlike the LPM, where we can compute an R² to judge goodness of fit, we need new measures • One possibility is a pseudo R² based on the log-likelihood, defined as 1 − Lur/Lr, where Lr comes from the model with only an intercept • We can also look at the percent correctly predicted: if the predicted probability is > .5, predict ŷ = 1, otherwise predict ŷ = 0, then compare with the actual y • Stata Example 8-1, cont.
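A sketch of both measures; the log-likelihood values, fitted probabilities and outcomes below are all hypothetical:

```python
import numpy as np

# Hypothetical log likelihoods: full model vs intercept-only model
L_ur, L_0 = -480.2, -560.7
print("pseudo R2:", 1 - L_ur / L_0)

# Percent correctly predicted with a 0.5 cutoff (made-up values)
p_hat = np.array([0.8, 0.3, 0.6, 0.1])   # fitted probabilities
y = np.array([1, 0, 0, 0])               # actual outcomes
y_hat = (p_hat > 0.5).astype(int)
print("percent correctly predicted:", 100 * np.mean(y_hat == y))
```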