Statistical Inference and Regression Analysis: Stat-GB.3302.30, Stat-UB.0015.01

Presentation Transcript


  1. Statistical Inference and Regression Analysis: Stat-GB.3302.30, Stat-UB.0015.01 Professor William Greene, Stern School of Business, IOMS Department, Department of Economics

  2. Part 3 – Estimation Theory

  3. Estimation • Nonparametric population features • Mean - income • Correlation – disease incidence and smoking • Ratio – income per household member • Proportion – proportion of ASCAP music played that is produced by Dave Matthews • Distribution – histogram and density estimation • Parameters • Fitting distributions – mean and variance of lognormal distribution of income • Parametric models of populations – relationship of loan rates to attributes of minorities and others in Bank of America settlement on mortgage bias

  4. Measurements as Observations (Diagram: Population and Measurement, linked by Theory; population features: Characteristics, Behavior Patterns, Choices.) The theory argues that there are meaningful quantities to be statistically analyzed.

  5. Application – Health and Income German Health Care Usage Data: 7,293 households, observed 1984-1995. Data downloaded from the Journal of Applied Econometrics Archive. Some variables in the file are:
DOCVIS = number of visits to the doctor in the observation period
HOSPVIS = number of visits to a hospital in the observation period
HHNINC = household nominal monthly net income in German marks / 10,000 (4 observations with income = 0 were dropped)
HHKIDS = 1 if there are children under age 16 in the household; 0 otherwise
EDUC = years of schooling
AGE = age in years
PUBLIC = decision to buy public health insurance
HSAT = self-assessed health status (0, 1, …, 10)

  6. Observed Data

  7. Inference about the Population (Diagram: Population and Measurement; Characteristics, Behavior Patterns, Choices.)

  8. Classical Inference The population is all 40 million German households (or all households in the entire world). The sample is the 7,293 German households observed in 1984-1995. (Diagram: Population, Measurement, Sample; Characteristics, Behavior Patterns, Choices.) Imprecise inference about the entire population – sampling theory and asymptotics.

  9. Bayesian Inference (Diagram: Population, Measurement, Sample; Characteristics, Behavior Patterns, Choices.) Sharp, ‘exact’ inference about only the sample – the ‘posterior’ density is posterior to the data.

  10. Estimation of Population Features • Estimators and Estimates • Estimator = strategy for use of the data • Estimate = outcome of that strategy • Sampling Distribution • Qualities of the estimator • Uncertainty due to random sampling

  11. Estimation • Point Estimator: Provides a single estimate of the feature in question based on prior and sample information. • Interval Estimator: Provides a range of values that incorporates both the point estimator and the uncertainty about the ability of the point estimator to find the population feature exactly.

  12. ‘Repeated Sampling’ - A Sampling Distribution • The true mean is 500. Sample means vary around 500, some quite far off. • The sample mean has a sampling mean and a sampling variance. • The sample mean also has a probability distribution. Looks like a normal distribution. This is a histogram for 1,000 means of samples of 20 observations from Normal[500,100²].
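
This repeated-sampling experiment is easy to replicate. A minimal sketch in Python (not the software used in the course; the seed and sample layout are my own choices):

    import numpy as np

    rng = np.random.default_rng(seed=1)
    # 1,000 means of samples of 20 observations from Normal(500, 100^2)
    means = rng.normal(loc=500, scale=100, size=(1000, 20)).mean(axis=1)
    print(means.mean())        # sampling mean: close to 500
    print(means.std(ddof=1))   # close to 100/sqrt(20), about 22.4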

  13. Application: Credit Modeling • 1992 American Express analysis of • Application process: Acceptance or rejection; X = 0 (reject) or 1 (accept). • Cardholder behavior • Loan default (D = 0 or 1). • Average monthly expenditure (E = $/month) • General credit usage/behavior (Y = number of charges) • 13,444 applications in November, 1992

  14. 0.7809 is the true proportion in the population of 13,444 we are sampling from.

  15. Estimation Concepts • Random Sampling • Finite populations • i.i.d. sample from an infinite population • Information • Prior • Sample

  16. Properties of Estimators

  17. Unbiasedness The sample mean of the 100 sample estimates is 0.7844. The population mean (true proportion) is 0.7809.

  18. Consistency (Histograms of sample proportions for N = 144, N = 1024 and N = 4900, each plotted on the same .70 to .88 axis: the distribution concentrates around the true value as N grows.)
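
The tightening of the sampling distribution can be reproduced directly. A sketch (Python; the true rate 0.7809 and the 100-samples design come from the slides, everything else is assumed):

    import numpy as np

    rng = np.random.default_rng(seed=1)
    p = 0.7809                       # true acceptance rate
    for n in (144, 1024, 4900):
        phat = rng.binomial(n, p, size=100) / n   # 100 sample proportions
        print(n, round(phat.std(ddof=1), 4))      # spread shrinks as n grows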

  19. Competing Estimators of a Parameter Bank costs are normally distributed with mean μ. Which is a better estimator of μ, the mean (11.46) or the median (11.27)?

  20. Interval estimates of the acceptance rate, based on the 100 samples of 144 observations
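
For a single sample, the conventional 95% interval for a proportion is p̂ ± 1.96·sqrt(p̂(1-p̂)/N). A sketch using one illustrative estimate (the 0.7844 figure is borrowed from slide 17; the resulting interval is mine, not one of the 100 on the slide):

    import numpy as np

    phat, n = 0.7844, 144
    se = np.sqrt(phat * (1 - phat) / n)
    print(phat - 1.96 * se, phat + 1.96 * se)   # roughly (0.717, 0.852)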

  21. Methods of Estimation • Information about the source population • Approaches • Method of Moments • Maximum Likelihood • Bayesian

  22. The Method of Moments

  23. Estimating a Parameter • Mean of Poisson • p(y) = exp(-λ)λ^y / y!, y = 0,1,…; λ > 0 • E[y] = λ. • E[(1/N)Σi yi] = λ. This is the estimator. • Mean of Exponential • f(y) = θ exp(-θy), y > 0; θ > 0 • E[y] = 1/θ. • E[(1/N)Σi yi] = 1/θ. • 1/{(1/N)Σi yi} is the estimator of θ.
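
Both estimators are one-liners once the sample is in hand. A sketch with simulated data (Python; the true values 3.0 and 0.5 are arbitrary):

    import numpy as np

    rng = np.random.default_rng(seed=1)
    # Poisson: E[y] = lambda, so the sample mean estimates lambda directly
    y_pois = rng.poisson(lam=3.0, size=1000)
    print(y_pois.mean())
    # Exponential: E[y] = 1/theta, so 1/ybar estimates theta
    y_exp = rng.exponential(scale=1/0.5, size=1000)   # numpy takes scale = 1/theta
    print(1 / y_exp.mean())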

  24. Mean and Variance of a Normal Distribution

  25. Proportion for Bernoulli • In the AmEx data, the true population acceptance rate is 0.7809 = π • Y = 1 if application accepted, 0 if not. • E[y] = π • E[(1/N)Σi yi] = paccept = π. • This is the estimator.

  26. Gamma Distribution

  27. Method of Moments ψ(P) = Γ′(P)/Γ(P) = d log Γ(P)/dP

  28. Estimate One Parameter • Assume λ known to be 0.1. • Estimate P • E[y] = P/λ = P/.1 = 10P • m1 = mean of y = 31.278 • Estimate of P is 31.278/10 = 3.1278. • One equation in one unknown.
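
The same arithmetic as a check (Python; only the 0.1 and 31.278 come from the slide):

    lam = 0.1          # lambda, assumed known
    m1 = 31.278        # sample mean of y
    print(m1 * lam)    # E[y] = P/lambda, so P_hat = lambda * m1 = 3.1278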

  29. Application

  30. Method of Moments Solutions
create ; y1=y ; y2=log(y) ; ysq=y*y $
calc ; m1=xbr(y1) ; mlog=xbr(y2) ; m2=xbr(ysq) $
Minimize ; start = 2.0, .06 ; labels = p,l
 ; fcn = (m1 - p/l)^2 + (mlog - (psi(p)-log(l)))^2 $
----------------------------------------------------
P| 2.41074
L| .07707
--------+-------------------------------------------
Minimize ; start = 2.0, .06 ; labels = p,l
 ; fcn = (m1 - p/l)^2 + (m2 - p*(p+1)/l^2)^2 $
--------+-------------------------------------------
P| 2.06182
L| .06589
--------+-------------------------------------------
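
The same two moment systems can be solved in Python. A sketch with stand-in data (scipy.special.digamma plays the role of psi; the gamma shape and scale used to simulate y are my own choices, not the course data):

    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import digamma

    rng = np.random.default_rng(seed=1)
    y = rng.gamma(shape=2.4, scale=1/0.077, size=500)   # stand-in sample
    m1, mlog, m2 = y.mean(), np.log(y).mean(), (y**2).mean()

    def fcn_log(theta):   # moment equations: mean of y and mean of log y
        p, lam = theta
        return (m1 - p/lam)**2 + (mlog - (digamma(p) - np.log(lam)))**2

    def fcn_sq(theta):    # moment equations: mean of y and mean of y^2
        p, lam = theta
        return (m1 - p/lam)**2 + (m2 - p*(p+1)/lam**2)**2

    print(minimize(fcn_log, x0=[2.0, 0.06], method="Nelder-Mead").x)
    print(minimize(fcn_sq,  x0=[2.0, 0.06], method="Nelder-Mead").x)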

  31. Properties of the MoM estimator • Unbiased? Sometimes, e.g., the normal, Bernoulli and Poisson means • Consistent? Yes, by virtue of the Slutsky theorem • Assumes parameters can vary continuously • Assumes moment functions are continuous and smooth • Efficient? Maybe – remains to be seen. (Which pair of moments should be used for the gamma distribution?) • Sampling distribution? Generally normal, by virtue of the Lindeberg-Levy central limit theorem and the Slutsky theorem.

  32. Estimating Sampling Variance • Exact sampling results – Poisson Mean, Normal Mean and Variance • Approximation based on linearization • Bootstrapping – discussed later with maximum likelihood estimator.

  33. Exact Variance of MoM • Estimate normal or Poisson mean • Estimator is sample mean = (1/N)Σi Yi. • Exact variance of sample mean is 1/N × population variance.

  34. Linearization Approach – 1 Parameter

  35. Linearization Approach – 1 Parameter

  36. Linearization Approach - General

  37. Exercise: Gamma Parameters • m1 = (1/N)Σi yi => P/λ • m2 = (1/N)Σi yi² => P(P+1)/λ² 1. What is the Jacobian? (Derivatives) 2. How to compute the variance of m1, the variance of m2 and the covariance of m1 and m2? (The variance of m1 is 1/N times the variance of y; the variance of m2 is 1/N times the variance of y². The covariance is 1/N times the covariance of y and y².)
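
One way to assemble the pieces of the exercise. A sketch (Python) that evaluates the Jacobian at the slide-30 estimates; the simulated y is a stand-in for the course data:

    import numpy as np

    rng = np.random.default_rng(seed=1)
    y = rng.gamma(shape=2.4, scale=1/0.077, size=500)   # stand-in sample
    N = len(y)
    P, lam = 2.41074, 0.07707    # MoM estimates from slide 30

    # Jacobian of the moment functions mu1 = P/lam, mu2 = P(P+1)/lam^2
    J = np.array([[1/lam,          -P/lam**2],
                  [(2*P+1)/lam**2, -2*P*(P+1)/lam**3]])

    # Covariance of (m1, m2) = covariance of (y, y^2) divided by N
    V = np.cov(np.vstack([y, y**2])) / N

    Jinv = np.linalg.inv(J)
    print(Jinv @ V @ Jinv.T)     # delta-method Asy.Var of (P_hat, lam_hat)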

  38. Sufficient Statistics

  39. Sufficient Statistic

  40. Sufficient Statistic

  41. Sufficient Statistics

  42. Gamma Density

  43. Rao-Blackwell Theorem • The mean squared error of an estimator based on sufficient statistics is no larger than that of one not based on sufficient statistics. • We deal in consistent estimators, so a large sample (approximate) version of the theorem is that estimators based on sufficient statistics are more efficient than those that are not.

  44. Maximum Likelihood • Estimation Criterion • Comparable to method of moments • Several virtues: Broadly, uses all the sample and nonsample information available → efficient (better than MoM in many cases)

  45. Setting Up the MLE The distribution of the observed random variable is written as a function of the parameter(s) to be estimated: P(yi|θ) = probability density of data given parameters; L(θ|yi) = likelihood of parameters given data. The likelihood function is constructed from the density. Construction: the joint probability density function of the observed sample of data – generally the product when the data are a random sample. The estimator is chosen to maximize the likelihood of the data (essentially the probability of observing the sample in hand).

  46. Regularity Conditions • What they are • 1. logf(.) has three continuous derivatives wrt parameters • 2. Conditions needed to obtain expectations of derivatives are met. (E.g., range of the variable is not a function of the parameters.) • 3. Third derivative has finite expectation. • What they mean • Moment conditions and convergence. We need to obtain expectations of derivatives. • We need to be able to truncate Taylor series. • We will use central limit theorems • MLE exists for nonregular densities (see text). Questionable statistical properties.

  47. Regular Exponential Density Exponential density f(yi|θ) = (1/θ)exp(-yi/θ). Average time until failure, θ, of light bulbs; yi = observed life until failure. Regularity: (1) Range of y is 0 to ∞, free of θ. (2) log f(yi|θ) = -log θ - yi/θ; ∂log f(yi|θ)/∂θ = -1/θ + yi/θ². E[yi] = θ, so E[∂log f(yi|θ)/∂θ] = 0. (3) ∂²log f(yi|θ)/∂θ² = 1/θ² - 2yi/θ³ has finite expectation = -1/θ². (4) ∂³log f(yi|θ)/∂θ³ = -2/θ³ + 6yi/θ⁴ has finite expectation = 4/θ³. (5) All derivatives are continuous functions of θ.
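
A quick numerical check of the zero expected score in condition (2). A sketch (Python; the true θ = 2 and the sample size are assumptions):

    import numpy as np

    rng = np.random.default_rng(seed=1)
    theta = 2.0
    y = rng.exponential(scale=theta, size=200_000)   # lifetimes with mean theta
    score = -1/theta + y/theta**2                    # d log f / d theta
    print(score.mean())                              # ~ 0, as regularity requires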

  48. Likelihood Function • L(θ) = Πi f(yi|θ) • MLE = the value of θ that maximizes the likelihood function. • Generally easier to maximize the log of L. The same θ maximizes log L. • In random sampling, log L = Σi log f(yi|θ)
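
For the exponential density of slide 47, log L(θ) = -N log θ - Σi yi/θ, which is maximized at θ̂ = ȳ. A numerical sketch (Python; scipy minimizes, so the negative log-likelihood is passed, and the data and bounds are my own choices):

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(seed=1)
    y = rng.exponential(scale=2.0, size=500)   # stand-in data, true theta = 2

    def neg_loglik(theta):
        return len(y) * np.log(theta) + y.sum() / theta

    res = minimize_scalar(neg_loglik, bounds=(0.01, 100), method="bounded")
    print(res.x, y.mean())                     # numeric MLE matches ybar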

  49. Poisson Likelihood (log and ln both mean natural log throughout this course)
