Chapter Six. Basic Ideas of Linear Regression: The Two-Variable Model
Regression Analysis • Study of the relationships between a dependent variable (Y) and one or more independent or explanatory variables (X1, X2,…) • Regression does not necessarily imply causation. Causation must be inferred from the theory underlying the phenomenon that is tested empirically.
Objectives of Regression Analysis • Estimate the mean of Y given the X values, E(Y|X) • Test hypotheses about the nature of the dependence (e.g., is the price elasticity of demand equal to -1.0?) • Predict or forecast the mean value of Y for values of X beyond the sample range • Some combination of two or more of these objectives.
Example • How much money do people at different income levels spend on the New York State Lotto each week? • Let Y represent weekly expenditure on Lotto • Let X represent weekly personal disposable income • Assume a population of 100 Lotto players divided into 10 income classes of 10 players each • See Table 6-1 and the scatter diagram in Fig. 6-1.
Table 6-1 Weekly Lotto expenditure in relation to weekly personal disposable income.
Figure 6-1 Weekly expenditure on Lotto ($) and weekly personal disposable income ($).
Population Regression Line • The circled values in Fig. 6-1 are the mean values of Y for each X • These are called conditional mean values, or conditional expected values • Connecting the conditional mean values of Y yields the population regression line (PRL) • The PRL gives the mean value of the dependent variable for each value of the independent variable.
Population Regression Function • Since the PRL is approximately linear, it can be expressed mathematically as $E(Y \mid X_i) = B_1 + B_2 X_i$ • This is the population regression function (PRF) • See the conditional mean values in the last row of Table 6-1 • The regression of Y on X is the mean of the distribution of Y values corresponding to a given X • The PRL is a line that passes through the conditional means of Y.
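To make "conditional mean" and "a line through the conditional means" concrete, here is a minimal sketch. It uses a small made-up population (these are NOT the values of Table 6-1), computes E(Y | X) for each income class, checks whether those means lie on one straight line, and recovers B1 and B2 from that line.

```python
from statistics import mean

# Hypothetical toy population (NOT the numbers in Table 6-1): weekly Lotto
# spending Y of every player in each income class X.
population = {
    100: [8, 10, 12],
    200: [14, 16, 18],
    300: [20, 22, 24],
}

# Conditional mean E(Y | X = x) for each income class.
conditional_means = {x: mean(ys) for x, ys in population.items()}

# Slopes between consecutive conditional means; if they are all equal, the
# conditional means lie on one straight line, i.e. the PRL is linear.
xs = sorted(conditional_means)
slopes = [
    (conditional_means[b] - conditional_means[a]) / (b - a)
    for a, b in zip(xs, xs[1:])
]
is_linear = all(abs(s - slopes[0]) < 1e-12 for s in slopes)

# Recover the PRF parameters from the line through the conditional means.
B2 = slopes[0]
B1 = conditional_means[xs[0]] - B2 * xs[0]
print(conditional_means, is_linear, B1, B2)
```

With real data the conditional means are only approximately linear, which is why the slide says the PRL is "approx. linear".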
Conditional Regression Analysis • In the PRF above, B1 and B2 are the parameters or regression coefficients • B1 is called the intercept (coefficient) • B2 is the slope coefficient and measures the rate of change in the conditional mean of Y per unit change in X • This is conditional regression analysis – behavior of Y conditional on given values of X – commonly called just regression analysis • In this context, E(Y) means E(Y|X).
Statistical or Stochastic Specification • Table 6-1 shows a mean Lotto expenditure of $20.90 at a personal disposable income (PDI) of $150 • BUT individual customers’ expenditures range from $12 to $33 • An individual’s expenditure can therefore be expressed as the group average plus or minus a quantity: $Y_i = B_1 + B_2 X_i + u_i$ • where $u_i$ is the stochastic, or random, error term, a random variable.
Stochastic PRF • $B_1 + B_2 X_i$: the systematic, or deterministic, component • $u_i$: the nonsystematic, or random, component, sometimes called the noise component • The error term captures: • the influence of left-out variables • inherent randomness in human behavior • errors in measurement • Ockham’s Razor: deliberately leave out variables whose effects are too small or too unsystematic, so that those effects are absorbed by the error term
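The split of $Y_i$ into a systematic part and a random part can be illustrated by simulation. The sketch below assumes hypothetical values B1 = 4, B2 = 0.06 and normally distributed errors with standard deviation 5 (none of these come from the Lotto data). Individual draws scatter widely, yet their average is close to $E(Y \mid X)$ and the implied errors average out to about zero.

```python
import random
from statistics import mean

# Hypothetical population parameters (illustrative only, not estimates).
B1, B2 = 4.0, 0.06
SIGMA = 5.0  # assumed standard deviation of the error term u

def draw_y(x, rng):
    """One individual's Y: systematic part B1 + B2*x plus a random error u."""
    u = rng.gauss(0.0, SIGMA)
    return B1 + B2 * x + u

rng = random.Random(42)
x = 150
draws = [draw_y(x, rng) for _ in range(10_000)]

systematic = B1 + B2 * x                  # E(Y | X = 150) = 4 + 0.06 * 150
errors = [y - systematic for y in draws]  # u_i = Y_i - E(Y | X_i)

print(f"E(Y|X={x}) = {systematic:.2f}")
print(f"range of individual Y: {min(draws):.1f} to {max(draws):.1f}")
print(f"average of simulated Y = {mean(draws):.2f}")  # close to E(Y|X)
print(f"average error = {mean(errors):.3f}")           # close to 0
```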
The Sample Regression Function • How do we estimate the PRF with sample data? • Table 6-2: a random sample from Table 6-1 • Notice that we have only one Y value for each X value • Sampling fluctuations, or sampling error, undermine our ability to estimate the PRF • Suppose we draw another random sample (Table 6-3) and plot the data from both samples (Figure 6-3).
Table 6-2 A random sample from Table 6-1.
Table 6-3 Another random sample from Table 6-1.
Figure 6-3 Sample regression lines based on two independent samples.
Sample Regression Function • The sample regression lines (SRLs) plotted in Fig. 6-3 differ from each other, and neither is likely to coincide with the PRL • More samples give us more SRLs, all different • Analogous to the PRF is the sample regression function (SRF): $\hat{Y}_i = b_1 + b_2 X_i$
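Sampling fluctuation can be reproduced in a few lines. This sketch builds a hypothetical population (10 income classes, 10 players each, generated from an assumed PRF E(Y|X) = 4 + 0.06X plus noise; not Table 6-1), draws two samples with one Y per X in the spirit of Tables 6-2 and 6-3, and fits an OLS line to each. The helper names `ols` and `draw_sample` are illustrative.

```python
import random
from statistics import mean

def ols(xs, ys):
    """Two-variable OLS estimates (b1, b2), computed in deviation form."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2

# Hypothetical population: 10 income classes with 10 players each.
pop_rng = random.Random(0)
incomes = list(range(100, 300, 20))  # 100, 120, ..., 280
population = {
    x: [4 + 0.06 * x + pop_rng.gauss(0, 5) for _ in range(10)] for x in incomes
}

def draw_sample(seed):
    """Pick one Y at random from each income class."""
    rng = random.Random(seed)
    return [rng.choice(population[x]) for x in incomes]

# Two independent samples give two different sample regression lines.
srls = {label: ols(incomes, draw_sample(seed)) for label, seed in (("A", 1), ("B", 2))}
for label, (b1, b2) in srls.items():
    print(f"SRL from sample {label}: b1 = {b1:.2f}, b2 = {b2:.4f}")
```

Neither pair (b1, b2) will exactly equal the assumed (4, 0.06); each is one estimate subject to sampling error.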
Sample Regression Functions • $\hat{Y}_i$ (read “Y-hat”) is the estimator of E(Y|Xi) • b1 is the estimator of B1 • b2 is the estimator of B2 • Stochastic SRF: $Y_i = b_1 + b_2 X_i + e_i$, or equivalently $Y_i = \hat{Y}_i + e_i$ • $e_i$, called the residual, is the estimator of $u_i$, the random error
Objective • Estimate the PRF on the basis of the SRF • What procedure or method will make the approximation (the SRF) as close as possible to the PRF? • Remember that we do not observe B1, B2, and ui, as in Figure 6-4
Figure 6-4 The population and sample regression lines.
“Linear” Regression • Linearity in the variables • The conditional mean of the dependent variable is a linear function of the independent variables • A function Y = f(X) is linear in X if • X appears with a power of 1 only (no X² or √X) • X is not multiplied or divided by another variable • For regression models • The rate of change in the dependent variable for a unit change in the explanatory variable remains constant • That is, the slope of Y with respect to X is constant (Fig. 6-5)
Figure 6-5 (a) Linear demand curve; (b) nonlinear demand curve.
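The constant-slope criterion can be checked numerically. The two demand curves below are hypothetical (Q = 100 - 2P and Q = 100/P, chosen only to mirror panels (a) and (b) of Fig. 6-5); a central finite difference approximates the slope at several prices.

```python
# Hypothetical demand curves, chosen only for illustration.
def linear_demand(p):
    return 100 - 2 * p  # linear in P: the slope is -2 at every price

def nonlinear_demand(p):
    return 100 / p      # nonlinear in P: the slope is -100 / P**2

def slope(f, p, h=1e-6):
    """Central finite-difference approximation of dQ/dP at price p."""
    return (f(p + h) - f(p - h)) / (2 * h)

for p in (1, 5, 10):
    print(f"P = {p:>2}: linear slope = {slope(linear_demand, p):.4f}, "
          f"nonlinear slope = {slope(nonlinear_demand, p):.4f}")
```

The linear curve reports the same slope at every price; the nonlinear curve's slope changes with P, which is exactly what "not linear in the variable" means.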
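“Linear” Regression • Linearity in the parameters • The conditional mean of the dependent variable is a linear function of the parameters • A function is linear in the parameter B2 if B2 appears with a power of 1 only and is not multiplied or divided by any other parameter • For example, $E(Y \mid X_i) = B_1 + B_2 X_i^2$ is linear in the parameters (though not in X), whereas $E(Y \mid X_i) = B_1 + B_2^2 X_i$ is not • For our purposes, linear regression means a regression that is linear in the parameters, but not necessarily linear in the explanatory variables.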
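Why linearity in the parameters is what matters: a model that is nonlinear in X but linear in B1 and B2 can still be fitted by an ordinary two-variable regression after transforming X. This sketch assumes a hypothetical, noise-free model Y = 2 + 3X² and regresses Y on Z = X²; the `ols` helper is illustrative.

```python
from statistics import mean

def ols(zs, ys):
    """Two-variable OLS estimates (b1, b2) of Y on a single regressor Z."""
    z_bar, y_bar = mean(zs), mean(ys)
    b2 = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys)) / sum(
        (z - z_bar) ** 2 for z in zs
    )
    return y_bar - b2 * z_bar, b2

# Hypothetical model Y = B1 + B2 * X**2: nonlinear in X, linear in B1 and B2.
xs = [1, 2, 3, 4, 5]
ys = [2 + 3 * x**2 for x in xs]  # noise-free so the answer is easy to check

# Define Z = X**2; the model becomes Y = B1 + B2 * Z, a linear regression.
zs = [x**2 for x in xs]
b1, b2 = ols(zs, ys)
print(b1, b2)  # 2.0 3.0
```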
Multiple Regression • The dependent variable is a function of more than one explanatory variable
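As a preview of a regression with more than one explanatory variable, this sketch fits $Y_i = b_1 + b_2 X_{2i} + b_3 X_{3i}$ by OLS using the standard deviation-form formulas for two regressors. The data are hypothetical and noise-free, generated from Y = 1 + 2X2 + 3X3, so the estimates should come back as 1, 2, and 3; the variable names are illustrative.

```python
from statistics import mean

# Hypothetical, noise-free data generated from Y = 1 + 2*X2 + 3*X3.
x2 = [1, 2, 3, 4, 5]
x3 = [2, 1, 4, 3, 6]
y = [1 + 2 * a + 3 * b for a, b in zip(x2, x3)]

def dev(values):
    """Deviations of each value from the sample mean."""
    m = mean(values)
    return [v - m for v in values]

d2, d3, dy = dev(x2), dev(x3), dev(y)
s22 = sum(a * a for a in d2)
s33 = sum(b * b for b in d3)
s23 = sum(a * b for a, b in zip(d2, d3))
sy2 = sum(c * a for c, a in zip(dy, d2))
sy3 = sum(c * b for c, b in zip(dy, d3))

# OLS slope estimates for two regressors (deviation form).
det = s22 * s33 - s23 ** 2  # must be nonzero: X2 and X3 not perfectly collinear
b2 = (sy2 * s33 - sy3 * s23) / det
b3 = (sy3 * s22 - sy2 * s23) / det
b1 = mean(y) - b2 * mean(x2) - b3 * mean(x3)
print(round(b1, 6), round(b2, 6), round(b3, 6))
```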
Estimation of Parameters • Method of Ordinary Least Squares (OLS) • Estimate the PRF from the SRF: $Y_i = b_1 + b_2 X_i + e_i$, so that $e_i = Y_i - b_1 - b_2 X_i$ • Choose b1, b2 so that the residuals $e_i$ are as small as possible • In OLS, choose b1, b2 to minimize the residual sum of squares (RSS), $\sum e_i^2$
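The minimization can be worked out with calculus. Setting the partial derivatives of the RSS to zero gives the two normal equations, whose solution is the OLS estimator. The lowercase deviations $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$ are notation introduced here for compactness. Note that the two first-order conditions say exactly $\sum e_i = 0$ and $\sum e_i X_i = 0$.

```latex
\begin{aligned}
\text{RSS}(b_1, b_2) &= \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - b_1 - b_2 X_i)^2 \\
\frac{\partial\,\text{RSS}}{\partial b_1} &= -2 \sum (Y_i - b_1 - b_2 X_i) = 0
  \;\Rightarrow\; \sum Y_i = n b_1 + b_2 \sum X_i \\
\frac{\partial\,\text{RSS}}{\partial b_2} &= -2 \sum (Y_i - b_1 - b_2 X_i) X_i = 0
  \;\Rightarrow\; \sum Y_i X_i = b_1 \sum X_i + b_2 \sum X_i^2 \\
b_1 &= \bar{Y} - b_2 \bar{X}, \qquad
b_2 = \frac{\sum x_i y_i}{\sum x_i^2}, \quad x_i = X_i - \bar{X},\; y_i = Y_i - \bar{Y}
\end{aligned}
```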
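Properties of OLS Estimators • The SRF passes through the sample mean values of X and Y: $\bar{Y} = b_1 + b_2 \bar{X}$ • The mean of the residuals is zero: $\bar{e} = \sum e_i / n = 0$ • X and e are uncorrelated: $\sum e_i X_i = 0$ • Similarly, $\hat{Y}$ and e are uncorrelated: $\sum e_i \hat{Y}_i = 0$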
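These properties can be verified numerically on any sample. The sketch below uses a small hypothetical data set (illustrative numbers, not the Lotto data), fits OLS in deviation form, and measures each property; all quantities are zero up to floating-point rounding.

```python
from statistics import mean

# Hypothetical sample (illustrative numbers only).
xs = [10, 20, 30, 40, 50]
ys = [12, 19, 33, 38, 52]

# OLS estimates in deviation form.
x_bar, y_bar = mean(xs), mean(ys)
b2 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
b1 = y_bar - b2 * x_bar

y_hat = [b1 + b2 * x for x in xs]           # fitted values
e = [y - fit for y, fit in zip(ys, y_hat)]  # residuals

checks = {
    "SRF passes through (X_bar, Y_bar)": abs(b1 + b2 * x_bar - y_bar),
    "mean of residuals": abs(mean(e)),
    "sum of e_i * X_i": abs(sum(r * x for r, x in zip(e, xs))),
    "sum of e_i * Y_hat_i": abs(sum(r * f for r, f in zip(e, y_hat))),
}
for name, value in checks.items():
    print(f"{name}: {value:.1e}")  # all zero up to floating-point rounding
```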
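Calculate b1 and b2 • See Table 6-4 • This yields $\hat{Y}_i \approx 7.62 + 0.0814 X_i$ • See Fig. 6-6 • If PDI goes up by $1, mean Lotto expenditure goes up by about $0.0814 (roughly 8 cents) • If PDI = 0, mean Lotto expenditure ≈ $7.62? This is a mechanical reading of the intercept; treat it with caution, since it extrapolates the fitted line to zero income.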
Table 6-4 Raw data (from Table 6-2) for Lotto.
Figure 6-6 Regression line based on data from Table 6-4.
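The interpretation of the fitted line can be made concrete with the estimates reported above (b1 ≈ 7.62, b2 = 0.0814). The PDI values 200 and 300 in this sketch are arbitrary inputs chosen for illustration, and `predicted_mean_spending` is a hypothetical helper name.

```python
# Estimates reported for the Lotto regression (intercept rounded).
B1_HAT = 7.62
B2_HAT = 0.0814

def predicted_mean_spending(pdi):
    """Estimated mean weekly Lotto expenditure at a given weekly PDI."""
    return B1_HAT + B2_HAT * pdi

# The slope is a constant marginal effect: +$100 of PDI -> +100 * b2 dollars.
change_per_100 = predicted_mean_spending(300) - predicted_mean_spending(200)
print(round(change_per_100, 2))  # 8.14

# The intercept is the fitted value at PDI = 0 (an extrapolation).
print(predicted_mean_spending(0))
```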
Table 6-5 Average hourly wage by education.
Figure 6-7 S&P 500 composite index and three-month Treasury bill rate.
Table 6-6 Median home price (MHP) and mortgage interest rate (INT) in metropolitan New York area, 1994-2003.
Figure 6-8 Median home prices and interest rates.
Table 6-7 Hypothetical data on weekly consumption expenditure and weekly income.
Table 6-8 Consumer price index (CPI) and S&P 500 index (S&P), United States, 1978-1989.
Table 6-9 Nominal interest rate (Y) and inflation (X) in nine industrial countries for the year 1988.
Table 6-10 Consumer price index (CPI) and S&P 500 index (S&P), United States, 1990-2001.
Table 6-11 Selected data on top business schools in the United States.
Table 6-12 Real gross domestic product and civilian unemployment rate, United States, 1970-1999.
Table 6-13 S&P 500 index (S&P) and three-month Treasury bill rate (3-M T Bill), 1980-1999.
Table 6-14 Auction data on price, age of clock and number of bidders.
Table 6-15 Mean scholastic aptitude test (S.A.T.) verbal and math scores for college-bound seniors, 1967-1990.