## chapter six


**chapter six** Basic Ideas of Linear Regression: The Two-Variable Model

**Regression Analysis**
• Study of the relationships between a dependent variable (Y) and one or more independent or explanatory variables (X1, X2, …)
• Regression does not necessarily imply causation. Causation must be inferred from the theory underlying the phenomenon that is tested empirically.

**Objectives of Regression Analysis**
• Estimate the mean of Y given the X values, or E(Y|X)
• Test hypotheses about the nature of the dependence (is the price elasticity of demand = -1.0?)
• To predict or forecast the mean value of Y given values of X beyond the sample range.
• Two or more of these combined.

**Example**
• How much money do people at different income levels spend on NY state Lotto each week?
• Let Y represent weekly expenditure on Lotto
• Let X represent weekly personal disposable income
• Assume a population of 100 Lotto players divided into 10 income classes, 10 players in each class
• See Table 6-1 and scatter diagram Fig. 6-1.

**Table 6-1** Weekly lotto expenditure in relation to weekly personal disposable income.

**Figure 6-1** Weekly expenditure on Lotto ($) and weekly personal disposable income ($).

**Population Regression Line**
• The circled values in Fig. 6-1 are the mean values of Y for each X
• Called conditional mean values or conditional expected values
• Connect the various conditional mean values of Y and the resulting line is the population regression line (PRL)
• The PRL gives the mean value of the dependent variable for each value of the independent variable.

**Population Regression Function**
• Since the PRL is approx. linear it can be expressed mathematically as E(Y|Xi) = B1 + B2Xi
• This is the population regression function
• See the conditional mean values in the last row of Table 6-1
• The regression of Y on X is the mean of the distribution of Y values corresponding to the given X.
• The PRL is a line that passes through the conditional means of Y.

**Conditional Regression Analysis**
• In the PRF above, B1 and B2 are the parameters or regression coefficients
• B1 is called the intercept (coefficient)
• B2 is the slope coefficient and measures the rate of change in the conditional mean of Y per unit change in X
• This is conditional regression analysis (the behavior of Y conditional on given values of X), commonly called just regression analysis
• In this context, E(Y) means E(Y|X).

**Statistical or Stochastic Specification**
• Note in Table 6-1 that the mean Lotto expenditure may be $20.90 at a personal disposable income (PDI) of $150, but individual customers' expenditures range from $12 to $33
• An individual's expenditure may be expressed as the group average plus or minus a quantity: Yi = B1 + B2Xi + ui
• Here ui is the stochastic, or random, error term, a random variable.

**Stochastic PRF**
• (B1 + B2Xi) is the systematic or deterministic component
• ui is the nonsystematic or random component, sometimes called the noise component. It reflects:
  • Influence of left-out variables
  • Inherent randomness in human behavior
  • Errors in measurement
  • Ockham's Razor: variables are intentionally left out if their effects are too small or too unsystematic, so those effects are left in the error term

**The Sample Regression Function**
• How do we estimate the PRF with sample data?
• Table 6-2: a random sample from Table 6-1.
• Notice we have only one Y value for each X value
• Sampling fluctuations, or sampling error, undermine our ability to estimate the PRF
• Suppose we have another random sample (Table 6-3) and plot data from both samples (Figure 6-3).

**Table 6-2** A random sample from Table 6-1.

**Table 6-3** Another random sample from Table 6-1.

**Figure 6-3** Sample regression lines based on two independent samples.

**Sample Regression Function**
• The sample regression lines (SRLs) plotted in Fig. 6-3 are different and likely not the same as the PRL
• More samples give us more SRLs, all different
• Analogous to the PRF is the sample regression function (SRF)

**Sample Regression Functions**
• SRF: Ŷi = b1 + b2Xi
• Ŷi ("Y-hat") is the estimator of E(Y|Xi)
• b1 is the estimator of B1
• b2 is the estimator of B2
• Stochastic SRF: Yi = b1 + b2Xi + ei
• ei, called the residual, is the estimator of ui, the random error

**Objective**
• Estimate the PRF on the basis of the SRF
• What procedure or method will make the approximation (SRF) as close as possible to the PRF?
• Remember we do not observe B1, B2, and ui as in Figure 6-4

**Figure 6-4** The population and sample regression lines.

**"Linear" Regression**
• Linearity in the variables
  • The conditional mean of the dependent variable is a linear function of the independent variables
  • A function Y = f(X) is linear if
    • X appears with a power of 1 only (no X² or √X)
    • X is not multiplied or divided by another variable
  • For regression models
    • The rate of change in the dependent variable for a unit change in the explanatory variable remains constant
    • Or the slope of Y with respect to X is constant (Fig. 6-5)

**Figure 6-5** (a) Linear demand curve; (b) nonlinear demand curve.

**"Linear" Regression**
• Linearity in the parameters
  • The conditional mean of the dependent variable is a linear function of the parameters
  • A function is linear in the parameter B2 if B2 appears with a power of 1 only
• For our purposes, linear regression means a regression that is linear in the parameters, but not necessarily linear in the explanatory variables.

**Multiple Regression**
• The dependent variable is a function of more than one explanatory variable

**Estimation of Parameters**
• Method of Ordinary Least Squares (OLS)
• Estimate the PRF from the SRF: ei = Yi − Ŷi = Yi − b1 − b2Xi
• Choose b1, b2 so that the residuals ei are as small as possible
• In OLS, choose b1, b2 to minimize the residual sum of squares (RSS), ∑ei²

**Properties of OLS Estimators**
• The SRF passes through the sample mean values of X and Y
• The mean of the residuals is zero, ∑ei/n = 0
• X and e are uncorrelated, ∑eiXi = 0
• Similarly, Ŷ and e are uncorrelated, ∑eiŶi = 0

**Calculate b1 and b2**
• See Table 6-4
• This yields: Ŷi ≈ 7.62 + 0.0814Xi
• See Fig. 6-6
• If PDI goes up by $1, then Lotto expenditure goes up by $0.0814
• If PDI = 0, Lotto expenditure ≈ $7.62?

**Table 6-4** Raw data (from Table 6-2) for lotto.

**Figure 6-6** Regression line based on data from Table 6-4.

**Table 6-5** Average hourly wage by education.

**Figure 6-7** S&P 500 composite index and three-month Treasury bill rate.

**Table 6-6** Median home price (MHP) and mortgage interest rate (INT) in metropolitan New York area, 1994-2003.

**Figure 6-8** Median home prices and interest rates.

**Table 6-7** Hypothetical data on weekly consumption expenditure and weekly income.

**Table 6-8** Consumer price index (CPI) and S&P 500 index (S&P), United States, 1978-1989.

**Table 6-9** Nominal interest rate (Y) and inflation (X) in nine industrial countries for the year 1988.

**Table 6-10** Consumer price index (CPI) and S&P 500 index (S&P), United States, 1990-2001.

**Table 6-11** Selected data on top business schools in the United States.

**Table 6-12** Real gross domestic product and civilian unemployment rate, United States, 1970-1999.

**Table 6-13** S&P 500 index (S&P) and three-month Treasury bill rate (3-M T Bill), 1980-1999.

**Table 6-14** Auction data on price, age of clock and number of bidders.

**Table 6-15** Mean scholastic aptitude test (S.A.T.) verbal and math scores for college-bound seniors, 1967-1990.
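
**Sketch: OLS for the two-variable model**

The estimation step summarized under "Estimation of Parameters" and "Properties of OLS Estimators" can be sketched in a few lines of Python. This is a minimal sketch, assuming the standard closed-form solution for the two-variable model: b2 = ∑(Xi − X̄)(Yi − Ȳ) / ∑(Xi − X̄)² and b1 = Ȳ − b2X̄. The function name `ols_two_variable` and the income and expenditure figures are hypothetical illustrations; they are not the data of Table 6-4, so the estimates will not match the ones quoted in the slides.

```python
def ols_two_variable(x, y):
    """Fit Y = b1 + b2*X by ordinary least squares (two-variable model).

    Returns the intercept b1, the slope b2, the fitted values and the residuals.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sums of cross-products and squares in deviations from the sample means.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x must take at least two distinct values")

    b2 = s_xy / s_xx          # slope coefficient
    b1 = y_bar - b2 * x_bar   # intercept: forces the line through (x_bar, y_bar)

    fitted = [b1 + b2 * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return b1, b2, fitted, residuals


# Hypothetical sample (NOT Table 6-4): weekly income X and weekly Lotto spending Y.
income = [150, 175, 200, 225, 250, 275, 300, 325, 350, 375]
lotto = [18, 24, 26, 23, 30, 27, 34, 35, 33, 40]

b1, b2, fitted, e = ols_two_variable(income, lotto)
rss = sum(ei ** 2 for ei in e)

print(f"b1 = {b1:.4f}, b2 = {b2:.4f}, RSS = {rss:.4f}")
print(f"sum of residuals      = {sum(e):.2e}")
print(f"sum of residuals * X  = {sum(ei * xi for ei, xi in zip(e, income)):.2e}")
print(f"sum of residuals * Y^ = {sum(ei * fi for ei, fi in zip(e, fitted)):.2e}")
```

The three printed sums should be zero up to floating-point rounding, which is a quick numerical check of the properties listed above. Plain Python lists are used here to keep the arithmetic visible; a statistics library would be the more usual choice in practice.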