Chapter Six. Basic Ideas of Linear Regression: The Two-Variable Model

Regression Analysis • Study of the relationships between a dependent variable (Y) and one or more independent or explanatory variables (X1, X2,…) • Regression does not necessarily imply causation. Causation must be inferred from the theory underlying the phenomenon that is tested empirically.

Objectives of Regression Analysis • Estimate the mean of Y given the X values, E(Y|X) • Test hypotheses about the nature of the dependence (e.g., is the price elasticity of demand equal to -1.0?) • Predict or forecast the mean value of Y given values of X beyond the sample range • A combination of two or more of these.

Example • How much money do people at different income levels spend on NY state Lotto each week? • Let Y represent weekly expenditure on Lotto • Let X represent weekly personal disposable income • Assume a population of 100 Lotto players divided into 10 income classes, 10 players in each class • See Table 6-1 and scatter diagram Fig. 6-1.

Table 6-1 Weekly lotto expenditure in relation to weekly personal disposable income.

Figure 6-1 Weekly expenditure on Lotto ($) and weekly personal disposable income ($).

Population Regression Line • The circled values in Fig. 6-1 are the mean values of Y for each X • Called conditional mean values or conditional expected values • Connect the various conditional mean values of Y and the resulting line is the population regression line (PRL) • The PRL gives the mean value of the dependent variable for each value of the independent variable.

Population Regression Function • Since the PRL is approximately linear, it can be expressed mathematically as • E(Y|Xi) = B1 + B2Xi • This is the population regression function (PRF) • See the conditional mean values in the last row of Table 6-1 • The regression of Y on X is the mean of the distribution of Y values corresponding to the given X • The PRL is a line that passes through the conditional means of Y.
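The idea of a conditional mean can be sketched in a few lines of Python. The mini-population below is hypothetical (it is not the data of Table 6-1), and the function name conditional_means is an assumption made for illustration. The sketch averages Y within each income class, which is what the PRL connects.

```python
# Hypothetical mini-population (NOT the data of Table 6-1):
# weekly income X -> weekly Lotto spending Y of every player in that class.
population = {
    150: [12, 18, 21, 25],
    200: [17, 22, 26, 31],
    250: [21, 27, 30, 38],
}

def conditional_means(pop):
    """Return E(Y|X) for each X as the average of that class's Y values."""
    return {x: sum(ys) / len(ys) for x, ys in pop.items()}

means = conditional_means(population)
for x, m in sorted(means.items()):
    print(f"E(Y | X={x}) = {m:.2f}")
```

In this made-up case the three conditional means lie exactly on a straight line, so the PRF would be E(Y|Xi) = 4 + 0.1Xi. Real populations are usually only approximately linear.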

Conditional Regression Analysis • In the PRF above, B1 and B2 are the parameters or regression coefficients • B1 is called the intercept (coefficient) • B2 is the slope coefficient and measures the rate of change in the conditional mean of Y per unit change in X • This is conditional regression analysis – behavior of Y conditional on given values of X – commonly called just regression analysis • In this context, E(Y) means E(Y|X).

Statistical or Stochastic Specification • Note in Table 6-1 that the mean Lotto expenditure may be $20.90 at a personal disposable income (PDI) of $150 • BUT individual players' expenditures range from $12 to $33 • An individual's expenditure may be expressed as the group average plus or minus a quantity • Yi = B1 + B2Xi + ui • Where ui is the stochastic, or random, error term, a random variable.

Figure 6-2

Stochastic PRF • (B1 + B2Xi) is the systematic or deterministic component • ui is the nonsystematic or random component, sometimes called the noise component • The error term ui reflects: • The influence of left-out variables • Inherent randomness in human behavior • Errors in measurement • Ockham's Razor: variables are intentionally left out when their effects are too small or too unsystematic, so those effects are absorbed into the error term
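The split into a systematic and a random component can be illustrated with a small simulation. This is only a sketch: the parameter values B1 = 7.0 and B2 = 0.08, the normally distributed error, and its standard deviation are all assumptions chosen for illustration, not values taken from Table 6-1.

```python
import random

# Assumed (hypothetical) population parameters, not estimates from the tables.
B1, B2 = 7.0, 0.08
SIGMA_U = 4.0            # assumed standard deviation of the error term

rng = random.Random(0)   # fixed seed so the run is repeatable

def draw_player(x):
    """Return (systematic part, random part, observed Y) for one player."""
    systematic = B1 + B2 * x          # E(Y|X): the same for everyone with income x
    u = rng.gauss(0.0, SIGMA_U)       # noise: left-out variables, randomness, ...
    return systematic, u, systematic + u

for _ in range(3):
    s, u, y = draw_player(150)
    print(f"systematic={s:.2f}  u={u:+.2f}  Y={y:.2f}")
```

Every simulated player at the same income shares the systematic part; only ui differs from player to player.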

The Sample Regression Function • How do we estimate the PRF with sample data? • Table 6-2: a random sample from Table 6-1 • Notice we have only one Y value for each X value • Sampling fluctuations, or sampling error, undermine our ability to estimate the PRF accurately • Suppose we have another random sample (Table 6-3) and plot data from both samples (Figure 6-3).

Table 6-2 A random sample from Table 6-1.

Table 6-3 Another random sample from Table 6-1.

Figure 6-3 Sample regression lines based on two independent samples.

Sample Regression Function • The sample regression lines (SRLs) plotted in Fig. 6-3 differ from each other and are likely not the same as the PRL • More samples give us more SRLs, all different • Analogous to the PRF is the sample regression function (SRF)

Sample Regression Functions • SRF: Ŷi = b1 + b2Xi • Ŷi ("Y hat") is the estimator of E(Y|Xi) • b1 is the estimator of B1 • b2 is the estimator of B2 • Stochastic SRF: Yi = b1 + b2Xi + ei • ei, called the residual, is the estimator of ui, the random error
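Sampling fluctuation can be seen by simulating a population, drawing two samples with one player per income class, and fitting a line to each. The sketch below assumes a hypothetical PRF (B1 = 7.0, B2 = 0.08), normally distributed errors, and ten income classes from 150 to 375. None of these values come from Tables 6-1 to 6-3, and the helper names ols and draw_sample are illustrative.

```python
import random

def ols(xs, ys):
    """Two-variable least squares: return (b1, b2) for Y-hat = b1 + b2*X."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2

rng = random.Random(42)
B1, B2 = 7.0, 0.08                      # assumed "true" PRF, unknown in practice
incomes = list(range(150, 400, 25))     # 10 income classes: 150, 175, ..., 375

# Simulated population: 10 players per income class.
population = {x: [B1 + B2 * x + rng.gauss(0, 4) for _ in range(10)]
              for x in incomes}

def draw_sample():
    """Pick one player at random from each income class."""
    return [rng.choice(population[x]) for x in incomes]

srf_a = ols(incomes, draw_sample())
srf_b = ols(incomes, draw_sample())
print("SRF from sample A: b1=%.3f, b2=%.4f" % srf_a)
print("SRF from sample B: b1=%.3f, b2=%.4f" % srf_b)
```

The two fitted lines will generally differ from each other and from the assumed PRF, which is the situation Fig. 6-3 depicts.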

Objective • Estimate the PRF on the basis of the SRF • What procedure or method will make the approximation (SRF) as close as possible to the PRF? • Remember that we do not observe B1, B2, and ui (see Figure 6-4)

Figure 6-4 The population and sample regression lines.

“Linear” Regression • Linearity in the variables • The conditional mean of the dependent variable is a linear function of the independent variables • A function Y = f(X) is linear if • X appears with a power of 1 only (no X² or √X) • X is not multiplied or divided by another variable • For regression models • The rate of change in the dependent variable for a unit change in the explanatory variable remains constant • Or the slope of Y with respect to X is constant (Fig. 6-5)

Figure 6-5 (a) Linear demand curve; (b) nonlinear demand curve.

“Linear” Regression • Linearity in the parameters • The conditional mean of the dependent variable is a linear function of the parameters • A function is linear in the parameter B2, if B2 appears with a power of 1 only • For our purposes, linear regression means a regression that is linear in the parameters, but not necessarily linear in the explanatory variables.
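As an illustration of "linear in the parameters but not in the variables", consider the reciprocal model Y = B1 + B2(1/X). It is nonlinear in X, yet after defining Z = 1/X it is an ordinary two-variable linear regression of Y on Z. The sketch below uses made-up, noise-free data and assumed parameter values (B1 = 2, B2 = 30), so least squares recovers them up to rounding. The helper name ols is illustrative.

```python
def ols(xs, ys):
    """Two-variable least squares: return (b1, b2) for Y-hat = b1 + b2*X."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b2 = sxy / sxx
    return y_bar - b2 * x_bar, b2

# Assumed parameters and made-up X values (no error term, for clarity).
B1, B2 = 2.0, 30.0
xs = [1.0, 2.0, 3.0, 5.0, 6.0, 10.0]
ys = [B1 + B2 / x for x in xs]      # Y is a curve in X ...

zs = [1.0 / x for x in xs]          # ... but a straight line in Z = 1/X
b1, b2 = ols(zs, ys)
print(f"b1 = {b1:.4f}, b2 = {b2:.4f}")
```

By contrast, a model such as Y = B1 + (B2)²X is nonlinear in the parameter B2 and falls outside linear regression as defined here.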

Multiple Regression • The dependent variable is a function of more than one explanatory variable, e.g., E(Yi) = B1 + B2X2i + B3X3i

Estimation of Parameters • Method of Ordinary Least Squares (OLS) • Estimate the PRF from the SRF: Yi = b1 + b2Xi + ei, so that ei = Yi - b1 - b2Xi • Choose b1, b2 so that the residuals ei are as small as possible • In OLS, choose b1 and b2 to minimize the residual sum of squares (RSS), ∑ei²
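The minimization can be written out as a short derivation. This is the standard calculus argument, in the chapter's notation (b1, b2, ei, with n the sample size): setting the two partial derivatives of the RSS to zero gives the so-called normal equations.

```latex
\begin{aligned}
\text{RSS} &= \sum e_i^2 = \sum (Y_i - b_1 - b_2 X_i)^2 \\
\frac{\partial\,\text{RSS}}{\partial b_1} &= -2\sum (Y_i - b_1 - b_2 X_i) = 0
  \;\Rightarrow\; \sum Y_i = n\,b_1 + b_2 \sum X_i \\
\frac{\partial\,\text{RSS}}{\partial b_2} &= -2\sum X_i\,(Y_i - b_1 - b_2 X_i) = 0
  \;\Rightarrow\; \sum X_i Y_i = b_1 \sum X_i + b_2 \sum X_i^2
\end{aligned}
```

Solving these two equations simultaneously for b1 and b2 yields the OLS estimators.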

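OLS Estimators • b1 = Ȳ - b2X̄ • b2 = ∑xiyi / ∑xi² • where xi = Xi - X̄ and yi = Yi - Ȳ are deviations from the sample means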
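A minimal sketch of the deviation-form OLS computation follows. The ten (X, Y) pairs are illustrative values assumed here; they were chosen so that the fit comes out close to the slope and intercept quoted in this chapter, and they may not match Table 6-4 exactly. The function name ols_two_variable is hypothetical.

```python
# Illustrative sample (assumed values, possibly different from Table 6-4).
X = [150, 175, 200, 225, 250, 275, 300, 325, 350, 375]   # weekly PDI ($)
Y = [18, 24, 26, 23, 30, 27, 34, 35, 33, 40]             # weekly Lotto ($)

def ols_two_variable(xs, ys):
    """Return (b1, b2) minimizing the residual sum of squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squares in deviation form.
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_xx = sum((x - x_bar) ** 2 for x in xs)
    b2 = sum_xy / sum_xx          # slope: sum(x*y) / sum(x^2), x and y as deviations
    b1 = y_bar - b2 * x_bar       # intercept: Y-bar minus b2 times X-bar
    return b1, b2

b1, b2 = ols_two_variable(X, Y)
print(f"b1 = {b1:.4f}, b2 = {b2:.4f}")
```

The deviation form avoids solving the two normal equations explicitly: the slope is computed first, and the intercept follows from the sample means.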

Properties of OLS Estimators • The SRF passes through the sample mean values of X and Y • The mean of the residuals is zero, ∑ei/n = 0 • X and e are uncorrelated, ∑eiXi = 0 • Similarly, Ŷ and e are uncorrelated, ∑eiŶi = 0
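These properties can be checked numerically on any data set with an intercept in the model. The sketch below uses assumed illustrative data (not necessarily the values of Table 6-4). The sums are zero only up to floating-point rounding, so they are compared against a small tolerance.

```python
# Illustrative sample (assumed values, possibly different from Table 6-4).
X = [150, 175, 200, 225, 250, 275, 300, 325, 350, 375]
Y = [18, 24, 26, 23, 30, 27, 34, 35, 33, 40]

n = len(X)
x_bar, y_bar = sum(X) / n, sum(Y) / n
b2 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
      / sum((x - x_bar) ** 2 for x in X))
b1 = y_bar - b2 * x_bar

y_hat = [b1 + b2 * x for x in X]              # fitted values
e = [y - yh for y, yh in zip(Y, y_hat)]       # residuals

TOL = 1e-6
checks = {
    "line passes through (X-bar, Y-bar)": abs(b1 + b2 * x_bar - y_bar) < TOL,
    "sum of residuals is zero": abs(sum(e)) < TOL,
    "sum of e*X is zero": abs(sum(ei * x for ei, x in zip(e, X))) < TOL,
    "sum of e*Y-hat is zero": abs(sum(ei * yh for ei, yh in zip(e, y_hat))) < TOL,
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```

All four properties follow directly from the normal equations, so they hold for the sample at hand regardless of whether the PRF is truly linear.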

Calculate b1 and b2 • See Table 6-4 • This yields Ŷi = 7.62 + 0.0814Xi • See Fig. 6-6 • If PDI goes up by $1, then mean Lotto expenditure goes up by about $0.0814 • If PDI = 0, Lotto expenditure ≈ $7.62?
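The interpretation of the two coefficients can be made concrete by evaluating the fitted line. The sketch uses the rounded estimates quoted above (7.62 and 0.0814); the function name predict_lotto and the income levels tried are illustrative assumptions.

```python
# Rounded estimates quoted in the text.
B1_HAT = 7.62      # intercept
B2_HAT = 0.0814    # slope: change in mean Lotto spending per $1 of PDI

def predict_lotto(pdi):
    """Estimated mean weekly Lotto expenditure ($) at weekly income pdi ($)."""
    return B1_HAT + B2_HAT * pdi

for pdi in (150, 250, 375):
    print(f"PDI = ${pdi}: estimated mean Lotto spending = ${predict_lotto(pdi):.2f}")

# The slope is the difference between predictions one dollar apart.
print(f"Effect of +$1 PDI: ${predict_lotto(251) - predict_lotto(250):.4f}")
```

The intercept is the value the line takes at PDI = 0. If no incomes near zero are observed in the sample, reading it as the spending of a player with no income is an extrapolation and should be treated with caution, which is presumably why the text ends that bullet with a question mark.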

Table 6-4 Raw data (from Table 6-2) for lotto.

Figure 6-6 Regression line based on data from Table 6-4.

Table 6-5 Average hourly wage by education.

Figure 6-7 S&P 500 composite index and three-month Treasury bill rate.

Table 6-6 Median home price (MHP) and mortgage interest rate (INT) in metropolitan New York area, 1994-2003.

Figure 6-8 Median home prices and interest rates.

Table 6-7 Hypothetical data on weekly consumption expenditure and weekly income.

Table 6-8 Consumer price index (CPI) and S&P 500 index (S&P), United States, 1978-1989.

Table 6-9 Nominal interest rate (Y) and inflation (X) in nine industrial countries for the year 1988.

Table 6-10 Consumer price index (CPI) and S&P 500 index (S&P), United States, 1990-2001.

Table 6-11 Selected data on top business schools in the United States.

Table 6-12 Real gross domestic product and civilian unemployment rate, United States, 1970-1999.

Table 6-13 S&P 500 index (S&P) and three-month Treasury bill rate (3-M T Bill), 1980-1999.

Table 6-14 Auction data on price, age of clock and number of bidders.

Table 6-15 Mean scholastic aptitude test (S.A.T.) verbal and math scores for college-bound seniors, 1967-1990.
