Regression Analysis

• Set of statistical techniques that quantify the dependence of a given economic variable on one or more other variables.
• Most common technique: ordinary least squares (OLS) regression.
• Steps:
  • Step 1: Collect data on the variables in question
  • Step 2: Specify form of the equation relating the variables
  • Step 3: Estimate equation coefficients
  • Step 4: Evaluate accuracy of the equation
  • Step 5: Interpret results in economic context

• Cross-sectional data
  • Observations at the same time period in different areas, regions, or markets
  • Example: grade in ECON 3125 as a function of GPA, time studying, attendance, age, junior vs. senior, major, etc.
• Time-series data
  • Observations in the same area or market over different time periods
  • Example: economic growth as a function of income, unemployment rates, capital (K) stock, population, education levels.

Plotting the observations suggests a negative relationship between price and quantity (which makes sense for a demand curve). But where do we draw the demand curve?

OLS chooses the estimated equation that minimizes the differences between the actual values of the observations and the values the equation predicts. Positive and negative differences would cancel if simply added, so each difference is squared first. The SSE (sum of squared errors) is the quantity OLS minimizes.
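A minimal sketch of why the errors are squared, using a small made-up dataset (the four price/quantity pairs and both candidate lines are assumptions for illustration, not data from these slides). It compares a downward-sloping line with a flat line through the average quantity: both have raw errors that sum to roughly zero, but their SSE values differ sharply.

```python
def sum_of_squared_errors(prices, quantities, a, b):
    """Return the SSE of the candidate line Q = a + b*P against the data."""
    return sum((q - (a + b * p)) ** 2 for p, q in zip(prices, quantities))


def sum_of_raw_errors(prices, quantities, a, b):
    """Return the plain (unsquared) sum of errors, where signs can cancel."""
    return sum(q - (a + b * p) for p, q in zip(prices, quantities))


# Hypothetical price/quantity observations, invented for illustration.
prices = [1.0, 2.0, 3.0, 4.0]
quantities = [9.0, 8.0, 5.0, 4.0]

# Candidate 1: a downward-sloping line. Candidate 2: a flat line at the mean of Q.
sse_sloped = sum_of_squared_errors(prices, quantities, a=11.0, b=-1.8)
sse_flat = sum_of_squared_errors(prices, quantities, a=6.5, b=0.0)
raw_sloped = sum_of_raw_errors(prices, quantities, a=11.0, b=-1.8)
raw_flat = sum_of_raw_errors(prices, quantities, a=6.5, b=0.0)

print("SSE sloped:", sse_sloped, "SSE flat:", sse_flat)
print("Raw error sums:", raw_sloped, raw_flat)
```

The raw sums cannot tell the two lines apart, while the SSE clearly favors the sloped line.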

Step 2: Specify form of the equation relating the variables
• Translate the points in the scatter plot into the form of a linear demand equation: Q = a + bP
• We are looking for values of a (constant intercept) and b (slope, negative sign expected).
• Left-hand variable: dependent (the one being explained)
• Right-hand variables: independent, or explanatory (the ones doing the explaining)
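As a rough illustration of finding a and b for Q = a + bP, the sketch below uses the standard closed-form least-squares formulas for a single explanatory variable. The function name `fit_simple_ols` and the four data points are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
def fit_simple_ols(prices, quantities):
    """Estimate intercept a and slope b of Q = a + b*P by least squares."""
    n = len(prices)
    mean_p = sum(prices) / n
    mean_q = sum(quantities) / n
    # Slope: co-variation of P and Q divided by the variation of P.
    s_pq = sum((p - mean_p) * (q - mean_q) for p, q in zip(prices, quantities))
    s_pp = sum((p - mean_p) ** 2 for p in prices)
    b = s_pq / s_pp
    # Intercept: forces the fitted line through the point of means.
    a = mean_q - b * mean_p
    return a, b


# Hypothetical observations, invented for illustration.
prices = [1.0, 2.0, 3.0, 4.0]
quantities = [9.0, 8.0, 5.0, 4.0]

a, b = fit_simple_ols(prices, quantities)
print("intercept a =", a, "slope b =", b)
```

For this made-up data the slope comes out negative, as expected for a demand curve. With more than one explanatory variable, regression software would normally be used instead of hand formulas.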

Step 2: Specify form of the equation relating the variables (multiple regression)
• Q = a + bP + b1Y + b2Pop + b3P2 + b4Age (Y = income, P2 = competitor's price)
• Price is not the only factor that affects Q, so a more precise demand equation must include other explanatory variables.
• "Least squares" is achieved by minimizing the unexplained portion of the variation in Q around its average.

Step 3: Estimate equation coefficients (regression software)
• Example: the data on pg 151 yields this regression equation: Q = 28.84 - 2.12P + 3.09Y + 1.03P2
• Each $1 increase in price will decrease Q by 2.12 units, holding the other variables constant
• Each $1 increase in income will increase Q by 3.09 units, holding the other variables constant (normal or inferior good?)
• Each $1 increase in the competitor's price will increase Q by 1.03 units, holding the other variables constant (substitute or complement?)
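To make the coefficient interpretation concrete, here is a small sketch that plugs values into the estimated equation above. The input values (price 10, income 20, competitor's price 8) are assumptions picked for illustration and are not taken from the pg 151 data.

```python
def predicted_quantity(price, income, competitor_price):
    """Predicted Q from the estimated equation Q = 28.84 - 2.12P + 3.09Y + 1.03P2."""
    return 28.84 - 2.12 * price + 3.09 * income + 1.03 * competitor_price


# Hypothetical starting values, chosen only for illustration.
base = predicted_quantity(price=10, income=20, competitor_price=8)

# Raise one variable by 1 while holding the others constant.
higher_price = predicted_quantity(price=11, income=20, competitor_price=8)
higher_income = predicted_quantity(price=10, income=21, competitor_price=8)

print("change from +1 price: ", higher_price - base)
print("change from +1 income:", higher_income - base)
```

Each difference equals the corresponding coefficient (up to floating-point rounding), which is exactly what "holding the other variables constant" means.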

Step 4: Evaluate accuracy of the equation

N = number of observations (lines of data) in the dataset. Generally, the larger the dataset, the more reliable the results.

Degrees of freedom = N - K, where K = number of estimated coefficients (including the constant). This is the number of observations still free to vary once the coefficients have been estimated.

For example, imagine you have four numbers (a, b, c and d) that must add up to a total of m. You are free to choose the first three numbers at random, but the fourth must be chosen so that it makes the total equal to m. You therefore have three degrees of freedom.

Explanatory variable names

Coefficient estimates for equation.

R2. Also called "goodness of fit" or the "coefficient of determination."

Proportion of variation in Q explained by the regression. A perfect “fit” would yield R2 = 1. If the equation explains nothing, R2 = 0.
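One common way to compute this proportion is R2 = 1 - SSE/SST, where SST is the total squared variation of Q around its average. The sketch below assumes that formula. The helper name `r_squared`, the data, and the fitted values are hypothetical.

```python
def r_squared(actual, fitted):
    """Share of the variation in the actual values explained by the fitted values."""
    mean_actual = sum(actual) / len(actual)
    # Unexplained variation: squared gaps between actual and fitted values.
    sse = sum((y - f) ** 2 for y, f in zip(actual, fitted))
    # Total variation: squared gaps between actual values and their average.
    sst = sum((y - mean_actual) ** 2 for y in actual)
    return 1 - sse / sst


# Hypothetical quantities and fitted values, invented for illustration.
quantities = [9.0, 8.0, 5.0, 4.0]
fitted = [9.2, 7.4, 5.6, 3.8]

r2_fit = r_squared(quantities, fitted)
r2_perfect = r_squared(quantities, quantities)   # fitted values match exactly
r2_nothing = r_squared(quantities, [6.5] * 4)    # always predict the average

print(r2_fit, r2_perfect, r2_nothing)
```

The two boundary cases reproduce the statements above: a perfect fit gives 1, and an equation that does no better than the average gives 0.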

Value is sensitive to K, so adding more independent variables typically increases R2, even if they have no explanatory power.

Adjusted R2: regular R2 adjusted for degrees of freedom. It penalizes additional coefficients, which offsets the sensitivity to K. A more reliable measure of goodness of fit; never greater than R2.
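A minimal sketch of the adjustment, assuming the usual formula adjusted R2 = 1 - (1 - R2)(N - 1)/(N - K), with K counting every estimated coefficient including the constant. The input values (R2 = 0.90, N = 30) are hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjust R2 for degrees of freedom; k counts all coefficients, constant included."""
    return 1 - (1 - r2) * (n - 1) / (n - k)


# Hypothetical regression summary, chosen only for illustration.
r2 = 0.90
n = 30

adj_k4 = adjusted_r_squared(r2, n, k=4)
# Same R2 but with more coefficients: the penalty grows.
adj_k8 = adjusted_r_squared(r2, n, k=8)

print(adj_k4, adj_k8)
```

Holding R2 fixed, adding coefficients lowers the adjusted value, so an extra variable only helps if it raises R2 by enough to outweigh the lost degree of freedom.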

F-statistic: tests the overall statistical significance of the equation, that is, of the explanatory variables as a group rather than each variable individually. Must be compared to the critical value in a table (higher = better).

If the F-stat is greater than the critical value, we can reject the hypothesis that all the coefficients are zero at the specified confidence level (here 95%) and say the equation has explanatory power.
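The comparison can be sketched as follows, assuming the F-statistic is computed from R2 as F = [R2/(K - 1)] / [(1 - R2)/(N - K)]. The summary values are hypothetical, and the critical value of about 2.98 is a table look-up for 3 and 26 degrees of freedom at the 95% level rather than something computed here.

```python
def f_statistic(r2, n, k):
    """F-statistic from R2; k counts all estimated coefficients, constant included."""
    explained_per_df = r2 / (k - 1)
    unexplained_per_df = (1 - r2) / (n - k)
    return explained_per_df / unexplained_per_df


def has_explanatory_power(f_stat, critical_value):
    """True when the F-stat exceeds the critical value taken from an F table."""
    return f_stat > critical_value


# Hypothetical regression summary, chosen only for illustration.
f_stat = f_statistic(r2=0.90, n=30, k=4)

# Approximate table value for (3, 26) degrees of freedom at 95% (assumed look-up).
critical_value = 2.98

print(f_stat, has_explanatory_power(f_stat, critical_value))
```

In practice, regression software usually reports the F-statistic directly, so only the comparison step is needed.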

Standard Error (Standard Deviation) of the Coefficient: There is roughly a 95% chance that the true coefficient lies within two standard errors of the estimated coefficient. Example: Our price coefficient estimate is -2.12, with a standard error of 0.34. Two times the standard error is 0.68. There is roughly a 95% chance that the true coefficient lies in the range of -2.12 plus or minus 0.68, or between -2.80 and -1.44.
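The two-standard-error rule from the example can be written as a short helper. The function name `approx_95_interval` is made up for this sketch, and the multiplier of 2 is the rule of thumb used above rather than an exact table value.

```python
def approx_95_interval(estimate, std_error):
    """Rough 95% range: the estimate plus or minus two standard errors."""
    half_width = 2 * std_error
    return estimate - half_width, estimate + half_width


# Price coefficient and standard error from the example above.
low, high = approx_95_interval(estimate=-2.12, std_error=0.34)

print(round(low, 2), round(high, 2))
```

Because the whole range lies below zero, the example is consistent with a price coefficient that is negative.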

Standard deviation of the estimated coefficient. The lower the standard error, the more precise the estimate.

t-statistic: coefficient estimate divided by its standard error. Tells us how many standard errors the estimate is from zero. Compared to the critical value in a table.

The t-stat tells us whether the estimated coefficient is statistically significant, or statistically different from zero.
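A short sketch of the t-stat calculation, using the price coefficient (-2.12) and standard error (0.34) from the standard error example. The critical value of 2 is an assumed rule of thumb that matches the two-standard-error rule; the exact value depends on the degrees of freedom and should come from a t table.

```python
def t_statistic(estimate, std_error):
    """Number of standard errors separating the estimate from zero."""
    return estimate / std_error


def is_significant(t_stat, critical_value=2.0):
    """Compare the size of the t-stat (ignoring sign) with the critical value."""
    return abs(t_stat) > critical_value


# Price coefficient and standard error from the earlier example.
t_price = t_statistic(estimate=-2.12, std_error=0.34)

print(round(t_price, 2), is_significant(t_price))
```

The sign of the t-stat only reflects the sign of the coefficient, so the comparison uses its absolute value.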
