Regression and Correlation BUSA 2100, Sect. 12.0 - 12.2, 3.5
Introduction to Regression • Forecasts (predictions) are often based on the relationship between 2 or more variables. • Example 1: Advertising expenditures and sales. • Example 2: Daily high temperature and demand for electricity. • X = independent variable, the variable being used to make a forecast; Y = dependent variable, the variable being forecasted. • Identify X and Y in Examples 1 and 2. • Y depends on X.
Straight Lines • A regression line can be used to show mathematically how variables are related.
Regression Example • To determine the equation of a line, all we need are the slope and Y-intercept. • Example: Pizza House builds restaurants near college campuses. • Before building another one, it plans to use X = student enrollment (1000s) to estimate Y = quarterly sales ($1000s). • A sample of 6 existing restaurants is chosen.
Pizza Restaurant Problem • Resulting data pairs are shown below (X = student enrollment in 1000s, Y = quarterly sales in $1000s).
   X     Y
   4    95
   6   155
   9   140
  11   210
  12   250
  15   260
Scatter Diagram & Line of Best Fit • Draw a scatter diagram on the board. • Use a hiatus (a break in the axis) so that the X, Y axes don’t have to begin at zero. Within each axis, all units must be the same size. • By trial and error, draw some lines through the data. The regression line is the one line that fits the data best. (Also called the line of best fit.)
Line of Best Fit (Continued) • As indicated earlier, YF is a forecasted value (on the regression line). Y is an actual value (one of the dots).
Regression Formulas • Based on calculus, the equation of a regression line (line of best fit) can be found using these formulas.
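The formulas themselves did not survive on this slide. As a hedged reconstruction, the standard least-squares computational formulas are given below; they use exactly the column totals (sums of X, Y, XY, and X squared) that the restaurant table later in the deck computes, so they are assumed to be the ones intended. Here n is the number of data pairs, b1 is the slope, and b0 is the Y-intercept.

```latex
b_1 = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2},
\qquad
b_0 = \bar{Y} - b_1\bar{X} = \frac{\sum Y}{n} - b_1\,\frac{\sum X}{n}
```

The regression equation is then YF = b0 + b1X.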
Regression Formulas, Page 2 • Carry the numerical coefficients (b1 and b0) out to 3 or 4 decimal places; then round to 2 or fewer places at the end. • Substitute the numbers into the regression equation: YF = b0 + b1X. • We will complete the restaurant problem, using a table to organize the data.
Restaurant Problem, Page 2 •
        X      Y      XY     X²       Y²
        4     95     380     16     9025
        6    155     930     36    24025
        9    140    1260     81    19600
       11    210    2310    121    44100
       12    250    3000    144    62500
       15    260    3900    225    67600
  SUM  57   1110   11780    623   226850
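As a check on the hand calculation, the sketch below plugs the column totals from the table into the standard least-squares formulas for the slope and intercept. The variable names are illustrative, and the formulas are assumed to be the ones the course uses.

```python
# Column totals copied from the restaurant table (n = 6 restaurants).
n = 6
sum_x, sum_y = 57, 1110
sum_xy, sum_x2 = 11780, 623

# Slope: b1 = (n*sum(XY) - sum(X)*sum(Y)) / (n*sum(X^2) - (sum(X))^2)
b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

# Intercept: b0 = mean(Y) - b1 * mean(X)
b0 = sum_y / n - b1 * sum_x / n

# Carry 4 decimal places while working, then round to 2 at the end.
print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")
print(f"YF = {b0:.2f} + {b1:.2f}X")
```

Under these formulas the slope works out to 7410/489, so the rounded regression equation is YF = 41.04 + 15.15X.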
Meaning and Uses of the Regression Equation • Example: Vidalia State University has an enrollment of 9,800. Forecast pizza sales for a restaurant near the campus.
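The Vidalia State forecast can be sketched as follows. The helper names `fit_line` and `forecast` are hypothetical, and the fit uses the standard least-squares formulas on the six sample restaurants. Because X is measured in 1000s of students, an enrollment of 9,800 is entered as X = 9.8.

```python
# Data pairs from the restaurant sample:
# X = student enrollment (1000s), Y = quarterly sales ($1000s).
xs = [4, 6, 9, 11, 12, 15]
ys = [95, 155, 140, 210, 250, 260]

def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line YF = b0 + b1*X."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

def forecast(x, b0, b1):
    """Forecasted Y for a given X on the regression line."""
    return b0 + b1 * x

b0, b1 = fit_line(xs, ys)

# Enrollment of 9,800 students is X = 9.8 because X is in 1000s.
vidalia_sales = forecast(9.8, b0, b1)
print(f"Forecast: about ${vidalia_sales:.1f} thousand per quarter")
```

The forecast is roughly $189,500 in quarterly sales. Read this way, the slope says that each additional 1,000 students is associated with about $15,150 more in quarterly sales on average. The intercept is simply where the line crosses the Y axis, and X = 0 lies outside the range of the sample.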
Accuracy of Forecasts Using Regression • The accuracy of forecasts depends on how closely the points in a scatter diagram fit the regression line. • If the linear relationship is too weak (the deviations are too large), there are large forecast errors and there may be no need to pursue use of a regression line.
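To see how closely the points fit the line in the restaurant problem, one option is to list each deviation (actual Y minus forecasted YF). This is a minimal sketch assuming the standard least-squares fit; the variable names are illustrative.

```python
# Restaurant sample: X = enrollment (1000s), Y = quarterly sales ($1000s).
xs = [4, 6, 9, 11, 12, 15]
ys = [95, 155, 140, 210, 250, 260]

# Least-squares slope and intercept for YF = b0 + b1*X.
n = len(xs)
b1 = (n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)) / (
    n * sum(x * x for x in xs) - sum(xs) ** 2
)
b0 = sum(ys) / n - b1 * sum(xs) / n

# Deviation (forecast error) = actual Y minus forecasted YF.
deviations = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
for x, y, d in zip(xs, ys, deviations):
    print(f"X = {x:2d}  Y = {y:3d}  YF = {y - d:6.1f}  deviation = {d:+6.1f}")

# The deviation that is largest in absolute value marks the worst-fitting point.
largest = max(deviations, key=abs)
print(f"Largest deviation: {largest:+.1f}")
```

For a least-squares line the deviations always sum to zero, so their sizes, not their total, indicate how good the fit is. Here the largest miss is at X = 9, where actual sales fall about 37 ($1000s) below the line.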
Evaluating Accuracy of Regression Forecasts • It is best to have an estimate of forecast accuracy before using a regression line. • 3 ways to estimate forecast accuracy:
Introduction to Correlation • Def.: The coefficient of correlation (r) is a numerical measure of the strength of the linear relationship between 2 variables. • Values of r are always between -1 and 1; i.e., between 0 and 1 in absolute value. • r = 0 means no linear correlation; r = ±1 means perfect correlation; both are rare in practice.
Positive and Negative Correlation • Definition: Two variables X, Y have a positive correlation if large values of X tend to be associated with large values of Y, and small values of X with small values of Y. • X, Y must be measurable quantitatively. • Example of positive correlation:
Positive and Negative Correlation, Page 2 • Definition: Two variables X, Y have a negative correlation if large values of X tend to be associated with small values of Y, and vice versa. • Example of negative correlation: • Graph positive and negative correlation.
High and Low Correlation • General guidelines:
  Degree of Correlation    Forecast Accuracy
  very high                very good
  high                     good
  moderate                 medium
  low                      fair
  very low                 poor
Formula for Correlation • Use regression for forecasts only if r is .70 or larger, in absolute value.
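The formula itself is missing from this slide. A hedged reconstruction is the standard computational formula for r, which uses the same column totals as the restaurant table (including the Y squared column, which the regression coefficients alone do not need):

```latex
r = \frac{n\sum XY - \sum X \sum Y}
         {\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]
                \left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}
```

Applied to the restaurant data, assuming this is the intended formula:

```latex
r = \frac{6(11780) - (57)(1110)}
         {\sqrt{\left[6(623) - 57^2\right]\left[6(226850) - 1110^2\right]}}
  = \frac{7410}{\sqrt{(489)(129000)}}
  \approx 0.93
```

Since 0.93 is above .70, regression would be appropriate for forecasting in the restaurant problem.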
Regression Analysis Summary • Steps in regression analysis: • (1) Collect data pairs, using 2 related variables. • (2) Calculate the correlation, r. • (3) (a) If |r| >= .70, find the regression equation and use it for forecasting. • (b) If |r| < .70, don’t use regression.
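The steps above can be sketched in code. The function names (`correlation`, `regression_line`, `analyze`) are hypothetical, and both calculations assume the standard computational formulas. The .70 cutoff is applied to the absolute value of r, so strong negative correlations also qualify.

```python
import math

def correlation(xs, ys):
    """Coefficient of correlation r for paired data."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sx2 - sx ** 2) * (n * sy2 - sy ** 2)
    )

def regression_line(xs, ys):
    """Return (b0, b1) for the least-squares line YF = b0 + b1*X."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    b1 = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    return sy / n - b1 * sx / n, b1

def analyze(xs, ys, threshold=0.70):
    """Step 2: compute r. Step 3: fit a line only if |r| >= threshold."""
    r = correlation(xs, ys)
    if abs(r) >= threshold:
        return r, regression_line(xs, ys)
    return r, None  # correlation too weak; don't use regression

# Step 1: data pairs from the restaurant sample.
xs = [4, 6, 9, 11, 12, 15]
ys = [95, 155, 140, 210, 250, 260]
r, line = analyze(xs, ys)
print(f"r = {r:.2f}")
if line is not None:
    print(f"YF = {line[0]:.2f} + {line[1]:.2f}X")
```

For the restaurant data r is about 0.93, so the line is fitted. With weakly related data, `analyze` returns `None` in place of the coefficients.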
Multiple Regression • Regression analysis with one independent variable (X) is called simple regression. • Regression analysis with 2 or more independent variables (X1, X2, etc.) is called multiple regression.
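Extending the simple regression equation YF = b0 + b1X in the same notation, the multiple regression equation with k independent variables is usually written as shown below. The symbol k and the coefficients b2 through bk are introduced here for illustration.

```latex
Y_F = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_k X_k
```

With k = 1 this reduces to the simple regression equation.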
Multiple Regression and Line of Average Relationship • State the multiple regression equation. • A regression equation is also called the line of average relationship. Explain in terms of GPA example. • Correlation does not necessarily imply cause and effect. Illustrate with example.