Linear Regression William P. Wattles, Ph.D. Psychology 302
Correlation • Teen birth rate correlated with our composite religiosity variable with r = 0.73; 95% CI (0.56, 0.84); n = 49; p < 0.0005. Thus teen birth rate is very highly correlated with religiosity at the state level, with more religious states having a higher rate of teen birth. A scatter plot of teen birth rate as a function of religiosity is presented in Figure 1. http://www.reproductive-health-journal.com/content/6/1/14
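The quoted excerpt does not say how its confidence interval was computed. As a hedged sketch, the standard Fisher z-transformation (an assumption here, not the article's stated method) gives an interval close to the one reported for r = 0.73 and n = 49. The function name `fisher_ci` is hypothetical.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a correlation using the Fisher z-transformation."""
    z = math.atanh(r)               # move r onto the z scale
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    lo_z = z - z_crit * se
    hi_z = z + z_crit * se
    # Back-transform the limits to the correlation scale
    return math.tanh(lo_z), math.tanh(hi_z)

# Values quoted on the slide: r = 0.73, n = 49
lo, hi = fisher_ci(0.73, 49)
print(round(lo, 2), round(hi, 2))
```

Rounded to two decimals, the limits agree with the reported (0.56, 0.84).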
“Victor, when will you stop trying to remember and start trying to think?” --Helen Boyden
You can use linear regression to answer the following questions about the pattern of data points and the significance of a linear equation: • 1. Is a pattern evident in a set of data points? • 2. Does the equation of a straight line describe this pattern? • 3. Are the predictions made from this equation significant?
Dependent and Independent Variables • Dependent variable (or criterion variable): the variable whose variation we want to explain. • Independent variable (or predictor variable): a variable that is related to or predicts variation in the dependent variable.
Examples • SAT score, college GPA • Alcohol consumed, score on a driving test • Type of car, qualifying speed • Level of education, income • Number of boats registered, deaths of manatees
Correlation • The relationship between two variables X and Y. • In general, are changes in X associated with changes in Y? • If so, we say that X and Y covary. • We can observe correlation by looking at a scatter plot.
Correlation example • Is number of beers consumed associated with blood alcohol level?
Correlation • Correlation coefficient tells us the strength and direction of the relationship between two variables.
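A minimal sketch of how the correlation coefficient can be computed from deviations around the two means. The beer and blood-alcohol numbers are made up for illustration only, and the name `pearson_r` is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of paired deviations from the means
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical data, not taken from any study
beers = [1, 2, 3, 4, 5]
bac = [0.02, 0.03, 0.05, 0.06, 0.09]

r = pearson_r(beers, bac)
print(f"r = {r:.3f}")
```

The sign of r gives the direction of the relationship, and its distance from 0 gives the strength.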
Prediction • If two variables are related then knowing a value for one should allow us to predict the value of the other.
Regression • Allows us to predict one variable based on the value of another.
Regression • Using knowledge of the relationship between X and Y to predict Y given X. • X: the independent variable (predictor), used to explain changes in Y • Y: the dependent variable (criterion)
Linear regression • Regression line: a straight line through the scatter plot that best describes the relationship. • The regression line predicts the value of Y for a given value of X.
Regression Line • A straight line that describes how a dependent variable changes as the independent variable changes.
Least squares regression. • A method of determining the regression line that minimizes the errors (residuals)
Least squares regression • A residual is the error, or the amount that an observed value deviates from the regression line. • The goal is to find the line that minimizes the squared residuals. • Least squares: the smallest possible sum of the squared residuals.
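To make "smallest possible sum of the squared residuals" concrete, the sketch below compares the sum of squared residuals for three candidate lines on a small made-up data set. The data, the candidate lines, and the helper name `sum_squared_residuals` are all assumptions for illustration. The first candidate is the least squares line for these data.

```python
def sum_squared_residuals(a, b, xs, ys):
    """Sum of squared residuals for the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

sse_best = sum_squared_residuals(2.2, 0.6, xs, ys)            # least squares line
sse_shifted = sum_squared_residuals(2.0, 0.6, xs, ys)         # intercept changed
sse_steeper = sum_squared_residuals(2.2, 0.7, xs, ys)         # slope changed

print(sse_best, sse_shifted, sse_steeper)
```

Changing either the intercept or the slope away from the least squares values makes the sum of squared residuals larger.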
Least squares regression. • a is the intercept: the value of Y when X = 0 • b is the slope: the rate of change in Y when X increases by 1
Regression formula • a = Ȳ − bX̄ • b = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)² (the sum of the deviation products divided by the sum of the squared X deviations)
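The two formulas can be sketched directly in code: the slope is the sum of the deviation products divided by the sum of the squared X deviations, and the intercept follows from the two means. The data are made up so that the points lie exactly on y = 1 + 2x, and `least_squares` is a hypothetical helper name.

```python
def least_squares(xs, ys):
    """Return (a, b), the intercept and slope of the least squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of deviation products
    sp = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sum of squared X deviations
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    b = sp / ss_x            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

a, b = least_squares(xs, ys)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}")
```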
The Regression Equation • Y predict = a + bx • x: the independent variable, the predictor • y: the dependent variable, what we want to predict • a: the intercept • b: the slope
Population and Sample • Population: β (beta) is the slope, α (alpha) is the intercept • Sample: b is the slope, a is the intercept
Relationship • The scatterplot suggests a relationship between crying and IQ. • We can use knowledge of crying to predict IQ.
Steps to Analyze Regression Data • Plot and interpret • Numerical summary • Mathematical model
Plot and Interpret • Plot independent variable on the X axis • Plot dependent variable on the Y axis. • Examine form, direction and strength of relationship
Numerical Summary • The correlation coefficient tells the direction and strength of the relationship. • r = +.455
r squared • r² is the percent of variance in Y explained by X. • r² = (.455)² ≈ .21, or 21%
Mathematical Model • Use the model to predict IQ based on knowledge of crying. • Least squares regression line: Y predict = a + bx • a (the intercept) = 91.27 • b (the slope) = 1.493
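As a small worked sketch, the slide's intercept (91.27) and slope (1.493) can be plugged into the regression line to get a predicted IQ. The crying value of 10 is a hypothetical input chosen for illustration, and `predict_iq` is a hypothetical helper name.

```python
INTERCEPT = 91.27   # a, from the slide
SLOPE = 1.493       # b, from the slide

def predict_iq(crying):
    """Predicted IQ from the regression line: Y predict = a + b*x."""
    return INTERCEPT + SLOPE * crying

# Hypothetical crying score of 10
predicted = predict_iq(10)
print(f"Predicted IQ: {predicted:.1f}")
```

With a crying score of 10 the prediction is 91.27 + 1.493 × 10 = 106.2. Each additional unit of crying raises the predicted IQ by the slope, 1.493 points.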
Sample Statistics • The slope and intercept are statistics because they are calculated on the sample. • We are really interested in estimating the population parameters.
Residuals • Residual: the difference between the observed value of the dependent variable and the value predicted by the regression line.
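A short sketch of the residual calculation, observed minus predicted, using the crying and IQ regression line from these slides (a = 91.27, b = 1.493). The two observations are invented for illustration and are not from the original data set.

```python
INTERCEPT = 91.27   # a, from the slides
SLOPE = 1.493       # b, from the slides

def residual(x, y_observed):
    """Residual = observed Y minus the Y predicted by the regression line."""
    y_predicted = INTERCEPT + SLOPE * x
    return y_observed - y_predicted

# Hypothetical (crying, IQ) observations
observations = [(10, 110), (20, 118)]
residuals = [residual(x, y) for x, y in observations]

for (x, y), e in zip(observations, residuals):
    print(f"crying={x}, observed IQ={y}, residual={e:+.2f}")
```

A positive residual means the point lies above the line (the model under-predicts); a negative residual means it lies below.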
Coefficient of determination • r²: the square of the correlation coefficient. • The proportion of the variation in Y that can be explained by changes in X
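The two descriptions of r² can be checked against each other. The sketch below, on made-up data, computes r² once by squaring the correlation coefficient and once as 1 minus the ratio of residual variation to total variation in Y. All variable names are illustrative.

```python
# Hypothetical data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sp = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # deviation products
ss_x = sum((x - x_bar) ** 2 for x in xs)                     # variation in X
ss_y = sum((y - y_bar) ** 2 for y in ys)                     # total variation in Y

# Route 1: square the correlation coefficient
r = sp / (ss_x * ss_y) ** 0.5
r_squared = r ** 2

# Route 2: share of Y's variation not left over in the residuals
b = sp / ss_x
a = y_bar - b * x_bar
ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
explained = 1 - ss_residual / ss_y

print(f"r squared = {r_squared:.3f}, explained share = {explained:.3f}")
```

Both routes give the same value for a simple linear regression. For the crying and IQ example, r = +.455 gives r² ≈ .207, about 21%.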
Regression and correlation • correlation tells us about the relationship • regression allows us to predict Y if we know X
Serotonin • 5-HT levels predict mood in healthy males. • SSRI, Zoloft, Prozac
Privitera page 531 • Do levels of serotonin predict positive mood in subjects?