Analysis Tools (TPTM6496) Matthew Beck matthewb@itls.usyd.edu.au Consultation by appointment
Introduction to Regression • Tutorial Outline: • What is Regression? • Regression in SPSS • Assumptions of Regression • Testing Assumptions
Introduction to Regression • What is regression? • It lets us explain the variation in one dependent variable using the variation in one or more known independent variables. • It fits the straight line that best describes the data. • It is a predictive tool.
Introduction to Regression • An example: • Can we explain variations in student grades (the dependent variable) by examining how many hours they study (the independent variable)?
Introduction to Regression • [Figure: scatterplot of Grade (y) against Hours of Study (x), with a fitted straight line] • What does this line remind you of? • Gradient formula? y = mx + b • “m” measures the gradient, i.e., how does y change when x changes?
Introduction to Regression • Looking at the math: • We want to solve: grade = b + m(study) • Or in other words: $Y = \beta_0 + \beta_1 X + e$
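Introduction to Regression • The formulas: • $\beta_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = 10.06$ • $\beta_0 = \bar{Y} - \beta_1 \bar{X} = 20.65$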
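The least-squares slope and intercept can be computed by hand in a few lines of Python. The sketch below is illustrative only: `fit_line` is a hypothetical helper, and the `hours` and `grades` values are made-up numbers rather than the tutorial's "grades & study" data, so the output will not reproduce 10.06 and 20.65.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data, chosen to lie exactly on grade = 20 + 10 * hours.
hours = [1, 2, 3, 4, 5]
grades = [30, 40, 50, 60, 70]

b0, b1 = fit_line(hours, grades)
print(b0, b1)
```

Because the made-up points sit exactly on a line, the fit returns that line's intercept and slope; real data would scatter around the line and leave a non-zero error term.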
Introduction to Regression • Our regression model becomes: Grade = 20.65 + 10.06(hours of study) • We can use this model to make predictions about people's grades!
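Making a prediction is just a matter of plugging a value for hours of study into the fitted model. A minimal sketch, using the two coefficients from the slide; the function name `predict_grade` and the example hour values are assumptions for illustration.

```python
INTERCEPT = 20.65  # predicted grade at zero hours of study
SLOPE = 10.06      # additional grade points per extra hour of study


def predict_grade(hours):
    """Predicted grade from the fitted model Grade = 20.65 + 10.06 * hours."""
    return INTERCEPT + SLOPE * hours


# Example inputs (hypothetical hours of study).
for h in (1, 3, 5):
    print(h, round(predict_grade(h), 2))
```

Predictions are most trustworthy within the range of hours actually observed in the data; extending the line far beyond that range is extrapolation.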
Introduction to Regression • We can see how regression works. • We know what it does. • What if the model was more complicated? $\text{Sales} = \beta_0 + \beta_1(\text{\$ ad}) + \beta_2(\text{\# staff}) + \beta_3(\text{colour})$ • This is why we use SPSS!
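To get a feel for the work being handed to the software, here is a rough sketch of a multiple-regression fit done by hand. It solves the least-squares normal equations with plain Gaussian elimination; this is one textbook approach and is not claimed to be how SPSS computes its estimates. The helper names, the data, and the coefficients 5, 2, 3 and 4 are all made up, and colour is coded as a 0/1 dummy as an assumption.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple(rows, y):
    """Least-squares coefficients [b0, b1, ...] from the normal equations (X'X) b = X'y."""
    X = [[1.0] + list(r) for r in rows]  # leading column of ones for the intercept
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical rows: (ad spend, number of staff, colour dummy 0/1).
rows = [(1, 2, 0), (2, 1, 1), (3, 4, 0), (4, 3, 1), (5, 5, 0), (6, 2, 1)]
# Sales generated exactly from 5 + 2*ad + 3*staff + 4*colour, with no noise.
sales = [5 + 2 * ad + 3 * staff + 4 * colour for ad, staff, colour in rows]

coefs = fit_multiple(rows, sales)
print([round(b, 6) for b in coefs])
```

Since the made-up sales contain no noise, the fit recovers the generating coefficients up to rounding. With three predictors the arithmetic is already tedious by hand, which is the point the slide is making.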
Introduction to Regression • Open the Program. • Enter the “grades & study” data. • Analyze – Regression – Linear • See how much easier that is? • Open up the Sofa-World dataset.
Introduction to Regression • Assumption 1 - Linear Relationship: • Regression can only model a straight line relationship between the dependent and independent variable(s). • Scatterplot: • Sales vs. Price • Sales vs. Parts • What about Sales vs. Colour – why is colour different?
Introduction to Regression • Examples of nonlinear relationships:
Introduction to Regression • What about this?
Introduction to Regression • Assumption 2: No Multicollinearity: • Independent variables should not be highly correlated with each other. • Scatterplot: • Price vs. Parts • What do you see? • What should we see?
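Alongside the scatterplot, a correlation coefficient gives a single number for how strongly two independent variables move together. The sketch below computes Pearson's r by hand; `pearson_r` is a hypothetical helper, and the `price` and `parts` values are invented rather than taken from the Sofa-World dataset.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical values for two independent variables.
price = [10, 20, 30, 40, 50]
parts = [3, 1, 4, 1, 5]

r = pearson_r(price, parts)
print(round(r, 3))
```

Values of r near +1 or -1 indicate a strong linear association between the two predictors, which is the pattern to watch for; values near 0 indicate little linear association.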
Introduction to Regression • Assumption 3: Homoscedasticity: • The variance of the error term must be constant.
Introduction to Regression • Assumption 4: Normal error term: • The error term must follow a normal distribution (i.e., follow a bell curve):
Introduction to Regression • Check for linear relationship and multicollinearity before analysis. • Check for homoscedasticity and normality using the plot options in the regression function: Analyze – Regression – Linear – Plots • Graph ZRESID (standardised residuals) vs ZPRED (standardised predicted values). • Select P-P plot and Histogram
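The quantities behind a residuals-versus-predicted plot can be sketched as follows. This uses a simple z-score standardisation (subtract the mean, divide by the population standard deviation), which is an assumption and may differ in detail from how SPSS scales ZRESID and ZPRED. The data and helper names are made up for illustration.

```python
from statistics import mean, pstdev


def fit_line(x, y):
    """Least-squares (intercept, slope) for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def zscores(values):
    """Standardise values to mean 0 and population standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Hypothetical data, not the tutorial's dataset.
hours = [1, 2, 3, 4, 5]
grades = [32, 38, 52, 58, 70]

intercept, slope = fit_line(hours, grades)
predicted = [intercept + slope * h for h in hours]
residuals = [g - p for g, p in zip(grades, predicted)]

z_pred = zscores(predicted)
z_resid = zscores(residuals)

# Each pair is one point on the residuals-versus-predicted plot.
for zp, zr in zip(z_pred, z_resid):
    print(f"{zp:6.2f} {zr:6.2f}")
```

If the spread of the standardised residuals looks roughly the same across the range of predicted values, that is consistent with constant error variance; a funnel shape suggests it is not constant.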
Introduction to Regression • Linearity of Relationship • No Multicollinearity • Homoscedasticity • Normal Distribution • Independence of the error term
Introduction to Regression • Run the regression using price and colour as the independent variables. • Let's interpret some output. • Write out the regression model. • How can we use this model? • Remember, the important component of data analysis is providing relevant information.
Introduction to Regression • Significance testing: • p-value/sig-value = probability of seeing a result at least this extreme, given the null is true. • Basically, the smaller the value, the more unlikely it is that the result occurred by chance alone. • The more unlikely the result is under the null, the more we pay attention to it…it is a special result…it is a significant result. • Normally we set α, the level of significance (LOS), to 0.05. • If sig < 0.05 we have a significant result (we reject the null).
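The decision rule at the end of the slide can be written as a one-line comparison. A minimal sketch: the function name `is_significant` is an assumption, and the sig values in the loop are made-up examples, not output from any model in this tutorial.

```python
ALPHA = 0.05  # level of significance (LOS) used in the tutorial


def is_significant(p_value, alpha=ALPHA):
    """Return True when the sig value is below the level of significance."""
    return p_value < alpha


# Hypothetical sig values as they might appear in a coefficients table.
for sig in (0.001, 0.03, 0.05, 0.2):
    verdict = "reject the null" if is_significant(sig) else "do not reject the null"
    print(sig, verdict)
```

Note the strict inequality: a sig value of exactly 0.05 does not count as significant under the rule as stated on the slide.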