Understanding Statistical Inference in Management: Regression, Correlation, and Hypothesis Testing

MGMT 276: Statistical Inference in Management Spring, 2014 Welcome Green sheets

Please click in My last name starts with a letter somewhere between A. A – D B. E – L C. M – R D. S – Z

For our class Due Tuesday April 29th

Remember… In a negatively skewed distribution: mean < median < mode. 97 = mode = tallest point; 87 = median = middle score; 83 = mean = balance point. [Figure: frequency distribution of exam scores; x-axis: Score on Exam; y-axis: Frequency (note: always “frequency”); mean, median, and mode marked with labels and numbers]

Readings for next exam (Exam 4: May 1st) Lind: • Chapter 13: Linear Regression and Correlation • Chapter 14: Multiple Regression • Chapter 15: Chi-Square Plous: • Chapter 17: Social Influences • Chapter 18: Group Judgments and Decisions

Exam 4 – Optional Times for Final • Two options for completing Exam 4 • Thursday (5/1/14) – The regularly scheduled time • Tuesday (5/6/14) – The optional later time • Must sign up to take Exam 4 on Tuesday (4/29) • Only need to take one exam – these are two optional times

Homework due – Thursday (April 24th) • On class website: • Please print and complete homework worksheet #18 • Hypothesis Testing with Correlations

Use this as your study guide. Next couple of lectures (4/22/14): • Logic of hypothesis testing with correlations • Interpreting correlations and scatterplots • Simple and multiple regression • Using correlation for predictions • r versus r² • Regression uses the predictor variable (independent) to make predictions about the predicted variable (dependent) • Coefficient of correlation is the name for “r” • Coefficient of determination is the name for “r²” (remember it is always positive – no direction info) • Standard error of the estimate is our measure of the variability of the dots around the regression line (average deviation of each data point from the regression line – like standard deviation) • Coefficient of regression will be “b” for each variable (like slope)

Regression Example: Rory is the owner of a small software company and employs 10 sales staff. Rory sends his staff all over the world consulting, selling, and setting up his system. He wants to evaluate his staff in terms of who are the most (and least) productive salespeople, and also whether more sales calls actually result in more systems being sold. So, he simply measures the number of sales calls made by each salesperson and how many systems they successfully sold.

Regression Example: Do more sales calls result in more sales made? Step 1: Draw scatterplot Step 2: Estimate r [Scatterplot: independent variable (x-axis) = number of sales calls made, 0 to 4; dependent variable (y-axis) = number of systems sold, 0 to 70; one point per salesperson, with labels including Ava, Emily, Isabella, Emma, Ethan, Joshua, and Jacob]

Regression Example Do more sales calls result in more sales made? Step 3: Calculate r Step 4: Is it a significant correlation?

Do more sales calls result in more sales made? Step 3: Calculate r Step 4: Is it a significant correlation? • n = 10, df = 8 • alpha = .05 • Observed r is larger than critical r (0.71 > 0.632), therefore we reject the null hypothesis • Yes, it is a significant correlation • r(8) = 0.71; p < 0.05
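The slide reads the critical r (0.632) from a table. As a rough sketch of where that number comes from, the block below converts the standard two-tailed critical t for df = 8 at alpha = .05 (2.306, from a t table) into a critical r, and compares it with the observed r = 0.71. The function names are illustrative, and the two-tailed test is an assumption that matches the 0.632 cutoff.

```python
import math

def critical_r(t_crit: float, df: int) -> float:
    """Convert a critical t value into the equivalent critical r."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

def t_from_r(r: float, df: int) -> float:
    """Convert an observed correlation into its t statistic."""
    return r * math.sqrt(df / (1 - r ** 2))

n = 10                 # number of sales staff (from the slide)
df = n - 2             # degrees of freedom for a correlation
r_observed = 0.71      # observed correlation (from the slide)
T_CRIT_DF8 = 2.306     # two-tailed critical t, alpha = .05, df = 8 (t table)

r_crit = critical_r(T_CRIT_DF8, df)
t_obs = t_from_r(r_observed, df)
significant = abs(r_observed) > r_crit

print(f"critical r = {r_crit:.3f}, observed r = {r_observed}")
print(f"observed t = {t_obs:.3f}, critical t = {T_CRIT_DF8}")
print("Reject the null hypothesis" if significant else "Fail to reject the null hypothesis")
```

Comparing r with critical r and comparing t with critical t are the same decision, so either table can be used.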

Regression: Predicting sales Step 1: Draw prediction line (draw a regression line and state the regression equation) r = 0.71, b = 11.579 (slope), a = 20.526 (intercept) What are we predicting? The number of systems sold (Y) from the number of sales calls made (X). Interpret slope & intercept: the intercept (20.526) is the predicted number of systems sold with zero sales calls; the slope (11.579) is the predicted increase in systems sold for each additional sales call.

Regression: Predicting sales Step 1: Predict sales for a certain number of sales calls Step 2: State the regression equation: Y’ = a + bx, so Y’ = 20.526 + 11.579x Step 3: Solve for some value of Y’. If they make one sales call: Y’ = 20.526 + 11.579(1) = 32.105 What should you expect from a salesperson who makes 1 call? They should sell 32.105 systems. If they sell more → overperforming. If they sell fewer → underperforming. [Figure highlights Madison and Joshua]

Regression: Predicting sales Step 1: Predict sales for a certain number of sales calls Step 2: State the regression equation: Y’ = a + bx, so Y’ = 20.526 + 11.579x Step 3: Solve for some value of Y’. If they make two sales calls: Y’ = 20.526 + 11.579(2) = 43.684 What should you expect from a salesperson who makes 2 calls? They should sell 43.684 systems. If they sell more → overperforming. If they sell fewer → underperforming. [Figure highlights Isabella and Jacob]

Regression: Predicting sales Step 1: Predict sales for a certain number of sales calls Step 2: State the regression equation: Y’ = a + bx, so Y’ = 20.526 + 11.579x Step 3: Solve for some value of Y’. If they make three sales calls: Y’ = 20.526 + 11.579(3) = 55.263 What should you expect from a salesperson who makes 3 calls? They should sell 55.263 systems. If they sell more → overperforming. If they sell fewer → underperforming. [Figure highlights Ava and Emma]

Regression: Predicting sales Step 1: Predict sales for a certain number of sales calls Step 2: State the regression equation: Y’ = a + bx, so Y’ = 20.526 + 11.579x Step 3: Solve for some value of Y’. If they make four sales calls: Y’ = 20.526 + 11.579(4) = 66.842 What should you expect from a salesperson who makes 4 calls? They should sell 66.842 systems. If they sell more → overperforming. If they sell fewer → underperforming. [Figure highlights Emily]
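The four prediction slides above all apply the same equation, Y’ = 20.526 + 11.579x. A minimal sketch of that calculation follows; the function name `predict_sales` is illustrative, and the intercept and slope are the values given on the slides.

```python
A = 20.526   # intercept (a) from the slides
B = 11.579   # slope (b) from the slides

def predict_sales(calls: float) -> float:
    """Predicted number of systems sold (Y') for a given number of sales calls (x)."""
    return A + B * calls

# Reproduce the predictions for 1 through 4 sales calls.
for calls in range(1, 5):
    print(f"{calls} call(s): expect {predict_sales(calls):.3f} systems sold")
```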

Regression: Evaluating Staff Step 1: Compare expected sales levels to actual sales levels. What should you expect from each salesperson? They should sell x systems depending on sales calls. If they sell more → overperforming. If they sell fewer → underperforming. [Scatterplot with regression line; points labeled Ava, Emma, Isabella, Emily, Madison, Joshua, Jacob]

Residuals: Evaluating Staff Step 1: Compare expected sales levels to actual sales levels. The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score). How did Ava do? 70 – 55.3 = 14.7. Ava sold 14.7 more than expected, taking into account how many sales calls she made → overperforming.

Residuals: Evaluating Staff Step 1: Compare expected sales levels to actual sales levels. The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score). How did Jacob do? 20 – 43.7 = -23.7. Jacob sold 23.7 fewer than expected, taking into account how many sales calls he made → underperforming.

Residuals: Evaluating Staff Step 1: Compare expected sales levels to actual sales levels. The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score). [Scatterplot with residuals marked: Ava 14.7, Madison 7.9, Emily -6.8, Jacob -23.7; Emma, Isabella, and Joshua also shown]
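The residual calculation for Ava and Jacob can be sketched as follows. The actual sales (70 and 20) come from the slides; the number of calls for each (3 and 2) is read off the predicted values used on the slides (55.3 and 43.7). Function and variable names are illustrative.

```python
A = 20.526   # intercept (a) from the slides
B = 11.579   # slope (b) from the slides

def predict_sales(calls: float) -> float:
    """Predicted systems sold (Y') for a given number of sales calls."""
    return A + B * calls

def residual(actual_sold: float, calls: float) -> float:
    """Residual = actual Y minus predicted Y'."""
    return actual_sold - predict_sales(calls)

# name: (sales calls made, systems actually sold)
staff = {"Ava": (3, 70), "Jacob": (2, 20)}

for name, (calls, sold) in staff.items():
    res = residual(sold, calls)
    if res > 0:
        verdict = "overperforming"
    elif res < 0:
        verdict = "underperforming"
    else:
        verdict = "exactly as expected"
    print(f"{name}: residual = {res:+.1f} ({verdict})")
```

A positive residual means the point sits above the regression line, a negative one below it.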

Does the prediction line perfectly predict the predicted variable when using the predictor variable? No, we are wrong sometimes… How can we estimate how much “error” we have? The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score). The green lines show how much “error” there is in our prediction line, that is, how much we are wrong in our predictions (for example, 14.7 and -23.7). Any residuals? With a perfect correlation (+1.00 or -1.00): • Each variable perfectly predicts the other • No variability in the scatterplot • The dots fall exactly on a straight line, so there are no residuals

Residual scores: How do we find the average amount of error in our prediction? Ava is 14.7, Jacob is -23.7, Emily is -6.8, Madison is 7.9. We want the average amount by which actual scores deviate on either side of the predicted score. The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score). The green lines show how much “error” there is in our prediction line, that is, how much we are wrong in our predictions.
Step 1: Find the error for each value (just the residuals): Y – Y’
Step 2: Add up the residuals. Big problem: Σ(Y – Y’) = 0
Step 3: Square the deviations: Σ(Y – Y’)²
Step 4: Divide by df (n – 2)
Step 5: Take the square root
How would we find our “average residual”? Compare with the formula for the mean, Σx / N:
$$\sqrt{\frac{\sum (Y - Y')^2}{n - 2}}$$

How do we find the average amount of error in our prediction? Deviation scores: Diallo is 0”, Preston is 2”, Mike is -4”, Hunter is -2”. Sound familiar?? The difference between expected Y’ and actual Y is called the “residual” (it’s a deviation score). The green lines show how much “error” there is in our prediction line, that is, how much we are wrong in our predictions.
Step 1: Find the error for each value (just the residuals): Y – Y’
Step 2: Find the average. How would we find our “average residual”? Compare with the formula for the mean, Σx / N:
$$\sqrt{\frac{\sum (Y - Y')^2}{n - 2}}$$

Standard error of the estimate (line) = $\sqrt{\frac{\sum (Y - Y')^2}{n - 2}}$ These would be helpful to know by heart – please memorize these formulas
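A minimal sketch of the standard error of the estimate, following the steps on the slides: find each residual, square, sum, divide by n – 2, and take the square root. The full data for all 10 salespeople is not given on the slides, so the four data points below are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math

def standard_error_of_estimate(actual, predicted):
    """sqrt( sum((Y - Y')^2) / (n - 2) )"""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    n = len(actual)
    if n <= 2:
        raise ValueError("need more than 2 data points (df = n - 2)")
    sum_sq = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return math.sqrt(sum_sq / (n - 2))

# Hypothetical data (NOT the sales data from the slides).
actual = [2.0, 5.0, 4.0, 7.0]        # observed Y values
predicted = [2.4, 3.8, 5.2, 6.6]     # Y' values from the line Y' = 1.0 + 1.4x, x = 1..4

residuals = [y - y_hat for y, y_hat in zip(actual, predicted)]
see = standard_error_of_estimate(actual, predicted)

print("residuals:", [round(r, 1) for r in residuals])
print("sum of residuals:", round(sum(residuals), 10))  # the "big problem": they cancel out
print("standard error of the estimate:", round(see, 3))
```

The raw residuals cancel to zero, which is why they are squared before averaging.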

Regression Analysis – Least Squares Principle When we calculate the regression line we try to: • minimize the distance between predicted Ys and actual (data) Y points (length of green lines) • remember, because the negative and positive values cancel each other out, we have to square those distances (deviations) • so we are trying to minimize the “sum of squares of the vertical distances between the actual Y values and the predicted Y values”
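The least squares principle above can be sketched with the usual closed-form solution for a single predictor. The data here is hypothetical (the slides do not list all 10 data points), and the helper names are illustrative. The sketch also nudges the fitted line to show that any other line gives a larger sum of squared vertical distances.

```python
def fit_line(xs, ys):
    """Least-squares intercept (a) and slope (b) for Y' = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def sum_squared_error(xs, ys, a, b):
    """Sum of squared vertical distances between actual Y and predicted Y'."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data (NOT the sales data from the slides).
xs = [1, 2, 3, 4]
ys = [2, 5, 4, 7]

a, b = fit_line(xs, ys)
best = sum_squared_error(xs, ys, a, b)
shifted = sum_squared_error(xs, ys, a + 0.5, b)   # same slope, higher intercept
steeper = sum_squared_error(xs, ys, a, b + 0.5)   # same intercept, steeper slope

print(f"fitted line: Y' = {a:.3f} + {b:.3f}x")
print(f"sum of squares: fitted = {best:.2f}, shifted = {shifted:.2f}, steeper = {steeper:.2f}")
```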

How well does the prediction line predict the predicted variable when using the predictor variable? What if we want to know the “average deviation score”? Find the standard error of the estimate (line). Standard error of the estimate: • a measure of the average amount of predictive error • the average amount that Y’ scores differ from Y scores • a mean of the lengths of the green lines • Slope doesn’t give “variability” info • Intercept doesn’t give “variability” info • Correlation “r” does give “variability” info • Residuals do give “variability” info

A note about curvilinear relationships and patterns of the residuals How well does the prediction line predict the Ys from the Xs? Residuals • Shorter green lines suggest better prediction – smaller error • Longer green lines suggest worse prediction – larger error • Why are green lines vertical? • Remember, we are predicting the variable on the Y axis • So, error would be how we are wrong about Y (vertical)

Assumptions Underlying Linear Regression • For each value of X, there is a group of Y values • These Y values are normally distributed. • The means of these normal distributions of Y values all lie on the straight line of regression. • The standard deviations of these normal distributions are equal.

Which minimizes error better? Is the regression line better than just guessing the mean of the Y variable? How much does the information about the relationship actually help? How much better does the regression line predict the observed results? r² Wow!
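One way to see the comparison above is to compute the squared error twice, once when always guessing the mean of Y and once when using the regression line. For simple linear regression, the proportional reduction in squared error equals r². The data below is hypothetical and the helper name is illustrative.

```python
def fit_line(xs, ys):
    """Least-squares intercept (a) and slope (b) for Y' = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical data (NOT the sales data from the slides).
xs = [1, 2, 3, 4]
ys = [2, 5, 4, 7]

a, b = fit_line(xs, ys)
mean_y = sum(ys) / len(ys)

# Squared error when we ignore X and always guess the mean of Y.
error_using_mean = sum((y - mean_y) ** 2 for y in ys)
# Squared error when we use the regression line.
error_using_line = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - error_using_line / error_using_mean

print(f"squared error guessing the mean: {error_using_mean:.2f}")
print(f"squared error using the line:    {error_using_line:.2f}")
print(f"r squared = {r_squared:.3f}")
```

Here the line removes about three quarters of the squared error that remains when guessing the mean.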

What is r²? r² = The proportion of the total variance in one variable that is predictable from its relationship with the other variable. Example: If mother’s and daughter’s heights are correlated with an r = .8, then what amount (proportion or percentage) of variance of mother’s height is accounted for by daughter’s height? .64 because (.8)² = .64

What is r²? r² = The proportion of the total variance in one variable that is predictable from its relationship with the other variable. Example: If mother’s and daughter’s heights are correlated with an r = .8, then what proportion of variance of mother’s height is not accounted for by daughter’s height? .36 because (1.0 - .64) = .36, or 36% because 100% - 64% = 36%

What is r²? r² = The proportion of the total variance in one variable that is predictable from its relationship with the other variable. Example: If ice cream sales and temperature are correlated with an r = .5, then what amount (proportion or percentage) of variance of ice cream sales is accounted for by temperature? .25 because (.5)² = .25

What is r²? r² = The proportion of the total variance in one variable that is predictable from its relationship with the other variable. Example: If ice cream sales and temperature are correlated with an r = .5, then what amount (proportion or percentage) of variance of ice cream sales is not accounted for by temperature? .75 because (1.0 - .25) = .75, or 75% because 100% - 25% = 75%
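The four r² examples above are the same two-line calculation. A small sketch, with an illustrative function name; the extra r = -.8 case is added only to show that r² carries no direction information.

```python
def variance_split(r: float):
    """Return (proportion accounted for, proportion not accounted for)."""
    accounted = r ** 2
    return accounted, 1 - accounted

examples = [
    ("mother's and daughter's heights", 0.8),
    ("ice cream sales and temperature", 0.5),
    ("a negative correlation of the same strength", -0.8),
]

for label, r in examples:
    accounted, not_accounted = variance_split(r)
    print(f"{label}: r = {r}, accounted for = {accounted:.2f} ({accounted:.0%}), "
          f"not accounted for = {not_accounted:.2f} ({not_accounted:.0%})")
```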

Some useful terms • Regression uses the predictor variable (independent) to make predictions about the predicted variable (dependent) • Coefficient of correlation is the name for “r” • Coefficient of determination is the name for “r²” (remember it is always positive – no direction info) • Standard error of the estimate is our measure of the variability of the dots around the regression line (average deviation of each data point from the regression line – like standard deviation)

Pop Quiz - 5 Questions 1. What is regression used for? • Include an example 2. What is a residual? How would you find it? 3. What is Standard Error of the Estimate? (How is it related to residuals?) 4. Give one fact about r² 5. How is regression line like a mean?

Writing Assignment - 5 Questions 1. What is regression used for? • Include an example Regressions are used to take advantage of relationships between variables described in correlations. We choose a value on the independent variable (on x axis) to predict values for the dependent variable (on y axis).

Writing Assignment - 5 Questions 2. What is a residual? How would you find it? Residuals are the difference between our predicted y (y’) and the actual y data points. Once we choose a value on our independent variable and predict a value for our dependent variable, we look to see how close our prediction was. We are measuring how “wrong” we were, or the amount of “error” for that guess. Y – Y’

Writing Assignment - 5 Questions 3. What is Standard Error of the Estimate? (How is it related to residuals?) • The average length of the residuals • The average error of our guess • The average length of the green lines • The standard deviation of the data points around the regression line

Writing Assignment - 5 Questions 4. Give one fact about r² 5. How is regression line like a mean?

Thank you! See you next time!!
