
Improve Phase Process Modeling Regression

Improve Phase Process Modeling Regression. Process Modeling. Welcome to Improve. Correlation. Process Modeling: Regression. Introduction to Regression. Simple Linear Regression. Advanced Process Modeling: MLR. Designing Experiments. Wrap Up & Action Items.

grube


Presentation Transcript


  1. Improve Phase Process Modeling Regression

  2. Process Modeling • Welcome to Improve • Correlation • Process Modeling: Regression • Introduction to Regression • Simple Linear Regression • Advanced Process Modeling: MLR • Designing Experiments • Wrap Up & Action Items

  3. Correlation The primary purpose of linear correlation analysis is to measure the strength of the linear association between two variables (X and Y). • If X increases with no definite change in the value of Y, there is no correlation, or no association, between X and Y. • If X increases and there is a shift in the value of Y, there is a correlation. • The correlation is positive when Y tends to increase with an increase in X and negative when Y tends to decrease with an increase in X. • If the ordered pairs (X, Y) tend to follow a straight-line path, there is a linear correlation. • The preciseness of the shift in Y as X increases determines the strength of the linear correlation. To conduct a linear correlation analysis we need Bivariate Data, that is, two pieces of data that are variable. • Bivariate data is comprised of ordered pairs (X, Y). • X is the independent variable. • Y is the dependent variable.

  4. Correlation Coefficient • Ho: No Correlation • Ha: There is Correlation • The Correlation Coefficient always assumes a value between –1 and +1. • The Correlation Coefficient of the population, R, is estimated by the sample Correlation Coefficient, r: Ho ho ho…. Ha ha ha….
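The formula that followed "r:" on this slide did not survive in the transcript. As a reference, and assuming the slide showed the usual Pearson definition, the sample Correlation Coefficient for n ordered pairs (x_i, y_i) with means x-bar and y-bar can be written as:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Because the numerator can never exceed the denominator in magnitude, r always falls between -1 and +1, as the slide states.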

  5. Types and Magnitude of Correlation [Figure: six scatter plots of Output versus Input, titled Strong, Moderate, and Weak Positive Correlation and Strong, Moderate, and Weak Negative Correlation.]

  6. Limitations of Correlation • The magnitude of the Correlation Coefficient is somewhat relative and should be used with caution. • As usual, statistical significance is judged by comparing a P-value with the chosen degree of alpha risk. • Guidelines for practical significance are as follows: • If | r | > 0.80, the relationship is practically significant. • If | r | < 0.20, the relationship is not practically significant. [Figure: number line from -1.0 to +1.0 with ticks at -0.8, -0.2, 0, 0.2, and 0.8, marking the area of negative linear correlation, no linear correlation, and the area of positive linear correlation.]
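The two guidelines above can be sketched as a small helper. The function name and the "inconclusive" label for values between 0.20 and 0.80 are assumptions made for illustration; the slide gives no rule for that middle band.

```python
def practical_significance(r: float) -> str:
    """Classify a correlation coefficient using the slide's |r| guidelines."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and +1")
    magnitude = abs(r)
    if magnitude > 0.80:
        return "practically significant"
    if magnitude < 0.20:
        return "not practically significant"
    # Assumption: the slide is silent on 0.20 <= |r| <= 0.80.
    return "inconclusive"


print(practical_significance(0.935))
```

Statistical significance is a separate question and still depends on the P-value and the chosen alpha risk.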

  7. Correlation Example (RB Stats Correlation.mtw)
     X values: Payton carries; Y values: Payton yards
     Payton carries   Payton yards
     196               679
     311              1390
     339              1852
     333              1359
     369              1610
     317              1460
     339              1222
     148               596
     314              1421
     381              1684
     324              1551
     321              1333
     146               586
     • The Correlation Coefficient [r]: • Is a positive value if one variable increases as the other variable increases. • Is a negative value if one variable decreases as the other increases. [Figure: Correlation Formula]

  8. Correlation Analysis Graph>Scatter Plot>Simple… Get outta my way!

  9. Correlation Example • Look at the graph. Do you observe any correlation in this graph? Lowess stands for LOcally-WEighted Scatterplot Smoother.

  10. Correlation Example The Correlation Coefficient is high and the P-value is low. Reject the null hypothesis; there is a correlation. Results for: RB STATS CORRELATION.MTW [Figure: Scatterplot of Payton yards vs Payton carries] Correlations: Payton carries, Payton yards. Pearson correlation of Payton carries and Payton yards = 0.935; P-Value = 0.000
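As a cross-check outside MINITAB, the correlation can be recomputed in a few lines of plain Python. This is a sketch: the 13 carries/yards pairs are read off the Correlation Example slide, and the variable and function names are illustrative. The t statistic shown is the usual test statistic for Ho: No Correlation, with n - 2 degrees of freedom.

```python
import math

# Payton data as read from the Correlation Example slide (13 ordered pairs).
carries = [196, 311, 339, 333, 369, 317, 339, 148, 314, 381, 324, 321, 146]
yards = [679, 1390, 1852, 1359, 1610, 1460, 1222, 596, 1421, 1684, 1551, 1333, 586]


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(carries, yards)
n = len(carries)
# Test statistic for Ho: No Correlation, on n - 2 degrees of freedom.
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(f"r = {r:.3f}, t = {t_stat:.2f} on {n - 2} degrees of freedom")
```

The result agrees with the Pearson correlation of 0.935 reported above, and a t value this large on 11 degrees of freedom is consistent with the reported P-Value of 0.000.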

  11. Regression Analysis The last step to proper analysis of Continuous Data is to determine the Regression Equation. The Regression Equation can mathematically predict Y for any given X. MINITAB™ gives the BEST FIT for the plotted data. Prediction Equations: • Y = a + bx (Linear or 1st order model) • Y = a + bx + cx^2 (Quadratic or 2nd order model) • Y = a + bx + cx^2 + dx^3 (Cubic or 3rd order model) • Y = a(b^x) (Exponential)

  12. Simple versus Multiple Regression • Simple Regression: • One X, One Y • Analyze in MINITAB™ using: • Stat>Regression>Fitted Line Plot or • Stat>Regression>Regression • Multiple Regression: • Two or More X’s, One Y • Analyze in MINITAB™ using: • Stat>Regression>Regression In both cases the R-Sq value is the proportion of the variation in the output (Y) that is explained by the inputs (X’s) in the model.

  13. Regression Analysis Graphical Output

  14. Regression Analysis Statistical Output
      Stat > Regression > Regression
      Regression Analysis: payton yards versus payton carries
      The Regression Equation is
      Payton yards = -163.497 + 4.91622 Payton carries
      S = 153.985   R-Sq = 87.3%   R-Sq(adj) = 86.2%
      Analysis of Variance
      Source       DF   SS        MS        F         P
      Regression    1   1798587   1798587   75.8531   0.000
      Error        11    260826     23711
      Total        12   2059413
      R-Sq value of 87.3% = 1798587 / 2059413
      R-Sq(adj) of 86.2% = (1798587 - 23711) / 2059413, where 23711 is the Error Mean Square
      The R-Sq value of 87.3% quantifies the strength of the association between Carries and Yards. In this case our Prediction Equation explains 87.3% of the total variation seen in “Yards”. 12.7% of the variation seen in “Yards” is not explained by our equation.
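The figures in this output can be reproduced from the raw data with ordinary least squares. The following is a minimal sketch in plain Python, not MINITAB's own routine; the data are the 13 Payton pairs from the Correlation Example slide and all variable names are illustrative.

```python
# Payton data as read from the Correlation Example slide.
carries = [196, 311, 339, 333, 369, 317, 339, 148, 314, 381, 324, 321, 146]
yards = [679, 1390, 1852, 1359, 1610, 1460, 1222, 596, 1421, 1684, 1551, 1333, 586]

n = len(carries)
mean_x = sum(carries) / n
mean_y = sum(yards) / n
sxx = sum((x - mean_x) ** 2 for x in carries)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(carries, yards))

# Least-squares estimates for Y = a + bX.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Analysis of Variance decomposition: Total = Regression + Error.
fitted = [intercept + slope * x for x in carries]
ss_total = sum((y - mean_y) ** 2 for y in yards)
ss_error = sum((y - f) ** 2 for y, f in zip(yards, fitted))
ss_regression = ss_total - ss_error
ms_error = ss_error / (n - 2)

r_sq = ss_regression / ss_total
r_sq_adj = 1 - ms_error / (ss_total / (n - 1))
f_stat = ss_regression / ms_error

print(f"yards = {intercept:.3f} + {slope:.5f} * carries")
print(f"R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}, F = {f_stat:.2f}")
```

For a model with a single X, the general adjusted R-Sq formula used here reduces to the (SS Regression - MS Error) / SS Total shortcut shown on the slide.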

  15. Regression (Prediction) Equation • The solution: Regression Analysis: Payton yards versus Payton carries. The Regression Equation is Payton yards = -163.497 + 4.91622 (Payton carries). Here -163.497 is the Constant, 4.91622 is the Coefficient, and Payton carries is the Level of X.
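Using the Prediction Equation is a one-line calculation. The sketch below hard-codes the Constant and Coefficient from the slide; the function name and the sample carry counts are illustrative only.

```python
# Constant and Coefficient taken from the Regression Equation on the slide.
INTERCEPT = -163.497
SLOPE = 4.91622


def predict_yards(carries: float) -> float:
    """Predicted Payton yards for a given number of carries."""
    return INTERCEPT + SLOPE * carries


for level_of_x in (200, 250, 300):
    print(f"{level_of_x} carries -> {predict_yards(level_of_x):.0f} yards")
```

The data behind the equation cover roughly 146 to 381 carries, so predictions well outside that range should be treated with caution.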

  16. Regression (Prediction) Equation Compare to the Fitted Line. ~1067 yds

  17. Regression Graphical Output For a demonstration, check other regression fits: Stat>Regression>Fitted Line Plot, Quadratic and Cubic. Check the R-Sq value of each against that of the linear model to determine whether the difference in the variation explained by the equations is significant.

  18. Regression Graphical Output [Figures: Quadratic and Cubic fitted line plots] If the R-Sq value improves significantly, or if the assumptions of the residuals are better met as a result of utilizing the quadratic or cubic equation, you will want to use the best-fitting equation.
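The comparison of linear, quadratic, and cubic fits can also be sketched outside MINITAB. The code below is an illustrative stand-in, not the Fitted Line Plot routine: it fits each polynomial by solving the least-squares normal equations with a small Gaussian elimination, after centring and scaling X for numerical stability. All function names are hypothetical, and the data are the Payton pairs from the Correlation Example slide.

```python
carries = [196, 311, 339, 333, 369, 317, 339, 148, 314, 381, 324, 321, 146]
yards = [679, 1390, 1852, 1359, 1610, 1460, 1222, 596, 1421, 1684, 1551, 1333, 586]


def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    coef = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * coef[k] for k in range(row + 1, n))
        coef[row] = (m[row][n] - tail) / m[row][row]
    return coef


def r_squared(x, y, degree):
    """R-Sq of a least-squares polynomial fit of the given degree."""
    n = len(x)
    mean_x = sum(x) / n
    spread = max(x) - min(x)
    # Centre and scale X; this does not change the fitted values.
    z = [(v - mean_x) / spread for v in x]
    powers = [[zi ** p for p in range(degree + 1)] for zi in z]
    size = degree + 1
    xtx = [[sum(row[i] * row[j] for row in powers) for j in range(size)]
           for i in range(size)]
    xty = [sum(row[i] * yi for row, yi in zip(powers, y)) for i in range(size)]
    coef = solve(xtx, xty)
    fitted = [sum(c * p for c, p in zip(coef, row)) for row in powers]
    mean_y = sum(y) / n
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1 - ss_error / ss_total


r2 = {degree: r_squared(carries, yards, degree) for degree in (1, 2, 3)}
for degree, value in r2.items():
    print(f"order {degree}: R-Sq = {value:.1%}")
```

R-Sq can never decrease when a higher-order term is added, so a small increase alone is weak evidence for the more complex equation; R-Sq(adj) and the residual plots are the fairer comparison.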

  19. Residuals As in ANOVA, the residuals should: • Be Normally Distributed (normal plot of residuals) • Be independent of each other • no patterns (random) • data must be time ordered (residuals vs. order graph) • Have a constant variance (judged visually on the residuals versus fits chart: there should be approximately the same number of residuals above and below the line, equally spread)

  20. Residual Plots Residual Plots can be generated from both the Fitted Line Plot and Regression selections in MINITAB™. The standardized residual is also known as the Studentized residual or internally Studentized residual. The standardized residual is the residual divided by an estimate of its Standard Deviation. This form of the residual takes into account that the residuals may have different variances, which can make it easier to detect Outliers.
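The definition above can be made concrete for simple regression. In this sketch the Standard Deviation of residual i is estimated as s * sqrt(1 - h_i), where h_i = 1/n + (x_i - mean)^2 / Sxx is the leverage of observation i; that is the standard internally Studentized form. The cut-off of 2 for flagging an observation is an assumption about MINITAB's "R" marker, and the variable names are illustrative.

```python
import math

# Payton data as read from the Correlation Example slide.
carries = [196, 311, 339, 333, 369, 317, 339, 148, 314, 381, 324, 321, 146]
yards = [679, 1390, 1852, 1359, 1610, 1460, 1222, 596, 1421, 1684, 1551, 1333, 586]

n = len(carries)
mean_x = sum(carries) / n
mean_y = sum(yards) / n
sxx = sum((x - mean_x) ** 2 for x in carries)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(carries, yards))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

residuals = [y - (intercept + slope * x) for x, y in zip(carries, yards)]
# s is the square root of the Error Mean Square.
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
# Leverage of each observation in a one-X regression.
leverage = [1 / n + (x - mean_x) ** 2 / sxx for x in carries]
standardized = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

# Assumed flagging rule: |standardized residual| greater than 2.
flagged = [i + 1 for i, value in enumerate(standardized) if abs(value) > 2]
for obs in flagged:
    print(f"Obs {obs}: residual = {residuals[obs - 1]:.1f}, "
          f"standardized = {standardized[obs - 1]:.2f}")
```

For these data only observation 3 (339 carries, 1852 yards) is flagged, matching the Unusual Observations line in the Residual Analysis output.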

  21. Residuals Equal variance assumption… Normality assumption… Independence assumption…

  22. Residual Analysis
      Stat>Regression>Regression
      Regression Analysis: payton yards versus payton carries
      The regression equation is
      payton yards = - 163 + 4.92 payton carries
      Predictor   Coef     SE Coef   T       P
      Constant    -163.5   172.0     -0.95   0.362
      payton c    4.9162   0.5645     8.71   0.000
      S = 154.0   R-Sq = 87.3%   R-Sq(adj) = 86.2%
      Analysis of Variance
      Source           DF   SS        MS        F       P
      Regression        1   1798587   1798587   75.85   0.000
      Residual Error   11    260826     23711
      Total            12   2059413
      Unusual Observations
      Obs   payton c   payton y   Fit      SE Fit   Residual   St Resid
      3     339        1852.0     1503.1   49.3     348.9      2.39R
      R denotes an observation with a large standardized residual

  23. Normal Probability Plot of Residuals Normally Distributed response assumption ~ Residuals should lie near the straight line (to within a fat pencil of each other).

  24. Residuals versus Fitted Values Equal Variance assumption ~ Should be randomly scattered with no patterns.

  25. Residuals versus Order of Data Independence assumption ~ Should show no trends either up or down and should have approximately the same number of points above and below the line (approximately constant variance).

  26. Modeling Y = f(x) Exercise (RB Stats Correlation.mtw) • Exercise objective: To gain an understanding of how to use the regression and correlation functions in MINITAB™. Examine correlation and regression for the Dorsett data in the RB Stats Correlation file and answer the following questions. • 1. What is the type and magnitude of the correlation? • a. Strong Positive • b. Moderate Positive • c. Weak Positive • d. Strong Negative • 2. What is the Prediction Equation? • 3. What is the predicted value of yardage if Dorsett carries the football 325 times? • 4. Are all assumptions met?

  27. Modeling Y = f(x) Exercise: Question 1 Solution • To determine the Type and Magnitude of the relationship we need to run a basic Scatter Plot. • From “Graph” select “Scatterplot” then “Simple”… • For “Y variables” enter ‘dorsett yards’; for “X variables” enter ‘dorsett carries’.

  28. Modeling Y = f(x) Exercise: Question 1 Solution The Scatter Plot demonstrates a “Strong Positive Correlation”.

  29. Modeling Y = f(x) Exercise: Question 2 Solution To determine the Prediction Equation we need to run a Fitted Line Plot. Stat > Regression > Fitted Line Plot… Fitted Line Plot

  30. Modeling Y = f(x) Exercise: Question 2 Solution For “Response (Y):” enter ‘dorsett yards’ For “Predictor (X):” enter ‘dorsett carries’

  31. Modeling Y = f(x) Exercise: Question 2 Solution The Prediction Equation is shown here…

  32. Modeling Y = f(x) Exercise: Question 3 Solution If Dorsett carries the football 325 times, the predicted value would be determined as follows… Step 1: Dorsett Yards = -160.1 + 4.993 (Dorsett Carries) Step 2: Dorsett Yards = -160.1 + 4.993 (325) Step 3: Dorsett Yards = -160.1 + 1622.725 Solution: Dorsett Yards = 1462.63

  33. Modeling Y = f(x) Exercise: Question 4 Solution The Normality Assumptions have been satisfied. The Equal Variance Assumptions have been satisfied. The Independence Assumptions have been satisfied. Ah, so much satisfaction!

  34. Summary At this point you should be able to: • Perform the steps in a Correlation and a Regression Analysis • Explain when Correlation and Regression are appropriate
