Session on regression in machine learning
Program Name: B.Tech CSE
Semester: 5th
Course Name: Machine Learning
Course Code: PEC-CS-D-501 (I)
Facilitator Name: Aastha
Introduction to Regression Analysis
• Regression analysis is used to:
  • Predict the value of a dependent variable based on the value of at least one independent variable
  • Explain the impact of changes in an independent variable on the dependent variable
• Dependent variable: the variable we wish to predict or explain
• Independent variable: the variable used to explain the dependent variable
Simple Linear Regression Model
• Only one independent variable, X
• The relationship between X and Y is described by a linear function
• Changes in Y are assumed to be caused by changes in X
Types of Relationships
• Linear relationships
• Curvilinear relationships
[Figure: scatter plots of Y against X illustrating linear and curvilinear relationships]
Types of Relationships (continued)
• Strong relationships
• Weak relationships
[Figure: scatter plots of Y against X illustrating strong and weak relationships]
Types of Relationships (continued)
• No relationship
[Figure: scatter plots of Y against X showing no relationship]
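The slides show these relationship types only as scatter plots. A common numeric companion, not taken from the slides, is the sample Pearson correlation coefficient r: it is close to +1 or -1 for a strong linear relationship and moves toward 0 as the linear relationship weakens or disappears. The minimal sketch below uses small hypothetical datasets, and `pearson_r` is an illustrative helper name. It also shows why r alone is not enough: a perfectly curvilinear pattern can still give r of about 0, because r measures only linear association.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross-product and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5, 6]
strong = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]  # hypothetical: close to Y = 2X
weak = [3, 1, 4, 2, 6, 3]                  # hypothetical: noisy, slight upward trend
curved = [(x - 3.5) ** 2 for x in xs]      # exact U-shape, symmetric around X = 3.5

for label, ys in [("strong", strong), ("weak", weak), ("curvilinear", curved)]:
    print(f"{label:12s} r = {pearson_r(xs, ys):+.3f}")
```

The curvilinear case is a reminder to look at the scatter plot before trusting a single summary number.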
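Simple Linear Regression Model
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
• $Y_i$: dependent variable
• $X_i$: independent variable
• $\beta_0$: population Y intercept
• $\beta_1$: population slope coefficient
• $\varepsilon_i$: random error term
• $\beta_0 + \beta_1 X_i$ is the linear component; $\varepsilon_i$ is the random error component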
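Simple Linear Regression Model (continued)
$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
[Figure: the population regression line in the X-Y plane, with intercept $\beta_0$ and slope $\beta_1$. For a given value $X_i$, the random error $\varepsilon_i$ is the vertical distance between the observed value of Y and the predicted value of Y on the line.]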
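To make the population model concrete, the sketch below simulates it: each observed $Y_i$ is built as the linear component $\beta_0 + \beta_1 X_i$ plus a random error $\varepsilon_i$. The parameter values ($\beta_0 = 2$, $\beta_1 = 0.5$, $\sigma = 1$) and the helper name `simulate_population_model` are hypothetical, and the errors are assumed to be normally distributed with mean zero.

```python
import random


def simulate_population_model(xs, beta0, beta1, sigma, seed=0):
    """Draw Y_i = beta0 + beta1 * X_i + eps_i, assuming eps_i ~ Normal(0, sigma).

    Returns a list of (x_i, linear_component_i, eps_i, y_i) tuples.
    """
    rng = random.Random(seed)  # seeded so the draws are reproducible
    rows = []
    for x in xs:
        linear = beta0 + beta1 * x    # linear component: beta0 + beta1 * X_i
        eps = rng.gauss(0.0, sigma)   # random error component eps_i
        rows.append((x, linear, eps, linear + eps))
    return rows


# Hypothetical parameters chosen only for illustration
rows = simulate_population_model([1, 2, 3, 4, 5], beta0=2.0, beta1=0.5, sigma=1.0, seed=42)
for x, linear, eps, y in rows:
    print(f"X={x}  beta0+beta1*X={linear:.2f}  eps={eps:+.2f}  Y={y:.2f}")
```

Setting `sigma=0.0` removes the random error component, so every observed Y falls exactly on the population line.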
Simple Linear Regression Equation (Prediction Line)
The simple linear regression equation provides an estimate of the population regression line:
$\hat{Y}_i = b_0 + b_1 X_i$
• $\hat{Y}_i$: estimated (or predicted) Y value for observation i
• $b_0$: estimate of the regression intercept
• $b_1$: estimate of the regression slope
• $X_i$: value of X for observation i
The individual random error terms (residuals) $e_i = Y_i - \hat{Y}_i$ have a mean of zero.
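The slide does not show how $b_0$ and $b_1$ are computed. Under ordinary least squares, the standard estimates are $b_1 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2$ and $b_0 = \bar{Y} - b_1 \bar{X}$. The sketch below applies these formulas to a small hypothetical dataset and checks that the residuals average to zero; the function name `fit_simple_linear_regression` is illustrative.

```python
def fit_simple_linear_regression(xs, ys):
    """Ordinary least-squares estimates (b0, b1) for Y-hat = b0 + b1 * X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # S_XY and S_XX: sums of cross-products and squares of deviations from the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx           # estimated slope
    b0 = y_bar - b1 * x_bar    # estimated intercept
    return b0, b1


xs = [1, 2, 3, 4, 5]
ys = [2.2, 2.8, 3.6, 4.5, 5.1]  # hypothetical observations
b0, b1 = fit_simple_linear_regression(xs, ys)
y_hat = [b0 + b1 * x for x in xs]          # predicted values Y-hat_i
e = [y - yh for y, yh in zip(ys, y_hat)]   # residuals e_i = Y_i - Y-hat_i
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")     # b0 = 1.39, b1 = 0.75
print(f"mean residual = {sum(e) / len(e):.2e}")
```

The mean residual prints as a value on the order of floating-point rounding error, consistent with residuals averaging to zero when the model includes an intercept.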
Regression Using Excel
• Tools / Data Analysis / Regression
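Excel's Regression tool (provided by the Analysis ToolPak add-in) reports summary statistics alongside the coefficients, including R Square and the standard error of the estimate. As a rough cross-check outside Excel, the sketch below computes comparable quantities by hand with plain Python, using $r^2 = 1 - SSE/SST$ and $S_{YX} = \sqrt{SSE/(n-2)}$. The data, the function name `regression_statistics`, and its dictionary keys are hypothetical; this is not Excel's own implementation.

```python
from math import sqrt


def regression_statistics(xs, ys):
    """Hand-computed coefficients, R Square and standard error for one predictor."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    sst = sum((y - y_bar) ** 2 for y in ys)                      # total sum of squares
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # error sum of squares
    return {
        "intercept": b0,
        "slope": b1,
        "r_square": 1 - sse / sst,             # share of variation in Y explained by X
        "standard_error": sqrt(sse / (n - 2)), # n - 2 degrees of freedom for one predictor
        "observations": n,
    }


stats = regression_statistics([1, 2, 3, 4, 5], [2.2, 2.8, 3.6, 4.5, 5.1])  # hypothetical data
for name, value in stats.items():
    print(f"{name:15s} {value:.4f}")
```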
Assumptions of Regression
• Use the acronym LINE:
  • Linearity: the underlying relationship between X and Y is linear
  • Independence of Errors: error values are statistically independent
  • Normality of Error: error values (ε) are normally distributed for any given value of X
  • Equal Variance (Homoscedasticity): the probability distribution of the errors has constant variance
Department of Statistics, ITS Surabaya
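Two of the LINE assumptions can be screened roughly from the residuals. The sketch below goes beyond the slides: it computes the Durbin-Watson statistic $D = \sum_{i=2}^{n}(e_i - e_{i-1})^2 / \sum_{i=1}^{n} e_i^2$ for independence of errors (values near 2 suggest no first-order autocorrelation, which matters when observations have a natural order such as time), and a crude equal-variance check that compares residual variance in the upper and lower halves of X. The helper names, the half-split heuristic and the data are assumptions for illustration, not formal tests. Normality is typically checked separately, for example with a histogram or normal probability plot of the residuals.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for one predictor."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def residuals(xs, ys, b0, b1):
    """Residuals e_i = Y_i - (b0 + b1 * X_i)."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def durbin_watson(es):
    """Durbin-Watson statistic; residuals must be listed in observation order."""
    num = sum((es[i] - es[i - 1]) ** 2 for i in range(1, len(es)))
    den = sum(e ** 2 for e in es)
    return num / den


def sample_variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def spread_ratio(xs, es):
    """Hypothetical heuristic: residual variance for the upper half of X divided
    by that for the lower half. Values far from 1 hint at unequal variance."""
    pairs = sorted(zip(xs, es))
    half = len(pairs) // 2
    lower = [e for _, e in pairs[:half]]
    upper = [e for _, e in pairs[-half:]]
    return sample_variance(upper) / sample_variance(lower)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [2.1, 3.8, 6.3, 7.9, 9.7, 12.4, 13.8, 16.1]  # hypothetical observations
b0, b1 = fit_line(xs, ys)
es = residuals(xs, ys, b0, b1)
print(f"Durbin-Watson: {durbin_watson(es):.2f}")
print(f"Spread ratio (upper/lower half of X): {spread_ratio(xs, es):.2f}")
```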
Pitfalls of Regression Analysis
• Lacking an awareness of the assumptions underlying least-squares regression
• Not knowing how to evaluate the assumptions
• Not knowing the alternatives to least-squares regression if a particular assumption is violated
• Using a regression model without knowledge of the subject matter
• Extrapolating outside the relevant range
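One way to guard against extrapolation in code is to record the range of X used to fit the model and refuse predictions outside it. The sketch below is a hypothetical helper (`make_predictor`) that takes already-estimated coefficients; the coefficient values in the usage lines are made up for illustration. Raising an error is one design choice; a softer alternative would be to return the prediction together with a warning.

```python
def make_predictor(xs, b0, b1):
    """Return a function computing b0 + b1 * x, restricted to the observed X range."""
    lo, hi = min(xs), max(xs)  # relevant range = the X values seen when fitting

    def predict(x):
        if not lo <= x <= hi:
            raise ValueError(
                f"X = {x} is outside the relevant range [{lo}, {hi}]; "
                "refusing to extrapolate"
            )
        return b0 + b1 * x

    return predict


predict = make_predictor([1, 2, 3, 4, 5], b0=1.39, b1=0.75)  # hypothetical coefficients
print(predict(3))   # inside the relevant range
try:
    predict(10)     # outside the relevant range
except ValueError as err:
    print(err)
```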
Aravali College of Engineering And Management Jasana, Tigoan Road, Neharpar, Faridabad, Delhi NCR Toll Free Number : 91- 8527538785 Website : www.acem.edu.in