
Regression Methods



  1. Regression Methods

  2. Linear Regression
  • Simple linear regression (one predictor)
  • Multiple linear regression (multiple predictors)
  • Ordinary Least Squares estimation
    • computed directly from the data
  • Lasso regression
    • selects features by setting parameters to 0 (see the sketch below)
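To make the contrast concrete, here is a minimal scikit-learn sketch of OLS versus lasso; the toy data and the alpha value are illustrative assumptions, not from the slides:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso

# Toy data: y depends on the first two predictors only;
# the remaining three are pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

ols = LinearRegression().fit(X, y)   # ordinary least squares
lasso = Lasso(alpha=0.1).fit(X, y)   # L1-penalized least squares

print(ols.coef_)    # all five coefficients non-zero
print(lasso.coef_)  # noise coefficients driven to (or very near) 0
```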

  3. Coefficient of Determination
  • Indicates how well a model fits the data
  • R² (R squared)
  • R² = 1 − SS_res / SS_tot
    • SS_res = Σ(yᵢ − fᵢ)²: squared difference between actual and predicted values
    • SS_tot = Σ(yᵢ − ȳ)²: squared difference between actual values and the horizontal line (the mean)
  • Between 0 and 1 for a least-squares model; a bigger range is possible if other models are used
  • Explained variance: what percentage of the variance is explained by the model
  • Linear least-squares regression: R² = r² (the squared correlation coefficient; see the example below)
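A short check of both identities, computed from the formulas above on a toy dataset of my choosing:

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(scale=2.0, size=100)

model = LinearRegression().fit(x.reshape(-1, 1), y)
f = model.predict(x.reshape(-1, 1))

ss_res = np.sum((y - f) ** 2)         # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot

r, _ = pearsonr(x, y)
print(r2, r ** 2)  # for linear least squares these agree: R² = r²
```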

  4. R Squared
  • visual interpretation of R²
  [figure: SS_tot vs. SS_res; source: Wikipedia, CC BY-SA 3.0]

  5. Regression Trees
  • Regression variant of the decision tree
  • Top-down induction
  • 2 options for the leaves:
    • constant value in the leaf (piecewise constant): regression trees
    • local linear model in the leaf (piecewise linear): model trees
  (a sketch of the piecewise-constant variant follows)
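scikit-learn ships only the piecewise-constant variant; a minimal sketch, with an arbitrary target function and depth:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# One-dimensional toy problem: the tree approximates sin(x)
# with a piecewise-constant function, one constant per leaf.
X = np.linspace(0, 2 * np.pi, 200).reshape(-1, 1)
y = np.sin(X).ravel()

tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(tree.predict([[1.0], [1.1]]))  # nearby inputs often fall in the same leaf
```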

  6. M5 algorithm (Quinlan, Wang)
  • M5', M5P in Weka (classifiers > trees > M5P)
  • Offers both regression trees and model trees
  • Model trees are the default
  • -R option (buildRegressionTree) for piecewise constant

  7. M5 algorithm (Quinlan, Wang)
  • Splitting criterion: Standard Deviation Reduction (sketch below)
    • SDR = sd(T) − Σᵢ sd(Tᵢ) · |Tᵢ| / |T|
  • Stopping criteria:
    • standard deviation below some threshold (0.05 · sd(D))
    • too few examples in the node (e.g. ≤ 4)
  • Pruning (bottom-up):
    • estimated error: (n + v)/(n − v) × absolute error in the node
    • n is the number of examples in the node, v the number of parameters in the model
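SDR is simple enough to write down directly; a minimal sketch (function and variable names are mine, not from M5'):

```python
import numpy as np

def sdr(parent, children):
    """Standard Deviation Reduction: sd(T) - sum of sd(Ti) * |Ti| / |T|."""
    n = len(parent)
    return np.std(parent) - sum(np.std(c) * len(c) / n for c in children)

# A split that separates low targets from high ones scores well.
y = np.array([1.0, 1.1, 0.9, 5.0, 5.2, 4.8])
print(sdr(y, [y[:3], y[3:]]))     # large reduction: good split
print(sdr(y, [y[::2], y[1::2]]))  # interleaved split: almost no reduction
```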

  8. Binary Splits
  • All splits are binary
  • Numeric attributes as normal (as in C4.5)
  • Nominal attributes:
    • order all values according to their average target value (prior to induction)
    • introduce k−1 indicator variables in this order (sketch below)
  Example: database of skiing slopes
    avg(color = green) = 2.5%
    avg(color = blue) = 3.2%
    avg(color = red) = 7.7%
    avg(color = black) = 13.5%
    binary features: Green, GreenBlue, GreenBlueRed
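A sketch of this encoding for the skiing-slopes example; the column names and toy target values are mine:

```python
import pandas as pd

df = pd.DataFrame({
    "color":  ["green", "blue", "red", "black", "blue", "red"],
    "target": [2.0, 3.5, 8.0, 13.0, 3.0, 7.5],
})

# Order the nominal values by their average target value.
order = df.groupby("color")["target"].mean().sort_values().index.tolist()
# e.g. ['green', 'blue', 'red', 'black']

# k-1 indicator variables: each marks membership in a prefix of the ordering.
for i in range(1, len(order)):
    prefix = order[:i]  # Green, then GreenBlue, then GreenBlueRed
    name = "".join(v.capitalize() for v in prefix)
    df[name] = df["color"].isin(prefix).astype(int)

print(df)
```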

  9. Regression tree on Servo dataset (UCI)

  10. Model tree on Servo dataset (UCI)
  LM1: 0.0833 * motor=B,A + 0.0682 * screw=B,A + 0.2215 * screw=A
       + 0.1315 * pgain=4,3 + 0.3163 * pgain=3 − 0.1254 * vgain=1,2 + 0.3864
  (terms such as motor=B,A are the binary indicator variables from slide 8: 1 if motor ∈ {B, A}, else 0)

  11. Regression in Cortana
  • Regression is a natural setting in Subgroup Discovery
  • Local models, no prediction model
  • Subgroups are piecewise constant subsets
  [figure: two subgroups with h = 3100 and h = 2200]

  12. Subgroup Discovery: regression
  • A subgroup is a step function (one level inside the subgroup vs. one outside)
  • R² of the step function is an interesting quality measure (next to z-score); see the sketch below
  • available in Cortana as Explained Variance
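A sketch of this measure under my reading of the slide: fit the two-level step function (mean inside vs. mean outside the subgroup) and compute its R². This is an illustration, not Cortana's implementation:

```python
import numpy as np

def explained_variance(y, in_subgroup):
    """R² of the step function: mean(y) inside the subgroup, mean(y) outside."""
    f = np.where(in_subgroup, y[in_subgroup].mean(), y[~in_subgroup].mean())
    ss_res = np.sum((y - f) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Toy data echoing the figure's two levels (values are made up).
y = np.array([3100.0, 3050, 2900, 2250, 2200, 2150])
subgroup = np.array([True, True, True, False, False, False])
print(explained_variance(y, subgroup))  # close to 1: the subgroup explains y well
```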

  13. Other regression models
  • Functions
    • LinearRegression
    • MultilayerPerceptron (artificial neural network)
    • SMOreg (support vector machine)
  • Lazy
    • IBk (k-nearest neighbors)
  • Rules
    • M5Rules (decision list)

  14. Approximating a smooth function
  • Experiment (sketch below):
    • take a mathematical function f (with infinite precision)
    • generate a dataset by sampling x and y and computing z = f(x, y)
    • learn f with M5' (regression tree)
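The experiment is easy to reproduce; here is a sketch that substitutes scikit-learn's DecisionTreeRegressor for M5', with an arbitrary choice of f and sample size:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def f(x, y):
    return np.sin(x) * np.cos(y)  # an arbitrary smooth function

rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=2000)
y = rng.uniform(-3, 3, size=2000)
z = f(x, y)

X = np.column_stack([x, y])
tree = DecisionTreeRegressor(max_depth=6).fit(X, z)

# The tree approximates the smooth surface with axis-aligned constant patches.
print(tree.predict([[0.5, 0.5]]), f(0.5, 0.5))
```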

  15. k-Nearest Neighbor
  • k-nearest neighbor can also be used for regression: predict the average target of the k nearest training points
  • with all its usual advantages and disadvantages (sketch below)
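A minimal sketch with scikit-learn's KNeighborsRegressor; the toy function is my choice:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.linspace(0, 2 * np.pi, 100).reshape(-1, 1)
y = np.sin(X).ravel()

knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)

# No model is built at training time (lazy learning); all work happens
# at prediction time, which is the classic k-NN trade-off.
print(knn.predict([[1.0]]), np.sin(1.0))
```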
