Classification, Regression and Other Learning Methods CS240B Presentation



  1. Classification, Regression and Other Learning Methods CS240B Presentation Peter Huang June 4, 2014

  2. Outline • Motivation • Introduction to Data Streams and Concept Drift • Survey of Ensemble Methods: • Bagging: KDD ’01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification • Weighted Bagging: KDD ’03: Mining Concept-Drifting Data Streams using Ensemble Classifiers • Adaptive Boosting: KDD ’04: Fast and Light Boosting for Adaptive Mining of Data Streams • Summary • Conclusion

  3. Motivation • Significant amount of research recently has focused on mining data streams • Real-world applications include: financial data analysis, credit card fraud, network monitoring, sensor networks, and many others • Algorithms for mining data streams have to overcome challenges not seen in traditional data mining, particularly performance and unending data sets • Traditional algorithms must be made non-blocking, fast and light, and must adapt to data stream issues

  4. Data Streams • A Data Stream is a continuous stream of data items, in the form of tuples or vectors, that arrive at a high rate, and are subject to unknown changes such as concept drift or shift • Algorithms that process data streams must be: • Iterative – reading data sequentially • Efficient – fast and light in computation/memory • Single-pass – account for surplus of data • Adaptive – account for concept drift • Any-time – be able to provide best answer continuously

  5. Data Stream Classification • Various types of methods are used to classify data streams • Single classifier • Sliding window on recent data, fixed or variable • Naive Bayes, C4.5, RIPPER • Support vector machines, neural networks • K-NN, linear regression • Decision Trees • BOAT algorithm • VFDT, Hoeffding tree • CVFDT • Ensemble Methods • Bagging • Boosting • Random Forest

  6. Concept Drift • Concept drift is an implicit property of data streams • A concept may change or drift over time due to sudden or gradual changes in the external environment • Mining changes is one of the core issues of data mining, useful in many real-world applications • Two types of concept change: gradual and shift • Methods to adapt to concept drift: • Ensemble methods, majority or weighted voting • Exponential forgetting, forgetting factor • Replacement methods, create new classifier
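The exponential-forgetting idea above can be sketched in a few lines. This is an illustrative example, not code from the surveyed papers; the function name `forgetting_mean` and the choice of a running mean are assumptions made here to show how a forgetting factor discounts outdated items:

```python
def forgetting_mean(stream, alpha=0.5):
    """Running mean with exponential forgetting; `alpha` is the
    forgetting factor (assumed here to lie in (0, 1)).
    Old items' influence decays geometrically, so the estimate
    tracks the current concept after a drift."""
    mean = None
    for x in stream:
        mean = x if mean is None else alpha * mean + (1 - alpha) * x
        yield mean

# After the concept shifts from 0 to 10, the estimate converges
# toward the new mean within a few items.
estimates = list(forgetting_mean([0, 0, 0, 10, 10, 10], alpha=0.5))
```

With `alpha = 0.5`, half of each estimate comes from history and half from the newest item, so three items after the shift the estimate has already moved most of the way to the new mean.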

  7. Type of Concept Drift • Two types of concept change: gradual and shift • Shift: change in mean, class/distribution change • Gradual: change in mean and variance, trends

  8. Ensemble Classifiers • Ensemble methods are a classification approach that naturally handles concept drift • They combine the predictions of multiple base models, each learned using a base learner • It is known that combining multiple models consistently outperforms individual models • They use either traditional averaging or weighted averaging to classify data stream items
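The two combination schemes mentioned above can be sketched as follows. The function names and the label values are illustrative, not from the papers:

```python
from collections import Counter

def majority_vote(predictions):
    """Traditional (unweighted) combination: one vote per base model."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, weights):
    """Weighted combination: each model's vote counts in proportion
    to its weight, so low-weight (outdated) models have little say."""
    totals = Counter()
    for label, w in zip(predictions, weights):
        totals[label] += w
    return max(totals, key=totals.get)
```

Note how a single high-weight model can outvote several low-weight ones under weighted voting, which is exactly how the KDD '03 scheme suppresses classifiers trained on outdated concepts.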

  9. Survey of Ensemble Methods • Bagging: KDD ’01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification • Weighted Bagging: KDD ’03: Mining Concept-Drifting Data Streams using Ensemble Classifiers • Adaptive Boosting: KDD ’04: Fast and Light Boosting for Adaptive Mining of Data Streams

  10. KDD ’01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification • Approaches the problem of large-scale or streaming classification by building a committee, or ensemble, of classifiers, each constructed on a subset of the available data points • Essentially introduces the concept of ensemble classification for streams • The traditional scheme of averaging predictions is used • Later improved in KDD ’03, KDD ’04, and beyond

  11. Ensemble of Classifiers • Fixed ensemble size, up to around 20–25 • A new classifier replaces the lowest-quality classifier in the existing ensemble • Building blocks are decision trees constructed using C4.5 • An operational parameter is whether or not to prune the tree • In experiments, pruning decreased overall accuracy because of over-fitting • Adapts to concept drift by changing over time, following a Gaussian-like CDF for gradual change

  12. Streaming Ensemble Pseudocode
• while more data points are available:
      read d points, create training set D
      build classifier Ci using D
      evaluate Ci−1 on D
      evaluate all classifiers in ensemble E on D
      if E is not full:
          insert Ci−1 into E
      else if Quality(Ci−1) > Quality(Ej) for some j:
          replace Ej with Ci−1
• Quality is measured by the ability to classify points in the current test set
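A runnable sketch of one pass of the loop above. The paper's base learner is C4.5 and it inserts the previous block's classifier Ci−1; this sketch simplifies by inserting the newly trained classifier directly and uses a trivial stand-in learner (`MajorityClass` is a name invented here) so the example is self-contained:

```python
class MajorityClass:
    """Stand-in base learner (the paper uses C4.5 decision trees):
    predicts the most common label seen in its training block."""
    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self
    def predict(self, X):
        return [self.label for _ in X]

def quality(clf, X, y):
    """Fraction of the current block classified correctly."""
    return sum(p == t for p, t in zip(clf.predict(X), y)) / len(y)

def sea_step(ensemble, X, y, max_size=20):
    """One SEA pass: train on block (X, y); insert if the ensemble has
    room, otherwise replace the lowest-quality member if the new
    classifier beats it on the current block."""
    new_clf = MajorityClass().fit(X, y)
    if len(ensemble) < max_size:
        ensemble.append(new_clf)
    else:
        scores = [quality(c, X, y) for c in ensemble]
        worst = scores.index(min(scores))
        if quality(new_clf, X, y) > scores[worst]:
            ensemble[worst] = new_clf
    return ensemble
```

Because replacement is gated on quality measured against the *current* block, classifiers trained on outdated concepts are the ones that get evicted, which is how the fixed-size ensemble tracks drift.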

  13. Replacement of Existing Classifiers
• Existing ensemble of classifiers (quality scores): 78, 84, 75, 80, 70
• Newly trained classifier: 85 — it replaces the lowest-quality member (70)
• New ensemble of classifiers: 78, 84, 75, 80, 85; the next trained classifier (68) beats no member and is discarded
• Average ensemble quality: 77.4 → 80.4

  14. Experimental Results: Adult Data

  15. Experimental Results: SEER Data

  16. Experimental Results: Web Data

  17. Experimental Results: Concept Drift

  18. Survey of Ensemble Methods • Bagging: KDD ’01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification • Weighted Bagging: KDD ’03: Mining Concept-Drifting Data Streams using Ensemble Classifiers • Adaptive Boosting: KDD ’04: Fast and Light Boosting for Adaptive Mining of Data Streams

  19. KDD ’03: Mining Concept-Drifting Data Streams using Ensemble Classifiers • General framework for mining concept-drifting data streams using an ensemble of weighted classifiers • Improves ensemble classification by using weighted averaging instead of traditional averaging • Weight is inversely related to the classifier's expected error (MSE): wi = MSEr − MSEi • Eliminates the effect of examples representing outdated concepts by assigning lower weights
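The weight formula can be made concrete. In the KDD '03 framework, MSEr is the mean squared error of a classifier that predicts randomly according to the class distribution, MSEr = Σc p(c)·(1 − p(c))², so any classifier no better than random guessing gets weight ≤ 0. A small sketch (function names are illustrative):

```python
def mse_random(class_priors):
    """MSE of a random classifier that predicts class c with
    probability p(c): MSE_r = sum over c of p(c) * (1 - p(c))^2."""
    return sum(p * (1 - p) ** 2 for p in class_priors)

def classifier_weight(mse_i, class_priors):
    """w_i = MSE_r - MSE_i: classifiers no better than random
    guessing get weight <= 0 and effectively drop out of the vote."""
    return mse_random(class_priors) - mse_i
```

For two equally likely classes, MSEr = 2 · 0.5 · 0.25 = 0.25, so a classifier with MSE 0.1 gets weight 0.15 while one with MSE 0.3 (worse than random) gets a negative weight.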

  20. Ensemble of Classifiers • Fixed ensemble size, top K classifiers kept • New classifiers replace lower-weighted classifiers in the existing ensemble • Building blocks are decision trees constructed using C4.5 • Adapts to concept drift by removing and/or reducing the weight of incorrect classifiers

  21. Streaming Ensemble Pseudocode
• while more data points are available:
      read d points, create training set S
      build classifier C′ from S
      compute error rate of C′ via cross-validation on S
      derive weight w′ for C′: w′ = MSEr − MSE′
      for each classifier Ci in C:
          apply Ci to S to derive MSEi
          compute weight wi = MSEr − MSEi
      C ← top K weighted classifiers in C ∪ {C′}
      return C
• Quality is measured by the ability to classify points in the current test set
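Two pieces of the loop above can be sketched concretely: computing a classifier's MSE on the current block from its class-probability outputs, and keeping the top-K classifiers by weight. The representation of a classifier as a dict of class probabilities per point is an assumption made here for the sake of a self-contained example:

```python
def mse_on_block(probs_per_point, true_labels):
    """MSE_i of a classifier on block S: the mean of (1 - f_c(x))^2,
    where f_c(x) is the probability the classifier assigns to x's
    true class c."""
    return sum((1 - probs[c]) ** 2
               for probs, c in zip(probs_per_point, true_labels)) / len(true_labels)

def top_k(candidates, k):
    """Keep the K classifiers with the largest weight w_i = MSE_r - MSE_i.
    `candidates` is a list of (classifier_id, weight) pairs."""
    return sorted(candidates, key=lambda cw: cw[1], reverse=True)[:k]
```

A classifier that assigns probability 0.8 and 0.6 to the true classes of two points has MSE (0.2² + 0.4²)/2 = 0.1 on that block.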

  22. Data Expiration Problem • Identify in a timely manner those data in the training set that are no longer consistent with the current concepts • Discards data after they become old, that is, after a fixed period of time T has passed since their arrival • If T is large, the training set is likely to contain outdated concepts, which reduces classification accuracy • If T is small, the training set may not have enough data, and as a result, the learned model will likely carry a large variance due to over-fitting.

  23. Expiration Problem Illustrated

  24. Replacement of Existing Classifiers
• Existing stream of classifiers (numbers are MSE; newer classifiers on the right): 12, 15, 19, 21, 10
• New classifier trained on the current block: MSE 13
• The highest-error classifier (21) is discarded (X); resulting ensemble used: 12, 15, 19, 13, 10

  25. Experimental Results: Average Error

  26. Experimental Results: Error Rates

  27. Experimental Results: Concept Drift

  28. Survey of Ensemble Methods • Bagging: KDD ’01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification • Weighted Bagging: KDD ’03: Mining Concept-Drifting Data Streams using Ensemble Classifiers • Adaptive Boosting: KDD ’04: Fast and Light Boosting for Adaptive Mining of Data Streams

  29. KDD ’04: Fast and Light Boosting for Adaptive Mining of Data Streams • A novel adaptive boosting ensemble method for continuous mining of data streams • Improves ensemble classification by boosting incorrectly classified samples • The weight of incorrect samples is wi = (1 − ej)/ej • The traditional scheme of averaging predictions is used

  30. Ensemble of Classifiers • Fixed ensemble size, the most recent M classifiers kept • Boosting the weight of incorrectly classified samples provides a number of formal guarantees on performance • Building blocks are decision trees constructed using C4.5 • Adapts to concept drift by change detection, restarting the ensemble from scratch

  31. Streaming Ensemble Pseudocode
• Eb = {C1, …, Cm}, Bj = {(x1, y1), …, (xn, yn)}
• while more data points are available:
      read n points, create training block Bj
      compute the ensemble prediction on each of the n points
      change detection: Eb ← {} if a change is detected
      if Eb ≠ {}:
          compute error rate ej of Eb on Bj
          set weight of misclassified samples: wi = (1 − ej)/ej
      else:
          wi = 1
      learn new classifier Cm+1 from Bj
      update Eb ← Eb ∪ {Cm+1}; remove C1 if m = M
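The sample-reweighting step above can be sketched directly. The function name and the handling of degenerate blocks (all correct or all wrong, where (1 − ej)/ej is undefined or zero) are assumptions made here:

```python
def boost_weights(correct_flags):
    """Given whether the current ensemble classified each sample in
    the block correctly, compute the ensemble error rate e_j and give
    each misclassified sample the boosted weight (1 - e_j) / e_j,
    leaving correctly classified samples at weight 1."""
    n = len(correct_flags)
    errors = correct_flags.count(False)
    if errors == 0 or errors == n:   # degenerate block: keep uniform
        return [1.0] * n
    ej = errors / n
    boost = (1 - ej) / ej
    return [1.0 if ok else boost for ok in correct_flags]
```

With an ensemble error rate of 25% on a block, each misclassified sample gets weight 0.75/0.25 = 3, so the next classifier concentrates on exactly the samples the current ensemble gets wrong.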

  32. Change Detection • To detect change, test the null hypothesis H0 (no concept change) against the alternative hypothesis H1 • Two-stage method: first a significance test on the block's error rate, then a hypothesis test to confirm the change
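The first stage can be illustrated with a simple normal-approximation test; this is not the paper's exact statistic, just a sketch of the idea that under H0 the block error rate should stay close to the historical rate:

```python
import math

def significant_increase(p_hist, p_block, n_block, z=2.58):
    """Illustrative first-stage check: flag a possible concept change
    when the error rate on the current block exceeds the historical
    rate by more than z standard errors of the H0 sampling
    distribution (z = 2.58 ~ 99% one-sided confidence)."""
    if n_block == 0:
        return False
    se = math.sqrt(p_hist * (1 - p_hist) / n_block)
    return p_block - p_hist > z * se
```

A jump from a 10% historical error rate to 50% on a 100-point block is flagged, while a jump to 11% is within sampling noise and is not.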

  33. Replacement of Existing Classifiers
• Existing ensemble of classifiers (numbers are accuracy; newer classifiers on the right): 85, 88, 90, 87, 84
• New classifier: 78; after boosting, the boosted classifier reaches 86
• Boosted ensemble (oldest member, 85, removed): 88, 90, 87, 84, 86

  34. Experimental Results: Concept Drift

  35. Experimental Results: Comparison

  36. Experimental Results: Time and Space

  37. Summary • Bagging: KDD ’01: A Streaming Ensemble Algorithm (SEA) for Large-Scale Classification • Introduced bagging ensemble for data stream • Weighted Bagging: KDD ’03: Mining Concept-Drifting Data Streams using Ensemble Classifiers • Adds weighting to improve accuracy and handle drift • Adaptive Boosting: KDD ’04: Fast and Light Boosting for Adaptive Mining of Data Streams • Adds boosting to further improve accuracy and speed

  38. Thank You Questions?

  39. Sources • Adams, Niall M., et al. "Efficient Streaming Classification Methods." (2010). • Street, W. Nick, and Yong Seog Kim. "A streaming ensemble algorithm (SEA) for large-scale classification." Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2001. • Wang, Haixun, et al. "Mining concept-drifting data streams using ensemble classifiers." Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2003. • Chu, Fang, and Carlo Zaniolo. "Fast and light boosting for adaptive mining of data streams." Advances in Knowledge Discovery and Data Mining. Springer Berlin Heidelberg, 2004. 282-292.
