Debellor Data Mining Platform with Stream Architecture

Marcin Wojnarski, Warsaw University, Poland



  1. Debellor: Data Mining Platform with Stream Architecture. Marcin Wojnarski, Warsaw University, Poland

  2. Outline • Debellor – data mining platform • Motivation • Main features • Architecture: • Cell • data streaming • multi-threading • Available in ver. 0.6 • Future releases • Summary

  3. Debellor • Language: Java • Licence: open source (GPL) • Download: www.debellor.org • Debello – to conquer (Latin). Debellor – conqueror of data

  4. Debellor – data mining platform [Diagram: Debellor shown with Rseslib, LibSVM, Weka, TA-Lib, own…]

  5. Motivation Demand for more complex algorithms. Necessity to combine elementary algorithms.

  6. Motivation • Data Processing Network (DPN) [Diagram: network of Load, Preprocess, Predict, Save and Visualize nodes]

  7. Motivation • Committee of algorithms [Diagram: Classifier A, Classifier B, Classifier C, Voting]
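A committee like the one on this slide can be sketched as a plain majority vote. The class `VotingSketch`, the method `vote` and the three toy threshold classifiers below are hypothetical names chosen for illustration; they are not part of Debellor, and the slide does not say which voting rule the platform uses.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: these names are illustrative and are not Debellor's API.
final class VotingSketch {

    // Returns the label predicted by the most committee members;
    // ties go to the label that reached the top count first.
    static String vote(List<Function<double[], String>> committee, double[] sample) {
        Map<String, Integer> counts = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (Function<double[], String> member : committee) {
            String label = member.apply(sample);
            int c = counts.merge(label, 1, Integer::sum);
            if (c > bestCount) {
                bestCount = c;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Three toy threshold "classifiers" standing in for Classifier A, B and C.
        Function<double[], String> a = x -> x[0] > 0.5 ? "pos" : "neg";
        Function<double[], String> b = x -> x[0] > 0.3 ? "pos" : "neg";
        Function<double[], String> c = x -> x[0] > 0.9 ? "pos" : "neg";
        System.out.println(vote(List.of(a, b, c), new double[] {0.6}));
    }
}
```

For the sample 0.6, classifiers A and B answer "pos" and C answers "neg", so the committee returns "pos".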

  8. Motivation • Nested algorithms [Diagram: RBF neural network, K-means]

  9. Requirements • Versatile • Efficient • Simple

  10. Features of Debellor • All types of data processing algorithms • Extendible data types • Stream architecture → large data sets • Multi-threading • Immutability of data objects → safety

  11. Debellor

  12. Algorithm = Cell • Cell cell = new RseslibClassifier("C45"); cell.set("pruning", "true"); [Diagram: cell]

  13. Cell – data source • cell.open(); Sample s1 = cell.next(), s2 = cell.next(), ... cell.close(); [Diagram: cell]
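The open / next / close sequence on this slide can be illustrated with a small self-contained sketch. The `Source` interface, the `Counter` class and the convention that `next()` returns `null` at the end of the stream are assumptions made for this example; the slide does not show how Debellor signals the end of data, and `Double` stands in for the `Sample` type.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch modelled on the slide; not Debellor's actual classes.
final class SourceSketch {

    // Minimal stand-in for a cell that produces samples one at a time.
    interface Source {
        void open();
        Double next();   // assumption: null signals end of stream
        void close();
    }

    // A source that emits 0.0, 1.0, ..., n-1 as samples.
    static final class Counter implements Source {
        private final int n;
        private int i;
        private boolean opened;

        Counter(int n) { this.n = n; }

        public void open() { i = 0; opened = true; }

        public Double next() {
            if (!opened) throw new IllegalStateException("call open() first");
            return i < n ? Double.valueOf(i++) : null;
        }

        public void close() { opened = false; }
    }

    // Drains a source using the open / next / close protocol from the slide.
    static List<Double> readAll(Source cell) {
        List<Double> out = new ArrayList<>();
        cell.open();
        try {
            for (Double s = cell.next(); s != null; s = cell.next()) {
                out.add(s);
            }
        } finally {
            cell.close();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(readAll(new Counter(3)));
    }
}
```

Closing the source in a `finally` block is a design choice of this sketch, so that the stream is released even if a consumer fails midway.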

  14. Cell – data receiver • cell.setSource(anotherCell); [Diagram: anotherCell connected to cell]

  15. Trainable Cell • cell.setSource(…); cell.learn(); [Diagram: EMPTY cell → TRAINED cell]
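The EMPTY to TRAINED life cycle can be sketched with a toy cell that learns the mean of its input. `MeanCell`, `isTrained` and `center` are hypothetical names, and an `Iterable<Double>` stands in for the source cell; only the `setSource` and `learn` names come from the slide.

```java
import java.util.List;

// Hypothetical sketch of the EMPTY -> TRAINED life cycle; not Debellor's actual API.
final class TrainableSketch {

    static final class MeanCell {
        private Iterable<Double> source;   // stand-in for the connected source cell
        private double mean;
        private boolean trained;

        void setSource(Iterable<Double> source) { this.source = source; }

        // Pulls every sample from the source once and stores the running mean.
        void learn() {
            if (source == null) throw new IllegalStateException("no source set");
            long n = 0;
            double m = 0.0;
            for (double x : source) {
                n++;
                m += (x - m) / n;
            }
            if (n == 0) throw new IllegalStateException("source is empty");
            mean = m;
            trained = true;
        }

        boolean isTrained() { return trained; }

        // Uses the learned state; only valid after learn().
        double center(double x) {
            if (!trained) throw new IllegalStateException("cell is EMPTY; call learn()");
            return x - mean;
        }
    }

    public static void main(String[] args) {
        MeanCell cell = new MeanCell();
        cell.setSource(List.of(1.0, 2.0, 3.0, 6.0));
        cell.learn();
        System.out.println(cell.center(5.0));
    }
}
```

The learned mean of the four samples is 3.0, so centering 5.0 yields 2.0.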

  16. Data Streaming • BATCH vs. STREAM [Diagram: cells A and B in each mode] • It is the receiving cell that is responsible for asking for data
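The pull-based idea, where the receiving cell asks for data, can be sketched with two chained iterators. `Generator` and `Doubler` are hypothetical stand-ins for cells A and B, built on `java.util.Iterator` rather than on Debellor's classes.

```java
import java.util.Iterator;

// Hypothetical sketch of pull-based streaming; names are illustrative, not Debellor's API.
final class PullSketch {

    // Upstream cell A: generates 0, 1, 2, ... on demand and counts what it produced.
    static final class Generator implements Iterator<Integer> {
        int produced = 0;
        public boolean hasNext() { return true; }
        public Integer next() { return produced++; }
    }

    // Downstream cell B: asks its source for one sample whenever it is asked for one.
    static final class Doubler implements Iterator<Integer> {
        private final Iterator<Integer> source;
        Doubler(Iterator<Integer> source) { this.source = source; }
        public boolean hasNext() { return source.hasNext(); }
        public Integer next() { return 2 * source.next(); }
    }

    public static void main(String[] args) {
        Generator a = new Generator();
        Doubler b = new Doubler(a);
        int sum = 0;
        for (int i = 0; i < 3; i++) {
            sum += b.next();   // each call pulls exactly one sample through A
        }
        System.out.println(sum + " from " + a.produced + " samples");
    }
}
```

Although the generator is unbounded, only three samples are ever created, because nothing is produced until the downstream cell requests it.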

  17. Benefits of streaming • Training of k-means [Diagram: two variants marked X, labelled "crash!"]
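One way a clustering algorithm can train on a stream without holding the data set in memory is sequential (MacQueen-style) k-means, sketched below. This is a stand-in technique chosen for illustration; the slides do not describe which stream-based variant Debellor implements, and `StreamKMeansSketch`, `fit` and `nearest` are hypothetical names.

```java
import java.util.Iterator;
import java.util.List;

// Sketch of sequential (MacQueen-style) k-means over a sample stream.
// Assumption: this is a stand-in technique, not Debellor's own implementation.
final class StreamKMeansSketch {

    // Consumes the stream once; keeps only k centroids and k counters in memory.
    static double[][] fit(Iterator<double[]> stream, int k) {
        double[][] centroids = new double[k][];
        long[] counts = new long[k];
        int seeded = 0;
        while (stream.hasNext()) {
            double[] x = stream.next();
            if (seeded < k) {               // the first k samples seed the centroids
                centroids[seeded] = x.clone();
                counts[seeded] = 1;
                seeded++;
                continue;
            }
            int j = nearest(centroids, x);
            counts[j]++;
            for (int d = 0; d < x.length; d++) {
                // incremental mean update of the winning centroid
                centroids[j][d] += (x[d] - centroids[j][d]) / counts[j];
            }
        }
        if (seeded < k) throw new IllegalArgumentException("fewer than k samples");
        return centroids;
    }

    // Index of the centroid with the smallest squared Euclidean distance to x.
    static int nearest(double[][] centroids, double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < centroids.length; j++) {
            double dist = 0.0;
            for (int d = 0; d < x.length; d++) {
                double diff = x[d] - centroids[j][d];
                dist += diff * diff;
            }
            if (dist < bestDist) { bestDist = dist; best = j; }
        }
        return best;
    }

    public static void main(String[] args) {
        List<double[]> data = List.of(
            new double[] {0.0}, new double[] {10.0},
            new double[] {1.0}, new double[] {11.0}, new double[] {2.0});
        double[][] c = fit(data.iterator(), 2);
        System.out.println(c[0][0] + " " + c[1][0]);
    }
}
```

Memory use depends on k and the sample dimension, not on the number of samples, which is the property that matters for large data sets. The result depends on the order of the stream and on the seeding, so it generally differs from batch k-means.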

  18. Multi-threading [Diagram: cells A and B, Thread_1]

  19. Multi-threading • A.newThread(); [Diagram: cells A and B, Thread_2 and Thread_1]
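The effect of moving cell A into its own thread can be sketched with a bounded queue between a producer and a consumer. This assumes that A runs in Thread_2 while B stays in Thread_1, which the diagram suggests but the transcript does not state. The queue, the `END` marker and the class `ThreadSketch` are choices made for this example and say nothing about how `newThread()` is implemented in Debellor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: cell A produces in its own thread while cell B consumes.
// This is one way to get the effect on the slide, not Debellor's implementation.
final class ThreadSketch {

    private static final Integer END = Integer.MIN_VALUE;  // end-of-stream marker (assumption)

    static List<Integer> run(int n) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(4);

        // Thread_2: cell A pushes samples into a small bounded buffer.
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) queue.put(i);
                queue.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        // Thread_1: cell B takes samples as they become available and squares them.
        List<Integer> received = new ArrayList<>();
        for (Integer s = queue.take(); !s.equals(END); s = queue.take()) {
            received.add(s * s);
        }
        producer.join();
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(5));
    }
}
```

The bounded buffer lets A work ahead of B by at most four samples, so the two cells overlap their work without the whole stream being held in memory.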

  20. Available in version 0.6 • Rseslib algorithms: classifiers (~20 algorithms) • Weka algorithms: ARFF reader, classifiers (~60), filters (47) • Debellor algorithms: Train&Test evaluation, k-means for large data (stream-based) • Data types: numeric and symbolic features; vectors of features, vectors of vectors of …

  21. Future releases • Multi-input & multi-output cells • Composite cells (e.g. meta-learning) • Serialization and copying • …

  22. Summary • Platform • Stream architecture • Extendible • Multi-threaded • Weka & Rseslib partially integrated

  23. Home: www.debellor.org

  24. Thank You