Debellor is a data mining platform built on a stream architecture for efficient processing of large data sets. Written in Java and released as open source under the GPL, it supports multi-threading and extendible data types, and integrates part of the Rseslib and Weka algorithm collections. Version 0.6 includes about 20 Rseslib classifiers, about 60 Weka classifiers, 47 Weka filters, an ARFF reader, Train&Test evaluation, and a stream-based k-means for large data. Planned for future releases: multi-input and multi-output cells, composite cells (e.g. meta-learning), and serialization and copying. Download: www.debellor.org.
Debellor: Data Mining Platform with Stream Architecture • Marcin Wojnarski • Warsaw University, Poland
Outline • Debellor – data mining platform • Motivation • Main features • Architecture: • Cell • data streaming • multi-threading • Available in ver. 0.6 • Future releases • Summary
Debellor • Language: Java • Licence: open source (GPL) • Download: www.debellor.org • Debello – to conquer (Latin). Debellor – conqueror of data
Debellor – data mining platform • Diagram labels: Debellor, Rseslib, LibSVM, Weka, TA-Lib, own…
Motivation Demand for more complex algorithms. Necessity to combine elementary algorithms.
Motivation • Data Processing Network (DPN) • Diagram nodes: Load (×2), Preprocess (×2), Predict, Visualize, Save
Motivation • Committee of algorithms • Diagram: Classifier A, Classifier B, Classifier C → Voting
Motivation • Nested algorithms • Diagram: RBF neural network, k-means
Requirements • Versatile • Efficient • Simple
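The committee slide can be illustrated with a minimal majority-voting sketch. Everything here is an assumption made for illustration: the class `VotingCommittee`, the helper `ofThresholds`, and the use of `Function<double[], String>` as a stand-in for a classifier are hypothetical and are not part of Debellor's API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical committee for illustration; not a Debellor class.
class VotingCommittee {
    private final List<Function<double[], String>> members;

    VotingCommittee(List<Function<double[], String>> members) {
        this.members = members;
    }

    // Builds a committee of toy classifiers, each thresholding feature 0.
    static VotingCommittee ofThresholds(double... cuts) {
        List<Function<double[], String>> ms = new ArrayList<>();
        for (double cut : cuts) {
            ms.add(x -> x[0] > cut ? "pos" : "neg");
        }
        return new VotingCommittee(ms);
    }

    // Returns the label with the most votes; on a tie, the label that
    // reached the winning count first is kept.
    String classify(double[] x) {
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        for (Function<double[], String> member : members) {
            String label = member.apply(x);
            int n = votes.merge(label, 1, Integer::sum);
            if (best == null || n > votes.get(best)) {
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        VotingCommittee committee = ofThresholds(0.5, 1.0, 2.0);
        System.out.println(committee.classify(new double[] {1.5})); // prints "pos" (2 votes to 1)
        System.out.println(committee.classify(new double[] {0.7})); // prints "neg" (2 votes to 1)
    }
}
```

The voting step only needs each member's prediction, which is why a committee is a natural case for composing elementary algorithms.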
Features of Debellor • All types of data processing algorithms • Extendible data types • Stream architecture → large data sets • Multi-threading • Immutability of data objects → safety
Algorithm = Cell • Cell cell = new RseslibClassifier("C45"); cell.set("pruning", "true");
Cell – data source • cell.open(); Sample s1 = cell.next(), s2 = cell.next(), ... cell.close();
Cell – data receiver • cell.setSource(anotherCell); • Diagram: anotherCell → cell
Trainable Cell • cell.setSource(…); cell.learn(); • Diagram: EMPTY cell → TRAINED cell
Data Streaming • BATCH vs. STREAM (diagram: cells A and B in each mode) • The cell itself is responsible for requesting the data it needs
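The pull-based idea, where the receiving cell asks its source for one sample at a time, can be sketched as follows. This is a self-contained illustration under assumptions: the interface `SampleSource` and the classes `ListSource`, `Scale`, and `StreamDemo` are hypothetical. They borrow the `open()` / `next()` / `close()` method names from the slides but do not reproduce Debellor's actual `Cell` or `Sample` classes, and the end-of-stream convention (returning `null`) is an assumption.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical interface for illustration; not Debellor's actual Cell class.
interface SampleSource {
    void open();
    Double next();   // assumed convention: null signals end of stream
    void close();
}

// A source cell that serves samples one by one from a list.
class ListSource implements SampleSource {
    private final List<Double> data;
    private Iterator<Double> it;

    ListSource(List<Double> data) { this.data = data; }

    public void open() { it = data.iterator(); }
    public Double next() { return it.hasNext() ? it.next() : null; }
    public void close() { it = null; }
}

// A receiver cell: pulls from its source on demand and transforms each sample.
class Scale implements SampleSource {
    private final SampleSource source;
    private final double factor;

    Scale(SampleSource source, double factor) {
        this.source = source;
        this.factor = factor;
    }

    public void open() { source.open(); }

    public Double next() {
        Double s = source.next();          // the receiver asks for data
        return s == null ? null : s * factor;
    }

    public void close() { source.close(); }
}

class StreamDemo {
    // Drains a cell and sums its output, holding one sample at a time.
    static double sum(SampleSource cell) {
        double total = 0;
        cell.open();
        try {
            for (Double s = cell.next(); s != null; s = cell.next()) {
                total += s;
            }
        } finally {
            cell.close();
        }
        return total;
    }

    public static void main(String[] args) {
        SampleSource pipeline = new Scale(new ListSource(Arrays.asList(1.0, 2.0, 3.0)), 10.0);
        System.out.println(sum(pipeline)); // prints 60.0
    }
}
```

In this sketch no cell ever materializes the whole data set: each call to `next()` propagates upstream and returns a single sample.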
Benefits of streaming • Training of k-means • Diagram labels: X, X, crash!
Multi-threading • Diagram: cells A and B, Thread_1
Multi-threading • A.newThread(); • Diagram: cells A and B, Thread_2 and Thread_1
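To make the memory argument concrete, here is a minimal sketch of k-means trained from a stream. The slides do not say which algorithm Debellor's stream-based k-means uses, so this swaps in a standard technique, the sequential (MacQueen-style) update, on one-dimensional data. The class `OnlineKMeans` and its methods are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.Iterator;

// Sequential k-means on 1-D data; an illustrative stand-in, not Debellor's implementation.
class OnlineKMeans {
    final double[] centers;
    final long[] counts;

    OnlineKMeans(double[] initialCenters) {
        centers = initialCenters.clone();
        counts = new long[centers.length];
    }

    // Index of the center closest to x.
    int nearest(double x) {
        int best = 0;
        for (int i = 1; i < centers.length; i++) {
            if (Math.abs(x - centers[i]) < Math.abs(x - centers[best])) {
                best = i;
            }
        }
        return best;
    }

    // Consumes the stream once. Each sample moves its nearest center toward
    // itself by 1/count, so the center is the running mean of the samples
    // assigned to it. Memory use is O(k), independent of the stream length.
    void learn(Iterator<Double> stream) {
        while (stream.hasNext()) {
            double x = stream.next();
            int c = nearest(x);
            counts[c]++;
            centers[c] += (x - centers[c]) / counts[c];
        }
    }

    public static void main(String[] args) {
        OnlineKMeans km = new OnlineKMeans(new double[] {0.0, 10.0});
        km.learn(Arrays.asList(1.0, 9.0, 3.0, 11.0).iterator());
        System.out.println(Arrays.toString(km.centers)); // prints [2.0, 10.0]
    }
}
```

A batch implementation would first collect every sample in memory, which is the step that can fail on large data; the streaming variant keeps only the centers and their counts.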
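One way a cell could run in its own thread is to place a bounded buffer between it and its consumer. The sketch below shows that pattern with the standard `java.util.concurrent.BlockingQueue`. It is only an assumption about how such a feature might work: `ThreadedSource`, its `start()` method, and `ThreadDemo` are hypothetical and do not describe what Debellor's `newThread()` actually does.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical wrapper for illustration; not Debellor's newThread() implementation.
class ThreadedSource {
    private static final Object END = new Object();   // end-of-stream marker
    private final BlockingQueue<Object> buffer = new ArrayBlockingQueue<>(16);
    private final Iterator<Integer> upstream;
    private boolean done = false;

    ThreadedSource(Iterator<Integer> upstream) {
        this.upstream = upstream;
    }

    // Starts a producer thread that drains the upstream into the buffer.
    // put() blocks when the buffer is full, so the producer cannot run
    // arbitrarily far ahead of the consumer.
    void start() {
        Thread producer = new Thread(() -> {
            try {
                while (upstream.hasNext()) {
                    buffer.put(upstream.next());
                }
                buffer.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.setDaemon(true);
        producer.start();
    }

    // Called from the consumer's thread; blocks until a sample is ready.
    // Returns null at end of stream.
    Integer next() {
        if (done) {
            return null;
        }
        try {
            Object item = buffer.take();
            if (item == END) {
                done = true;
                return null;
            }
            return (Integer) item;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for a sample", e);
        }
    }
}

class ThreadDemo {
    // Consumes all samples in the calling thread while they are produced in another.
    static int sumAll(Iterator<Integer> samples) {
        ThreadedSource a = new ThreadedSource(samples);
        a.start();
        int total = 0;
        for (Integer s = a.next(); s != null; s = a.next()) {
            total += s;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumAll(Arrays.asList(1, 2, 3, 4).iterator())); // prints 10
    }
}
```

The consumer's code is the same pull loop as in the single-threaded case; only the source behind `next()` changes, which fits the slide's picture of moving one cell to a second thread.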
Available in version 0.6 • Rseslib algorithms: • classifiers (~20 algorithms) • Weka algorithms: • ARFF reader • classifiers (~60) • filters (47) • Debellor algorithms: • Train&Test evaluation • k-means for large data (stream-based) • Data types: • numeric and symbolic features • vectors of features, vectors of vectors of …
Future releases • Multi-input & multi-output cells • Composite cells (e.g. meta-learning) • Serialization and copying • …
Summary • Platform • Stream architecture • Extendible • Multi-threaded • Weka & Rseslib partially integrated
Home www.debellor.org