This guide explores the implementation of parallelized boosting algorithms to identify strong hypotheses relating attributes to labels. Users specify options through a configuration file (number of nodes/cores, memory, number of iterations) and through behavioral classes that define the hypotheses. In the pre-processing stage, data is formatted, hypotheses are defined, and a parallelizable task framework is established. The training phase uses parallel processing to minimize error through weak learners, followed by post-processing that reports the results. The approach handles large datasets efficiently while making full use of the available cores and memory.
Parallelized Boosting. Mehmet Basbug, Burcin Cakir, Ali Javadi Abhari. Date: 10 Jan 2013
Motivating Example
• Many examples, many attributes
• Can we find a good (strong) hypothesis relating the attributes to the final labels?
Table 1. Example Data Format (rows: Examples; columns: Attributes and Labels)
User Interface
• User specifies the desired options in two ways:
• Configuration File: information about the number of nodes/cores, memory, number of iterations. To be parsed by the preprocessor.
• Behavioral Classes: defining the hypotheses' "behaviors"
---------------------------------------------------------------------------
<configurations.config>
---------------------------------------------------------------------------
[Configuration 1]
working_directory = '/scratch/pboost/example'
data_files = 'diabetes_train.dat'
test_files = 'diabetes_test.dat'
fn_behavior = 'behaviors_diabetes.py'
boosting_algorithm = 'confidence_rated'
max_memory = 2
xval_no = 10
round_no = 1000
---------------------------------------------------------------------------
<behaviors_diabetes.py>
---------------------------------------------------------------------------
from parallel_boosting.utility.behavior import Behavioral

class BGL_Day_Av(Behavioral):
    def behavior(self, bgl_m, bgl_n, bgl_e):
        # Average of the morning, noon and evening blood glucose readings
        return (self.data[:, bgl_m] + self.data[:, bgl_n] + self.data[:, bgl_e]) / 3

    def fn_generator(self):
        # Columns come in (morning, noon, evening) triples; register one
        # averaging hypothesis per day
        for k in range(1, (self.data.shape[1] - 4) // 3 + 1):
            bgl_m = 3 * k
            bgl_n = 3 * k + 1
            bgl_e = 3 * k + 2
            self.insert(bgl_m, bgl_n, bgl_e)
---------------------------------------------------------------------------
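The configuration file above uses standard INI syntax, so the preprocessor can read it with Python's built-in configparser. A minimal sketch under that assumption (the read_config helper and the Config tuple are illustrative, not part of pboost):
---------------------------------------------------------------------------
from configparser import ConfigParser
from collections import namedtuple

# Illustrative container for the fields shown in configurations.config
Config = namedtuple("Config", ["working_directory", "data_files", "test_files",
                               "fn_behavior", "boosting_algorithm",
                               "max_memory", "xval_no", "round_no"])

def read_config(path, section="Configuration 1"):
    parser = ConfigParser()
    parser.read(path)
    sec = parser[section]
    unquote = lambda s: s.strip("'")          # values are quoted in the example file
    return Config(
        working_directory=unquote(sec["working_directory"]),
        data_files=unquote(sec["data_files"]),
        test_files=unquote(sec["test_files"]),
        fn_behavior=unquote(sec["fn_behavior"]),
        boosting_algorithm=unquote(sec["boosting_algorithm"]),
        max_memory=int(sec["max_memory"]),    # GB of memory, per the example
        xval_no=int(sec["xval_no"]),          # number of cross-validation folds
        round_no=int(sec["round_no"]),        # number of boosting rounds
    )
---------------------------------------------------------------------------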
Pre-Processing
• User-defined Python classes: to obtain different function behaviors and a set of hypotheses.
• Configuration file: to get the path of the required data and definitions.
• Function Definitions Table: to store the hypotheses and make them available to different cores.
• Hypothesis Result Matrix
• Sorting Index Matrix: to save the sorting indices of each example.
Table 2. Function Definitions Table
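To make the Hypothesis Result Matrix and the Sorting Index Matrix concrete, here is a minimal sketch of how they could be built with NumPy (the function and array names are assumptions; this is not the pboost code itself):
---------------------------------------------------------------------------
import numpy as np

def build_matrices(data, hypotheses):
    """Evaluate every hypothesis on every example and cache the sort order.

    data       : (n_examples, n_attributes) array
    hypotheses : list of callables, each mapping the data to one value per example
    """
    n_examples = data.shape[0]
    n_hyp = len(hypotheses)

    # Hypothesis Result Matrix: one row per hypothesis, one column per example
    results = np.empty((n_hyp, n_examples))
    for j, h in enumerate(hypotheses):
        results[j, :] = h(data)

    # Sorting Index Matrix: 16-bit indices that sort each hypothesis row,
    # so thresholds can later be scanned in a single pass over the data
    sort_idx = np.argsort(results, axis=1).astype(np.uint16)
    return results, sort_idx
---------------------------------------------------------------------------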
Pre-Processing (cont'd)
Applying each function to each example is a parallelizable task. Therefore, another important step in the pre-processing stage is to read the machine information from the configuration file.
Table 3. Function Output Table
Table 4. Sorting Index Table
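Because every (hypothesis, example) evaluation is independent, the evaluation loop can be split across the cores named in the configuration. A minimal sketch with the standard multiprocessing module (splitting the work by hypothesis chunks is an assumption about how the task is divided):
---------------------------------------------------------------------------
import numpy as np
from multiprocessing import Pool

def _apply_chunk(args):
    # Evaluate one chunk of hypotheses on the full data set
    data, chunk = args
    return np.vstack([h(data) for h in chunk])

def parallel_apply(data, hypotheses, n_cores):
    # Hypotheses must be picklable (module-level callables) for multiprocessing.
    # Split the hypothesis list into one chunk per core and evaluate in parallel.
    chunks = np.array_split(np.array(hypotheses, dtype=object), n_cores)
    with Pool(n_cores) as pool:
        parts = pool.map(_apply_chunk, [(data, list(c)) for c in chunks])
    return np.vstack(parts)
---------------------------------------------------------------------------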
Training the boosting algorithm
The sorting index matrix is partitioned across the slaves, and each slave keeps its own error matrices.
Weak Learner (Slave):
• Calculate the error of every combination (hypothesis, labeling, threshold) for the hypotheses in its partition, under the given distribution over the examples (Dt)
• Return the hypothesis with the least error
Boosting (Master):
• Start with a distribution over the examples (Dt)
• For each round t = 1...T: send Dt to each slave; receive the best hypothesis from each slave (h1t, h2t, ...); find the one with the least error (ht); update Dt using ht; calculate the coefficient at
• Return the linear combination of the ht's
Features:
- Super fast: memory based, single pass through the data, stores indexes rather than results (16 bit vs. 64 bit), uses LAPACK & numexpr, embarrassingly parallel
- Several boosting algorithms
- Flexible xval structure
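For readers unfamiliar with the master loop above, here is a minimal sketch of a plain AdaBoost-style version (the confidence-rated algorithm named in the configuration computes the coefficient and update differently; the weak_learner callable standing in for the slave step is an assumption):
---------------------------------------------------------------------------
import numpy as np

def boost(X, y, weak_learner, T):
    """AdaBoost-style master loop.

    X            : (n_examples, n_attributes) data
    y            : labels in {-1, +1}
    weak_learner : callable(X, y, D) -> h, where h(X) returns {-1, +1} predictions;
                   stands in for the "best hypothesis from each slave" step
    T            : number of boosting rounds
    """
    n = X.shape[0]
    D = np.full(n, 1.0 / n)          # D_t: distribution over examples
    ensemble = []                    # list of (coefficient a_t, hypothesis h_t)

    for t in range(T):
        h = weak_learner(X, y, D)    # in pboost: the min-error hypothesis over all slaves
        pred = h(X)
        err = np.sum(D * (pred != y))
        if err >= 0.5:               # weak learner no better than chance; stop
            break
        a = 0.5 * np.log((1.0 - err) / max(err, 1e-12))   # coefficient a_t
        D = D * np.exp(-a * y * pred)                     # reweight the examples
        D = D / D.sum()                                   # renormalize to get D_{t+1}
        ensemble.append((a, h))

    # Final classifier: sign of the linear combination of the h_t's
    def H(Xnew):
        return np.sign(sum(a * h(Xnew) for a, h in ensemble))
    return H, ensemble
---------------------------------------------------------------------------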
Post-Processing
• Combines and reports the collected results
• The result after each round of iteration is stored by the master: the set of hypotheses (ht), their respective coefficients (at), and the error
• Plot training and testing error vs. number of rounds
• Plot ROC curves for training and testing
• Confusion matrix showing false/true positives/negatives
• Create a standalone final classifier
• Report running time, amount of memory used, number of cores, ...
• Clean up extra intermediary data stored on disk
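As an illustration of how the stored (at, ht) pairs could be turned into the reported error curves and a confusion matrix, here is a minimal sketch (the in-memory ensemble format and the matplotlib calls are assumptions, not the pboost reporting code):
---------------------------------------------------------------------------
import numpy as np
import matplotlib.pyplot as plt

def error_per_round(ensemble, X, y):
    """Cumulative error after each boosting round; ensemble is a list of (a_t, h_t)."""
    score = np.zeros(X.shape[0])
    errors = []
    for a, h in ensemble:
        score += a * h(X)
        errors.append(np.mean(np.sign(score) != y))
    return errors

def report(ensemble, X_train, y_train, X_test, y_test):
    # Training/testing error vs. number of rounds
    plt.plot(error_per_round(ensemble, X_train, y_train), label="train")
    plt.plot(error_per_round(ensemble, X_test, y_test), label="test")
    plt.xlabel("round"); plt.ylabel("error"); plt.legend()
    plt.savefig("error_vs_rounds.png")

    # Confusion matrix on the test set (labels assumed to be in {-1, +1})
    pred = np.sign(sum(a * h(X_test) for a, h in ensemble))
    tp = np.sum((pred == 1) & (y_test == 1))
    fp = np.sum((pred == 1) & (y_test == -1))
    tn = np.sum((pred == -1) & (y_test == -1))
    fn = np.sum((pred == -1) & (y_test == 1))
    print("true pos:", tp, "false pos:", fp, "true neg:", tn, "false neg:", fn)
---------------------------------------------------------------------------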