
Performance Improvement for Bayesian Classification on Spatial Data with P-Trees


Presentation Transcript


  1. Performance Improvement for Bayesian Classification on Spatial Data with P-Trees
  Amal S. Perera, Masum H. Serazi, William Perrizo
  Dept. of Computer Science, North Dakota State University, Fargo, ND 58105
  These notes contain NDSU confidential and proprietary material. Patents are pending on the P-tree technology.

  2. Outline
  • Introduction
  • P-Tree
  • P-Tree Algebra
  • Bayesian Classifier
  • Calculating Probabilities using P-Trees
  • Band-based vs. Bit-based Approach
  • Sample Data
  • Classification Accuracy
  • Classification Time
  • Conclusion

  3. Introduction
  • Classification is a form of data analysis and data mining that can be used to extract models describing important data classes or to predict future data trends.
  • Some data classification techniques are:
    • Decision tree induction
    • Bayesian classification
    • Neural networks
    • K-nearest neighbor
    • Case-based reasoning
    • Genetic algorithms
    • Rough sets
    • Fuzzy logic techniques
  • A Bayesian classifier is a statistical classifier that uses Bayes' theorem to predict class membership as a conditional probability that a given data sample falls into a particular class.

  4. Introduction Cont.
  • The P-Tree data structure allows us to compute the Bayesian probability values efficiently, without resorting to the naïve Bayesian assumption.
  • Bayesian classification with P-Trees has been used successfully on remotely sensed images in precision agriculture to predict yield, and in genomics (yeast two-hybrid classification) to place in the ACM KDD-Cup 2002 competition. http://www.biostata.wisc.edu/~craven/kddcup/winners.html
  • To completely eliminate the naïve assumption, a bit-based Bayesian classification is used instead of a band-based approach.

  5. P-Tree
  • Most spatial data comes in a band format called BSQ (Band Sequential).
  • Each BSQ band is divided into several files, one for each bit position of the data values. This format is called 'bit Sequential' or bSQ.
  • Each bSQ bit file, Bij (the file constructed from the jth bits of the ith band), is converted into a tree structure called a Peano Tree (P-Tree).
  • P-Trees represent tabular data in a lossless, compressed, bit-by-bit, recursive, data-mining-ready arrangement.
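
As an illustration of the bSQ decomposition described above, here is a minimal Python sketch (ours, not the authors' implementation); the names to_bsq, band, and nbits are illustrative.

```python
# Minimal sketch of bSQ decomposition: split a 2D band of nbits-wide
# values into nbits bit planes, where planes[j] holds the (j+1)-th most
# significant bit of every value in the band.
def to_bsq(band, nbits=8):
    return [
        [[(value >> (nbits - 1 - j)) & 1 for value in row] for row in band]
        for j in range(nbits)
    ]

band = [[55, 200], [128, 7]]   # toy 2x2 band of 8-bit values
planes = to_bsq(band)
print(planes[0])               # most significant bit plane: [[0, 1], [1, 0]]
```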

  6. A bSQ file, its raster spatial file, and P-Tree
  [Figure: an 8x8 bSQ bit file and its P-Tree. The bit file is

      1 1 1 1 1 1 0 0
      1 1 1 1 1 0 0 0
      1 1 1 1 1 1 0 0
      1 1 1 1 1 1 1 0
      1 1 1 1 1 1 1 1
      1 1 1 1 1 1 1 1
      1 1 1 1 1 1 1 1
      0 1 1 1 1 1 1 1

  with root count 55. The four quadrant counts in Peano order are 16, 8, 15, and 16; the two mixed quadrants split further into 3, 0, 4, 1 and 4, 4, 3, 4; and the mixed sub-quadrants end in the leaf bit patterns 1110, 0010, and 1101.]
  • Peano or Z-ordering
  • Pure (Pure-1/Pure-0) quadrant
  • Root Count
  • Level
  • Fan-out
  • QID (Quadrant ID)
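
The quadrant decomposition in the figure can be written in a few lines. The following is a hedged sketch assuming a simple (count, children) tuple per node; it is our reconstruction, not the patented implementation.

```python
# Sketch of P-Tree construction by recursive Peano-order quadrant
# splitting (upper-left, upper-right, lower-left, lower-right).
# A node is a (root_count, children) tuple; pure quadrants are leaves.
def build_ptree(bits, r0=0, c0=0, size=None):
    if size is None:
        size = len(bits)
    count = sum(bits[r][c] for r in range(r0, r0 + size)
                           for c in range(c0, c0 + size))
    if count in (0, size * size):          # Pure-0 or Pure-1 quadrant
        return (count, None)
    half = size // 2
    children = [build_ptree(bits, r0 + dr, c0 + dc, half)
                for dr in (0, half) for dc in (0, half)]
    return (count, children)

# On the 8x8 bit file above, build_ptree(bits)[0] is the root count 55.
```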

  7. P-Tree Algebra
  P-Tree (root count 55):

                   55
         _________/ | \_________
        /        /     \        \
      16        8       15       16
              / | \ \  / | \ \
             3 0 4 1  4 4 3 4
           //|\   //|\     //|\
           1110   0010     1101

  Complement (root count 9 = 64 - 55):

                    9
         _________/ | \_________
        /        /     \        \
       0        8        1       0
              / | \ \  / | \ \
             1 4 0 3  0 0 1 0
           //|\   //|\    //|\
           0001   1101    0010

  • Logical operators: AND, OR, Complement, and others (XOR, etc.)
  • Applying these operators, we can calculate value P-Trees, interval P-Trees, and slice P-Trees.
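
A sketch of the AND and Complement operators on the (count, children) nodes from the previous sketch; `size` is the number of cells a node covers. This is our illustration of the algebra, not the authors' code.

```python
# P-Tree AND: a pure-1 operand returns the other tree, a pure-0 operand
# returns pure-0, and two mixed nodes (which always carry 4 children,
# each covering size // 4 cells) recurse quadrant by quadrant.
def ptree_and(a, b, size):
    (ca, kids_a), (cb, kids_b) = a, b
    if ca == size: return b
    if cb == size: return a
    if ca == 0 or cb == 0: return (0, None)
    kids = [ptree_and(x, y, size // 4) for x, y in zip(kids_a, kids_b)]
    return (sum(k[0] for k in kids), kids)

# P-Tree Complement: flip every count c to size - c, recursing on children.
def ptree_not(a, size):
    count, kids = a
    if kids is None:
        return (size - count, None)
    return (size - count, [ptree_not(k, size // 4) for k in kids])

# e.g., ptree_not applied to the tree with root count 55 over 64 cells
# yields the complement tree with root count 9, as on this slide.
```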

  8. P-Tree Algebra Cont.
  • Basic P-Trees can be combined using logical operations to produce P-Trees for the original values at any level of bit precision. Using 8-bit precision for values, Pb,11010011, which counts the number of occurrences of 11010011 in each quadrant, can be constructed from the basic P-Trees as:
    Pb,11010011 = Pb1 AND Pb2 AND Pb3' AND Pb4 AND Pb5' AND Pb6' AND Pb7 AND Pb8
    • ' indicates the COMPLEMENT operation
    • AND is simply the pixel-wise AND of the bits
  • Similarly, any data set in the relational format can be represented as P-Trees. For any combination of values (v1, v2, …, vn), where vi is from band i, the quadrant-wise count of occurrences of this combination of values is given by:
    P(v1, v2, …, vn) = P1(v1) ^ P2(v2) ^ … ^ Pn(vn)
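
Putting the operators together, the value P-Tree formula on this slide can be sketched as below, assuming `basic[j]` holds the basic P-Tree of bit plane j (our naming, not the authors').

```python
from functools import reduce

# Sketch: compose the value P-Tree for `value` by ANDing, for each bit
# position, the basic P-Tree (bit = 1) or its complement (bit = 0).
def value_ptree(basic, value, nbits, size):
    factors = [basic[j] if (value >> (nbits - 1 - j)) & 1
               else ptree_not(basic[j], size)
               for j in range(nbits)]
    return reduce(lambda a, b: ptree_and(a, b, size), factors)

# value_ptree(basic, 0b11010011, 8, 64) realizes
# Pb1 ^ Pb2 ^ Pb3' ^ Pb4 ^ Pb5' ^ Pb6' ^ Pb7 ^ Pb8.
```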

  9. Bayesian Classifier
  Based on Bayes' theorem:
    Pr(Ci | X) = Pr(X | Ci) × Pr(Ci) / Pr(X)
  • Pr(Ci | X) is the posterior probability.
  • Pr(Ci) is the prior probability.
  • The conditional probabilities Pr(X | Ci) can be found from the training data.
  • Classify X with the class Ci that maximizes Pr(Ci | X).
  • Since Pr(X) is constant for all classes, we can instead maximize Pr(X | Ci) × Pr(Ci).
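
The decision rule reduces to an argmax; a one-line sketch, with `likelihood` and `prior` as placeholders for the P-Tree computations on the next slides:

```python
# Pick the class maximizing Pr(X|Ci) * Pr(Ci); the common factor Pr(X)
# can be dropped from the comparison.
def classify(x, classes, likelihood, prior):
    return max(classes, key=lambda ci: likelihood(x, ci) * prior(ci))
```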

  10. Calculating Probabilities Pr(X | Ci)
  Using the naïve assumption:
    Pr(X | Ci) = Pr(X1 | Ci) × Pr(X2 | Ci) × … × Pr(Xn | Ci)
  Scan the data and calculate Pr(X | Ci) for the given X.
  Using P-Trees:
    Pr(X | Ci) = (# training samples in Ci having pattern X) / (# samples in class Ci)
               = RC[ P1(X1) ^ P2(X2) ^ … ^ Pn(Xn) ^ PC(Ci) ] / RC[ PC(Ci) ]
  Problem: what if RC[ P1(X1) ^ P2(X2) ^ … ^ Pn(Xn) ^ PC(Ci) ] = 0 for all i, i.e., the unclassified pattern does not exist in the training set?
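
The root-count formula can be sketched directly. We assume `attr_ptrees[k][v]` gives the value P-Tree for value v of attribute k and `class_ptrees[ci]` the class P-Tree (illustrative names); the root count of a node is just its first tuple element.

```python
from functools import reduce

# Sketch of Pr(X|Ci) via root counts, per the slide's formula:
# RC[P1(X1) ^ ... ^ Pn(Xn) ^ PC(Ci)] / RC[PC(Ci)].
def pr_x_given_c(x, ci, attr_ptrees, class_ptrees, size):
    trees = [attr_ptrees[k][xk] for k, xk in enumerate(x)]
    trees.append(class_ptrees[ci])
    rc = reduce(lambda a, b: ptree_and(a, b, size), trees)[0]
    return rc / class_ptrees[ci][0]
```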

  11. Band-based P-Tree Approach
  • When RC = 0 for the given pattern in every class:
    • Reduce the restrictiveness of the pattern by removing the attribute with the least information gain.
    • Calculate (assuming attribute 2 has the least IG):
      Pr(X | Ci) = RC[ P1(X1) ^ P3(X3) ^ … ^ Pn(Xn) ^ PC(Ci) ] / RC[ PC(Ci) ]
  • The information gain is calculated using P-Trees, as a one-time calculation over the entire training data.
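
A sketch of this fallback loop, assuming `ig` holds the one-time information-gain values and `rc_for(x, attrs, ci)` returns the root count using only the listed attributes (both names are ours):

```python
# Band-based fallback: while the pattern matches nothing in any class,
# drop the remaining attribute with the least information gain and retry.
def band_fallback(x, classes, ig, rc_for):
    attrs = sorted(range(len(x)), key=lambda k: ig[k], reverse=True)
    while attrs:
        counts = {ci: rc_for(x, attrs, ci) for ci in classes}
        if any(counts.values()):
            return counts
        attrs.pop()      # least-informative attribute is last in the list
    return {}
```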

  12. Bit-based Approach
  [Figure: four R x G grids, (a)-(d), with 2-bit axes 00-11, showing the search space growing as bits are masked.]
  • Search for similar patterns by removing the least significant bits in the attribute space.
  • The order of the bits to be removed is selected by calculating the information gain (IG).
  E.g., calculate the Bayesian conditional probability value for the pattern [G, R] = [10, 01] in a 2-attribute space. Assume the IG for the 1st significant bit of R is less than that of G, and the IG for the 2nd significant bit of G is less than that of R.
  • Initially, search for the pattern [10, 01] (a).
  • If not found, search for [1_, 01], considering the IG for the 2nd significant bit. The search space increases (b).
  • If not found, search for [1_, 0_], considering the IG for the 2nd significant bit. The search space increases (c).
  • If not found, search for [1_, _ _], considering the IG for the 1st significant bit. The search space increases (d).
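
A sketch of the backoff order, assuming `bit_order` lists (attribute, bit-position) pairs from least to most informative and `rc_for(pattern, masked, ci)` counts matches treating masked bits as don't-cares (again, our names):

```python
# Bit-based backoff: try the full pattern first, then mask one bit at a
# time in ascending information-gain order until some class matches.
def bit_backoff(pattern, bit_order, classes, rc_for):
    masked = set()                     # bits treated as "don't care"
    for step in [None] + list(bit_order):
        if step is not None:
            masked.add(step)
        counts = {ci: rc_for(pattern, masked, ci) for ci in classes}
        if any(counts.values()):
            return counts
    return counts
```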

  13. Experiments
  • The experimental data was extracted from two sets of aerial photographs of the Best Management Plot (BMP) of the Oakes Irrigation Test Area (OITA) near Oakes, North Dakota.
  • The images were taken in 1997 and 1998.
  • Each image contains 3 bands: red, green, and blue reflectance values.
  • Three other files contain synchronized soil moisture, nitrate, and yield values.

  14. Classification Accuracy
  [Figure: classification accuracy for the '97 data (0-90%) vs. training data size (1K, 4K, 16K, 65K, 260K pixels) for the band-P-tree, KNN-Euclidean, and bit-based approaches.]
  • The accuracy of the proposed bit-based approach is compared with the band-based approach and with KNN using Euclidean distance.
  • It is clear that our approach outperforms the others.

  15. Classification Accuracy Cont.
  • The accuracy of the approach was also compared to an existing Bayesian belief network classifier: J. Cheng's Bayesian Belief Network, available at http://www.cs.ualberta.ca/~jcheng/ .
  • This classifier was the winning entry for the KDD Cup 2001 data mining competition. The developer claims that the classifier can perform with or without domain knowledge.
  • For the comparison, smaller training data sets ranging from 4K to 16K pixels were used, due to the inability of the implementation to handle larger data sets. The belief network was built without using any domain knowledge, to make it comparable with the P-Tree approach.

  16. Classification Time
  • The P-Tree approach requires no build time (it is a lazy classifier).
  • In most lazy classifiers, the classification time per tuple varies with the number of items in the training set, due to the requirement of having to scan the training data.
  • The P-Tree approach does not require a traditional data scan.
  • The data in the figure was collected using 5 significant bits and a threshold probability of 0.85.
  • The time is given for scalability comparisons.
  [Figure: variation of classification time with training sample size (pixels) for the bit-P-tree algorithm.]

  17. Conclusion
  • The naïve assumption reduces the accuracy of the classification in this particular application domain.
  • Our approach increases the accuracy of a P-Tree Bayesian classifier by completely eliminating the naïve assumption.
  • The new approach has better accuracy than the existing P-Tree-based Bayesian classifier.
  • It was also shown to be better than a Bayesian belief network implementation and a Euclidean-distance-based KNN approach.
  • It has the same computational cost with respect to the use of P-Tree operations as the previous P-Tree approach, and it is scalable with respect to the size of the data set.
