
AAPL Tweetbase and Document pTrees for Sentiment Analysis

This article explores the use of term- and position-indexed pTrees for sentiment analysis over an AAPL tweetbase and document collection, and discusses positive-sentiment bitmaps and value arrays. It also considers why term position matters and the advantage of ordering classification training sets by class.





Presentation Transcript


  1. AAPL. For a tweetbase, construct Document pTrees, indexed by Term and Position. What about phrases? For 2-word phrases, use a 4D cube. Term pTrees are indexed by Doc and Pos. Sentiment analysis (by doc): PSB, the Positive Sentiment BitMap, is 1 iff the doc carries positive AAPL sentiment; PSV, the Positive Sentiment ValueArray, measures the level of positive AAPL sentiment. A PSV for each term? Might term context change the sentiment? With term-position information we should be able to evaluate PSV in context! PSB and PSV could be derived by hand (humans read tweets and assign a PSB or PSV). Do we need PS-minus-NS measures (NS = Negative Sentiment)? Survey the research literature on sentiment analysis (word/doc sentiment-assessment software?). Strategy: each day, buy the stock with the greatest positive-sentiment tweet bloom. Why might the positions of words be important? E.g., "buy" and "AAPL" occurring close together in tweet position signals a stronger positive sentiment: "buy AAPL" at the start of tweet1 is a stronger positive than the separated occurrences in tweet2, and with multilevel Pos pTrees this can be determined at level 1 as a positive-sentiment bloom. OR-ing a term's pTrees over positions gives its Existential pTree (e.g., the Existential AAPL pTree); summing gives its term frequency (tf). OR-ing a term's row of the DocumentTermPosition pTreeSet gives its Existential Term pTree over docs; summing gives the term's DocFreq (df) array. The 2wdPhraseStartPos pTree indexed by (buy, AAPL, Tweet1) marks where the phrase "buy AAPL" starts in Tweet1. Pre-compute, save, and catalog it? Or compute it as needed by shifting the AAPL TermPosition pTree right 1 bit, then ANDing with the buy TermPosition pTree. Multilevel 2wdPhraseStartPos pTrees use strides D, W, W, W−1, where D = #docs and W = #words. [The slide illustrates the DocumentTermPosition pTreeSet (bit = 1 iff Term occurs at Pos in Doc) on a 3-tweet corpus, positions 1-7, over the vocabulary a, AAPL, again, all, always, an, and, apple, April, are, buy, ...; the bitmap columns are omitted here.]
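The shift-and-AND phrase construction just described is easy to make concrete. Below is a minimal Python sketch, using plain Python ints as position bit vectors; the tweet text and the helper names are illustrative, not from the deck:

```python
# Minimal sketch of 2-word-phrase detection by shift-and-AND, using plain
# Python ints as position bit vectors (bit p set = term at position p).
# The tweet text and helper names are illustrative, not from the deck.

def position_ptree(words, term):
    """Bit p is set iff words[p] == term (position 0 = first word)."""
    bits = 0
    for p, w in enumerate(words):
        if w == term:
            bits |= 1 << p
    return bits

def phrase_start_ptree(p_first, p_second):
    """Start positions of the phrase 'first second': shift the second
    term's pTree right 1 bit, then AND with the first term's pTree."""
    return p_first & (p_second >> 1)

tweet1 = "buy AAPL and always buy apple".split()
p_buy  = position_ptree(tweet1, "buy")    # 0b010001 (positions 0 and 4)
p_aapl = position_ptree(tweet1, "AAPL")   # 0b000010 (position 1)
assert phrase_start_ptree(p_buy, p_aapl) == 0b000001  # "buy AAPL" starts at 0
```

In the real system the AND would run on compressed multilevel pTrees; here the flat bit vector stands in for the leaf level.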

  2. ClassWise Counts: always use class order in every classification TrainingSet. E.g., X = (B1,B2,B3,B4); let the class be B2. The ordering of an RSI table is usually a spatial pixel ordering (e.g., raster, Peano (Z), or Hilbert). Here we detail the advantage of ordering by class instead (even for spatial datasets). Rule: always order classification TrainingSets by class. 1. Order the TrainingSet by class (using a rough bucket sort). 2. Create all value pTrees for all feature attributes (e.g., B1, B3 and B4 here). 3. Create the CWCT (ClassWiseCountTable) of count(PC=ci & PB=bj), j = 1, 3, 4, by applying CWC (the ClassWiseCount operator) to each feature-value bitmap. Because the TrainingSet was ordered by class, no ANDing is required any more. So the only operation is to apply the ClassWiseCount operation to each feature-value pTree, and it can be applied to all feature-value pTrees concurrently (on a "pTreeSet-replicated" cluster). Thus the time to compute and compare (to a threshold) all column InfoGains is approximately the cost of one CWC, so it is paramount to fully optimize the code for the CWC operation! Substituting Pearson correlation or Chi-square for InfoGain, CWC is still all that's needed. Important point: we never compute pairwise ANDs of class-value and feature-value pTrees, because we bucket-sorted the TrainingSet by class (a fast sort) before creating our value pTrees. And, again, once the CWC operator has created the table of class one-counts for each feature column, the rest of the InfoGain (or correlation or Chi-square) calculation is scalar arithmetic. Alternatively, use variable-length-stride multilevel pTrees, where the leaf stride lengths are the class sizes; for this TrainingSet we'd use 2-level pTrees with 5 leaf strides of lengths 2, 4, 2, 4, 4 respectively. Terminology: a "Class" pTree is ordered by class; list the variable stride lengths in its name, and include in its root node (since the root is not used for anything else) the total one-count of the full pTree followed by the per-class counts, e.g., X.Class.16:2,4,2,4,4.PB4=f with root 5: 0,1,0,0,4. This should all be pre-computed and stored with the pTree, so the CWCT can be read off the pTree roots. The next slides give more justification for the pTree rule. CW Rule: always order any classification TrainingSet by class, then convert to pTrees. I.e., TrainingSets should always be class-ordered! [The slide works the rule on a 16-pixel example with bands B1-B4 (class = B2): the Peano-ordered table with its x,y and rrn columns, the class-sorted version with B1/B3/B4 value maps, the CWC invocations such as CWC(PB1=3, 2,6,8,c) with ClassStartOffsets 2, 6, 8, c, and the resulting per-bucket CWCT counts for classes C = 2, 3, 7, 10, 11; the raw columns are omitted here.]
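As a concrete illustration of why class ordering removes the ANDs: once the rows are bucket-sorted by class, each class is a contiguous slice of every feature-value bitmap, so one counting pass per bitmap yields all class counts. A minimal sketch, with illustrative names and data:

```python
# Minimal sketch of the ClassWiseCount (CWC) operator: on a class-ordered
# table, each class is a contiguous slice of every feature-value bitmap,
# so one counting pass replaces all class-bitmap ANDs. The names and the
# example bitmap are illustrative; the offsets 0,2,6,8,12 mirror the
# slide's leaf strides 2,4,2,4,4 (ClassStartOffsets 2,6,8,c).

def cwc(bitmap, class_starts):
    """Return the one-count of `bitmap` within each class slice."""
    bounds = list(class_starts) + [len(bitmap)]
    return [sum(bitmap[bounds[i]:bounds[i + 1]])
            for i in range(len(class_starts))]

# A feature-value bitmap such as P(B1=3) over the 16 class-sorted rows:
pb1_eq_3 = [0,1, 1,1,0,1, 0,0, 0,0,0,0, 0,0,0,0]
print(cwc(pb1_eq_3, [0, 2, 6, 8, 12]))   # -> [1, 3, 0, 0, 0]
```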

  3. APPENDIX: RSI-2, C = B1, class-ordered. [The slide lists the 16 pixel values of thirteen bands (Band1=Yield, Band2=Green, Band3=Red, Band4=Blue, Band5=NIR, Band6=MIR, Band7=FIR, Band8=TIR, Band9=UV2, Banda=UVa, Bandb=UVb, Bandc=UVc, Bandd=UVd); the CWCT (ClassWise Count Table: the one-count of each ClassValueBitMap ANDed with each FeatureValueBitMap) over the classes C = 3, 7, 2, 10, 15 and the extant feature values of bands 1 through d; the root one-counts of the value and bit-slice pTrees of each band; and the underlying class-ordered bitmaps. The raw columns are omitted here.] Conjecture: using IGCi(B), the information gained by including feature attribute B in the one-class classification {Ci, notCi}, picks out the best TrainingSet feature attributes, BFACi, for classifying unclassified samples into Ci; for the overall classifier, one might then use {BFACi | i = 1..c}. This example was cooked up to examine the hypothesis (not prove it!) that one should do attribute selection on each individual class Ci, using the information gained with respect to the class-label attribute {Ci, notCi} by each feature attribute B.

  4. What's all this about ORDERINGS? Peano versus RowRaster versus Hilbert ordering for multilevel pTrees on spatial datasets? Compare them on compression and on processing speed (remembering that we don't have to decompress to process). Which ordering is best (for value pTrees as well as bit-slice pTrees)? We always recommend paying the one-time capture cost of computing and storing all numeric columns as bit-slice pTrees and all categorical columns as bitmap pTrees (one bitmap for each extant value). (And/or, possibly, numerically code the categorical attributes and then treat the resulting coded columns as we do naturally numeric columns.) I recommend also computing all value bitmaps of the individual numeric values and storing those pTrees too; this is redundant but useful. In addition, if a value bitmap is sparse (either mostly 0's or mostly 1's), also store it as a list? We'd then need processing procedures that handle both pTrees and lists. Below, we work through an example and establish a notation. Note that we always put the one-count at the root of the tree. For spatial datasets, the ordering of the table rows (prior to converting the table columns to pTrees) is an important consideration. Is ordering an important consideration for non-spatial datasets too? On the next slide I try to sell the idea that for classification TrainingSets, you should always order by class before pTree-izing. [The slide compares band B1 under the Peano, RowRaster, and Hilbert orderings: the bit-slice pTrees Peano_pTree1,3, RowRaster_pTree1,3 and Hilbert_pTree1,3; the multilevel variants with fanouts 16:2, 16:4, 16:8 and 16:16 (the last being back to uncompressed); and the value pTrees P1=3, each with root one-count 5. The bitmap columns are omitted here.] pTree ORDERINGS for non-spatial datasets, e.g., the Mother Goose Rhymes (MGR) corpus. A pTree RULE: always order classification TrainingSets by class upon capture, before pTree-izing. Why? Assuming class k starts at bit offset sk in the pTrees, the pairwise AND-plus-one-count operations can all be done in one ClassWiseCount operation, CWC(P, s2, ..., sm−1), which is programmed to return the counts of ones between bit-position offsets 0 and s2, s2+1 and s3, ..., and sm−1+1 and the end of the pTree. Taking the MGR corpus, use the 13 clusters determined by r-value as the TrainingSet (combining into one cluster, cluster 13, all docs with r < .10). Order the table by a rough bucket sort into classes (the internal order of the docs within a class is immaterial), then pTree-ize. Then pTree ANDs are unnecessary! A single application of the extended count operation to each attribute-value pTree creates the whole AND table, reducing the one-time pre-processing from #classes × #features AND-plus-count operations to just #features ClassWiseCounts. If there are 100,000 words and 50 classes, this reduces the number of operations from 10,000,000 to 100,000! The trick is to engineer the CWC into a very efficient operation. Note that this ordering is a good choice for any TrainingSet (text corpus, RSI, or anything else). Rather than use this text TrainingSet, next we take a simpler one.
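For reference, the Peano (Z) ordering used above is just bit interleaving of the pixel coordinates. A minimal sketch, assuming a 2^k x 2^k grid (illustrative code, not from the deck):

```python
# Minimal sketch of Peano (Z) ordering: the row position of pixel (x, y)
# is obtained by interleaving the bits of x and y (x bit the higher of
# each pair). Illustrative code for a 2^k x 2^k grid; k=2 gives the 4x4
# examples used on these slides.

def peano_index(x, y, k=2):
    idx = 0
    for b in range(k):
        idx |= ((x >> b) & 1) << (2 * b + 1)   # x bit -> odd position
        idx |= ((y >> b) & 1) << (2 * b)       # y bit -> even position
    return idx

# Reproduces the slides' x,y row order:
# (0,0),(0,1),(1,0),(1,1),(0,2),(0,3),(1,2),(1,3),(2,0),...
order = sorted(((x, y) for x in range(4) for y in range(4)),
               key=lambda p: peano_index(p[0], p[1]))
```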

  5. Doc frequency and level-1 TermFreqPTrees (e.g., the predicate of tfP0 is mod(sum(mdl-stride), 2) = 1). This is the 3D DocumentTermPosition pTreeSet (pTree text mining, from the 2012_08_04 data-cube text-mining layout): one bit per (document, term, reading-position). The level-0 pTree has length mdl × VocabLen × DocCount, where mdl is the maximum document length. The level-1 TermExistence pTree (predicate NOTpure0 applied to each mdl-stride) has length VocabLen × DocCount. Summing each mdl-stride instead of testing it gives the term frequencies tf, carried as bit-slices tf0, tf1, tf2, ...; summing term existence over documents gives the doc-frequency counts df, carried as slices dfP0, dfP2, etc. Note that dfk isn't a level-2 pTree, since it is not a predicate on level-1 te strides; the next slides show how to lay things out differently so that even the dfk's come out as level-2 pTrees. [The slide walks a 3-tweet example: the mdl = 8 reading positions of doc 1 for the terms a, again, and all; the per-doc tf and per-term df arrays over the vocabulary a, again, all, always, an, and, apple, April, are, ...; and the corresponding bitmap columns, omitted here.]
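The level-1 constructions named here (NOTpure0 for term existence, stride-sum parity for tfP0) can be sketched directly. A minimal illustration, assuming an uncompressed level-0 bit vector; data and names are made up:

```python
# Minimal sketch of the two level-1 predicates named on this slide,
# applied stride by stride to an uncompressed level-0 bit vector:
# NOTpure0 gives term existence (te); sum-parity gives tfP0, the low
# bit of term frequency. Data and names are illustrative.

def level1(bits, stride, predicate):
    """Apply `predicate` to each stride-length chunk of the level-0 pTree."""
    return [int(predicate(bits[i:i + stride]))
            for i in range(0, len(bits), stride)]

not_pure0 = lambda chunk: any(chunk)           # term occurs somewhere in doc
tf_bit0   = lambda chunk: sum(chunk) % 2 == 1  # mod(sum(mdl-stride), 2) = 1

# One term across 3 docs, mdl = 8 reading positions per doc:
level0 = [0,1,0,0,0,1,0,0,  0,0,0,0,0,0,0,0,  1,0,0,0,0,0,0,0]
teP  = level1(level0, 8, not_pure0)   # -> [1, 0, 1]  (existence per doc)
tfP0 = level1(level0, 8, tf_bit0)     # -> [0, 0, 1]  (tf = 2, 0, 1)
```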

  6. Level-2 pTree hdfP (high doc frequency): predicate NOTpure0 applied to tfP1. Here the whole corpus is laid out as one overall level-0 pTree, corpusP, of length MaxDocLen × DocCount × VocabLen, ordered term by term: for term=a the position strides of doc1, doc2, doc3; then term=again; and so on. The one overall level-1 pTree, teP (term existence), then has length DocCount × VocabLength (the segments tePt=a, tePt=again, tePt=all, ... concatenated). The level-1 pTrees tfPk are the term-frequency bit-slices (e.g., the predicate of tfP0 is mod(sum(mdl-stride), 2) = 1). The level-2 pTrees dfPk (the doc-frequency slices, e.g., dfP0 and dfP3, together with hdfP) have length VocabLength. [The slide shows this Corpus pTreeSet layout on a 3-document example (docs JSE, HHS, LMM), with the df count, dfP0, dfP3 and hdfP columns over the vocabulary a, again, all, always, an, and, apple, April, are, ...; the bitmap columns are omitted here.] pTree text mining, data-cube layout.

  7. The same level-2 layout, now with masks. As on the previous slide: the overall level-0 pTree corpusP has length MaxDocLen × DocCount × VocabLen; the overall level-1 pTree teP has length DocCount × VocabLength; the level-1 pTrees tfPk use, e.g., the predicate mod(sum(mdl-stride), 2) = 1 for tfP0; and the level-2 pTrees dfPk (and hdfP, predicate NOTpure0 applied to tfP1) have length VocabLength. The new ingredient is a family of masks: a Verb pTree, a References pTree, an EndOfSentence pTree, a Preface pTree, a LastChapter pTree, and so on. Any of these masks can be ANDed into the P(t=, d=) pTrees before they are concatenated as above, or repetitions of the mask can be ANDed in after they are concatenated. [The slide repeats the 3-document data-cube example with the mask columns added; the bitmap columns are omitted here.]
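The mask idea amounts to a positionwise AND that can be applied either before or after concatenation. A minimal sketch with illustrative data (the EndOfSentence mask here is hypothetical):

```python
# Minimal sketch of the mask idea: a position mask of length mdl can be
# ANDed into each P(t=, d=) pTree before concatenation, or tiled and
# ANDed afterward, with the same result. The EndOfSentence mask and the
# data are hypothetical, for illustration only.

mdl = 8
p_term_doc = [[0,1,0,0,0,1,0,0],      # P(t=a, d=1)
              [1,0,0,0,0,0,0,0]]      # P(t=a, d=2)
eos_mask = [0,0,0,1,0,1,0,0]          # hypothetical EndOfSentence positions

# (a) mask each per-doc pTree, then concatenate:
masked = [b & m for p in p_term_doc for b, m in zip(p, eos_mask)]

# (b) concatenate first, then AND with the tiled mask -- same bits:
corpus = [b for p in p_term_doc for b in p]
tiled  = eos_mask * len(p_term_doc)
assert masked == [b & m for b, m in zip(corpus, tiled)]
```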

  8. Create the CWCT (the table of one-counts of all class-value bitmaps ANDed with all feature-value bitmaps) for InfoGain, correlation, Chi-square, etc. The CWCT holds the numbers needed for IG, correlation, and Chi-square in attribute selection, decision-tree induction, etc. The information needed to classify x ∈ X(A1, ..., An, C), where C = {C1, ..., Cc}, is

  I(C_1,\dots,C_c) = \sum_{i=1}^{c} I(C_i), \qquad I(C_i) = -\,p_{C_i}\log_2 p_{C_i}, \qquad p_{C_i} = \frac{|C_i|}{|X|} = \frac{\mathrm{ct}(P_{C_i})}{|X|}.

  The classification information gained by including attribute B = {B1, ..., Bb} in the TrainingSet is, with |C_{ij}| = ct(P_{C=c_i} ∧ P_{B=b_j}) and p_{C_{ij}} = |C_{ij}|/|B_j|,

  \mathrm{Gain}_C(B) = I(C_1,\dots,C_c) - \sum_{j=1}^{b} p_{B_j}\, I(C_{1j},\dots,C_{cj}) = -\sum_{i=1}^{c}\frac{\mathrm{ct}(P_{C_i})}{|X|}\log_2\frac{\mathrm{ct}(P_{C_i})}{|X|} \;+\; \sum_{j=1}^{b}\frac{\mathrm{ct}(P_{B=b_j})}{|X|}\sum_{i=1}^{c}\frac{\mathrm{ct}(P_{B=b_j}\wedge P_{C=c_i})}{|B_j|}\log_2\frac{\mathrm{ct}(P_{B=b_j}\wedge P_{C=c_i})}{|B_j|}.

  In this example the CWCT rows, s_{i,j} = ct(P_{C=c_i} ∧ P_{B=b_j}), are: s1,j = 0 0 0 0 3 0 0 3 0 3 (C=2); s2,j = 0 2 2 0 0 0 0 4 3 1 (C=3); s3,j = 2 2 0 0 0 2 2 0 4 0 (C=7); s4,j = 0 0 0 1 1 1 0 1 1 1 (C=10); s5,j = 0 0 0 3 0 3 0 0 3 0 (C=15), with column sums s1,j+...+s5,j = 2, 4, 2, 4, 4 (the five B2 values), 6, 2, 8 (B3), 11, 5 (B4). The class sizes are |Ci| = 3, 4, 4, 2, 3 for C1=2, C2=3, C3=7, C4=10, C5=15, so pCi = 0.1875, 0.25, 0.25, 0.125, 0.1875, pCi·log2(pCi) = −0.4528, −0.5, −0.5, −0.375, −0.4528, and I(C1..C5) = −Σ pCi log2(pCi) = 2.280. From the per-column fractions pij and the terms −pij log2(pij), the weighted column entropies (s1j+...+s5j)·I(s1j..s5j)/16 come out 0, .25, .13, .2, .31, .54, 0, .7, .127, .43, and the slide concludes GAIN(B2) = 2.81 − .89 = 1.92, GAIN(B3) = 2.81 − 1.24 = 1.57, GAIN(B4) = 2.81 − 1.70 = 1.11. [The slide also shows the 16-pixel spatial dataset with values in [0,15] and class C = Band B1, both as the spatial pixel arrangement and as the Peano-ordered table S with values in 4-bit binary, together with its value-map and bit-slice pTrees and their root counts; the bitmap columns are omitted here.]
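Once the CWCT counts exist, the gain really is just scalar arithmetic. A minimal sketch; the small CWCT used here is made-up illustrative data, not the slide's table:

```python
# Minimal sketch: InfoGain computed straight from CWCT one-counts.
# cwct[i][j] = ct(class i AND feature value j); past the counts it is
# all scalar arithmetic, as the slide says. The 3-class x 4-value CWCT
# below is made-up illustrative data, not the slide's table.
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def info_gain(cwct):
    class_sizes = [sum(row) for row in cwct]             # |Ci|
    n = sum(class_sizes)                                 # |X|
    cols = list(zip(*cwct))                              # one per feature value
    expected = sum(sum(col) / n * entropy(col) for col in cols if sum(col))
    return entropy(class_sizes) - expected

cwct = [[5, 0, 1, 0],
        [0, 4, 0, 2],
        [1, 0, 3, 0]]
print(round(info_gain(cwct), 3))   # -> 1.115
```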

  9. RSI-2, C = B1, class-ordered. [This slide repeats the appendix material of slide 3: the thirteen band value lists, the CWCT over classes C = 3, 7, 2, 10, 15 and the feature values of bands 1 through d, and the per-band pTree root counts and bitmaps.]

  10. RSI-2, C = B1, classwise order. [The slide lists the class-ordered band value arrays in hex (B1=Yield, B2=G, B3=R, B4=B, B5=NIR, B6=MIR, B7=FIR, B8=TIR, B9=UV2, Ba=UVa, Bb=UVb, Bc=UVc, Bd=UVd), the CWCT, and six entropy tabulations E(Bj) over all attributes: one with respect to the full 5-class classification and one for each 1-class classification {Ci, notCi}, with C1=2, C2=3, C3=7, C4=10 (a), C5=15 (f); the raw tables are omitted here.] Findings: the 1-class entropies are far smaller than the 5-class entropies. ALG1: for each 1-class problem, use its 0-entropy attributes. From the tabulations, the 0-entropy attributes are: for {C1=2, notC1}: B2, B3, B4, B6, B7, B9, Ba, Bb; for {C2=3, notC2}: B2, B3, B7, B9, Ba, Bb; for {C3=7, notC3}: B2, B5, B7, B8, B9, Ba, Bb; for {C4=a, notC4}: B2, B5, B7, B8, B9, Ba, Bb, Bc; for {C5=f, notC5}: B2, B5, B7, B8, B9, Ba, Bb, Bd. Gap notes: Bc generates a ≥10 gap on C4=a, which has slight internal gaps; Bd generates a ≥15 gap on C5=f, with no internal gaps; B9 will open a gap ≥13 on C1=2, which has no internal gaps; Ba opens a gap ≥12 on C2=3, which has slight internal gaps; B5, B7, B8, Bb generate gaps on C3=7, which has slight internal gaps.
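ALG1 as described, selecting the zero-entropy attributes of each one-class problem, can be sketched compactly. Illustrative names and data, assuming an eps tolerance for "zero":

```python
# Minimal sketch of ALG1 as described above: for the one-class problem
# {C, notC}, keep the attributes whose conditional entropy H({C,notC} | B)
# is (near) zero. Names, the eps tolerance, and the data are illustrative.
from collections import Counter
from math import log2

def cond_entropy(values, in_class):
    """H of the binary label {C, notC} given the attribute's value."""
    n, h = len(values), 0.0
    for v, grp in Counter(values).items():
        pos = sum(1 for val, lab in zip(values, in_class) if val == v and lab)
        for p in (pos / grp, (grp - pos) / grp):
            if p > 0:
                h -= (grp / n) * p * log2(p)
    return h

def zero_entropy_attrs(table, in_class, eps=1e-9):
    """Indices of attributes with ~0 entropy wrt {C, notC}."""
    return [j for j, col in enumerate(zip(*table))
            if cond_entropy(col, in_class) < eps]

# 4 samples, 2 attributes; attribute 0 separates C from notC, 1 does not:
table    = [(2, 8), (2, 4), (7, 8), (7, 4)]
in_class = [True, True, False, False]
print(zero_entropy_attrs(table, in_class))   # -> [0]
```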

  11. Attribute entropy for all-class versus one-class, on IRIS. [For each attribute (SL, SW, PL, PW) and each extant attribute value k, the slide tabulates from the value pTrees the per-class counts sij = ct(class_i & attribute = k), the column totals Aj, the fractions pij, the column information I = −Σ pij log2(pij), and the weighted sum Σj (s1j+s2j+s3j)·I/150; the full spreadsheets are omitted here.] The resulting weighted attribute entropies:

  problem                     SL      SW      PL      PW
  3-class (set/ver/vir)       0.708   1.07    0.13    0.14
  setosa vs. rest             0.27    0.50    0       0
  versicolor vs. rest         0.64    0.68    0.13    0.14
  virginica vs. rest          0.45    0.78    0.13    0.14
  virginica vs. versicolor    0.65    0.85    0.19    0.22   (weighted by /100)

  As in the RSI example, the one-class problems generally yield lower attribute entropies than the all-class problem.

  12. Using attribute entropy on the MGR text corpus for clustering: take one doc at a time and find all others that provide high information gain to it, and cluster them together. [The slide shows the table of pairwise values IGdh(dk) over the 44 documents d = 1..18, 21..23, 25..30, 32, 33, 35..39, 41..50, with entries ranging from about .03 to .26 (e.g., row d1: .09, .1, .1, .06, .13, .12, .12, .13, .09 against its high-IG partners); the full table is omitted here.]
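The clustering heuristic in this slide's caption can be sketched as a greedy pass over the documents. A minimal sketch, assuming a precomputed symmetric table of pairwise IGdh(dk) values; the threshold and toy entries are illustrative, not taken from the table:

```python
# Minimal sketch of the clustering heuristic in this slide's caption:
# take each document in turn and group with it every other document whose
# pairwise IG meets a threshold. `ig` is assumed to be a precomputed
# symmetric table of IGdh(dk) values; threshold and entries are toy data.

def ig_clusters(doc_ids, ig, threshold=0.13):
    clusters, assigned = [], set()
    for d in doc_ids:
        if d in assigned:
            continue
        cluster = {d} | {e for e in doc_ids
                         if e != d and ig.get(frozenset((d, e)), 0) >= threshold}
        cluster -= assigned            # greedy: a doc keeps its first cluster
        assigned |= cluster
        clusters.append(sorted(cluster))
    return clusters

ig = {frozenset((1, 7)): .13, frozenset((1, 9)): .12, frozenset((7, 9)): .16}
print(ig_clusters([1, 7, 9], ig))      # -> [[1, 7], [9]]
```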

  13. [A verbatim repeat of slide 12's IGdh(dk) table.]

  14. [The slide is a cluster map of the 44 MGR documents: document numbers grouped into clusters, each annotated with the vocabulary words its members share (labels such as old, men, buy, bake, bread, town, king, fiddle, cock, bed, run, away, eat, pie, baby, cry, mother, lamb, boy appear beside the document numbers); the two-dimensional layout is not recoverable here. Beneath it, the last row of the IGdh(dk) table is repeated: d50: .12, .13, .06, .13, .13, .2, .12.]

  15. MG44d60w: 44 MOTHER GOOSE RHYMES with a synonymized vocabulary of 60 WORDS 1. Three blind mice! See how they run! They all ran after the farmer's wife, who cut off their tails with a carving knife. Did you ever see such a thing in your life as three blind mice? 2. This little pig went to market. This little pig stayed at home. This little pig had roast beef. This little pig had none. This little pig said Wee, wee. I can't find my way home. 3. Diddle diddle dumpling, my son John. Went to bed with his breeches on, one stocking off, and one stocking on. Diddle diddle dumpling, my son John. 4. Little Miss Muffet sat on a tuffet, eating of curds and whey. There came a big spider and sat down beside her and frightened Miss Muffet away. 5. Humpty Dumpty sat on a wall. Humpty Dumpty had a great fall. All the Kings horses, and all the Kings men cannot put Humpty Dumpty together again. 6. See a pin and pick it up. All the day you will have good luck. See a pin and let it lay. Bad luck you will have all the day. 7. Old Mother Hubbard went to the cupboard to give her poor dog a bone. When she got there cupboard was bare and so the poor dog had none. She went to baker to buy him some bread. When she came back dog was dead. 8. Jack Sprat could eat no fat. His wife could eat no lean. And so between them both they licked the platter clean. 9. Hush baby. Daddy is near. Mamma is a lady and that is very clear. 10. Jack and Jill went up the hill to fetch a pail of water. Jack fell down, and broke his crown and Jill came tumbling after. When up Jack got and off did trot as fast as he could caper, to old Dame Dob who patched his nob with vinegar and brown paper. 11. One misty moisty morning when cloudy was the weather, I chanced to meet an old man clothed all in leather. He began to compliment and I began to grin. How do you do And how do you do? And how do you do again 12. There came an old woman from France who taught grown-up children to dance. But they were so stiff she sent them home in a sniff. This sprightly old woman from France. 13. A robin and a robins son once went to town to buy a bun. They could not decide on plum or plain. And so they went back home again. 14. If all the seas were one sea, what a great sea that would be! And if all the trees were one tree, what a great tree that would be! And if all the axes were one axe, what a great axe that would be! And if all the men were one man what a great man he would be! And if the great man took the great axe and cut down the great tree and let it fall into the great sea, what a splish splash that would be! 15. Great A. little a. This is pancake day. Toss the ball high. Throw the ball low. Those that come after may sing heigh ho! 16. Flour of England, fruit of Spain, met together in a shower of rain. Put in a bag tied round with a string. If you'll tell me this riddle, I will give you a ring. 17. Here sits the Lord Mayor. Here sit his two men. Here sits the cock. Here sits the hen. Here sit the little chickens. Here they run in. Chin chopper, chin chopper, chin chopper, chin! 18. I had two pigeons bright and gay. They flew from me the other day. What was the reason they did go? I can not tell, for I do not know. 21. The Lion and the Unicorn were fighting for the crown. The Lion beat the Unicorn all around the town. Some gave them white bread and some gave them brown. Some gave them plum cake, and sent them out of town. 22. I had a little husband no bigger than my thumb. I put him in a pint pot, and there I bid him drum. 
I bought a little handkerchief to wipe his little nose and a pair of little garters to tie his little hose. 23. How many miles is it to Babylon? Three score miles and ten. Can I get there by candle light? Yes, and back again. If your heels are nimble and light, you may get there by candle light. 25. There was an old woman, and what do you think? She lived upon nothing but victuals, and drink. Victuals and drink were the chief of her diet, and yet this old woman could never be quiet. 26. Sleep baby sleep. Our cottage valley is deep. Little lamb is on the green with woolly fleece so soft and clean. Sleep baby sleep. Sleep baby sleep, down where woodbines creep. Be always like lamb so mild, kind and sweet and gentle child. Sleep baby sleep. 27. Cry baby cry. Put your finger in your eye and tell your mother it was not I. 28. Baa baa black sheep, have you any wool? Yes sir yes sir, three bags full. One for my master and one for my dame, but none for the little boy who cries in the lane. 29. When little Fred went to bed, he always said his prayers. He kissed his mamma and then his papa, and straight away went upstairs. 30. Hey diddle diddle! The cat and the fiddle. The cow jumped over the moon. The little dog laughed to see such sport, and the dish ran away with the spoon. 32. Jack come and give me your fiddle, if ever you mean to thrive. No I will not give my fiddle to any man alive. If I should give my fiddle they will think that I've gone mad. For many a joyous day my fiddle and I have had. 33. Buttons, a farthing a pair! Come, who will buy them of me? They are round and sound and pretty and fit for girls of the city. Come, who will buy them of me? Buttons, a farthing a pair! 35. Sing a song of sixpence, a pocket full of rye. Four and twenty blackbirds, baked in a pie. When the pie was opened, the birds began to sing. Was not that a dainty dish to set before the king? The king was in his counting house, counting out his money. The queen was in the parlor, eating bread and honey. The maid was in the garden, hanging out the clothes. When down came a blackbird and snapped off her nose. 36. Little Tommy Tittlemouse lived in a little house. He caught fishes in other men's ditches. 37. Here we go round mulberry bush, mulberry bush, mulberry bush. Here we go round mulberry bush, on a cold and frosty morning. This is way we wash our hands, wash our hands, wash our hands. This is way we wash our hands, on a cold and frosty morning. This is way we wash our clothes, wash our clothes, wash our clothes. This is way we wash our clothes, on a cold and frosty morning. This is way we go to school, go to school, go to school. This is the way we go to school, on a cold and frosty morning. This is the way we come out of school, come out of school, come out of school. This is the way we come out of school, on a cold and frosty morning. 38. If I had as much money as I could tell, I never would cry young lambs to sell. Young lambs to sell, young lambs to sell. I never would cry young lambs to sell. 39. A little cock sparrow sat on a green tree. And he chirped and chirped, so merry was he. A naughty boy with his bow and arrow, determined to shoot this little cock sparrow. This little cock sparrow shall make me a stew, and his giblets shall make me a little pie, too. Oh no, says the sparrow, I will not make a stew. So he flapped his wings and away he flew. 41. Old King Cole was a merry old soul. And a merry old soul was he. He called for his pipe and he called for his bowl and he called for his fiddlers three.
And every fiddler, he had a fine fiddle and a very fine fiddle had he. There is none so rare as can compare with King Cole and his fiddlers three. 42. Bat bat, come under my hat and I will give you a slice of bacon. And when I bake I will give you a cake, if I am not mistaken. 43. Hark hark, the dogs do bark! Beggars are coming to town. Some in jags and some in rags and some in velvet gowns. 44. The hart he loves the high wood. The hare she loves the hill. The Knight he loves his bright sword. The Lady loves her will. 45. Bye baby bunting. Father has gone hunting. Mother has gone milking. Sister has gone silking. And brother has gone to buy a skin to wrap the baby bunting in. 46. Tom Tom the piper's son, stole a pig and away he run. The pig was eat and Tom was beat and Tom ran crying down the street. 47. Cocks crow in the morn to tell us to rise and he who lies late will never be wise. For early to bed and early to rise, is the way to be healthy and wealthy and wise. 48. One two, buckle my shoe. Three four, knock at the door. Five six, pick up sticks. Seven eight, lay them straight. Nine ten, a good fat hen. Eleven twelve, dig and delve. Thirteen fourteen, maids a courting. Fifteen sixteen, maids in the kitchen. Seventeen eighteen, maids a waiting. Nineteen twenty, my plate is empty. 49. There was a little girl who had a little curl right in the middle of her forehead. When she was good she was very very good and when she was bad she was horrid. 50. Little Jack Horner sat in the corner, eating of Christmas pie. He put in his thumb and pulled out a plum and said What a good boy am I!

[Slide graphic: the 60-term x 44-document incidence matrix for MG44d60w. Row t, column d is 1 iff term t occurs in rhyme d. The 60 synonymized vocabulary terms, in order: always, away, baby, back, bad, bag, bake, bed, boy, bread, bright, brown, buy, cake, child, clean, cloth, cock, crown, cry, cut, day, dish, dog, eat, fall, fiddle, full, girl, green, high, hill, house, king, lady, lamb, maid, men, merry, money, morn, mother, nose, old, pie, pig, plum, round, run, sing, son, three, thumb, town, tree, two, way, wife, woman, wool. Two summary rows follow the matrix: wc (word count per document) and Df (document frequency per term, flagging terms with Df > 2).]
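A minimal sketch of building such a term-by-document bit matrix (the existential Term pTrees) in plain Python; the three mini-docs are illustrative, not the synonymized MG44d60w corpus:

```python
# Build an existential term-by-document bit matrix (1 iff term occurs in doc),
# a tiny analogue of the MG44d60w table above.
docs = [
    "three blind mice see how they run",
    "this little pig went to market",
    "jack and jill went up the hill",
]

vocab = sorted(set(w for d in docs for w in d.split()))

# bitmap[term] is a 0/1 list over docs -- an existential Term pTree.
bitmap = {t: [1 if t in d.split() else 0 for d in docs] for t in vocab}

# Document frequency (df) of a term = sum (rootcount) of its bit row.
df = {t: sum(bits) for t, bits in bitmap.items()}

# Word count (wc) of a doc = sum of its bit column (distinct terms in the doc).
wc = [sum(bitmap[t][j] for t in vocab) for j in range(len(docs))]

print(df["went"], wc)   # went occurs in 2 docs; wc = [7, 6, 7]
```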

16. SCW: Secure Classwise technology

FOG: similar to the cloud, but you can see only a few feet into it (only your authorized pTrees). Use with Id Fusion to solve the Snowden problem?

The running example, RSI1_16, is a 4x4 image with four bands (Band2 = class label):

Band1:           Band2=Class:     Band3:           Band4:
 3  3  7  7       7  3  3  2       8  8  4  5      11 15 11 11
 3  3  7  7       7  3  3  2       8  8  4  5      11 11 11 11
 2  2 10 15      11 11 10 10       8  8  4  4      15 15 11 11
 2 10 15 15      11 11 10 10       8  8  4  4      15 15 11 11

TOC (per entry: pTree name, ct, cls-cts, FOG offset; rpn = relative pTree number 0..30):
P1,3.4(00031).3402   P1,2.6(02040).1934   P1,1.e(24234).4981   P1,0.a(24240).0045
P1=2.3(00030).2219   P1=3.4(02200).5011   P1=7.4(22000).2480   P1=a.2(00011).0023   P1=f.3(00030).4056
P2,3.8(00044).4683   P2,2.g(00200).1710   P2,1.g(24244).1781   P2,0.a(04204).0750
P2=2.2(20000).1102   P2=3.4(04000).2609   P2=7.2(00200).0629   P2=a.4(00040).5155   P2=f.4(00004).3822
P3,3.8(02204).1556   P3,2.8(22040).2975   P3,1.0(00000).1034   P2,0.2(20000).4440
P3=4.6(02040).3377   P3=5.2(20000).3480   P3=8.8(02204).0784
P4,3.g(24244).4510   P4,2.6(01004).2583   P4,1.e(24244).2740   P4,0.g(24244).0503
P4=b.b(23240).3218   P4=f.5(01004).3430

[Slide graphic: the individual pTree bit strings floating in the FOG, omitted.]

The FOG Offset column is the key for authorized users. Authorization by band = column? Store separately by pTree and by class, so authorization can be restricted to a feature column and a class. Another security layer would pre-pend to each pTree a random-length pad, Add-A-Pad; this is shown on the next slide. Is it necessary? Is it useful? Then there would be 2 keys: the FOG.offset sequence and the PAD.length sequence. The CWCT (Class & Feature Count Table) is available in the TOC.
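A minimal sketch of the FOG idea, assuming pTrees are packed at secret offsets into one bit string of random background bits; all names and values here are illustrative:

```python
import random

random.seed(0)

# Pack several pTrees (bit strings) into one undifferentiated "fog" at
# random offsets; without the offset key the fog is just background bits.
ptrees = {"P1,3": "0001101", "P1,2": "1110000", "P2,0": "0101010"}

fog = [str(random.randint(0, 1)) for _ in range(200)]   # random background bits
offsets = {}                                            # the authorization key
pos = 0
for name, bits in ptrees.items():
    pos += random.randint(5, 20)                        # skip some background
    offsets[name] = pos
    fog[pos:pos + len(bits)] = list(bits)
    pos += len(bits)
fog = "".join(fog)

def read_ptree(fog, offset, length):
    """Slice a pTree's bits back out of the fog using its secret offset."""
    return fog[offset:offset + length]

assert read_ptree(fog, offsets["P2,0"], 7) == "0101010"
```

An authorized user would hold offsets for only the pTrees (feature columns, and on later slides classes) they may read.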

17. NO! SCW-A with Add-A-Pad?

The FOG (similar to the cloud, but you can see only a few feet: your authorized pTrees), AKA pTreesoup. Same RSI1_16 bands as the previous slide.

AAP: an Add-A-Pad security layer pre-pends to each pTree a random-length pad. Useful? There are then 2 keys: the FOG offset array and the PAD length array. Does Add-A-Pad strengthen security, or do the random background bits already do what AAP does? I.e., iff you know both keys, you have the pTree start offsets. I am going to assume AAP does not enhance security, so I'll go back to the previous slide, where the FOG offset array alone is the key. The CWCT is available in the TOC.

[Slide graphic: the same FOG and TOC as the previous slide, with a pl (pad length) column appended to each entry, e.g. P1,3.4(00031).3402 rpn 0 pl 2, P1,2.6(02040).1934 rpn 1 pl 0, P1,1.e(24234).4981 rpn 2 pl 3, ...]
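A minimal sketch of the two-key arithmetic (the offsets and pad lengths below are taken from the slide's TOC entries for P1,3, P1,2, P1,1):

```python
# Add-A-Pad: each pTree is stored at its FOG offset with a random-length
# pad in front, so its payload actually begins at offset + pad_length.
# Only a holder of BOTH keys can compute the true payload start offsets.
fog_offsets = {"P1,3": 3402, "P1,2": 1934, "P1,1": 4981}   # key 1 (FOG offsets)
pad_lengths = {"P1,3": 2,    "P1,2": 0,    "P1,1": 3}      # key 2 (pad lengths)

starts = {name: fog_offsets[name] + pad_lengths[name] for name in fog_offsets}
print(starts)   # {'P1,3': 3404, 'P1,2': 1934, 'P1,1': 4984}
```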

18. SCW-M: Secure ClassWise Multilevel pTrees

SCOM pTree FOG: like a cloud, but one can see only a few feet into it (only your authorized pTrees). A really exciting thesis topic would be to develop this with the so-called Bell-LaPadula model of multilevel security from the US Defense Department Orange Book! (Easily modified to fit the Bell-LaPadula model?) Columns (and rows) can have different security levels, which is what Bell-LaPadula is all about. In traditional horizontal DBs that is hard to implement; it seems easier using SCOM.

[Slide graphic: the FOG with a classwise TOC over RSI1_16. Each entry now carries class strides (PRE-COMPUTE THIS!): after the pTree name and counts come one FOG offset per class C1..C5, e.g. P1,3.4(00031).3402.3206.1706.2060.4040 lists the offsets of P1,3's five class strides.]

The CWCT is available in the TOC. Bell-LaPadula says a user can read down and write up with respect to the security hierarchy. Suppose feature columns 3 and 4 are secret, column 1 is confidential, and, within column 2 (the class column), C1 and C2 are unclassified, C3 is confidential, and C4 and C5 are secret. Then a confidential user would be given, as a read key, the offsets of just those (column, class) strides at or below confidential: column 1, and the class column restricted to classes C1, C2, C3.

How big is a big-data spatial data set these days? Cover a forest with 1K gridded ultra-high-res images (3840x2160), about 8.29B pixels. At 1 microsec/pixel, VPHD takes 8,294 sec = 2.3 hrs.
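A sketch of the "read down" authorization rule just described, assuming the example's level assignments; the dictionaries and function names are illustrative, not the deck's notation:

```python
LEVELS = {"unclassified": 0, "confidential": 1, "secret": 2}

# Security labels from the example: columns 3,4 secret, column 1 confidential;
# within the class column, C1,C2 unclassified, C3 confidential, C4,C5 secret.
col_level = {1: "confidential", 3: "secret", 4: "secret"}
class_level = {"C1": "unclassified", "C2": "unclassified",
               "C3": "confidential", "C4": "secret", "C5": "secret"}

def readable(user_level, column, cls):
    """Bell-LaPadula 'no read up': a (column, class) stride is readable
    only if both of its labels are at or below the user's clearance."""
    u = LEVELS[user_level]
    return (LEVELS[col_level.get(column, "unclassified")] <= u
            and LEVELS[class_level[cls]] <= u)

# A confidential user's read key covers column 1 for classes C1..C3 only.
print([c for c in class_level if readable("confidential", 1, c)])  # ['C1', 'C2', 'C3']
```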

19. Decision Tree CLASSIFICATION: each internal node is a test on a feature attribute; each test outcome (a value or a range of values) is assigned a link to the next level; each leaf represents a class and holds the class prediction for the sample. Some branches may represent noise or outliers (and should be pruned?).

The ID3 algorithm for inducing a decision tree from training tuples:
1. The tree starts as a single node containing the entire TRAINING SET.
2. If all TRAINING TUPLES have the same class, DONE.
3. Else, use info gain as the heuristic for selecting the best decision attribute for that node.
4. Create a branch per value [or interval]. Partition the training set accordingly.
5. Recurse on 2, 3, 4 until a stop condition is true. Stop conditions? All samples are in the same class; no candidate decision attributes remain (label with the plurality class); other?

Info Gain: let S be a TRAINSET with classes S[C] = {C1,...,Cm} and class counts s1,...,sm (si = |S∩Ci|). The EXPECTED INFORMATION needed to classify a sample given S is
I(s1,...,sm) = -SUM(i=1..m)[ pi*log2(pi) ], pi = si/|S|.
For a candidate decision attribute A with values a1,...,av, let sij = |S_{A=aj} ∩ Ci| (the class-Ci count in the j-th partition). The entropy left after splitting on A is
E(A) = SUM(j=1..v)[ (s1j+...+smj)/|S| * I(s1j,...,smj) ],
and the expected classification info gain is Gain(A) = I(s1,...,sm) - E(A). The algorithm computes the info gain of each attribute and selects the highest. Tree pruning addresses "overfitting", using statistics to remove the least reliable branches. More on Decision Tree Induction (powerpoint introduction). ("Near" requires a distance/similarity, i.e., metrics: distance functions on the feature space.)

One can select TrainingSet attributes based on Gain(A): select A iff Gain(A) > Threshold? Or? Note that for big data (e.g., 100T training tuples) computing I(s1..sm) is a big delay; and since I(s1..sm) is the same for every candidate attribute, just calculate the E(A)'s and select those for which E(A) < Threshold instead?
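A minimal sketch of this selection arithmetic in plain Python (the helper names I/gain are illustrative; the sij table is B2's contingency table from the worked example on the following slides):

```python
from math import log2

def I(counts):
    """Expected information -SUM pi*log2(pi) over a list of class counts."""
    s = sum(counts)
    return -sum(c / s * log2(c / s) for c in counts if c > 0)

def gain(sij):
    """sij[j][i] = count of class Ci among the samples with A = aj.
    Gain(A) = I(s1..sm) - E(A), with E(A) = SUM_j |Aj|/|S| * I(s1j..smj)."""
    class_totals = [sum(col) for col in zip(*sij)]
    s = sum(class_totals)
    E = sum(sum(row) / s * I(row) for row in sij)
    return I(class_totals) - E

# B2's sij counts (rows aj = 2,3,7,10,11; columns Ci = 2,3,7,10,15):
sij_B2 = [[0, 0, 2, 0, 0],
          [0, 2, 2, 0, 0],
          [0, 2, 0, 0, 0],
          [0, 0, 0, 1, 3],
          [3, 0, 0, 1, 0]]
print(round(gain(sij_B2), 3))   # 1.625
```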
[Slide graphic: the BASIC PTREES and VALUE_PTREES of bands 1-4, i.e., the bit-slice pTrees Pb,1..Pb,4 of each band b with their level-1 counts, and the value pTrees Pb(v) for each 1-, 2-, 3-, and 4-bit value prefix v. Only the band data and the relation below are needed to follow the computation.]

Band B1:          Band B2:          Band B3:          Band B4:
 3  3  7  7        7  3  3  2        8  8  4  5       11 15 11 11
 3  3  7  7        7  3  3  2        8  8  4  5       11 11 11 11
 2  2 10 15       11 11 10 10        8  8  4  4       15 15 11 11
 2 10 15 15       11 11 10 10        8  8  4  4       15 15 11 11

S: X,Y   B1    B2    B3    B4
   0,0   0011  0111  1000  1011
   0,1   0011  0011  1000  1111
   1,0   0011  0111  1000  1011
   1,1   0011  0011  1000  1011
   0,2   0111  0011  0100  1011
   0,3   0111  0010  0101  1011
   1,2   0111  0011  0100  1011
   1,3   0111  0010  0101  1011
   2,0   0010  1011  1000  1111
   2,1   0010  1011  1000  1111
   3,0   0010  1011  1000  1111
   3,1   1010  1011  1000  1111
   2,2   1010  1010  0100  1011
   2,3   1111  1010  0100  1011
   3,2   1111  1010  0100  1011
   3,3   1111  1010  0100  1011

B1 = class label; its values (2, 3, 7, 10, 15) are the classes (C1,...,C5). We need the count of pixels (rows in the table) containing each value in each attribute, and also the count of pixels containing each pair of values, one from a descriptive attribute and one from the class.
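Those counts come from pTree rootcounts. A sketch, assuming bit vectors are stored as Python ints so that rc( P1(ci) ^ P2(aj) ) is an AND plus a popcount:

```python
# Value pTrees as Python ints (bit k = pixel k), so a rootcount is a popcount.
# Pixel order is (0,0),(0,1),...,(3,3), matching the S relation above.
B1 = [3, 3, 7, 7, 3, 3, 7, 7, 2, 2, 10, 15, 2, 10, 15, 15]   # class label band
B2 = [7, 3, 3, 2, 7, 3, 3, 2, 11, 11, 10, 10, 11, 11, 10, 10]

def value_ptree(band, v):
    """Existential value pTree: bit k is set iff pixel k has value v."""
    bits = 0
    for k, x in enumerate(band):
        if x == v:
            bits |= 1 << k
    return bits

# rc( P1(ci) ^ P2(aj) ): class/value co-occurrence count by AND + popcount.
sij = value_ptree(B1, 3) & value_ptree(B2, 3)
print(bin(sij).count("1"))   # 2 pixels have class 3 and B2 = 3
```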

20. Suppose we take this relation as the training set (4-bit values), with B1 = class label attribute. Then the classes are {C1,C2,C3,C4,C5} = {2, 3, 7, 10, 15}, where Ci = {ci}. The ID3 algorithm for inducing a decision tree from the training samples:

S: X,Y   B1    B2    B3    B4
   0,0   0011  0111  1000  1011
   0,1   0011  0011  1000  1111
   0,2   0111  0011  0100  1011
   0,3   0111  0010  0101  1011
   1,0   0011  0111  1000  1011
   1,1   0011  0011  1000  1011
   1,2   0111  0011  0100  1011
   1,3   0111  0010  0101  1011
   2,0   0010  1011  1000  1111
   2,1   0010  1011  1000  1111
   2,2   1010  1010  0100  1011
   2,3   1111  1010  0100  1011
   3,0   0010  1011  1000  1111
   3,1   1010  1011  1000  1111
   3,2   1111  1010  0100  1011
   3,3   1111  1010  0100  1011

1. The tree starts as one node representing the training samples, S.
2. If all samples were in the same class (same B1 value), S would be a leaf with that class label. Not true here.
3. Else, use entropy-based info gain as the heuristic for selecting the first decision attribute.

EXPECTED INFO needed to classify a sample: I = I(s1..sm) = -SUM(i=1..m)[pi*log2(pi)], m=5, s=16, si = 3,4,4,2,3 (rootcounts for the class labels, the rc(P1(ci))'s), pi = si/s = 3/16, 1/4, 1/4, 1/8, 3/16, so
I = -(3/16*l(3/16) + 4/16*l(4/16) + 4/16*l(4/16) + 2/16*l(2/16) + 3/16*l(3/16)) = -(-.453 -.5 -.5 -.375 -.453) = 2.281.

Take B2 = {a1,a2,a3,a4,a5} = {2,3,7,10,11} as the first candidate attribute. Aj = {t : t(B2) = aj}, where a1=0010, a2=0011, a3=0111, a4=1010, a5=1011; sij is the number of samples of class Ci in subset Aj, so sij = rc( P1(ci) ^ P2(aj) ), ci in {2,3,7,10,15}, aj in {2,3,7,10,11} (contingency table on the next slide). With pij = sij/|Aj|:

            j=1   j=2   j=3   j=4   j=5
s1j          0     0     0     0     3
s2j          0     2     2     0     0
s3j          2     2     0     0     0
s4j          0     0     0     1     1
s5j          0     0     0     3     0
|Aj|         2     4     2     4     4
I(s1j..s5j)  0     1     0    .811  .811
|Aj|*Ij/16   0    .25    0    .203  .203

(Terms with pij = 0 contribute 0, since p*log2(p) -> 0.) So E(B2) = .656 and GAIN(B2) = I(s1..sm) - E(B2) = 2.281 - .656 = 1.625.

Next, B3 = {a1,a2,a3} = {4,5,8}: Aj = {t : t(B3) = aj}, a1=0100, a2=0101, a3=1000, and sij = rc( P1(ci) ^ P3(aj) ), ci in {2,3,7,10,15}, aj in {4,5,8}:

ci \ aj    4(0100)  5(0101)  8(1000)   rc(P1(ci))
 2            0        0        3          3
 3            0        0        4          4
 7            2        2        0          4
10            1        0        1          2
15            3        0        0          3
|Aj|          6        2        8         16

E(B3) = SUM(j=1..v)[ (s1j+..+smj)*I(s1j..smj)/s ], where I(s1j..smj) = -SUM(i=1..m)[pij*log2(pij)], pij = sij/|Aj|:

            j=1    j=2    j=3
p1j          0      0    .375
p2j          0      0     .5
p3j         .333    1      0
p4j         .167    0    .125
p5j         .5      0      0
I(s1j..s5j) 1.459   0    1.406
|Aj|*Ij/16  .547    0    .703

so E(B3) = 1.250 and GAIN(B3) = I(s1..sm) - E(B3) = 2.281 - 1.250 = 1.031.
The B2 contingency table, sij = rc( P1(ci) ^ P2(aj) ):

ci \ aj    2(0010)  3(0011)  7(0111)  10(1010)  11(1011)   rc(P1(ci))
 2            0        0        0         0         3          3
 3            0        2        2         0         0          4
 7            2        2        0         0         0          4
10            0        0        0         1         1          2
15            0        0        0         3         0          3
|Aj|          2        4        2         4         4         16

The |Aj|'s are the rootcounts of the P2(aj)'s; these sij's are the columns used in the E(B2) computation above.
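As a cross-check (not part of the deck), the same gain sketch from slide 19 reproduces the B3 and B4 figures from their contingency tables; the helpers are repeated here so the snippet stands alone:

```python
from math import log2

def I(counts):
    """Expected information over a list of class counts."""
    s = sum(counts)
    return -sum(c / s * log2(c / s) for c in counts if c > 0)

def gain(sij):   # sij[j][i]: count of class Ci among samples with A = aj
    totals = [sum(col) for col in zip(*sij)]
    E = sum(sum(row) / sum(totals) * I(row) for row in sij)
    return I(totals) - E

# Rows aj, columns Ci = (2,3,7,10,15), counts from the contingency tables:
sij_B3 = [[0, 0, 2, 1, 3],    # aj = 4 (0100)
          [0, 0, 2, 0, 0],    # aj = 5 (0101)
          [3, 4, 0, 1, 0]]    # aj = 8 (1000)
sij_B4 = [[0, 3, 4, 1, 3],    # aj = 11 (1011)
          [3, 1, 0, 1, 0]]    # aj = 15 (1111)
print(round(gain(sij_B3), 3), round(gain(sij_B4), 3))   # 1.031 0.568
```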

21. Take B4 = {a1,a2} = {11,15} as the third candidate attribute, used to partition S into subsets {A1,...,Av}: Aj = {t : t(B4) = aj}, a1=1011, a2=1111; sij is the number of samples of class Ci in Aj, so sij = rc( P1(ci) ^ P4(aj) ), ci in {2,3,7,10,15}, aj in {11,15}:

ci \ aj    11(1011)  15(1111)   rc(P1(ci))
 2             0         3          3
 3             3         1          4
 7             4         0          4
10             1         1          2
15             3         0          3
|Aj|          11         5         16

I = I(s1..sm) = 2.281, as before. The ENTROPY based on the partition into subsets by B4 is E(B4) = SUM(j=1..v)[ (s1j+..+smj)*I(s1j..smj)/s ], where I(s1j..smj) = -SUM(i=1..m)[pij*log2(pij)], pij = sij/|Aj|:

            j=1     j=2
p1j          0      .6
p2j         .273    .2
p3j         .364     0
p4j         .091    .2
p5j         .273     0
I(s1j..s5j) 1.868   1.371
|Aj|*Ij/16  1.284   .428

so E(B4) = 1.712 and GAIN(B4) = I(s1..sm) - E(B4) = .568. Comparing, GAIN(B4) = .568 < GAIN(B3) = 1.031 < GAIN(B2) = 1.625, so select B2 as the first-level decision attribute.

4. Branches are created for each value of B2 and the samples are partitioned accordingly. (If a partition is empty, generate a leaf and label it with the most common class, C2, label 0011.)

     .--- B2=0000 -> C2:0011
     |--- B2=0001 -> C2:0011
     |--- B2=0010 -> Sample_Set_1
     |--- B2=0011 -> Sample_Set_2
     |--- B2=0100 -> C2:0011
     |--- B2=0101 -> C2:0011
     |--- B2=0110 -> C2:0011
B2 --|--- B2=0111 -> Sample_Set_3
     |--- B2=1000 -> C2:0011
     |--- B2=1001 -> C2:0011
     |--- B2=1010 -> Sample_Set_4
     |--- B2=1011 -> Sample_Set_5
     |--- B2=1100 -> C2:0011
     |--- B2=1101 -> C2:0011
     |--- B2=1110 -> C2:0011
     `--- B2=1111 -> C2:0011

Sample_Set_1            Sample_Set_2            Sample_Set_3
X,Y  B1   B3   B4       X,Y  B1   B3   B4       X,Y  B1   B3   B4
0,3 0111 0101 1011      0,1 0011 1000 1111      0,0 0011 1000 1011
1,3 0111 0101 1011      0,2 0111 0100 1011      1,0 0011 1000 1011
                        1,1 0011 1000 1011
                        1,2 0111 0100 1011

Sample_Set_4            Sample_Set_5
X,Y  B1   B3   B4       X,Y  B1   B3   B4
2,2 1010 0100 1011      2,0 0010 1000 1111
2,3 1111 0100 1011      2,1 0010 1000 1111
3,2 1111 0100 1011      3,0 0010 1000 1111
3,3 1111 0100 1011      3,1 1010 1000 1111
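A sketch of step 4 in plain Python over the S relation (values as decimal ints; S, parts, and plurality are illustrative names, not the deck's notation):

```python
from collections import Counter, defaultdict

# S relation: (x,y) -> (B1 class, B2, B3, B4), the 4-bit values as ints.
S = {(0,0): (3,7,8,11),  (0,1): (3,3,8,15),  (0,2): (7,3,4,11),   (0,3): (7,2,5,11),
     (1,0): (3,7,8,11),  (1,1): (3,3,8,11),  (1,2): (7,3,4,11),   (1,3): (7,2,5,11),
     (2,0): (2,11,8,15), (2,1): (2,11,8,15), (2,2): (10,10,4,11), (2,3): (15,10,4,11),
     (3,0): (2,11,8,15), (3,1): (10,11,8,15),(3,2): (15,10,4,11), (3,3): (15,10,4,11)}

# Partition on the selected attribute B2 (tuple index 1): one sample set
# per occurring B2 value, matching Sample_Set_1..5 above.
parts = defaultdict(list)
for xy, t in S.items():
    parts[t[1]].append(xy)

counts = Counter(t[0] for t in S.values())   # class frequencies: {3:4, 7:4, 2:3, 15:3, 10:2}
plurality = 3   # classes 3 and 7 tie at 4 samples; the slide picks C2 = 3 (0011)

for v in range(16):   # a branch per 4-bit B2 value; empty ones become leaves
    print(v, sorted(parts[v]) if v in parts else f"leaf: class {plurality}")
```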
