Nurjahan Begum, Bing Hu, Thanawin Rakthanmanon, and Eamonn Keogh

Towards a Minimum Description Length Based Stopping Criterion for Semi-Supervised Time Series Classification
Nurjahan Begum, Bing Hu, Thanawin Rakthanmanon, and Eamonn Keogh

Outline
• Introduction
• Motivation for a Stopping Criterion in Semi-Supervised Classification
• Proposed Stopping Criterion
  • Minimum Description Length (MDL) technique
  • Our Approach
• Experimental Results
• Conclusion

Introduction
We have developed a Minimum Description Length (MDL) based stopping criterion for semi-supervised time series classification.
• Why semi-supervised learning?
• Why do we need a stopping criterion?

Why Semi-Supervised Learning?
• Labeled data
  • Scarce and extremely expensive*
  • Requires human intervention
• Unlabeled data
  • Abundant
  • The PhysioBank archive* has more than 700 GB of digitized signals and time series freely available.
• Semi-supervised classification
  • Needs less labeled data
  • Needs less human effort and usually obtains higher accuracy*

* F. Florea et al., Medical image categorization with MedIC and MedGIFT (2006)
* A. L. Goldberger et al., PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals (2000)
* L. Wei et al., Semi-Supervised Time Series Classification (2006)

Why do we need a Stopping Criterion?
[Figure: time series from a cardiac tamponade patient and from a normal patient.]
Use semi-supervised classification.

Why do we need a Stopping Criterion?
[Figure: the same cardiac tamponade patient and normal patient data after further iterations of semi-supervised classification.]
Oops… We are adding false positives!

Our Contribution
• A novel, parameter-free stopping criterion for semi-supervised time series classification, based on Minimum Description Length (MDL)
• Allows easy adaptation by experts in the medical community


Minimum Description Length (MDL)
• MDL is a formalization of Occam's Razor
• The best hypothesis for a given set of data is the one that leads to the best compression of the data.
• Why MDL?
  • Intrinsically parameter-free
  • Leverages the true underlying structure of the data
  • Avoids needing to explain all of the data
  • Has recently shown great potential for real-valued time series data
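In its common two-part form, MDL makes "best compression" precise: choose the hypothesis that minimizes the bits needed to describe the hypothesis plus the bits needed to describe the data given that hypothesis. The symbols below ($DL$ for description length, $H$ for a hypothesis, $D$ for the data) are our notation, not taken from the slides:

```latex
H^{*} = \arg\min_{H} \Big[ \, DL(H) + DL(D \mid H) \, \Big]
```

A more complex $H$ may explain $D$ more cheaply, but it must pay for itself in the first term; that trade-off is the Occam's Razor behaviour referred to above.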


Our Approach
[Figure: the original time series, with the given positive instance highlighted.]
• Discretize the time series
• Repeat
  • Find the nearest neighbor of the positive instance set
  • Calculate the BitCount
• Until BitCount increases
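The loop above can be sketched in Python as follows. This is a hedged reconstruction, not the authors' code. It assumes the series is already discretized and cut into equal-length instances, uses Euclidean distance for the nearest-neighbor step, and encodes every added instance by its mismatches against the initial hypothesis H (the scheme suggested by the worked BitCount example). All function names are hypothetical.

```python
import math

def mismatch_bits(h, x, cardinality):
    """Bits to encode instance x as its point-wise differences from hypothesis h
    (hypothetical scheme: position bits + value bits per mismatched point)."""
    pos_bits = math.ceil(math.log2(len(h)))   # index of a mismatched point
    val_bits = math.log2(cardinality)         # discrete value stored at that point
    mismatches = sum(1 for a, b in zip(h, x) if a != b)
    return mismatches * (pos_bits + val_bits)

def bit_count(h, encoded, n_unencoded, cardinality):
    """DL(H) + cost of instances encoded against H + cost of raw instances."""
    raw = len(h) * math.log2(cardinality)     # bits for one uncompressed instance
    return (raw
            + sum(mismatch_bits(h, x, cardinality) for x in encoded)
            + n_unencoded * raw)

def euclidean(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def mdl_self_training(h, unlabeled, cardinality=16):
    """Greedily add the nearest neighbor of the positive set until BitCount rises."""
    positives = [h]
    pool = list(unlabeled)
    encoded = []
    best = bit_count(h, encoded, len(pool), cardinality)
    while pool:
        # Unlabeled instance closest to any member of the current positive set.
        nxt = min(pool, key=lambda x: min(euclidean(x, p) for p in positives))
        pool.remove(nxt)
        cost = bit_count(h, encoded + [nxt], len(pool), cardinality)
        if cost > best:
            break                  # BitCount increased: stop and reject nxt
        encoded.append(nxt)
        positives.append(nxt)
        best = cost
    return encoded
```

Encoding every instance against the single initial H, rather than against its nearest accepted neighbor, is the simplest reading of the worked example; other encodings are possible.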

Discrete Normalization (Why?)
• MDL is defined in discrete space
• Time series are real-valued
• Need to normalize real-valued data into a space of reduced cardinality
Won't such a drastic reduction lose meaningful information?
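One simple way to map a real-valued series onto 16 levels is min-max scaling followed by rounding. The slides do not spell out the exact scheme, so the function below (`discrete_normalize`, a hypothetical name) is only an illustrative sketch under that assumption.

```python
def discrete_normalize(series, cardinality=16):
    """Map a real-valued series onto integer levels 1..cardinality.

    Assumed scheme: min-max scaling, then rounding to the nearest level.
    """
    lo, hi = min(series), max(series)
    if hi == lo:                      # flat series: map every point to the lowest level
        return [1] * len(series)
    scale = (cardinality - 1) / (hi - lo)
    return [round((v - lo) * scale) + 1 for v in series]
```

With `cardinality=16`, each point needs exactly log2 16 = 4 bits, which is the per-point cost used in the BitCount calculations.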

Will Discrete Normalization Lose Meaningful Information?
• The answer is NO!
• Justification?
  • A time series clustering experiment* (REF: [1][2])
[Figure: clustering of the real-valued time series and of the discretized time series (cardinality = 16).]
[1] B. Hu et al., Discovering the Intrinsic Cardinality and Dimensionality of Time Series using MDL (2011)
[2] T. Rakthanmanon et al., Time Series Epenthesis: Clustering Time Series Streams Requires Ignoring Some Data (2011)
*Incartdb dataset (Record I70, Signal II) [www.physionet.org]



Our Approach: Iteration 0
[Figure: hypothesis H, and a plot of BitCount versus number of instances encoded.]
$\text{BitCount} = 100 \log_2 16 + 6 \cdot 100 \log_2 16 = 2800$
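All the BitCount calculations in this walkthrough follow one pattern. Writing $L$ for the instance length (100), $c$ for the cardinality (16), $m_k$ for the total number of mismatched points in the instances encoded against H after iteration $k$, and $u_k$ for the number of instances still stored uncompressed (these symbols are ours), the BitCount at iteration $k$ is:

```latex
\text{BitCount}_k = \underbrace{L \log_2 c}_{\text{hypothesis } H}
  + m_k \left( \lceil \log_2 L \rceil + \log_2 c \right)
  + u_k \, L \log_2 c
```

We read the middle term as the bits needed to store the position ($\lceil \log_2 100 \rceil = 7$ bits) and the value ($\log_2 16 = 4$ bits) of each mismatched point, so each mismatch costs 11 bits and each uncompressed instance costs $100 \cdot 4 = 400$ bits. Iteration 0 has $m_0 = 0$ and $u_0 = 6$, giving $400 + 6 \cdot 400 = 2800$.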

Our Approach: Iteration 1
[Figure: hypothesis H, and BitCount versus number of instances encoded for iterations 0 and 1.]
$\text{BitCount} = 100 \log_2 16 + 6 \left( \lceil \log_2 100 \rceil + \log_2 16 \right) + 5 \cdot 100 \log_2 16 = 2466$

Our Approach: Iteration 2
[Figure: hypothesis H, and BitCount versus number of instances encoded for iterations 0 to 2.]
$\text{BitCount} = 100 \log_2 16 + 22 \left( \lceil \log_2 100 \rceil + \log_2 16 \right) + 4 \cdot 100 \log_2 16 = 2242$

Our Approach: Iteration 3
[Figure: hypothesis H, and BitCount versus number of instances encoded for iterations 0 to 3.]
$\text{BitCount} = 100 \log_2 16 + 37 \left( \lceil \log_2 100 \rceil + \log_2 16 \right) + 3 \cdot 100 \log_2 16 = 2007$

Our Approach: Iteration 4
[Figure: hypothesis H, and BitCount versus number of instances encoded for iterations 0 to 4.]
$\text{BitCount} = 100 \log_2 16 + 115 \left( \lceil \log_2 100 \rceil + \log_2 16 \right) + 2 \cdot 100 \log_2 16 = 2465$

Our Approach: Iteration 5
[Figure: hypothesis H, and BitCount versus number of instances encoded for iterations 0 to 5, with the stopping point marked.]
$\text{BitCount} = 100 \log_2 16 + 192 \left( \lceil \log_2 100 \rceil + \log_2 16 \right) + 1 \cdot 100 \log_2 16 = 2912$
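The arithmetic on the iteration slides can be checked with a few lines of Python. The mismatch totals and uncompressed-instance counts are the ones shown on the slides; reading each mismatch as position bits plus value bits is our interpretation. Run as written, BitCount is smallest after iteration 3 and first rises at iteration 4, which is where the "until BitCount increases" rule stops.

```python
import math

L, C = 100, 16                                          # instance length, cardinality
RAW = L * math.log2(C)                                  # 400 bits per uncompressed instance
PER_MISMATCH = math.ceil(math.log2(L)) + math.log2(C)   # 7 + 4 = 11 bits per mismatch

def bit_count(mismatches, uncompressed):
    """DL(H) + bits for mismatches against H + bits for raw instances."""
    return RAW + mismatches * PER_MISMATCH + uncompressed * RAW

# (total mismatches, instances still uncompressed) for iterations 0..5, from the slides
schedule = [(0, 6), (6, 5), (22, 4), (37, 3), (115, 2), (192, 1)]
counts = [bit_count(m, u) for m, u in schedule]
print([int(b) for b in counts])   # [2800, 2466, 2242, 2007, 2465, 2912]

# First iteration whose BitCount exceeds the previous one; keep the iteration before it.
stop = next(k for k in range(1, len(counts)) if counts[k] > counts[k - 1]) - 1
print(stop, int(counts[stop]))    # 3 2007
```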

Experimental Results
• We worked with roughly one hour of data.

Interpreting the plots
[Figure: four schematic plots of BitCount versus number of instances encoded, each showing the positive and negative regions and the stopping point, labeled:]
• Ideal
• Bad, adds false positives
• Bad, misses true positives
• Really bad
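When the true labels are known, as in these experiments, the first three schematic cases can be told apart by counting errors at the stopping point. The sketch below assumes the ground-truth labels are available in the order the instances were encoded; the function and argument names are hypothetical, and it does not attempt to reproduce the "Really bad" panel.

```python
def assess_stop(labels_in_order, stop_index):
    """Count errors at a stopping point.

    labels_in_order: ground-truth labels (True = positive) in encoding order.
    stop_index: number of instances accepted before stopping.
    Returns (false positives added, true positives missed, verdict).
    """
    accepted = labels_in_order[:stop_index]
    rejected = labels_in_order[stop_index:]
    false_pos = sum(1 for y in accepted if not y)
    missed = sum(1 for y in rejected if y)
    if false_pos == 0 and missed == 0:
        verdict = "ideal"
    elif missed == 0:
        verdict = "adds false positives"
    elif false_pos == 0:
        verdict = "misses true positives"
    else:
        verdict = "adds false positives and misses true positives"
    return false_pos, missed, verdict
```

For example, with three positives encoded before two negatives, stopping after 3 instances is ideal, after 4 adds a false positive, and after 2 misses a true positive.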

Experimental Results
[Figure: BitCount (×10^5) versus number of instances encoded for the svdb, incartdb, and sddb datasets, each showing the positive and negative instances and the stopping point.]

Experimental Results (Contd.)
[Figure: BitCount (×10^5) versus number of instances encoded for the Fish_test, Swedish_leaf, and FaceAll_test datasets, each with the stopping point marked.]

Comparison with the state-of-the-art algorithm
[Figure, Fish_test dataset. Top: minimal distance versus number of instances classified, where Wei et al.'s method* stops too early. Bottom: BitCount versus number of instances encoded, with the MDL stopping point.]
* L. Wei et al., Semi-Supervised Time Series Classification (2006)

Conclusions
• A novel approach to semi-supervised classification with only one labeled instance.
• Previous approaches to stopping semi-supervised classification
  • required extensive parameter tuning, and
  • remained something of a black art.
• We propose a stopping criterion for semi-supervised classification based on MDL.
• To our knowledge, our stopping criterion is the first parameter-free criterion that
  • mitigates the early stopping problem, and
  • leverages the inherent structure of the data.

Thank you!
If you have any questions, please contact me:
Name: Nurjahan Begum
Email: nbegu001@ucr.edu
