ShiftTree : an Interpretable Model-Based Approach for Time Series Classification

ShiftTree: an InterpretableModel-BasedApproachfor Time Series Classification Balázs Hidasi - hidasi@tmit.bme.hu Csaba Gáspár-Papanek - gaspar@tmit.bme.hu Budapest University of Technology and Economics, Hungary ECML/PKDD, 6th September 2011, Athens

Outline • General concept • Task & adaptation • Labeling & learning • Evaluation • Forest methods • Conclusion

General Concept • Not limited totime series • Time series • Semi-structureddata • Graphs • Twoquestions: • „Wheretolookat?” • „Whattoobserve?” • Operator families • EyeShifter Operator(s) (ESO) • ConditionBuilder Operators (CBO) • Dynamicattributes • Model: sequence of simpleoperatos / rules F(x)

Time Series Classification Model • Time series • Withoneobservedvariable • Evenlysampled • The algorithm has no suchlimitations • Standard classificationtask • Focusingonaccuracy • Goals • Satisfyingaccuracy • Model-basedinstead of memory-based • Interpretablemodels Learning Model Labeling

Explainingthegoals • Whymodel-based? • Labeling is muchfaster • Canlearn more generalproperties of theclasses • Needs more trainingsamples • What is interpretabilitygoodfor? • Gettingusertrust • Helpinginunderstandingthedata

ConceptAppliedto Time Series • Dynamicattributes of time series • EyeShifter Operator (ESO): „Wheretolookat?” • Moves a cursoralongthetimeaxis • To a specifiedpoint of the series • E.g.: „Tothenext local minimum”, „100 stepsahead”, etc. • Note: result of operator sequencedepensontheorder • ConditionBuilder Operator (CBO): „Whattoobserve?” • Computesthedynamicattribute • E.g.: „Valueatthegiventime”, „Length of thejump”, etc. • Decisiontreeasmodel • Works wellwiththedynamicattributeconcept • Operator sequencefromroottoleaf F(x) Weightedaverage

Labeling Dynamicattributevalue:0,767139 Dynamicattributevalue:0,836848 Level 0 ESO: Toglobalmax CBO: Valueattime Threshold: 1,20775 Level 1 Level 1 ESO: Stepforward 100 ESO: Tonext local max CBO: Valueattime CBO: Valueattime Threshold : 0,390093 Threshold : -0,700739 Level 2 Level 2 Level 2 Level 2 Leaf Leaf Leaf Leaf

Learning (1/3) • Operator pooldefinition: ESOs, CBOs • Rootnode • Cursorstartsatthebeginning • May be overridden

F(x) Learning (2/3) • Availabledynamicattributes (CBO pool) • At/aroundreachablepositions • Selecting a… • Attribute • Thresholdvalue Minimizeweightedchildnodeentropy

Learning (3/3) • Best splitselected • Cursor is movedby ESO • Reachablepositionswillchange • Trainsetsplit • Childnodescontinuethelearningprocess • Stop: homogenousnodes

Interpretability • Dependsonthe operators • Helpsundepstandingthedata • E.g.: CBF dataset • Z-normalized, 3 class: Cylinder, Bell, Funnel • Distinguishcylinderfrombell and funnelbyglobal maxima • The data is z-normalized (standardized) • Distinguishbellfromfunnelbystepping back 25 steps + noisefilteringthroughweightedaverage • Onwhichside is thepeak

Performance of thealgorithm • Databases • UCR: mostlysmallerdatasets (20) • Ford: largerdatasets (2) • No optimizationsintheexperiments • Performance • Betteronlargerdatasets (model-based)

Advantages and drawbacks • Advantages • Advantages of being modelbased (fastlabeling, etc) • Interpretable (depenson operators) • Can be usedindependently of applicationdomain • Expert’sknowledgecan be builtinthrough operators • Disadvantages • Optimization of the operators is nottrivial • Learningmaytake a longtimewhentoomany operators used • Disadvantages of themodelbasedmethods (largertrainingsetrequired, etc)

ShiftForest: combiningmultiplemodels • Combiningmodelsenhancesaccuracy • Boosting • Someissuesonsmallerdatasets • The „XV” method • Splittingtrainingset (random splits) • Building modelonthefirst part, evaluatingonthesecond • Modelweight: themeasuredaccuracy

ShiftForestresults • Accuracyincreases • Interpretability is oftenlost • May theintersection of thetreescan be interpreted

Conclusion • Novelmodel-basedalgorithm • Usescursors and operators • Cursormovement • Attributecomputation • Sequence of simple operators asthemodel • Decisiontree • Base of a wholealgorithmfamily

Thankyouforyourattention! Balázs Hidasi (hidasi@tmit.bme.hu) Csaba Gáspár-Papanek (gaspar@tmit.bme.hu) For more ShiftTree related research visit my side:http://www.hidasi.eu

Time of model building, scalability • O(N*log(N)) per node(ordering) • Time of model building • Dependsontreestructure • Acceptable • Scaleslinearly • Experiments

Learningthemodel EyeShifter – Cursorposition ESONextMax ESONext100 ESOMax CBOSimple ShiftTree ConditionBuilder - Ordering F(x) Best splitinnode (so far) ESO: ESONext100 - ESOMax - ESONextMax - - ESONextMax ESONextMax Dynamicattributeselection - CBOSimple - - CBOSimple - CBOSimple CBO: M Threshold: - - 1,02775 -0,332858 -0,739969 - -0,700739 - 0,390093 Best thresholdbycurrentdynamicattribute: - 1,02775 2,48622 -1,16155 -0,332858 -0,700739 - -0,739969 0,77183 0,390093 - 0,369887 Score of thebestsplitbycurrentdyn. attribute : 0,805777 0 0 Inf Inf 0,985078 1,32624 0,600131 Inf 0,983634 Best scoreinthenode (so far): Inf 0 0,805777 0 0,983634 0,985078 Inf Inf Best ordering:

ShiftTree : an Interpretable Model-Based Approach for Time Series Classification