190 likes | 307 Vues
ShiftTree presents an innovative model-based strategy for time series classification, utilizing cursor movements and operator sequences to achieve interpretable models. It emphasizes faster labeling and the ability to learn general properties of classes from data. The approach integrates EyeShifter Operators (ESO) and ConditionBuilder Operators (CBO) to dynamically compute attributes and guide the classification process. Evaluating model performance across various datasets shows promising accuracy, particularly in larger datasets, establishing ShiftTree as a valuable framework for time series analysis.
E N D
ShiftTree: an InterpretableModel-BasedApproachfor Time Series Classification Balázs Hidasi - hidasi@tmit.bme.hu Csaba Gáspár-Papanek - gaspar@tmit.bme.hu Budapest University of Technology and Economics, Hungary ECML/PKDD, 6th September 2011, Athens
Outline • General concept • Task & adaptation • Labeling & learning • Evaluation • Forest methods • Conclusion
General Concept • Not limited totime series • Time series • Semi-structureddata • Graphs • Twoquestions: • „Wheretolookat?” • „Whattoobserve?” • Operator families • EyeShifter Operator(s) (ESO) • ConditionBuilder Operators (CBO) • Dynamicattributes • Model: sequence of simpleoperatos / rules F(x)
Time Series Classification Model • Time series • Withoneobservedvariable • Evenlysampled • The algorithm has no suchlimitations • Standard classificationtask • Focusingonaccuracy • Goals • Satisfyingaccuracy • Model-basedinstead of memory-based • Interpretablemodels Learning Model Labeling
Explainingthegoals • Whymodel-based? • Labeling is muchfaster • Canlearn more generalproperties of theclasses • Needs more trainingsamples • What is interpretabilitygoodfor? • Gettingusertrust • Helpinginunderstandingthedata
ConceptAppliedto Time Series • Dynamicattributes of time series • EyeShifter Operator (ESO): „Wheretolookat?” • Moves a cursoralongthetimeaxis • To a specifiedpoint of the series • E.g.: „Tothenext local minimum”, „100 stepsahead”, etc. • Note: result of operator sequencedepensontheorder • ConditionBuilder Operator (CBO): „Whattoobserve?” • Computesthedynamicattribute • E.g.: „Valueatthegiventime”, „Length of thejump”, etc. • Decisiontreeasmodel • Works wellwiththedynamicattributeconcept • Operator sequencefromroottoleaf F(x) Weightedaverage
Labeling Dynamicattributevalue:0,767139 Dynamicattributevalue:0,836848 Level 0 ESO: Toglobalmax CBO: Valueattime Threshold: 1,20775 Level 1 Level 1 ESO: Stepforward 100 ESO: Tonext local max CBO: Valueattime CBO: Valueattime Threshold : 0,390093 Threshold : -0,700739 Level 2 Level 2 Level 2 Level 2 Leaf Leaf Leaf Leaf
Learning (1/3) • Operator pooldefinition: ESOs, CBOs • Rootnode • Cursorstartsatthebeginning • May be overridden
F(x) Learning (2/3) • Availabledynamicattributes (CBO pool) • At/aroundreachablepositions • Selecting a… • Attribute • Thresholdvalue Minimizeweightedchildnodeentropy
Learning (3/3) • Best splitselected • Cursor is movedby ESO • Reachablepositionswillchange • Trainsetsplit • Childnodescontinuethelearningprocess • Stop: homogenousnodes
Interpretability • Dependsonthe operators • Helpsundepstandingthedata • E.g.: CBF dataset • Z-normalized, 3 class: Cylinder, Bell, Funnel • Distinguishcylinderfrombell and funnelbyglobal maxima • The data is z-normalized (standardized) • Distinguishbellfromfunnelbystepping back 25 steps + noisefilteringthroughweightedaverage • Onwhichside is thepeak
Performance of thealgorithm • Databases • UCR: mostlysmallerdatasets (20) • Ford: largerdatasets (2) • No optimizationsintheexperiments • Performance • Betteronlargerdatasets (model-based)
Advantages and drawbacks • Advantages • Advantages of being modelbased (fastlabeling, etc) • Interpretable (depenson operators) • Can be usedindependently of applicationdomain • Expert’sknowledgecan be builtinthrough operators • Disadvantages • Optimization of the operators is nottrivial • Learningmaytake a longtimewhentoomany operators used • Disadvantages of themodelbasedmethods (largertrainingsetrequired, etc)
ShiftForest: combiningmultiplemodels • Combiningmodelsenhancesaccuracy • Boosting • Someissuesonsmallerdatasets • The „XV” method • Splittingtrainingset (random splits) • Building modelonthefirst part, evaluatingonthesecond • Modelweight: themeasuredaccuracy
ShiftForestresults • Accuracyincreases • Interpretability is oftenlost • May theintersection of thetreescan be interpreted
Conclusion • Novelmodel-basedalgorithm • Usescursors and operators • Cursormovement • Attributecomputation • Sequence of simple operators asthemodel • Decisiontree • Base of a wholealgorithmfamily
Thankyouforyourattention! Balázs Hidasi (hidasi@tmit.bme.hu) Csaba Gáspár-Papanek (gaspar@tmit.bme.hu) For more ShiftTree related research visit my side:http://www.hidasi.eu
Time of model building, scalability • O(N*log(N)) per node(ordering) • Time of model building • Dependsontreestructure • Acceptable • Scaleslinearly • Experiments
Learningthemodel EyeShifter – Cursorposition ESONextMax ESONext100 ESOMax CBOSimple ShiftTree ConditionBuilder - Ordering F(x) Best splitinnode (so far) ESO: ESONext100 - ESOMax - ESONextMax - - ESONextMax ESONextMax Dynamicattributeselection - CBOSimple - - CBOSimple - CBOSimple CBO: M Threshold: - - 1,02775 -0,332858 -0,739969 - -0,700739 - 0,390093 Best thresholdbycurrentdynamicattribute: - 1,02775 2,48622 -1,16155 -0,332858 -0,700739 - -0,739969 0,77183 0,390093 - 0,369887 Score of thebestsplitbycurrentdyn. attribute : 0,805777 0 0 Inf Inf 0,985078 1,32624 0,600131 Inf 0,983634 Best scoreinthenode (so far): Inf 0 0,805777 0 0,983634 0,985078 Inf Inf Best ordering: