
The Automatic Explanation of Multivariate Time Series (MTS)



Presentation Transcript


  1. The Automatic Explanation of Multivariate Time Series (MTS) • Allan Tucker

  2. The Problem - Data • Datasets which are Characteristically: • High Dimensional MTS • Large Time Lags • Changing Dependencies • Little or No Available Expert Knowledge

  3. The Problem - Requirement • Lack of Algorithms to Assist Users in Explaining Events, where the Algorithms: • Model Complex MTS Data • Are Learnable from Data with Little or No User Intervention • Transparency Throughout the Learning and Explaining Process is Vital

  4. Contribution to Knowledge • Using a Combination of Evolutionary Programming (EP) and Bayesian Networks (BNs) to Overcome Issues Outlined • Extending Learning Algorithms for BNs to Dynamic Bayesian Networks (DBNs) with Comparison of Efficiency • Introduction of an Algorithm for Decomposing High Dimensional MTS into Several Lower Dimensional MTS

  5. Contribution to Knowledge (Continued) • Introduction of New EP-Seeded GA Algorithm • Incorporating Changing Dependencies • Application to Synthetic and Real-World Chemical Process Data • Transparency Retained Throughout Each Stage

  6. Framework • Pre-processing • Data Preparation • Variable Groupings • Model Building • Search Methods • Synthetic Data • Evaluation • Real Data • Changing Dependencies • Explanation

  7. Key Technical Points 1: Comparing Adapted Algorithms • New Representation • K2/K3 [Cooper and Herskovits] • Genetic Algorithm [Larrañaga] • Evolutionary Algorithm [Wong] • Branch and Bound [Bouckaert] • Log Likelihood / Description Length • Publications: • International Journal of Intelligent Systems, 2001
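The slide names log likelihood as a scoring metric but gives no formula. The sketch below is the standard maximum-likelihood score for a discrete network, adapted to lagged parents; it is an illustration under that assumption, not the scoring code used in the talk. The function name `log_likelihood` and the `(parent, child, lag)` triple layout are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

def log_likelihood(data, links, max_lag):
    """Maximum-likelihood log P(data | structure) for a discrete DBN.

    data  : list of rows, one row per time step, one discrete value per variable
    links : list of (parent, child, lag) triples (assumed layout)
    """
    parents = defaultdict(list)
    for parent, child, lag in links:
        parents[child].append((parent, lag))

    total = 0.0
    for child in range(len(data[0])):
        joint, marginal = Counter(), Counter()
        # Start at max_lag so every lagged parent value exists.
        for t in range(max_lag, len(data)):
            config = tuple(data[t - lag][p] for p, lag in parents[child])
            joint[(config, data[t][child])] += 1
            marginal[config] += 1
        # Sum of n * log(n / n_config) over observed (config, value) pairs.
        total += sum(n * math.log(n / marginal[cfg])
                     for (cfg, _), n in joint.items())
    return total
```

A structure that captures a real lagged dependency should score higher (closer to zero) than the empty structure on the same data. A description-length score would subtract a complexity penalty from this value; the penalty is not specified on the slide and is left out here.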

  8. Key Technical Points 2: Grouping • A Number of Correlation Searches • A Number of Grouping Algorithms • Designed Metrics • Comparison of All Combinations • Synthetic and Real Data • Publications: • IDA99 • IEEE Trans System Man and Cybernetics 2001 • Expert Systems 2000

  9. Key Technical Points 3: EP-Seeded GA • Approximate Correlation Search Based on the One Used in Grouping Strategy • Results Used to Seed Initial Population of GA • Uniform Crossover • Specific Lag Mutation • Publications: • Genetic Algorithms and Evolutionary Computation Conference 1999 (GECCO99) • International Journal of Intelligent Systems, 2001 • IDA2001
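The three operators named on this slide (seeding, uniform crossover, lag mutation) can be sketched on chromosomes that are lists of `(a, b, lag)` triples. This is a minimal illustration, not the published operators: the function names, the lag range `[1, max_lag]`, and the choice to sample seeds without replacement are all assumptions.

```python
import random

def seed_population(ep_list, pop_size, chromosome_len, rng):
    """Build each initial chromosome by sampling triples from the EP result list.

    Assumes chromosome_len <= len(ep_list), since sampling is without replacement.
    """
    return [rng.sample(ep_list, chromosome_len) for _ in range(pop_size)]

def uniform_crossover(parent_a, parent_b, rng):
    """Swap each aligned triple between the two parents with probability 0.5."""
    child_a, child_b = list(parent_a), list(parent_b)
    for i in range(min(len(child_a), len(child_b))):
        if rng.random() < 0.5:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b

def lag_mutation(chromosome, max_lag, rate, rng):
    """With probability `rate` per triple, redraw only the lag; keep (a, b) fixed."""
    return [
        (a, b, rng.randint(1, max_lag)) if rng.random() < rate else (a, b, lag)
        for a, b, lag in chromosome
    ]
```

The parameter slides list `MR`, `CR` and `LMR` values for the EP-Seeded GA; `LMR` is presumably the rate that would be passed as `rate` here, though the slides do not spell the abbreviation out.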

  10. Key Technical Points 4: Changing Dependencies • Dynamic Cross Correlation Function for Analysing MTS • Extend Representation • Introduce a Heuristic Search - Hidden Controller Hill Climb (HCHC) • Hidden Variables to Model State of the System • Search for Structure and Hidden States Iteratively

  11. Future Work • Parameter Estimation • Discretisation • Changing Dependencies • Efficiency • New Datasets • Gene Expression Data • Visual Field Data

  12. DBN Representation • Triples: (3,1,4) (4,2,3) (2,3,2) (3,0,2) (3,4,2) • [Diagram nodes: a0(t), a1(t), a2(t-2), a2(t), a3(t-4), a3(t-2), a3(t), a4(t-3), a4(t); time axis: t-4, t-3, t-2, t-1, t]
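The triples on this slide line up with the diagram nodes if each one is read as (parent, child, lag): for example (3,1,4) matches a3(t-4) and a1(t). The sketch below encodes that reading, which is an inference from the slide rather than something it states; the helper names are made up for the example.

```python
from collections import defaultdict

# Triples from the slide, read here as (parent, child, lag). Assumed reading.
LINKS = [(3, 1, 4), (4, 2, 3), (2, 3, 2), (3, 0, 2), (3, 4, 2)]

def parents_of(links):
    """Map each child variable to its list of (parent, lag) pairs."""
    table = defaultdict(list)
    for parent, child, lag in links:
        table[child].append((parent, lag))
    return dict(table)

def describe(links):
    """Render each link as 'a<parent>(t-<lag>) -> a<child>(t)'."""
    return [f"a{p}(t-{lag}) -> a{c}(t)" for p, c, lag in links]

if __name__ == "__main__":
    for line in describe(LINKS):
        print(line)
```

A flat list of triples keeps the structure easy to mutate and recombine, which is convenient for the evolutionary searches the talk compares.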

  13. Sample DBN Search Results • [Plots: N = 5, MaxT = 10; N = 10, MaxT = 60]

  14. Grouping • One High Dimensional MTS (A) • 1. Correlation Search (EP): List of Triples 1, 2, ..., R, Each of the Form (a, b, lag) • 2. Grouping Algorithm (GGA): Groups G, e.g. {0,3} {1,4,5} {2} • Several Lower Dimensional MTS
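The first stage on this slide produces a list of R triples (a, b, lag). The sketch below shows the idea with plain random sampling swapped in for the evolutionary programming search, so it is a stand-in, not the EP algorithm from the talk. Treating `calls` as the evaluation budget and `list_size` as R is an assumption, as is the use of absolute Pearson correlation as the score.

```python
import random
from statistics import mean

def lagged_correlation(x, y, lag):
    """Pearson correlation between x(t-lag) and y(t)."""
    xs, ys = x[:len(x) - lag], y[lag:]
    mx, my = mean(xs), mean(ys)
    cov = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    var_x = sum((u - mx) ** 2 for u in xs)
    var_y = sum((v - my) ** 2 for v in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series has no defined correlation
    return cov / (var_x * var_y) ** 0.5

def random_correlation_search(series, max_lag, calls, list_size, rng):
    """Score `calls` random (a, b, lag) triples; return the `list_size` strongest."""
    n = len(series)
    scored = {}
    for _ in range(calls):
        a, b = rng.sample(range(n), 2)   # two distinct variables
        lag = rng.randint(1, max_lag)
        if (a, b, lag) not in scored:
            scored[(a, b, lag)] = abs(lagged_correlation(series[a], series[b], lag))
    ranked = sorted(scored, key=scored.get, reverse=True)
    return ranked[:list_size]
```

The returned list is what a grouping algorithm would then consume: variables that share strong triples are candidates for the same lower dimensional group.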

  15. Sample Grouping Results • [Figures: Original Synthetic MTS Groupings (variables 0 to 60); Groupings Discovered from Synthetic Data; Sample of Variables from a Discovered Oil Refinery Data Group]

  16. Parameter Estimation • Simulate Random Bag (Vary R, s and c, e) • Calculate Mean and SD for Each Distribution (the Probability of Selecting e from s) • Test for Normality (Lilliefors’ Test) • Symbolic Regression (GP) to Determine the Function for Mean and SD from R, s and c (e will be Unknown) • Place Confidence Limits on the P(Number of Correlations Found e)

  17. EP-Seeded GA • EP • Final EPList: 0: (a,b,l), 1: (a,b,l), 2: (a,b,l), ..., EPListSize: (a,b,l) • Initial GAPopulation: 0: ((a,b,l),(a,b,l)…(a,b,l)), 1: ((a,b,l),(a,b,l)…(a,b,l)), 2: ((a,b,l),(a,b,l)…(a,b,l)), ..., GAPopsize: ((a,b,l) … (a,b,l)) • GA • DBN

  18. EP-Seeded GA Results • [Plots: N = 10, MaxT = 60; N = 20, MaxT = 60]

  19. Varying the value of c

  20. Time Explanation • Time Points: t, t-1, t-11, t-13, t-16, t-20, t-60 • P(TT in state_0) = 1.0 • P(TGF in state_0) = 1.0 • P(BPF in state_3) = 1.0 • P(TGF in state_3) = 1.0 • P(TT in state_1) = 0.446 • P(SOT in state_0) = 0.314 • P(C2% in state_0) = 0.279 • P(T6T in state_0) = 0.347 • P(RinT in state_0) = 0.565

  21. Changing Dependencies • [Plot: Variable Magnitude of A/M_GB and TGF against Time (Minutes), 1 to 3501]

  22. Dynamic Cross-Correlation Function
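The slides do not define the dynamic cross-correlation function. One plausible reading is a cross-correlation function recomputed over a sliding window, so that a change in the dominant lag shows up over time. The sketch below implements that reading only; the names `win_len` and `win_jump` echo the Winlen and Winjump parameters that the slides list for HCHC, and reusing them here is an assumption.

```python
from statistics import mean

def lagged_correlation(x, y, lag):
    """Pearson correlation between x(t-lag) and y(t)."""
    xs, ys = x[:len(x) - lag], y[lag:]
    mx, my = mean(xs), mean(ys)
    cov = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    var_x = sum((u - mx) ** 2 for u in xs)
    var_y = sum((v - my) ** 2 for v in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant window has no defined correlation
    return cov / (var_x * var_y) ** 0.5

def dynamic_cross_correlation(x, y, max_lag, win_len, win_jump):
    """Return one row per window: correlations at lags 0, 1, ..., max_lag."""
    rows = []
    for start in range(0, len(x) - win_len + 1, win_jump):
        wx = x[start:start + win_len]
        wy = y[start:start + win_len]
        rows.append([lagged_correlation(wx, wy, lag)
                     for lag in range(max_lag + 1)])
    return rows
```

Taking the position of the largest value in each row gives the dominant lag per window; a shift in that position between windows suggests a changing dependency.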

  23. Hidden Variable - OpState • [Diagram nodes: a0(t-4), a2(t-1), a2(t), OpState2, a3(t-2); time axis: t-4, t-3, t-2, t-1, t]

  24. Hidden Controller Hill Climb • <DBN_List> • <Segment_Lists> • Update Segment_Lists through Op_State Parameter Estimation • Score • Update DBN_List through DBN Structure Search

  25. HCHC Results - Oil Refinery Data

  26. HCHC Results - Synthetic Data • Generate Data from Several DBNs • Append each Section of Data Together to Form One MTS with Changing Dependencies • Run HCHC
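The data-generation steps on this slide can be sketched as follows. This is a simplified illustration, not the generator used for the reported results: variables are binary, each child has at most one parent, and a child copies its lagged parent with probability `fidelity`. All of these choices, and the function names, are assumptions.

```python
import random

def generate_segment(links, n_vars, length, max_lag, rng, fidelity=0.9):
    """Sample binary series where a child copies its lagged parent.

    links : list of (parent, child, lag) triples with 1 <= lag <= max_lag
    """
    # If a child appears in several triples, the last one wins (simplification).
    parent_of = {child: (parent, lag) for parent, child, lag in links}
    total = length + max_lag  # extra leading rows so every lag is defined
    data = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(total)]
    for t in range(max_lag, total):
        for child, (parent, lag) in parent_of.items():
            if rng.random() < fidelity:
                data[t][child] = data[t - lag][parent]
    return data[max_lag:]

def generate_changing_mts(link_sets, n_vars, seg_len, max_lag, rng):
    """Append one segment per link set to form one MTS with changing dependencies."""
    mts = []
    for links in link_sets:
        mts.extend(generate_segment(links, n_vars, seg_len, max_lag, rng))
    return mts
```

For example, `generate_changing_mts([[(0, 1, 2)], [(1, 0, 3)]], 2, 300, 5, random.Random(0))` yields 600 rows in which the direction and lag of the dependency switch halfway through.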

  27. Time Explanation • Time Points: t, t-1, t-3, t-5, t-6, t-9 • P(OpState1 is 0) = 1.0 • P(a1 is 0) = 1.0 • P(a0 is 0) = 1.0 • P(a2 is 1) = 1.0 • P(OpState1 is 0) = 1.0 • P(a1 is 1) = 1.0 • P(a0 is 0) = 1.0 • P(a2 is 1) = 1.0 • P(a2 is 0) = 0.758 • P(OpState0 is 0) = 0.519 • P(a0 is 0) = 0.968 • P(OpState0 is 0) = 0.720 • P(a0 is 1) = 0.778 • P(a2 is 0) = 0.545 • P(a0 is 1) = 0.517

  28. Time Explanation • Time Points: t, t-1, t-3, t-5, t-6, t-7, t-9 • P(OpState1 is 4) = 1.0 • P(a1 is 0) = 1.0 • P(a0 is 0) = 1.0 • P(a2 is 1) = 1.0 • P(OpState1 is 4) = 1.0 • P(a1 is 1) = 1.0 • P(a0 is 0) = 1.0 • P(a2 is 1) = 1.0 • P(a2 is 1) = 0.570 • P(a0 is 0) = 0.506 • P(OpState2 is 3) = 0.210 • P(a2 is 1) = 0.974 • P(OpState2 is 4) = 0.222 • P(a2 is 0) = 0.882 • P(a0 is 1) = 0.549

  29. Process Diagram • [Diagram labels: TGF, %C3, TT, T6T, PGM, PGB, SOTT11, SOFT13, RINT, C11/3T, T36T, AFT, FF, RBT, BPF, %C2]

  30. Typical Discovered Relationships • [Diagram labels: TGF, %C3, PGM, TT, T6T, PGB, SOTT11, SOFT13, RINT, C11/3T, T36T, AFT, FF, RBT, BPF, %C2]

  31. Parameters • DBN Search (GA / EP): PopSize 100 / 10; MR 0.1 / 0.8; CR 0.8 / ---; Gen Based on FC / Based on FC • Correlation Search: c - Approx. 20% of s; R - Approx. 2.5% of s • Grouping GA (Synth. 1 / Synth. 2-6 / Oil): PopSize 150 / 100 / 150; CR 0.8 / 0.8 / 0.8; MR 0.1 / 0.1 / 0.1; Gen 150 / 100 (1000 for GPV) / 150

  32. Parameters • EP-Seeded GA: c - Approx. 20% of s; EPListSize - Approx. 2.5% of s; GAPopSize - 10; MR - 0.1; CR - 0.8; LMR - 0.1; Gen - Based on FC • HCHC (Oil / Synthetic): DBN_Iterations 1×10^6 / 5000; Winlen 1000 / 200; Winjump 500 / 50
