Bridging the Gap between Applications and Tools: Modeling Multivariate Time Series


Presentation Transcript


  1. Bridging the Gap between Applications and Tools: Modeling Multivariate Time Series X Liu, S Swift & A Tucker Department of Computer Science Birkbeck College University of London

  2. MTS Applications at Birkbeck • Screening • Forecasting • Explanation

  3. Forecasting • Predicting Visual Field Deterioration of Glaucoma Patients • Function Prediction for Novel Proteins from Multiple Sequence/Structure Data

  4. Explanation • Input (observations): t - 0 : Tail Gas Flow in_state 0; t - 3 : Reboiler Temperature in_state 1 • Output (explanation): t - 7 : Top Temperature in_state 0 with probability=0.92; t - 54 : Feed Rate in_state 1 with probability=0.71; t - 75 : Reactor Temperature in_state 0 with probability=0.65

  5. The Gaps • Screening • Automatic / Semi- Automatic Analysis of Outliers • Forecasting • Analysing Short Multivariate Time Series • Explanation • Coping with Huge Search Spaces

  6. The Problem - What/Why/How • Short-Term Forecasting of Visual Field Progression • Using a Statistical MTS Model • The Vector Auto-Regressive Process - VAR(P) • There Could be Problems if the MTS is Short • A Modified Genetic Algorithm (GA) can be Used • VARGA • The Prediction of Visual Field Deterioration Plays an Important Role in the Management of the Condition

  7. Background - The Dataset • The interval between tests is about 6 months • Typically, 76 points are measured • Values Range Between 60 = very good, 0 = blind • The number of tests can range between 10 and 44 • [Figure: numbered grid of the 76 visual field test points (Right Eye), with markers for the points used in this paper and for the usual position of the blind spot]

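  8. Background - The VAR Process • Vector Auto-Regressive Process of Order P, VAR(P): $x(t) = \sum_{i=1}^{P} A_i \, x(t-i) + \varepsilon(t)$ • $x(t)$: VF Test for Data Points at Time t ($K \times 1$) • $A_i$: Parameter Matrix at Lag i ($K \times K$) • $x(t-i)$: VF Test for Data Points at lag i from t ($K \times 1$) • $\varepsilon(t)$: Observational Noise at time t ($K \times 1$)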
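A minimal sketch of a one-step-ahead forecast from a VAR(P) model, assuming the standard form x(t) = A_1 x(t-1) + ... + A_P x(t-P) + noise, with the noise term dropped for the forecast. The function name `var_forecast` and the list-of-lists layout are illustrative assumptions, not taken from the slides.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def var_forecast(history: List[Vector], coeffs: List[Matrix]) -> Vector:
    """Forecast x(t) as sum_i A_i x(t-i); the noise term is omitted.

    history: past observations, oldest first, each a length-K vector.
    coeffs:  [A_1, ..., A_P], each a K x K matrix (A_i applies to lag i).
    """
    p = len(coeffs)
    if len(history) < p:
        raise ValueError("need at least P past observations")
    k = len(history[-1])
    forecast = [0.0] * k
    for lag, a in enumerate(coeffs, start=1):
        x_lag = history[-lag]  # x(t - lag)
        for row in range(k):
            forecast[row] += sum(a[row][col] * x_lag[col] for col in range(k))
    return forecast
```

With K = 76 field points and P lags, this needs P matrices of 76 x 76 parameters, which gives a sense of why short series are hard to fit.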

  9. The Genetic Algorithm “A Search/Optimisation method that solves a problem through maintaining and improving a population of suitable candidate solutions using biological metaphors” • Generate a Population of random Chromosomes (Solutions) • Repeat for a number of Generations: Cross Over the current Population; Mutate the current Population; Select the Fittest for the next Population • Loop • The best solution to the problem is the Chromosome in the last generation which has the highest Fitness
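The loop on this slide can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the name `evolve`, the random pairing of parents, mutating only the children, and keeping the fittest of parents plus children are all choices made here. The fitness, crossover and mutation operators are passed in by the caller.

```python
import random
from typing import Callable, List, Tuple

Chromosome = List[int]


def evolve(
    population: List[Chromosome],
    fitness: Callable[[Chromosome], float],
    crossover: Callable[[Chromosome, Chromosome, random.Random], Tuple[Chromosome, Chromosome]],
    mutate: Callable[[Chromosome, random.Random], Chromosome],
    generations: int,
    rng: random.Random,
) -> Chromosome:
    """Run a simple GA and return the fittest chromosome of the last generation."""
    size = len(population)
    for _ in range(generations):
        # Cross over: shuffle the population and pair neighbours as parents.
        parents = population[:]
        rng.shuffle(parents)
        children: List[Chromosome] = []
        for a, b in zip(parents[0::2], parents[1::2]):
            children.extend(crossover(a, b, rng))
        # Mutate (assumption: only the new children are mutated).
        children = [mutate(child, rng) for child in children]
        # Select the fittest of parents and children for the next population.
        pool = population + children
        pool.sort(key=fitness, reverse=True)
        population = pool[:size]
    return max(population, key=fitness)
```

Because survivors are drawn from parents and children together, the best fitness in the population never decreases from one generation to the next.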

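  10. GAs - Chromosome Example • Two variables (labelled Y and X on the slide) • Ranges: 0-127 and 0-31 • Bit encodings: 0000000-1111111 (7 bits) and 00000-11111 (5 bits) • Chromosome: 0000000.00000-1111111.11111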
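A small sketch of the fixed-width binary encoding shown in this example: one 7-bit field (0-127) and one 5-bit field (0-31), joined by a dot. The helper names `encode` and `decode` are assumptions for illustration, and the code deliberately does not say which field is X and which is Y.

```python
from typing import List, Sequence


def encode(values: Sequence[int], widths: Sequence[int]) -> str:
    """Encode each integer as a fixed-width binary field, fields joined by '.'."""
    fields = []
    for value, width in zip(values, widths):
        if not 0 <= value < 2 ** width:
            raise ValueError(f"{value} does not fit in {width} bits")
        fields.append(format(value, f"0{width}b"))
    return ".".join(fields)


def decode(chromosome: str) -> List[int]:
    """Recover the integers from a dot-separated binary chromosome."""
    return [int(field, 2) for field in chromosome.split(".")]


print(encode((0, 0), (7, 5)))     # 0000000.00000
print(encode((127, 31), (7, 5)))  # 1111111.11111
```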

  11. GAs - Mutation • Each Bit (gene) of a Chromosome is Given a Chance MP of inverting • A ‘1’ becomes a ‘0’, and a ‘0’ becomes a ‘1’ • Example: 01101101 becomes 00101111 (the second and seventh bits are inverted)
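Bit-flip mutation as described on this slide can be sketched as follows, assuming chromosomes are strings of '0' and '1'. The names `invert_at` and `mutate` are illustrative; `mp` stands for the slide's mutation chance MP.

```python
import random
from typing import Iterable


def invert_at(chromosome: str, positions: Iterable[int]) -> str:
    """Invert the bits at the given zero-based positions."""
    bits = list(chromosome)
    for p in positions:
        bits[p] = "1" if bits[p] == "0" else "0"
    return "".join(bits)


def mutate(chromosome: str, mp: float, rng: random.Random) -> str:
    """Give every bit an independent chance mp of inverting."""
    flips = [i for i in range(len(chromosome)) if rng.random() < mp]
    return invert_at(chromosome, flips)


# The slide's example: the second and seventh bits invert.
print(invert_at("01101101", [1, 6]))  # 00101111
```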

  12. GAs - Crossover (2) • Parents: A = 01011101, B = 11101010 • Crossover point: X = 4 • Children: C = 11101101, D = 01011010
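A sketch of single-point crossover that reproduces the strings on this slide, assuming the parents are 01011101 and 11101010 and the cut falls after the fourth bit. The function name `crossover` and the order of the returned children are assumptions.

```python
from typing import Tuple


def crossover(a: str, b: str, x: int) -> Tuple[str, str]:
    """Swap the tails of two equal-length chromosomes after position x."""
    if len(a) != len(b):
        raise ValueError("parents must have the same length")
    if not 0 < x < len(a):
        raise ValueError("crossover point must fall inside the chromosome")
    c = b[:x] + a[x:]  # head of B, tail of A
    d = a[:x] + b[x:]  # head of A, tail of B
    return c, d


print(crossover("01011101", "11101010", 4))  # ('11101101', '01011010')
```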

  13. VARGA - Representation • The Chromosome is made up of the parameter matrices A1, A2, ..., Am, ..., Ap • Each matrix Am contributes its elements am11, ..., amij, ..., amKK
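One way to picture this representation is to flatten the lag matrices into a single gene list and rebuild them on demand. This is a sketch under assumptions: the matrices are laid out in lag order and row by row, which the slide does not confirm, and `flatten` and `unflatten` are hypothetical helper names.

```python
from typing import List

Matrix = List[List[int]]


def flatten(matrices: List[Matrix]) -> List[int]:
    """Concatenate the lag matrices row by row into one chromosome."""
    return [gene for matrix in matrices for row in matrix for gene in row]


def unflatten(genes: List[int], k: int) -> List[Matrix]:
    """Rebuild K x K matrices from a chromosome of length P * K * K."""
    size = k * k
    if k < 1 or len(genes) % size != 0:
        raise ValueError("chromosome length must be a multiple of K * K")
    return [
        [genes[start + row * k : start + (row + 1) * k] for row in range(k)]
        for start in range(0, len(genes), size)
    ]
```

The number of matrices recovered by `unflatten` is the model order P, so changing the chromosome length changes the order of the model.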

  14. VARGA - The Genetic Algorithm • GA With Extra Mutation • Order Mutation After Gene Mutation • Parents and Children Mutate (Both) • Genes are Bound Natural Numbers • Fitness is -ve Forecast Error • Minimisation Problem - Roulette Wheel • Run for EACH Patient
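Roulette wheel selection needs non-negative weights, while the fitness here is a negative forecast error. The sketch below shifts the fitness values so the worst candidate gets a tiny positive weight. The shift, the `eps` value and the function names are assumptions made for illustration; the slides do not say how VARGA scales fitness.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def roulette_weights(errors: Sequence[float], eps: float = 1e-9) -> List[float]:
    """Turn forecast errors into positive weights: lower error, larger weight."""
    fitness = [-e for e in errors]  # fitness is the negative forecast error
    worst = min(fitness)
    # Assumed scaling: shift so the worst candidate keeps a tiny positive weight.
    return [f - worst + eps for f in fitness]


def roulette_select(
    population: Sequence[T], errors: Sequence[float], k: int, rng: random.Random
) -> List[T]:
    """Draw k candidates with probability proportional to their weight."""
    return rng.choices(population, weights=roulette_weights(errors), k=k)
```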

  15. Evaluation - Methods for Comparison • SPlus: Yule-Walker Equations, AIC and Whittle's Recursion, NK(P+1), Standard Package • Holt-Winters Univariate Forecasting Method, Is the Data Univariate? (GA Solution) • Pure Noise Model, VAR(0), Worst Case Forecast, (Non-Differenced = 0) • 54 out of the Possible 82 Patients' VF Records Could not be Used: SPlus Implementation

  16. Results - Graph Comparison • The Lower the Score - the Better • Score is the One Step Ahead Forecast Error
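A sketch of how a one-step-ahead forecast error score could be computed for any forecaster. The slides do not define the error measure, so mean squared error is an assumption here, as are all the names. `zero_forecast` is one reading of the pure noise VAR(0) baseline that forecasts 0, and `last_value_forecast` is a hypothetical extra baseline added only for comparison.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def one_step_error(
    series: Sequence[Vector],
    forecaster: Callable[[Sequence[Vector]], Vector],
    warmup: int,
) -> float:
    """Mean squared one-step-ahead error for t = warmup .. len(series) - 1."""
    total, count = 0.0, 0
    for t in range(warmup, len(series)):
        predicted = forecaster(series[:t])  # the forecaster only sees the past
        total += sum((p - a) ** 2 for p, a in zip(predicted, series[t]))
        count += len(series[t])
    if count == 0:
        raise ValueError("series is too short for the requested warmup")
    return total / count


def zero_forecast(history: Sequence[Vector]) -> List[float]:
    """Assumed pure-noise baseline: always forecast zero."""
    return [0.0] * len(history[-1])


def last_value_forecast(history: Sequence[Vector]) -> List[float]:
    """Hypothetical baseline: repeat the most recent observation."""
    return list(history[-1])
```

As on the slide, the lower the score, the better the forecaster.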

  17. Results - Table Summary • Average = The Average One Step Forecast Error For the 28 Patients (Both GA’s Fitness) (The Lower - The Better)

  18. Conclusion - Results • VARGA Has a Better Performance • VARGA Can Model Short MTS • The Visual Field Data is Definitely Multivariate • Data Has a High Proportion of Noise

  19. Conclusion - Remarks • Non-Linear Methods and Transformations • Performance Enhancements for the GA • Improve Crossover • Irregularly Spaced Methods • Space-Time Series Methods • Time Dependent Relationships Between Variables

  20. Generating Explanations in MTS • Useful to know probable explanations for a given set of observations within a time series • E.g. Oil Refinery: ‘Why has a temperature become high whilst a pressure has fallen below a certain value?’ • A possible paradigm which facilitates Explanation is the Bayesian Network • Evolutionary Methods to learn BNs • Extend work to Dynamic Bayesian Networks

  21. Dynamic Bayesian Networks • Static BNs repeated over t time slices • Contemporaneous / Non-Contemporaneous Links • Used for Prediction / Diagnosis within dynamic systems

  22. Assumptions - 1 • Assume all variables take at least one time slice to impose an effect on another. • The more frequently a system generates data, the more likely this will be true. • Contemporaneous Links can be excluded from the DBN • All variables at time t will be considered independent of one another

  23. Representation • P pairs of the form (ParentVar, TimeLag) • Each pair represents a link from a node at a previous time slice to the node in question at time t. Examples : Variable 1: { (1,1); (2,2); (0,3)} Variable 4: { (4,1); (2,5)}
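The (ParentVar, TimeLag) representation maps naturally onto a dictionary of link lists. The sketch below stores the two example variables from this slide and looks up the parent states that would condition a node at time t. The names `structure`, `max_lag` and `parent_states`, and the `series[time][variable]` layout, are assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Link = Tuple[int, int]  # (ParentVar, TimeLag)

# The two example variables from the slide.
structure: Dict[int, List[Link]] = {
    1: [(1, 1), (2, 2), (0, 3)],
    4: [(4, 1), (2, 5)],
}


def max_lag(struct: Dict[int, List[Link]]) -> int:
    """Largest time lag used by any link in the structure."""
    return max((lag for links in struct.values() for _, lag in links), default=0)


def parent_states(series: Sequence[Sequence[int]], links: List[Link], t: int) -> List[int]:
    """States of a node's parents for time t, with series[time][variable]."""
    if any(lag < 1 for _, lag in links):
        raise ValueError("contemporaneous links (lag 0) are excluded")
    deepest = max((lag for _, lag in links), default=0)
    if t - deepest < 0:
        raise IndexError("not enough history before time t")
    return [series[t - lag][var] for var, lag in links]
```

Rejecting lags below 1 mirrors the assumption that every variable takes at least one time slice to affect another.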

  24. Search Space • Given the first assumption and proposed representation the Search Space for each variable will be:

  25. Algorithm Structure • Multivariate Time Series • Search: Evolutionary Algorithms, Hill Climbing etc. • Parameter Calculation given structure • Dynamic Bayesian Network Library for Different Operating States • Explanation Algorithm (e.g. using Stochastic Simulation) • User

  26. Generating Synthetic Data (1) (2)

  27. Oil Refinery Data • Data recorded every minute • Hundreds of variables • Selected 11 interrelated variables • Discretised each variable into k states • Large Time Lags (up to 120 minutes between some variables) • Different Operating States
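The slide does not say how each variable was discretised into k states. As one possible policy, the sketch below uses equal-width bins between the minimum and maximum of the series. The choice of equal-width binning and the function name `discretise` are assumptions.

```python
from typing import List, Sequence


def discretise(values: Sequence[float], k: int) -> List[int]:
    """Map each value to a state 0 .. k-1 using equal-width bins."""
    if k < 1:
        raise ValueError("k must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant series has a single state
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into the top state.
    return [min(int((v - lo) / width), k - 1) for v in values]


print(discretise([0.0, 5.0, 10.0], 2))  # [0, 1, 1]
```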

  28. Results • [Figure: variables shown are SOT, FF, TGF, TT, RinT]

  29. Explanations - using Stochastic Simulation

  30. Explanations - using Stochastic Simulation

  31. Explanation • Input (observations): t - 0 : Tail Gas Flow in_state 0; t - 3 : Reboiler Temperature in_state 1 • Output (explanation): t - 7 : Top Temperature in_state 0 with probability=0.92; t - 54 : Feed Rate in_state 1 with probability=0.71; t - 75 : Reactor Temperature in_state 0 with probability=0.65

  32. Future Work • Exploring the use of different searches and metrics • Improving accuracy (e.g. different discretisation policies, continuous DBNs) • Using the library of DBNs in order to quickly classify the current state of a system • Automatically Detecting Changing Dependency Structure

  33. Acknowledgements • BBSRC • BP-AMOCO • British Council for Prevention of Blindness • EPSRC • Honeywell Hi-Spec Solutions • Honeywell Technology Center • Institute of Ophthalmology • Moorfields Eye Hospital • MRC

  34. Intelligent Data Analysis X Liu Department of Computer Science Birkbeck College University of London

  35. Intelligent Data Analysis • An interdisciplinary study concerned with effective analysis of data • Intelligent application of data analytic tools • Application of “intelligent” data analytic tools

  36. IDA Requires • Careful thinking at every stage of an analysis process (strategic aspects) • Intelligent application of relevant domain knowledge • Assessment and selection of appropriate analysis methods

  37. IDA Conferences • IDA-95, Baden-Baden • IDA-97, London • IDA-99, Amsterdam • IDA-2001, Lisbon

  38. IDA in Medicine and Pharmacology • IDAMAP-96, Budapest • IDAMAP-97, Nagoya • IDAMAP-98, Brighton • IDAMAP-99, Washington DC • IDAMAP-2000, Berlin

  39. Other IDA Activities • IDA Journal (Elsevier 1997) • Journal Special Issues (1997 -) • Introductory Books (Springer 1999) • The Dagstuhl Seminar (Germany 2000) • European Summer School (Italy 2000) • Special Sessions at Conferences

  40. Concluding Remarks • Strategies for data analysis and mining • Strategies for human-computer collaboration in IDA • Principles for exploring and analysing “big data” • Benchmarking interesting real-world data-sets as well as computational methods • A long term interdisciplinary effort

  41. The Screening Architecture

  42. Results from a GP Clinic
