Making Time: Pseudo Time-Series for the Temporal Analysis of Cross-Section Data

Emma Peeling, Allan Tucker. Centre for Intelligent Data Analysis, Brunel University, West London.


Presentation Transcript


  1. Making Time: Pseudo Time-Series for the Temporal Analysis of Cross-Section Data • Emma Peeling, Allan Tucker • Centre for Intelligent Data Analysis, Brunel University, West London

  2. Cross-Section Data • Studies often involve data sampled from a cross-section of a population • Especially in biological and medical studies • For example, collecting medical information on patients with a particular disease and on healthy controls • Essentially, these studies show a “snapshot” of the disease process

  3. Cross-Section Data • Many processes are inherently temporal in nature • Previously healthy people can develop a disease over time, going through different stages of severity • If we want to model the development of such processes, we usually require longitudinal data

  4. Cross-Section vs Longitudinal • [Figure: a longitudinal study (onset, disease progression) compared with a cross-section study]

  5. Pseudo Time-Series Models • In this presentation we explore: • Ordering data based upon Minimum Spanning Trees & PQ-Trees (Rifkin et al. 2000) • Treating this ordered data as “Pseudo Time-Series” • Using Pseudo Time-Series to build temporal models • Test using a dynamic Bayesian network model for classifying: • Medical Data • Gene Expression Data

  6. Multi-Dimensional Scaling • Can be used to visualise distance between data points and pathways • Here we use classic MDS • Metric-based – Euclidean Distance
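For reference, the standard formulation of classic (Torgerson) MDS can be written as below. The symbols are assumptions introduced here and do not appear on the slides: $D^{(2)}$ is the $n \times n$ matrix of squared Euclidean distances between samples, $J$ is the centring matrix, and $k$ is the number of output dimensions (2 or 3 for the plots mentioned later).

```latex
\begin{aligned}
J &= I - \tfrac{1}{n}\,\mathbf{1}\mathbf{1}^{\top}
  && \text{(centring matrix)} \\
B &= -\tfrac{1}{2}\, J\, D^{(2)} J
  && \text{(double-centred squared distances)} \\
B &= E \Lambda E^{\top}
  && \text{(eigendecomposition)} \\
X &= E_k \Lambda_k^{1/2}
  && \text{(coordinates from the $k$ largest positive eigenvalues)}
\end{aligned}
```

Each row of $X$ is the low-dimensional position of one sample. When the input distances are Euclidean, the embedding reproduces them as closely as $k$ dimensions allow.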

  7. Minimum Spanning Tree • Connects all nodes in the graph • Uses the set of links with minimal total weight • [Figure: a weighted graph and its MST]
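A minimal sketch of building an MST over samples, assuming the weighted graph is the complete graph with Euclidean distances as link weights (in line with the metric named on the MDS slide). Prim's algorithm is used here for brevity; the slides do not say which MST algorithm was used, and the function names are illustrative.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.

    Returns a list of (i, j, weight) links, where i is already in the
    tree and j is the node being attached.
    """
    n = len(points)
    if n == 0:
        return []
    # best[j] = (cheapest known link weight from the tree to j, tree node)
    best = {j: (euclidean(points[0], points[j]), 0) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])  # closest node outside tree
        weight, i = best.pop(j)
        edges.append((i, j, weight))
        # The newly attached node may offer cheaper links to the rest.
        for k in best:
            d = euclidean(points[j], points[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return edges


samples = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.0, 3.0)]
mst = minimum_spanning_tree(samples)
```

For `n` samples the result always has `n - 1` links. This version costs O(n²) distance evaluations per attached node in the worst case, which is fine for small cross-section studies but not tuned for large ones.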

  8. PQ-Tree • PQ-Trees are used to encode partial orderings on variables • P nodes: children can be in any order • Q nodes: the order of the children can only be reversed
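The P and Q rules can be illustrated by enumerating every leaf ordering a small PQ-tree permits. This sketch assumes a hypothetical encoding that is not from the slides: an internal node is a tuple `(kind, children)` with `kind` equal to `"P"` or `"Q"`, and a leaf is any non-tuple value.

```python
from itertools import permutations, product


def frontiers(node):
    """Yield every leaf ordering permitted by a PQ-tree."""
    if not isinstance(node, tuple):
        yield [node]  # a leaf has exactly one ordering
        return
    kind, children = node
    if kind == "P":
        # P node: children may appear in any order.
        child_orders = permutations(children)
    else:
        # Q node: children keep their order or are reversed as a whole.
        child_orders = [tuple(children), tuple(reversed(children))]
    for order in child_orders:
        # Combine every permitted ordering of each child subtree.
        for parts in product(*(list(frontiers(c)) for c in order)):
            yield [leaf for part in parts for leaf in part]


tree = ("Q", ["a", ("P", ["b", "c"]), "d"])
orderings = {tuple(o) for o in frontiers(tree)}
```

Here `orderings` contains four sequences: `a b c d`, `a c b d`, and their Q-node reversals `d b c a` and `d c b a`. Exhaustive enumeration grows factorially with the size of P nodes, so it is only practical for small trees.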

  9. Dynamic Bayesian Network Classifiers • DBNCs are used to calculate: P(C | X_t, X_{t-1}) • Here, we use the DBNC to model the Pseudo Time-Series for classifying data
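To make the quantity P(C | X_t, X_{t-1}) concrete, the sketch below estimates it by counting over an ordered sequence. It is a deliberate simplification and not the authors' DBNC: it assumes a single discretised feature, a plain frequency table with Laplace smoothing (`alpha`), and a hypothetical input format of `(feature_value, class_label)` pairs in pseudo-temporal order.

```python
from collections import Counter, defaultdict


def fit_conditional(sequence, classes, alpha=1.0):
    """Estimate P(C_t | X_t, X_{t-1}) from an ordered sequence.

    `sequence` is a list of (x, c) pairs in (pseudo-)temporal order.
    Returns a function posterior(x_t, x_prev) -> {class: probability}.
    """
    counts = defaultdict(Counter)
    # Pair each sample with its predecessor in the ordering.
    for (x_prev, _), (x_t, c_t) in zip(sequence, sequence[1:]):
        counts[(x_t, x_prev)][c_t] += 1

    def posterior(x_t, x_prev):
        seen = counts.get((x_t, x_prev), Counter())
        total = sum(seen.values()) + alpha * len(classes)
        # Laplace smoothing keeps unseen combinations well defined.
        return {c: (seen[c] + alpha) / total for c in classes}

    return posterior


ordered = [
    ("low", "healthy"),
    ("low", "healthy"),
    ("high", "disease"),
    ("high", "disease"),
    ("high", "disease"),
]
posterior = fit_conditional(ordered, classes=["healthy", "disease"])
p = posterior("high", "high")
```

With this toy sequence, `p["disease"]` is 0.75: two observed transitions plus one pseudo-count, out of four. A transition that never occurs in the sequence falls back to a uniform distribution over the classes.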

  10. Pseudo Time-Series Models • In Summary:
      1: Input: Cross-section data
      2: Construct weighted graph and MST
      3: Construct PQ-tree from MST
      4: Derive Pseudo Time-Series from PQ-tree using hill-climb search on P-nodes to minimise sequence length
      5: Build DBNC model using pseudo-temporal ordering of samples
      6: Output: Temporal model of cross-section data
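Step 4 can be sketched as follows, under stated assumptions: sequence length is taken to be the sum of Euclidean distances between consecutive samples in the ordering, and the search is simplified to a single P-node whose children are blocks of sample indices that may be reordered freely. The move set (pairwise swaps of blocks) and all names are illustrative choices, not details from the slides.

```python
import math
from itertools import combinations


def sequence_length(order, points):
    """Total Euclidean distance travelled when visiting `points` in `order`."""
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))


def hill_climb_p_node(blocks, points):
    """Reorder the children of one P-node to shorten the sequence.

    `blocks` is a list of lists of sample indices (the P-node's children).
    A swap of two blocks is kept only if it strictly reduces the length.
    Returns (flattened ordering, sequence length).
    """
    blocks = [list(b) for b in blocks]

    def flatten(bs):
        return [i for b in bs for i in b]

    best = sequence_length(flatten(blocks), points)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(blocks)), 2):
            blocks[i], blocks[j] = blocks[j], blocks[i]  # try the swap
            candidate = sequence_length(flatten(blocks), points)
            if candidate < best - 1e-12:
                best = candidate
                improved = True
            else:
                blocks[i], blocks[j] = blocks[j], blocks[i]  # undo
    return flatten(blocks), best


points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
blocks = [[0, 1], [4, 5], [2, 3]]
start_length = sequence_length([i for b in blocks for i in b], points)
order, length = hill_climb_p_node(blocks, points)
```

In this toy case the starting ordering has length 9 and the search settles on `[0, 1, 2, 3, 4, 5]` with length 5. Hill-climbing only guarantees a local minimum, so on real data the result can depend on the starting ordering. `math.dist` requires Python 3.8 or later.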

  11. The Datasets • B-Cell Microarray Data • 3 classes of B-Cell data • A number of patients • Pre-ordered into expert pseudo time-series • Visual Field Test Data • One large cross-section study • Healthy and Glaucomatous eyes • One longitudinal study for testing the models

  12. B-Cell: MDS & Pseudo Time-Series • Plots show • discovered path in 3D • Classification of B-Cell data in 2D

  13. B-Cell Accuracy • Plot shows mean accuracy and variance over Cross-Validation with repeats

  14. Expert Knowledge • Orderings and their sequence lengths: • Biologist: 1-26 (sequence length 512.0506) • PQ-tree: 1-6,7,9,8,11,10,12-18,26,19,21,20,22-25 (sequence length 528.9907) • PQ-tree and hill-climb: 1-18,26,19-25 (sequence length 521.1865)

  15. Visual Field: MDS & Pseudo Time-Series • Plots show • Path found for VF data in 3D • Classification of VF data in 2D

  16. VF Accuracy • Plot shows mean accuracy and variance over Train / Test data with repeats

  17. Related Work • Semi-Supervised Methods • Some datapoints are labelled with classes • These are used to assist classification of others in an incremental manner • Pseudo MTS imposes an order on the data as well as a distance between data • Allows for the prediction of future states

  18. Conclusions • Cross-section data usually capture only a snapshot of a process • Longitudinal data are usually needed to model its temporal nature • Here we use ordering methods to create Pseudo Time-Series models • Early results on medical and biological data are promising

  19. Future Work • Dealing with outliers in dataspace • Multiple trajectories (e.g. in VF data) • Normalisation (rather than discretisation) • Combining a number of longitudinal and cross-section studies

  20. Multiple Trajectories

  21. Acknowledgements • Thanks to: • David Garway-Heath, Moorfields Eye Hospital, London • Paul Kellam, University College London
