Geometric Approaches to Reconstructing Time Series
Project Outline
15 February 2007
CSC/Math 870 Computational Discrete Geometry
Connie Phong
Problem Statement
• How can a time ordering be reconstructed from data that carry no explicit time information (i.e., time indices)?
• When does such a scenario arise?
Accurate Time Series for Biological Processes Are Difficult to Obtain
• Members of a population are not synchronized
• Members within a population often have different process rates
• Consider carcinogenesis
  • Cancer cells are identified only after a delay
  • To investigate the early stages, one must sample from a cell population and try to reconstruct the temporal sequence of events
Problem Formulation
• f(t) = [x_1(t), x_2(t), …, x_d(t)] is a continuous vector function
• V = {f_1, f_2, …, f_n} is the set of sampled points
• s_i is the unknown time index of f_i
• A permutation p of the index set {1, 2, …, n} is a temporal ordering of the points in V if p(i) ≤ p(j) ⇔ s_i ≤ s_j for all i, j in the index set
• Find p given V
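To make the definition concrete, here is a minimal sketch in Python. It assumes the condition on p is read as "if and only if" and that the time indices are distinct. In the real problem the s_i are hidden, so a check like this is only useful for scoring a reconstruction against data whose times are known. The function names `temporal_ordering` and `is_temporal_ordering` are hypothetical and not part of the original outline.

```python
def temporal_ordering(s):
    """Return p as a list, where p[i] is the 1-based rank of time index s[i]."""
    by_time = sorted(range(len(s)), key=lambda i: s[i])
    p = [0] * len(s)
    for rank, i in enumerate(by_time, start=1):
        p[i] = rank
    return p


def is_temporal_ordering(p, s):
    """Check p(i) <= p(j) <=> s_i <= s_j for every pair of indices."""
    n = len(s)
    return all(
        (p[i] <= p[j]) == (s[i] <= s[j])
        for i in range(n)
        for j in range(n)
    )


# Three observations whose (normally unknown) time indices are given here
# purely for illustration.
s = [0.7, 0.1, 0.4]
p = temporal_ordering(s)
print(p, is_temporal_ordering(p, s))
```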
Magwene et al.: Ordering Observations Using MSTs
• f(t) is a 1-D curve embedded in the space of the measurements
• Assumptions
  • Distance is measured with the norm induced by the standard Euclidean inner product
  • The embedding distance is monotonically related to the geodesic distance (the shortest distance along the curve)
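The monotonicity assumption can be illustrated with a toy curve. The sketch below is an illustration chosen here, not an example from the source: it takes the unit circle as the curve, so the geodesic distance from the starting point is simply the angle, and checks that the straight-line (chord) distance grows with it over half a turn. Beyond half a turn the chord shrinks again, which is the kind of curve on which the assumption would fail.

```python
import math


def chord_lengths(thetas):
    """Euclidean distance from the point at angle 0 to the point at each angle
    on the unit circle (the arc length from angle 0 equals the angle itself)."""
    start = (1.0, 0.0)
    return [math.dist(start, (math.cos(t), math.sin(t))) for t in thetas]


def strictly_increasing(values):
    return all(a < b for a, b in zip(values, values[1:]))


half_turn = [k * math.pi / 8 for k in range(9)]    # arc lengths 0 .. pi
full_turn = [k * math.pi / 8 for k in range(17)]   # arc lengths 0 .. 2*pi

print(strictly_increasing(chord_lengths(half_turn)))  # assumption holds
print(strictly_increasing(chord_lengths(full_turn)))  # assumption breaks
```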
Magwene et al.: Ordering Observations Using MSTs
• Let G = (V, E) be a weighted, complete graph
  • The vertices V represent the sampled observations
  • The edge weights are distances in the embedding geometry
• Rules for estimating arc-length distances from the distribution of the observed data:
Magwene et al.: Ordering Observations Using MSTs
• Find G_mst, the minimum spanning tree of G
• If G_mst is a path, then the best estimate of the ordering is G_mst
• Else
  • If noise is low and sampling intensity is high, the estimated ordering is the diameter path of G_mst
    • The diameter path is the longest of the shortest paths between any two vertices
  • If noise is high and/or sampling intensity is low, things become a little more complicated
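The low-noise branch of the procedure above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not Magwene et al.'s implementation: the MST is built with Prim's algorithm on the complete Euclidean graph, path length in the tree is taken as the sum of edge weights, and the diameter path is found with the standard two-pass farthest-vertex search on a tree. The function names and the sample points (taken from the parabola y = x², then shuffled) are hypothetical. The high-noise case is not covered.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph over `points`.
    Returns the tree as an adjacency dict {i: [(j, weight), ...]}."""
    n = len(points)
    adj = {i: [] for i in range(n)}
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge from each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            adj[u].append((parent[u], best[u]))
            adj[parent[u]].append((u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                w = math.dist(points[u], points[v])
                if w < best[v]:
                    best[v], parent[v] = w, u
    return adj


def is_path(adj):
    """A tree is a simple path exactly when no vertex has degree above 2."""
    return all(len(nbrs) <= 2 for nbrs in adj.values())


def farthest(adj, start):
    """Return (vertex, path from start) for the vertex farthest from `start`,
    measuring distance as summed edge weight along the tree."""
    dist = {start: 0.0}
    prev = {start: None}
    stack = [start]
    while stack:
        u = stack.pop()
        for v, w in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + w
                prev[v] = u
                stack.append(v)
    end = max(dist, key=dist.get)
    path, node = [], end
    while node is not None:
        path.append(node)
        node = prev[node]
    return end, path[::-1]


def diameter_path(adj):
    """Two passes: the farthest vertex from any start is one end of a
    diameter; the farthest vertex from that end is the other."""
    a, _ = farthest(adj, next(iter(adj)))
    _, path = farthest(adj, a)
    return path


# Shuffled samples of y = x^2 at x = 1, 0, 2, 0.5, 1.5 (illustrative data).
pts = [(1.0, 1.0), (0.0, 0.0), (2.0, 4.0), (0.5, 0.25), (1.5, 2.25)]
mst = minimum_spanning_tree(pts)
print(is_path(mst), diameter_path(mst))
```

Distances alone carry no information about the direction of time, so an ordering recovered this way is determined only up to reversal.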
Objective
• To improve Magwene et al.’s algorithm
• To get there:
  • In-depth analysis of the algorithm
    • What is the intuition for exploiting the MST?
    • MST algorithms
  • Implement the algorithm and replicate the results on the given data
  • Test on other empirically derived data
    • What realistic data scenarios cause hang-ups?
  • Look to the related problem of curve reconstruction for tips?