This paper presents a novel approach for the flexible and efficient retrieval of haemodialysis time series data. Utilizing Temporal Abstraction (TA), we transform time-stamped data into high-level concepts, enabling users to explore state and trend variables effectively. Our method incorporates dimensionality reduction techniques and a robust index structure, catering to both basic and abstract query processing. By employing a two-level TA and a domain-independent methodology, we address challenges in interpreting time series in the medical domain, enhancing decision-making and patient monitoring.
Flexible and efficient retrieval of haemodialysis time series
S. Montani, G. Leonardi, A. Bottrighi, L. Portinale, P. Terenziani
DISIT, Sezione di Informatica, Università del Piemonte Orientale, Alessandria, Italy
• Introduction
• Data structures
• Retrieval process
• Experiments
• Future work
Introduction: Time Series
• Time series: the evolution of a phenomenon over time; understanding its behavior helps in solving future problems
• Medical domain: continuous monitoring through control instruments (e.g., ICU, hemodialysis)
• State variables (e.g., diastolic pressure value) vs. trend variables (e.g., increasing, decreasing)
• PROBLEM: time series are difficult to interpret and retrieve, e.g., to
 - find similar cases
 - find "abstract" cases
 - understand the results in order to interactively refine/relax the search
→ need for automatic support for these tasks
Introduction: Time Series Retrieval (literature)
• DIMENSIONALITY REDUCTION
 - mathematical transforms able to preserve the distance between two time series (or to underestimate it), e.g., the Discrete Fourier Transform (DFT)
 - Complexity (preprocessing, postprocessing)
 - INPUT: a specific time series (case)
 - Black-box behavior → difficult interpretation, no flexibility, no interactivity
• Symbolic approaches to dimensionality reduction (e.g., [Xia, 96], survey [Daw et al., 2001])
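The distance-preservation property of the DFT can be illustrated with a minimal Python sketch, assuming an orthonormal DFT (scaled by 1/√n) and plain Euclidean distance; the helper names and the sample values are hypothetical. By Parseval's theorem the full transform preserves the distance, so keeping only the first k coefficients can only underestimate it.

```python
import cmath
import math


def dft(x):
    """Orthonormal DFT: X[k] = (1/sqrt(n)) * sum_t x[t] * exp(-2j*pi*k*t/n)."""
    n = len(x)
    scale = 1 / math.sqrt(n)
    return [
        scale * sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def euclidean(a, b):
    """Euclidean distance; works for real and complex sequences alike."""
    return math.sqrt(sum(abs(p - q) ** 2 for p, q in zip(a, b)))


def reduced_distance(x, y, k):
    """Distance computed on the first k DFT coefficients only."""
    return euclidean(dft(x)[:k], dft(y)[:k])


# Hypothetical pressure readings from two sessions (same length)
a = [120, 118, 115, 110, 108, 104, 101, 99]
b = [121, 120, 119, 117, 116, 115, 113, 112]

full = euclidean(a, b)
for k in (1, 2, 4, 8):
    # Never larger than the true distance; equal to it when k == len(a)
    print(k, round(reduced_distance(a, b, k), 3), "<=", round(full, 3))
```

The lower bound is what allows an index built on the reduced representation to discard candidates without missing true matches; the retained coefficients, however, have no direct clinical reading, which is the interpretability issue listed above.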
Our approach: Time Series Retrieval + Temporal Abstraction (TA)
• Original contribution: TA used for dimensionality reduction and flexible retrieval
• TA: deriving high-level concepts from time-stamped data (from a point-based to an interval-based representation)
• In our proposal, a two-level TA:
 - SYMBOL (e.g., increase vs. slow_increase)
 - TIME GRANULARITY (e.g., 1 h vs. 20 min)
• DOMAIN-INDEPENDENT methodology:
 - general DATA STRUCTURES
 - CONSTRAINTS on the data structures
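A basic temporal abstraction step, from point-based samples to interval-based episodes, might look like the following Python sketch. The three-symbol alphabet (D, S, I), the fixed per-sample threshold, and all names are hypothetical; in the approach described here the symbols and their meaning are domain-dependent.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    symbol: str
    start: int  # time of the first sample covered by the episode (minutes)
    end: int    # time of the last sample covered by the episode (minutes)


def trend_symbol(v0, v1, threshold=1.0):
    """Hypothetical 3-symbol alphabet: D (decrease), S (steady), I (increase)."""
    delta = v1 - v0
    if delta > threshold:
        return "I"
    if delta < -threshold:
        return "D"
    return "S"


def abstract(times, values, threshold=1.0):
    """Point-based samples -> interval-based episodes.

    Each pair of consecutive samples gets a trend symbol; runs of equal
    symbols are merged into a single episode.
    """
    episodes = []
    for i in range(1, len(values)):
        sym = trend_symbol(values[i - 1], values[i], threshold)
        if episodes and episodes[-1].symbol == sym:
            episodes[-1].end = times[i]  # extend the current episode
        else:
            episodes.append(Episode(sym, times[i - 1], times[i]))
    return episodes


for ep in abstract([0, 10, 20, 30, 40, 50], [100, 100.5, 103, 106, 106.2, 104]):
    print(ep)
```

Repeating the same step with a finer alphabet (e.g., weak vs. strong increase) or over coarser time granules gives the two abstraction dimensions, symbol and time granularity.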
DATA STRUCTURES: SYMBOL TAXONOMY
• SYMBOL ORDERING naturally emerges from the domain-dependent interpretation (e.g., Ds may abstract slopes from −90 to −45 degrees, thus preceding Dw, which abstracts slopes from −44 to −10 degrees)
• Domain-independent general constraint: the symbol taxonomy must respect the ordering
 ∀x, y, x′, y′ (symbols): isa(x, x′) ∧ isa(y, y′) ∧ x′ ≠ y′ ∧ x < y → x′ < y′
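The ordering constraint can be checked mechanically. The Python sketch below assumes a hypothetical two-level taxonomy (strong/weak decrease under D, weak/strong increase under I, steady under S) with orderings given as lists; it tests every ordered pair of specific symbols whose parents differ.

```python
from itertools import combinations

# Hypothetical two-level taxonomy: specific symbol -> generic parent
PARENT = {"Ds": "D", "Dw": "D", "S": "S", "Iw": "I", "Is": "I"}
# Hypothetical orderings (e.g., by slope range): lower index = smaller symbol
LEAF_ORDER = ["Ds", "Dw", "S", "Iw", "Is"]
PARENT_ORDER = ["D", "S", "I"]


def respects_ordering(parent, leaf_order, parent_order):
    """Check: isa(x, x') and isa(y, y') and x' != y' and x < y  ->  x' < y'."""
    parent_rank = {s: i for i, s in enumerate(parent_order)}
    # combinations() keeps the input order, so x < y in every pair
    for x, y in combinations(leaf_order, 2):
        xp, yp = parent[x], parent[y]
        if xp != yp and parent_rank[xp] >= parent_rank[yp]:
            return False
    return True


print(respects_ordering(PARENT, LEAF_ORDER, PARENT_ORDER))  # True
```

The x′ ≠ y′ condition matters: without it, Ds < Dw would require D < D, so no taxonomy grouping ordered symbols under one parent could be valid.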
DATA STRUCTURES: SYMBOL DISTANCE
• ANY DISTANCE function is admitted (domain-independent)
• However, the DISTANCE function must be CONSISTENT with the SYMBOL ORDERING (if any):
 ∀x, y, z: x < y < z → distance(x, y) < distance(x, z)
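As a minimal Python sketch, one admissible distance (an assumption, not necessarily the authors' choice) is the absolute difference of positions in a hypothetical symbol ordering; the checker verifies the consistency constraint over every ordered triple.

```python
from itertools import combinations

ORDER = ["Ds", "Dw", "S", "Iw", "Is"]  # hypothetical symbol ordering
RANK = {s: i for i, s in enumerate(ORDER)}


def rank_distance(x, y):
    """One admissible choice: absolute difference of positions in ORDER."""
    return abs(RANK[x] - RANK[y])


def consistent_with_ordering(distance, order):
    """Check: x < y < z -> distance(x, y) < distance(x, z)."""
    # combinations() keeps the input order, so x < y < z in every triple
    return all(distance(x, y) < distance(x, z) for x, y, z in combinations(order, 3))


print(consistent_with_ordering(rank_distance, ORDER))                      # True
print(consistent_with_ordering(lambda x, y: 0 if x == y else 1, ORDER))    # False
```

The second call shows why the constraint is not trivial: a 0/1 "same or different" distance ignores the ordering, so it rates Ds as equally far from Dw and from Is.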
DATA STRUCTURES: TIME GRANULARITY TAXONOMY
• ANY taxonomy of time granularities (to describe the episodes at increasingly abstract levels of temporal aggregation), e.g., 10 min → 30 min → 1 h → 2 h → 4 h
• HOMOGENEITY: aggregation must be "homogeneous" at every level, in the sense that each granule at a given level must aggregate exactly the same number of consecutive granules at the lower level → IMPLICIT information about the DURATION of (sub)episodes
• "up" function, to aggregate from each level to the next higher one, e.g., up(⟨I, I, S⟩, 10 min, 30 min) = ⟨I, 30 min⟩
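A Python sketch of the granularity taxonomy, assuming granularities are plain durations in minutes, so that homogeneity reduces to each coarser duration being a whole multiple of the finer one. The aggregation passed to `up` is a hypothetical majority vote, chosen only because it reproduces the up(⟨I, I, S⟩, 10 min, 30 min) example; the actual "up" function is domain-dependent.

```python
from collections import Counter

# Hypothetical granularity taxonomy, finest first (durations in minutes)
GRANULARITIES = [10, 30, 60, 120, 240]


def is_homogeneous(levels):
    """Every coarser granule must span a whole number of finer granules."""
    return all(coarse % fine == 0 for fine, coarse in zip(levels, levels[1:]))


def majority(symbols):
    """Hypothetical domain-dependent aggregation: the most frequent symbol
    (Counter.most_common breaks ties by first occurrence)."""
    return Counter(symbols).most_common(1)[0][0]


def up(sequence, from_gran, to_gran, aggregate=majority):
    """Aggregate a symbol sequence from one granularity to a coarser one."""
    if to_gran % from_gran != 0:
        raise ValueError("non-homogeneous aggregation")
    factor = to_gran // from_gran  # fixed, so episode durations stay implicit
    if len(sequence) % factor != 0:
        raise ValueError("sequence does not fill whole coarse granules")
    return [aggregate(sequence[i:i + factor]) for i in range(0, len(sequence), factor)]


print(is_homogeneous(GRANULARITIES))   # True
print(up(["I", "I", "S"], 10, 30))     # ['I']
```

Because the aggregation factor is fixed per level, the length of a pattern alone tells how long each of its symbols lasts, which is the implicit duration information mentioned above.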
DATA STRUCTURES: TIME GRANULARITY TAXONOMY: UP FUNCTION
• ANY "up" function (domain-dependent), BUT
• CONSTRAINT about PERSISTENCE:
 ∀x: up(x, x) = x
• CONSTRAINTS about ORDERING PRESERVATION:
 ∀x, y: x < y → x ≤ up(x, y) ≤ y
 ∀x, y, z: x < y < z → up(x, y) ≤ up(x, z)
 ∀x, y, z: x < y < z → up(x, z) ≤ up(y, z)
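The persistence and ordering-preservation constraints can be tested exhaustively over a small ordered alphabet. In this Python sketch the binary `up` is hypothetical: it returns the symbol halfway between its arguments, rounding toward the larger one.

```python
from itertools import combinations

ORDER = ["D", "S", "I"]  # hypothetical ordered symbols: decrease < steady < increase
RANK = {s: i for i, s in enumerate(ORDER)}


def up(x, y):
    """Hypothetical binary "up": the symbol halfway between x and y in ORDER,
    rounding toward the larger one."""
    return ORDER[(RANK[x] + RANK[y] + 1) // 2]


def check_up(up_fn, order):
    """Check persistence and the three ordering-preservation constraints."""
    r = {s: i for i, s in enumerate(order)}
    persistence = all(up_fn(x, x) == x for x in order)
    # combinations() keeps the input order, so x < y (< z) in every tuple
    bounded = all(r[x] <= r[up_fn(x, y)] <= r[y] for x, y in combinations(order, 2))
    monotone = all(
        r[up_fn(x, y)] <= r[up_fn(x, z)] and r[up_fn(x, z)] <= r[up_fn(y, z)]
        for x, y, z in combinations(order, 3)
    )
    return persistence and bounded and monotone


print(check_up(up, ORDER))                 # True
print(check_up(lambda x, y: "D", ORDER))   # False: violates persistence
```

Rounding toward the larger symbol gives up(S, I) = I, in line with the retrieval example in these slides (S I I I → I I); rounding down would satisfy the constraints equally well.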
DATA STRUCTURES: INDEX OF TIME SERIES (CASES)
• FOREST of TREES
• First, the TIME GRANULARITY dimension is (partially) expanded
• Then, the SYMBOL dimension is (partially) expanded
• Each node in a tree addresses all the time series (cases) that are abstracted by the node's pattern (through the "up" function and the ISA symbol taxonomy)
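A minimal Python sketch of such a forest, assuming each node is a dict holding a set of case ids and its more specific children. The abstraction paths are hypothetical and precomputed by hand: 4 h → 2 h → 1 h (time granularity first), then the specific symbols at 1 h. This sketch expands every path fully, whereas the index described here may expand the dimensions only partially.

```python
def new_node():
    """A tree node: the cases its pattern abstracts, plus more specific children."""
    return {"cases": set(), "children": {}}


def insert(forest, path, case_id):
    """Add a case along its abstraction path, most abstract pattern first.

    Every node on the path records the case, so each node addresses all
    the cases that its pattern abstracts.
    """
    level = forest  # the forest maps root patterns to trees
    for pattern in path:
        node = level.setdefault(pattern, new_node())
        node["cases"].add(case_id)
        level = node["children"]


def count_nodes(level):
    """Total number of nodes in a forest (or subtree)."""
    return sum(1 + count_nodes(node["children"]) for node in level.values())


# Hypothetical, precomputed abstraction paths
paths = {
    "c1": [("I",), ("I", "I"), ("S", "I", "I", "I"), ("S", "Iw", "Iw", "Iw")],
    "c2": [("I",), ("I", "I"), ("S", "I", "I", "I"), ("S", "Is", "Iw", "Iw")],
    "c3": [("S",), ("D", "S"), ("D", "D", "S", "S"), ("Dw", "Ds", "S", "S")],
}

forest = {}
for case_id, path in paths.items():
    insert(forest, path, case_id)
print(sorted(forest))        # [('I',), ('S',)]: one tree per root pattern
print(count_nodes(forest))   # 9: shared prefixes are stored once
```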
DATA RETRIEVAL
• Exploits Temporal Abstraction ("up" function on the time granularity taxonomy and ISA on the symbol taxonomy) and the INDEX
• Supports both
 - "basic" queries (retrieve time series similar to a given one)
 - "abstract" queries (e.g., retrieve time series similar to (⟨S, Iw, Iw, Iw⟩, 1 h))
• QUERY PROCESSING HIGHLIGHTS
 - Abstract on the symbol taxonomy (ISA)
 - Abstract on the time granularity taxonomy ("up")
 - Find the proper index tree (its root) in the forest
 - Descend the index tree backward to the lowest possible node
 - Return the time series (cases) addressed by that node
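The query-processing steps above can be sketched end to end in Python. Everything here is an illustrative assumption: the ISA table, the symbol ordering, the pairwise "up" (which halves the time granularity at each step), the nested-dict index, and the hypothetical cases. "Lowest possible node" is interpreted as the deepest node matching the query's abstraction path, so a query whose specific pattern is not indexed falls back to a more general answer.

```python
# Hypothetical ISA taxonomy (specific -> generic) and ordering of generic symbols
PARENT = {"Ds": "D", "Dw": "D", "S": "S", "Iw": "I", "Is": "I"}
ORDER = ["D", "S", "I"]


def up_seq(seq):
    """Hypothetical "up": merge consecutive pairs (e.g., 1 h -> 2 h) into the
    symbol halfway between them in ORDER, rounding toward the larger one."""
    r = [ORDER.index(s) for s in seq]
    return tuple(ORDER[(r[i] + r[i + 1] + 1) // 2] for i in range(0, len(r), 2))


def abstraction_path(pattern):
    """Abstract on the symbol taxonomy (ISA), then on time granularity ("up").

    Returns the most abstract pattern first; assumes len(pattern) is a power of 2.
    """
    path = [tuple(pattern)]
    generic = tuple(PARENT.get(s, s) for s in pattern)
    if generic != path[-1]:
        path.append(generic)
    while len(path[-1]) > 1:
        path.append(up_seq(path[-1]))
    return path[::-1]


def build_forest(cases):
    """Index every case along its abstraction path (one tree per root pattern)."""
    forest = {}
    for case_id, symbols in cases.items():
        level = forest
        for pattern in abstraction_path(symbols):
            node = level.setdefault(pattern, {"cases": set(), "children": {}})
            node["cases"].add(case_id)
            level = node["children"]
    return forest


def retrieve(query, forest):
    path = abstraction_path(query)        # abstract on ISA and "up"
    node = forest.get(path[0])            # root of the proper tree
    if node is None:
        return set()
    for pattern in path[1:]:              # descend as far as the index allows
        child = node["children"].get(pattern)
        if child is None:
            break
        node = child
    return set(node["cases"])             # cases addressed by the lowest node reached


cases = {
    "c1": ("S", "Iw", "Iw", "Iw"),
    "c2": ("S", "Is", "Iw", "Iw"),
    "c3": ("Dw", "Ds", "S", "S"),
    "c4": ("S", "S", "S", "S"),
}
forest = build_forest(cases)
print(sorted(retrieve(("S", "Iw", "Iw", "Iw"), forest)))  # ['c1']
print(sorted(retrieve(("S", "I", "I", "I"), forest)))     # ['c1', 'c2']
```

For the query ⟨S, Iw, Iw, Iw⟩ the abstraction path is I → I I → S I I I → S Iw Iw Iw, so the search starts from the tree rooted at "I"; a generic query such as ⟨S, I, I, I⟩ simply stops one level higher and returns every case below that node.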
DATA RETRIEVAL: an example
"Abstract" query: S Iw Iw Iw (1 h time granularity level)
• Abstraction on the symbol taxonomy: S Iw Iw Iw → S I I I
• Abstraction on the time granularity ("up" function): S I I I → I I → I
DATA RETRIEVAL: an example
Descend the index from the root "I" to search for "S Iw Iw Iw": ALL the corresponding time series are returned
DATA RETRIEVAL: advantages
FLEXIBLE and UNDERSTANDABLE
• "Abstract" query: S Iw Iw Iw (1 h time granularity level)
• Understandable query, process, and output: all time series that can be abstracted as dictated by the "abstract" query are returned
• Support for INTERACTIVITY: e.g., depending on the output of the query, the user may
 - relax the query, e.g., by asking "S I I I"
 - refine the query, e.g., by asking "S S Iw Iw Iw Iw Iw S"
DATA RETRIEVAL: Experimental Results
• Dataset of 10388 hemodialysis sessions (i.e., cases), collected at the Vigevano hospital, Italy
• Comparison with RHENE, an approach based on the DFT for dimensionality reduction and on spatial indexing (through TV-trees) to further improve retrieval performance
ADVANTAGES
• Efficiency
• Flexibility ("abstract" queries vs. specific time series)
• Interactivity
• Trend vs. state abstractions
FUTURE WORK
• Queries about SUBpatterns
• Higher-level queries (e.g., regular expressions)