This paper explores methods to efficiently query data from sensor networks using in-network summaries. It focuses on approximating sensor readings, such as temperature, within specified tolerances, ensuring responses meet user-defined confidence levels despite lossy communication and noisy measurements. The study examines strategies for constructing models, preserving data integrity, and minimizing traversal costs in network queries. Techniques such as Gaussian mixtures for modeling, greedy clustering for grouping nodes, and sensitivity analysis of model parameters are also discussed, along with their practical implications for real-world applications.
Approximating Sensor Network Queries Using In-Network Summaries
Alexandra Meliou, Carlos Guestrin, Joseph Hellerstein
Approximate Answer Queries
• Approximate representation of the world: discrete locations, lossy communication, noisy measurements
• Applications do not expect accurate values (tolerance to noise)
• Example: return the temperature at all locations ±1°C, with 95% confidence
• Query satisfaction: in expectation, the requested portion of sensor values lies within the error range
In-network Decisions
• Use in-network models to make routing decisions
• No centralized planning
In-network Summaries
• Spanning tree T(V, E′) plus a model Mv for every node v
• Mv represents the whole subtree rooted at v
Model Complexity
• Gaussian distributions at the leaves: good for modeling individual node measurements
• Higher up the tree, a node's model must summarize its whole subtree, so combined models grow into large Gaussian mixtures: need for compression
Talk "outline": Compression; In-network summaries (Construction, Traversal)
Collapsing Gaussian Mixtures
• Compress an m-component mixture into a k-component mixture
• Look at the simple case (k = 1)
• Minimize KL-divergence? The KL-optimal single Gaussian can place "fake" mass in regions where the original mixture has almost none (see the sketch below)
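For the k = 1 case, minimizing KL(mixture ‖ Gaussian) reduces to moment matching: the best single Gaussian shares the mixture's mean and variance, which is exactly what spreads "fake" mass between well-separated components. A minimal sketch (function and variable names are illustrative):

```python
# Minimal sketch: collapse a weighted Gaussian mixture to a single Gaussian
# (k = 1) by moment matching, which minimizes KL(mixture || Gaussian).
def collapse_to_sgm(weights, means, variances):
    total = sum(weights)
    ws = [w / total for w in weights]                         # normalized weights
    mu = sum(w * m for w, m in zip(ws, means))                # matched mean
    # matched variance: E[X^2] under the mixture, minus mu^2
    var = sum(w * (v + m * m) for w, m, v in zip(ws, means, variances)) - mu * mu
    return mu, var

# Example: two well-separated components collapse to one wide Gaussian,
# placing "fake" mass in the gap between them.
print(collapse_to_sgm([0.5, 0.5], [0.0, 10.0], [1.0, 1.0]))   # (5.0, 26.0)
```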
Quality of Compression
• Depends on the query workload: a query with acceptable error window W may be answerable from the compressed model, while a query with a narrower window W′ < W may not
Compression
• Accurate mass inside the interval
• No guarantee on the tails
Talk "outline": Compression; In-network summaries (Construction, Traversal)
Query Satisfaction
• A response R = {r1 … rn} satisfies query Q(w, δ) if, in expectation, the values of at least δn nodes lie within [ri − w, ri + w]
(Figure: a query Q posed against the in-network summary yields a response R = [r1, …, r10] within the error bounds.)
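For intuition, a minimal sketch of this test against ground-truth readings (a deterministic check; the paper's definition is in expectation over the measurement noise):

```python
# Hypothetical helper: does response R satisfy Q(w, delta) against known
# ground-truth values?
def satisfies(response, truth, w, delta):
    """response, truth: dicts mapping node id -> value."""
    hits = sum(abs(response[v] - truth[v]) <= w for v in truth)
    return hits >= delta * len(truth)
```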
Optimal Traversal
• Given: tree and models
• Find: the minimum-cost subtree whose leaf models, each reporting its mean μ for the sensors below it, form a satisfying response
• Can be computed with dynamic programming (sketched below)
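A minimal sketch of one such tree DP, assuming traversal cost is the number of nodes visited and that each node's model supplies the expected number of its leaves falling within ±w of the reported mean (field names are illustrative, not the paper's):

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    expected_satisfied: float   # E[# leaves within ±w of this model's mean]
    children: list = field(default_factory=list)

def pareto(options):
    """Keep only non-dominated (cost, expected_satisfied) pairs."""
    options = sorted(options, key=lambda cs: (cs[0], -cs[1]))
    frontier, best = [], float("-inf")
    for cost, sat in options:
        if sat > best:
            frontier.append((cost, sat))
            best = sat
    return frontier

def traverse_dp(v):
    """Pareto frontier of (traversal cost, expected satisfied) for v's subtree."""
    options = [(1, v.expected_satisfied)]          # option 1: stop here, answer from M_v
    if v.children:                                 # option 2: descend and combine children
        combined = [(1, 0.0)]                      # cost 1 to visit v itself
        for c in v.children:
            combined = pareto([(cv + cc, sv + sc)
                               for cv, sv in combined
                               for cc, sc in traverse_dp(c)])
        options += combined
    return pareto(options)

def cheapest_satisfying_cost(root, delta, n):
    """Minimum traversal cost whose expected satisfied count reaches delta * n."""
    for cost, sat in traverse_dp(root):
        if sat >= delta * n:
            return cost
    return None   # no cut satisfies the query
```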
Greedy Traversal
• If the local model satisfies the query, return μ
• Else descend to the child nodes
• More conservative solution: enforces query satisfiability on every subtree instead of the whole tree (see the sketch below)
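A sketch of this greedy descent; the per-node satisfiability test is assumed precomputed from M_v, and all names are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class SummaryNode:
    mu: float                   # mean of this node's model M_v
    satisfies: bool             # does M_v meet Q(w, delta) for its own subtree?
    leaf_ids: list              # ids of the leaves under this node
    children: list = field(default_factory=list)

def greedy_traverse(v, answers):
    """Descend until the local model satisfies the query, then answer from it."""
    if v.satisfies or not v.children:
        for leaf in v.leaf_ids:          # report the local model mean for every leaf below
            answers[leaf] = v.mu
    else:
        for c in v.children:
            greedy_traverse(c, answers)
```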
Talk "outline": Compression; In-network summaries (Construction, Traversal)
Optimal Tree Construction
• Given a structure, we know how to build the models
• But how do we pick the structure?
Traversal = Cut
• Theorem: in a fixed-fanout tree, the cost of the traversal is a function of the cut size |C| and the fanout F, growing with |C|
• Intuition: minimize the cut size
• Group nodes into a minimum number of groups that satisfy the query constraints: a clustering problem
Optimal Clustering
• Given a query Q(w, δ), optimal clustering is NP-hard (related to the Group Steiner Tree problem)
• Greedy algorithm with a log(n) approximation factor: greedily pick the maximum-size cluster
• Issue: does not enforce connectivity of clusters
Greedy Clustering
• Include extra nodes to enforce connectivity
• Augment clusters only with accessible nodes (losing the log n guarantee)
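An illustrative sketch of such a connectivity-aware greedy clustering (not the paper's exact algorithm): grow the largest query-satisfying cluster reachable from each seed through adjacent nodes, then repeat on the uncovered remainder:

```python
def greedy_clusters(nodes, neighbors, fits_query):
    """nodes: iterable of node ids; neighbors: dict node -> set of adjacent nodes;
    fits_query(cluster): True if one model over `cluster` can satisfy Q(w, delta)."""
    uncovered, clusters = set(nodes), []
    while uncovered:
        best = None
        for seed in uncovered:                        # try growing from every seed
            cluster, frontier = {seed}, set(neighbors[seed])
            grown = True
            while grown:                              # expand only through accessible nodes
                grown = False
                for u in sorted((frontier & uncovered) - cluster):
                    if fits_query(cluster | {u}):
                        cluster.add(u)
                        frontier |= neighbors[u]
                        grown = True
            if best is None or len(cluster) > len(best):
                best = cluster                        # keep the max-size cluster
        clusters.append(best)
        uncovered -= best
    return clusters
```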
Clustering Comparison
• Two distributed clustering algorithms are compared against the centralized greedy clustering
Talk "outline": Compression; Enriched models; In-network summaries (Construction, Traversal)
Enriched Models (SGM = Single Gaussian Model)
• Support more complex models:
• k-mixtures: compress to a k-component mixture instead of an SGM
• Virtual nodes: every component of the k-size mixture is stored as a separate "virtual node"
• SGMs on multiple windows: maintain additional SGMs for different window sizes
• Trade-off: more space, more expensive model updates
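A minimal sketch of what an enriched per-node summary might store, assuming the three extensions above (all field names are illustrative, not the paper's):

```python
from dataclasses import dataclass, field

@dataclass
class SGM:
    mu: float
    var: float

@dataclass
class EnrichedSummary:
    base: SGM                                       # plain single-Gaussian model
    virtual: list = field(default_factory=list)     # k-mixture components, one per "virtual node"
    per_window: dict = field(default_factory=dict)  # window size w -> SGM tuned for that w

    def model_for(self, w):
        """Answer with the SGM maintained for the closest window size, else the base."""
        if not self.per_window:
            return self.base
        closest = min(self.per_window, key=lambda ww: abs(ww - w))
        return self.per_window[closest]
```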
Evaluation of Enriched Models
• The plain SGM is surprisingly effective at representing the underlying data
Talk "outline": Sensitivity analysis; Compression; In-network summaries (Construction, Traversal)
Tree Construction Parameters and Their Effect on Performance
• Confidence: performance for workloads with a different confidence than the hierarchy was designed for
• Error window: broader vs. narrower ranges of window sizes, and the assignment of windows across tree levels
• Temporal changes: how often should the models be updated?
Confidence
• For a workload of 0.95 confidence, the design confidence does not have a big impact on performance
Error Windows
• A wide range of window sizes is not always better, because it forces the traversal of more levels
Conclusions
• Analyzed compression schemes for in-network summaries
• Evaluated summary traversal
• Studied optimal hierarchy construction
• Studied models of increased complexity (enriched models); showed that simple SGMs are sufficient
• Analyzed the effect of various parameters on efficiency (sensitivity analysis)