
PRISM: Precision-aware Aggregation for Scalable Monitoring

Navendu Jain, Dmitry Kit, Prince Mahajan, Praveen Yalagandula*, Mike Dahlin, and Yin Zhang. Laboratory for Advanced Systems Research, The University of Texas at Austin.


Presentation Transcript


  1. PRISM: Precision-aware Aggregation for Scalable Monitoring
     Navendu Jain, Dmitry Kit, Prince Mahajan, Praveen Yalagandula*, Mike Dahlin, and Yin Zhang
     Laboratory for Advanced Systems Research, The University of Texas at Austin
     * Hewlett-Packard Labs, Palo Alto, CA

     Problem Statement
     • Imprecision is fundamental in distributed systems.
     [Figure: Ideal World vs. Real World; labels: Network Disconnections, Cached Values, Erroneous sensors, Unbounded latency]
     • Causes:
       • Sensors could be error prone.
       • Fluctuations in network propagation delays.
       • Dynamicity/churn in the underlying system.
     • Applications are forced to live with imperfect results.
     • Key observations:
       • Real-world applications can tolerate bounded imprecision.
       • Small imprecision provides significant bandwidth reduction.

     Approach
     • Imprecision should be made a first-class abstraction.
     • Treat imprecision as a 3-dimensional vector [Arithmetic, Temporal, Network]:
       • Arithmetic Imprecision (AI): bounds the numerical inconsistency between the reported value of the aggregate relative to the true value.
       • Temporal Imprecision (TI): provides a real-time bound on the delay from when an event occurs until it is reported.
       • Network Imprecision (NI): bounds the inaccuracy introduced by node failures, slow network paths, unreachable nodes, and DHT reconfigurations.
     [Figure: imprecision vector with axes Arithmetic, Temporal, Network]

     Arithmetic Imprecision
     • Guarantees: Given a 10% AI budget, for example, the reported aggregate value either overestimates or underestimates the true value by at most 10%.
     • Benefits: Allows caching to filter small changes in aggregate value.
     • Approach: Keep a bounded approximate answer [Vmin, Vmax] in which the true value resides.

     Temporal Imprecision
     • Guarantees: Given a 60-second TI, for example, the system guarantees that all events that occurred 60 or more seconds ago are accounted for in the aggregated value.
     • Benefits: Batch multiple updates to reduce processing and network load.
     • Approach: Pipelined Delays: pipeline available TI across levels of the aggregation hierarchy.

     Network Imprecision
     • Provide 3 new metrics: Nreachable, Nall, Ndup
       • Nreachable: provides a lower bound on the number of nodes whose value is reflected in an aggregated result.
       • Nall: provides an estimate of the number of nodes that are members of the system.
       • Ndup: provides an upper bound on the number of nodes whose contribution to an aggregate may be doubly-counted.
     [Figure: example for the NI metrics]
     • Conditioned Consistency: The AI and TI guarantees are calculated optimistically, assuming that the network is "stable". The NI metric then qualifies the AI and TI metrics by quantifying the stability of the network during the period when these metrics are calculated.

     Results
     • We have implemented a prototype of PRISM in Java using SDIMS based on the FreePastry framework.
     • Initial results are promising.
     • AVG QUERY on the Abilene dataset: Monitor the total volume of incoming traffic received by all destination hosts over a 30-second sliding window.
     • Effect of Arithmetic Imprecision: A 24% numeric error achieves over an 80% load reduction.
     • Effect of Temporal Imprecision: A 10-second temporal imprecision provides an order of magnitude reduction in bandwidth cost.
     • Induced Network Imprecision: Network imprecision successfully characterizes the system state under induced network failures.

     Further Information
     URL: http://www.cs.utexas.edu/users/nav/PRISM
     Email: sdims@cs.utexas.edu
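The Arithmetic Imprecision approach on the poster (keep a range [Vmin, Vmax] and let caching filter small changes) can be illustrated with a short Python sketch. This is not the PRISM prototype, which is written in Java; the class name `AIFilter`, the helper `run`, and the choice to center the range on the last reported value with half-width `budget * |value|` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AIFilter:
    """Suppress updates that stay inside a cached range [vmin, vmax].

    Hypothetical sketch: the range is centered on the last reported
    value, which is a simplifying assumption, not PRISM's actual rule.
    """
    budget: float            # relative AI budget, e.g. 0.10 for 10%
    reported: float = 0.0    # last value sent to the parent
    vmin: float = 0.0
    vmax: float = 0.0
    sent: int = 0            # number of updates actually propagated

    def update(self, value: float) -> bool:
        """Return True if `value` had to be propagated to the parent."""
        if self.sent > 0 and self.vmin <= value <= self.vmax:
            return False     # small change: the cached range still holds
        # The value left the range (or this is the first update):
        # report it and re-center the range around it.
        self.reported = value
        slack = self.budget * abs(value)
        self.vmin, self.vmax = value - slack, value + slack
        self.sent += 1
        return True


def run(values, budget):
    """Feed a sequence of local values through one filter."""
    f = AIFilter(budget)
    return [f.update(v) for v in values], f
```

Calling `run([100, 105, 109, 111, 112, 95], 0.10)` propagates only the values that leave the current range, so three of the six updates are sent. A larger budget widens the range and filters more updates, which is the trade-off the poster's load-reduction result refers to.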

  2. Node failure at 900s
     • Nreachable: shows that part of the tree is not reachable.
     • Nall: still assumes the node is alive.
     • Ndup: tells us that the nodes in the sub-tree rooted at the failed node have joined somewhere else, and some values might be counted twice.
     • At approximately 1700s, Nall registers the failure event and the system goes back to a stable state.
     • Ndup goes back to zero.
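The way the three NI metrics qualify a result during the failure timeline above can be sketched in Python. The rule used here (treat the network as stable when Nreachable equals Nall and Ndup is zero) is an assumed reading of the slide, not a definition taken from PRISM, and the names `NI` and `qualify` as well as every node count below are hypothetical.

```python
from typing import NamedTuple


class NI(NamedTuple):
    n_reachable: int  # lower bound on nodes reflected in the result
    n_all: int        # estimate of nodes that are members of the system
    n_dup: int        # upper bound on nodes possibly counted twice


def qualify(ni: NI) -> str:
    """Summarize how far the AI/TI guarantees can be trusted.

    Assumed rule: stable means every known member is reachable
    and no contribution may be double-counted.
    """
    missing = max(ni.n_all - ni.n_reachable, 0)
    if missing == 0 and ni.n_dup == 0:
        return "stable"
    parts = []
    if missing:
        parts.append(f"up to {missing} node(s) missing")
    if ni.n_dup:
        parts.append(f"up to {ni.n_dup} node(s) possibly double-counted")
    return "unstable: " + ", ".join(parts)


# Hypothetical snapshots mirroring the timeline on the slide.
before_failure = NI(n_reachable=100, n_all=100, n_dup=0)
after_failure = NI(n_reachable=90, n_all=100, n_dup=10)   # around 900s
after_recovery = NI(n_reachable=99, n_all=99, n_dup=0)    # around 1700s
```

With these made-up counts, `qualify` reports the first and last snapshots as stable and flags the middle one, matching the slide's description of Nall catching up with the failure and Ndup returning to zero.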
