
Monitoring Streams -- A New Class of Data Management Applications



1. Monitoring Streams -- A New Class of Data Management Applications
Don Carney, Brown University; Uğur Çetintemel, Brown University; Mitch Cherniack, Brandeis University; Christian Convey, Brown University; Sangdon Lee, Brown University; Greg Seidman, Brown University; Michael Stonebraker, MIT; Nesime Tatbul, Brown University; Stan Zdonik, Brown University

2. Background
• MIT/Brown/Brandeis team
• First Aurora, then Borealis
• Practical system
• Designed for Scalability: 10^6 stream inputs, queries
• QoS-Driven Resource Management
• Stream Storage Management
• Reliability / Fault Tolerance
• Distribution and Adaptivity
• First stream startup: StreamBase
• Financial applications

3. Example Stream Applications
• Market Analysis: streams of stock exchange data
• Critical Care: streams of vital sign measurements
• Physical Plant Monitoring: streams of environmental readings
• Biological Population Tracking: streams of positions from individuals of a species

4. Not Your Average DBMS
• External, autonomous data sources
• Querying time-series
• Triggers-in-the-large
• Real-time response requirements
• Noisy data, approximate query results

5. Outline
1. Introduction
2. Aurora Overview / Query Model
3. Runtime Operation
4. Adaptivity
5. Related Work and Conclusions

6. Aurora from 100,000 Feet
[Diagram: applications (App) connect to Aurora, one query and one QoS specification per application]
Each application provides:
• A query over input data streams
• A Quality-of-Service specification (specifies utility of partial or late results)

7. Aurora from 100 Feet
• Query Operators (Boxes)
  • Simple: FILTER, MAP, RESTREAM
  • Binary: UNION, JOIN, RESAMPLE
  • Windowed: TUMBLE, SLIDE, XSECTION, WSORT
• Queries = Workflow (Boxes and Arcs)
  • Workflow Diagram = "Aurora Network"
  • Boxes = Query Operators
  • Arcs = Streams
• Streams (Arcs)
  • Stream: tuple sequence from a common source (e.g., a sensor)
  • Tuples timestamped on arrival (internal use: QoS)
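The windowed operators above can be illustrated with a small sketch. This is a count-based Python approximation of TUMBLE and SLIDE semantics; the function names, count-based windows, and aggregate interface are illustrative assumptions, not Aurora's actual operator API.

```python
from collections import deque

def tumble(stream, size, agg):
    """TUMBLE sketch: partition the stream into consecutive,
    non-overlapping windows of `size` tuples and emit one
    aggregate per window."""
    window = []
    for tup in stream:
        window.append(tup)
        if len(window) == size:
            yield agg(window)
            window = []  # windows never overlap

def slide(stream, size, step, agg):
    """SLIDE sketch: emit an aggregate over the last `size`
    tuples, advancing by `step` tuples between emissions."""
    window = deque(maxlen=size)
    for i, tup in enumerate(stream, 1):
        window.append(tup)
        if len(window) == size and (i - size) % step == 0:
            yield agg(list(window))
```

For example, `tumble([1, 2, 3, 4, 5, 6], 3, sum)` yields one sum per disjoint window, while `slide` with `step` smaller than `size` re-aggregates overlapping windows.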

8. Aurora in Action
[Diagram: an Aurora network of filter, map, union, Tumble, and Slide boxes feeding applications with QoS specs]
• Arcs → Tuple Queues
• "Box-at-a-time" Scheduling
• Outputs Monitored for QoS

9. Continuous and Historical Queries
[Diagram: a continuous query (boxes O1-O5, with queues) feeds applications with QoS specs; an ad-hoc query (boxes O7-O9) attaches at a connection point retaining history (a 3-day view, queried over 1 hour)]

10. Quality-of-Service (QoS)
Specifies "utility" of imperfect query results:
• Delay-based (specify utility of late results): utility as a function of delay
• Delivery-based, value-based (specify utility of partial results): utility as a function of % tuples delivered, or of output value
QoS influences scheduling, storage management, load shedding
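A delay-based QoS graph like the one on this slide can be modeled as a piecewise-linear utility curve. The breakpoints below are made-up example values; each application supplies its own curve.

```python
def delay_utility(delay, good_until=2.0, zero_at=5.0):
    """Delay-based QoS sketch: full utility for results delivered
    within `good_until` seconds, falling linearly to zero utility
    at `zero_at` seconds.  The breakpoints are illustrative
    defaults, not values from the paper."""
    if delay <= good_until:
        return 1.0
    if delay >= zero_at:
        return 0.0
    return (zero_at - delay) / (zero_at - good_until)
```

Delivery-based and value-based graphs have the same shape, just with % tuples delivered or output value on the x-axis instead of delay.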

11. Talk Outline
1. Introduction
2. Aurora Overview
3. Runtime Operation
4. Adaptivity
5. Related Work and Conclusions

12. Runtime Operation: Basic Architecture
[Architecture diagram: inputs enter through the Router; the Scheduler dispatches Box Processors over tuple queues (q1 ... qn); the Storage Manager maintains the buffer and persistent store; a Catalog and QoS Monitor support the runtime; results flow to outputs]

13. Runtime Operation / Scheduling: Maximize Overall QoS
• Choice 1, Box A: cost 1 sec, input tuple age 1 sec → delay = 2 sec, utility = 0.5
• Choice 2, Box B: cost 2 sec, input tuple age 3 sec → delay = 5 sec, utility = 0.8
• Schedule Box A now rather than later
• Ideal: maximize overall utility
• Presently exploring scalable heuristics (e.g., feedback-based)
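The slide's choice can be sketched as comparing the total utility of the two possible execution orders, since running one box first delays the other by its cost. The two-box restriction and the `qos` curve passed in are illustrative assumptions, not Aurora's actual scheduling heuristic.

```python
def best_box(boxes, qos):
    """Pick which of two ready boxes to schedule first by
    comparing total utility of the two orders.  Each box is
    (name, cost_sec, tuple_age_sec); `qos` maps a result's
    delay to its utility.  Illustrative sketch only."""
    (a, cost_a, age_a), (b, cost_b, age_b) = boxes
    # Order A-then-B: A's output arrives after cost_a,
    # B's after cost_a + cost_b (and symmetrically for B-then-A).
    u_ab = qos(age_a + cost_a) + qos(age_b + cost_a + cost_b)
    u_ba = qos(age_b + cost_b) + qos(age_a + cost_b + cost_a)
    return a if u_ab >= u_ba else b
```

With the slide's costs and ages and a utility curve that decays with delay, running A first wins: delaying B a little costs less utility than delaying the already-old tuples behind B even further.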

14. Runtime Operation / Scheduling: Minimizing Per-Tuple Processing Overhead
• Default operation: one box invocation per tuple, with a context switch between each: A(x), B(A(x)), A(y), B(A(y)), ...
• Tuple trains: a box processes its whole queued train in one invocation: A(z, y, x), then B(A(z), A(y), A(x))
• Box trains (train scheduling): push a train through a sequence of boxes (AB) before switching context
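The saving from tuple trains can be made concrete with a toy comparison. The switch counter stands in for real context-switch cost; box functions and names are illustrative.

```python
def run_default(tuples, box_a, box_b, counter):
    """Default operation: one box invocation (and one context
    switch) per tuple per box."""
    out = []
    for t in tuples:
        counter["switches"] += 1
        mid = box_a(t)
        counter["switches"] += 1
        out.append(box_b(mid))
    return out

def run_trains(tuples, box_a, box_b, counter):
    """Tuple trains: each box consumes the whole queued train in
    a single invocation, so switch count no longer grows with
    the number of tuples."""
    counter["switches"] += 1
    mids = [box_a(t) for t in tuples]   # A processes the train
    counter["switches"] += 1
    return [box_b(m) for m in mids]     # then B processes it
```

Both paths produce identical outputs; only the invocation count differs (2n switches vs. 2 for an n-tuple train through two boxes).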

15. Runtime Operation / Storage Management
1. Run-time queue management
• Prefetch queues prior to being scheduled
• Drop tuples from queues to improve QoS
2. Connection point management
• Support efficient (pull-based) access to historical data (e.g., indexing, sorting, clustering, ...)

16. Talk Outline
1. Introduction
2. Aurora Overview
3. Runtime Operation
4. Adaptivity
5. Related Work and Conclusions

17. Stream Query Optimization
• Differences from traditional query optimization?

18. Stream Query Optimization
• New classes of operators (windows) may mean new rewrites
• New execution modes (continuous/pipelining)
• More dynamic fluctuations in statistics ⇒ compile-time optimization not possible
• Global optimization not practical for huge query networks ⇒ adaptive optimization
• Other cost models: taking memory into account; optimizing not throughput but output rate, etc.
• Query optimization and load shedding

19. Query Optimization
• Compile-time, global optimization infeasible
  • Too many boxes
  • Too much volatility in network, data
• Dynamic, local optimization
  • Threshold for deciding when to optimize

20. Motivation for "Query Migration"
• Continuous query over streams
  • Statistics unknown before start
  • Statistics changing during execution (stream rates, arrival pattern, distribution, etc.)
• Need for dynamic adaptation
  • Plan re-optimization
  • Change the shape of the query plan tree

21. Run-time Plan Re-Optimization
• Step 1: Decide when to optimize (statistics monitoring)
• Step 2: Generate new query plan (query optimization)
• Step 3: Replace current plan with new plan (plan migration)

22. Adaptivity in Query Optimization
Dynamic Optimization: Migration
1. Identify subnetwork
2. Buffer inputs
3. Drain subnetwork
4. Optimize subnetwork
5. Turn on taps
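The five steps above can be sketched for a stateless subnetwork (stateful operators are exactly what makes this drain step problematic, as the later slides show). The single-function stand-in for a subnetwork and all names are illustrative assumptions.

```python
def migrate(old_plan, new_plan, in_flight, buffered_inputs):
    """Sketch of the five migration steps for a stateless
    subnetwork (illustrative, not Aurora's code):
      1. identify the subnetwork (`old_plan`)
      2. buffer inputs arriving during migration (`buffered_inputs`)
      3. drain tuples already inside it using the OLD plan
      4. swap in the optimized plan (`new_plan`)
      5. turn on the taps: replay buffered inputs through it
    """
    drained = [old_plan(t) for t in in_flight]        # step 3
    plan = new_plan                                   # step 4
    replayed = [plan(t) for t in buffered_inputs]     # step 5
    return drained + replayed
```

In-flight tuples get old-plan results and buffered arrivals get new-plan results; no tuple is lost or processed twice.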

23. Stateful Operator in CQ
Example: symmetric NL join with window constraints
[Diagram: join A ⋈ B, with State A and State B holding received tuples (ax, b1 ... b5)]
• Why stateful?
  • Need non-blocking operators in CQ
  • Operator needs to output partial results
  • State data structures keep received tuples
• Key observation: the purge of tuples in states relies on the processing of new tuples.
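A minimal sketch of such a join over a single interleaved stream, assuming timestamps arrive in nondecreasing order; the tuple layout and purge-on-arrival policy are illustrative assumptions. Note that expired tuples leave a state only when a new tuple arrives on the other side, matching the slide's key observation.

```python
def symmetric_window_join(stream, window):
    """Symmetric join sketch over an interleaved stream of
    ('A', ts, key) and ('B', ts, key) tuples: a pair joins when
    keys match and |ts_a - ts_b| <= window.  Assumes timestamps
    are nondecreasing; each arriving tuple first purges expired
    tuples from the opposite state, then probes it."""
    state = {"A": [], "B": []}
    out = []
    for side, ts, key in stream:
        other = "B" if side == "A" else "A"
        # Purge: tuples too old to join with this or any later tuple.
        state[other] = [(t, k) for t, k in state[other] if ts - t <= window]
        # Probe: emit matched (a_ts, b_ts) pairs within the window.
        for t, k in state[other]:
            if k == key:
                out.append((ts, t) if side == "A" else (t, ts))
        state[side].append((ts, key))
    return out
```

If new tuples stop arriving, nothing ever purges the states, which is exactly why the naive "drain everything" migration on the next slide can wait forever.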

24. Naïve Migration Strategy Revisited
Steps:
(1) Pause execution of old plan
(2) Drain out all tuples inside old plan
(3) Replace old plan by new plan
(4) Resume execution of new plan
Problem: deadlock waiting. Step (2) may never finish for stateful operators, since purging tuples from their states relies on processing new tuples.

25. Adaptivity / Query Optimization
• State Movement Protocol
• Parallel Track Protocol

26. Moving State Strategy
• Basic idea: share common states between the two migration boxes
• Key steps
  • State matching: match states based on IDs
  • State moving: create new pointers for matched states in the new box
• What's left? Unmatched states in the new box
[Diagram: old and new join trees over inputs QA, QB, QC, QD producing QABCD, with states SA, SB, SC, SD, SAB, SBC, SABC, SBCD]

27. Parallel Track Strategy
• Basic idea: execute both plans in parallel and gradually "push" old tuples out of the old box by purging
• Key steps
  • Connect boxes
  • Execute in parallel until the old box has "expired" (no old tuple or sub-tuple remains)
  • Disconnect the old box
  • Start executing the new box only
[Diagram: old and new join trees over QA, QB, QC, QD running side by side]

28. Adaptivity / Load Shedding
1. Two load shedding techniques
• Random tuple drops: add a DROP box to the network (DROP is a special case of FILTER); position it to affect queries with tolerant delivery-based QoS requirements
• Semantic load shedding: FILTER out values with low utility (according to value-based QoS)
2. Triggered by the QoS Monitor, e.g., after latency analysis reveals certain applications are continuously receiving poor QoS
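Both techniques can be expressed as filters, echoing the slide's point that DROP is a special case of FILTER. Function names and the probability interface are illustrative assumptions.

```python
import random

def filter_box(pred):
    """FILTER box: pass only tuples satisfying `pred`."""
    return lambda stream: (t for t in stream if pred(t))

def drop_box(p, rng=random.random):
    """Random tuple drops as a FILTER whose predicate ignores the
    tuple's value and keeps it with probability 1 - p."""
    return filter_box(lambda _t: rng() >= p)

def semantic_drop(utility, threshold):
    """Semantic load shedding: FILTER out tuples whose
    value-based QoS utility falls below `threshold`."""
    return filter_box(lambda t: utility(t) >= threshold)
```

A random DROP preserves the value distribution but lowers the delivered fraction; a semantic FILTER sacrifices specifically the tuples the value-based QoS graph says are worth least.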

29. Adaptivity / Detecting Overload
• Throughput analysis: for a box with cost c, selectivity s, and input rate r, output rate = min(1/c, r) × s; r > 1/c ⇒ problem (the box cannot keep up with its input)
• Latency analysis: monitor each application's delay-based QoS; problem when too many apps are in the "bad zone"
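The throughput test above is a one-liner; a small sketch with the slide's formula (parameter names are illustrative):

```python
def output_rate(c, s, r):
    """Throughput analysis: a box with per-tuple cost c (sec),
    selectivity s, and input rate r (tuples/sec) emits
    min(1/c, r) * s tuples/sec; it can never process input
    faster than 1/c tuples/sec."""
    return min(1.0 / c, r) * s

def overloaded(c, r):
    """The box falls behind when tuples arrive faster than it
    can process them, i.e., r > 1/c."""
    return r > 1.0 / c
```

Composing `output_rate` along a path of boxes gives each downstream box's input rate, which is how overload can be located inside a network rather than only at its edges.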

30. Implementation: GUI

31. Implementation: Runtime

32. Conclusions
Aurora Stream Query Processing System
• Designed for Scalability
• QoS-Driven Resource Management
• Continuous and Historical Queries
• Stream Storage Management
• Implemented Prototype
Web site: www.cs.brown.edu/research/aurora/
