The Design of the Borealis Stream Processing Engine
Brandeis University, Brown University, MIT
Magdalena Balazinska (MIT), Nesime Tatbul (Brown)
Distributed Stream Processing
• Data sources push tuples continuously
• Operators process windows of tuples
[Figure: data sources push tuples into a network of operators distributed across Nodes 1-4, with results delivered to client apps]
Where are we today? • Data models, operators, query languages • Efficient single-site processing • Single-site resource management • [STREAM, TelegraphCQ, NiagaraCQ, Gigascope, Aurora] • Basic distributed systems • Servers [TelegraphCQ, Medusa/Aurora] • or sensor networks [TinyDB, Cougar] • Tolerance to node failures • Simple load management
Challenges • Tuple revisions • Revisions on input streams • Fault-tolerance • Dynamic & distributed query optimization • ...
Causes for Tuple Revisions
• Data sources revise input streams: "On occasion, data feeds put out a faulty price [...] and send a correction within a few hours" [MarketBrowser]
• Temporary overloads cause tuple drops
• Temporary failures cause tuple drops
[Figure: a 1-hour average-price operator over a stream of (time, price) tuples, e.g. (1pm,$10), (2pm,$12), (2:25,$9), (3pm,$11), (4:05,$11), (4:15,$10)]
Current Data Model
• Tuple format: (time, a1, ..., an), a header followed by the data attributes
• time: tuple timestamp
New Data Model for Revisions
• Tuple format: (time, type, id, a1, ..., an), an extended header followed by the data attributes
• time: tuple timestamp
• type: tuple type (insertion, deletion, or replacement)
• id: unique identifier of the tuple on its stream
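A minimal sketch (not Borealis code) of the extended tuple header above; the field names mirror the slide, while the payload and example values are illustrative.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class RevisionTuple:
    """Tuple with the extended Borealis header: (time, type, id, a1, ..., an)."""
    time: float    # tuple timestamp
    type: str      # "insertion", "deletion", or "replacement"
    id: int        # unique identifier of the tuple on its stream
    data: Tuple    # payload attributes a1, ..., an

# A replacement revising the 2pm price from $12 to $11 (cf. the revision example slide):
original = RevisionTuple(time=14.0, type="insertion", id=42, data=(12.0,))
revision = RevisionTuple(time=14.0, type="replacement", id=42, data=(11.0,))
```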
Revisions: Design Options • Restart query network from a checkpoint • Let operators deal with revisions • Operators must keep their own history • (Some) streams can keep history
Revision Processing in Borealis
• Closed model: revisions produce revisions
[Figure: a replacement revision of the 2pm tuple (from $12 to $11) flows through the 1-hour average-price operator, which in turn emits a revision of its own output]
Revision Processing in Borealis
• Connection points (CPs) store history
• Operators pull the history they need (toy sketch below)
[Figure: a CP on stream S buffers tuples from the oldest to the most recent; when a revision arrives on the operator's input queue, the operator pulls the affected history from the CP]
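A toy sketch of the connection-point idea on this slide: a CP keeps a bounded history of (time, id, value) tuples, and a windowed average operator pulls that history when a replacement revision arrives, recomputing its output (which, per the closed model, would itself be emitted as a revision). Names and the window logic are illustrative, not the Borealis implementation.

```python
from collections import deque

class ConnectionPoint:
    """Keeps a bounded history of (time, id, value) tuples on a stream."""
    def __init__(self, max_history):
        self.history = deque(maxlen=max_history)

    def append(self, tup):
        self.history.append(tup)

    def window(self, start, end):
        # Pull the stored tuples whose timestamps fall in [start, end).
        return [t for t in self.history if start <= t[0] < end]

def average_with_revision(cp, revision, window_start, window_end):
    """Apply a replacement revision, then recompute the windowed average.
    The revised aggregate would be emitted as a revision of the output."""
    revised = [(t[0], t[1], revision[2]) if t[1] == revision[1] else t
               for t in cp.window(window_start, window_end)]
    values = [v for _, _, v in revised]
    return sum(values) / len(values)

cp = ConnectionPoint(max_history=1000)
for tup in [(13.0, 1, 10.0), (14.0, 2, 12.0), (14.4, 3, 9.0)]:   # 1pm, 2pm, 2:25
    cp.append(tup)
# Revise the 2pm tuple (id=2) from $12 to $11 and recompute the 2pm-3pm average:
print(average_with_revision(cp, (14.0, 2, 11.0), 14.0, 15.0))    # 10.0
```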
Fault-Tolerance through Replication
• Goal: Tolerate node and network failures
[Figure: the query network on Nodes 1-3 (streams s1-s4) is replicated on Nodes 1'-3', so a downstream consumer can switch to a replica's output (s3 vs. s3') when an upstream node fails]
Reconciliation • State reconciliation alternatives • Propagate tuples as revisions • Restart query network from a checkpoint • Propagate UNDO tuple • Output stream revision alternatives • Correct individual tuples • Stream of deletions followed by insertions • Single UNDO tuple followed by insertions
Fault-Tolerance Approach
• If an input stream fails, find another replica
• If no replica is available, produce tentative tuples
• Correct tentative results after failures heal
[State diagram: STABLE moves to UPSTREAM FAILURE on missing or tentative inputs; when the failure heals, the node enters STABILIZING, reconciles its state, and emits corrected output before returning to STABLE; another upstream failure in progress sends it back to UPSTREAM FAILURE]
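A hedged sketch of the three states and transitions in the diagram above; the transition triggers follow the slide, while the class and method names are illustrative.

```python
from enum import Enum, auto

class ReplicaState(Enum):
    STABLE = auto()
    UPSTREAM_FAILURE = auto()
    STABILIZING = auto()

class NodeFSM:
    """A processing node's state with respect to one upstream input stream."""
    def __init__(self):
        self.state = ReplicaState.STABLE

    def on_missing_or_tentative_input(self):
        # Try another replica first; if none is available, keep processing
        # and mark every produced tuple as tentative.
        self.state = ReplicaState.UPSTREAM_FAILURE

    def on_failure_heals(self):
        # Reconcile state and emit corrections for the tentative output.
        self.state = ReplicaState.STABILIZING

    def on_reconciled(self):
        self.state = ReplicaState.STABLE

    def on_new_upstream_failure(self):
        # Another upstream failure while stabilizing sends the node back.
        if self.state == ReplicaState.STABILIZING:
            self.state = ReplicaState.UPSTREAM_FAILURE
```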
Challenges • Tuple revisions • Revisions on input streams • Fault-tolerance • Dynamic & distributed query optimization • ...
Optimization in a Distributed SPE • Goal: Optimized resource allocation • Challenges: • Wide variation in resources • High-end servers vs. tiny sensors • Multiple resources involved • CPU, memory, I/O, bandwidth, power • Dynamic environment • Changing input load and resource availability • Scalability • Query network size, number of nodes
Quality of Service • A mechanism to drive resource allocation • Aurora model • QoS functions at query end-points • Problem: need to infer QoS at upstream nodes • An alternative model • Vector of Metrics (VM) carried in tuples • Operators can change VM • A Score Function to rank tuples based on VM • Optimizers can keep and use statistics on VM
Example Application: Warfighter Physiologic Status Monitoring (WPSM)
[Figure: physiologic models infer the soldier's state (Thermal, Hydration, Cognitive, Life Signs, Wound Detection), each with a confidence (the figure shows 90%, 60%, 100%, 90%, and 80%) and an affected body area]
Ranking Tuples in WPSM
• VM = ([age, confidence], value)
• Models may change the confidence
• Score Function: SF(VM) = VM.confidence × ADF(VM.age), where ADF is an age decay function (sketch below)
[Figure: HRate, RRate, and Temp sensor streams feed Model1 and Model2, whose outputs are merged]
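A minimal sketch of the score function above, SF(VM) = VM.confidence × ADF(VM.age). The exponential form of the age decay function and its half-life are assumptions, since the slide only names ADF.

```python
import math

def adf(age_seconds, half_life=60.0):
    # Assumed exponential age decay: newer tuples score higher.
    return math.exp(-age_seconds * math.log(2) / half_life)

def score(vm):
    # vm is the vector of metrics carried in the tuple, e.g. {"age": ..., "confidence": ...}
    return vm["confidence"] * adf(vm["age"])

# Rank two WPSM tuples: a fresh low-confidence reading vs. an old high-confidence one.
tuples = [{"age": 5.0, "confidence": 0.6}, {"age": 300.0, "confidence": 0.9}]
ranked = sorted(tuples, key=score, reverse=True)
```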
Borealis Optimizer Hierarchy
[Figure: a Global Optimizer informed by End-point Monitor(s) sits above Neighborhood Optimizers, which in turn sit above the Local Optimizer and Local Monitor running at each Borealis Node]
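A loose sketch of how the three tiers in the figure might interact, assuming (this is not spelled out on the slide) that a problem escalates upward only when the tier below cannot resolve it; all class and method names, and the thresholds, are illustrative.

```python
class LocalOptimizer:
    def try_resolve(self, overload):
        # e.g., priority scheduling or local load shedding (see the tactics slide)
        return overload < 0.2

class NeighborhoodOptimizer:
    def try_resolve(self, overload):
        # e.g., move query fragments to a lightly loaded neighbor
        return overload < 0.5

class GlobalOptimizer:
    def resolve(self, overload):
        # global decisions driven by end-point monitors (QoS at the query outputs)
        return True

def handle_overload(overload, local, neighborhood, global_opt):
    """Escalate from local to neighborhood to global optimization (assumed flow)."""
    if local.try_resolve(overload):
        return "handled locally"
    if neighborhood.try_resolve(overload):
        return "handled in the neighborhood"
    global_opt.resolve(overload)
    return "handled globally"

print(handle_overload(0.4, LocalOptimizer(), NeighborhoodOptimizer(), GlobalOptimizer()))
```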
Optimization Tactics • Priority scheduling • Modification of query plans • Commuting operators • Using alternate operator implementations • Allocation of query fragments to nodes • Load shedding
Correlation-based Load Distribution
• Goal: Minimize end-to-end latency
• Key ideas:
• Balance load across nodes to avoid overload
• Group boxes with small load correlation together (see the sketch below)
• Maximize load correlation among nodes
[Figure: operators A, B, C, D over input streams S1 and S2 placed on two nodes; a "connected plan" and a "cut plan" yield different per-node loads]
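An illustrative sketch (not the Borealis algorithm) of the correlation idea: given a load time series per operator ("box"), greedily co-locate the pair of operators whose loads are least correlated, so each node's total load stays smooth. Requires Python 3.10+ for statistics.correlation; the greedy pairing and example data are assumptions.

```python
import statistics
from itertools import combinations

def correlation(xs, ys):
    return statistics.correlation(xs, ys)          # Pearson correlation (Python >= 3.10)

def pair_operators(load_series):
    """Greedily pair operators with the smallest load correlation onto the same node."""
    unassigned = set(load_series)
    nodes = []
    while len(unassigned) > 1:
        a, b = min(combinations(sorted(unassigned), 2),
                   key=lambda p: correlation(load_series[p[0]], load_series[p[1]]))
        nodes.append({a, b})
        unassigned -= {a, b}
    if unassigned:
        nodes.append(unassigned)
    return nodes

# A/B and C/D have anti-correlated loads, so pairing them keeps each node balanced.
loads = {"A": [1, 3, 1, 3], "B": [3, 1, 3, 1], "C": [2, 4, 2, 4], "D": [4, 2, 4, 2]}
print(pair_operators(loads))    # [{'A', 'B'}, {'C', 'D'}]
```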
Load Shedding • Goal: Remove excess load at all nodes and links • Shedding at node A relieves its descendants • Distributed load shedding • Neighbors exchange load statistics • Parent nodes shed load on behalf of children • Uniform treatment of CPU and bandwidth problem • Load balancing or Load shedding?
Local Load Shedding
• Goal: Minimize total loss
• Example (figure): two queries run across nodes A and B; Node A's CPU load is 5Cr1 against a capacity of 4Cr1 (it must shed 20% of its load), and Node B's load is 4Cr2 against 2Cr2 (it must shed 50%)
• When each node sheds on its own, A's choice leaves B with a load of 3.75Cr2, so B must still shed 47% of its load; the end-to-end losses are 25% and 58% on the two output streams
Distributed Load Shedding
• Goal: Minimize total loss
• Same setup: A must shed 20% of its load and B 50%
• When parent node A sheds on behalf of B as well, dropping at its inputs so that neither node remains overloaded, the end-to-end losses become 9% and 64%: a smaller total loss (see the sketch below)
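A small numeric sketch of the two slides above, assuming the per-tuple costs are 4C and C at node A and C and 3C at node B for the two queries (one consistent reading of the figures). A brute-force search over input drop fractions at A that keep both nodes within capacity recovers the coordinated solution: drop roughly 9% of one query and 64% of the other, a smaller total loss than the uncoordinated 25% and 58%.

```python
# Two queries share nodes A and B (one reading of the figures on the two slides):
# query 1 costs 4C at A and C at B; query 2 costs C at A and 3C at B.
COST_A = (4.0, 1.0)          # per-tuple CPU cost of (query 1, query 2) at node A
COST_B = (1.0, 3.0)          # per-tuple CPU cost of (query 1, query 2) at node B
CAP_A, CAP_B = 4.0, 2.0      # capacities in units of C*r, equal input rates assumed

def feasible(x1, x2):
    """True if dropping fractions (x1, x2) at A's inputs fits both capacities."""
    keep = (1 - x1, 1 - x2)
    load_a = sum(c * k for c, k in zip(COST_A, keep))
    load_b = sum(c * k for c, k in zip(COST_B, keep))
    return load_a <= CAP_A + 1e-9 and load_b <= CAP_B + 1e-9

steps = [i / 100 for i in range(101)]
x1, x2 = min(((a, b) for a in steps for b in steps if feasible(a, b)),
             key=lambda p: p[0] + p[1])
print(x1, x2)   # 0.09 and 0.64: total loss ~0.73, versus ~0.83 (25% + 58%)
                # when A and B shed locally without coordination
```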
Extending Optimization to Sensor Nets
• Sensor proxy as an interface
• Moving operators in and out of the sensor net
• Adjusting sensor sampling rates
• The WPSM example (see the sketch below):
[Figure: sensor data flows through a proxy and a merge operator into the physiologic models, while control messages from the optimizer flow back through the proxy to the sensors]
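A rough sketch of the sensor-proxy role on this slide: the proxy receives control messages from the optimizer and translates them into sensor commands, such as a new sampling rate, or pushes and pulls operators into and out of the sensor net. All class names, message fields, and the Sensor stub are illustrative.

```python
class Sensor:
    """Stand-in for a physical sensor (e.g., HRate, RRate, Temp)."""
    def __init__(self, hz):
        self.hz = hz
    def set_rate(self, hz):
        self.hz = hz

class SensorProxy:
    """Interface between a sensor network and the Borealis optimizer (illustrative)."""
    def __init__(self, sensors):
        self.sensors = sensors          # e.g., {"HRate": Sensor(10), "Temp": Sensor(1)}
        self.in_network_ops = set()     # operators currently pushed into the sensor net

    def handle_control(self, message):
        # Control messages arrive from the optimizer (see figure).
        if message["kind"] == "set_sampling_rate":
            self.sensors[message["sensor"]].set_rate(message["hz"])
        elif message["kind"] == "push_operator":
            self.in_network_ops.add(message["op"])      # run e.g. a filter in-network
        elif message["kind"] == "pull_operator":
            self.in_network_ops.discard(message["op"])  # run it outside the sensor net

proxy = SensorProxy({"HRate": Sensor(10), "Temp": Sensor(1)})
proxy.handle_control({"kind": "set_sampling_rate", "sensor": "HRate", "hz": 2})
```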
Conclusions • Next generation streaming applications require a flexible processing model • Distributed operation • Dynamic result and query modification • Dynamic and scalable optimization • Server and sensor network integration • Tolerance to node and network failures • Borealis has been iteratively designed, driven by real applications • First prototype release planned for Spring’05 • http://nms.lcs.mit.edu/projects/borealis