This lecture covers two topics: the handshake join for window-based stream joins ("How Soccer Players Would do Stream Joins") and query-aware partitioning for monitoring massive network data streams. The handshake join targets highly parallel architectures such as multi-core systems and FPGAs. The second part looks at how partitioning and replication choices affect scalability, throughput, and latency when many monitoring queries run in parallel.
CS 410/510 Data Streams
Lecture 15: How Soccer Players Would do Stream Joins & Query-Aware Partitioning for Monitoring Massive Network Data Streams
Kristin Tufte, David Maier
Data Streams: Lecture 15
How Soccer Players Would do Stream Joins
• Handshake Join
• Evaluate window-based stream joins
• Highly parallelizable
• Implementation on multi-core machine and FPGA
• Previous stream join execution strategies
• Sequential execution based on operational semantics
Let’s talk about stream joins
• Join window of R with window of S
• Focus on sliding windows here
• Scan, Insert, Invalidate
• How might I parallelize?
• Partition and replicate
• Time-based windows vs. tuple-based windows
Figure Credit: How Soccer Players Would do Stream Joins – Teubner, Mueller, SIGMOD 2011
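The scan, insert, invalidate steps can be sketched for a tuple-based sliding window as follows. This is a minimal sequential illustration, not code from the paper; the class name `SlidingWindowJoin`, the method `on_arrival`, and the choice to evict after insertion are assumptions made for the example.

```python
from collections import deque


class SlidingWindowJoin:
    """Tuple-based sliding-window join of streams R and S (illustrative sketch)."""

    def __init__(self, window_size, predicate):
        self.window_size = window_size          # tuples kept per stream
        self.predicate = predicate              # predicate(r, s) -> bool
        self.windows = {"R": deque(), "S": deque()}

    def on_arrival(self, stream, tup):
        # 1. Scan: compare the new tuple against the other stream's window.
        if stream == "R":
            matches = [(tup, s) for s in self.windows["S"] if self.predicate(tup, s)]
        else:
            matches = [(r, tup) for r in self.windows["R"] if self.predicate(r, tup)]
        # 2. Insert: add the new tuple to its own window.
        own = self.windows[stream]
        own.append(tup)
        # 3. Invalidate: evict the oldest tuple once the window overflows.
        if len(own) > self.window_size:
            own.popleft()
        return matches
```

Every arrival touches the whole opposite window, which is why this sequential form becomes the bottleneck as windows grow and why the slide asks how to parallelize it.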
So, Handshake Join…
Traditional stream join:
• Parallelization needs partitioning; possibly replication
• Needs central coordination
Handshake join:
• Entering tuple pushes oldest tuple out
• No central coordination
• Same semantics
• May introduce disorder
Parallelization
• Each core gets a segment of each window
• Data flow: act locally on new data arrival and pass data on
• Good for shared-nothing setups
• Simple communication: interact with neighbors; avoid bottlenecks
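The data flow described here can be simulated sequentially. The sketch below is an assumption-laden toy, not the authors' implementation: it models each core as a pair of fixed-size segments, lets R tuples enter at core 0 and S tuples at the last core, and moves tuples atomically, so it sidesteps the in-flight races that a real concurrent implementation has to handle. The names `HandshakeJoin`, `insert_r`, and `insert_s` are hypothetical.

```python
from collections import deque


class HandshakeJoin:
    """Sequential toy simulation of handshake join over a chain of cores."""

    def __init__(self, num_cores, segment_size, predicate):
        self.n = num_cores
        self.cap = segment_size                 # tuples per segment per core
        self.predicate = predicate              # predicate(r, s) -> bool
        self.r_segs = [deque() for _ in range(num_cores)]
        self.s_segs = [deque() for _ in range(num_cores)]

    def _flow(self, tup, own, other, order, as_pair):
        results, moving = [], [tup]
        for core in order:
            if not moving:
                break
            t = moving.pop()
            # Act locally: compare the arriving tuple with the opposite segment.
            for o in other[core]:
                pair = as_pair(t, o)
                if self.predicate(*pair):
                    results.append(pair)
            own[core].append(t)
            # The entering tuple pushes the oldest one on to the next core.
            if len(own[core]) > self.cap:
                moving.append(own[core].popleft())
        return results                          # a tuple leaving the last core is dropped

    def insert_r(self, r):
        # R flows left to right: core 0, 1, ..., n-1.
        return self._flow(r, self.r_segs, self.s_segs, range(self.n),
                          lambda r, s: (r, s))

    def insert_s(self, s):
        # S flows right to left: core n-1, ..., 0.
        return self._flow(s, self.s_segs, self.r_segs, reversed(range(self.n)),
                          lambda s, r: (r, s))
```

With two cores and one tuple per segment, a matching pair is reported only once later arrivals push the two tuples past each other, which illustrates why results can be delayed or reordered relative to the sequential join.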
Parallelization - Observations
• Parallelizes tuple-based windows and non-equi-join predicates
• As written, compares all tuples; could hash at each node to optimize
• Note the data transfer costs between cores; each tuple is processed at each core
• Soccer players have short arms; hardware is NUMA
Scalability
• Data flow + point-to-point communication
• Additional cores: larger window sizes or reduced workload per core
• “directly turn any degree of parallelism into higher throughput or larger supported window sizes”
• “can trivially be scaled up to handle larger join windows, higher throughput rates, or more compute-intensive join predicates”
Encountering Tuples
• An item in either window encounters all current items in the other window
• Immediate scan strategy
• Flexible segment boundaries (cores)
• Other local implementations
Handshake Join with Message Passing
• Lock-step processing (tuple-based windows)
• FIFO queues with message passing
• Problem: tuples in flight in opposite queues can pass each other (missed join pair)
Two-phase forwarding
• Asymmetric synchronization (replication on one core only)
• Keep copies of forwarded tuples until ack received
• Ack for s4 must be processed between r5 and r6
Load Balancing & Synchronization
• Even distribution not needed for correctness
• Maintain mostly even-sized local S windows
• Synch at pipeline ends to manage windows
FPGA Implementation
• Tuple-based windows that fit into memory
• Common clock signal; lock-step processing
• Nested-loops join processing
Performance
• Figure: scalability on multi-core CPU
• Figure: scalability on FPGAs; 8 tuples/window
Before we move on…
• The soccer-join paper focuses on sliding windows
• How would its algorithm and implementation work for tumbling windows?
• What if we did tumbling windows only?
Query-Aware Partitioning for Monitoring Massive Network Data Streams
• OC-768 networks
• 100 million packets/sec
• 2x40 Gbit/sec
• Query plan partitioning
• Issues: “heavy” operators, non-uniform resource consumption
• Data stream partitioning
Let’s partition the data…
    SELECT time, srcIP, destIP, srcPort, destPort, COUNT(*), SUM(len), MIN(timestamp), MAX(timestamp) ...
    FROM TCP
    GROUP BY time, srcIP, destIP, srcPort, destPort
• Computes packet summaries between src and dest for network monitoring
• Round-robin partitioning -> worst case, a single flow results in n partial flows
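The worst case for round-robin partitioning is easy to reproduce. The sketch below is illustrative only: packets are plain dictionaries, and Python's built-in `hash` stands in for whatever hash the splitting hardware would apply to the flow key. The function names are hypothetical.

```python
from collections import Counter

FLOW_KEY = ("srcIP", "destIP", "srcPort", "destPort")


def round_robin(packets, n):
    """Send packet i to node i mod n, ignoring its contents."""
    parts = [[] for _ in range(n)]
    for i, pkt in enumerate(packets):
        parts[i % n].append(pkt)
    return parts


def by_flow(packets, n):
    """Send all packets of one flow to the same node (hash is a stand-in)."""
    parts = [[] for _ in range(n)]
    for pkt in packets:
        parts[hash(tuple(pkt[f] for f in FLOW_KEY)) % n].append(pkt)
    return parts


def partial_flows(parts):
    """For each (time, flow) group, count how many nodes hold a piece of it."""
    counts = Counter()
    for part in parts:
        for key in {(p["time"],) + tuple(p[f] for f in FLOW_KEY) for p in part}:
            counts[key] += 1
    return counts
```

For four packets of a single flow spread over four nodes, `partial_flows(round_robin(...))` reports four partial groups for that flow, while `partial_flows(by_flow(...))` reports one.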
And, we might want a HAVING…
    SELECT time, srcIP, destIP, srcPort, destPort, COUNT(*), SUM(len), MIN(timestamp), MAX(timestamp) ...
    FROM TCP
    GROUP BY time, srcIP, destIP, srcPort, destPort
    HAVING OR_AGGR(flags) = ATTACK_PATTERN
• Round-robin partitioning -> no node can apply HAVING
• CPU and network load on final aggregator is high
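A small example shows why no node can apply the HAVING clause under round-robin partitioning. The flag values and the value of `ATTACK_PATTERN` below are made up for illustration; only the shape of the computation follows the slide.

```python
from functools import reduce
from operator import or_

ATTACK_PATTERN = 0x03  # hypothetical bit pattern, chosen only for this example


def or_aggr(flags):
    """Bitwise OR over all flag values in a group."""
    return reduce(or_, flags, 0)


# One flow whose two packets were sent round-robin to two different nodes.
node_flags = [[0x01], [0x02]]

partials = [or_aggr(flags) for flags in node_flags]       # per-node OR
local_hits = [p == ATTACK_PATTERN for p in partials]      # HAVING applied locally
final = or_aggr(partials)                                 # OR at the final aggregator
```

Neither node sees the full pattern, so filtering locally would drop a flow that does match. Each node has to ship every partial group to the final aggregator, which is where the CPU and network load comes from.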
So, let’s partition better…
    SELECT time, srcIP, destIP, srcPort, destPort, COUNT(*), SUM(len), MIN(timestamp), MAX(timestamp) ...
    FROM TCP
    GROUP BY time, srcIP, destIP, srcPort, destPort
    HAVING OR_AGGR(flags) = ATTACK_PATTERN
• What about partitioning on srcIP, destIP, srcPort, destPort (partition flows)?
• Yeah! Nodes can compute and apply HAVING locally…
• But what if I have more than one query?
But I need to run lots of queries…
• Large numbers of simultaneous queries are common (e.g., 50)
• Subqueries place different requirements on partitioning
• Dynamic repartitioning for each query?
• That’s what the parallel DBs do…
• Splitting 80 Gbit/sec -> specialized network hardware
• Partition stream once and only once…
Partitioning Limitations
• Program partitioning in FPGAs
• TCP fields (src, dest IP) - ok
• Fields from HTTP - not ok
• Can’t re-partition every time the workload changes
Query-Aware Partitioning
• Analysis framework
• Determine optimal partitioning
• Partition-aware distributed query optimizer
• Takes advantage of existing partitions
• Compatible partitioning
• Maximizes amount of data reduction done locally
• Formal definition of compatible partitioning
• Compatible partitioning: aggregations & joins
GS (Gigascope) Uses Tumbling Windows (only)
    SELECT tb, srcIP, destIP, sum(len)
    FROM PKT
    GROUP BY time/60 as tb, srcIP, destIP
• Time attribute is ordered (increasing)
    SELECT time, PKT1.srcIP, PKT1.destIP, PKT1.len + PKT2.len
    FROM PKT1 JOIN PKT2
    WHERE PKT1.time = PKT2.time and PKT1.srcIP = PKT2.srcIP and PKT1.destIP = PKT2.destIP
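The first query can be read as a tumbling-window aggregation: because the time attribute is increasing, a change in `time/60` means the previous one-minute bucket is complete and can be emitted. The sketch below assumes integer timestamps in seconds and dictionary-shaped packets; `tumbling_sum` is a hypothetical name, and the final flush exists only because the example input is finite.

```python
from collections import defaultdict


def tumbling_sum(packets, width=60):
    """sum(len) per (time // width, srcIP, destIP); input must be ordered by time."""
    results, groups, current_tb = [], defaultdict(int), None

    def flush():
        results.extend((current_tb, src, dst, total)
                       for (src, dst), total in sorted(groups.items()))
        groups.clear()

    for pkt in packets:
        tb = pkt["time"] // width
        # Ordered time: a new bucket value closes the previous window.
        if current_tb is not None and tb != current_tb:
            flush()
        current_tb = tb
        groups[(pkt["srcIP"], pkt["destIP"])] += pkt["len"]

    if current_tb is not None:
        flush()  # finite example only; a live stream waits for the next bucket
    return results
```

Only one window's worth of groups is ever held in memory, which is the practical benefit of restricting the system to tumbling windows.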
Query Example
flows:
    SELECT tb, srcIP, destIP, COUNT(*) as cnt
    FROM TCP
    GROUP BY time/60 as tb, srcIP, destIP
heavy_flows:
    SELECT tb, srcIP, max(cnt) as max_cnt
    FROM flows
    GROUP BY tb, srcIP
flow_pairs:
    SELECT S1.tb, S1.srcIP, S1.max_cnt, S2.max_cnt
    FROM heavy_flows S1, heavy_flows S2
    WHERE S1.srcIP = S2.srcIP and S1.tb = S2.tb+1
Figure Credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson, et al. SIGMOD 2008
Query Example: questions about flows, heavy_flows, and flow_pairs
• Which partitioning scheme is optimal for each of the queries?
• How to reconcile potentially conflicting partitioning requirements?
• How can we use information about existing partitioning in a distributed query optimizer?
What if we could only partition on destIP?
Figure Credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson, et al. SIGMOD 2008
Partition compatibility
    SELECT tb, srcIP, destIP, sum(len)
    FROM PKT
    GROUP BY time/60 as tb, srcIP, destIP
• Partitioning on (time/60, srcIP, destIP) -> execute aggregation locally, then union
• (srcIP, destIP, srcPort, destPort) -> can’t aggregate locally
• P is compatible with Q if, for every time window, the output of Q is equal to a stream union of the outputs of Q running on the partitions produced by P
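The definition can be checked on sample data by comparing the union of per-partition outputs with the output over the whole stream. The sketch below is an illustration under assumptions: fields are small integers, and the partition function is a toy "sum of fields mod n" chosen so the outcome is easy to follow by hand. A passing check on one sample does not prove compatibility; a failing check does disprove it.

```python
from collections import Counter


def q(packets):
    """SELECT tb, srcIP, destIP, sum(len) ... GROUP BY time/60 as tb, srcIP, destIP"""
    out = Counter()
    for p in packets:
        out[(p["time"] // 60, p["srcIP"], p["destIP"])] += p["len"]
    return sorted(out.items())


def split(packets, fields, n):
    """Toy partitioner: node = (sum of the chosen fields) mod n."""
    parts = [[] for _ in range(n)]
    for p in packets:
        parts[sum(p[f] for f in fields) % n].append(p)
    return parts


def is_compatible_on(packets, fields, n):
    """True if the union of local outputs equals the global output for this sample."""
    union = sorted(row for part in split(packets, fields, n) for row in q(part))
    return union == q(packets)
```

With two packets that share srcIP and destIP but differ in srcPort, partitioning on (srcIP, destIP) passes the check, while partitioning on all four fields can place the packets on different nodes and produce two partial sums instead of one.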
Should we partition on temporal attributes?
• If we partition on temporal attributes:
• Processor allocation changes with time epochs
• May help avoid bad hash functions
• Might lead to incorrect results if using panes
• Tuples correlated in time tend to be correlated on the temporal attribute; bad for load balancing
• Conclusion: exclude temporal attributes from partitioning
What partitionings work for aggregation queries?
• Group-bys on scalar expressions of source input attributes
• Ignore grouping on aggregations in lower-level queries
• Any subset of a compatible partitioning is also compatible
    SELECT expr1, expr2, ..., exprn
    FROM STREAM_NAME
    WHERE tup_predicate
    GROUP BY temp_var, gb_var1, ..., gb_varm
    HAVING group_predicate
What partitionings work for join queries?
• Equality predicates on scalar expressions of source stream attributes
• Any non-empty subset of a compatible partitioning is also compatible
• Need to reconcile partitioning of S and R
    SELECT expr1, expr2, ..., exprn
    FROM STREAM1 AS S {LEFT|RIGHT|FULL} [OUTER] JOIN STREAM2 AS R
    WHERE STREAM1.ts = STREAM2.ts and STREAM1.var11 = STREAM2.var21 and STREAM1.var1k = STREAM2.var2k and other_predicates
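The need to reconcile the partitioning of the two inputs can be seen with a toy equi-join. This sketch assumes integer fields, a join on `ts` and `srcIP`, and a "field mod n" partitioner; all names are hypothetical and the data is made up.

```python
def local_join(r_part, s_part):
    """Equi-join on ts and srcIP, evaluated within one node."""
    return [(r, s) for r in r_part for s in s_part
            if r["ts"] == s["ts"] and r["srcIP"] == s["srcIP"]]


def split(stream, field, n):
    """Toy partitioner: node = field value mod n."""
    parts = [[] for _ in range(n)]
    for t in stream:
        parts[t[field] % n].append(t)
    return parts


def partitioned_join(r_stream, s_stream, r_field, s_field, n):
    """Join partition i of R only with partition i of S, then union the results."""
    out = []
    for r_part, s_part in zip(split(r_stream, r_field, n), split(s_stream, s_field, n)):
        out.extend(local_join(r_part, s_part))
    return out
```

When both streams are partitioned on the join attribute `srcIP`, matching tuples land on the same node and the local joins reproduce the global result. When one stream is partitioned on `destIP` instead, matching tuples can end up on different nodes and the pair is lost.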
Now, multiple queries…
tcp_flows:
    SELECT tb, srcIP, destIP, srcPort, destPort, COUNT(*), sum(len)
    FROM TCP
    GROUP BY time/60 as tb, srcIP, destIP, srcPort, destPort
    Compatible partitioning: {sc_exp(srcIP), sc_exp(destIP), sc_exp(srcPort), sc_exp(destPort)}
flow_cnt:
    SELECT tb, srcIP, destIP, count(*)
    FROM tcp_flows
    GROUP BY tb, srcIP, destIP
    Compatible partitioning: {sc_exp(srcIP), sc_exp(destIP)}
Result: {sc_exp(srcIP), sc_exp(destIP)}
Now, multiple queries… (continued, for tcp_flows and flow_cnt)
• Fully compatible partitioning set likely to be empty
• Partition to minimize cost of execution
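For the tcp_flows and flow_cnt example, reconciling the two partitioning sets amounts to keeping the attributes they share. The sketch below models only that simplest case as a plain set intersection over attribute names; the reconciliation of scalar expressions in the paper is more general, so treat `reconcile` as a hypothetical helper rather than the paper's algorithm.

```python
def reconcile(*partition_sets):
    """Attributes common to every query's compatible partitioning set."""
    return set.intersection(*map(set, partition_sets))


tcp_flows_set = {"srcIP", "destIP", "srcPort", "destPort"}
flow_cnt_set = {"srcIP", "destIP"}

shared = reconcile(tcp_flows_set, flow_cnt_set)   # srcIP and destIP
conflict = reconcile({"srcIP"}, {"destIP"})       # empty set: no fully compatible choice
```

An empty result corresponds to the case where the fully compatible partitioning set is empty, at which point the choice has to be driven by execution cost instead.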
Query Plan Transformation
• Main idea: push the aggregation operator below the merge so that aggregations execute independently on partitions
• Main idea: partial aggregates (think panes)
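Pushing an aggregation below the merge can be sketched as a two-level aggregate: each partition computes partial counts and sums, and a final step combines the partials after the merge. The function names `sub_aggregate` and `super_aggregate` and the dictionary-shaped packets are assumptions for illustration.

```python
from collections import defaultdict


def sub_aggregate(packets):
    """Per-partition partial aggregate: (count, sum(len)) per (tb, srcIP, destIP)."""
    partial = defaultdict(lambda: [0, 0])
    for p in packets:
        acc = partial[(p["time"] // 60, p["srcIP"], p["destIP"])]
        acc[0] += 1
        acc[1] += p["len"]
    return {key: tuple(acc) for key, acc in partial.items()}


def super_aggregate(partials):
    """Above the merge: COUNT becomes a sum of counts, SUM a sum of sums."""
    merged = defaultdict(lambda: [0, 0])
    for partial in partials:
        for key, (cnt, total) in partial.items():
            merged[key][0] += cnt
            merged[key][1] += total
    return {key: tuple(acc) for key, acc in merged.items()}
```

The combined result equals what a single node would compute over the whole stream, but each partition ships at most one row per group instead of every packet.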
Performance
Figure Credit: Query-Aware Partitioning for Monitoring Massive Network Data Streams, Johnson, et al. SIGMOD 2008