190 likes | 320 Vues
This paper discusses the implementation of explicit window management in Niagara's query execution framework. By utilizing window IDs and flexible data processing facilities, we can efficiently handle out-of-order data. The use of punctuation signals the end of processing windows, allowing for seamless integration with standard operators. We present three example queries focusing on bid aggregations over time windows, demonstrating the adaptability of our approach with both internal and external timestamps. This allows users to gain insights into bidding patterns across different auction sites.
E N D
Windows in Niagara Jin (Jenny) Li, David Maier, Vassilis Papadimos, Peter Tucker, Kristin Tufte
Overview • Make Windows Explicit • Tag tuples with a window id • Standard operators don’t know about different kinds of windows - work with window ID attribute • Use Punctuation Infrastructure • Punctuation signals end of window • No need for specialized window operators – just use punctuate-aware operators • Flexible • Window on system time, external time or tuple-based • Data can arrive and be processed out of order
Niagara Control Structure • Push-based (pipelined) system. • Each operator is a thread. • Operators are connected by queues of tuples. • Operators wait on input queue, when tuple is ready, it is processed and result is inserted in output queue select unnest (path expr) streamscan
bid bidderid:501 price: $5.00 bid bidderid:501 price: $5.00 Niagara Query Execution Query: Find all bids that bidder with id = 501 has made. select (bidderid = 501) Uses a path expression to extract matching elements from input tuples. unnest (bid.bidderid) streamscan Reads and parses data from a stream. <bid> <bidderid> 501 </bidderid> <itemid> 42 </itemid> <price> $10.00 </price> </bid>
speaker slide_number * Kristin
bid auctionid bidderid price datetime auctionsite NEXMark Schema Streams: auction itemname seller category id description reserve expires auctionsite Note: bid.datetime and auction.expires are time generated at the source sites.
Three Example Queries • All three queries are window aggregates, specifically, time-based window count • Query 1: use internal system time and internal punctuations • Query 2: use external timestamp and internal punctuations • Query 3: use external timestamp and external punctuations
Query 1: Select the number of bids on each item in the past five minutes. Update the results every minute. Groupby(B1.auctionid, B.wid, count(*)) Bucketizer Add window ranges to tuples Punctuate end of window unnest (auctionid) Punctuator/Timestamper Add timestamp field to tuple Punctuate at end of minute Timer Timestamp = CURRENT_TIME streamscan (Bid) B1 SELECT B1.auctionid, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE] B1 GROUP BY B1.auctionid WindowGroupby(B1.auctionid, count(*)) unnest (auctionid) streamscan (Bid) B1
Query 1 Details auctionid count 10 1 15 2 * * * 1-1 5:00 10 T1 * (,5:00] * 5:00 5:00 10 15 1-5 T3 T1 * ( ,5:00] 5:00 5:00 15 T2 T2 5:00 5:00 15 15 1-5 1-5 T3 T2 5:00 T3 5:00 T1 T3 T2 T1 SELECT B1.auctionid, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE] B1 GROUP BY B1.auctionid T1 T2 T3 groupby(B1.auctionid, B1.winId, count(*)) bucketizer unnest (auctionid) punctuator/timestamper 5:01 5:00 5:01 5:00 TS = timer Timestamp = CURRENT_TIME streamscan (Bid) B1
Query 1 vs. Query 2 Query 1: SELECT B1.auctionid, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE] B1 GROUP BY B1.auctionid Select the number of bids on each item in the past five minutes. Update the results every minute. Query 2: SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE ATTR datetime SLACK 5 MINUTES] B1 GROUP BY B1.auctionsite Select the number of bids made at each auction site in the past five minutes. Update the results every minute. “CQL2004”
Query 2: Select the number of bids made at each auction site in the past five minutes. Update the results every minute. groupby(B1.auctionsite, B.winId, count(*)) bucketizer Add window ranges to tuples Punctuate end of window SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE ATTR datetime SLACK 5 MINUTES] B1 GROUP BY B1.auctionsite unnest (auctionsite, datetime) punctuator/enforcer Enforce datetime > current timestamp Punctuate at end of minute timer Timestamp = CURRENT_TIME – 5 MINUTES streamscan (Bid) B1
Query 2: Select the number of bids made at each auction site in the past five minutes. Update the results every minute. groupby(B1.auctionsite, B.winId, count(*)) bucketizer Add window ranges to tuples Punctuate end of window SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE ATTR datetime SLACK 5 MINUTES] B1 GROUP BY B1.auctionsite punctuator/enforcer Enforce datetime > current timestamp Punctuate at end of minute timer Timestamp = CURRENT_TIME – 5 MINUTES unnest (auctionsite, datetime) streamscan (Bid) B1
Query 2 vs. Query 3 Query 2: SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 1 MINUTE ATTR datetime SLACK 5 MINUTES] B1 GROUP BY B1.auctionsite Select the number of bids made at each auction site in the past five minutes. Update the results every minute. Query 3: SELECT B1.auctionsite, count(*), B1.wid FROM Bid [RANGE 5 MINUTES SLIDE 5 MINUTES ATTR datetime] B1 GROUP BY B1.auctionsite Select the number of bids made at each auction site in the past five minutes. Update the results every five minutes.
T1 T2 Query 3: Select the number of bids made at each auction site in the past five minutes. Update the results every five minutes. groupby(B1.auctionsite, B.winId, count(*)) bucketizer Add window ranges to tuples Punctuate end of window SELECT B1.auctionsite, count(*) FROM Bid [RANGE 5 MINUTES SLIDE 5 MINUTES ATTR datetime] B1 GROUP BY B1.auctionsite unnest (auctionsite, datetime) streamscan (Bid) B1 Site A … … … … … Site B Site C
Auctionsite, count, wid A A * * 2-2 1-1 A B B A 5:04 5:02 5:01 5:07 2-2 1-1 1-1 1-1 T2A T1A T1B T2B A (,5:05] A A B B 5:01 5:04 5:02 5:07 T2B T1B T2A T1A A (,5:10] T2 T1 window 1, site A: T1A Query 3 Details A, 1, 1 A, 1, 2 window 1, site B: T1B window 1, site B: T2B window 2, site A: T2A groupby(B1.auctionsite, B.winId, count(*)) SELECT B1.auctionsite, count(*), B1.wid FROM Bid [RANGE 5 MINUTES SLIDE 5 MINUTES ATTR datetime] B1 GROUP BY B1.auctionsite bucketizer Add window ranges to tuples Punctuate end of window unnest (auctionsite, datetime) Legend: Window 1: 5:00 – 5:05 Window 2: 5:05 – 5:10 streamscan (Bid) B1 Site A … … … … … Site B Site C
Discussion • Bucketizer • Apply a function to the stream • Encapsulate window semantics • Punctuate-Aware e.g. punctuation on time -> punctuation on wid • Wid is used as a grouping/join attribute • Punctuator • Adds timestamp as an attribute - optional • Enforce punctuations - optional • Converts stream semantics to punctuations • Outputs punctuations • Punctuations signal the end of windows, results are output and state is purged
Conclusions • Process window queries without specialized window operators • Flexible window semantics • Use punctuate-aware operators, introduce minimum number of new operators
Future Work • Semantics of window operators • Performance of different implementations • Study affect of disorder • Groupby ? Window
Questions? • … …