100 likes | 190 Vues
Join us to explore 4 advanced continuous queries over data streams in various languages. View queries in different languages and discuss features, tradeoffs, and more. Stream Query Repository details provided.
E N D
SWiM Panel on Stream Query Languages Jennifer Widom Stan Zdonik Carlo Zaniolo Dennis Shasha Mike Franklin
Format • Four representative continuous queries over data streams • I’ll present them in English • We’ll all present or them in our favorite query language • Demos at lunch • As we go we’ll develop a list of the most salient language features, differences, issues, tradeoffs, etc. (need scribe at board)
Stream Query Repository • The four queries were chosen from a repository we’ve been creating at Stanford • Informal description + relational schemas for four applications • English description of queries • Queries in various languages – you write them, send us the URL http://www-db.stanford.edu/stream/sqr
The Queries • Windowing and aggregation, self-join or subquery • Packets(pID, length, [time]) • Generate the stream of packets whose length is greater than twice the average packet length over the last 1 hour • Windowing, stream “splitting,” stored relation • SquirrelSensors(sID, region, [time]) • SquirrelType(sID, type) // stored relation • Create an alert when more than 20 type ‘A’ squirrels are in Jennifer’s backyard
The Queries (cont’d) • Stream self-joins • SquirrelChirps(sID, loc, [time]) • Stream an event each time 3 different squirrels within a pairwise distance of 5 meters from each other chirp within 10 seconds of each other • Super bonus query – windowing, stream transformations • Packets(pID, src, dest, length, [time]) • Log packet count and total packet length for each “flow” – new flow defined by > 2 minute gap for (src,dest) pair
Query 1 in CQL • Packets(pID, length, [time]) • Generate the stream of packets whose length is greater than twice the average packet length over the last 1 hour Select Istream() From Packets [Now] Where length > (Select 2 Avg(length) From Packets [Range 1 Hour])
Query 2 in CQL • SquirrelSensors(sID, region, [time]) • SquirrelType(sID, type) // stored relation • Create an alert when more than 20 type ‘A’ squirrels are in Jennifer’s backyard Select alert() From SquirrelSensors [Partition by sID Rows 1] SS, SquirrelType ST Where SS.region = 'JW yard' And SS.sID = ST.sID And ST.type = 'A' Having Count() > 20
Query 3 in CQL • SquirrelChirps(sID, loc, [time]) • Stream an event each time 3 different squirrels within a pairwise distance of 5 meters from each other chirp within 10 seconds of each other Select Rstream(S1.sID, S1.loc, S2.sID, S2.loc, S3.sID, S3.loc) From SquirrelChirps [Now] S1, SquirrelChirps [Range 10 Seconds] S2, SquirrelChirps [Range 10 Seconds] S3 Where S1.sID <> S2.sID And S1.sID <> S3.sID And S2.sID <> S3.sID And distance(S1.loc, S2.loc) < 5 meters And distance(S1.loc, S3.loc) < 5 meters And distance(S2.loc, S3.loc) < 5 meters
Query 4 (super bonus) in CQL • Packets(pID, src, dest, length, [time]) • Log packet count and total packet length for each “flow” – new flow defined by > 2 minute gap for (src,dest) pair FlowStartStream: Istream( (Select P., 1 as flowStartBit From Packets [Now] as P Where (src, dest) Not In (Select src, dest From Packets [Range 2 Minutes] Where pID <> P.pID)) Union (Select P., 0 as flowStartBit From Packets [Now] as P Where (src, dest) In (Select src, dest From Packets [Range 2 Minutes] Where pID <> P.pID)))
Query 4 in CQL (cont’d) FlowIDStream: Select Istream(P., flowID) From FlowStartStream [Now] P, (Select src, dest, Count() as flowID From FlowStartStream Where flowStartBit = 1 Group By src, dest) ID Where P.src = ID.src and P.dest = ID.dest FlowTable: Select src, dest, Count(), Sum(length) From FlowIDStream Group By src, dest, flowID