100 likes | 199 Vues
A Q uery A D atabase S ystem for O rder. Dennis Shasha joint work with Alberto Lerner {lerner,shasha}@cs.nyu.edu. Idea. Whatever can be done on a table can be done on an ordered table ( arrable ). Not vice - versa. A Q uery – query language on arrables Expresses many queries easily
E N D
AQuery A Database System for Order Dennis Shashajoint work withAlberto Lerner {lerner,shasha}@cs.nyu.edu
Idea • Whatever can be done on a table can be done on an ordered table (arrable). Not vice-versa. • AQuery – query language on arrables • Expresses many queries easily • Easy to optimize
Packets pID a b c d e f g len 10 5 12 32 5 7 10 time 3 4 7 10 14 22 23 Query 1 • Find the packets whose length is greater than twice the average packet length over the last 1 hour SELECT *FROM Packets ASSUMING ORDER timeWHERE len > 2*avgs(range(3600,time),len)) vectors ordering vector for this query Semantics are column-oriented as opposedto row-oriented
Vector Expressions ... WHERE len > 2*avgs(range(3600,time),len)) time 3 4 7 10 14 22 23 v1 0 1 2 3 3 1 2 v2 10 7.5 9 14.7 13.5 6 7.3 len 10 5 12 32 5 7 10 WHERE F F F T F F F 3 positionsand thecurrent * v1=range(3600,time)windows’ ranges v2=avgs(v1,len)last hour avg len len > 2*v2filter out false positions * Using range 10 here for the sake of the example
Query 2 • Find when more than 20 type ‘A’ squirrels were at Jennifer’s backyard. Suppose a flag +1 signals squirrel entry, and –1, exit. SquirrelSensor sID ... region ... time ... flag ... SquirrelType sID ... type ... SELECT time[index(sums(flag)>20)]FROM SquirrelSensor SS, SquirrelType ST ASSUMING ORDER time WHERE SS.sID=ST.sID AND ST.type = ‘A’ AND SS.region = ‘JWyard’
Vector Indexing SELECT time[index(sums(flag)>20)] ... squirrelin squirrelout flag +1 +1 -1 +1 +1 ... +1 +1 -1 -1 sums 1 2 1 2 3 ... 20 21 20 19 >20 F F F F F ... F T F F time time[i] Jennifer’s backyard i-th positionis true
Query 3 • Find when 3 different squirrels within a pair-wise distance of 5 meters from each other chirp within 10 seconds of each other SquirrelChirps sID ... loc ... time ... SELECT S1.sID, S1.loc, S2.sID, S2.loc, S3.sID, S3.locFROM SquirrelChirps S1, SquirrelChirps S2, SquirrelChirps S3 WHERE S1.sID<>s2.sID AND S1.sID<>s3.sID AND s2.sID<>s3.SID AND S1.time-S2.time < 10 AND S1.time-S3.time < 10 AND S2.time-S3.time < 10 AND distance(S1.loc,S2.loc)<5 AND distance(S1.loc,S3.loc)<5 AND distance(S2.loc,S3.loc)<5
Vector Fields r gbya (r) Flatten(gbya (r)) a z y y x z x y b 1 2 3 4 5 6 7 b 1 5 2 3 7 4 6 a z z y y y x x b 1 5 2 3 7 4 6 a z y x 4 6 Non-grouped columns become vectorfields respecting order. Flatten brings the arrable back to 1NF
Query 4 – use of vector fields • Create a log of flow information. A flow from src to dest ends after a 2-minutes silence Packets pID ... src ... dest ... len ... time ... SELECT source, dest, count(*), sum(len)FROM Packets ASSUMING ORDER time GROUP BY source, dest, sums(deltas(time)) > 120)
And Streams? • AQuery has no special facilities for streaming data, but it is expressive enough. • Idea for streaming data is to split the tables into tables that are indexed with old data and a buffer table with recent data. • Optimizer works over both transparently.