290 likes | 451 Vues
CS 519: Big Data Exploration and Analytics. Relational Query Languages. Announcements . The paper assignments are posted on the course Web site. One of your top 3 choices or a paper close to them. Some project topics are posted in Piazza. Presentations . Use examples
E N D
CS 519: Big Data Exploration and Analytics Relational Query Languages
Announcements • The paper assignments are posted on the course Web site. • One of your top 3 choices or a paper close to them. • Some project topics are posted in Piazza.
Presentations • Use examples • Introducing the problem • Previous solutions • Proposed solutions • Weak points • … • Use figures, colors, … David Paterson’s guide on “How to give a bad presentation” http://www.cs.berkeley.edu/~pattrsn/talks/BadTalk.pdf
Formal Relational Query Languages • Relational Algebra • Datalog (recursion-free with negation) • Relational calculus • Equivalency Theorem: RA, non-recursive datalog with negation and RC express the same set of queries. • Relational queries.
Relational Algebra (RA) • Used by RDBMS to execute queries • Six operators • Selection σ • Projection Π • Join ∞ • Union • Difference – • Renaming ρ (for named perspective)
Movie(mid, title, year, total-gross) Actor(aid, name, b-year) Plays(mid,aid) Datalog • Each tuple in database is a fact Movie(236878, ‘Godfather I’, 1972, 40000000) Movie(879900, ‘Godfather II’, 1974, 3900000) Actor(090988,’Robert De Niro’, 1943) • Each query is a rule Movies that were produced in 1998 and made more than $2,000. Q1(y):- Movie(x,y,1998,z),z > 2000.
Movie(mid, title, year, total-gross) Actor(aid, name, b-year) Plays(mid,aid) Relational calculus • Actors who played in a movie with total gross of $2,000. • Actors who played only in movies produced in 1990.
Conjunctive queries (CQ) • One datalog rule. • SELECT-DISTINCT-FROM-WHERE. • Select/project/join (σ, Π, ∞) fragment of RA. • Existential/ conjunctive fragment of RC • Set Semantics • Queries take sets as inputs and returns sets for output. (no duplicate elimination) • Bag semantics • Duplicates in input/ output.
Movie(mid, title, year, total-gross) Actor(aid, name, b-year) Plays(mid,aid) CQexamples Actors who played in “LTR”. Q7(y):- Actor(x,y,z),Plays(t,x),Movie(t,’LTR’,w,f). Actors who played in a movie with total gross of $2000. Non-CQ: Actors who played in some movies with only one actor.
Conjunctive queries (CQ) • There is not any comparison operator (<, ≠, …) in CQ. If used the family is called CQ<, CQ≠, … • Extensively studied family of queries. • Many problems can solved efficiently for this family and some of its variants.
Query equivalency and containment • Interesting and long standing problems in database management. • Queries q1 and q2 are equivalent if and only if for every database instance I, q1(I) = q2(I) • Shown as q1 q2 • Query q1is contained in q2if and only if for every database instance I, q1(I) q2(I) • Shown as q1 q2
Containment examples Is q1 q2? q1(x):- R(x,y),R(y,z),R(z,w). q2(x):- R(x,y),R(y,z). q1(x):-R(x,y),R(y,’Joe’). q2(x):-R(x,y),R(y,z). q1(x):- R(x,y),R(y,z),R(z,x). q2(x):- R(x,y),R(y,x).
Containment examples Is q1 q2? q1(x):- R(x,y),R(y,y). q2(x):- R(x,y),R(y,z),R(z,t).
Query semantics • Rules based form of a CQ q(u):- R1(u1),…,Rn(un). • uiis shorthand for (x,y,…,z). • Called free tuple, e.g. (x,y) q1(x):- R(x,y),R(y,y). • Valuation v is a total function from a set of variables to domain (dom) and identity on constants in the domain.
Query semantics • The set of variables in q is shown as var(q) e.g. var(q1) = {x,y} q1(x):- R(x,y),R(y,y). • The image of database instance I under query q, q(u):- R1(u1),…,Rn(un) is q(I) = {v(u)| v is a valuation over var(q), for each }
Query homomorphism • A homomorphism is a function from such that for each atom R(x, y, …) in the query q1 , there is an atom R(h(x), h(y), …) in the query q2 . • h leaves the constants in q2intact.
Homomorphism example q1(x):- R(x,y),R(y,z),R(z,w). q2(x):- R(x,y),R(y,z). We treat head variables, ‘x’, as constants. They must be the same in q1 and q2 .
Homomorphism Theorem • Given CQs q1 and q2, we have q1 q2 if and only if there exists a homomorphism . • Example: q1(x):-R(x,y),R(y,z),R(z,w). q2(x):-R(x,y),R(y,z). • Since is a homomorphism, we have .
Homomorphism examples q1(x):-R(x,y),R(y,’Joe’). q2(x):-R(x,y),R(y,z). q1(x):- R(x,y),R(y,z),R(z,x). q2(x):- R(x,y),R(y,x). There is no homomorphism:
Homomorphism examples Is q1 q2? q1(x):- R(x,y),R(y,y). q2(x):- R(x,y),R(y,z),R(z,t).
Homomorphism Theorem • Given CQs qand q’, we have qq’if and only if there exists a homomorphism . • Proof: For each w in q(I), there is a valuation v that maps free tuples in q to I such that v(u)= w. Thus, h(v) will map free tuples in q’into I and h(v(u’)) = w, where w is in q’(I). Using canonical instances: read the book page 117.
Checking containment • Check if there exists a homomorphism between queries. • Bad news: The problem is NP-complete, proved by reducing from 3-SAT. • Since the size of queries are relatively small, the process is sufficiently fast.
Query minimization • A conjunctive query q is minimal if for every other conjunctive query q’ , if q’ q, q’ has at least as many atoms as q. • Examples: q1(x):- R(x,z),R(z,t),R(x,w). q2(x):- R(x,z),R(z,t),R(x,’Joe’).
Query minimization algorithm 1. Remove an atom from q. Let’s call new query q’. 2. We have q q’. 3. Check to see if q’ q, if it is then remove the atom permanently. • Example: q1(x):- R(x,z),R(z,t),R(x,w). q2(x):- R(x,z),R(z,t). We have a homomorphism from q1 to q2.
Query minimization • RDBMSs do not minimize input queries • Set semantics instead of bag semantics • Users do not generally submit non-minimal queries • But, it has a lot of applications in rewriting queries using views. • Data integration • Query optimization • …
Movie(mid, title, year, total-gross) Actor(aid, name, b-year) Plays(mid,aid) Larger families: UCQ • CQ with union Movies that were produced in 1998 or made more than $2,000. Q1(y):- Movie(x,y,1998,z). Q1(y):- Movie(x,y,z,t), t > 2000. • We can extend homomorphism theorem for UCQs.
Homomorphism Theorem for UCQ • Given UCQs and , we have if and only if for every there is a , such that • Sagiv and Yannakakis – 1981 • Thus, we can use apply homomorphism theorem to each CQ in a UCQ to check the containment. • Containment checking for UCQs is NP-complete.
Larger families: relational queries • Containment checking for relational queries is undecidable. • Proved by reduction from finite satisfiability problem: • Given a query, is there any (finite) database where the query as at least one answer.