1 / 29

290 likes | 434 Vues

CS 519: Big Data Exploration and Analytics. Relational Query Languages. Announcements . The paper assignments are posted on the course Web site. One of your top 3 choices or a paper close to them. Some project topics are posted in Piazza. Presentations . Use examples

Télécharger la présentation
## CS 519: Big Data Exploration and Analytics

**An Image/Link below is provided (as is) to download presentation**
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.
Content is provided to you AS IS for your information and personal use only.
Download presentation by click this link.
While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

**CS 519: Big Data Exploration and Analytics**Relational Query Languages**Announcements**• The paper assignments are posted on the course Web site. • One of your top 3 choices or a paper close to them. • Some project topics are posted in Piazza.**Presentations**• Use examples • Introducing the problem • Previous solutions • Proposed solutions • Weak points • … • Use figures, colors, … David Paterson’s guide on “How to give a bad presentation” http://www.cs.berkeley.edu/~pattrsn/talks/BadTalk.pdf**Formal Relational Query Languages**• Relational Algebra • Datalog (recursion-free with negation) • Relational calculus • Equivalency Theorem: RA, non-recursive datalog with negation and RC express the same set of queries. • Relational queries.**Relational Algebra (RA)**• Used by RDBMS to execute queries • Six operators • Selection σ • Projection Π • Join ∞ • Union • Difference – • Renaming ρ (for named perspective)**Movie(mid, title, year, total-gross)**Actor(aid, name, b-year) Plays(mid,aid) Datalog • Each tuple in database is a fact Movie(236878, ‘Godfather I’, 1972, 40000000) Movie(879900, ‘Godfather II’, 1974, 3900000) Actor(090988,’Robert De Niro’, 1943) • Each query is a rule Movies that were produced in 1998 and made more than $2,000. Q1(y):- Movie(x,y,1998,z),z > 2000.**Movie(mid, title, year, total-gross)**Actor(aid, name, b-year) Plays(mid,aid) Relational calculus • Actors who played in a movie with total gross of $2,000. • Actors who played only in movies produced in 1990.**Conjunctive queries (CQ)**• One datalog rule. • SELECT-DISTINCT-FROM-WHERE. • Select/project/join (σ, Π, ∞) fragment of RA. • Existential/ conjunctive fragment of RC • Set Semantics • Queries take sets as inputs and returns sets for output. (no duplicate elimination) • Bag semantics • Duplicates in input/ output.**Movie(mid, title, year, total-gross)**Actor(aid, name, b-year) Plays(mid,aid) CQexamples Actors who played in “LTR”. Q7(y):- Actor(x,y,z),Plays(t,x),Movie(t,’LTR’,w,f). Actors who played in a movie with total gross of $2000. Non-CQ: Actors who played in some movies with only one actor.**Conjunctive queries (CQ)**• There is not any comparison operator (<, ≠, …) in CQ. If used the family is called CQ<, CQ≠, … • Extensively studied family of queries. • Many problems can solved efficiently for this family and some of its variants.**Query equivalency and containment**• Interesting and long standing problems in database management. • Queries q1 and q2 are equivalent if and only if for every database instance I, q1(I) = q2(I) • Shown as q1 q2 • Query q1is contained in q2if and only if for every database instance I, q1(I) q2(I) • Shown as q1 q2**Containment examples**Is q1 q2? q1(x):- R(x,y),R(y,z),R(z,w). q2(x):- R(x,y),R(y,z). q1(x):-R(x,y),R(y,’Joe’). q2(x):-R(x,y),R(y,z). q1(x):- R(x,y),R(y,z),R(z,x). q2(x):- R(x,y),R(y,x).**Containment examples**Is q1 q2? q1(x):- R(x,y),R(y,y). q2(x):- R(x,y),R(y,z),R(z,t).**Query semantics**• Rules based form of a CQ q(u):- R1(u1),…,Rn(un). • uiis shorthand for (x,y,…,z). • Called free tuple, e.g. (x,y) q1(x):- R(x,y),R(y,y). • Valuation v is a total function from a set of variables to domain (dom) and identity on constants in the domain.**Query semantics**• The set of variables in q is shown as var(q) e.g. var(q1) = {x,y} q1(x):- R(x,y),R(y,y). • The image of database instance I under query q, q(u):- R1(u1),…,Rn(un) is q(I) = {v(u)| v is a valuation over var(q), for each }**Query homomorphism**• A homomorphism is a function from such that for each atom R(x, y, …) in the query q1 , there is an atom R(h(x), h(y), …) in the query q2 . • h leaves the constants in q2intact.**Homomorphism example**q1(x):- R(x,y),R(y,z),R(z,w). q2(x):- R(x,y),R(y,z). We treat head variables, ‘x’, as constants. They must be the same in q1 and q2 .**Homomorphism Theorem**• Given CQs q1 and q2, we have q1 q2 if and only if there exists a homomorphism . • Example: q1(x):-R(x,y),R(y,z),R(z,w). q2(x):-R(x,y),R(y,z). • Since is a homomorphism, we have .**Homomorphism examples**q1(x):-R(x,y),R(y,’Joe’). q2(x):-R(x,y),R(y,z). q1(x):- R(x,y),R(y,z),R(z,x). q2(x):- R(x,y),R(y,x). There is no homomorphism:**Homomorphism examples**Is q1 q2? q1(x):- R(x,y),R(y,y). q2(x):- R(x,y),R(y,z),R(z,t).**Homomorphism Theorem**• Given CQs qand q’, we have qq’if and only if there exists a homomorphism . • Proof: For each w in q(I), there is a valuation v that maps free tuples in q to I such that v(u)= w. Thus, h(v) will map free tuples in q’into I and h(v(u’)) = w, where w is in q’(I). Using canonical instances: read the book page 117.**Checking containment**• Check if there exists a homomorphism between queries. • Bad news: The problem is NP-complete, proved by reducing from 3-SAT. • Since the size of queries are relatively small, the process is sufficiently fast.**Query minimization**• A conjunctive query q is minimal if for every other conjunctive query q’ , if q’ q, q’ has at least as many atoms as q. • Examples: q1(x):- R(x,z),R(z,t),R(x,w). q2(x):- R(x,z),R(z,t),R(x,’Joe’).**Query minimization algorithm**1. Remove an atom from q. Let’s call new query q’. 2. We have q q’. 3. Check to see if q’ q, if it is then remove the atom permanently. • Example: q1(x):- R(x,z),R(z,t),R(x,w). q2(x):- R(x,z),R(z,t). We have a homomorphism from q1 to q2.**Query minimization**• RDBMSs do not minimize input queries • Set semantics instead of bag semantics • Users do not generally submit non-minimal queries • But, it has a lot of applications in rewriting queries using views. • Data integration • Query optimization • …**Movie(mid, title, year, total-gross)**Actor(aid, name, b-year) Plays(mid,aid) Larger families: UCQ • CQ with union Movies that were produced in 1998 or made more than $2,000. Q1(y):- Movie(x,y,1998,z). Q1(y):- Movie(x,y,z,t), t > 2000. • We can extend homomorphism theorem for UCQs.**Homomorphism Theorem for UCQ**• Given UCQs and , we have if and only if for every there is a , such that • Sagiv and Yannakakis – 1981 • Thus, we can use apply homomorphism theorem to each CQ in a UCQ to check the containment. • Containment checking for UCQs is NP-complete.**Larger families: relational queries**• Containment checking for relational queries is undecidable. • Proved by reduction from finite satisfiability problem: • Given a query, is there any (finite) database where the query as at least one answer.

More Related