This talk discusses how to minimize a set of views without losing its query-answering power. Using a web-caching scenario, it shows how to remove redundant cached query results while still being able to answer the same queries. It introduces p-containment, which compares view sets by their query-answering power, and contrasts it with traditional query containment. It also covers testing p-containment, minimizing a view set, and p-containment relative to a given set of queries, including parameterized queries.
Minimizing View Sets without Losing Query-Answering Power Chen Li Stanford University joint work with Mayank Bawa and Jeff Ullman ICDT'2001, London, UK
A web-caching scenario [Figure: user, client with cache, and server; labels: query, source query, answer.]
Client Cached query results: Q1(T,A,Pr) :- book(T,A,Pub,Pr) Q2(T,A,Pr) :- book(T,A,prenhall,Pr) Q3(A1,A2) :- book(T,A1,prenhall,Pr1), book(T,A2,prenhall,Pr2) Source relation: Book(Title, Author, Pub, Price)
What query results to remove? Book(Title, Author, Pub, Price) Cached query results: Q1(T,A,Pr) :- book(T,A,Pub,Pr) Q2(T,A,Pr) :- book(T,A,prenhall,Pr) Q3(A1,A2) :- book(T,A1,prenhall,Pr1), book(T,A2,prenhall,Pr2) • Q2 ⊆ Q1 • Remove Q2? Then we cannot answer the query: • Q(T,Pr) :- book(T,smith,prenhall,Pr)
How about removing Q3? Book(Title, Author, Pub, Price) Cached query results: Q1(T,A,Pr) :- book(T,A,Pub,Pr) Q2(T,A,Pr) :- book(T,A,prenhall,Pr) Q3(A1,A2) :- book(T,A1,prenhall,Pr1), book(T,A2,prenhall,Pr2) Compute Q3 using Q2: Q3(A1,A2) :- Q2(T,A1,Pr1),Q2(T,A2,Pr2) We are not losing any query-answering power!
Observations: • Traditional query containment [Chandra and Merlin, 1977] does not help. • We should consider query-answering power. • General questions: • How to describe “query-answering power”? • How to minimize a view set without losing its query-answering power?
Rest of the talk • Answering queries using views • Query-answering power • p-containment • Relationship with traditional query containment • Minimizing a view set • p-containment relative to a set of queries • Conclusion and open problems
Answering queries using views • Conjunctive queries and views: h(X) :- g1(X1),…,gn(Xn) • Example: V1(T,A,Pr) :- book(T,A,Pub,Pr) V2(T,A,Pr) :- book(T,A,prenhall,Pr) V3(A1,A2) :- book(T,A1,prenhall,Pr1), book(T,A2,prenhall,Pr2)
Query answerability • A query Q is answerable by a view set V if we can rewrite Q using views in V [LMSS95]. • Example: V2(T,A,Pr) :- book(T,A,prenhall,Pr) V3(A1,A2) :- book(T,A1,prenhall,Pr1), book(T,A2,prenhall,Pr2) V3 is answerable by V2: V3(A1,A2) :- V2(T,A1,Pr1),V2(T,A2,Pr2)
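The rewriting of V3 over V2 can be checked on data. The sketch below is only an illustration: the three book rows are made up, and the function names are hypothetical. It computes V3 once from the source relation and once from the stored result of V2, and the two results agree.

```python
# Made-up sample data for Book(Title, Author, Pub, Price); not from the talk.
book = {
    ("db systems", "ullman", "prenhall", 60),
    ("db systems", "widom", "prenhall", 60),
    ("algorithms", "smith", "mit", 80),
}

def v2(book_rows):
    # V2(T,A,Pr) :- book(T,A,prenhall,Pr)
    return {(t, a, pr) for (t, a, pub, pr) in book_rows if pub == "prenhall"}

def v3_direct(book_rows):
    # V3(A1,A2) :- book(T,A1,prenhall,Pr1), book(T,A2,prenhall,Pr2)
    return {
        (a1, a2)
        for (t1, a1, pub1, _) in book_rows
        for (t2, a2, pub2, _) in book_rows
        if t1 == t2 and pub1 == "prenhall" and pub2 == "prenhall"
    }

def v3_from_v2(v2_rows):
    # Rewriting: V3(A1,A2) :- V2(T,A1,Pr1), V2(T,A2,Pr2)
    return {
        (a1, a2)
        for (t1, a1, _) in v2_rows
        for (t2, a2, _) in v2_rows
        if t1 == t2
    }

cached_v2 = v2(book)
print(v3_direct(book) == v3_from_v2(cached_v2))
```

Agreement on one database is not a proof of equivalence; it only illustrates what the rewriting computes.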
Algorithms • Bucket algorithm [LRO96] • Inverse-rule algorithm [DG97,Qia96] • MiniCon algorithm [PL00] • SVB algorithm [Mit99] • CoreCover algorithm [ALU00] Testing whether a query is answerable by a set of views is NP-complete.
Views are expensive to maintain • Require storage space. • Need to be kept up-to-date. We want to minimize a given view set while keeping its query-answering power.
p-containment • A view set V is p-contained in another view set W if W can answer all the queries that are answerable by V. • “p” stands for “power.” • Denoted: V ⪯p W • Two view sets are equipotent if V ⪯p W and W ⪯p V. • They have the same power to answer queries.
Example: V1(T,A,Pr) :- book(T,A,Pub,Pr) V2(T,A,Pr) :- book(T,A,prenhall,Pr) V3(A1,A2) :- book(T,A1,prenhall,Pr1), book(T,A2,prenhall,Pr2) {v1,v2,v3} ⪯p {v1,v2} and {v1,v2} ⪯p {v1,v2,v3}. Therefore: {v1,v2,v3} and {v1,v2} are equipotent.
Lemma: V ⪯p W iff each view in V can be answered by W. • Implies an algorithm for testing p-containment. • Assuming view sets are finite. • Theorem: Testing V ⪯p W is NP-complete.
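The lemma can be turned into a small program. The sketch below is not the authors' code; it is one possible way to test answerability, under these assumptions: all queries and views are conjunctive with no built-in comparisons, view heads contain only distinct variables, variables start with an uppercase letter, and constants start with a lowercase letter. It treats the query body as a canonical database, evaluates every view on it to get view tuples, expands those tuples back into base atoms, and looks for a containment mapping from the query into the expansion. All function names are hypothetical.

```python
from itertools import count

def is_var(term):
    # Assumed convention: variables start with an uppercase letter.
    return term[0].isupper()

def homomorphisms(atoms, facts, subst):
    """Yield every extension of subst that maps all atoms into facts.
    Terms on the facts side are treated as opaque symbols."""
    if not atoms:
        yield subst
        return
    (pred, args), rest = atoms[0], atoms[1:]
    for fpred, fargs in facts:
        if fpred != pred or len(fargs) != len(args):
            continue
        new = dict(subst)
        for a, f in zip(args, fargs):
            if is_var(a):
                if new.setdefault(a, f) != f:
                    break
            elif a != f:
                break
        else:
            yield from homomorphisms(rest, facts, new)

def is_answerable(query, views):
    """True if query has an equivalent rewriting using only the views."""
    head, body = query
    fresh = count()
    expansion = []
    for vhead, vbody in views:
        # View tuples: the view evaluated on the query's canonical database.
        tuples = {tuple(s[h] for h in vhead)
                  for s in homomorphisms(vbody, body, {})}
        for t in tuples:
            m = dict(zip(vhead, t))
            for pred, args in vbody:
                new_args = []
                for a in args:
                    if is_var(a) and a not in m:
                        m[a] = f"_f{next(fresh)}"  # fresh existential variable
                    new_args.append(m.get(a, a))
                expansion.append((pred, tuple(new_args)))
    # Containment mapping from the query into the expansion, head kept fixed.
    start = {h: h for h in head}
    return next(homomorphisms(body, expansion, start), None) is not None

def p_contained(v_set, w_set):
    # Lemma: V is p-contained in W iff W answers every view in V.
    return all(is_answerable(v, w_set) for v in v_set)

V1 = (("T", "A", "Pr"), [("book", ("T", "A", "Pub", "Pr"))])
V2 = (("T", "A", "Pr"), [("book", ("T", "A", "prenhall", "Pr"))])
V3 = (("A1", "A2"), [("book", ("T", "A1", "prenhall", "Pr1")),
                     ("book", ("T", "A2", "prenhall", "Pr2"))])

print(p_contained([V1, V2, V3], [V1, V2]))
print(p_contained([V1, V2], [V1]))
```

The backtracking search is exponential in the worst case, which is consistent with the NP-completeness result stated on the slide.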
p-containment and query containment V1(T,A,Pr) :- book(T,A,Pub,Pr) V2(T,A,Pr) :- book(T,A,prenhall,Pr) V3(A1,A2) :- book(T,A1,prenhall,Pr1), book(T,A2,prenhall,Pr2) • Query containment does not imply p-containment {v1} and {v2} • p-containment does not imply query containment {v2} and {v3}
Minimizing a view set • Keep removing views from the view set while retaining the equipotence. • Might have multiple equipotent minimals V1(A) :- r(A,B) V2(B) :- r(A,B) V3(A,B) :- r(A,X),r(Y,B) {V1,V2,V3} has two equipotent minimals: {V1,V2}, {V3}
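The removal loop can be sketched as a single greedy pass. This is an illustration under assumptions, not the authors' procedure: `minimize` takes the answerability test as a callback, and `toy_answerable` is a hand-written stand-in that hard-codes the answerability facts for the three views on this slide (V1 and V2 are projections of V3, and V3 is the product of V1 and V2). A view is dropped when the remaining views can answer it, which by the lemma keeps the set equipotent.

```python
def minimize(views, answerable):
    """One greedy pass: drop each view that the remaining views can answer.
    The result depends on the order in which views are tried."""
    kept = list(views)
    for v in views:
        rest = [w for w in kept if w != v]
        if answerable(v, rest):
            kept = rest
    return kept

def toy_answerable(view, others):
    # Hand-written stand-in for a real answerability test, valid only for
    # V1(A) :- r(A,B), V2(B) :- r(A,B), V3(A,B) :- r(A,X),r(Y,B).
    others = set(others)
    if view in ("V1", "V2"):
        return view in others or "V3" in others
    return "V3" in others or {"V1", "V2"} <= others

print(minimize(["V1", "V2", "V3"], toy_answerable))  # ['V3']
print(minimize(["V3", "V2", "V1"], toy_answerable))  # ['V2', 'V1']
```

One pass is enough to reach a minimal set, because removing views never makes a remaining view easier to answer. The two orders reproduce the two equipotent minimals named on the slide.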
p-containment relative to queries Queries: Q={Q1,Q2,…} V = {V1,V2,…,Vm} W = {W1,W2,…,Wn} V is p-contained in W w.r.t. Q if the queries in Q that are answerable by V are also answerable by W.
Example of relative p-containment Relations: car(Make,Dealer) loc(Dealer,City) Queries: Q1(D,C) :- car(toyota,D),loc(D,C) Q2(D,C) :- car(honda,D), loc(D,C) Views: V = {V1,V2}, V1 = Q1, V2 = Q2 W = {W1} W1(M,D,C) :- car(M,D),loc(D,C)
Testing relative p-containment • Q is finite: test by the definition. • Q is infinite?
Parameterized queries • Motivation: web search forms. • A PQ is a conjunctive query with placeholders. • Example: q(D) :- car($M,D),loc(D,$C) • Placeholders $M,$C, replaced by constants • Instances: q(D) :- car(toyota,D),loc(D,sf) q(D) :- car(honda,D),loc(D,pa) • The domain of each placeholder is infinite. • Thus, represent infinite number of queries.
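A parameterized query can be represented as an ordinary conjunctive query whose placeholders are terms starting with `$`. The sketch below uses a hypothetical tuple-based representation and hypothetical helper names; it produces the first instance listed on the slide.

```python
# A query is (head_args, [(predicate, args), ...]); "$"-terms are placeholders.
PQ = (("D",), [("car", ("$M", "D")), ("loc", ("D", "$C"))])

def instantiate(pq, binding):
    """Replace each placeholder by the constant given in binding."""
    head, body = pq
    new_body = [(pred, tuple(binding.get(a, a) for a in args))
                for pred, args in body]
    return head, new_body

def show(query, name="q"):
    """Render a query in the rule notation used on the slides."""
    head, body = query
    atoms = ",".join(f"{pred}({','.join(args)})" for pred, args in body)
    return f"{name}({','.join(head)}) :- {atoms}"

print(show(instantiate(PQ, {"$M": "toyota", "$C": "sf"})))
# q(D) :- car(toyota,D),loc(D,sf)
```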
Q: q(D) :- car($M,D),loc(D,$C) • v1(M,D,C) :- car(M,D),loc(D,C) • Answer all instances of Q. • v2(M,D) :- car(M,D),loc(D,sf) • Answer some instances of Q. • Answerable instances of Q are instances of: q(D) :- car($M,D),loc(D,sf) • v3(M) :- car(M,D),loc(D,sf) • Answer no instances of Q.
Assume queries are generated by one PQ; • Results easily extendable to the case with finite set of PQs. • Complete answerability of a PQ using views • V can answer all instances of a PQ Q. • Example: q(D) :- car($M,D),loc(D,$C) v1(M,D,C) :- car(M,D),loc(D,C)
An algorithm for testing complete answerability • Replace each placeholder with a new distinct constant, get a canonical instance I; • Test if I is answerable by V. Example: PQ: q(D) :- car($M,D),loc(D,$C) View: v1(M,D,C) :- car(M,D),loc(D,C) Canonical instance: q(D) :- car(m0,D),loc(D,c0) Rewriting: q(D) :- v1(m0,D,c0)
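The first step of this test, building the canonical instance, can be sketched as follows. The representation and the function name are hypothetical. The sketch names each new constant after its placeholder (`$M` becomes `m0`), matching the slide, and it assumes these names do not already occur as constants in the query or in the views. The second step, testing whether the instance is answerable by V, is left to whatever answerability test is available.

```python
def canonical_instance(pq):
    """Replace every placeholder by its own new constant.
    Assumes the generated names (e.g. "m0") are not used elsewhere."""
    head, body = pq

    def freeze(term):
        # "$M" -> "m0", "$C" -> "c0"; other terms are left unchanged.
        return term[1:].lower() + "0" if term.startswith("$") else term

    return head, [(pred, tuple(freeze(a) for a in args)) for pred, args in body]

PQ = (("D",), [("car", ("$M", "D")), ("loc", ("D", "$C"))])
print(canonical_instance(PQ))
# (('D',), [('car', ('m0', 'D')), ('loc', ('D', 'c0'))])
```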
Partial answerability • Some instances of Q are answerable by V q(D) :- car($M,D),loc(D,$C) v2(M,D) :- car(M,D),loc(D,sf) • Theorem: All the answerable instances of a PQ using V are instances of a finite set of PQs, s.t. each of them is completely answerable by V. q(D) :- car($M,D),loc(D,sf)
[Figure: among all instances of a parameterized query Q, the instances answerable by V={V1,…,Vn} are the instances of a finite set of PQs: PQ1, PQ2, …, PQk.] An algorithm for finding the finite set of PQs.
Testing p-containment w.r.t. PQ • To test whether V is p-contained in W w.r.t. the instances of a PQ Q: • Find the PQs whose instances are all the instances of Q that are answerable by V. • For each of the PQs, test if it is completely answerable by W. • Details are in the paper.
Conclusion • Introduced p-containment, which is different from query containment. • Showed how to minimize a view set without losing query-answering power. • Developed an algorithm for testing relative p-containment w.r.t. instances of PQs. • Extended to MCR-containment (MCR: maximally-contained rewriting).
Open problems • Find a view subset with lowest “cost.” • If views are not given, find the best views to materialize.