An efficient approach to managing XML data: cache frequently retrieved results and count frequent query patterns, with applications to search engines and XML query systems. Preliminaries and experimental results are discussed.
Approximate Counting of Frequent Query Patterns over XQuery Stream. Liang Huai Yang, Mong Li Lee, Wynne Hsu. DASFAA 2004. Speaker: Ming Jing Tsai
Introduction • Efficient approach to improving XML management systems • Cache frequently retrieved results • Frequent query patterns • Applications • Search engines • XML query systems
Preliminaries • Stream S = QPT1, QPT2, …, QPTN • Query pattern tree (QPT) • Labels: {“*”, “//”} ∪ tagset • Rooted subtree (RST) • root(RST) = root(QPT) • RST.V′ ⊆ QPT.V, RST.E′ ⊆ QPT.E
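The rooted-subtree condition can be made concrete with a small sketch. The `Node` class and `is_rooted_subtree` function below are hypothetical names introduced for illustration, and the sketch assumes that sibling nodes carry distinct labels, which the slides do not state.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A labeled tree node (illustrative; not a structure from the slides)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def is_rooted_subtree(rst: Node, qpt: Node) -> bool:
    """Check root(RST) = root(QPT) and that every RST edge also occurs in QPT.

    Assumption: siblings have distinct labels, so a child is identified
    by its label under a given parent.
    """
    if rst.label != qpt.label:
        return False
    qpt_children = {child.label: child for child in qpt.children}
    return all(
        child.label in qpt_children
        and is_rooted_subtree(child, qpt_children[child.label])
        for child in rst.children
    )


# A query pattern tree and two candidate trees.
qpt = Node("book", [Node("title"),
                    Node("author", [Node("fn"), Node("ln")]),
                    Node("price")])
rst = Node("book", [Node("title"), Node("author")])
not_rooted = Node("author", [Node("fn")])  # a subtree, but not rooted at "book"
```

Here `is_rooted_subtree(rst, qpt)` holds, while `not_rooted` fails because its root differs from the root of the QPT.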
[Figure: example query pattern trees QPT1, QPT2 and QPT3 and a rooted subtree RST, with nodes labeled book, title, author, price, section, fn and ln]
Approximate Counting • rst.countapp ≥ (σ − ε)N • rst.countapp ≥ rst.counttrue − εN • XQuery stream divided into buckets of w = • bcurrent =
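The two guarantees above match those of the standard lossy counting scheme for streams, so a minimal sketch of that scheme may help. This is not taken from the slides: the bucket width `w = ceil(1/ε)` and the bucket id `ceil(n/w)` are assumptions borrowed from textbook lossy counting, and `lossy_count` is a hypothetical name. Items stand in for rooted subtrees.

```python
import math


def lossy_count(stream, sigma, epsilon):
    """Approximate frequent-item counting over a stream (lossy counting sketch).

    Assumptions (not from the slides): bucket width w = ceil(1/epsilon),
    current bucket id b_current = ceil(n / w).
    """
    w = math.ceil(1 / epsilon)
    entries = {}  # item -> [count_app, delta], delta = maximum possible undercount
    n = 0
    for item in stream:
        n += 1
        b_current = math.ceil(n / w)
        if item in entries:
            entries[item][0] += 1
        else:
            entries[item] = [1, b_current - 1]
        if n % w == 0:  # bucket boundary: drop entries that cannot be frequent
            stale = [k for k, (count, delta) in entries.items()
                     if count + delta <= b_current]
            for key in stale:
                del entries[key]
    # Report items whose approximate count reaches (sigma - epsilon) * N.
    return {k: count for k, (count, _) in entries.items()
            if count >= (sigma - epsilon) * n}


stream = ["a"] * 6 + ["b"] * 3 + ["c"]
result = lossy_count(stream, sigma=0.5, epsilon=0.2)
```

In this run the rare item `"c"` is dropped at the second bucket boundary, while `"a"` is reported with a count that undercounts its true frequency by at most εN.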
[Figure: D-GQPT with numbered nodes: book (1), title (2), author (3), fn (4), ln (5), section (6), title (7), price (8)] • RST3: 1,2,-1,3,-1,8,-1
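The string 1,2,-1,3,-1,8,-1 reads as a preorder traversal of node ids in which -1 marks a step back up to the parent. A minimal sketch of such an encoding follows; the `encode` function and the child-list dictionary are illustrative assumptions, not structures from the slides.

```python
def encode(children, root):
    """Preorder encoding of a tree of node ids, with -1 for each backtrack.

    `children` maps a node id to its ordered list of child ids
    (a hypothetical representation chosen for this sketch).
    """
    out = [root]
    for child in children.get(root, []):
        out.extend(encode(children, child))
        out.append(-1)  # return from the child to its parent
    return out


# RST3: root 1 with children 2, 3 and 8.
rst3 = {1: [2, 3, 8]}
encoded = encode(rst3, 1)
```

For this tree `encoded` is `[1, 2, -1, 3, -1, 8, -1]`, the string shown on the slide. Two trees with the same shape and ids yield the same string, so the encoding can serve as a lookup key.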
[Figure: D-GQPT with numbered nodes, as on the previous slide] • RST3: 1,2,-1,4,-1,9,-1
[Figure: ECTree, built level by level from candidates over nodes 1 to 8 using Grmlne and Gjoin steps]
Candidate Generation • Rightmost active leaf node expansion: Grmlne( ) = • Gjoin( ) = | = X, j = i+1,…,N
Prune • RSTk+1 does not exist in ECTree • RSTk+1.Δ = bcurrent − β • |RSTk+1.tidlist| < β → prune • RSTk+1 exists in ECTree • RSTk+1.countapp = RSTk+1.countapp + |RSTk+1.tidlist| • RSTk+1.countapp + RSTk+1.Δ < bcurrent → prune • Join result with RSTk+1 • subtree induced by RSTk+1
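The two pruning cases can be sketched as a single update step. This is an illustration under stated assumptions, not the authors' implementation: `update_and_prune` is a hypothetical name, the ECTree is reduced to a flat dictionary keyed by pattern, β is taken as a plain parameter because the slides do not define it, and a newly inserted pattern is assumed to start with a count equal to the size of its tidlist.

```python
def update_and_prune(ectree, pattern, tidlist, b_current, beta):
    """Apply the slide's pruning rules to one candidate; return True if kept.

    ectree maps a pattern key to {"count_app": int, "delta": int}.
    Assumption: a new pattern starts with count_app = len(tidlist).
    """
    entry = ectree.get(pattern)
    if entry is None:
        # Case 1: the candidate is not yet in the ECTree.
        if len(tidlist) < beta:
            return False  # too few supporting queries: prune
        ectree[pattern] = {"count_app": len(tidlist),
                           "delta": b_current - beta}
        return True
    # Case 2: the candidate already exists; add the new support.
    entry["count_app"] += len(tidlist)
    if entry["count_app"] + entry["delta"] < b_current:
        del ectree[pattern]  # cannot be frequent any more: prune
        return False
    return True


ectree = {}
kept_p = update_and_prune(ectree, "p", [1], b_current=5, beta=2)
kept_q = update_and_prune(ectree, "q", [1, 2, 3], b_current=5, beta=2)
kept_q_later = update_and_prune(ectree, "q", [4], b_current=8, beta=2)
```

In this run `"p"` is rejected on arrival, `"q"` is inserted with Δ = 3, and `"q"` is later removed once its count plus Δ falls below the current bucket id. Pruning of join results and of the induced subtree, listed on the slide, is left out of the sketch.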
Experiment • Pentium 4 2.4 GHz, 1 GB RAM, Windows XP • DBLP DTD: 98 nodes • Shakespeare's Plays DTD: 23 nodes