As XML becomes increasingly popular for data representation, efficient retrieval mechanisms are essential. This paper discusses the importance of structural indexes in optimizing XML queries, particularly focusing on the incremental maintenance of these indexes. The authors present various index types, including the 1-index and A(k)-index, explaining their construction and benefits. They evaluate the efficiency of maintenance algorithms, addressing overlooked issues in index upkeep that can significantly enhance retrieval performance. This research aims to improve the operational efficiency of XML databases.
Incremental Maintenance of XML Structural Indexes
Ke Yi (1), Hao He (1), Ioana Stanoi (2), and Jun Yang (1)
(1) Department of Computer Science, Duke University
(2) IBM T. J. Watson Research Center
Motivation
• XML has gained tremendously in popularity in recent years
• Used to represent many kinds of data
• Major DB vendors are rushing to incorporate solutions for native XML repositories and retrieval
  • IBM DB2, Oracle, Microsoft SQL Server
  • Tamino, Natix, X-Hive, …
Overview
[Figure: example XML data graph of a paper, with nodes numbered 1 to 18. Labels include paper, section, title, algorithm, proof, exp, about, and uses; text values include "intro", "experiments", "1-index", and "A(k)-index".]
Label Path Expressions
• Example: /paper/section/algorithm
[Figure: the example data graph (nodes 1 to 18), shown with the path expression.]
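To make the notion of a label path expression concrete, here is a minimal sketch of evaluating one directly on a data graph, without any index. The function name `eval_label_path`, the dictionary-based graph encoding, and the toy graph are assumptions made for illustration; the toy graph is not the one from the slide's figure.

```python
def eval_label_path(labels, children, root, path):
    """Return the set of nodes reached from `root` by following `path`.

    labels:   dict node -> label
    children: dict node -> list of child nodes
    path:     a string such as "/paper/section/algorithm"
    """
    steps = [s for s in path.split("/") if s]
    # The first step must match the root's own label.
    if not steps or labels[root] != steps[0]:
        return set()
    frontier = {root}
    # Each further step moves to the children carrying the requested label.
    for step in steps[1:]:
        frontier = {
            c
            for n in frontier
            for c in children.get(n, ())
            if labels[c] == step
        }
    return frontier


# Hypothetical toy graph, for illustration only.
labels = {1: "paper", 2: "section", 3: "title",
          4: "algorithm", 5: "section", 6: "algorithm"}
children = {1: [2, 5], 2: [3, 4], 5: [6]}

print(sorted(eval_label_path(labels, children, 1, "/paper/section/algorithm")))
# prints [4, 6]
```

A structural index aims to answer the same query on a much smaller summary graph instead of walking the data graph itself.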
Structural Indexes
• Why do we need them?
  • Speed up the evaluation of path expressions
  • Provide a structural summary of the data graph
• Structural indexes
  • DataGuide [Goldman & Widom 97]
  • 1-index [Milo & Suciu 99]
  • A(k)-index [Kaushik et al. 02], D(k)-index [Qun et al. 03], M(k)-index [He & Yang 04]
  • Integration of structural indexes and inverted lists [Kaushik et al. 04]
• Focus on maintenance
  • Has a major effect on index efficiency
  • Remains an overlooked issue
Outline
[Figure: the example data graph, nodes 1 to 18.]
1-Index: Definition
• Constructed by using bisimilarity
• Definition based on stability
  • Partition data nodes (dnodes) into index nodes (inodes)
  • A dnode v belongs to the inode I[v]
  • I[u] is v's index parent if u is v's parent
  • An inode is stable if all of its dnodes have the same index parents
  • In a 1-index, all inodes are stable
[Figure: dnodes u and v with their inodes I[u] and I[v].]
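The stability-based definition can be sketched as a naive fixpoint computation: start from the partition by label and keep splitting inodes until every inode is stable. This is only an illustration under assumptions; it is not the Paige-Tarjan algorithm and not the maintenance algorithm from the talk, and the name `one_index` and the toy graph are hypothetical.

```python
def one_index(labels, parents):
    """Partition dnodes into inodes by splitting until all inodes are stable.

    labels:  dict node -> label
    parents: dict node -> set of parent nodes
    Returns a dict node -> inode id.
    """
    # Start from the partition by label.
    ids = {}
    block = {v: ids.setdefault(labels[v], len(ids)) for v in labels}
    while True:
        # Two dnodes stay together only if they were together before
        # and have the same set of index parents.
        ids = {}
        new_block = {}
        for v in labels:
            sig = (block[v], frozenset(block[u] for u in parents.get(v, ())))
            new_block[v] = ids.setdefault(sig, len(ids))
        # Splitting only refines, so an unchanged count means a fixpoint.
        if len(set(new_block.values())) == len(set(block.values())):
            return new_block
        block = new_block


# Hypothetical toy graph: r -> a, r -> b, a -> c1, a -> c2, b -> c3.
labels = {"r": "R", "a": "A", "b": "B", "c1": "C", "c2": "C", "c3": "C"}
parents = {"a": {"r"}, "b": {"r"}, "c1": {"a"}, "c2": {"a"}, "c3": {"b"}}

index = one_index(labels, parents)
print(index["c1"] == index["c2"], index["c1"] == index["c3"])
# prints True False
```

Because the loop starts from the coarsest candidate (one inode per label) and splits only when stability forces it, the result on this toy input has as few inodes as possible.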
1-Index: Example
[Figure: the example data graph (left) and its 1-index (right). The 1-index groups the dnodes into the inodes {1}, {2,4,8,13}, {3,5,9,14}, {6,10}, {7}, {11}, {12}, {15,16}, and {17,18}. The path expression /paper/section/algorithm is shown as the example query.]
1-Index: Quality
• Assigning bisimilar dnodes to different inodes
  • does not affect correctness,
  • but does affect efficiency
• The quality of an index:
\[
\text{quality} = \left( \frac{\#\,\text{inodes}}{\#\,\text{inodes in the minimum 1-index}} - 1 \right) \times 100\%
\]
• Ideal: quality = 0%
[Figure: 1-index of the example data graph, with the section inode {2,4,8,13} also shown split into {2,4} and {8,13}.]
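The quality measure is simple arithmetic, sketched below. The function name `index_quality` and the inode counts in the usage lines are illustrative assumptions, not figures from the talk.

```python
def index_quality(num_inodes, num_inodes_minimum):
    """Percentage by which an index exceeds the minimum 1-index in size.

    0.0 means the index is as small as the minimum 1-index.
    """
    return (num_inodes / num_inodes_minimum - 1) * 100.0


# Illustrative counts only: an index with 10 inodes where the minimum has 9.
print(round(index_quality(10, 9), 1))  # prints 11.1
print(index_quality(9, 9))             # prints 0.0
```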
Previous Results
• Construction
  • The PT algorithm [Paige & Tarjan 87], in time O(m log n)
  • m = # edges, n = # nodes
• Edge changes
  • The propagate algorithm [Kaushik et al. 02]
  • Quality of the 1-index after update
    • No guarantee on the quality of the resulting index
    • 3-5% after 500 edge insertions in experiments
• Subgraph addition
  • Index reconstruction
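Edge Insertion: An Example (1)
[Figure: three panels over the nodes R, A, B, C1 to C3, D1 to D3.
Data graph: all nodes shown individually.
1-index: inodes {C1,C2}, {C3}, {D1,D2}, {D3}.
Split 1: inodes {C1}, {C2}, {C3}, {D1,D2}, {D3}.]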
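Edge Insertion: An Example (2)
[Figure: three panels over the nodes R, A, B, C1 to C3, D1 to D3.
Split 2: inodes {C1}, {C2}, {C3}, {D1}, {D2}, {D3}.
Merge 1: inodes {C1}, {C2,C3}, {D1}, {D2}, {D3}.
Merge 2: inodes {C1}, {C2,C3}, {D1}, {D2,D3}.]
• The result is indeed the minimum 1-index for the data graph after the update
• Not a coincidence!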
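The end state of the example can be checked by recomputing the 1-index from scratch before and after the insertion. The sketch below does only that; it does not implement the split/merge algorithm. The edge set is an assumption: the transcript does not preserve the edges, so the graph here (A points to C1, C2, C3; B points to C3; each Ci points to Di; the inserted edge is B to C2) is one reconstruction that reproduces the inode groupings listed in the example.

```python
def minimum_1_index(labels, edges):
    """Naive fixpoint refinement; returns the partition as a set of frozensets."""
    parents = {v: set() for v in labels}
    for u, v in edges:
        parents[v].add(u)
    # Start from the partition by label.
    ids = {}
    block = {v: ids.setdefault(labels[v], len(ids)) for v in labels}
    while True:
        ids, new_block = {}, {}
        for v in labels:
            sig = (block[v], frozenset(block[u] for u in parents[v]))
            new_block[v] = ids.setdefault(sig, len(ids))
        # No new inode appeared, so every inode is stable.
        if len(ids) == len(set(block.values())):
            break
        block = new_block
    groups = {}
    for v, b in new_block.items():
        groups.setdefault(b, set()).add(v)
    return {frozenset(g) for g in groups.values()}


nodes = ["R", "A", "B", "C1", "C2", "C3", "D1", "D2", "D3"]
labels = {n: n[0] for n in nodes}  # label = first letter of the node name

# ASSUMED edges (not recoverable from the slide text).
edges = [("R", "A"), ("R", "B"),
         ("A", "C1"), ("A", "C2"), ("A", "C3"), ("B", "C3"),
         ("C1", "D1"), ("C2", "D2"), ("C3", "D3")]

before = minimum_1_index(labels, edges)
after = minimum_1_index(labels, edges + [("B", "C2")])  # assumed inserted edge

print(frozenset({"C1", "C2"}) in before, frozenset({"D1", "D2"}) in before)
print(frozenset({"C2", "C3"}) in after, frozenset({"D2", "D3"}) in after)
```

Under these assumptions, the recomputed partition after the insertion matches the Merge 2 panel, which is what the claim about reaching the minimum 1-index predicts.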
Minimum & Minimal Indexes
• Minimum: with the smallest number of inodes
• Minimal: no two inodes can be merged
[Figure: a data graph over the nodes R, A1, A2, B1, B2, shown alongside a minimum 1-index and a minimal 1-index.]
Quality Guarantee
• Theorem: The split/merge algorithm always maintains a minimal 1-index
• Lemma: For acyclic data graphs, there is a unique minimal 1-index
  • The minimum 1-index is always maintained
• For cyclic data graphs, there could be more than one minimal 1-index
  • One of them is maintained
A(k)-Index: Definition
• k-bisimilarity
• Definition based on stability
  • A(0)-index: partition by label
  • …
  • A(k)-index: an inode in the A(k)-index is stable if all of its dnodes have the same index parents in the A(k-1)-index
• Only interested in paths of length ≤ k
• Shown to be much smaller and more efficient than the 1-index [Kaushik et al. 02]
• But no efficient maintenance algorithms are known!
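The level-by-level definition suggests a direct construction: build A(0) from labels, then derive each A(i) from A(i-1) in one refinement round. The sketch below follows that reading. The name `a_k_index` and the toy graph are assumptions; the graph is loosely modeled on an R/A/B example but uses labels C and D for the two lower rows.

```python
def a_k_index(labels, parents, k):
    """Return [A(0), A(1), ..., A(k)], each a dict node -> inode id.

    labels:  dict node -> label
    parents: dict node -> set of parent nodes
    """
    # A(0): partition by label.
    ids = {}
    levels = [{v: ids.setdefault(labels[v], len(ids)) for v in labels}]
    for _ in range(k):
        prev = levels[-1]
        ids, cur = {}, {}
        for v in labels:
            # Same inode in A(i) iff same inode in A(i-1)
            # and same set of index parents in A(i-1).
            sig = (prev[v], frozenset(prev[u] for u in parents.get(v, ())))
            cur[v] = ids.setdefault(sig, len(ids))
        levels.append(cur)
    return levels


# Hypothetical toy graph: r -> a, r -> b, a -> c1, b -> c2, b -> c3, ci -> di.
labels = {"r": "R", "a": "A", "b": "B",
          "c1": "C", "c2": "C", "c3": "C",
          "d1": "D", "d2": "D", "d3": "D"}
parents = {"a": {"r"}, "b": {"r"},
           "c1": {"a"}, "c2": {"b"}, "c3": {"b"},
           "d1": {"c1"}, "d2": {"c2"}, "d3": {"c3"}}

a0, a1, a2 = a_k_index(labels, parents, 2)
print(a0["c1"] == a0["c2"])  # True: A(0) groups by label only
print(a1["c1"] == a1["c2"])  # False: different parents split the C nodes
print(a1["d1"] == a1["d2"])  # True: A(1) does not yet separate the D nodes
print(a2["d1"] == a2["d2"])  # False: A(2) separates d1 from d2 and d3
```

Each round looks only at the previous level, which is why maintaining the A(i)-index needs the information held in the A(i-1)-index.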
A(k)-index: Example
[Figure: four panels over the nodes R, A, B, C1 to C6.
Data graph: all nodes shown individually.
A(2) (= 1-index): inodes {C1}, {C2,C3}, {C4}, {C5,C6}.
A(1): inodes {C1}, {C2,C3}, {C4,C5,C6}.
A(0): inodes {C1,C2,C3}, {C4,C5,C6}.]
• Maintenance of the A(i)-index requires the information in the A(i-1)-index
A(k)-index: Refinement Tree
[Figure: the data graph and its A(2)- (= 1-index), A(1)-, and A(0)-indexes from the previous example.]
A(k)-index: Refinement Tree
[Figure: the data graph over R, A, B, C1 to C6, with the A(2), A(1), and A(0) levels drawn as a tree of inodes labeled C.]
• Reduce storage cost
• Reduce maintenance cost
• 0.5% to 13% additional storage
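One way to read the refinement tree is that each inode of A(i) hangs under the A(i-1) inode that contains it, so the levels share structure instead of being stored separately. The sketch below builds such parent links from a list of levels. The representation, the name `refinement_tree`, and the hand-written levels are assumptions for illustration, not the data structure described in the talk.

```python
def refinement_tree(levels):
    """Map each (level, inode id) to its parent (level - 1, inode id).

    levels: list of dicts node -> inode id, where levels[i] refines levels[i - 1].
    """
    parent = {}
    for i in range(1, len(levels)):
        for v, inode in levels[i].items():
            # Every dnode of this inode lies in the same coarser inode,
            # because each level refines the previous one.
            parent[(i, inode)] = (i - 1, levels[i - 1][v])
    return parent


# Hand-written hypothetical levels for three dnodes labeled C.
a0 = {"c1": 0, "c2": 0, "c3": 0}   # A(0): one inode
a1 = {"c1": 0, "c2": 1, "c3": 1}   # A(1): {c1} and {c2, c3}
a2 = {"c1": 0, "c2": 1, "c3": 1}   # A(2): unchanged from A(1)

tree = refinement_tree([a0, a1, a2])
print(tree[(1, 0)], tree[(1, 1)])  # prints (0, 0) (0, 0)
print(tree[(2, 1)])                # prints (1, 1)
```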
Quality Guarantee
• Theorem: The split/merge algorithm always maintains a minimal A(k)-index
• Lemma: There is a unique minimal A(k)-index for any data graph, acyclic or cyclic
  • Hence the split/merge algorithm always maintains the minimum A(k)-index
Experiments on Edge Changes
• Datasets
  • Real-life: IMDB (272,000 nodes)
  • Benchmark: XMark (198,000 nodes)
• Setup
  • First delete a portion of existing ID-REF links
  • Then do random mixed insertions/deletions
• Compare with
  • 1-index: propagate (+ reconstruction)
  • A(k)-index: recompute affected portion (+ reconstruction)
Experiment Results: A(k)-index running times
Conclusions
• The first solutions for the maintenance (edge & subgraph additions/deletions) of the 1-index and A(k)-index that are both effective and efficient
  • Effective: quality guarantee on the resulting index
  • Efficient: the algorithms themselves are fast
Thank you!
Graphical Illustration
[Figure: index size among valid 1-indexes, with split and merge operations.]
• The index can only grow in size due to splitting if merging is not enforced