Incremental Maintenance of XML Structural Indexes

Ke Yi (1), Hao He (1), Ioana Stanoi (2), and Jun Yang (1)
(1) Department of Computer Science, Duke University
(2) IBM T. J. Watson Research Center

Motivation
• XML has gained tremendously in popularity in recent years
• Used to represent many kinds of data
• Major DB vendors are rushing to incorporate solutions for native XML repositories and retrieval
  • IBM DB2, Oracle, Microsoft SQL Server
  • Tamino, Natix, X-Hive, …

Overview
[Figure: example data graph of a paper document. Nodes 1 to 18 are labeled paper, section, title, algorithm, proof, exp, about, and uses; text values include “intro”, “experiments”, “A(k)-index”, and “1-index”.]

Label Path Expressions
• Example: /paper/section/algorithm
[Figure: the example data graph (nodes 1 to 18), used to illustrate the path expression.]
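To make the idea concrete, here is a minimal sketch of evaluating a label path expression directly on a data graph, with no index involved. The toy graph, the node ids, and the function name `evaluate_path` are all assumptions made for illustration; they are not the graph from the slide.

```python
from collections import defaultdict

# Hypothetical toy data graph (not the one on the slide):
# node id -> label, plus parent -> child edges.
labels = {1: "paper", 2: "section", 3: "title", 4: "section",
          5: "algorithm", 6: "section", 7: "algorithm"}
edges = [(1, 2), (2, 3), (2, 4), (4, 5), (1, 6), (6, 7)]

def evaluate_path(labels, edges, root, path):
    """Return the set of nodes reached from root by following the labels in path."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    steps = [s for s in path.split("/") if s]
    # The first step has to match the root node itself.
    if not steps or labels[root] != steps[0]:
        return set()
    frontier = {root}
    for step in steps[1:]:
        # Keep only the children whose label matches the next step.
        frontier = {c for n in frontier for c in children[n] if labels[c] == step}
    return frontier

print(sorted(evaluate_path(labels, edges, 1, "/paper/section/algorithm")))
```

Every step scans the children of the whole frontier, which is the work a structural index is meant to cut down by running the same traversal over a much smaller summary graph.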

Structural Indexes
• Why do we need them?
  • Speed up the evaluation of path expressions
  • Provide a structural summary of the data graph
• Structural indexes
  • DataGuide [Goldman & Widom 97]
  • 1-index [Milo & Suciu 99]
  • A(k)-index [Kaushik et al. 02], D(k)-index [Qun et al. 03], M(k)-index [He & Yang 04]
  • Integration of structural indexes and inverted lists [Kaushik et al. 04]
• Focus on maintenance
  • Has a major effect on index efficiency
  • Remains an overlooked issue

Outline
[Figure: the example data graph (nodes 1 to 18), repeated.]

1-Index: Definition
• Constructed by using bisimilarity
• Definition based on stability
  • Partition data nodes (dnodes) into index nodes (inodes)
  • I[v] denotes the inode that contains dnode v
  • I[u] is an index parent of v if u is a parent of v
  • An inode is stable if all of its dnodes have the same index parents
  • In a 1-index, all inodes are stable
[Figure: dnodes u and v with an edge from u to v, and the corresponding inodes I[u] and I[v].]
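The stability definition above suggests a simple way to compute a 1-index from scratch: start from the partition by label and keep splitting until every inode is stable. The sketch below does exactly that. It is a naive refinement loop, not the PT algorithm and not the maintenance algorithm from this talk, and the toy graph and the name `one_index` are assumptions for illustration.

```python
from collections import defaultdict

def one_index(labels, edges):
    """Partition dnodes into inodes by refining until every inode is stable."""
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    block = dict(labels)  # coarsest candidate: one block per label
    while True:
        # Signature of a dnode: its current block plus the blocks of its parents.
        sig = {v: (block[v], frozenset(block[u] for u in parents[v])) for v in labels}
        if len(set(sig.values())) == len(set(block.values())):
            break  # no block was split, so all inodes are stable
        ids = {}
        block = {v: ids.setdefault(sig[v], len(ids)) for v in labels}
    inodes = defaultdict(set)
    for v, b in block.items():
        inodes[b].add(v)
    return sorted(sorted(group) for group in inodes.values())

# Hypothetical toy graph: R -> A, B; A -> C1, C2; B -> C3; Ci -> Di.
labels = {"R": "R", "A": "A", "B": "B", "C1": "C", "C2": "C", "C3": "C",
          "D1": "D", "D2": "D", "D3": "D"}
edges = [("R", "A"), ("R", "B"), ("A", "C1"), ("A", "C2"), ("B", "C3"),
         ("C1", "D1"), ("C2", "D2"), ("C3", "D3")]

print(one_index(labels, edges))
```

On this toy graph C1 and C2 end up in one inode because both have the single index parent I[A], while C3 is separated because its index parent is I[B]; the D nodes then follow their parents.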

1-Index: Example
[Figure, two panels: the data graph (nodes 1 to 18) and its 1-index, whose inodes include {2,4,8,13}, {3,5,9,14}, {6,10}, {15,16}, and {17,18}. The path expression /paper/section/algorithm is evaluated on the 1-index.]

1-Index: Quality
• Assigning bisimilar dnodes to different inodes
  • does not affect correctness,
  • but does affect efficiency
• The quality of an index:
  quality = (# inodes / # inodes in the minimum 1-index − 1) × 100%
• Ideal: quality = 0%
[Figure: a 1-index of the example data graph, shown with the inode {2,4,8,13} and with that inode split into {2,4} and {8,13}.]
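As a small worked instance of the quality measure, the sketch below computes it for made-up inode counts. The function name `index_quality` and the counts are assumptions chosen only for illustration.

```python
def index_quality(num_inodes, num_inodes_minimum):
    """Percentage by which an index exceeds the minimum 1-index in size."""
    if num_inodes_minimum <= 0:
        raise ValueError("the minimum 1-index must have at least one inode")
    return (num_inodes / num_inodes_minimum - 1) * 100.0

# Illustrative counts only: one inode of a 9-inode minimum index has been
# split unnecessarily, giving 10 inodes in total.
print(f"{index_quality(10, 9):.1f}%")
# An index that is already minimum has the ideal quality.
print(f"{index_quality(9, 9):.1f}%")
```

Lower is better under this measure: every unnecessary split adds one inode and pushes the percentage up.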

Previous Results
• Construction
  • The PT algorithm [Paige & Tarjan 87], in time O(m log n)
  • m = # edges, n = # nodes
• Edge changes
  • The propagate algorithm [Kaushik et al. 02]
  • Quality of the 1-index after update
    • No guarantee on the quality of the resulting index
    • 3% to 5% after 500 edge insertions in experiments
• Subgraph addition
  • Index reconstruction

Edge Insertion: An Example (1)
[Figure, three panels: Data Graph (R; A, B; C1, C2, C3; D1, D2, D3), 1-Index (inodes {C1, C2}, {C3}, {D1, D2}, {D3}), and Split 1 (C1 and C2 separated; {D1, D2} and {D3} unchanged).]

Edge Insertion: An Example (2)
[Figure, three panels: Split 2 (C1, C2, C3 and D1, D2, D3 all in separate inodes), Merge 1 (C2 and C3 merged into {C2, C3}), and Merge 2 (D2 and D3 merged into {D2, D3}).]
• The result is indeed the minimum 1-index for the data graph after the update
• Not a coincidence!

Minimum & Minimal Indexes
• Minimum: with the smallest number of inodes
• Minimal: no two inodes can be merged
[Figure, three panels: Data graph (R; A1, A2; B1, B2), Minimum 1-index (inodes {A1, A2} and {B1, B2}), and Minimal 1-index (A1, A2, B1, B2 in separate inodes).]
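The difference between minimum and minimal can be checked mechanically on a small cyclic graph. The sketch below reads "can be merged" as "merging the two inodes leaves every inode stable", which is an assumption about the intended definition. The edges of the toy graph are also assumed (the slide only shows the node names), and `is_stable` and `is_minimal` are hypothetical helper names.

```python
from itertools import combinations

def is_stable(blocks, edges):
    """True if, within every block, all dnodes have the same set of parent blocks."""
    block_of = {v: i for i, blk in enumerate(blocks) for v in blk}
    parent_blocks = {v: set() for v in block_of}
    for u, v in edges:
        parent_blocks[v].add(block_of[u])
    return all(len({frozenset(parent_blocks[v]) for v in blk}) == 1 for blk in blocks)

def is_minimal(blocks, labels, edges):
    """True if no two same-label blocks can be merged without breaking stability."""
    for i, j in combinations(range(len(blocks)), 2):
        if labels[next(iter(blocks[i]))] != labels[next(iter(blocks[j]))]:
            continue
        merged = [b for k, b in enumerate(blocks) if k not in (i, j)]
        merged.append(blocks[i] | blocks[j])
        if is_stable(merged, edges):
            return False
    return True

# Assumed cyclic graph: R -> A1, A2; A1 -> B1 -> A1; A2 -> B2 -> A2.
labels = {"R": "R", "A1": "A", "A2": "A", "B1": "B", "B2": "B"}
edges = [("R", "A1"), ("R", "A2"), ("A1", "B1"), ("A2", "B2"),
         ("B1", "A1"), ("B2", "A2")]

coarse = [{"R"}, {"A1", "A2"}, {"B1", "B2"}]        # 3 inodes
singles = [{"R"}, {"A1"}, {"A2"}, {"B1"}, {"B2"}]   # 5 inodes

for name, blocks in [("coarse", coarse), ("singles", singles)]:
    print(name, len(blocks), is_stable(blocks, edges), is_minimal(blocks, labels, edges))
```

Under these assumptions both partitions are stable and minimal, yet only the coarse one is minimum: the cycle means A1 and A2 can only be merged together with B1 and B2, and no single pairwise merge gets there.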

Quality Guarantee
• Theorem: The split/merge algorithm always maintains a minimal 1-index
• Lemma: For acyclic data graphs, there is a unique minimal 1-index
  • The minimum 1-index is always maintained
• For cyclic data graphs, there could be more than one minimal 1-index
  • One of them is maintained

A(k)-Index: Definition
• k-bisimilarity
• Definition based on stability
  • A(0)-index: partition by label
  • …
  • A(k)-index: an inode in the A(k)-index is stable if all of its dnodes have the same index parents in the A(k-1)-index
• Only interested in paths of length ≤ k
• Shown to be much smaller and more efficient than the 1-index [Kaushik et al. 02]
• But no efficient maintenance algorithms are known!
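The layered definition above translates into k rounds of refinement, each round using only the partition from the round before. The sketch below computes an A(k)-index from scratch in this way. It is not the maintenance algorithm from the talk; the name `a_k_index` and the edges of the toy graph are assumptions for illustration.

```python
from collections import defaultdict

def a_k_index(labels, edges, k):
    """Compute the A(k) partition by k rounds of refinement starting from labels."""
    parents = defaultdict(set)
    for u, v in edges:
        parents[v].add(u)
    block = dict(labels)  # A(0): partition by label
    for _ in range(k):
        # A(i) groups dnodes by their A(i-1) block and their parents' A(i-1) blocks.
        sig = {v: (block[v], frozenset(block[u] for u in parents[v])) for v in labels}
        ids = {}
        block = {v: ids.setdefault(sig[v], len(ids)) for v in labels}
    groups = defaultdict(set)
    for v, b in block.items():
        groups[b].add(v)
    return sorted(sorted(g) for g in groups.values())

# Assumed toy graph: R -> A, B; A -> C1, C2, C3; B -> C2, C3;
# C1 -> C4, C2 -> C5, C3 -> C6. All Ci nodes carry the label "C".
labels = {"R": "R", "A": "A", "B": "B"}
labels.update({f"C{i}": "C" for i in range(1, 7)})
edges = [("R", "A"), ("R", "B"), ("A", "C1"), ("A", "C2"), ("A", "C3"),
         ("B", "C2"), ("B", "C3"), ("C1", "C4"), ("C2", "C5"), ("C3", "C6")]

for k in range(3):
    print(k, a_k_index(labels, edges, k))
```

Each round refines the previous one: with these assumed edges A(1) separates C1 from {C2, C3} but still keeps C4, C5, and C6 together, and only A(2) tells C4 apart from {C5, C6}.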

A(k)-index: Example
[Figure, four panels: Data graph (R; A, B; C1 to C6), A(2) (= 1-index; inodes {C1}, {C2, C3}, {C4}, {C5, C6}), A(1) (inodes {C1}, {C2, C3}, {C4, C5, C6}), and A(0) (C1 to C6 grouped by label).]
• Maintenance of the A(i)-index requires the information in the A(i-1)-index

A(k)-index: Refinement Tree
[Figure: the same four panels (Data graph, A(2) (= 1-index), A(1), A(0)), linked level by level into a refinement tree.]

A(k)-index: Refinement Tree
[Figure: Data graph, A(2), A(1), and A(0), with the index nodes shown by label only (C).]
• Reduce storage cost
• Reduce maintenance cost
• 0.5% to 13% additional storage

Quality Guarantee
• Theorem: The split/merge algorithm always maintains a minimal A(k)-index
• Lemma: There is a unique minimal A(k)-index for any data graph, acyclic or cyclic
• Hence the minimum A(k)-index is always maintained

Experiments on Edge Changes
• Datasets
  • Real-life: IMDB (272,000 nodes)
  • Benchmark: XMark (198,000 nodes)
• Setup
  • First delete a portion of existing ID-REF links
  • Then do random mixed insertions/deletions
• Compare with
  • 1-index: propagate (+ reconstruction)
  • A(k)-index: recompute affected portion (+ reconstruction)

Experiment Results: 1-index

Experiment Results: A(k)-index running times

Conclusions
• The first solutions for the maintenance (edge & subgraph additions/deletions) of the 1-index and A(k)-index that are both effective and efficient
  • Effective: quality guarantee on the resulting index
  • Efficient: the algorithms themselves are fast
Thank you!

Graphical Illustration
[Figure: diagram with the labels size, valid 1-index, merge, split, and index.]
• The index can only grow in size due to splitting if merging is not enforced
