Trie Indexes for Efficient XML Query Processing

Sofia Brenes, Yuqing Wu, Dirk Van Gucht, Pablo Santa Cruz • Indiana University, Bloomington • {sbrenesb, yuqwu, vgucht, psantacr}@cs.indiana.edu

XML and Queries – An Example • Query 1: //A/B/C • Query 2: //B/C • Query 3: //A/B[./D]/C • Query 4: //A[./B[./D]]/B/C

Index and XML Query Evaluation • Challenges: structure • Data: containment relationship • Query: pattern matching, (nested) predicates

Structural Indices for XML Data • Consider both value and structure

Expected Features for an XML Index • Reasonable size • Easy to construct and adjust • Query evaluation • Index-only plan for most queries.

Outline • Introduction • Methodology • Partition induced by structural characteristics of XML • Partition induced by fragments of XPath Algebra • Coupling and Block Union Theorems • Trie Indices and Query Evaluation • Experimental Evaluation • Future Directions

Rewind – back to the world of RDB • RDBMS • Engineering Techniques • RDBMS Theory

Our approach • Study the XML query language and its fragments • Study the indistinguishability of components in an XML document • Reason about existing XML indices • Design new XML indices.

XML Data Model • Represent XML document D as a finite unordered node-labeled tree • D = (V, Ed, r, λ) • Nodes: V • Edges: Ed • Root: r • Labels: λ

Label Path • LP(m,n) • LP(m,n) = (A,B,C) • LP(n,k) • LP(n,0) = (C) • LP(n,1) = (B,C) • LP(n,4) = (A,A,B,C) • LP(n,7) = (A,A,B,C)
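The LP(n,k) values above can be reproduced with a small sketch. The tree below is a hypothetical document (not the one pictured on the slide), and the name `label_path` and the parent-pointer encoding are assumptions made for illustration only.

```python
# Hypothetical document: node id -> (label, parent id). Not the slide's tree.
TREE = {
    "A1": ("A", None),
    "A2": ("A", "A1"),
    "B1": ("B", "A1"),
    "B2": ("B", "A2"),
    "C1": ("C", "B1"),
    "C2": ("C", "B2"),
}


def label_path(tree, node, k):
    """Return LP(node, k): the labels on the upward path of at most k edges
    that ends at `node`, listed top-down. The path stops early at the root."""
    labels = []
    current = node
    for _ in range(k + 1):
        if current is None:
            break
        label, parent = tree[current]
        labels.append(label)
        current = parent
    return tuple(reversed(labels))


if __name__ == "__main__":
    for k in (0, 1, 4, 7):
        print(k, label_path(TREE, "C2", k))
```

As on the slide, a k larger than the depth of the node simply yields the full root-to-node label path, so LP(n,4) and LP(n,7) coincide.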

N[k]-Equivalence • Given an XML document and value k

N[k]-Partition • Label path • N[1][(A,B)] = {B1, B2, B3, B4}
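A minimal sketch of how such a partition could be computed, assuming that two nodes are N[k]-equivalent exactly when their LP(n,k) label paths agree (which is how the N[1][(A,B)] example reads). The tree, `label_path`, and `n_partition` are hypothetical names and data, not taken from the talk.

```python
from collections import defaultdict

# Hypothetical document: node id -> (label, parent id). Not the slide's tree.
TREE = {
    "A1": ("A", None),
    "A2": ("A", "A1"),
    "B1": ("B", "A1"),
    "B2": ("B", "A2"),
    "B3": ("B", "B1"),
    "C1": ("C", "B1"),
    "C2": ("C", "B2"),
    "C3": ("C", "B3"),
}


def label_path(tree, node, k):
    """LP(node, k): labels on the upward path of at most k edges, top-down."""
    labels = []
    current = node
    for _ in range(k + 1):
        if current is None:
            break
        label, parent = tree[current]
        labels.append(label)
        current = parent
    return tuple(reversed(labels))


def n_partition(tree, k):
    """Group nodes into blocks keyed by LP(n, k) (assumed N[k]-equivalence)."""
    blocks = defaultdict(set)
    for node in tree:
        blocks[label_path(tree, node, k)].add(node)
    return dict(blocks)


if __name__ == "__main__":
    for key, block in sorted(n_partition(TREE, 1).items()):
        print(key, sorted(block))
```

On this hypothetical tree the block for (A,B) at k = 1 contains B1 and B2, while B3 falls into the separate block (B,B).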

P[k]-Equivalence • Given an XML document and value k

P[k]-Partition • P[1][(A,A)] = {(A1, A2)}

P[k]-Partition • P[2][(A,B,C)] = {(A1, C1), (A2, C2), (A2, C3)}
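The pair-based partition can be sketched in the same style. This assumes that the P[k]-partition groups ancestor/descendant pairs (m, n) at distance at most k by their label path LP(m,n), and that pairs (n, n) of distance 0 are included; both points are assumptions, as are the tree and the name `p_partition`.

```python
from collections import defaultdict

# Hypothetical document: node id -> (label, parent id). Not the slide's tree.
TREE = {
    "A1": ("A", None),
    "A2": ("A", "A1"),
    "B1": ("B", "A1"),
    "B2": ("B", "A2"),
    "B3": ("B", "B1"),
    "C1": ("C", "B1"),
    "C2": ("C", "B2"),
    "C3": ("C", "B3"),
}


def p_partition(tree, k):
    """Group pairs (m, n), with m an ancestor-or-self of n at most k edges
    above it, into blocks keyed by the label path LP(m, n)."""
    blocks = defaultdict(set)
    for node in tree:
        labels = []
        current = node
        for _ in range(k + 1):
            if current is None:
                break
            label, parent = tree[current]
            labels.append(label)
            # labels is bottom-up, so reverse it to get LP(current, node).
            blocks[tuple(reversed(labels))].add((current, node))
            current = parent
    return dict(blocks)


if __name__ == "__main__":
    blocks = p_partition(TREE, 2)
    print(sorted(blocks[("A", "B", "C")]))
    print(sorted(p_partition(TREE, 1)[("A", "A")]))
```

Because each block stores both endpoints of a path, it keeps track of the ancestor as well as the result node, which a node-only partition does not.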

XPath Algebra • Path semantics • Node semantics

Fragments of XPath Algebra • D algebra: XPath algebra − ↑, π1 • D[ ] algebra: XPath algebra − ↑ • D[k] algebra: D algebra up to length k • D[ ][k] algebra: D[ ] algebra up to length k

D[k]-Equivalence • Given an XML document, a value k, and (m1, n1), (m2, n2) in DownPairs(D) • For any E in D[k]

Coupling Theorem Let D be a document and k an integer. • The P[k]-partition of D and the D[k]-partition of D are the same under the path semantics • The N[k]-partition of D and the D[k]-partition of D are the same under the node semantics

k-Label-Path Set • The set of label paths of length k in an XML document that satisfy an XPath expression in the D algebra.

Label-Union Theorem Let D be a document, k an integer, and E a D[k] expression. Then there exists a class of partition blocks of the P[k]-partition (N[k]-partition) of D such that

Query Evaluation Using Label-Union Theorem • Query 2: //B/C • LPS(E,2) = {(A,B,C), (B,B,C)}
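The evaluation step above can be sketched for simple queries of the form //l1/.../lj under the node semantics: collect the label paths that end with the query's labels, then return the union of the matching N[k] blocks. The tree, the tuple encoding of queries, and the helper names are all assumptions for illustration, and the sketch only covers queries with at most k+1 labels.

```python
from collections import defaultdict

# Hypothetical document: node id -> (label, parent id). Not the slide's tree.
TREE = {
    "A1": ("A", None), "A2": ("A", "A1"),
    "B1": ("B", "A1"), "B2": ("B", "A2"), "B3": ("B", "B1"),
    "C1": ("C", "B1"), "C2": ("C", "B2"), "C3": ("C", "B3"),
}


def label_path(tree, node, k):
    """LP(node, k): labels on the upward path of at most k edges, top-down."""
    labels, current = [], node
    for _ in range(k + 1):
        if current is None:
            break
        label, parent = tree[current]
        labels.append(label)
        current = parent
    return tuple(reversed(labels))


def n_partition(tree, k):
    """N[k] blocks keyed by LP(n, k)."""
    blocks = defaultdict(set)
    for node in tree:
        blocks[label_path(tree, node, k)].add(node)
    return dict(blocks)


def label_path_set(blocks, query):
    """Block keys whose label path ends with the query labels, e.g.
    query ("B", "C") for //B/C. Assumes len(query) <= k + 1."""
    return {key for key in blocks if key[-len(query):] == query}


def evaluate(blocks, query):
    """Answer the query as a union of whole partition blocks."""
    result = set()
    for key in label_path_set(blocks, query):
        result |= blocks[key]
    return result


if __name__ == "__main__":
    blocks = n_partition(TREE, 2)
    print(sorted(label_path_set(blocks, ("B", "C"))))
    print(sorted(evaluate(blocks, ("B", "C"))))
```

On this hypothetical tree the label-path set for //B/C at k = 2 happens to be {(A,B,C), (B,B,C)}, and the answer is obtained without touching the document itself.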

N[k]-Trie Index • Keep track of the N[k]-partitions • Use the reverse label path as key
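One way to picture the reverse-label-path key is the sketch below: each N[k] block is stored at the trie node reached by reading its label path bottom-up, so a query such as //B/C becomes a prefix walk (C, then B) followed by a union over the subtree. This is an illustrative in-memory structure under assumed names (`TrieNode`, `build_n_trie`, `lookup`), not the TIMBER implementation, and it handles only simple queries with at most k+1 labels.

```python
# Hypothetical document: node id -> (label, parent id). Not the slide's tree.
TREE = {
    "A1": ("A", None), "A2": ("A", "A1"),
    "B1": ("B", "A1"), "B2": ("B", "A2"), "B3": ("B", "B1"),
    "C1": ("C", "B1"), "C2": ("C", "B2"), "C3": ("C", "B3"),
}


class TrieNode:
    def __init__(self):
        self.children = {}   # label -> TrieNode
        self.extent = set()  # nodes whose reversed LP(n, k) ends here


def label_path(tree, node, k):
    """LP(node, k): labels on the upward path of at most k edges, top-down."""
    labels, current = [], node
    for _ in range(k + 1):
        if current is None:
            break
        label, parent = tree[current]
        labels.append(label)
        current = parent
    return tuple(reversed(labels))


def build_n_trie(tree, k):
    """Insert every node under the reverse of its LP(n, k)."""
    root = TrieNode()
    for node in tree:
        current = root
        for label in reversed(label_path(tree, node, k)):
            current = current.children.setdefault(label, TrieNode())
        current.extent.add(node)
    return root


def lookup(root, query):
    """Evaluate //l1/.../lj given as (l1, ..., lj), with j <= k + 1."""
    current = root
    for label in reversed(query):
        current = current.children.get(label)
        if current is None:
            return set()
    # Union the extents of all blocks whose key has the query as a suffix.
    result, stack = set(), [current]
    while stack:
        trie_node = stack.pop()
        result |= trie_node.extent
        stack.extend(trie_node.children.values())
    return result


if __name__ == "__main__":
    trie = build_n_trie(TREE, 2)
    print(sorted(lookup(trie, ("B", "C"))))
    print(sorted(lookup(trie, ("A", "B", "C"))))
```

Reversing the key is what turns "label path ends with B/C" into an ordinary trie prefix match; with a top-down key the same query would require scanning every key.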

Query Evaluation with N[k]-Trie Index • Query 1: //A/B/C • LPS(E,2) = {(A,B,C)}

Query Evaluation with N[k]-Trie Index • Query 2: //B/C • LPS(E,2) = {(A,B,C), (B,B,C)}

P[k]-Trie Index • Keep track of the P[k]-partitions • Use the reverse label path as key

Query Evaluation with P[k]-Trie Index • Query 1: //A/B/C

Query Evaluation with P[k]-Trie Index • Query 2: //B/C

Query Evaluation with P[k]-Trie Index • Query 3: //A/B[./D]/C

Experimental Setup • Indices prototyped in TIMBER system • Report results on DBLP data • 127M bytes • 3.3M nodes

Index Sizes

Index Creation Time

Query Evaluation • //dblp/inproceedings/title/i/sub

Query Evaluation • //dblp/inproceedings[./title[./i]/sub]/ee

Outline • Introduction • Methodology • Partition induced by structural characteristics of XML • Partition induced by fragments of XPath Algebra • Coupling and Block Union Theorems • Trie Indices and Query Evaluation • Experimental Evaluation • Conclusion

Conclusion • The P[k]-Trie index facilitates index-only plans for most queries, and therefore consistently and significantly outperforms the N[k]-Trie and the A(k)-index. • A modest k value is sufficient to provide significant performance improvements.

Thanks!! Questions?

Research Direction • Further study of query decomposition and inversion algorithms • Study workload driven index creation • Develop other appropriate index structures
