Covering Indexes for XML Queries by Prakash Ramanan

Presentation Transcript


  1. Covering Indexes for XML Queries by Prakash Ramanan, presented by Dilek Demirel

  2. Contents • XML query languages • Some definitions and concepts • Bisimulation and simulation relations • Results

  3. The paper is about • Minimizing the graph that queries must search, by building similar but smaller graphs that are equivalent to the original XML document graph with respect to a class of queries.

  4. An XML document can be represented as a graph D=(N, E, Eref), where N is the set of nodes, E is the set of tree edges and Eref is the set of idref edges. • Tree edges denote element–subelement relationships; idref edges denote references from one element to another through IDREF attributes. • The subgraph T=(N, E) is a tree.
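As an illustration, the graph D=(N, E, Eref) can be held in a small Python structure. This is a minimal sketch, not code from the paper: the class name `XMLGraph`, its fields, and the example department document are hypothetical. `is_tree` checks the slide's condition that T=(N, E) is a tree rooted at the document root; idref edges are stored separately and may create cycles.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]


@dataclass
class XMLGraph:
    """Hypothetical representation of D = (N, E, Eref)."""
    types: Dict[int, str]                                # N, with the element type t(n) of each node
    tree_edges: Set[Edge]                                # E: (element, subelement) pairs
    idref_edges: Set[Edge] = field(default_factory=set)  # Eref: (referring node, referenced node)
    root: int = 0

    def is_tree(self) -> bool:
        """Check that T = (N, E) is a tree rooted at self.root (idref edges are ignored)."""
        nodes = set(self.types)
        parent: Dict[int, int] = {}
        for p, c in self.tree_edges:
            if p not in nodes or c not in nodes or c == self.root or c in parent:
                return False  # unknown node, edge into the root, or a second parent
            parent[c] = p
        if set(parent) != nodes - {self.root}:
            return False      # some non-root node has no parent
        for n in nodes:       # following parent links must reach the root (no cycles)
            seen = set()
            while n != self.root:
                if n in seen:
                    return False
                seen.add(n)
                n = parent[n]
        return True


# A department with two employees who both reference (via IDREF) the same project.
doc = XMLGraph(
    types={0: "dept", 1: "emp", 2: "emp", 3: "project"},
    tree_edges={(0, 1), (0, 2), (0, 3)},
    idref_edges={(1, 3), (2, 3)},
)
```

Keeping Eref in its own set makes it easy to view the same document with or without idref edges.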

  5. Example

  6. XML query languages • Some XML query languages • XPath • XQuery • They allow navigation in an XML document along different axes, to locate the desired element.

  7. Axes • XPath provides 13 different axes • Self • Child • Descendant/Descendant or self • Parent • Ancestor/Ancestor or self • Preceding/Preceding sibling • Following/Following sibling • Attribute • Namespace
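To make the axes concrete, here is a minimal sketch of eleven of them over a small ordered tree. The tree, the dictionaries `CHILDREN` and `PARENT`, and the function names are hypothetical; the attribute and namespace axes are omitted because the example has no attribute or namespace nodes. Results are lists in document order, except the ancestor axes, which start from the nearest ancestor.

```python
from typing import Dict, List

# Hypothetical ordered tree: node -> children in document order.
CHILDREN: Dict[str, List[str]] = {
    "a": ["b", "c", "d"],
    "b": ["e"], "c": [], "d": ["f", "g"],
    "e": [], "f": [], "g": [],
}
ROOT = "a"
PARENT: Dict[str, str] = {c: p for p, cs in CHILDREN.items() for c in cs}


def self_axis(n): return [n]
def child(n): return list(CHILDREN[n])
def parent(n): return [PARENT[n]] if n in PARENT else []


def descendant(n):
    out = []
    for c in CHILDREN[n]:          # pre-order traversal = document order
        out.append(c)
        out.extend(descendant(c))
    return out


def descendant_or_self(n): return [n] + descendant(n)


def ancestor(n):
    out = []
    while n in PARENT:             # walk up to the root, nearest ancestor first
        n = PARENT[n]
        out.append(n)
    return out


def ancestor_or_self(n): return [n] + ancestor(n)


def _siblings(n):
    return CHILDREN[PARENT[n]] if n in PARENT else [n]


def preceding_sibling(n):
    sibs = _siblings(n)
    return sibs[:sibs.index(n)]


def following_sibling(n):
    sibs = _siblings(n)
    return sibs[sibs.index(n) + 1:]


def preceding(n):
    """Nodes before n in document order, excluding n's ancestors."""
    doc, anc = descendant_or_self(ROOT), set(ancestor(n))
    return [m for m in doc[:doc.index(n)] if m not in anc]


def following(n):
    """Nodes after n in document order, excluding n's descendants."""
    doc, desc = descendant_or_self(ROOT), set(descendant(n))
    return [m for m in doc[doc.index(n) + 1:] if m not in desc]
```

A quick sanity check: for every node n, the ancestor, descendant, preceding, following, and self axes together contain each node of the tree exactly once.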

  8. Subset languages of XPath • Core XPath (CXPath) • Branching Path Queries (BPQ) • Tree Pattern Queries (TPQ) • $\mathrm{TPQ} = \mathrm{TPQ}^+ \subset \mathrm{BPQ}^+ \subset \mathrm{CXPath}^+ \subset \mathrm{XPath}$, where $C^+$ denotes query language $C$ without the operator NOT (TPQ = TPQ+ because TPQ never uses NOT)

  9. Core XPath • Does not contain arithmetic and string operations • Has the full navigational power of XPath • Consists of all queries involving the thirteen axes and the three Boolean operators and, or, and not

  10. Branching Path Queries • A subset of CXPath • CXPath queries that ignore the order of sibling elements • Allow nine axes: all thirteen except the four order-respecting ones (preceding, preceding sibling, following, following sibling)

  11. Tree Pattern Queries • Involve four axes • Self • Child • Descendant • Descendant or self • The only operator is and • Do not involve idref edges
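A tree pattern query can be read as a small pattern tree whose edges are labelled with one of the four axes and whose branches are combined with and. The sketch below is a simplified matcher, not the paper's evaluation method: the `Pattern` class, the example document `DOC`, and the choice to return the nodes at which the pattern's root matches (rather than a separately marked output node) are assumptions. Idref edges are simply absent from `DOC`, in line with TPQ not using them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

TPQ_AXES = {"self", "child", "descendant", "descendant-or-self"}


@dataclass
class Pattern:
    label: str  # element type to match, or "*" for any type
    branches: List[Tuple[str, "Pattern"]] = field(default_factory=list)  # (axis, subpattern)


# Hypothetical document tree: node -> (element type, children); no idref edges.
DOC: Dict[int, Tuple[str, List[int]]] = {
    0: ("dept", [1, 2]),
    1: ("emp", [3]),
    2: ("emp", [4, 5]),
    3: ("name", []),
    4: ("name", []),
    5: ("phone", []),
}


def axis_nodes(axis: str, n: int) -> List[int]:
    if axis not in TPQ_AXES:
        raise ValueError(f"{axis!r} is not a TPQ axis")
    if axis == "self":
        return [n]
    if axis == "child":
        return DOC[n][1]
    out = [n] if axis == "descendant-or-self" else []
    stack = list(DOC[n][1])
    while stack:                    # collect all descendants of n
        m = stack.pop()
        out.append(m)
        stack.extend(DOC[m][1])
    return out


def matches(p: Pattern, n: int) -> bool:
    """True if p matches at n: the label fits and every branch (joined by 'and') holds."""
    if p.label != "*" and DOC[n][0] != p.label:
        return False
    return all(any(matches(sub, m) for m in axis_nodes(axis, n)) for axis, sub in p.branches)


def evaluate(p: Pattern) -> List[int]:
    return sorted(n for n in DOC if matches(p, n))


# Employees that have both a name child and a phone child.
q = Pattern("emp", [("child", Pattern("name")), ("child", Pattern("phone"))])
```

Here `evaluate(q)` selects only employee 2, the one with both a name and a phone child.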

  12. Definitions and concepts • An index for an XML document • Obtained by merging “equivalent” nodes into a single node. • What counts as “equivalent” is defined later (simulation and bisimulation).

  13. Index of an XML doc.

  14. Definitions cont’d • A query Q distinguishes between two nodes in an XML document D, if exactly one of the two nodes is in the result of evaluating query Q on D.

  15. Definitions cont’d • An index DI is a covering index for a class C of queries if the following holds: • No query in C can distinguish between two nodes of D that are in the same extent in DI (the extent of an index node is the set of nodes of D merged into it). • The important point about a covering index is: • A covering index DI can be used to evaluate the queries in C without using D.
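The two definitions above translate directly into a check on concrete query results. In this hedged sketch the index is given only by its extents (a partition of the document's nodes) and each query only by its result set; the function names and the example results are hypothetical. Since a query class such as BPQ+ is infinite, testing a finite sample of queries can refute coverage but never prove it.

```python
from itertools import combinations
from typing import Iterable, List, Set


def distinguishes(result: Set[int], n1: int, n2: int) -> bool:
    """A query distinguishes n1 and n2 iff exactly one of them is in its result."""
    return (n1 in result) != (n2 in result)


def is_covering(extents: List[Set[int]], results: Iterable[Set[int]]) -> bool:
    """False as soon as some query result separates two nodes of the same extent."""
    for result in results:
        for extent in extents:
            if any(distinguishes(result, a, b) for a, b in combinations(sorted(extent), 2)):
                return False
    return True


# Hypothetical index extents and query results over document nodes 0..5.
extents = [{0}, {1, 2}, {3, 4}, {5}]
emp = {1, 2}            # e.g. all emp elements
name = {3, 4}           # e.g. all name elements
emp_with_phone = {2}    # e.g. emp elements that have a phone child
```

The first two results respect the extents, so each is a union of whole extents and could be read off the index; `emp_with_phone` separates nodes 1 and 2, so this index does not cover a class containing that query.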

  16. Focus of the paper • The paper studies the evaluation of CXPath queries, and covering indexes for the subclasses of CXPath mentioned above.

  17. Definitions cont’d • CXPath+ is complete, in the sense that: • For any node n in an XML document D, one can always construct a query Q in CXPath+, starting from the root, that distinguishes n from all the other nodes. • The paper presents a method to build this query.

  18. So far, we have • described some classes of XML queries • given some definitions and concepts • Next, we will describe the equivalence relations mentioned at the beginning: • define the simulation relation on vertices of an ordinary graph • define simulation and bisimulation relations on an XML document

  19. Simulation and bisimulation

  20. Question • Why study these simulation quotients? • Because if the simulation quotient of an XML document is small, the queries it covers can be evaluated faster on this index than on the larger XML document graph.

  21. Simulation for Ordinary Graphs • Directed graphs G1=(V1, A1), G2=(V2,A2), each vertex v has a type t(v) • Simulation is a binary relation between the vertex sets V1 and V2 of two graphs. It provides a possible notion of dominance/equivalence between the vertices of the two graphs.

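  22. Forward simulation • The Fsimulation of G1 by G2 is the largest binary relation $R \subseteq V_1 \times V_2$ such that every pair $(v_1, v_2) \in R$ (read: v1 is Fsimulated by v2) satisfies: • Preserves vertex types: $t(v_1) = t(v_2)$ • Preserves outgoing arcs: for each $v_1' \in post(v_1)$ there exists $v_2' \in post(v_2)$ such that $(v_1', v_2') \in R$, where post(v) is the set of successors of v • Two vertices of the same graph are Fsimilar if each Fsimulates the other; Fsimilarity is an equivalence relation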
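The definition describes a greatest fixpoint, so one straightforward (if slow) way to compute Fsimulation is to start from every type-preserving pair and repeatedly discard pairs that violate the outgoing-arc condition. This is an illustrative sketch, not the algorithm used in the paper; the function names, the successor-map encoding of post(v), and the example graph are assumptions.

```python
from typing import Dict, Hashable, Set, Tuple

Rel = Set[Tuple[Hashable, Hashable]]


def forward_simulation(t1: Dict, post1: Dict, t2: Dict, post2: Dict) -> Rel:
    """Largest R subset of V1 x V2 preserving types and outgoing arcs.
    (v1, v2) in R means v1 is Fsimulated by v2; post maps a vertex to its successors."""
    R = {(v1, v2) for v1 in t1 for v2 in t2 if t1[v1] == t2[v2]}
    changed = True
    while changed:                  # prune until no pair violates the arc condition
        changed = False
        for v1, v2 in list(R):
            if not all(any((w1, w2) in R for w2 in post2.get(v2, ()))
                       for w1 in post1.get(v1, ())):
                R.discard((v1, v2))
                changed = True
    return R


def fsimilar(R: Rel) -> Rel:
    """Fsimilarity on one graph: each vertex Fsimulates the other."""
    return {(x, y) for x, y in R if (y, x) in R}


# One graph compared with itself: vertex 3 has an extra c-successor that 1 lacks.
t = {1: "a", 2: "b", 3: "a", 4: "b", 5: "c"}
post = {1: {2}, 3: {4, 5}}
sim = forward_simulation(t, post, t, post)
```

Here vertex 1 is Fsimulated by 3 but not conversely, so 1 and 3 are not Fsimilar, while the b-vertices 2 and 4 (neither has successors) are.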

  23. Backward Simulation • Analogous to Fsimulation • Deals with the incoming arcs at a vertex, whereas forward simulation deals with outgoing arcs.

  24. Forward and Backward Simulation • FBsimulation • Preserves vertex types • Preserves outgoing arcs • Preserves incoming arcs
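Backward simulation and FBsimulation only change which arcs must be matched, so the same pruning loop can cover all three relations by also checking predecessors. Again this is a hypothetical sketch: the `forward`/`backward` flags and the example graph are illustrative choices, not notation from the paper.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set, Tuple


def reverse(post: Dict) -> Dict:
    """Build the predecessor map pre(v) from the successor map post(v)."""
    pre = defaultdict(set)
    for v, ws in post.items():
        for w in ws:
            pre[w].add(v)
    return pre


def simulation(t1, post1, t2, post2, forward=True, backward=False) -> Set[Tuple[Hashable, Hashable]]:
    """Largest type-preserving R subset of V1 x V2 that preserves outgoing arcs (forward),
    incoming arcs (backward), or both (FBsimulation)."""
    pre1, pre2 = reverse(post1), reverse(post2)

    def arcs_ok(nbr1, nbr2, v1, v2, R):
        # every neighbor of v1 must be matched by some related neighbor of v2
        return all(any((w1, w2) in R for w2 in nbr2.get(v2, ())) for w1 in nbr1.get(v1, ()))

    R = {(v1, v2) for v1 in t1 for v2 in t2 if t1[v1] == t2[v2]}
    changed = True
    while changed:
        changed = False
        for v1, v2 in list(R):
            ok = ((not forward or arcs_ok(post1, post2, v1, v2, R)) and
                  (not backward or arcs_ok(pre1, pre2, v1, v2, R)))
            if not ok:
                R.discard((v1, v2))
                changed = True
    return R


# y and w look alike going forward, but their parents have different types.
t = {"x": "r", "y": "a", "z": "s", "w": "a"}
post = {"x": {"y"}, "z": {"w"}}
f_sim = simulation(t, post, t, post)
b_sim = simulation(t, post, t, post, forward=False, backward=True)
fb_sim = simulation(t, post, t, post, backward=True)
```

The pair (y, w) survives forward simulation but is removed once incoming arcs are checked, which is exactly the extra discrimination FBsimulation adds.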

  25. Simulation for an XML Document • Fsimulation of D is the largest binary relation on N (the node set of D) that, for every pair with n1 Fsimulated by n2: • Preserves node types • If n1=root(D) then n2=root(D) • Else t(n2)=t(n1) • Preserves outgoing tree edges • For each tree edge (n1,n1’), there exists a tree edge (n2,n2’) such that n1’ is Fsimulated by n2’. • Preserves outgoing idref edges • For each idref edge (n1,n1’), there exists an idref edge (n2,n2’) such that n1’ is Fsimulated by n2’.
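For an XML document the same fixpoint idea applies, with two differences from the ordinary-graph case: the root may only be Fsimulated by the root, and tree edges and idref edges are matched separately. In this minimal sketch the document is compared with itself; the function name, the successor maps `tree` and `idref`, and the example department document are assumptions.

```python
from typing import Dict, Set, Tuple


def xml_fsimulation(types: Dict[int, str], root: int,
                    tree: Dict[int, Set[int]], idref: Dict[int, Set[int]]) -> Set[Tuple[int, int]]:
    """(n1, n2) in the result means n1 is Fsimulated by n2."""
    def type_ok(n1, n2):
        return n2 == root if n1 == root else types[n1] == types[n2]

    def edges_ok(succ, n1, n2, R):
        # each edge (n1, n1') needs an edge (n2, n2') of the same kind with (n1', n2') in R
        return all(any((m1, m2) in R for m2 in succ.get(n2, ())) for m1 in succ.get(n1, ()))

    R = {(n1, n2) for n1 in types for n2 in types if type_ok(n1, n2)}
    changed = True
    while changed:
        changed = False
        for n1, n2 in list(R):
            if not (edges_ok(tree, n1, n2, R) and edges_ok(idref, n1, n2, R)):
                R.discard((n1, n2))
                changed = True
    return R


# dept(0) has employees 1 and 2; employee 2 also has a phone; both reference project 3.
types = {0: "dept", 1: "emp", 2: "emp", 3: "project", 4: "name", 5: "name", 6: "phone"}
tree = {0: {1, 2, 3}, 1: {4}, 2: {5, 6}}
idref = {1: {3}, 2: {3}}
sim = xml_fsimulation(types, 0, tree, idref)
```

Employee 1 is Fsimulated by employee 2, but not conversely, because 2's phone child has no counterpart under 1.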

  26. FBsimulation of D • Deals with both incoming and outgoing arcs • Preserves node types • Preserve outgoing tree edges • Preserve outgoing idref edges • Preserve incoming tree edges • Preserve incoming idref edges

  27. Bisimulation Relation • Forward bisimulation of D is the largest binary relation on N (the node set of D) that, for every related pair (n1, n2): • Preserves node types • If n1=root(D) then n2=root(D), and vice versa • Else t(n2)=t(n1) • Preserves outgoing tree edges • For each tree edge (n1,n1’), there exists a tree edge (n2,n2’) such that n1’ and n2’ are related, and vice versa. • Preserves outgoing idref edges • For each idref edge (n1,n1’), there exists an idref edge (n2,n2’) such that n1’ and n2’ are related, and vice versa.
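Because forward bisimilarity is an equivalence relation, its classes can be computed by partition refinement: start with the root alone and the other nodes grouped by type, then keep splitting classes whose members reach different sets of classes along tree or idref edges. This naive signature-based refinement is an illustrative sketch (more efficient algorithms, such as Paige–Tarjan refinement, exist); the function name and the example document are hypothetical.

```python
from typing import Dict, List, Set


def bisimulation_classes(types: Dict[int, str], root: int,
                         tree: Dict[int, Set[int]], idref: Dict[int, Set[int]]) -> List[Set[int]]:
    """Classes of the largest forward bisimulation on the nodes of D."""
    block = {n: (n == root, types[n]) for n in types}   # root alone; others by type
    while True:
        # a node's signature: its own block plus the blocks it reaches by each edge kind
        sig = {n: (block[n],
                   frozenset(block[m] for m in tree.get(n, ())),
                   frozenset(block[m] for m in idref.get(n, ())))
               for n in types}
        ids: Dict[tuple, int] = {}
        new_block = {n: ids.setdefault(sig[n], len(ids)) for n in types}
        if len(ids) == len(set(block.values())):
            break                                       # no class was split: stable
        block = new_block
    classes: Dict[object, Set[int]] = {}
    for n, b in block.items():
        classes.setdefault(b, set()).add(n)
    return sorted(classes.values(), key=min)


types = {0: "dept", 1: "emp", 2: "emp", 3: "name", 4: "name",
         5: "project", 6: "emp", 7: "name", 8: "phone"}
tree = {0: {1, 2, 5, 6}, 1: {3}, 2: {4}, 6: {7, 8}}
idref = {1: {5}, 2: {5}, 6: {5}}
classes = bisimulation_classes(types, 0, tree, idref)
```

Employees 1 and 2 end up in one class because each has only a name child and an idref edge to the project; employee 6 stays apart because of its phone child.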

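  28. The Quotients • An equivalence relation on N partitions N into equivalence classes: any two nodes in the same class are related, and any two nodes in different classes are not. • The quotient graph D~ is obtained from D by merging the nodes of each equivalence class into a single node, with an edge (of the same kind) from one class to another whenever D has such an edge from some node of the first class to some node of the second.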
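Given the equivalence classes, building the quotient is a single pass over the edges. In this sketch (hypothetical function name; classes supplied as a list of node sets, for example the output of a bisimulation or simulation computation) each class becomes one index node whose extent is the class itself, and each edge kind is mapped separately so tree and idref edges stay distinguishable.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def quotient(classes: List[Set[int]], types: Dict[int, str],
             edges_by_kind: Dict[str, Set[Edge]]):
    """Return (index node types, index edges by kind) for the quotient graph D~.
    Index node i stands for the extent classes[i]."""
    cls = {n: i for i, c in enumerate(classes) for n in c}
    # assumes the relation preserves types, so all members of a class share one type
    index_types = {i: types[min(c)] for i, c in enumerate(classes)}
    index_edges = {kind: {(cls[u], cls[v]) for u, v in edges}
                   for kind, edges in edges_by_kind.items()}
    return index_types, index_edges


types = {0: "dept", 1: "emp", 2: "emp", 3: "name", 4: "name",
         5: "project", 6: "emp", 7: "name", 8: "phone"}
edges = {"tree": {(0, 1), (0, 2), (0, 5), (0, 6), (1, 3), (2, 4), (6, 7), (6, 8)},
         "idref": {(1, 5), (2, 5), (6, 5)}}
classes = [{0}, {1, 2}, {3, 4, 7}, {5}, {6}, {8}]
idx_types, idx_edges = quotient(classes, types, edges)
```

In this example the document's 9 nodes and 11 edges shrink to 6 index nodes and 8 edges.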

  29. Example

  30. Results • A CXPath+ query Q can be evaluated on an XML document D by computing the simulation of Q by D. • For an XML document, its simulation quotient is the smallest covering index for BPQ+. • For an XML document, its simulation quotient, with idref edges ignored throughout, is the smallest covering index for TPQ.

  31. Questions?
