1 / 24

Projecting XML Documents

Projecting XML Documents. Amélie Marian, Columbia University Jérôme Simeon, Bell Laboratories. Motivation. XQuery used in very different environments: XQuery implementations on XML stored in databases (with indexes).

chun
Télécharger la présentation

Projecting XML Documents

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Projecting XML Documents Amélie Marian, Columbia University Jérôme Simeon, Bell Laboratories

  2. Motivation • XQuery used in very different environments: • XQuery implementations on XML stored in databases (with indexes). • Main-memory XQuery implementations on XML in files, sent as streams, computed on the fly… • Example Applications: • Web Services (e.g., ActiveXML). • Telecommunication apps (XML messages). • XML documents. • Information Integration.

  3. Memory Limitations • Main-memory XQuery implementations cannot handle large documents. • Complex XQuery expressions require materialization (DOM). • DOM is the bottleneck. XMark Query 1 on an IBM laptop T23 (256Mb RAM)

  4. Projection: Example <site> <regions>...</regions> <people> ... <person id="person120"> <name>Wagar Bougaut</name> <emailaddress>mailto:Bougaut@wgt.edu</emailaddress> </person> <person id="person121"> <name>Waheed Rando</name> <emailaddress>mailto:Rando@pitt.edu</emailaddress> <address> <street>32 Mallela St</street> <city>Tucson</city> <country>United States</country> <zipcode>37</zipcode> </address> <creditcard>7486 5185 1962 7735</creditcard> <profile income="59224.09"> ... • XMark Query 1 • for $b in /site/people/person[@id=“person0”] • return $b/name <site> <regions>...</regions> <people> ... <person id="person120"> <name>Wagar Bougaut</name> <emailaddress>mailto:Bougaut@wgt.edu</emailaddress> </person> <person id="person121"> <name>Waheed Rando</name> <emailaddress>mailto:Rando@pitt.edu</emailaddress> <address> <street>32 Mallela St</street> <city>Tucson</city> <country>United States</country> <zipcode>37</zipcode> </address> <creditcard>7486 5185 1962 7735</creditcard> <profile income="59224.09"> ... Less than 2% of original document !

  5. Projection: Intuition • Given a query: For $b in /site/people/person[@id=“person0”] Return $b/name • Most nodes in the input document(s) are not required. • Projection operation removes unnecessary nodes. • Evaluation of the query on projected document yields the same results as on the original document. • How it works: • Projection defined by set of paths. • Static analysis infers sets of paths used within a query. /site/people/person /site/people/person/@id /site/people/person/name

  6. Projection: Challenges • For an XQuery expression, compute all paths that allow to reach nodes required to evaluate the expression. • XQuery is complex: • Variables • Composition • Syntactic Sugar • Complex expressions • Have to be able to analyze all of XQuery.

  7. Contributions • Definition of a notion of projection for XML documents, based on path expressions. • Static Analysis algorithm for arbitrary XQuery expressions used to infer projection paths. • Loading algorithm to build projected XML document. • Integration of XML Projection in XQuery processor (Galax). • Experimental evaluation of projection on XML query processing.

  8. XML Projection • Similar to relational projection: • One key operation. • Prunes unnecessary part of the data. • Essential for memory management. • Specific problems related to XML: • Projection must operate on trees. • Requires analysis of the query. • Need to address XQuery complexity.

  9. Notation • Projection Paths: • Path expressions are noted using XPath semantics (/site/people/person/@id) • “#” notation used when subtree should be kept (/site/people/person/name#) • Static Analysis: inference rule notation Expr => Paths

  10. Static Analysis: Variables • Variables can be bound to nodes coming form different paths. for $b in /site/people/(teacher | student) return $b/name • Analysis must remember paths to which variable was bound /site/people/teacher /site/people/student • Environment is maintained during path analysis: Env |- Expr => Paths

  11. Env |- Literal => {} Env |- Expr1 => Paths1 Env |- Expr2 => Paths2 Env |- Expr1,Expr2 => Paths1 U Paths2 Static Analysis: Example • Literals do not require any paths: • Paths are propagated in a sequence: • Static analysis algorithm is correct (see Technical Report) 32 => {} /a/b => {/a/b} /a/d => {/a/d} /a/b,/a/d => {/a/b,/a/d}

  12. Static Analysis: Composition (if (count (/site/regions/*) = 3) then /site/people/person else /site/open_auctions/open_auction)/@id • /@id does not apply to /site/regions/* • Final set of paths should be /site/regions/* /site/people/person/@id /site/open_auctions/open_auction/@id • Need to differentiate two sets of paths during analysis: • Returned Paths: returned by the expression, further path steps are applied on them. • Used Paths: used to compute the expression. Env |- Expr => Paths using UPaths

  13. XQuery Core • Subset of XQuery: • Reduced grammar. • All operations are made explicit. • Same expressive power as XQuery. • Removes syntactic sugar. • Simplifies complex expressions • Analysis only needs to be applied to a small set of expressions for $b in /site/people/person return $b/name XQuery for . in /return for . in child::site return for . in child::people return for . in child::person return child::name XQuery Core

  14. Optimized Projection • XQuery Core decomposition may lead to redundant paths being kept: /site /site/people /site/people/person /site/people/person/name# • Optimization on inference rules can avoid redundancy • Details in paper, full optimized analysis in technical report

  15. XQuery Processing Architecture Document Data Model

  16. Loading Algorithm: Description • Input: • Set of projection paths. • Document SAX events. • Decide on action to apply on document nodes: • Skip: ignore node and its subtree. • KeepSubtree: keep node and its subtree. • Keep: keep node without its subtree. • Move: keep processing SAX events. Current node is only kept if some of its children are kept. • Keep a set of current paths.

  17. Loading Algorithm: Example a Projection Paths: /a/b/c# /a/d Loaded Nodes: b d Similar to XML filtering algorithms c • Limitations: • - Backward Axis! • - Number of current paths can • be huge (descendant axis) Current Paths: /b/c# /d /a/b/c# /a/d /c# f Action: Keep Subtree Move Keep Skip Document Stream <a> <g> <b> </b> </g> <b> <c> <f> </f> </c> </b> <d> <e> </e> </d> </a>

  18. Experiments: Settings • XML Projection Evaluation: • Effectiveness: projection impact on different queries. • Maximum document size: largest document that can be processed. • Processing time: effect on processing time. • Experimental Setup: • Default XMark document size: 50Mb.

  19. 100% 100% 100% 33% 60% Experiments: Effectiveness All queries but one require less than 5% of the document.

  20. Experiments: Maximal Document Size

  21. Experiments: Query Execution Time Projection significantly reduces query processing time Next Bottleneck: Joins!

  22. Improvements • Complete XQuery implementation with projection available in Galax • Demo at VLDB 2003. • Galax uses a more recent pure streaming algorithm for applying projection to a document. • Better performance. • Can be used as a stand-alone operation, without loading.

  23. Conclusion and Future Work • Main contributions: • Definition of a notion of projection for XML. • Static analysis to infer projection paths from any XQuery expression. • Full implementation in Galax. • Experimental results: • Dramatic increase in the size main-memory XQuery processor can handle. • Projection helps reducing query processing time. • Future work: • Define loading algorithm for backward axis. • Combine projection with other optimizations. • Pushing down query operations during projection (e.g., predicate evaluation)

  24. Advertisment • “XQuery from the Experts” • Just released by Addison Wesley • Ask Jerome for 20% discount flyers!

More Related