
R-SOX: Runtime Semantic Query Optimization over XML Streams

R-SOX: Runtime Semantic Query Optimization over XML Streams. Song Wang, Hong Su, Ming Li, Mingzhu Wei, Shoushen Yang, Drew Ditto, Elke A. Rundensteiner and Murali Mani. Database Systems Research Group, Department of Computer Science, Worcester Polytechnic Institute.



Presentation Transcript


  1. R-SOX: Runtime Semantic Query Optimization over XML Streams
     Song Wang, Hong Su, Ming Li, Mingzhu Wei, Shoushen Yang, Drew Ditto, Elke A. Rundensteiner and Murali Mani
     Database Systems Research Group, Department of Computer Science, Worcester Polytechnic Institute, Worcester, Massachusetts, USA
     VLDB 2006, Seoul, Korea

  2. Background: XML Stream Applications
     • Wide-ranging and growing applications
       • Examples: news publishing and on-line auction systems
     • Characteristics
       • Real-time processing: short response time
       • Limited resources: minimize memory
     [Figures: News Publishing; On-line Auction]

  3. Background: Optimization Using Constraints
     • Constraint Properties
       • Document Type Definition (DTD) or XML Schema
       • Constraints are statically available beforehand
     • General XML Semantic Query Optimization (SQO)
       • Tree minimization
       • Recursion optimization
     • Stream-specific XML SQO
       • Context-aware shortcutting
       • Token-granularity data output

  4. R-SOX: Motivation and Goal
     • Motivation
       • Scenarios where a static schema cannot be applied
       • Challenges when the schema arrives dynamically:
         - how to represent and manage the runtime schema
         - how to exploit the dynamic schema for runtime optimization
         - how to propagate the runtime schema downstream
     • Goals
       • Runtime schema encoding and synchronization
       • Semantic query optimization techniques
       • Runtime schema propagation

  5. R-SOX: Architecture and Workflow
     [Figure: R-SOX System architecture. Components: Stream Annotator, Extended Raindrop XQuery Engine, Query Plan Generator, Schema Inf. Manager, Query Plan Adaptor. Labeled flows: Input Stream, XQuery, RSI, Schema knowledge, Query Plan, Plan Refinement, Result Stream, Result Schema, Annotated Output Stream.]
     Workflow stages: Basic XQuery Evaluation; Runtime Schema Refinement; Runtime Semantic Query Optimization; Downstream Schema Propagation
     Legend: Raindrop Engine; Demo Focus; R-SOX Contributions; Future Work

  6. Basic XQuery Evaluation
     XQuery Q1-1: FOR $o in document("news.xml")/stream/news RETURN <result> $o/source, $o/comments </result>
     • Raindrop XQuery Engine
       • Construction of Raindrop plan
       • Automaton-based query evaluation
     Raindrop XQuery Plan (top to bottom): SJoin on $x; ExtractNest $b, ExtractNest $c; Nav $x//source -> $b, Nav $x//comments -> $c; Nav stream//news -> $x; Stream Data
     Input Token Stream: <stream> <news> <source> <content>CNN…</content> <rank>…</rank>… </source> <comments> <content>President…</content>… </comments> …… </news> ……
     [Figure: Query Automata, with states s0 to s6 and transitions labeled stream, news, source, content and comments]
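The automaton-based evaluation mentioned on this slide can be illustrated with a minimal sketch. This is not the Raindrop implementation: the token format, the function name `match_paths` and the restriction to plain child paths (no `//`) are assumptions made for the example. A stack keeps one automaton state per open element, so closing a tag restores the previous state.

```python
from typing import Iterable, List, Tuple

# Hypothetical token format: ("start", tag), ("end", tag) or ("text", value).
Token = Tuple[str, str]


def match_paths(tokens: Iterable[Token], pattern: List[str]) -> List[str]:
    """Collect text found directly inside elements reached by the child path `pattern`."""
    stack: List[int] = [0]  # one automaton state per open element; -1 is a dead state
    hits: List[str] = []
    for kind, value in tokens:
        if kind == "start":
            state = stack[-1]
            if 0 <= state < len(pattern) and pattern[state] == value:
                stack.append(state + 1)  # advance along the path
            else:
                stack.append(-1)  # this subtree cannot match
        elif kind == "end":
            stack.pop()  # return to the state of the parent element
        elif kind == "text" and stack[-1] == len(pattern):
            hits.append(value)  # accepting state: the full path matched
    return hits


tokens = [
    ("start", "stream"), ("start", "news"),
    ("start", "source"), ("start", "content"), ("text", "CNN"),
    ("end", "content"), ("end", "source"),
    ("start", "comments"), ("start", "content"), ("text", "President"),
    ("end", "content"), ("end", "comments"),
    ("end", "news"), ("end", "stream"),
]
print(match_paths(tokens, ["stream", "news", "source", "content"]))  # ['CNN']
```

The stack is what lets a single pass over the token stream handle nesting without building the document in memory.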

  7. Runtime Schema Refinement
     • Runtime Schema Information (RSI)
       • Representing RSI: RSI Grammar
       • Encoding RSI:
         - embedded into input XML token stream
         - extracted using DFA stream loader
     • Managing Schema Information
       • Schema Graph: directed ordered graph
       • Schema graph synchronization with the newly received RSIs
       • History-aware RSI rollback
     Example of RSI:
       News ((source | comment)+, date+)
       RSI 1: ((news,inf,TIME), (/news/comment, , ),-)
       News (source+, date+)
       RSI 2: ((/news,200,COUNT), (/news/comment, /news/source, *), +)
       News (source*, comment+, date+)
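Schema graph synchronization and history-aware rollback might be sketched as follows. This is a deliberate simplification and not the RSI grammar from the slide: each update is reduced to a hypothetical `(op, parent, child)` triple, the graph is unordered, and the context, lifetime and occurrence fields of a real RSI are ignored. The class name `SchemaGraph` and its methods are invented for illustration.

```python
from typing import Dict, List, Set, Tuple


class SchemaGraph:
    """Parent-to-children edges plus a history of applied updates (simplified)."""

    def __init__(self, edges: Dict[str, Set[str]]) -> None:
        self.edges: Dict[str, Set[str]] = {p: set(c) for p, c in edges.items()}
        # Only updates that actually changed the graph are recorded.
        self.history: List[Tuple[str, str, str]] = []

    def apply(self, op: str, parent: str, child: str) -> None:
        """Apply a '-' (remove edge) or '+' (add edge) update."""
        children = self.edges.setdefault(parent, set())
        if op == "-" and child in children:
            children.remove(child)
            self.history.append((op, parent, child))
        elif op == "+" and child not in children:
            children.add(child)
            self.history.append((op, parent, child))

    def rollback(self) -> None:
        """Undo the most recent update that changed the graph, if any."""
        if not self.history:
            return
        op, parent, child = self.history.pop()
        if op == "-":
            self.edges[parent].add(child)
        else:
            self.edges[parent].discard(child)


graph = SchemaGraph({"news": {"source", "comment", "date"}})
graph.apply("-", "news", "comment")  # in the spirit of RSI 1: comment no longer occurs
graph.rollback()                     # the removal is undone
```

Recording only effective changes keeps rollback exact: undoing a no-op update never removes an edge that was present before it.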

  8. Runtime SQO: Overview
     The following SQO techniques are supported: (1) Tree Minimization (2) Recursion Optimization (3) Fast Data Output (4) Navigation Shortcutting
     • Runtime Plan Adaptor
       • Incremental plan migration
       • Rule library
       • Rule applier
     • Query Execution
       • Modifying automata computations
       • Switching execution modes
       • Performing event-condition actions

  9. Runtime SQO: Tree Minimization
     XQuery Q1: FOR $o in document("news.xml")/stream/news RETURN <result> $o/source, $o/comments </result>
     • Benefits
       • Expedite document traversal on pattern retrieval by avoiding unnecessary navigation
       • Change query plan at run-time by adjusting automata
     • Query Execution
       • Temporarily removing and adding automaton states
     RSIs:
       P1: ((stream,inf,Count), (/news, source , ), -)
       P2: ((stream,inf,Count), (/news, comments ,), -)
     [Figure: Schema Graph Refinement. stream contains news; news contains source, comments and date; edges carry (1,∞) occurrence ranges; one edge is marked "Cut by P1" and another "Cut by P2".]
     [Figure: Query Automata Refinement. States s0 to s6 with transitions labeled stream, news, source, content and comments; the source transition is disabled by P1 and the comments transition by P2.]
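Temporarily removing and re-adding automaton transitions could look roughly like the sketch below. The state names follow the labels in the slide's figure, but the exact wiring of the transition table is an assumed reading of that figure, and the `QueryAutomaton` class with its `disable` and `enable` methods is hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

Edge = Tuple[str, str]  # (state, tag)


class QueryAutomaton:
    """Transition table whose edges can be switched off and on at run time."""

    def __init__(self, transitions: Dict[Edge, str]) -> None:
        self.transitions: Dict[Edge, str] = dict(transitions)
        self.disabled: Set[Edge] = set()

    def disable(self, state: str, tag: str) -> None:
        """Schema says this tag cannot occur here: stop following the edge."""
        self.disabled.add((state, tag))

    def enable(self, state: str, tag: str) -> None:
        """A later schema update allows the tag again: restore the edge."""
        self.disabled.discard((state, tag))

    def step(self, state: str, tag: str) -> Optional[str]:
        """Return the next state, or None if there is no active transition."""
        edge = (state, tag)
        if edge in self.disabled:
            return None
        return self.transitions.get(edge)


automaton = QueryAutomaton({
    ("s0", "stream"): "s1",
    ("s1", "news"): "s2",
    ("s2", "source"): "s3",
    ("s3", "content"): "s4",
    ("s2", "comments"): "s5",
    ("s5", "content"): "s6",
})
automaton.disable("s2", "source")  # effect of an update such as P1
```

Keeping disabled edges in a separate set, rather than deleting them from the table, makes the change cheap to reverse when the schema is refined again.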

  10. Runtime SQO: Recursion Optimization
     Recursion-aware operators are switched to their non-recursive counterparts when the input XML data is not recursive.
     XQuery Q2 (slightly different from Q1): FOR $o in document("news.xml")/stream//news RETURN <result> $o/source, $o/comments </result>
     RSIs:
       P1: ((news,inf,Count), (/news, news, ), - )
       P2: ((news,inf,Count), (/news, news, ), +)
     • Benefits
       • Improve performance by avoiding unnecessary overhead on recursion handling
     • Optimization Processing
       • Detect recursion by analyzing the runtime schema knowledge
       • Switch between recursion-aware and non-recursive operators
       • Characterize safe moments for runtime migration
     [Figure: Operator Switching in the Query Plan. Plan (top to bottom): RecurSJoin on $x; RecurExtractNest $b, RecurExtractNest $c; RecurNav $x//source -> $b, RecurNav $x//comments -> $c; RecurNav stream//news -> $x; Stream Data. Arrows labeled P1 and P2 connect "Recursive Operator" and "Non-recursive Operator".]
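Detecting recursion from runtime schema knowledge amounts to asking whether an element can occur, directly or transitively, inside itself. The sketch below assumes the schema is available as a plain parent-to-children mapping, which is a simplification; `is_recursive` and `choose_operator` are hypothetical helper names, while `RecurNav` and `Nav` are the operator names shown on the slides.

```python
from typing import Dict, List, Set


def is_recursive(edges: Dict[str, Set[str]], element: str) -> bool:
    """True if `element` can appear, directly or transitively, inside itself."""
    seen: Set[str] = set()
    frontier: List[str] = list(edges.get(element, ()))
    while frontier:
        node = frontier.pop()
        if node == element:
            return True  # found a cycle back to the starting element
        if node in seen:
            continue
        seen.add(node)
        frontier.extend(edges.get(node, ()))
    return False


def choose_operator(edges: Dict[str, Set[str]], element: str) -> str:
    """Pick the recursion-aware operator only when the schema requires it."""
    return "RecurNav" if is_recursive(edges, element) else "Nav"


flat_schema = {"news": {"source", "comments"}}
nested_schema = {"news": {"source", "comments", "news"}}
print(choose_operator(flat_schema, "news"))    # Nav
print(choose_operator(nested_schema, "news"))  # RecurNav
```

The sketch covers only the detection step; deciding when it is safe to migrate a running plan is a separate problem that the slide lists but this example does not address.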

  11. Runtime SQO: Fast Data Output
     XQuery Q1: FOR $o in document("news.xml")/stream/news RETURN <result> $o/source, $o/comments </result>
     • Benefits
       • Minimize memory consumption by avoiding unnecessary data storage and releasing buffered data at the earliest moment
     • Optimization Processing
       • Augment query automata with Glushkov automata
       • Encode event-condition actions
     • Case 1: Overall Schema Knowledge as news((source | comments | date)+)
       • No order constraints can be used, so comments/content must be stored
     • Case 2: Overall Schema Knowledge as News(source+,comments+,date+)
       • Global order constraint: Order( source, comments )
       • No storage is needed
     • Case 3: Overall Schema Knowledge as News( (source | comment)+, date+, comment+ )
       • Local order constraint: LocalOrder( source, comments )
       • Same as Case 1 at the beginning. The Glushkov automaton for the type "news" is used to indicate the completeness of source elements. After that, storage of comments/content is not needed
     [Figure: Glushkov Automata for Type "News", with a start state, states S1 to S4 and transitions labeled source, comments and date]
     [Figure: Actions Encoded into the Automata. Query automaton with states s1 to s7 and transitions labeled stream, news, source, content and comments]
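The difference between Case 1 and Case 2 can be made concrete with a small sketch. It assumes the children of one news element arrive as `(tag, text)` pairs in document order and that the query must output all source values before any comments values, as in Q1. The function `emit_news` and its flag are hypothetical, and Case 3 with its Glushkov automaton is not modeled.

```python
from typing import List, Tuple


def emit_news(children: List[Tuple[str, str]],
              source_before_comments: bool) -> Tuple[List[str], int]:
    """Return (output values, peak number of buffered comments) for one news element."""
    output: List[str] = []
    buffered: List[str] = []
    peak = 0
    for tag, text in children:
        if tag == "source":
            output.append(text)  # sources can always be emitted immediately
        elif tag == "comments":
            if source_before_comments:
                output.append(text)  # Order(source, comments): no source can follow
            else:
                buffered.append(text)  # a later source may still arrive: hold back
                peak = max(peak, len(buffered))
    output.extend(buffered)  # end of the news element: flush what was held
    return output, peak


children = [("source", "CNN"), ("source", "BBC"), ("comments", "c1"), ("date", "d")]
print(emit_news(children, source_before_comments=True))   # (['CNN', 'BBC', 'c1'], 0)
print(emit_news(children, source_before_comments=False))  # (['CNN', 'BBC', 'c1'], 1)
```

Both runs produce the same output; only the buffering differs, which is the memory saving the slide attributes to the order constraint.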

  12. Runtime SQO: Navigation Shortcut (I)
     • Benefit
       • Expedite document-order traversal on pattern retrieval by early filtering of failed patterns
     • Optimization Rules
       • Order, occurrence and exclusive rules
       • Completeness and minimal-cost optimization are guaranteed
     • Query Execution
       • Introduce new pattern look-up into query automata
       • Encode event-condition actions

  13. Runtime SQO: Navigation Shortcut (II)
     XQuery Q3: FOR $a in stream(bids)/auction, $b in $a/seller[homepage], $c in $a/bidder[sameAddr] WHERE $b/*/phone = "508" RETURN <auction> $b, $c </auction>
     • Utilizing Occurrence Constraints
       • Overall Schema Knowledge as: Occurrence( phone, 2 )
       • When </phone> is encountered twice, check /*/phone: if the predicate fails, suspend states s2 and s3
     • Utilizing Order Constraints
       • Overall Schema Knowledge as: Order( primary, homepage )
       • When <primary> is encountered once, check /homepage: if it is not present, suspend states s10, s3 and s2
     [Figure: Actions Encoded into the Automata]
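The occurrence-based shortcut can be sketched as an early exit: once the schema's maximum number of phone elements has been seen and none satisfied the predicate, the rest of the seller can be skipped. The function below is an illustrative assumption, not the automaton actions from the slide; it flattens the seller's descendants into `(tag, text)` pairs and reports how many tokens were inspected.

```python
from typing import Iterable, Tuple


def check_phone(children: Iterable[Tuple[str, str]],
                wanted: str, max_phones: int) -> Tuple[bool, int]:
    """Return (predicate satisfied, number of tokens inspected)."""
    phones_seen = 0
    inspected = 0
    for tag, text in children:
        inspected += 1
        if tag != "phone":
            continue
        if text == wanted:
            return True, inspected  # predicate satisfied: stop looking
        phones_seen += 1
        if phones_seen == max_phones:
            return False, inspected  # Occurrence(phone, max_phones): no more phones can come
    return False, inspected


seller = [("phone", "617"), ("phone", "781"), ("name", "x"), ("homepage", "h")]
print(check_phone(seller, "508", max_phones=2))   # (False, 2)
print(check_phone(seller, "508", max_phones=10))  # (False, 4)
```

With the tighter bound the failure is known after two tokens instead of four, which corresponds to suspending the dependent states early.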

  14. R-SOX System Demonstration
     • Application Scenarios:
       • On-line auction data
       • News publishing data
     [Screenshots: Algebraic Query Plan Generation; Runtime Schema Refinement; Runtime SQO]

  15. Raindrop Project
     http://davis.wpi.edu/dsrg/raindrop
     Recent Publications
       • S. Wang et al. R-SOX: Runtime Semantic Query Optimization over XML Streams. VLDB 2006.
       • H. Su et al. Automata Meets Algebra. DKE Journal 2006.
       • M. Wei et al. Processing Recursive XQuery over XML Streams: the Raindrop Approach. XSDM 2006.
       • H. Su et al. Semantic Query Optimization in an Automata-Algebra Combined XQuery Engine. VLDB 2004.
       • H. Su et al. Semantic Query Optimization for XQuery over XML Streams. VLDB 2005.
     Source Code Release
       • Raindrop 1.0 is released: http://davis.wpi.edu/dsrg/raindrop/release
     Acknowledgement
       • NSF, for support under grants IIS 0414567 and CNS 0551584
