
Stream Processing of XPath Queries with Predicates

Stream Processing of XPath Queries with Predicates. Ashish Kumar Gupta, Dan Suciu, University of Washington, SIGMOD 2003. Presenter: 蔡明瑾. Introduction: XML messages are used to exchange information; the XML stream processing problem is processing XPath queries (filters) on an incoming stream of XML packets.





Presentation Transcript


  1. Stream Processing of XPath Queries with Predicates • Ashish Kumar Gupta, Dan Suciu • University of Washington • SIGMOD 2003 • Presenter: 蔡明瑾

  2. Introduction • XML messages are used to exchange information • XML stream processing problem: processing XPath queries (filters) on an incoming stream of XML packets • The workload is very high • XPath queries can contain multiple predicates

  3. Definition – XPath fragment • E denotes atomic predicates

  4. Definition – XML and SAX parsers • startDocument() • startElement(a) • text(s) • endElement(a) • endDocument() • a: element or attribute label • s: data value

  5. Example: <a c="3"> <b>4</b></a> produces the events • startDocument() • startElement(a) • startElement(@c) • text("3") • endElement(@c) • startElement(b) • text("4") • endElement(b) • endElement(a) • endDocument()
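
The event sequence above can be reproduced with Python's standard xml.sax module. This is a minimal sketch, not the paper's parser: EventRecorder and sax_events are hypothetical names, and reporting each attribute as a child element labeled @name follows the slide's convention rather than anything xml.sax does on its own. Dropping whitespace-only text between tags is also our assumption.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Records SAX events in the slide's notation (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._text = []  # buffers character data, which SAX may deliver in chunks

    def _flush_text(self):
        s = "".join(self._text).strip()
        self._text = []
        if s:  # drop whitespace-only text between tags
            self.events.append(("text", s))

    def startDocument(self):
        self.events.append(("startDocument",))

    def startElement(self, name, attrs):
        self._flush_text()
        self.events.append(("startElement", name))
        # Assumption: report each attribute as a child element labeled "@name".
        for attr in attrs.getNames():
            self.events.append(("startElement", "@" + attr))
            self.events.append(("text", attrs.getValue(attr)))
            self.events.append(("endElement", "@" + attr))

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        self._flush_text()
        self.events.append(("endElement", name))

    def endDocument(self):
        self.events.append(("endDocument",))


def sax_events(doc):
    handler = EventRecorder()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.events


for event in sax_events('<a c="3"> <b>4</b></a>'):
    print(event)
```

Running it prints one tuple per event, in the same order as the list on the slide.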

  6. XML stream processing problem • An XPath expression P is used as a boolean filter • An XML document matches P if and only if P selects at least one node when evaluated on the document's root • Set of filters P = {P1,…,Pn} • Set of object identifiers (oids) I = {o1,…,on}, one per filter
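
To make the problem statement concrete, here is the naive baseline that a shared machine is meant to improve on: evaluate every filter separately on every document and return the oids of the filters that select at least one node. It is a sketch under simplifying assumptions: the filters are hand-written Python functions (p2 and has_b2 are hypothetical helpers), because ElementTree's limited XPath support has no numeric comparisons such as @c>2, and text values are compared as strings.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# A filter maps the document's root element to the nodes it selects.
Filter = Callable[[ET.Element], List[ET.Element]]


def p2(root: ET.Element) -> List[ET.Element]:
    """Hand-written //a[@c>2 and b/text()=1]; assumes @c holds a number."""
    selected = []
    for a in root.iter("a"):  # //a: every a element, the root included
        c = a.get("c")
        c_ok = c is not None and float(c) > 2
        b_ok = any((b.text or "").strip() == "1" for b in a.findall("b"))
        if c_ok and b_ok:
            selected.append(a)
    return selected


def has_b2(root: ET.Element) -> List[ET.Element]:
    """Hand-written //b[text()=2]."""
    return [b for b in root.iter("b") if (b.text or "").strip() == "2"]


def matching_oids(doc: str, filters: Dict[str, Filter]) -> List[str]:
    """Naive baseline: run every filter on the document and keep the oids
    of those that select at least one node."""
    root = ET.fromstring(doc)
    return [oid for oid, f in filters.items() if f(root)]


filters = {"o1": has_b2, "o2": p2}
print(matching_oids('<a><b>1</b><a c="3"><b>1</b></a></a>', filters))
```

This baseline does work proportional to the number of filters for every document in the stream, which becomes expensive when there are many filters.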

  7. XPush Machine • A modified deterministic pushdown automaton • Simulates the execution of the XPath filters • Input: a stream of XML documents • Output: oids • Changes: • Two kinds of states: top-down and bottom-up • Accepts SAX events as input

  8. XPush Machine (cont.)

  9. SAX call-back functions • Current state: (qt, qb)
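
A minimal sketch of how the SAX call-backs could update the current state (qt, qb), saving the pair on a stack at every startElement and restoring it at the matching endElement. The transition names tpush, tvalue, tpop and tbadd follow the later slides, but the exact update rules here are our assumption of a straightforward design, not code from the paper. The toy machine for //b[text()=1], with its "v1" and "hit" states, is hand-built and hypothetical.

```python
from typing import Any, Callable, List, Tuple


class XPushRuntime:
    """Sketch of an XPush-style machine driven by SAX-like callbacks."""

    def __init__(self, q0t: Any, q0b: Any, tpush: Callable, tvalue: Callable,
                 tpop: Callable, tbadd: Callable):
        self.q0t, self.q0b = q0t, q0b
        self.tpush, self.tvalue = tpush, tvalue
        self.tpop, self.tbadd = tpop, tbadd
        self.qt, self.qb = q0t, q0b
        self.stack: List[Tuple[Any, Any]] = []

    def startDocument(self):
        self.qt, self.qb, self.stack = self.q0t, self.q0b, []

    def startElement(self, a):
        self.stack.append((self.qt, self.qb))   # save the parent's pair
        self.qt = self.tpush(self.qt, a)        # top-down: descend into the child
        self.qb = self.q0b                      # bottom-up: the child starts empty

    def text(self, s):
        self.qb = self.tvalue(self.qt, s)       # leaf state from the data value

    def endElement(self, a):
        child = self.tpop(self.qb, a)           # lift the child's state over label a
        self.qt, parent_qb = self.stack.pop()   # restore the parent's pair
        self.qb = self.tbadd(parent_qb, child)  # merge the child into the parent

    def endDocument(self):
        return self.qb                          # the final state determines the oids


def run(machine: XPushRuntime, events) -> Any:
    """Feed (callback_name, *args) tuples to the machine; return endDocument's result."""
    result = None
    for name, *args in events:
        result = getattr(machine, name)(*args)
    return result


def toy_machine() -> XPushRuntime:
    """Hypothetical hand-built machine for the single filter //b[text()=1]."""
    empty = frozenset()
    return XPushRuntime(
        q0t=0, q0b=empty,
        tpush=lambda qt, a: qt,  # one top-down state suffices here
        tvalue=lambda qt, s: frozenset({"v1"}) if s.strip() == "1" else empty,
        tpop=lambda qb, a: (frozenset({"hit"})
                            if "hit" in qb or (a == "b" and "v1" in qb)
                            else empty),
        tbadd=lambda parent, child: parent | child,
    )


events = [("startDocument",), ("startElement", "a"), ("startElement", "b"),
          ("text", "1"), ("endElement", "b"), ("endElement", "a"),
          ("endDocument",)]
print(run(toy_machine(), events))
```

For the events of <a><b>1</b></a> the final bottom-up state contains "hit"; changing the text to 2 yields the empty state. In a real XPush machine the states would be sets of AFA states produced by the compiled transition functions rather than hand-written lambdas.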

  10. Example • P1 = //a[b/text()=1 and .//a[@c>2]] • P2 = //a[@c>2 and b/text()=1] • Document: <a><b>1</b><a c="3"><b>1</b></a></a> • State trace: q0 q2 q0 q1 q0 q1 q0 q0 q0 q4 q4 q4 q5 q0 q0 q0 q3 q3 q3 q3 q3 q3 q3 q3 q9 q0 q0 q0 q0 q0 q0 q0 q0 q0 q0 q0 q0 q0 q15

  11. Compiling a set of XPath filters to an XPush machine • Convert the XPath filters P1,…,Pn into alternating finite automata (AFAs) A1,…,An • Translate all AFAs into a single XPush machine

  12. Step 1: Construct the AFA • Nondeterministic finite automata A1,…,An • S: union of all states in A1,…,An • Initial states s1,…,sn, one per automaton • Terminal states are OR states labeled with an atomic predicate on data values • πs(v): true if the predicate of s holds on the data value v ∈ V, false otherwise

  13. Step 1: Construct the AFA (cont.) • Each state is labeled AND, OR, or NOT • ε-transitions • δ: S × (Σ ∪ {ε}) → P(S) • AND and OR states: ε-transitions • NOT states: one outgoing transition

  14. Step 1: Construct the AFA (cont.) • Given an XML document tree, the AFA Ai accepts the document if its initial state si matches the root node • An OR state s matches node x if: • x is a data value node v and πs(v) = true, or • some s' ∈ δ(s,a) matches y, a child of x labeled a • An AND state s matches node x if every s' ∈ δ(s,ε) matches x • A NOT state s with δ(s,ε) = {s'} matches node x if s' does not match x
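
The matching rules above translate almost directly into a recursive matcher. This Python sketch is ours, not the paper's: the classes AState and Node, the wildcard key "*", and the state names in the hand-built AFA for P2 = //a[@c>2 and b/text()=1] are hypothetical and do not reproduce the numbering of the slide's figure. It assumes that leaf elements and attributes carry their data value directly, that an OR state may also follow its ε-transitions, and that the wildcard step skips attributes.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class AState:
    kind: str                                                  # "OR", "AND" or "NOT"
    eps: List[str] = field(default_factory=list)               # ε-successors
    trans: Dict[str, List[str]] = field(default_factory=dict)  # label -> successors; "*" = any element
    pred: Optional[Callable[[str], bool]] = None               # πs, for terminal OR states


@dataclass
class Node:
    label: str
    value: Optional[str]  # data value of a leaf element or attribute, else None
    children: List["Node"]


def to_tree(elem: ET.Element) -> Node:
    """Attributes become '@name' leaves; a leaf element keeps its text as its value."""
    kids = [Node("@" + k, v, []) for k, v in elem.attrib.items()]
    kids += [to_tree(c) for c in elem]
    text = (elem.text or "").strip()
    return Node(elem.tag, text if text and len(elem) == 0 else None, kids)


def matches(afa: Dict[str, AState], s: str, x: Node) -> bool:
    st = afa[s]
    if st.kind == "AND":  # every ε-successor must match x
        return all(matches(afa, t, x) for t in st.eps)
    if st.kind == "NOT":  # δ(s, ε) = {s'}: s matches x iff s' does not
        return not matches(afa, st.eps[0], x)
    # OR state: a true predicate on x's value, an ε-successor, or a labeled move to a child
    if st.pred is not None and x.value is not None and st.pred(x.value):
        return True
    if any(matches(afa, t, x) for t in st.eps):
        return True
    for y in x.children:
        targets = st.trans.get(y.label, [])
        if not y.label.startswith("@"):
            targets = targets + st.trans.get("*", [])  # wildcard skips attributes
        if any(matches(afa, t, y) for t in targets):
            return True
    return False


# Hypothetical AFA for P2 = //a[@c>2 and b/text()=1]; the state names are ours.
P2 = {
    "start": AState("OR", trans={"*": ["start"], "a": ["both"]}),  # //a
    "both": AState("AND", eps=["has_c", "has_b"]),
    "has_c": AState("OR", trans={"@c": ["c_gt_2"]}),
    "c_gt_2": AState("OR", pred=lambda v: float(v) > 2),
    "has_b": AState("OR", trans={"b": ["b_eq_1"]}),
    "b_eq_1": AState("OR", pred=lambda v: v == "1"),
}


def accepts(afa: Dict[str, AState], initial: str, doc: str) -> bool:
    root = Node("#document", None, [to_tree(ET.fromstring(doc))])
    return matches(afa, initial, root)  # the initial state must match the root


print(accepts(P2, "start", '<a><b>1</b><a c="3"><b>1</b></a></a>'))
```

The call prints True, because the inner a element has @c = 3 and a b child with text 1. This direct evaluation walks the tree once per filter; the XPush machine instead combines all filters into one deterministic machine so that each SAX event triggers a single transition.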

  15. AFAs for P1,P2

  16. Example 1 • S = {1,…,13} • s1 = 1, s2 = 8 • Wildcard: δ(5,@c) = Ø, δ(5,b) = 5, δ(5,a) = 6 • AND states: 2 and 9 • π7(55) = true, π2(v) = false • Each state corresponds to a subquery of the XPath filter, e.g. state 2 ↔ [b/text()=1 and .//a[@c>2]]

  17. Step 2: Construct the XPush machine • (Qt, Qb, q0t, q0b, tpush, tvalue, tpop, …)

  18. tpop(qb, a) = δ⁻¹(qb, a) • δ⁻¹(q, a) = {s' | δ(s', a) ∩ q ≠ Ø} • eval(q): for a set of states q, adds to q all states that are implied by the states already in q • AND states • OR states • NOT states
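
Below, δ⁻¹ and eval are sketched in Python on a tiny hypothetical AFA for /a[b and not(c)], encoded as plain dictionaries. KIND, EPS and DELTA are our own names, and the state numbers are unrelated to the slide's figure. The sketch assumes that the ε-transitions form a DAG, so eval can decide each state after its ε-successors; a NOT state is added when its single successor is not implied.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical AFA for /a[b and not(c)], encoded as plain dicts.
KIND = {1: "OR", 2: "AND", 3: "OR", 4: "OR", 5: "NOT", 6: "OR", 7: "OR"}
EPS: Dict[int, List[int]] = {2: [3, 5], 5: [6]}  # ε-transitions
DELTA: Dict[Tuple[int, str], List[int]] = {
    (1, "a"): [2],  # root --a--> AND state
    (3, "b"): [4],  # "has a b child"
    (6, "c"): [7],  # "has a c child", negated by NOT state 5
}


def delta_inverse(q: Iterable[int], a: str) -> FrozenSet[int]:
    """δ⁻¹(q, a) = {s' | δ(s', a) ∩ q ≠ Ø}."""
    q = set(q)
    return frozenset(s for (s, label), targets in DELTA.items()
                     if label == a and not q.isdisjoint(targets))


def eval_states(q: Iterable[int]) -> FrozenSet[int]:
    """Add every AND/OR/NOT state implied by q (assumes ε-edges form a DAG)."""
    q = frozenset(q)
    memo: Dict[int, bool] = {}

    def holds(s: int) -> bool:
        if s in q:
            return True
        if s not in memo:
            succ = EPS.get(s, [])
            if not succ:
                memo[s] = False  # no ε-edges: true only if already in q
            elif KIND[s] == "AND":
                memo[s] = all(holds(t) for t in succ)
            elif KIND[s] == "OR":
                memo[s] = any(holds(t) for t in succ)
            else:  # NOT: exactly one ε-successor
                memo[s] = not holds(succ[0])
        return memo[s]

    return q | frozenset(s for s in KIND if holds(s))


print(sorted(eval_states({3})), sorted(delta_inverse(eval_states({3}), "a")))
```

At an a node whose only finished child is b (child state {4}, so δ⁻¹({4}, b) = {3}), the state {3} evaluates to {2, 3, 5}, and popping over label a yields {1}, the initial state. With an extra c child, state 6 blocks the NOT state 5 and the pop yields Ø. Running eval on the closing element before applying δ⁻¹ is our choice of where to place it.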

  19. XPush Machine

  20. Example 2 • tvalue(q0t, 1) = {4,13} = q1 • tvalue(q0t, x) = {7,11} = q2 for x > 2 • tvalue(q0t, x) = Ø = q0 for all other values of x • tpop(q8, a) = {1,5} = q14 • tbadd(q3, q6) = {3,12} ∪ {5} = q8 • Leaf states cannot match together with any other states, hence no mixed content: <a>1<b>2</b></a> is not allowed
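
The tvalue and tbadd numbers in this example can be reproduced with plain set operations. This sketch assumes the basic machine has a single top-down state q0t (so tvalue only looks at the data value), that tbadd is set union, and that each new set receives the next name q1, q2, … on first appearance. StateNames and PREDICATES are hypothetical names. The predicates attached to terminal states 4, 7, 11 and 13 are read off the tvalue results above, presumably with P1 using states 1 to 7 and P2 using states 8 to 13, since s1 = 1 and s2 = 8.

```python
from typing import Callable, Dict, FrozenSet

# Terminal states and atomic predicates πs, read off the example above
# (assumption: P1 uses states 1-7, P2 uses states 8-13).
PREDICATES: Dict[int, Callable[[float], bool]] = {
    4: lambda v: v == 1,   # b/text()=1 in P1
    7: lambda v: v > 2,    # @c>2 in P1
    11: lambda v: v > 2,   # @c>2 in P2
    13: lambda v: v == 1,  # b/text()=1 in P2
}


def tvalue(value: str) -> FrozenSet[int]:
    """Terminal states whose predicate holds on the data value (top-down state ignored)."""
    try:
        v = float(value)
    except ValueError:
        return frozenset()  # assumption: non-numeric values satisfy no predicate here
    return frozenset(s for s, pred in PREDICATES.items() if pred(v))


def tbadd(q: FrozenSet[int], child: FrozenSet[int]) -> FrozenSet[int]:
    """Merge a finished child's state into its parent's state (plain union)."""
    return q | child


class StateNames:
    """Names each distinct set q0, q1, ... in order of first appearance; q0 = Ø."""

    def __init__(self) -> None:
        self.ids: Dict[FrozenSet[int], int] = {frozenset(): 0}

    def name(self, q: FrozenSet[int]) -> str:
        if q not in self.ids:
            self.ids[q] = len(self.ids)
        return f"q{self.ids[q]}"


names = StateNames()
print(names.name(tvalue("1")), names.name(tvalue("55")), names.name(tvalue("0")))
```

Seeing the values 1, then 55, then 0 prints q1 q2 q0, matching the names on the slide. States such as q8 or q14 come from tpop and eval, which this sketch leaves out.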

  21. Lazy XPush Machine • Do not construct states that are inconsistent with the DTD • Lazy evaluation exploits regularities in the data that are not captured by the DTD • Avoid constructing states that do not occur in a given data set
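
The slides do not say how laziness is implemented. One common way to realize it, assumed here, is to memoize each transition: a target state is computed the first time the data reaches it and cached afterwards, so states that never occur in the data are never built. LazyTransitions and slow_pop are hypothetical names, slow_pop only stands in for a real computation such as eval(δ⁻¹(q, a)), and pruning with a DTD is not modeled.

```python
from typing import Callable, Dict, FrozenSet, Tuple

State = FrozenSet[str]


class LazyTransitions:
    """Computes a transition only the first time the data needs it, then caches it."""

    def __init__(self, compute: Callable[[State, str], State]) -> None:
        self.compute = compute
        self.table: Dict[Tuple[State, str], State] = {}

    def __call__(self, q: State, a: str) -> State:
        key = (q, a)
        if key not in self.table:  # first visit: build the target state now
            self.table[key] = self.compute(q, a)
        return self.table[key]

    def states(self) -> set:
        """Target states constructed so far."""
        return set(self.table.values())


def slow_pop(q: State, a: str) -> State:
    """Hypothetical stand-in for an expensive step such as eval(δ⁻¹(q, a))."""
    return frozenset(f"{a}/{s}" for s in q)


tpop = LazyTransitions(slow_pop)
for _ in range(1000):  # a repetitive stream touches the same few transitions
    tpop(frozenset({"x"}), "b")
    tpop(frozenset({"x"}), "c")
print(len(tpop.table))
```

After 2000 calls only two transitions have been computed. An eager construction would instead have to consider sets of AFA states up front, and there are 2^13 = 8192 subsets of the 13 states in Example 1.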

  22. Top-down Pruning • Example: <e1>…<c>ci1</c>…<c>cij</c>…</e1> • Keep track of the enabled branches in the top-down state • Perform bottom-up computations only on the enabled branches

  23. Order Optimization • Example: /person[name/text()="smith" and age/text()="33" and phone/text()="5551234"] • prec(s) = {s' | s' ≺ s} • tadd(qsb, qb) = qsb ∪ {s | s ∈ qb, prec(s) ⊆ qsb}
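
A small sketch of the ordered tadd, assuming the relation symbols lost on the slide are ≺ (precedes) and ⊆, and that the three predicates are ordered name ≺ age ≺ phone. The state names and the PREC table are hypothetical. A state is added only once every state that precedes it is already present.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical states for the three predicates, ordered name ≺ age ≺ phone.
PREC: Dict[str, FrozenSet[str]] = {
    "name": frozenset(),
    "age": frozenset({"name"}),
    "phone": frozenset({"name", "age"}),
}


def tadd(qsb: FrozenSet[str], qb: Iterable[str]) -> FrozenSet[str]:
    """qsb ∪ {s | s ∈ qb, prec(s) ⊆ qsb}."""
    return qsb | frozenset(s for s in qb if PREC.get(s, frozenset()) <= qsb)


q: FrozenSet[str] = frozenset()
for child_state in ({"name"}, {"age"}, {"phone"}):  # children in document order
    q = tadd(q, child_state)
print(sorted(q))
```

With this rule the only reachable sets are Ø, {name}, {name, age} and {name, age, phone}: four instead of the eight subsets of three states. The trade-off is that the rule is only safe when the data lists the children in that order; if age came before name, its state would be dropped and the filter would not match.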

  24. Training the XPush Machine • Generate one XML document tree for every XPath query

  25. Experiment • Real data sets: Protein • 9.12 MB XML fragment • A non-recursive DTD • Maximum document depth is 7

  26. Effectiveness

  27. Runtime memory

  28. Hit Ratio
