Stream Processing of XPath Queries with Predicates

Ashish Kumar Gupta, Dan Suciu
University of Washington
SIGMOD 2003
Presenter: 蔡明瑾

Introduction
• XML messages are used to exchange information
• XML stream processing problem: processing XPath queries (filters) on an incoming stream of XML packets
• The workload is very high: many XPath queries, each with multiple predicates

Definition – XPath fragment
• E denotes an atomic predicate on data values

Definition – XML and SAX parsers
• startDocument()
• startElement(a)
• text(s)
• endElement(a)
• endDocument()
• a: element or attribute label; s: data value

Example: <a c="3"><b>4</b></a> produces the following events (the attribute c is reported as an element @c)
• startDocument()
• startElement(a)
• startElement(@c)
• text("3")
• endElement(@c)
• startElement(b)
• text("4")
• endElement(b)
• endElement(a)
• endDocument()
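To make the event sequence concrete, here is a minimal Python sketch using the standard-library `xml.sax` parser. The class name `EventRecorder` and the string format of the events are illustrative choices. Reporting each attribute as a child element `@name` followed by a text event follows the example above; it is not standard SAX behavior, where attributes are passed to `startElement` as a parameter.

```python
import xml.sax
from typing import List


class EventRecorder(xml.sax.ContentHandler):
    """Records SAX events in the notation used above; each attribute is
    reported as a child element labeled "@name" holding its value."""

    def __init__(self) -> None:
        super().__init__()
        self.events: List[str] = []
        self._text = ""  # buffered character data (SAX may deliver it in pieces)

    def _flush(self) -> None:
        if self._text.strip():
            self.events.append(f'text("{self._text.strip()}")')
        self._text = ""

    def startDocument(self) -> None:
        self.events.append("startDocument()")

    def endDocument(self) -> None:
        self.events.append("endDocument()")

    def startElement(self, name, attrs) -> None:
        self._flush()
        self.events.append(f"startElement({name})")
        for attr in attrs.getNames():  # attribute -> element "@attr" with a text value
            self.events.append(f"startElement(@{attr})")
            self.events.append(f'text("{attrs.getValue(attr)}")')
            self.events.append(f"endElement(@{attr})")

    def endElement(self, name) -> None:
        self._flush()
        self.events.append(f"endElement({name})")

    def characters(self, content) -> None:
        self._text += content


def sax_events(doc: str) -> List[str]:
    handler = EventRecorder()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.events


print(sax_events('<a c="3"><b>4</b></a>'))
```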

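XML stream processing problem
• An XPath expression P is used as a boolean filter
• An XML document matches P if and only if P selects at least one node when evaluated on the document's root
• Set of filters P = {P1,…,Pn}
• Set of object identifiers (oids) I = {o1,…,on}: for each document in the stream, output the oid oi of every filter Pi that the document matches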
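As a point of reference, the problem can be solved naively by evaluating every filter separately on every document. The sketch below does this for the filters P1 = //a[b/text()=1 and .//a[@c>2]] and P2 = //a[@c>2 and b/text()=1] used later in the slides, hand-translated into Python over `xml.etree.ElementTree`. The helper names (`p1`, `p2`, `filter_stream`) are hypothetical, and comparing text values numerically is an assumption. Its cost grows with the number of filters, which is the high workload the XPush machine is meant to handle.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterable, Iterator, List, Optional, Set, Tuple

Filter = Callable[[ET.Element], bool]


def num(text: Optional[str]) -> Optional[float]:
    """Numeric value of a text or attribute string, or None."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def child_value_eq(elem: ET.Element, tag: str, value: float) -> bool:
    """b/text()=1 style test: some child <tag> has numeric text equal to value."""
    return any(num(c.text) == value for c in elem.findall(tag))


def attr_gt(elem: ET.Element, name: str, bound: float) -> bool:
    """@c>2 style test."""
    v = num(elem.get(name))
    return v is not None and v > bound


def p1(root: ET.Element) -> bool:
    """//a[b/text()=1 and .//a[@c>2]], translated by hand."""
    for a in root.iter("a"):                              # includes root itself
        below = [d for d in a.iter("a") if d is not a]    # .//a excludes a itself
        if child_value_eq(a, "b", 1) and any(attr_gt(d, "c", 2) for d in below):
            return True
    return False


def p2(root: ET.Element) -> bool:
    """//a[@c>2 and b/text()=1], translated by hand."""
    return any(attr_gt(a, "c", 2) and child_value_eq(a, "b", 1) for a in root.iter("a"))


def filter_stream(docs: Iterable[str], filters: List[Tuple[str, Filter]]) -> Iterator[Set[str]]:
    """For each document, the oids of all filters it matches (each filter run separately)."""
    for doc in docs:
        root = ET.fromstring(doc)
        yield {oid for oid, matches in filters if matches(root)}


docs = ['<a><b>1</b><a c="3"><b>1</b></a></a>', '<a c="3"><b>4</b></a>']
print(list(filter_stream(docs, [("o1", p1), ("o2", p2)])))
```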

XPush Machine
• A modified deterministic pushdown automaton that simulates the execution of XPath filters
• Input: stream of XML documents
• Output: oids
• Changes with respect to a standard pushdown automaton: each state is a pair (top-down state, bottom-up state), and the input consists of SAX events

XPush Machine (cont.)

SAX callback functions; current state (qt, qb)
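The callback code itself is not reproduced here. The following sketch shows one way to wire SAX callbacks to the transition functions tpush, tvalue, tpop and a function that combines sibling results (called `tadd` here, following the order-optimization slide). It keeps the current pair (qt, qb), saves the parent's pair on a stack in startElement, and folds the child's bottom-up state into the parent's in endElement. The class and parameter names are assumptions, text is assumed to occur only in leaf elements, and the machine plugged in at the end is a toy, not a compiled XPath filter.

```python
import xml.sax
from typing import Any, Callable, List, Tuple


class XPushDriver(xml.sax.ContentHandler):
    """Drives a machine, given as transition functions, from SAX events.
    Attributes are handled as child elements labeled "@name"."""

    def __init__(self, q0t: Any, q0b: Any,
                 tpush: Callable[[Any, str], Any],
                 tvalue: Callable[[Any, str], Any],
                 tpop: Callable[[Any, str], Any],
                 tadd: Callable[[Any, Any], Any]) -> None:
        super().__init__()
        self.q0t, self.q0b = q0t, q0b
        self.tpush, self.tvalue, self.tpop, self.tadd = tpush, tvalue, tpop, tadd
        self.stack: List[Tuple[Any, Any]] = []
        self.qt, self.qb = q0t, q0b
        self.text = ""

    def startDocument(self) -> None:
        self.stack.clear()
        self.qt, self.qb = self.q0t, self.q0b

    def _open(self, label: str) -> None:
        self.stack.append((self.qt, self.qb))            # save the parent's state
        self.qt, self.qb = self.tpush(self.qt, label), self.q0b

    def _close(self, label: str) -> None:
        child_qb = self.qb
        self.qt, parent_qb = self.stack.pop()            # restore the parent's state
        self.qb = self.tadd(parent_qb, self.tpop(child_qb, label))

    def startElement(self, name, attrs) -> None:
        self.text = ""
        self._open(name)
        for attr in attrs.getNames():                    # attribute = child "@attr"
            self._open("@" + attr)
            self.qb = self.tvalue(self.qt, attrs.getValue(attr))
            self._close("@" + attr)

    def characters(self, content: str) -> None:
        self.text += content

    def endElement(self, name) -> None:
        if self.text.strip():                            # data value of a leaf element
            self.qb = self.tvalue(self.qt, self.text.strip())
        self.text = ""
        self._close(name)


def run(doc: str, **machine: Any) -> Any:
    """Feeds one document through the driver; returns the final bottom-up state."""
    driver = XPushDriver(**machine)
    xml.sax.parseString(doc.encode("utf-8"), driver)
    return driver.qb


# Toy machine (hypothetical): qt is the current label, and qb records
# whether a <b> leaf with value "1" occurs somewhere in the subtree.
has_b1 = dict(q0t=None, q0b=False,
              tpush=lambda qt, a: a,
              tvalue=lambda qt, s: qt == "b" and s == "1",
              tpop=lambda qb, a: qb,
              tadd=lambda parent, child: parent or child)
print(run('<a><b>1</b><a c="3"><b>1</b></a></a>', **has_b1))
```

The same driver accepts any state representation; in an XPush machine qt and qb would be sets of AFA states, and the transition functions would be table lookups.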

Example: P1 = //a[b/text()=1 and .//a[@c>2]], P2 = //a[@c>2 and b/text()=1], document <a><b>1</b><a c="3"><b>1</b></a></a>
[Figure: XPush states (q0, q1, …, q15) reached after each SAX event while processing this document]

Compiling a set of XPath filters into an XPush machine
• Convert the XPath filters P1,…,Pn into alternating finite automata (AFAs) A1,…,An
• Translate all the AFAs into a single XPush machine

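Step 1: Construct the AFA
• A1,…,An: one automaton per filter, each a nondeterministic finite automaton extended with AND, OR, and NOT states
• S: union of all states in A1,…,An
• One initial state per automaton: s1,…,sn
• Terminal states are OR states labeled with an atomic predicate on data values
• πs(v) = true if the data value v ∈ V satisfies the predicate of state s, else false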

Step 1: Construct the AFA (cont.)
• Each state is labeled AND, OR, or NOT
• ε-transitions are allowed
• δ: S × (Σ ∪ {ε}) → P(S)
• AND and OR states can have ε-transitions; a NOT state has exactly one outgoing transition, an ε-transition

Step 1: Construct the AFA (cont.)
• Given an XML document tree, the AFA accepts the document if its initial state matches the root node
• An OR state s matches node x if either x is a data value node with value v and πs(v) = true, or some s' ∈ δ(s,a) matches a child y of x labeled a
• An AND state s matches node x if every s' ∈ δ(s,ε) matches x
• A NOT state s with δ(s,ε) = {s'} matches node x if s' does not match x

AFAs for P1,P2

Example 1
• S = {1,…,13}
• s1 = 1, s2 = 8
• Wildcard: δ(5,@c) = ∅, δ(5,b) = {5}, δ(5,a) = {5,6}
• AND states: states 2 and 9
• π7(55) = true; π2(v) = false for every v (state 2 is not a terminal state)
• Each state corresponds to a subquery of the XPath expression, e.g., state 2: [b/text()=1 and .//a[@c>2]]
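The matching rules of Step 1 can be checked directly on a document tree. The sketch below uses a hypothetical encoding of the two AFAs with the numbering of this example (states 1 to 7 for P1, 8 to 13 for P2). The transition table is a reconstruction consistent with the listed facts (initial states 1 and 8, AND states 2 and 9, the wildcard behavior of state 5, π7(55) = true), not a copy of the original figure. Attributes become children labeled `@name`, a leaf's text becomes its data value, and ε-cycles are assumed absent.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical AFAs for P1 (states 1-7) and P2 (states 8-13).
KIND: Dict[int, str] = {2: "AND", 9: "AND"}      # every other state is an OR state
DELTA: Dict[Tuple[int, str], Set[int]] = {
    (1, "*"): {1}, (1, "a"): {2},                # //a (state 1 loops on any element)
    (2, "eps"): {3, 5},                          # [... and ...]
    (3, "b"): {4},                               # b/text()=1
    (5, "*"): {5}, (5, "a"): {6},                # .//a
    (6, "@c"): {7},                              # @c>2
    (8, "*"): {8}, (8, "a"): {9},
    (9, "eps"): {10, 12},
    (10, "@c"): {11},
    (12, "b"): {13},
}
PRED: Dict[int, Callable[[float], bool]] = {     # pi_s for the terminal states
    4: lambda v: v == 1, 7: lambda v: v > 2,
    11: lambda v: v > 2, 13: lambda v: v == 1,
}


def step(s: int, label: str) -> Set[int]:
    """delta(s, label); the wildcard '*' matches element labels, not attributes."""
    out = set(DELTA.get((s, label), set()))
    if label != "eps" and not label.startswith("@"):
        out |= DELTA.get((s, "*"), set())
    return out


@dataclass
class Node:
    label: str
    value: Optional[float] = None                # set only on data value (leaf) nodes
    children: List["Node"] = field(default_factory=list)


def matches(s: int, x: Node) -> bool:
    """The AND / NOT / OR matching rules, applied recursively."""
    if KIND.get(s) == "AND":
        return all(matches(t, x) for t in step(s, "eps"))
    if KIND.get(s) == "NOT":
        (t,) = step(s, "eps")                    # exactly one epsilon-successor
        return not matches(t, x)
    if s in PRED and x.value is not None and PRED[s](x.value):
        return True
    return (any(matches(t, x) for t in step(s, "eps"))
            or any(matches(t, y) for y in x.children for t in step(s, y.label)))


def num(text: Optional[str]) -> Optional[float]:
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def to_node(e: ET.Element) -> Node:
    kids = [Node("@" + k, value=num(v)) for k, v in e.attrib.items()]
    kids += [to_node(c) for c in e]
    return Node(e.tag, value=None if kids else num(e.text), children=kids)


def accepts(initial: int, doc: str) -> bool:
    root = Node("#root", children=[to_node(ET.fromstring(doc))])   # virtual root
    return matches(initial, root)


doc = '<a><b>1</b><a c="3"><b>1</b></a></a>'
print(accepts(1, doc), accepts(8, doc))   # P1, P2 -> True True
```

This recursive evaluator only pins down the semantics: it needs the whole tree in memory, whereas the XPush machine processes the SAX stream in a single pass.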

Step 2: Construct the XPush machine
• (Qt, Qb, q0t, q0b, tpush, tvalue, tpop, …)

tpop(qb,a)= δ-1(q,a) • δ-1(q,a) {s’|δ(s’,a) ∩ q≠ Ø } • eval (q): a set of states q • Adds to q all states that are implied by states already in q • AND states • OR states • NOT states

XPush Machine

Example 2
• tvalue(q0t, 1) = {4, 13} = q1
• tvalue(q0t, x) = {7, 11} = q2 for x > 2
• tvalue(q0t, x) = ∅ = q0 for all other values of x
• tpop(q8, a) = {1, 5} = q14
• tbadd(q3, q6) = {3, 12} ∪ {5} = q8
• Leaf states never match together with any other state (no mixed data): <a>12</a> ✗
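The transitions of this example can be reproduced with a few set operations. The sketch reuses a hypothetical reconstruction of the P1/P2 automaton numbered as in Example 1 and makes two assumptions: `tvalue` ignores its top-down argument (the slide always passes q0t), and `tpop` first closes its argument under eval before applying δ⁻¹. Under these assumptions it yields tvalue(1) = {4, 13}, tvalue(x) = {7, 11} for x > 2, and tpop({3, 5, 12}, a) = {1, 5}, matching the slide. Negation is left out of `eval_states`.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical AFA for P1 (states 1-7) and P2 (states 8-13); 2 and 9 are AND states.
KIND: Dict[int, str] = {2: "AND", 9: "AND"}
DELTA: Dict[Tuple[int, str], Set[int]] = {
    (1, "*"): {1}, (1, "a"): {2}, (2, "eps"): {3, 5}, (3, "b"): {4},
    (5, "*"): {5}, (5, "a"): {6}, (6, "@c"): {7},
    (8, "*"): {8}, (8, "a"): {9}, (9, "eps"): {10, 12},
    (10, "@c"): {11}, (12, "b"): {13},
}
PRED: Dict[int, Callable[[float], bool]] = {
    4: lambda v: v == 1, 7: lambda v: v > 2, 11: lambda v: v > 2, 13: lambda v: v == 1,
}
STATES = {s for (s, _) in DELTA}                 # states with outgoing transitions


def step(s: int, label: str) -> Set[int]:
    out = set(DELTA.get((s, label), set()))
    if label != "eps" and not label.startswith("@"):   # '*' skips attributes
        out |= DELTA.get((s, "*"), set())
    return out


def eval_states(q: Iterable[int]) -> FrozenSet[int]:
    """Adds the AND/OR states implied by q until nothing changes (NOT omitted)."""
    q = set(q)
    changed = True
    while changed:
        changed = False
        for s in STATES - q:
            succ = DELTA.get((s, "eps"))
            if not succ:
                continue
            if (succ <= q) if KIND.get(s) == "AND" else bool(succ & q):
                q.add(s)
                changed = True
    return frozenset(q)


def tvalue(v: float) -> FrozenSet[int]:
    """Terminal states whose predicate holds on v."""
    return frozenset(s for s, pi in PRED.items() if pi(v))


def tpop(qb: Iterable[int], a: str) -> FrozenSet[int]:
    """delta^-1(eval(qb), a): states with an a-transition into a matched state."""
    q = eval_states(qb)
    return frozenset(s for s in STATES if step(s, a) & q)


def tbadd(q1: Iterable[int], q2: Iterable[int]) -> FrozenSet[int]:
    return frozenset(q1) | frozenset(q2)


print(sorted(tvalue(1)), sorted(tvalue(3)), sorted(tpop({3, 12, 5}, "a")))
# [4, 13] [7, 11] [1, 5]
```

In tpop({3, 5, 12}, a), state 2 enters only through eval (both of its ε-successors 3 and 5 are present), and that is what puts state 1 into the result.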

Lazy XPush Machine
• Do not construct states that are inconsistent with the DTD
• Lazy evaluation also exploits regularities in the data that are not captured by the DTD
• States that do not occur in a given data set are never constructed
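The slides do not spell out the data structure behind lazy construction. One simple way to get it, sketched here as an assumption, is to memoize each transition function so that an entry is computed only the first time the stream reaches that (state, input) pair. The sketch also counts hits, assuming "hit ratio" means the fraction of lookups answered from the table; the class name `LazyTable` is hypothetical.

```python
from typing import Callable, Dict, Generic, Hashable, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class LazyTable(Generic[K, V]):
    """A transition table filled on demand: an entry is computed the first
    time its key (e.g. a (state, label) pair) occurs in the stream and is
    looked up directly afterwards."""

    def __init__(self, compute: Callable[[K], V]) -> None:
        self.compute = compute
        self.table: Dict[K, V] = {}
        self.hits = 0      # lookups answered from the table
        self.misses = 0    # lookups that had to build a new entry

    def __getitem__(self, key: K) -> V:
        if key in self.table:
            self.hits += 1
        else:
            self.misses += 1
            self.table[key] = self.compute(key)
        return self.table[key]

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


# Usage with a stand-in transition function (hypothetical): key = (qt, label).
tpush = LazyTable(lambda key: key[0] + 1)
for key in [(0, "a"), (1, "b"), (0, "a")]:
    tpush[key]
print(tpush.hits, tpush.misses)   # 1 2
```

Because entries are created only for keys that actually occur, the table never holds states that the data set does not reach.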

Top-down Pruning
• <e1>…<c>ci1</c>…<c>cij</c>…</e1>
• Keep track of the enabled branches in the top-down state
• Perform bottom-up computations only on the enabled branches
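A sketch of how the top-down state can track enabled branches, assuming it is the set of AFA states that may still match at the current node: `tpush` follows the child's label from every enabled state and then takes the ε-closure, so both branches of an AND become enabled together. The encoding covers only P2 with the numbering of Example 1 and is hypothetical. Under a `<b>` child of `<a>`, only states 8 and 13 are enabled, so only the predicate of state 13 has to be evaluated there.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Hypothetical AFA for P2 = //a[@c>2 and b/text()=1], numbered as in Example 1:
# 8 = //a loop, 9 = AND, 11 and 13 = terminal states.
DELTA: Dict[Tuple[int, str], Set[int]] = {
    (8, "*"): {8}, (8, "a"): {9},
    (9, "eps"): {10, 12},
    (10, "@c"): {11},
    (12, "b"): {13},
}


def eps_closure(states: Iterable[int]) -> FrozenSet[int]:
    """Adds every state reachable through epsilon-transitions (e.g. both AND branches)."""
    seen = set(states)
    todo = list(seen)
    while todo:
        for t in DELTA.get((todo.pop(), "eps"), set()):
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return frozenset(seen)


def tpush(qt: FrozenSet[int], label: str) -> FrozenSet[int]:
    """Top-down state of a child labeled `label`: the AFA states enabled there."""
    nxt: Set[int] = set()
    for s in qt:
        nxt |= DELTA.get((s, label), set())
        if not label.startswith("@"):            # '*' matches elements only
            nxt |= DELTA.get((s, "*"), set())
    return eps_closure(nxt)


q_root = eps_closure({8})
q_a = tpush(q_root, "a")
print(sorted(q_a), sorted(tpush(q_a, "@c")), sorted(tpush(q_a, "b")))
# [8, 9, 10, 12] [11] [8, 13]
```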

Order Optimization
• /person[name/text()="smith" and age/text()="33" and phone/text()="5551234"]
• prec(s) = {s' | s' ≺ s}
• tadd(qsb, qb) = qsb ∪ {s | s ∈ qb, prec(s) ⊆ qsb}
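The order optimization can be illustrated on the /person query, assuming the name, age, and phone children always arrive in that order (for instance because the DTD imposes it) and that states 1, 2, 3 (hypothetical numbering) stand for the three predicates. With plain union the sibling state can be any of the 8 subsets of {1, 2, 3}; with the prec-based tadd only the 4 prefixes are reachable, so fewer bottom-up states have to be built.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Set

State = FrozenSet[int]

# Hypothetical numbering: 1 = name/text()="smith", 2 = age/text()="33",
# 3 = phone/text()="5551234"; children assumed to arrive in this order.
PREC: Dict[int, State] = {1: frozenset(), 2: frozenset({1}), 3: frozenset({1, 2})}


def tadd(qsb: State, qb: State) -> State:
    """qsb ∪ {s | s ∈ qb, prec(s) ⊆ qsb}: keep a child's state only if every
    predicate that must precede it has already matched."""
    return qsb | {s for s in qb if PREC.get(s, frozenset()) <= qsb}


def plain_add(qsb: State, qb: State) -> State:
    """Unoptimized combination: plain union."""
    return qsb | qb


def reachable(add: Callable[[State, State], State]) -> Set[State]:
    """Sibling states reached over every match/no-match outcome of the 3 children."""
    seen: Set[State] = set()
    for outcome in product([False, True], repeat=3):
        q: State = frozenset()
        for s, ok in zip((1, 2, 3), outcome):
            q = add(q, frozenset({s}) if ok else frozenset())
            seen.add(q)
    return seen


print(len(reachable(plain_add)), len(reachable(tadd)))   # 8 4
```

The saving relies entirely on the ordering assumption: if a phone child could precede name, this tadd would drop a state that should have matched.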

Training the XPush Machine
• Generate one XML document tree for every XPath query
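The slide does not say how the training documents are built. One simple possibility, sketched here under that assumption, is to produce a witness document that satisfies a conjunctive query: each step becomes an element, each comparison gets a satisfying value (the constant for =, constant + 1 for >), and descendant steps are placed as direct children, which still makes them descendants. The query representation `Q` is hypothetical, and `or`/`not` are not handled.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Cmp = Tuple[str, float]          # (operator, constant), e.g. (">", 2)


@dataclass
class Q:
    """Hypothetical query tree: a step's tag, comparisons on its attributes
    and text, and the nested steps of its predicate (joined by 'and')."""
    tag: str
    attrs: Dict[str, Cmp] = field(default_factory=dict)
    text: Optional[Cmp] = None
    children: List["Q"] = field(default_factory=list)


def satisfying(cmp: Cmp) -> str:
    """Some value that satisfies the comparison."""
    op, c = cmp
    v = {"=": c, ">": c + 1, "<": c - 1}[op]
    return str(int(v)) if v == int(v) else str(v)


def witness(q: Q) -> ET.Element:
    elem = ET.Element(q.tag, {name: satisfying(cmp) for name, cmp in q.attrs.items()})
    if q.text is not None:
        elem.text = satisfying(q.text)
    for child in q.children:                     # a child is also a valid descendant
        elem.append(witness(child))
    return elem


# P1 = //a[b/text()=1 and .//a[@c>2]] and P2 = //a[@c>2 and b/text()=1]
p1 = Q("a", children=[Q("b", text=("=", 1)), Q("a", attrs={"c": (">", 2)})])
p2 = Q("a", attrs={"c": (">", 2)}, children=[Q("b", text=("=", 1))])
for q in (p1, p2):
    print(ET.tostring(witness(q), encoding="unicode"))
```

Feeding such witness documents through a lazy machine before the real stream arrives would create the states each filter needs to reach acceptance.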

Experiment
• Real data set: Protein
• 9.12 MB XML fragment
• Non-recursive DTD
• Maximum document depth: 7

Effectiveness

Runtime memory

Hit Ratio
