Early Profile Pruning on XML-aware Publish-Subscribe Systems

Mirella M. Moro, Petko Bakalov, Vassilis J. Tsotras @ UCR
Full version appears in the Proceedings of the 33rd International Conference on Very Large Data Bases (VLDB 2007)

1 Motivation
• Publish-subscribe systems are increasingly popular. They are an important class of content-based dissemination systems, in which message transmission is determined by the message content rather than by its destination IP address.
• The data is exchanged in XML format:
  • Nodes correspond to elements, attributes, or text values
  • Edges represent immediate element-subelement or element-value relationships
• The user profiles are expressed in an XML query language (XPath, XQuery). An XML query contains:
  • structural constraints
  • value-based constraints

(a) Document
<Bib>
  <article vol="7" no="11">
    <title>t1</title>
    <author>
      <last>DeWitt</last>
      <mi>J</mi>
      <first>David</first>
    </author>
    <journal>TPDS</journal>
    <year>1996</year>
  </article>
  <article>
    <title>t2</title>
    <author>
      <last>Florescu</last>
      <first>Daniela</first>
    </author>
    <proceedings>SIGMOD</proceedings>
    <year>2006</year>
  </article>
</Bib>
(b) Tree representation

(Figure: an example profile, //article[/author[@last="Smith"]]//procs[@conf="VLDB"], annotated with its structural constraints and drawn as a tree pattern.)

2 System Description
• Participants in the system:
  • Publishers: generate messages outside of the system
  • Subscribers: announce their interest by submitting profiles
  • Matching process: finds which messages satisfy which profiles
(Figure: system architecture with Publishers, Documents, Profile Manager, Matching Module (matching algorithm), Profiles (queries), Input Documents, Matched Documents and Results; profiles can be submitted and modified.)

3 FSM-based Bottom-up Approach for XML Filtering (BUFF)
3.1 Bottom-up vs. Top-down Filtering
• Top-down approach (i.e., document order, a depth-first traversal): the state machine advances for each XML element (or attribute) read.
• Bottom-up approach: takes into account the fact that an XML document has its more selective elements in the leaves.
3.2 BUFF Algorithm
• The document is parsed by a SAX parser, which triggers events for specific marks (tags) in the XML document.
• The machine keeps a runtime stack that stores the document path currently being processed.
• For each opening tag, the respective element is pushed onto the stack.
• For each closing tag, an element is popped from the stack and used to trigger a set of transitions within the NFA.
(Figure: (a) Document, (b) Queries Q1 to Q6, (c) NFA, (d) BUFF.)

4 Bounding-based XML Filtering (BoXFilter)
• Prüfer sequence: a unique sequential encoding of an enumerated, labeled tree.
• Algorithm:
  • Iteratively removes nodes from the tree.
  • At each iteration, finds and removes the leaf with the smallest number and appends the number of that leaf's parent to the Prüfer sequence.
(Figure: a tree with nodes 1 A, 2 B, 3 C, 4 E, 5 D, 6 E, 7 E, 8 F, 9 D has Prüfer sequence 3 2 1 6 5 8 5 1, i.e., labels C B A E D F D A.)
• Theorem: If a query tree Q is a subgraph of a document tree D, then the Prüfer sequence of Q is a subsequence of the Prüfer sequence of D.
• From a group of profile sequences, two new sequences can be derived:
  • Upper bound U: for each position, take the largest element
  • Lower bound L: for each position, take the smallest element
  • L and U form a sequence envelope.
• Sequence envelopes can be nested, forming a BoXFilter tree.
• The profiles in the system are organized in a BoXFilter tree, and documents are traversed through the tree. There are two variations of the filtering algorithm:
  • Sequential: documents are processed one by one
  • Batch processing: documents are organized in a tree like the queries, and both trees are joined
• After the traversal, there is a verification step.

5 Results
(Figure: experimental results.)
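The sketch below shows the runtime stack that the BUFF algorithm keeps while a SAX parser reads a document, using Python's standard `xml.sax` module. It is a minimal sketch and does not implement the BUFF NFA. The handler class `BottomUpPathHandler` and its `on_close` callback are hypothetical names. The callback only marks the point where closing a tag would trigger NFA transitions. Elements are reported as they close, so leaves are reported before their ancestors, which is the bottom-up order the approach relies on.

```python
import xml.sax


class BottomUpPathHandler(xml.sax.ContentHandler):
    """Keeps the current document path on a runtime stack.

    startElement pushes the element; endElement pops it and reports it,
    so every element is reported after all of its descendants.
    """

    def __init__(self, on_close):
        super().__init__()
        self.stack = []           # current path from the root
        self.on_close = on_close  # placeholder for "trigger NFA transitions"

    def startElement(self, name, attrs):
        self.stack.append(name)

    def endElement(self, name):
        closed = self.stack.pop()
        # Report the closed element together with the path of its ancestors.
        self.on_close(closed, tuple(self.stack))


events = []
handler = BottomUpPathHandler(lambda elem, ancestors: events.append((elem, ancestors)))
doc = b"<Bib><article><author><last>DeWitt</last></author><year>1996</year></article></Bib>"
xml.sax.parseString(doc, handler)

for elem, ancestors in events:
    print("/".join(ancestors + (elem,)))
```

Running it prints the closed elements leaves-first: Bib/article/author/last, then Bib/article/author, Bib/article/year, Bib/article, and finally Bib.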

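The Prüfer construction can be sketched in Python as follows. For illustration, the tree is assumed to be given as a map from each node number to its parent's number, with None for the root. `pruefer_sequence` is a hypothetical helper. A min-heap holds the current leaves, so the smallest-numbered leaf is always removed next. The root is never removed, so a tree of n nodes yields n - 1 entries. The usage lines rebuild the nine-node example from the poster figure, where node 1 (labeled A) is the root.

```python
import heapq


def pruefer_sequence(parent):
    """Return the Prüfer sequence of a rooted tree.

    `parent` maps each node number to its parent's number (None for the root).
    Repeatedly removes the leaf with the smallest number and records its
    parent, until only the root is left (n nodes give n - 1 entries).
    """
    children = {node: 0 for node in parent}
    for node, par in parent.items():
        if par is not None:
            children[par] += 1

    # Min-heap of the current leaves; the root is never removed.
    leaves = [n for n, c in children.items() if c == 0 and parent[n] is not None]
    heapq.heapify(leaves)

    seq = []
    while leaves:
        leaf = heapq.heappop(leaves)
        par = parent[leaf]
        seq.append(par)
        children[par] -= 1
        # The parent becomes a leaf once its last child is gone (unless it is the root).
        if children[par] == 0 and parent[par] is not None:
            heapq.heappush(leaves, par)
    return seq


# Nine-node example: node -> parent (node 1 is the root).
parent = {1: None, 2: 1, 3: 2, 4: 3, 5: 1, 6: 5, 7: 6, 8: 5, 9: 8}
label = {1: "A", 2: "B", 3: "C", 4: "E", 5: "D", 6: "E", 7: "E", 8: "F", 9: "D"}

seq = pruefer_sequence(parent)
print(seq)                             # [3, 2, 1, 6, 5, 8, 5, 1]
print("".join(label[n] for n in seq))  # CBAEDFDA
```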

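The theorem gives a cheap necessary condition for matching. If a profile's sequence is not a subsequence of the document's sequence, that profile cannot match and can be pruned before verification. The sketch below assumes the comparison is made on the label form of the sequences (e.g., C B A E D F D A), since node numbers differ between a query and a document. `is_subsequence`, `candidate_profiles` and the profiles Q1 to Q3 are hypothetical. A profile that passes the filter still has to go through the verification step.

```python
def is_subsequence(query_seq, doc_seq):
    """True if query_seq occurs in doc_seq in order (gaps allowed)."""
    it = iter(doc_seq)
    # `x in it` consumes the iterator up to the first match, so order is kept.
    return all(x in it for x in query_seq)


def candidate_profiles(profiles, doc_seq):
    """Profile ids whose sequence passes the subsequence filter.

    A failing profile cannot match the document and is pruned early;
    a passing one still has to go through verification.
    """
    return [pid for pid, seq in profiles.items() if is_subsequence(seq, doc_seq)]


doc_seq = ["C", "B", "A", "E", "D", "F", "D", "A"]
profiles = {  # hypothetical profiles, already encoded as label sequences
    "Q1": ["B", "A"],
    "Q2": ["E", "D"],
    "Q3": ["F", "B"],
}
print(candidate_profiles(profiles, doc_seq))  # ['Q1', 'Q2']
```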

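Sequence envelopes can be computed position by position over a group of profile sequences. This is a minimal sketch that assumes labels have already been mapped to integers, so that "largest" and "smallest" are defined. For simplicity it covers only the positions shared by all sequences in the group, i.e. the shortest length. `merge_envelopes` shows one nesting level of the BoXFilter tree. `may_contain_match` is our own illustration of how an envelope can act as a necessary condition and is not necessarily the exact test used by BoXFilter. If any enclosed sequence is a subsequence of the document's sequence, the document must contain, in order, an element inside [L_i, U_i] for every position i. A document that fails this check can therefore skip the whole group. The label-to-integer mapping is hypothetical.

```python
from typing import List, Sequence, Tuple

Envelope = Tuple[List[int], List[int]]  # (lower bound L, upper bound U)


def sequence_envelope(seqs: Sequence[Sequence[int]]) -> Envelope:
    """Position-wise lower/upper bounds over a non-empty group of sequences.

    Only the first k positions are covered, where k is the shortest length.
    """
    k = min(len(s) for s in seqs)
    lower = [min(s[i] for s in seqs) for i in range(k)]
    upper = [max(s[i] for s in seqs) for i in range(k)]
    return lower, upper


def merge_envelopes(envs: Sequence[Envelope]) -> Envelope:
    """Envelope enclosing several child envelopes (one nesting level)."""
    k = min(len(lo) for lo, _ in envs)
    lower = [min(lo[i] for lo, _ in envs) for i in range(k)]
    upper = [max(hi[i] for _, hi in envs) for i in range(k)]
    return lower, upper


def may_contain_match(env: Envelope, doc_seq: Sequence[int]) -> bool:
    """Necessary condition for any enclosed sequence to be a subsequence of doc_seq.

    Greedily finds, in order, a document element inside [L_i, U_i] for each i.
    """
    lower, upper = env
    i = 0
    for x in doc_seq:
        if i == len(lower):
            break
        if lower[i] <= x <= upper[i]:
            i += 1
    return i == len(lower)


label_id = {"A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6}  # hypothetical mapping


def encode(labels: str) -> List[int]:
    return [label_id[c] for c in labels]


doc = encode("CBAEDFDA")
g1 = sequence_envelope([encode("BA"), encode("ED")])  # ([2, 1], [5, 4])
g2 = sequence_envelope([encode("FB"), encode("FC")])  # ([6, 2], [6, 3])
root = merge_envelopes([g1, g2])                      # ([2, 1], [6, 4])
print(may_contain_match(root, doc), may_contain_match(g1, doc), may_contain_match(g2, doc))
```

The root envelope and the first group both pass, while the second group is pruned without examining its individual profiles.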
