Branch Code: A Labeling Scheme for Efficient Query Answering on Trees. Yanghua Xiao, Ji Hong, Wanyun Cui, Zhenying He, Wei Wang, Guodong Feng. April 2012.
Background • Trees are a widely used data model • XML data • File directories • Spanning trees in graphs • A typical task on tree data is querying structural relationships • PC: Parent/Child • AD: Ancestor/Descendant • SR: Sibling Relation • LCA: Lowest Common Ancestor
Previous Labeling Schemes • Interval-based • A triple <start, end, level>, generated by a pre-order/post-order traversal • Cannot support SR • Hard to compute LCA • Hard to update • Prefix-based • Dewey Code and its variants • Costly in storage for deep trees • Hard to update • Prime-based (Integer-based) • Uses primes to encode nodes (X. Wu et al., ICDE’04) • Costly in storage
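The interval-based scheme listed above can be sketched in a few lines of Python. This is only an illustration of that earlier scheme, not of BranchCode: the input format (a dict from node to ordered children), the example tree, and the names `label_intervals`, `is_ancestor`, and `is_parent` are assumptions made for the sketch.

```python
def label_intervals(tree, root):
    """Assign a <start, end, level> triple to every node in one depth-first pass.

    `tree` maps each node to the ordered list of its children
    (a hypothetical input format chosen for this sketch).
    """
    labels = {}
    counter = 0

    def visit(node, level):
        nonlocal counter
        start = counter          # number taken when the node is entered
        counter += 1
        for child in tree.get(node, []):
            visit(child, level + 1)
        labels[node] = (start, counter, level)  # number taken when the node is left
        counter += 1

    visit(root, 0)
    return labels


def is_ancestor(labels, a, d):
    """AD: a is a proper ancestor of d iff a's interval strictly contains d's."""
    start_a, end_a, _ = labels[a]
    start_d, end_d, _ = labels[d]
    return start_a < start_d and end_d < end_a


def is_parent(labels, p, c):
    """PC: interval containment plus a level difference of exactly one."""
    return is_ancestor(labels, p, c) and labels[c][2] == labels[p][2] + 1


# Assumed example tree: A has children B and C; B has children D and E; E has child F.
tree = {"A": ["B", "C"], "B": ["D", "E"], "E": ["F"]}
labels = label_intervals(tree, "A")
print(labels["A"], labels["F"])        # (0, 11, 0) (5, 6, 3)
print(is_ancestor(labels, "B", "F"))   # True
print(is_parent(labels, "B", "F"))     # False: B is F's grandparent
```

Every insertion shifts the start and end numbers of all nodes visited later, which is why the slide calls this scheme hard to update.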
Our Labeling Scheme: Branch Codes • Support various queries efficiently • PC, AD in constant time • LCA in O(d), where d is the depth of the tree • Space efficient • Exact labeling costs O(Nd) space, but in most cases uses less space than other labelings • Approximate labeling lets us trade accuracy for space • Support updates on trees • Amortized O(log N) modification cost using a splay tree
Outline • Original Idea • Definition of BranchCode • Addressing Update Operations on Trees • Compression Method • Experimental Evaluation • Conclusion
Basic Idea • Prefix-based: A = *, B = *.0, C = *.1, D = *.0.0, E = *.0.1, F = *.0.1.0 • Prime-based: A = 2, B = 3 × A, C = 5 × A, D = 7 × B, E = 11 × B, F = 13 × E • Our Idea
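The two labelings on this slide can be reproduced with a short Python sketch. The tree shape (A is the root; B and C are children of A; D and E are children of B; F is a child of E) is inferred from the labels shown on the slide, and all function names are hypothetical. It illustrates the two prior schemes only, not BranchCode itself.

```python
from itertools import count


def dewey_labels(tree, root):
    """Prefix-based: a child's label is its parent's label plus its sibling order."""
    labels = {root: ()}
    stack = [root]
    while stack:
        node = stack.pop()
        for order, child in enumerate(tree.get(node, [])):
            labels[child] = labels[node] + (order,)
            stack.append(child)
    return labels


def format_dewey(label):
    """Render a label tuple in the slide's notation, e.g. (0, 1) -> '*.0.1'."""
    return "*" + "".join(f".{i}" for i in label)


def primes():
    """Yield 2, 3, 5, ... by trial division (fine for small examples)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


def prime_labels(tree, root):
    """Prime-based: each node gets a fresh prime times its parent's label (breadth-first)."""
    gen = primes()
    labels = {root: next(gen)}
    queue = [root]
    for node in queue:               # the list grows while we iterate over it
        for child in tree.get(node, []):
            labels[child] = next(gen) * labels[node]
            queue.append(child)
    return labels


def dewey_is_ancestor(a, d):
    """a is a proper ancestor of d iff a is a proper prefix of d."""
    return len(a) < len(d) and d[:len(a)] == a


def prime_is_ancestor(a, d):
    """a is a proper ancestor of d iff a's label divides d's label."""
    return a != d and d % a == 0


tree = {"A": ["B", "C"], "B": ["D", "E"], "E": ["F"]}
dewey = dewey_labels(tree, "A")
prime = prime_labels(tree, "A")
print(format_dewey(dewey["F"]))   # *.0.1.0
print(prime)                      # {'A': 2, 'B': 6, 'C': 10, 'D': 42, 'E': 66, 'F': 858}
```

The prime labels grow multiplicatively with depth and the Dewey labels grow by one component per level, which matches the storage concerns raised for both schemes.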
Representation of Numbers • Simple Radix • Decimal (base 10): 123, 78, 23472, … • Binary (base 2): 0, 1, 101, 1010, 1101, … • Complex Radix • Digit Vector: D = <d0, d1, d2, …, dn> • Radix Vector: R = <r0, r1, r2, …, rn> • S(D, R) = [equation not preserved in the source]
Complex Radix • The complex-radix representation can be formalized recursively: [equation not preserved in the source] • Prefix form
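Because the slide's formulas did not survive, the following Python sketch uses the standard mixed-radix convention as an assumption: with the least significant digit first, the value is d0 + r0 × (d1 + r1 × (d2 + …)). The deck's own S(D, R) may differ in digit order or offsets. The names `mixed_radix_value` and `mixed_radix_digits` are hypothetical.

```python
def mixed_radix_value(digits, radixes):
    """Evaluate d0 + r0*(d1 + r1*(d2 + ...)) from the innermost term outward.

    Assumed convention: least significant digit first, and 0 <= digits[i] < radixes[i].
    """
    if len(digits) != len(radixes):
        raise ValueError("digits and radixes must have the same length")
    value = 0
    for d, r in zip(reversed(digits), reversed(radixes)):
        if not 0 <= d < r:
            raise ValueError(f"digit {d} is out of range for radix {r}")
        value = value * r + d    # recursive step: S_i = d_i + r_i * S_(i+1)
    return value


def mixed_radix_digits(value, radixes):
    """Inverse of mixed_radix_value: peel off one digit per radix."""
    digits = []
    for r in radixes:
        digits.append(value % r)
        value //= r
    return digits


# Simple radix is the special case where every radix is the same.
print(mixed_radix_value([3, 2, 1], [10, 10, 10]))    # 123
print(mixed_radix_value([1, 0, 1, 1], [2, 2, 2, 2])) # 13 (binary 1101)

# Complex radix: 1 hour, 2 minutes, 3 seconds as <seconds, minutes, hours>.
print(mixed_radix_value([3, 2, 1], [60, 60, 24]))    # 3723
print(mixed_radix_digits(3723, [60, 60, 24]))        # [3, 2, 1]
```

A tree path fits this picture if each digit is a sibling order and each radix is the corresponding parent's degree, which is the connection the following slides build on.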
Definition of BranchCode • Definition I (B Code): The B code of a node v in an ordered tree T is a function defined as: [equation not preserved in the source]. Here d(v) is the depth of v and p(v) is the parent of v; x(v) is the degree of v, and y(v) is the order (counted from 0) of v among its siblings.
Example [3 , -] [3 , 1] [2 , 1] [- , 1] • R = <2, 3, 3> • D = <1,1,1> • b(n) = S(D, R) = 1 + 2 × (1 + 3 × 1) = 13
Query Answering • 1. Ancestor/Descendant (AD) Determination [condition not preserved in the source] • 2. Navigability • 3. Lowest Common Ancestor (LCA): stems from Navigability • Sibling Relationship [condition not preserved in the source]
BranchCode for Dynamic Trees • S(D, R), where D = <d0, d1, d2, …, dn> and R = <r0, r1, r2, …, rn> • S’(D’, R’), where D’ = <d0, d1, d2, …, di’, …, dn> and R’ = <r0, r1, r2, …, ri’, …, rn> • Delta = |S’ – S| • How to calculate Delta?
Incremental Update of BranchCode • Lemma 6 (Effect on g function): If a new node is inserted as a child of node s, then for any node k in the subtree Ts other than the newly added node, the increment of g(k) satisfies the following equation: [equation not preserved in the source]
Incremental Update of BranchCode • Lemma 7 (Effect on h function (degree change)): For a node s in a tree T, if its degree increases by one, then for any node k in the subtree Ts, the increment of h(k) caused by the degree change of s satisfies the following equation: [equation not preserved in the source]
Incremental Update of BranchCode • Lemma 8 (Effect on h function (order change)): If the order of a node s in tree T increases by one, then for each of its descendants k in Ts, the increment of k’s h function caused by the order change of s is: [equation not preserved in the source]
Incremental Update of BranchCode • Theorem 10 (Increment of BranchCode): If we insert a new node as a child of s, then for any node k in Ts except s, the increment of its BranchCode is given by: [equation not preserved in the source]
Affected Nodes after Update • When we insert (or delete) a child of a particular node, all its descendants are affected. • It can be proved that, in some bad cases, an insertion affects O(n) nodes in expectation, where n is the size of the tree.
Affected Nodes after Update (Cont’d) • Post-order traversal on trees. Seq = {2, 3, 6, 7, 4, 5, 1} • Two properties of the post-order sequence: • All descendants of a single node are consecutive in the post-order sequence. • All descendants of a set of consecutive siblings are consecutive in the post-order sequence. • Use a splay tree to maintain the sequence.
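The two post-order properties can be checked on a small tree with the Python sketch below. The tree shape (root 1 with children 2, 3, 4, 5, and node 4 with children 6, 7) is an assumption chosen because it reproduces the sequence on the slide; the original figure is not available. The second property is read here as covering the siblings themselves together with their descendants, which is also an assumption. All function names are hypothetical.

```python
def post_order(tree, root):
    """Return the nodes of `tree` in post-order (children before their parent)."""
    seq = []

    def visit(node):
        for child in tree.get(node, []):
            visit(child)
        seq.append(node)

    visit(root)
    return seq


def descendants(tree, node):
    """All proper descendants of `node`."""
    out = []
    for child in tree.get(node, []):
        out.append(child)
        out.extend(descendants(tree, child))
    return out


def span(seq, nodes):
    """Return (lo, hi) if `nodes` occupy consecutive positions in `seq`, else None."""
    positions = sorted(seq.index(n) for n in nodes)
    if not positions:
        return None
    lo, hi = positions[0], positions[-1]
    return (lo, hi) if hi - lo + 1 == len(positions) else None


# Assumed tree shape that yields the slide's sequence.
tree = {1: [2, 3, 4, 5], 4: [6, 7]}
seq = post_order(tree, 1)
print(seq)                                   # [2, 3, 6, 7, 4, 5, 1]

# Property 1: the descendants of one node form one contiguous range.
print(span(seq, descendants(tree, 4)))       # (2, 3)
print(span(seq, descendants(tree, 1)))       # (0, 5)

# Property 2 (as read here): consecutive siblings 3 and 4 plus their descendants.
sibling_block = [3] + descendants(tree, 3) + [4] + descendants(tree, 4)
print(span(seq, sibling_block))              # (1, 4)
```

Since the affected nodes always form a contiguous range, an update can be applied to a range of the sequence rather than node by node, which is what makes a balanced structure such as a splay tree a natural fit.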
Update and query based on splay tree • Update Based on Splay Tree
Compressed BranchCode • Definition of Compressed Code:
Property of Compressed Code • Congruence: • CA Determination:
Results on Real Data • Data sets:
Conclusions • We systematically explore the basic properties of branch codes and establish conditions for correctly determining the relationships between nodes in trees. • The compressed BranchCode reduces the storage cost to linear complexity. • We also design an incremental approach (with O(log N) amortized update and query cost) based on a splay tree to maintain branch codes on dynamic trees.
Open Question • Compressed codes can produce False Positive (FP) answers • Multiple moduli reduce the probability of FP • How can we theoretically estimate the probability of FP for a particular set of moduli?
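The effect of using several moduli can be illustrated with a generic Python sketch. It does not implement the compressed BranchCode, whose definition is not preserved in this transcript. It relies only on a general fact about congruences: two integers agree modulo every modulus in a set exactly when their difference is divisible by the least common multiple of that set. The uniform range of values and all names are assumptions for illustration.

```python
from functools import reduce
from math import gcd


def lcm_all(moduli):
    """Least common multiple of a collection of positive integers."""
    return reduce(lambda x, y: x * y // gcd(x, y), moduli, 1)


def residues(value, moduli):
    """The tuple of remainders of `value` under each modulus."""
    return tuple(value % m for m in moduli)


def false_positive_rate(moduli, limit):
    """Fraction of pairs a < b < limit that differ but share every residue."""
    pairs = 0
    collisions = 0
    for a in range(limit):
        ra = residues(a, moduli)
        for b in range(a + 1, limit):
            pairs += 1
            if residues(b, moduli) == ra:
                collisions += 1
    return collisions / pairs


# Assumed toy setting: values drawn from 0..199.
single = false_positive_rate((7,), 200)
double = false_positive_rate((7, 11), 200)
print(single > double)   # True: adding a second modulus removes most collisions
print(lcm_all((7, 11)))  # 77
```

In this toy setting the collision rate is governed by the least common multiple of the moduli, so pairwise coprime moduli help most. Whether the same estimate carries over to the distribution of real branch codes is the open question the slide raises.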
Motivation of Problem • Why do you study this problem?
Related works • How did previous work solve this problem? • Survey of other related work • Problems that are similar to yours • Techniques that are used in your solution • Any other related work
Problem definition • Formal definition • Properties of the proposed problem • Is this problem novel? • How does this problem differ from related problems? • Does this problem deserve our research effort? • Challenges of this problem • Is this problem NP-hard? If so, give the proof
Baseline Solution • What is the naive solution to this problem? • Why is this solution unacceptable? • Complexity • Scalability • Or any other issues
Your solution • Basic idea of your solution • Example, if one exists • Algorithm framework of your solution
Key technique of your solution • For each technique, give the following • Rationale of this technique • Procedure of the technique • Can we prove the efficiency or effectiveness of your solution? If so, give the proofs • Optimization of your technique when handling large or dynamic data
Planning of next step • What do you plan to do as the next step? • Checkpoint • Delivery