This research paper explores techniques for identifying top nodes in vast networks based on query topics and distance metrics. The study assesses various approaches, such as weighted set cover and graph skyline, to rank and discover dominant nodes efficiently.
Finding Skyline Nodes in Large Networks
Arijit Khan*, Vishwakarma Singh*, Jian Wu#
*Computer Science, University of California, Santa Barbara, USA
#College of Computer Science, Zhejiang University, China
{arijitkhan, vsingh}@cs.ucsb.edu, wujian2000@zju.edu.cn
Motivation
Query in LinkedIn Network: If John is interested in Big Data, Cloud Computing, and MapReduce, who are the top-5 people John should ask about these topics?
• Evaluation Metrics:
  • Distance from the query node (John).
  • Coverage of the query topics (Big Data, Cloud Computing, MapReduce).
Homogeneous Approach?
Query in LinkedIn Network: If John is interested in Big Data, Cloud Computing, and MapReduce, who are the top-5 people John should ask about these topics?
Score = λ · Distance + (1 − λ) · Coverage
How to choose λ?
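The difficulty with a single weighted score can be illustrated with a minimal sketch. It assumes both terms are expressed as goodness values in [0, 1] (closeness to the query node and fraction of query topics covered); the slide does not specify a normalization, and the two candidates are hypothetical.

```python
def combined_score(closeness, coverage, lam):
    """Weighted sum from the slide; both inputs are assumed to lie in [0, 1]."""
    return lam * closeness + (1.0 - lam) * coverage

# Hypothetical candidates: (closeness, fraction of query topics covered).
candidates = {
    "A": (1.0, 1.0 / 3.0),  # adjacent to the query node, covers 1 of 3 topics
    "B": (0.5, 1.0),        # farther away, covers all 3 topics
}

def best(lam):
    """Return the candidate with the highest combined score for a given lambda."""
    return max(candidates, key=lambda name: combined_score(*candidates[name], lam))

for lam in (0.3, 0.8):
    print(lam, best(lam))
```

The winner flips between the two settings of λ, so the answer depends on a parameter the user has no principled way to pick.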
Weighted Set Cover?
• Find nodes with the smallest aggregate distance from the query node such that together they cover all query topics.
• Ignores some interesting nodes.
• Cannot rank the results.
[Figure: example network with query node u0 = q and query topics Q = {a, b, c}; node labels u1: a, u2: b, u3: c, u4: abc, u5: a, u6: cd, u7: abc, u8: de]
Graph Skyline
• Dominance on Coverage: u >c v
  • The set of query topics covered by node u is a proper superset of the set covered by node v.
• Dominance on Distance: u >d v
  • The distance of u from q is less than that of v from q.
• Dominance: u > v
  • (1) u >c v and u ≥d v;
  • or (2) u ≥c v and u >d v.
[Figure: example network with query node u0 = q and query topics Q = {a, b, c}]
Graph Skyline: A node is a skyline node if it is not dominated by any other node in the network.
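The dominance test and the skyline definition can be written down directly. This is a naive quadratic sketch, not the algorithm of the talk; the node labels follow the example figure, and the hop distances (1 for u1 to u3, 2 for u4 to u6, 3 for u7 and u8) are assumptions made for illustration.

```python
def dominates(u, v):
    """u and v are (covered_query_topics, distance_from_q) pairs."""
    cov_u, dist_u = u
    cov_v, dist_v = v
    # u must be at least as good on both criteria ...
    no_worse = cov_u >= cov_v and dist_u <= dist_v
    # ... and strictly better on at least one (set ">" is proper superset).
    strictly_better = cov_u > cov_v or dist_u < dist_v
    return no_worse and strictly_better

def skyline(nodes):
    """Return the nodes that no other node dominates (O(n^2) comparisons)."""
    return sorted(
        u for u, fu in nodes.items()
        if not any(dominates(fv, fu) for v, fv in nodes.items() if v != u)
    )

# Coverage is already restricted to Q = {a, b, c}; distances are assumed.
nodes = {
    "u1": ({"a"}, 1), "u2": ({"b"}, 1), "u3": ({"c"}, 1),
    "u4": ({"a", "b", "c"}, 2), "u5": ({"a"}, 2), "u6": ({"c"}, 2),
    "u7": ({"a", "b", "c"}, 3), "u8": (set(), 3),
}

print(skyline(nodes))
```

Under these assumed distances the skyline consists of u1, u2, u3, and u4: the three nearest single-topic nodes and the nearest node covering all topics.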
Ranking of Skyline Nodes
• Too many skyline nodes, so rank them.
• Dominance Count (DC): number of nodes dominated by a skyline node. [Lin et al., ICDE '07]
• Higher Dominance Count => more pruning from the candidate set.
• Example (u0 = q, Q = {a, b, c}): 1. DC(u4) = {u5, u6, u7}; 2. DC(u1) = {u5}; 3. DC(u2) = Φ; 4. DC(u3) = Φ
Problem Statement: Given a query node and a set of query topics in a network, find the top-k skyline nodes with maximum dominance count.
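The problem statement can be made concrete with a brute-force baseline. The sketch below is only a reference implementation of the definition, with hypothetical nodes n1 to n5 that are not taken from the slides; the Query DAG algorithm on the following slides is what avoids the pairwise comparisons.

```python
def dominates(u, v):
    """u and v are (covered_query_topics, distance_from_q) pairs."""
    cov_u, dist_u = u
    cov_v, dist_v = v
    return (cov_u >= cov_v and dist_u <= dist_v
            and (cov_u > cov_v or dist_u < dist_v))

def top_k_skyline(nodes, k):
    """Return up to k (node, dominance_count) pairs, highest count first."""
    counts = {}
    for u, fu in nodes.items():
        others = [fv for v, fv in nodes.items() if v != u]
        if any(dominates(fv, fu) for fv in others):
            continue  # u is dominated, so it is not a skyline node
        counts[u] = sum(dominates(fu, fv) for fv in others)
    # Sort by descending count; break ties by node name for determinism.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

# Hypothetical data: coverage of Q = {a, b, c} and hop distance from q.
nodes = {
    "n1": ({"a", "b", "c"}, 2),
    "n2": ({"a"}, 1),
    "n3": ({"a"}, 2),
    "n4": ({"b", "c"}, 3),
    "n5": ({"a", "b", "c"}, 3),
}

print(top_k_skyline(nodes, 2))
```

Here n1 dominates n3, n4, and n5, while n2 dominates only n3, so n1 is ranked first.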
Algorithm
• Construct a Query DAG.
• Three variables are associated with each DAG node: Count (C), Dominance (D), Traversal (T).
• Naïve complexity: O(n^2 r)
• Complexity with preprocessing: O(n r^2)
[Figure: input network (u0 = q, Q = {a, b, c}) and its Query DAG with nodes abc; ab, ac, bc; a, b, c. Initial values: C(abc) = 2, C(ab) = C(ac) = C(bc) = 0, C(a) = 2, C(b) = 1, C(c) = 2; D and T are not yet set.]
Query DAG Construction
• Preprocessing: For each label, find a sorted list of nodes that contain the label.
• Online Query DAG Construction: Incremental DAG construction.
[Figure: input network (u0 = q, Q = {a, b, c}) with the sorted node lists for labels a, b, and c, and a partially built Query DAG]
Query DAG Construction (cont.)
• Preprocessing: For each label, find a sorted list of nodes that contain the label.
• Online Query DAG Construction: Consider the labels and their sorted lists in order.
[Figure: Query DAG after further lists are processed, with DAG nodes abc, ab, a, b, and c]
Query DAG Construction (cont.)
[Figure: completed Query DAG with DAG nodes abc, ab, ac, bc, a, b, and c; abc holds u4, u7; a holds u1, u5; b holds u2; c holds u3, u6]
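The two phases above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names are hypothetical, lists are assumed to be sorted by node id, and the node labels are read off the example figure.

```python
from collections import defaultdict
from itertools import combinations

def build_inverted_index(node_labels):
    """Preprocessing: map each label to a sorted list of nodes carrying it."""
    index = defaultdict(list)
    for node in sorted(node_labels):
        for label in node_labels[node]:
            index[label].append(node)
    return index

def build_query_dag(index, query):
    """Online step: walk the lists of the query labels in order and place
    every node under the exact subset of query topics it covers."""
    covered = defaultdict(set)
    for label in sorted(query):
        for node in index.get(label, []):
            covered[node].add(label)
    groups = defaultdict(list)
    for node, labels in covered.items():
        groups[frozenset(labels)].append(node)
    # One DAG node per non-empty subset of the query; Count (C) = len(list).
    dag = {}
    for size in range(1, len(query) + 1):
        for subset in combinations(sorted(query), size):
            dag[frozenset(subset)] = sorted(groups.get(frozenset(subset), []))
    return dag

node_labels = {
    "u1": {"a"}, "u2": {"b"}, "u3": {"c"}, "u4": {"a", "b", "c"},
    "u5": {"a"}, "u6": {"c", "d"}, "u7": {"a", "b", "c"}, "u8": {"d", "e"},
}
dag = build_query_dag(build_inverted_index(node_labels), {"a", "b", "c"})
for subset, members in dag.items():
    print("".join(sorted(subset)), len(members), members)
```

Nodes such as u8 that cover no query topic never appear in any list that is read, so they are skipped without being inspected.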
Find Dominance Variable
• Perform a topological ordering of the DAG nodes to evaluate the Dominance variable (D) of each DAG node.
• D = number of nodes dominated (or equaled) in coverage.
• Naïve complexity: O(n^2 r)
• Complexity by topological ordering: O(3^r)
[Figure: input network (u0 = q, Q = {a, b, c}) and Query DAG with D filled in: D(abc) = 7, D(ab) = 3, D(ac) = 4, D(bc) = 3, D(a) = 2, D(b) = 1, D(c) = 2; T is not yet set.]
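One way to obtain the D values is to sum the Count variable over every non-empty subset of each DAG node's topic set. The sketch below uses a bitmask encoding and submask enumeration, which are assumptions of this example rather than the slide's topological-order procedure; enumerating all submasks of all masks touches 3^r pairs in total.

```python
def dominance_variable(count, r):
    """For each non-empty topic subset (bitmask), sum Count over its submasks."""
    dom = {}
    for mask in range(1, 1 << r):
        total = 0
        sub = mask
        while sub:                       # visits every non-empty submask once
            total += count.get(sub, 0)
            sub = (sub - 1) & mask
        dom[mask] = total
    return dom

# One bit per query topic (hypothetical encoding).
TOPIC_A, TOPIC_B, TOPIC_C = 1, 2, 4
ALL = TOPIC_A | TOPIC_B | TOPIC_C

# Count values from the example Query DAG; absent subsets have Count 0.
count = {TOPIC_A: 2, TOPIC_B: 1, TOPIC_C: 2, ALL: 2}

dom = dominance_variable(count, 3)
print(dom[ALL], dom[TOPIC_A | TOPIC_B], dom[TOPIC_A | TOPIC_C], dom[TOPIC_B | TOPIC_C])
```

The results agree with the figure: D(abc) = 7, D(ab) = 3, D(ac) = 4, and D(bc) = 3.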
Find Traversal Variable
• Perform a Breadth First Search (BFS) starting from the query node.
• T = number of nodes not dominated by distance.
• Complexity by BFS: O(n + e)
[Figure: input network (u0 = q, Q = {a, b, c}) and Query DAG during BFS at h = 2, with T(abc) = 1, T(ab) = T(ac) = T(bc) = 0, T(a) = 2, T(b) = 1, T(c) = 2]
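The BFS step yields the hop distance of every node from the query node in O(n + e) time. In the sketch below the adjacency lists are hypothetical, since the edges of the example figure are not recoverable from the text; they are chosen so that u1 to u3 lie at h = 1 and u4 to u6 at h = 2.

```python
from collections import deque

def bfs_levels(adjacency, source):
    """Return the hop distance of every node reachable from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, []):
            if neighbor not in dist:          # first visit is the shortest path
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

# Hypothetical undirected edges, listed in both directions.
adjacency = {
    "q": ["u1", "u2", "u3"],
    "u1": ["q", "u4"], "u2": ["q", "u5"], "u3": ["q", "u6"],
    "u4": ["u1", "u7"], "u5": ["u2"], "u6": ["u3", "u8"],
    "u7": ["u4"], "u8": ["u6"],
}

dist = bfs_levels(adjacency, "q")
print(sorted(dist.items(), key=lambda kv: (kv[1], kv[0])))
```

Because BFS visits nodes level by level, every node reached at level h is at least as far from q as all nodes seen before it, which is what lets a per-DAG-node counter track how many nodes are closer.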
Find Skyline Nodes
• Store the DAG nodes in a Lookup Table, with a Skyline Bit for each DAG node.
• Helps to prune non-skyline nodes directly.
[Figure: input network (u0 = q, Q = {a, b, c}), Query DAG, and Lookup Table at BFS level h = 1]
Find Skyline Nodes (cont.)
[Figure: input network (u0 = q, Q = {a, b, c}), Query DAG, and Lookup Table at BFS level h = 2]
Dominance Count of Skyline Nodes
• DC(u4) = D(abc) − T(abc) − T(ab) − T(ac) − T(bc) − T(a) − T(b) − T(c) − 1 = 3
• Top-k Buffer to store top-k skyline nodes.
[Figure: input network (u0 = q, Q = {a, b, c}), Query DAG, and Lookup Table at h = 2, with D(abc) = 7, T(abc) = T(ab) = T(ac) = T(bc) = 0, and T(a) = T(b) = T(c) = 1]
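The formula for DC(u4) can be checked numerically with the values in the figure. The sketch below generalizes it to any topic subset by subtracting T over all non-empty subsets; the bitmask encoding is an assumption of this example, and the reading that the final −1 removes the node itself is an interpretation, not something the slide states.

```python
def dominance_count(mask, dom, trav):
    """DC = D(S) - sum of T over all non-empty subsets of S - 1."""
    closer = 0
    sub = mask
    while sub:                      # enumerate every non-empty submask of mask
        closer += trav.get(sub, 0)
        sub = (sub - 1) & mask
    return dom[mask] - closer - 1

# One bit per query topic (hypothetical encoding).
TOPIC_A, TOPIC_B, TOPIC_C = 1, 2, 4
ALL = TOPIC_A | TOPIC_B | TOPIC_C

dom = {ALL: 7}                                  # D(abc) from the figure
trav = {TOPIC_A: 1, TOPIC_B: 1, TOPIC_C: 1}     # T values at h = 2; others 0

print(dominance_count(ALL, dom, trav))
```

With D(abc) = 7 and three single-topic nodes already traversed, the result is 7 − 3 − 1 = 3, matching the slide.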
Pruning and Early Termination
• Top-k Pruning: Prune a DAG node whose Dominance variable is smaller than the smallest Dominance Count in the top-k buffer.
• Early Termination: Stop when the Skyline Bits of all entries in the Lookup Table are 1.
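A top-k buffer with the pruning test can be sketched with a min-heap. The class and method names are hypothetical, and the dominance counts offered below are the ones listed for u4, u1, and u2 on the ranking slide. Pruning is safe because a node's dominance count never exceeds the Dominance variable of its DAG node.

```python
import heapq

class TopKBuffer:
    """Keeps the k skyline nodes with the largest dominance counts."""

    def __init__(self, k):
        self.k = k
        self.heap = []  # min-heap of (dominance_count, node)

    def threshold(self):
        """Smallest count in a full buffer; -1 while there is still room."""
        return self.heap[0][0] if len(self.heap) == self.k else -1

    def can_prune(self, dominance_variable):
        """True if no node under this DAG node can enter the buffer."""
        return dominance_variable < self.threshold()

    def offer(self, node, dominance_count):
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (dominance_count, node))
        elif dominance_count > self.heap[0][0]:
            heapq.heapreplace(self.heap, (dominance_count, node))

    def result(self):
        return sorted(self.heap, key=lambda entry: (-entry[0], entry[1]))

buf = TopKBuffer(2)
for node, dc in [("u4", 3), ("u1", 1), ("u2", 0)]:
    buf.offer(node, dc)
print(buf.result())
```

Once the buffer holds u4 and u1, any DAG node whose Dominance variable is below 1 can be skipped without computing dominance counts for its members.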
Experimental Results
• DBLP: 0.7M Nodes, 3M Edges, 10 Node Labels (distinct).
• 5 Query Topics.
Efficiency
• DBLP: 185M Nodes, 90M Edges, 1000 Node Labels (distinct).
• 5 Query Topics, Top-5 Result Nodes.
Conclusion and Future Work
• Efficient algorithm to find the top-k skyline nodes in a large attributed network.
• Experimental evaluation on real and synthetic datasets is required.
• Time complexity is linear in the number of nodes and edges in the network; distance-based indexing might improve the efficiency.
• A top-k skyline set, instead of top-k skyline nodes, might be more effective.
Questions
Thank You!