Finding Skyline Nodes in Large Networks

Arijit Khan*, Vishwakarma Singh*, Jian Wu#
*Computer Science, University of California, Santa Barbara, USA
#College of Computer Science, Zhejiang University, China
{arijitkhan, vsingh}@cs.ucsb.edu, wujian2000@zju.edu.cn

Motivation
Query in LinkedIn Network: If John is interested in Big Data, Cloud Computing, and Map Reduce, who are the top-5 people John should ask about these topics?
• Evaluation Metrics:
  • Distance from the query node (John).
  • Coverage of the query topics (Big Data, Cloud Computing, Map Reduce).

Homogeneous Approach?
Query in LinkedIn Network: If John is interested in Big Data, Cloud Computing, and Map Reduce, who are the top-5 people John should ask about these topics?
Score = λ · Distance + (1 - λ) · Coverage
How to get λ?

Weighted Set Cover?
• Find nodes with the smallest aggregate distance from the query node such that together they cover all query topics.
• Ignores some interesting nodes.
• Cannot rank the results.
[Figure: example network with query node u0 = q and query topics Q = {a, b, c}; node labels u1: a, u2: b, u3: c, u4: abc, u5: a, u6: cd, u7: abc, u8: de]

Graph Skyline
• Dominance on Coverage: u >c v
  • The query topics covered by node u are a proper superset of the query topics covered by node v.
• Dominance on Distance: u >d v
  • The distance of u from q is less than that of v from q.
• Dominance: u > v
  • (1) u >c v and u ≥d v; or
  • (2) u ≥c v and u >d v.
Graph Skyline: A node is a skyline node if it is not dominated by any other node in the network.
[Figure: example network with query node u0 = q and query topics Q = {a, b, c}; node labels u1: a, u2: b, u3: c, u4: abc, u5: a, u6: cd, u7: abc, u8: de]
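The dominance test and the skyline definition above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the candidate names (x1 to x5), their topic sets, and their hop distances are hypothetical, and each candidate is reduced to a pair (covered query topics, distance from q).

```python
from typing import Dict, FrozenSet, List, Tuple

# A candidate is (covered query topics, hop distance from q). Hypothetical representation.
Node = Tuple[FrozenSet[str], int]


def dominates(u: Node, v: Node) -> bool:
    """u > v: no worse on both criteria and strictly better on at least one."""
    cov_u, dist_u = u
    cov_v, dist_v = v
    cov_ge = cov_u >= cov_v    # u >=c v: superset or equal coverage
    cov_gt = cov_u > cov_v     # u >c v: proper superset
    dist_ge = dist_u <= dist_v  # u >=d v: at least as close to q
    dist_gt = dist_u < dist_v   # u >d v: strictly closer to q
    return (cov_gt and dist_ge) or (cov_ge and dist_gt)


def skyline(nodes: Dict[str, Node]) -> List[str]:
    """Names of the nodes that no other node dominates (quadratic scan)."""
    return sorted(
        name
        for name, u in nodes.items()
        if not any(dominates(v, u) for other, v in nodes.items() if other != name)
    )


# Hypothetical toy candidates, not taken from the slides.
toy = {
    "x1": (frozenset("a"), 1),
    "x2": (frozenset("abc"), 2),
    "x3": (frozenset("ab"), 2),
    "x4": (frozenset("abc"), 3),
    "x5": (frozenset("c"), 1),
}
print(skyline(toy))  # ['x1', 'x2', 'x5']
```

Here x3 and x4 drop out because x2 covers at least as much at no greater distance, while x1 and x5 survive because they are closer to q than x2.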

Ranking of Skyline Nodes
• Too many skyline nodes.
• Rank them.
• Dominance Count: number of nodes dominated by a skyline node. [Lin et al., ICDE '07]
• Higher Dominance Count => more pruning from the candidate set.
• Example (u0 = q, Q = {a, b, c}): 1. DC(u4) = {u5, u6, u7}; 2. DC(u1) = {u5}; 3. DC(u2) = Φ; 4. DC(u3) = Φ
Problem Statement: Given a query node and a set of query topics in a network, find the top-k skyline nodes with maximum dominance count.
[Figure: example network with query node u0 = q and query topics Q = {a, b, c}; node labels u1: a, u2: b, u3: c, u4: abc, u5: a, u6: cd, u7: abc, u8: de]
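As a baseline for the problem statement, the ranking can be computed by brute force: compare every pair of candidates, count how many nodes each one dominates, and keep the k non-dominated nodes with the largest counts. The sketch below is only that quadratic baseline, under the assumption that candidates are pairs (covered query topics, hop distance) and that only nodes covering at least one query topic are candidates; the node names and values are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Node = Tuple[FrozenSet[str], int]  # (covered query topics, hop distance from q)


def dominates(u: Node, v: Node) -> bool:
    (cov_u, dist_u), (cov_v, dist_v) = u, v
    return (cov_u > cov_v and dist_u <= dist_v) or (cov_u >= cov_v and dist_u < dist_v)


def top_k_skyline_naive(nodes: Dict[str, Node], k: int) -> List[Tuple[str, int]]:
    """Top-k skyline nodes by dominance count, via all-pairs comparison."""
    dc = {name: 0 for name in nodes}            # dominance count per node
    dominated = {name: False for name in nodes}
    for a, u in nodes.items():
        for b, v in nodes.items():
            if a != b and dominates(u, v):
                dc[a] += 1
                dominated[b] = True
    sky = [name for name in nodes if not dominated[name]]
    sky.sort(key=lambda name: (-dc[name], name))  # highest count first, ties by name
    return [(name, dc[name]) for name in sky[:k]]


# Hypothetical toy candidates, not taken from the slides.
toy = {
    "x1": (frozenset("a"), 1),
    "x2": (frozenset("abc"), 2),
    "x3": (frozenset("ab"), 2),
    "x4": (frozenset("abc"), 3),
    "x5": (frozenset("c"), 1),
    "x6": (frozenset("a"), 2),
}
print(top_k_skyline_naive(toy, 2))  # [('x2', 3), ('x1', 1)]
```

The rest of the deck replaces this all-pairs scan with per-subset counters, so the brute-force version is mainly useful as a correctness check on small inputs.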

Algorithm
• Construct a Query DAG.
• Three variables associated with each DAG node: Count (C), Dominance (D), Traversal (T).
• Naïve Complexity: O(n · 2^r)
• Complexity with Preprocessing: O(n · r^2)
[Figure: Input Network (u0 = q, Q = {a, b, c}) and Query DAG over the topic subsets abc; ab, ac, bc; a, b, c. Initial state: C(abc) = 2; C(ab) = C(ac) = C(bc) = 0; C(a) = 2, C(b) = 1, C(c) = 2; D and T not yet set.]

Query DAG Construction
• Preprocessing: For each label, find a sorted list of nodes that contain the label.
• Online Query DAG Construction: Incremental DAG construction.
[Figure: input network (u0 = q, Q = {a, b, c}) with the per-label sorted lists and the partially built Query DAG]

Query DAG Construction (cont.)
• Preprocessing: For each label, find a sorted list of nodes that contain the label.
• Online Query DAG Construction: Consider the labels and their sorted lists in order.
[Figure: input network (u0 = q, Q = {a, b, c}) and Query DAG so far: abc {u4, u7}; ab; a {u1, u5}, b {u2}, c {u3, u6}]

Query DAG Construction (cont.)
• Preprocessing: For each label, find a sorted list of nodes that contain the label.
• Online Query DAG Construction: Consider the labels and their sorted lists in order.
[Figure: input network (u0 = q, Q = {a, b, c}) and completed Query DAG: abc {u4, u7}; ab, ac, bc (empty); a {u1, u5}, b {u2}, c {u3, u6}]
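One way to picture the two phases is an inverted index built offline (label to node list) and an online pass that walks the lists of the query labels and files each node under the exact subset of Q it covers. The sketch below is an assumption-laden illustration: the function names are hypothetical, the lists are sorted by node id because the slides do not state the sort key, and the node labels are as read from the example figure.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple


def build_label_index(labels: Dict[str, str]) -> Dict[str, List[str]]:
    """Preprocessing: for each label, a sorted list of the nodes carrying it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for node in sorted(labels):  # assumed sort key: node id
        for label in labels[node]:
            index[label].append(node)
    return index


def build_query_dag(
    index: Dict[str, List[str]], query: str
) -> Tuple[Dict[FrozenSet[str], List[str]], Dict[FrozenSet[str], List[FrozenSet[str]]]]:
    """Online phase: one DAG node per non-empty subset of the query topics."""
    covered: Dict[str, set] = defaultdict(set)
    for topic in query:  # consider the labels and their lists in order
        for node in index.get(topic, []):
            covered[node].add(topic)
    topics = sorted(query)
    members = {
        frozenset(c): []
        for r in range(1, len(topics) + 1)
        for c in combinations(topics, r)
    }
    for node in sorted(covered):
        members[frozenset(covered[node])].append(node)
    # DAG edges: each subset points to the subsets with one topic removed.
    children = {
        s: [s - {t} for t in sorted(s)] if len(s) > 1 else [] for s in members
    }
    return members, children


# Node labels as read from the slides' example figure.
labels = {"u1": "a", "u2": "b", "u3": "c", "u4": "abc",
          "u5": "a", "u6": "cd", "u7": "abc", "u8": "de"}
members, children = build_query_dag(build_label_index(labels), "abc")
count = {s: len(nodes) for s, nodes in members.items()}  # the Count variable C
print(members[frozenset("abc")], count[frozenset("b")])  # ['u4', 'u7'] 1
```

Nodes such as u8 that carry no query topic never appear in any list that is scanned, so they are left out of the DAG without an explicit filter.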

Find Dominance Variable
• Perform a topological ordering of the DAG nodes to evaluate the Dominance variable (D) of each DAG node.
• D = number of nodes dominated (or equaled) by coverage.
• Naïve Complexity: O(n · 2^r)
• Complexity by Topological Ordering: O(3^r)
[Figure: Input Network (u0 = q, Q = {a, b, c}) and Query DAG with D filled in: D(abc) = 7; D(ac) = 4, D(ab) = D(bc) = 3; D(a) = 2, D(b) = 1, D(c) = 2; T not yet set.]
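Read literally, D of a DAG node S is the sum of the Count values of S and of every non-empty subset of S. The sketch below computes that sum with bitmask submask enumeration, which visits each (set, subset) pair once and therefore stays within the O(3^r) bound quoted on the slide; it is not necessarily the propagation order the authors use. The function name is hypothetical, and the Count values are the ones shown in the example figure.

```python
from typing import Dict, FrozenSet


def dominance_variables(query: str, count: Dict[FrozenSet[str], int]) -> Dict[FrozenSet[str], int]:
    """D(S) = sum of C(S') over all non-empty S' that are subsets of S."""
    topics = sorted(query)
    bit = {t: 1 << i for i, t in enumerate(topics)}

    def to_mask(s: FrozenSet[str]) -> int:
        return sum(bit[t] for t in s)

    def to_set(mask: int) -> FrozenSet[str]:
        return frozenset(t for t in topics if mask & bit[t])

    c = [0] * (1 << len(topics))
    for s, n in count.items():
        c[to_mask(s)] = n

    d: Dict[FrozenSet[str], int] = {}
    for mask in range(1, 1 << len(topics)):
        total = 0
        sub = mask
        while sub:  # enumerate every non-empty submask of mask
            total += c[sub]
            sub = (sub - 1) & mask
        d[to_set(mask)] = total
    return d


# Count values from the example: abc holds 2 nodes, a holds 2, b holds 1, c holds 2.
count = {frozenset("abc"): 2, frozenset("a"): 2, frozenset("b"): 1, frozenset("c"): 2}
d = dominance_variables("abc", count)
print(d[frozenset("abc")], d[frozenset("ac")], d[frozenset("b")])  # 7 4 1
```

The result does not depend on n, which is the point of the slide's comparison: once the Count values exist, D is computed on the DAG alone.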

Find Traversal Variable
• Perform a Breadth First Search (BFS) starting from the query node.
• T = number of nodes not dominated by distance.
• Complexity by BFS: O(n + e)
[Figure: Input Network (u0 = q, Q = {a, b, c}) and Query DAG at hop h = 2: T(abc) = 1; T(ab) = T(ac) = T(bc) = 0; T(a) = 2, T(b) = 1, T(c) = 2; C and D unchanged.]
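A level-by-level BFS makes the Traversal variable concrete: after each hop, T(S) is the number of visited nodes whose covered query topics are exactly S. The sketch below rests on two assumptions that should be read as such: this interpretation of T, and the edge list, which is hypothetical because the edges of the example figure cannot be recovered from the transcript. The node labels are those of the example.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple


def traversal_snapshots(
    edges: List[Tuple[str, str]], labels: Dict[str, str], query: str, source: str
) -> Dict[int, Dict[FrozenSet[str], int]]:
    """Run BFS from source and record T per DAG node after each hop."""
    adj = defaultdict(set)
    for u, v in edges:  # undirected network
        adj[u].add(v)
        adj[v].add(u)
    q = frozenset(query)
    t: Dict[FrozenSet[str], int] = defaultdict(int)
    seen = {source}
    frontier = [source]
    hop = 0
    snapshots: Dict[int, Dict[FrozenSet[str], int]] = {}
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in sorted(adj[u]):
                if v not in seen:
                    seen.add(v)
                    next_frontier.append(v)
        if not next_frontier:
            break
        hop += 1
        for v in next_frontier:
            covered = frozenset(labels.get(v, "")) & q
            if covered:  # nodes covering no query topic are ignored
                t[covered] += 1
        snapshots[hop] = dict(t)
        frontier = next_frontier
    return snapshots


labels = {"u1": "a", "u2": "b", "u3": "c", "u4": "abc",
          "u5": "a", "u6": "cd", "u7": "abc", "u8": "de"}
# Hypothetical edges: u1..u3 at hop 1, u4..u6 at hop 2, u7 and u8 at hop 3.
edges = [("u0", "u1"), ("u0", "u2"), ("u0", "u3"), ("u1", "u5"),
         ("u2", "u4"), ("u3", "u6"), ("u4", "u7"), ("u6", "u8")]
snapshots = traversal_snapshots(edges, labels, "abc", "u0")
print(snapshots[2][frozenset("abc")], snapshots[2][frozenset("a")])  # 1 2
```

Each node and edge is touched once, which matches the O(n + e) cost stated for this step.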

Find Skyline Nodes
• Store DAG nodes into a Lookup Table, with a Skyline Bit for each DAG node.
• Helps to prune non-skyline nodes directly.
[Figure: Input Network (u0 = q, Q = {a, b, c}), Query DAG, and Lookup Table at hop h = 1]

Find Skyline Nodes (cont.)
• Store DAG nodes into a Lookup Table, with a Skyline Bit for each DAG node.
• Helps to prune non-skyline nodes directly.
[Figure: Input Network (u0 = q, Q = {a, b, c}), Query DAG, and Lookup Table at hop h = 2]

Dominance Count of Skyline Nodes
• DC(u4) = D(abc) - T(abc) - T(ab) - T(ac) - T(bc) - T(a) - T(b) - T(c) - 1 = 3
• Top-k Buffer to store top-k skyline nodes.
[Figure: Input Network (u0 = q, Q = {a, b, c}), Query DAG, and Lookup Table at hop h = 2, before the hop-2 nodes are counted: D(abc) = 7, T(abc) = 0; T(ab) = T(ac) = T(bc) = 0; T(a) = T(b) = T(c) = 1.]
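The DC(u4) formula generalizes to any node with coverage S: take D(S), subtract T(S') for every non-empty subset S' of S (nodes reached at earlier hops, which cannot be dominated), and subtract 1 for the node itself. The helper below implements the formula exactly as written on the slide and does not treat ties specially; the function names are hypothetical, and the D and T values are those shown for the example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def nonempty_subsets(topics: FrozenSet[str]) -> List[FrozenSet[str]]:
    items = sorted(topics)
    return [frozenset(c) for r in range(1, len(items) + 1) for c in combinations(items, r)]


def dominance_count(cover: str, d: Dict[FrozenSet[str], int], t: Dict[FrozenSet[str], int]) -> int:
    """DC = D(S) - sum of T(S') over non-empty subsets S' of S - 1."""
    s = frozenset(cover)
    return d[s] - sum(t.get(sub, 0) for sub in nonempty_subsets(s)) - 1


# D values from the example; T as it stands when u4 is reached at hop 2
# (u1, u2, u3 were visited at hop 1, so T(a) = T(b) = T(c) = 1).
d = {frozenset("abc"): 7, frozenset("a"): 2}
t_before_hop2 = {frozenset("a"): 1, frozenset("b"): 1, frozenset("c"): 1}

print(dominance_count("abc", d, t_before_hop2))  # 3, matching DC(u4) = {u5, u6, u7}
print(dominance_count("a", d, {}))               # 1, matching DC(u1) = {u5}
```

Because D and T are indexed by DAG node rather than by network node, evaluating one dominance count costs at most 2^r lookups regardless of the size of the network.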

Pruning and Early Termination
• DC(u4) = D(abc) - T(abc) - T(ab) - T(ac) - T(bc) - T(a) - T(b) - T(c) - 1 = 3
• Top-k Buffer to store top-k skyline nodes.
• Top-k Pruning: skip a DAG node when its Dominance Variable is smaller than the smallest Dominance Count in the top-k buffer.
• Early Termination: stop when the Skyline Bits of all entries in the Lookup Table are 1.
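The pieces fit together roughly as follows. This is a sketch under several assumptions, not the authors' code: BFS levels are passed in precomputed, a Skyline Bit of 1 on DAG node S is taken to mean that some visited node already covers S, so later nodes with coverage S are dominated, and the top-k buffer is a min-heap. All names and the toy data are hypothetical.

```python
import heapq
from collections import Counter
from itertools import combinations
from typing import FrozenSet, List, Tuple

Level = List[Tuple[str, FrozenSet[str]]]  # (node, covered query topics) per hop


def nonempty_subsets(topics) -> List[FrozenSet[str]]:
    items = sorted(topics)
    return [frozenset(c) for r in range(1, len(items) + 1) for c in combinations(items, r)]


def top_k_skyline(levels: List[Level], query: str, k: int) -> List[Tuple[int, str]]:
    dag = nonempty_subsets(query)
    count = Counter(cov for level in levels for _, cov in level)
    d = {s: sum(count[x] for x in nonempty_subsets(s)) for s in dag}
    t = {s: 0 for s in dag}
    skyline_bit = {s: False for s in dag}  # the lookup table
    buffer: List[Tuple[int, str]] = []     # min-heap of (dominance count, node)
    for level in levels:
        if all(skyline_bit.values()):
            break  # early termination: no later node can be a skyline node
        covers_here = {cov for _, cov in level}
        for name, cov in level:
            if skyline_bit[cov]:
                continue  # dominated by a node from an earlier hop
            if any(cov < other for other in covers_here):
                continue  # dominated by a same-hop node with strictly larger coverage
            if len(buffer) == k and d[cov] < buffer[0][0]:
                continue  # top-k pruning: DC cannot exceed D
            dc = d[cov] - sum(t[s] for s in nonempty_subsets(cov)) - 1
            if len(buffer) < k:
                heapq.heappush(buffer, (dc, name))
            elif dc > buffer[0][0]:
                heapq.heapreplace(buffer, (dc, name))
        for _, cov in level:  # update T and the Skyline Bits after the hop
            t[cov] += 1
            for s in nonempty_subsets(cov):
                skyline_bit[s] = True
    return sorted(buffer, key=lambda e: (-e[0], e[1]))


fs = frozenset
levels = [  # hypothetical BFS levels, hop 1 first
    [("x1", fs("a")), ("x2", fs("b"))],
    [("x3", fs("abc")), ("x4", fs("a")), ("x5", fs("b")), ("x6", fs("a"))],
    [("x7", fs("abc")), ("x8", fs("c"))],
]
print(top_k_skyline(levels, "abc", 2))  # [(5, 'x3'), (2, 'x1')]
```

In this toy run the loop never inspects hop 3: once x3 covers all of Q at hop 2, every Skyline Bit is set and the search stops.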

Experimental Results
• DBLP: 0.7M Nodes, 3M Edges, 10 Node Labels (distinct).
• 5 Query Topics.

Efficiency
• DBLP: 185M Nodes, 90M Edges, 1000 Node Labels (distinct).
• 5 Query Topics, Top-5 Result Nodes.

Conclusion and Future Work
• Efficient algorithm to find top-k skyline nodes in large attributed networks.
• Experimental evaluation on real and synthetic datasets is still required.
• Time complexity is linear in the number of nodes and edges in the network. Distance-based indexing might improve the efficiency.
• A top-k skyline set, instead of top-k skyline nodes, might be more effective.

Questions
Thank You!
