Searching for k-cliques in unknown graphs Roni Stern, Meir Kalech, Ariel Felner Department of Information Systems Engineering Ben Gurion University
Topics
• Known vs. unknown graphs
• Finding k-clique with minimum exploration
• Heuristics
• MDP and Monte-Carlo approach
• Experimental results
  • Simulated random graphs
  • Crawling in Google Scholar
Known vs. unknown graphs
• Known graph
• Unknown graph
  • Need exploration actions
• World Wide Web
  • Dynamic, too large and simply unknown
[Figure: a graph with explored, known and unknown parts. Exploring a web node: HTTP request (http://www.google.com/search?&q=nice), HTML response, parse links]
K-cliques in unknown graphs
[Figure: example graph on nodes A, B, C, D, E, F showing 3-cliques and 4-cliques]
K-cliques in unknown graphs
• How to find a k-clique in unknown graphs?
• Goal: minimize exploration
Which node to explore?
[Figure: partially explored example graphs on nodes A to I, with cost labels 4 and 6]
Heuristic #1: Known Degree
• Known degree: number of explored neighbors
[Figure: example graph with known degrees F: 1, E: 2, D: 2, A: 1, B: 1, C: 1]
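The Known Degree heuristic can be sketched in a few lines. This is only an illustration under assumptions that are not on the slide: the known graph is stored as a dictionary of neighbor sets, `explored` is the set of nodes already explored, the heuristic picks the unexplored node with the highest known degree, and ties are broken alphabetically. The function names and the example graph are hypothetical.

```python
def known_degree(known, explored, node):
    """Number of explored neighbors of `node` in the known graph."""
    return len(known[node] & explored)


def choose_by_known_degree(known, explored):
    """Pick the unexplored known node with the highest known degree.

    Assumption: ties are broken by alphabetical order (max keeps the
    first maximum of the sorted frontier).
    """
    frontier = sorted(v for v in known if v not in explored)
    if not frontier:
        return None
    return max(frontier, key=lambda v: known_degree(known, explored, v))


# Hypothetical known graph: X and Y are explored, A and B are not.
known = {
    "X": {"Y", "A", "B"},
    "Y": {"X", "A"},
    "A": {"X", "Y"},
    "B": {"X"},
}
explored = {"X", "Y"}
next_node = choose_by_known_degree(known, explored)
```

Here `A` has two explored neighbors and `B` has one, so `A` is chosen for exploration.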
Heuristic #2: Clique*
• Expand the largest potential k-clique
• [Altshuler et al. ’05]
• C & D are a potential 4-clique
• A, B & C are NOT a potential 4-clique
[Figure: partially explored example graph. Labels: an m-clique, k-1-m common neighbors]
Heuristic #3: RClique*
• Unknown graph but known domain
• How can a probabilistic model be used?
• MDP state space is too large
• Monte Carlo sampling approach:
  • Simulate exploration with domain model
  • Use average sample results
[Figure: unexplored nodes annotated with probabilities 0.1, 0.3 and 0.8]
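The sampling idea can be illustrated with a minimal sketch. It is not RClique* itself, whose details are not given on the slide. The assumptions here are loud ones: the domain model is a single edge probability `p` for every pair of known but unexplored nodes, nodes that have never been seen are ignored, and each candidate is scored by the fraction of sampled graphs in which it belongs to a k-clique. All function names and the example graph are hypothetical.

```python
import itertools
import random


def in_k_clique(adj, node, k):
    """True if `node` belongs to some k-clique of the graph `adj` (brute force)."""
    for others in itertools.combinations(sorted(adj[node]), k - 1):
        if all(b in adj[a] for a, b in itertools.combinations(others, 2)):
            return True
    return False


def sample_world(known, explored, p, rng):
    """Copy the known graph and sample each unknown frontier-frontier edge.

    Assumption: an edge between two unexplored nodes exists with probability p.
    """
    adj = {v: set(nbrs) for v, nbrs in known.items()}
    frontier = sorted(v for v in known if v not in explored)
    for u, v in itertools.combinations(frontier, 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def monte_carlo_choice(known, explored, k, p, samples=200, seed=0):
    """Score each unexplored node by averaging over sampled graphs."""
    rng = random.Random(seed)
    frontier = sorted(v for v in known if v not in explored)
    hits = {v: 0 for v in frontier}
    for _ in range(samples):
        world = sample_world(known, explored, p, rng)
        for v in frontier:
            hits[v] += in_k_clique(world, v, k)
    scores = {v: hits[v] / samples for v in frontier}
    return max(frontier, key=lambda v: scores[v]), scores


# Hypothetical known graph: A and B are explored; C, D and E are not.
known = {
    "A": {"B", "C", "D", "E"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"A", "B"},
    "E": {"A"},
}
choice, scores = monte_carlo_choice(known, {"A", "B"}, k=4, p=0.5)
```

In this example C and D complete a 4-clique with A and B whenever the sampled edge C-D exists, while E needs that edge plus two more, so E never scores higher than C. A fuller version would simulate several exploration steps per sample rather than a single membership test.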
Experimental Results
• Random and scale-free graphs
• Heuristics are much better than random exploration
• Clique* advantage diminishes with density
• RClique* is much better
Real application, crawling online
• Max. 101 nodes explored
[Figure: success rate chart. Labels in extraction order: 100%, 83%, 66%, 58%, 28%, 30%, 20%, 0%, 0%]
Summary
• Find k-cliques in unknown graph
  • Minimize exploration cost
• Heuristics
  • Known Degree, Clique*, RClique*
• Future work
  • Incorporate with data mining techniques
  • Exploring with multiple agents
  • Generalize to subgraph isomorphism