This paper presents an algorithm for online distance computation in massive graphs using sketch-based techniques. The authors, Atish Das Sarma, Sreenivas Gollapudi, Marc Najork, and Rina Panigrahy, target distance and path computation on social networks and web-scale data. They precompute a small sketch for every node, so that the distance between any two nodes can be estimated at query time by combining the two sketches. The method is evaluated on real web data for both undirected and directed graphs, and is intended as a building block for other online algorithms.
Sketch-Based Distance Estimates for Web Scale Graphs
Atish Das Sarma (Georgia Tech), Sreenivas Gollapudi, Marc Najork, and Rina Panigrahy (Microsoft)

Distance Computation
• Online distance computation on massive graphs
• Distance/path computation on social networks
• Distance between search and ad results
• Building block for other online algorithms
• Road networks
  • Already solved very efficiently (specific to 2D)

Algorithm
• Pre-computation: all sketches
• Query time: nodes u and v
• At runtime, retrieve Sketch(u) and Sketch(v)

Sketch Based Distances
For all nodes x, precompute a small piece of information, Sketch(x). At query time, combine Sketch(u) and Sketch(v) to estimate the distance between u and v.
[Figure: query example with nodes labeled "You" and "Obama"]

Effectiveness of our Algorithm
[Figure: panels labeled "undirected" and "directed"]

Real Data
• 65M web pages, 420M URLs, 2.3B edges
• C = 60M (directed), C = 128M (undirected)
• Undirected distance in [1,15]
• Directed distance in [1,100] (∞ otherwise)
• Sketch size: (s+8) · k · ⌊log C⌋ bits
• k = 3: number of copies of seed sets
• s = 12: size of a seed id; 8 bits to store the distance
• About 200 bytes (undirected) and 400 bytes (directed)

Sketch computation
Repeatedly (k times), sample random sets of nodes S of sizes 2^0, 2^1, 2^2, …, 2^⌊log C⌋ from the candidate set C and, for every node in the graph, store the nearest node in S and the distance to it.
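The sketch computation and the query step described above can be illustrated with a minimal Python sketch for an undirected, unweighted graph. Several details here are assumptions rather than something stated on the slides: the graph is an in-memory adjacency dict, the nearest seed is found with a multi-source breadth-first search, and the query combines two sketches by taking the smallest d(u, w) + d(w, v) over seeds w that appear in both. The names nearest_seed_bfs, build_sketches, and estimate_distance are hypothetical.

```python
import math
import random
from collections import deque


def nearest_seed_bfs(adj, seeds):
    """Multi-source BFS: map each reachable node to (nearest seed, distance)."""
    nearest = {s: (s, 0) for s in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        seed, dist = nearest[node]
        for nbr in adj[node]:
            if nbr not in nearest:
                nearest[nbr] = (seed, dist + 1)
                queue.append(nbr)
    return nearest


def build_sketches(adj, candidates, k, rng):
    """Build Sketch(x) for every node x as a list of (seed, distance) pairs.

    Repeats k times: sample seed sets of sizes 2^0, 2^1, ..., 2^floor(log2 |C|)
    from the candidate list and record each node's nearest seed.
    """
    sketches = {node: [] for node in adj}
    max_exp = len(candidates).bit_length() - 1  # floor(log2 |C|)
    for _ in range(k):
        for exp in range(max_exp + 1):
            seeds = rng.sample(candidates, 2 ** exp)
            for node, entry in nearest_seed_bfs(adj, seeds).items():
                sketches[node].append(entry)
    return sketches


def estimate_distance(sketch_u, sketch_v):
    """Assumed combination rule: min over shared seeds w of d(u,w) + d(w,v).

    Returns math.inf when the two sketches share no seed.
    """
    best_u = {}
    for seed, dist in sketch_u:
        best_u[seed] = min(dist, best_u.get(seed, math.inf))
    best = math.inf
    for seed, dist in sketch_v:
        if seed in best_u:
            best = min(best, best_u[seed] + dist)
    return best
```

For example, with adj = {0: [1], 1: [0, 2], 2: [1]}, calling build_sketches(adj, list(adj), 3, random.Random(0)) and then estimate_distance on two of the resulting sketches returns an estimate of the distance between those nodes. Under this combination rule the estimate can never be smaller than the true distance, by the triangle inequality, so it is an upper bound. A directed graph would presumably need separate searches along forward and reversed edges, which this sketch does not cover.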