
Efficient Sketch-Based Distance Estimation for Large-Scale Graphs in Online Settings

This presentation describes a sketch-based algorithm for online distance computation on massive graphs. The authors, Atish Das Sarma, Sreenivas Gollapudi, Marc Najork, and Rina Panigrahy, target distance and path computation on social networks and web-scale graphs. They precompute a small sketch for every node and, at query time, combine the sketches of two nodes to estimate the distance between them. The method is evaluated on a real web graph (65M web pages, 2.3B edges) in both undirected and directed form, and is intended as a building block for other online algorithms.


Presentation Transcript


  1. Sketch-Based Distance Estimates for Web Scale Graphs
     Atish Das Sarma (Georgia Tech), Sreenivas Gollapudi, Marc Najork, and Rina Panigrahy (Microsoft)

     Online Distance Computation on Massive Graphs
     • Distance/path computation on social networks
     • Distance between search and ad results
     • Building block for other online algorithms

     Road Networks
     • Already solved very efficiently (specific to 2D)

     Distance Computation Algorithm
     • pre-computation: all sketches
     • query time: nodes u and v
     • at runtime, retrieve …
     [Figure labels: "You", "Obama"]

     Set Sketch Based Distances
     For all nodes x, precompute a small piece of information, Sketch(x). At query time, combine Sketch(u) and Sketch(v) to estimate the distance.

     Sketch computation
     Repeatedly (k times), sample a random set of nodes S of sizes 2^0, 2^1, 2^2, …, 2^|log C| from the candidate set C and, for every node in the graph, store its nearest node in S and the distance to it.

     Real Data
     • 65M web pages, 420M URLs, 2.3B edges
     • C = 60M (directed), C = 128M (undirected)
     • Undirected distance: [1, 15]
     • Directed distance: [1, 100] (∞ otherwise)
     • Sketch size: (s+8) · k · |log C| bits
     • k = 3: number of copies of seed sets
     • s = 12: size of seed id; 8 to store distance
     • ~200, 400 bytes for undirected, directed

     Effectiveness of our Algorithm
     [Figure panels: "undirected", "directed"]
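The sketch computation and query step described in the transcript can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the graph is undirected and unweighted and is given as an adjacency dict; distances are found by breadth-first search; the upper limit on seed-set size is read as 2^⌊log2 |C|⌋; and, since the transcript only says the two sketches are "combined", the combine rule used here is the smallest d(u, w) + d(w, v) over seeds w that appear in both sketches, which the triangle inequality makes an upper bound on the true distance. All function names are hypothetical.

```python
import math
import random
from collections import deque


def nearest_seed(adj, seeds):
    """Multi-source BFS: map each reachable node to (nearest seed, hop distance)."""
    nearest = {s: (s, 0) for s in seeds}
    queue = deque(seeds)
    while queue:
        x = queue.popleft()
        seed, dist = nearest[x]
        for y in adj[x]:
            if y not in nearest:
                nearest[y] = (seed, dist + 1)
                queue.append(y)
    return nearest


def build_sketches(adj, candidates, k=3, rng=None):
    """Sketch(x): one (nearest seed, distance) pair per sampled seed set."""
    rng = rng or random.Random(0)
    candidates = list(candidates)
    # Assumed reading of the 2^|log C| limit: floor(log2 |C|).
    max_exp = len(candidates).bit_length() - 1
    sketches = {x: [] for x in adj}
    for _ in range(k):                      # k independent copies
        for exp in range(max_exp + 1):      # seed sets of size 2^0, 2^1, ...
            seeds = rng.sample(candidates, 2 ** exp)
            for x, entry in nearest_seed(adj, seeds).items():
                sketches[x].append(entry)
    return sketches


def estimate_distance(sketch_u, sketch_v):
    """Assumed combine rule: min over shared seeds w of d(u, w) + d(w, v)."""
    to_seed = {}
    for seed, dist in sketch_u:
        to_seed[seed] = min(dist, to_seed.get(seed, math.inf))
    best = math.inf                         # inf if no seed is shared
    for seed, dist in sketch_v:
        if seed in to_seed:
            best = min(best, to_seed[seed] + dist)
    return best


def sketch_size_bits(s, k, log_c, distance_bits=8):
    """Sketch size as given in the transcript: (s + 8) * k * |log C| bits."""
    return (s + distance_bits) * k * log_c


if __name__ == "__main__":
    # Path graph 0 - 1 - ... - 7, so the true distance is |u - v|.
    adj = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
    sketches = build_sketches(adj, adj.keys(), k=3)
    print(estimate_distance(sketches[0], sketches[7]))
    print(sketch_size_bits(s=12, k=3, log_c=27) / 8, "bytes")
```

In a connected graph the size-1 seed set is shared by every node, so the estimate is always finite. With s = 12, k = 3 and log C ≈ 27 for C = 128M, the size formula gives 1620 bits, roughly 200 bytes, which is in line with the transcript's figure for the undirected case.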
