An XMT/MTGL Case Study: PageRank
Jonathan Berry, Scalable Algorithms Department, Sandia National Laboratories
July 23, 2008

Informatics Datasets Are Different
• Informatics: the analysis of datasets arising from “information” sources such as the WWW (not physical simulation)
• Motivating applications:
  • Homeland security
  • Computer security (DOE emphasis)
  • Biological networks, etc.
“One of the interesting ramifications of the fact that the PageRank calculation converges rapidly is that the web is an expander-like graph” (Page, Brin, Motwani, Winograd 1999)
[Figures: from UCSD ‘08; Broder, et al. ‘00]
Primary HPC implication: any partitioning is “bad”

We Are Developing The MultiThreaded Graph Library (MTGL)
• Enables multithreaded graph algorithms
• Based upon a community standard (Boost Graph Library)
• Abstracts data structures and other application specifics
• Hides some shared-memory issues
• Preserves good multithreaded performance
[Figures: S-T connectivity scaling (MTA-2) and SSSP scaling (MTA-2); solve time (sec) vs. MTA-2 processors; series: MTGL, C; diagram label: MTGL ADAPTER]

Initial Algorithmic Impacts of MTGL on XMT Are Promising
• S-T connectivity
  • Gathering evidence for 2005 prediction
  • This plot shows results for ≤ 2 billion edges
  • Even with Threadstorm 2.0 limitations, XMT predictions look good
• Connected components
  • Simple SV (Shiloach-Vishkin) is fast, but has hot spots
  • Multilevel Kahan algorithm scales (but XMT data incomplete)
[Figure: time (s) vs. # edges; series: 32000p Blue Gene/L, MTGL/XMT 10p, MTGL/MTA 10p]
[Figure: time (s) vs. # XMT processors; series: MTGL Kahan’s algorithm, MTGL Shiloach-Vishkin algorithm]

The “PageRank Derby”
• Ranking of data is a key operation
  • Which terabytes of some petabytes of data are the most interesting?
  • Which gigabytes of those terabytes? Etc.
• We have chosen PageRank as a candidate kernel ranking operation (though there are others)
• We wish to understand computational tradeoffs for various architectures and various datasets
  • XMT, XT4, Niagara, Netezza, Hadoop, etc.
• We simulate real data with “R-MAT” graphs (Faloutsos, et al.)
• No previous results for traditional HPC (distributed memory)
  • Two of Sandia’s top distributed memory people have gotten some.

“R-MAT” (Recursive MATrix Decomposition)
• Think of dropping a marble through a sequence of plastic trays with holes in them
• Pick an initiator with k = pq holes
• The i’th level has k^i holes
• The marble goes through each hole with some probability (normalized to 1.0 over all holes)
• The bottom level has N cells (1x1) and is the adjacency matrix
• The probabilities determine the nature of the graph
  • All equal generates Erdős-Rényi graphs
  • Unbalanced probabilities can lead to inverse power-law graphs
[Figure: 2x2 initiator with probabilities 0.57, 0.19, 0.19, 0.05, applied recursively within each quadrant]

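The marble-and-trays description can be sketched in a few lines of Python. This is an illustrative sketch, not the generator used for the results in this talk: it assumes a 2x2 initiator (p = q = 2, so k = 4) with the probabilities shown in the figure, and the names `rmat_edge` and `rmat_graph` are hypothetical.

```python
import random

def rmat_edge(levels, probs, rng):
    """Drop one 'marble' through `levels` trays of a 2x2 initiator.

    probs = (a, b, c, d): probabilities of the top-left, top-right,
    bottom-left and bottom-right holes; they are assumed to sum to 1.0.
    Returns the (row, col) cell reached in the 2**levels x 2**levels
    adjacency matrix, i.e. one directed edge row -> col.
    """
    row = col = 0
    for _ in range(levels):
        quadrant = rng.choices(range(4), weights=probs)[0]
        row = 2 * row + quadrant // 2   # bottom half sets the next row bit
        col = 2 * col + quadrant % 2    # right half sets the next column bit
    return row, col

def rmat_graph(levels, num_edges, probs=(0.57, 0.19, 0.19, 0.05), seed=0):
    """Generate `num_edges` edges (duplicates are possible and are kept)."""
    rng = random.Random(seed)
    return [rmat_edge(levels, probs, rng) for _ in range(num_edges)]

# 16-vertex graph (4 levels), 1000 edge samples.
edges = rmat_graph(levels=4, num_edges=1000)
```

With all four probabilities equal, every cell is equally likely, which is the Erdős-Rényi case from the slide; the unbalanced default concentrates edges on low-numbered vertices.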
PageRank’s Kernel Operation
• Vertices “vote” by contributing their current rank to their neighbors (in proportion)
• For example, supposing that all current ranks are 1.0:
  • u contributes 0.5 to x
  • v contributes 0.33... to x
  • w contributes 0.5 to x
• This operation, done over all edges, dominates the running time of PageRank
[Figure: example graph with vertices u, v, w, x]

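The voting step can be written directly as a loop over out-edges. The sketch below is serial Python, not MTGL code. The example graph is an assumption: the figure is not available, so u and w are given two out-neighbors and v three (matching the 0.5 and 0.33... contributions), with hypothetical extra vertices `a` and `b`. Damping and dangling-vertex handling are left out because the slide describes only the kernel.

```python
def push_contributions(out_adj, rank):
    """One pass of the voting step: each vertex splits its current rank
    evenly among its out-neighbors and adds a share to each of them."""
    incoming = {v: 0.0 for v in out_adj}
    for src, neighbors in out_adj.items():
        if not neighbors:
            continue  # vertices with no out-edges cast no votes in this sketch
        share = rank[src] / len(neighbors)
        for dst in neighbors:
            incoming[dst] += share  # many sources may update the same dst
    return incoming

# Assumed example graph: only the edges into x come from the slide.
out_adj = {
    "u": ["x", "a"],
    "v": ["x", "a", "b"],
    "w": ["x", "b"],
    "x": [], "a": [], "b": [],
}
rank = {v: 1.0 for v in out_adj}
incoming = push_contributions(out_adj, rank)
# x receives 0.5 + 1/3 + 0.5 from u, v and w.
```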
Attempt 3: Load Balance Via Your Own Threads = “kind of ok, but not very pleasing”
Attempt 4: Remove The Hot Spots, Auto-Parallelize
Load balanced and no hot spot!
Extra memory for in-adjacencies

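One way to read this slide is as a switch from pushing votes along out-edges to pulling them along in-edges: each vertex then sums its own incoming votes, so every accumulator has a single writer and a high-in-degree vertex is no longer a contended memory location. The sketch below illustrates that idea in serial Python under that assumption; it is not the code from the talk, and the function names and example graph are hypothetical.

```python
def build_in_adjacency(out_adj):
    """Extra memory: store, for every vertex, the list of its in-neighbors."""
    in_adj = {v: [] for v in out_adj}
    for src, neighbors in out_adj.items():
        for dst in neighbors:
            in_adj[dst].append(src)
    return in_adj

def pull_contributions(out_adj, in_adj, rank):
    """Each vertex gathers rank[src] / out_degree(src) from its in-neighbors.

    The outer loop iterations are independent (one writer per vertex),
    so a parallelizing compiler or runtime could run them concurrently."""
    incoming = {}
    for dst, sources in in_adj.items():
        total = 0.0
        for src in sources:
            total += rank[src] / len(out_adj[src])
        incoming[dst] = total
    return incoming

# Assumed example graph: u, v, w all point at x, plus hypothetical a and b.
out_adj = {
    "u": ["x", "a"],
    "v": ["x", "a", "b"],
    "w": ["x", "b"],
    "x": [], "a": [], "b": [],
}
in_adj = build_in_adjacency(out_adj)
rank = {v: 1.0 for v in out_adj}
pulled = pull_contributions(out_adj, in_adj, rank)
```

The result is the same as the push formulation; the cost is the second adjacency structure, which matches the "extra memory" remark.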
“CANAL” Output
Load balanced and no hot spot!
Extra work to compute parallel prefix for merge

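The slide does not show what is being merged, so the following is only a generic illustration of why a prefix sum appears in load-balanced edge loops: a prefix sum over vertex degrees gives each vertex a contiguous range of edge indices, so one flat loop over all edges can be cut into equal pieces no matter how skewed the degrees are. It is written serially in Python; all names are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate

def edge_offsets(degrees):
    """Exclusive prefix sum: offsets[v] is the index of vertex v's first edge,
    and offsets[-1] is the total number of edges."""
    return [0] + list(accumulate(degrees))

def owner_of_edge(offsets, e):
    """Vertex whose range [offsets[v], offsets[v + 1]) contains edge index e."""
    return bisect_right(offsets, e) - 1

def split_edges(total_edges, num_workers):
    """Near-equal contiguous ranges of edge indices, one per worker."""
    bounds = [total_edges * w // num_workers for w in range(num_workers + 1)]
    return list(zip(bounds[:-1], bounds[1:]))

degrees = [1, 6, 0, 1]                 # vertex 1 is a high-degree vertex
offsets = edge_offsets(degrees)        # [0, 1, 7, 7, 8]
chunks = split_edges(offsets[-1], 4)   # [(0, 2), (2, 4), (4, 6), (6, 8)]
```

Here the six edges of vertex 1 are spread over all four chunks instead of landing on one worker; the prefix sum itself is the extra work.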
“CANAL” Output – Serial Inner Loop
Can still work well in cases with enough work, even with high-degree vertices.
Less total work than previous (nodep)
Serial inner loop is a scalability risk

Current Performance Results
Environment variable throttles back the number of streams for 128P scaling

MTGL/Qthreads PageRank On Niagara
[Figure: seconds vs. threads]

The MTGL Is Having An Interdisciplinary Impact
• Algorithms/architectures/visualization integration
  • Sandia architects profiled MTGL to predict performance on XMT
  • Titan visualization framework uses MTGL
  • Qthreads/MTGL → X-caliber driver application
• Scalable facility location on MTA-2
  • Based on expertise gained in EPA sensor placement WFO project
  • Applications to community detection, sensor placement, …

Impact of HPC Informatics Activities
• LDRD
  • The Networks Grand Challenge LDRD is building on MTGL’s success
• Industry
  • 2005 WFO project helped justify the Cray XMT
• Scholarly community
  • 3 algorithms-track papers at the 1st MTAAP (2007)
  • Keynote talk at 2nd MTAAP (2008)
  • IEEE CiSE Special Issue on Combinatorial Computing
  • DIMACS shortest paths challenge
  • Indiana University collaboration: Parallel Processing Letters, BGL refactor

Acknowledgements
MultiThreading Background
• Simon Kahan (Google (formerly Cray))
• Petr Konecny (Google (formerly Cray))
MultiThreading/Distributed Memory Comparisons
• Kristyn Maschhoff (Cray)
• Bruce Hendrickson (1415)
• Douglas Gregor (Indiana U.)
• Andrew Lumsdaine (Indiana U.)
MTGL Algorithm Design and Development
• Vitus Leung (1415)
• Kamesh Madduri (Georgia Tech.)
• William McLendon (1423)
• Cynthia Phillips (1412)
MTGL Integration
• Brian Wylie (1424)
• Kyle Wheeler (Notre Dame)
• Brian Barrett (1422)