
BFS Optimization in MIC

Jul. 9, 2014. Heng LIN, PACMAN Group, Tsinghua University.



  1. BFS Optimization in MIC • Jul. 9, 2014 • Heng LIN, PACMAN Group, Tsinghua University

  2. Background • Built on the Graph500 benchmark framework. • http://www.graph500.org

  3. Optimization • Graph layout optimization. • In the edge pair generation phase, remap vertex ids by descending vertex degree. • In the graph construction phase, sort each vertex's neighbors by descending degree. • Warm up the bfs_tree data structure. • Result: 4.68 -> 11.12 GTEPS
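
The two layout steps on this slide can be sketched as follows. This is a minimal illustration, not the code from the talk: the function names `relabel_by_degree` and `build_adjacency`, the tie-breaking by old id, and the use of Python lists in place of the Graph500 data structures are all assumptions.

```python
from collections import Counter

def relabel_by_degree(edges):
    """Renumber vertices so that higher-degree vertices get smaller ids."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Descending degree; ties broken by old id (an assumption, for determinism).
    order = sorted(degree, key=lambda x: (-degree[x], x))
    new_id = {old: new for new, old in enumerate(order)}
    return [(new_id[u], new_id[v]) for u, v in edges], new_id

def build_adjacency(edges, num_vertices):
    """Undirected adjacency lists with neighbors sorted by descending degree."""
    adj = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    deg = [len(nbrs) for nbrs in adj]
    for nbrs in adj:
        nbrs.sort(key=lambda w: (-deg[w], w))
    return adj

# Hypothetical toy edge list with arbitrary original ids.
edges = [(10, 20), (10, 30), (10, 40), (20, 30)]
remapped, new_id = relabel_by_degree(edges)
adj = build_adjacency(remapped, len(new_id))
```

After the relabeling, high-degree vertices occupy the low end of the id range, so the entries they touch in per-vertex arrays sit close together in memory.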

  4. Related Work • Graph500 June 2014 list release. • K computer ranks 1st.

  5. Related Work
     [1] Traversing Trillions of Edges in Real-time: Graph Exploration on Large-scale Parallel Machines. IPDPS’14 (1.5 TTEPS work)
     [2] Fast and Energy-efficient Breadth-First Search on a Single NUMA System. ISC’14
     [3] NUMA-optimized parallel breadth-first search on multicore single-node system. Big Data’13
     [4] Parallel distributed breadth first search on GPU. HiPC’13
     [5] Highly scalable graph search for the Graph500 benchmark. HPDC’12

  6. Related Work (Checconi et al., IPDPS’14) • Data decomposition -> 1D. • Vertices are partitioned among nodes. • Within a node, neighbors are partitioned among threads.
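
A rough sketch of what a 1D decomposition looks like in code. The slide does not say whether the partition is block or cyclic, so the contiguous block layout here is an assumption, as are the helper names `owner` and `thread_slice`.

```python
def owner(v, num_vertices, num_nodes):
    """Node that owns vertex v under a contiguous block partition (assumed)."""
    block = (num_vertices + num_nodes - 1) // num_nodes  # ceiling division
    return v // block

def thread_slice(neighbors, thread_id, num_threads):
    """Part of one vertex's neighbor list handled by a given thread."""
    chunk = (len(neighbors) + num_threads - 1) // num_threads
    return neighbors[thread_id * chunk:(thread_id + 1) * chunk]

# 8 vertices over 4 nodes: vertices 4 and 5 belong to node 2.
node_of_5 = owner(5, num_vertices=8, num_nodes=4)
# A 5-entry neighbor list split between 2 threads.
first = thread_slice([1, 2, 3, 4, 5], thread_id=0, num_threads=2)
second = thread_slice([1, 2, 3, 4, 5], thread_id=1, num_threads=2)
```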

  7. Related Work (Checconi et al., IPDPS’14) • Data structures -> CSR-based. • Coarse index for vertices. • Shortcuts in the edge list.
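
For readers unfamiliar with CSR (compressed sparse row), a plain version can be sketched as below. It covers only the basic layout; the coarse index and edge-list shortcuts mentioned on the slide are not modeled, and the names `build_csr` and `neighbors` are illustrative.

```python
def build_csr(num_vertices, edges):
    """Build (row_ptr, col_idx) from a list of directed (src, dst) edges."""
    counts = [0] * num_vertices
    for u, _ in edges:
        counts[u] += 1
    # row_ptr[v] is the offset of v's first neighbor in col_idx.
    row_ptr = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        row_ptr[v + 1] = row_ptr[v] + counts[v]
    col_idx = [0] * len(edges)
    cursor = row_ptr[:-1]  # slicing copies, so row_ptr is not modified
    for u, v in edges:
        col_idx[cursor[u]] = v
        cursor[u] += 1
    return row_ptr, col_idx

def neighbors(row_ptr, col_idx, v):
    """Neighbors of v are one contiguous slice of col_idx."""
    return col_idx[row_ptr[v]:row_ptr[v + 1]]

row_ptr, col_idx = build_csr(3, [(0, 1), (0, 2), (1, 2), (2, 0)])
```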

  8. Related Work (Checconi et al., IPDPS’14) • Search pruning -> direction optimization. • Top-down + bottom-up. • Load balance -> split huge vertices.
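
The top-down plus bottom-up idea can be sketched on a single machine as follows. This is a simplified illustration for an undirected graph, not the distributed algorithm from the paper; the switching rule (frontier larger than a fixed fraction of the vertices) and the `threshold` parameter are assumptions.

```python
def bfs_direction_optimizing(adj, root, threshold=0.05):
    """Return a BFS parent array, switching direction by frontier size."""
    n = len(adj)
    parent = [-1] * n
    parent[root] = root
    frontier = {root}
    while frontier:
        next_frontier = set()
        if len(frontier) > threshold * n:
            # Bottom-up: each unvisited vertex looks for a parent in the
            # frontier and stops at the first hit, pruning the remaining edges.
            for v in range(n):
                if parent[v] == -1:
                    for u in adj[v]:
                        if u in frontier:
                            parent[v] = u
                            next_frontier.add(v)
                            break
        else:
            # Top-down: each frontier vertex claims its unvisited neighbors.
            for u in frontier:
                for v in adj[u]:
                    if parent[v] == -1:
                        parent[v] = u
                        next_frontier.add(v)
        frontier = next_frontier
    return parent

# Hypothetical 5-vertex undirected graph.
adj = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3]]
parent = bfs_direction_optimizing(adj, root=0, threshold=0.3)
```

The early `break` in the bottom-up branch is where the pruning comes from: once a vertex finds any parent in the frontier, the rest of its neighbor list is skipped.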

  9. Related Work (Checconi et al., IPDPS’14) • Algorithm overview

  10. Related Work (Checconi et al., IPDPS’14) • Communication. • Each thread has a buffer for every other thread. • A header contains the source thread and the buffer size. • Each entry: 24-bit local part of the destination vertex + 24-bit local part of the source vertex. • Differential encoding scheme (24 + 8 bits).
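
The 24 + 24 bit entry can be illustrated with simple bit packing. The field order (destination in the high bits), the 6-byte little-endian serialization, and the function names are assumptions; the slide gives only the field widths. The differential scheme is left out because the slide does not describe how it works.

```python
MASK24 = (1 << 24) - 1

def pack(dst_local, src_local):
    """Pack two 24-bit local vertex ids into one 48-bit integer."""
    if not (0 <= dst_local <= MASK24 and 0 <= src_local <= MASK24):
        raise ValueError("local ids must fit in 24 bits")
    return (dst_local << 24) | src_local

def unpack(word):
    """Inverse of pack: returns (dst_local, src_local)."""
    return (word >> 24) & MASK24, word & MASK24

def encode(dst_local, src_local):
    """Serialize one entry as 6 bytes (byte order is an assumption)."""
    return pack(dst_local, src_local).to_bytes(6, "little")

word = pack(0xABCDEF, 0x123456)
entry = encode(1, 2)
```

Sending only the local parts works because the receiving thread already knows which global range it owns, and the header identifies the sending thread.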

  11. Related Work (Checconi et al., IPDPS’14)

  12. Related Work (Checconi et al., IPDPS’14)

  13. Related Work (Checconi et al., IPDPS’14)

  14. END.
