MPADS: Memory-Pooling-Assisted Data Splitting

Stephen Curial - Xymbiant Systems Inc.; Peng Zhao - Intel Corporation; J. Nelson Amaral - University of Alberta; Yaoqing Gao, Shimin Cui, Raul Silvera, Roch Archambault - IBM Toronto Software Laboratory.


Presentation Transcript


  1. MPADS: Memory-Pooling-Assisted Data Splitting
     Stephen Curial - Xymbiant Systems Inc.
     Peng Zhao - Intel Corporation
     J. Nelson Amaral - University of Alberta
     Yaoqing Gao, Shimin Cui, Raul Silvera, Roch Archambault - IBM Toronto Software Laboratory
     FROM SUN MICROSYSTEMS
     José Nelson Amaral

  2. Goal (ISMM 2008)
     • What: Improve spatial locality
     • Where: Link-based data structures
     • How:
       • Pooling similar structures together
       • Grouping the same fields from multiple objects together

  3. Goal (cont.)
     • Why:
       • Because we can
       • Allow easy-to-write, easy-to-read, easy-to-maintain code to achieve better performance
     • What compiler: IBM XL compiler suite
     • Limitation: Needs more precise pointer analysis to benefit from more opportunities

  4. Most Relevant Earlier Work
     • Pool Allocation
       • Lattner and Adve (CGO 04, PLDI 05)
     • Reference Affinity
       • Zhong, Orlovich, Shen, Ding (PLDI 04)
       • Rabbah and Palem (TECS 03)
     • Array Reshaping
       • Zhao, Cui, Gao, Silvera, Amaral (TOPLAS 07)

  5. A refreshing outcome
     “MPADS is not the first implementation of the combination of memory pools and splitting of pointer-based data structures.”
     “MPADS is still not delivering its full potential on standard benchmarks in the IBM XL compiler.”
     Reviewer’s Comment: “The technique only worked for Olden, and did nothing for SPECcpu2000 (but the authors get bonus points for being honest about that.)”

  6. The Cost of Programming Productivity
     • Easy-to-read and easy-to-maintain code often results in lower runtime performance.
     [Figure: classes Student, University, Class]

  7. The Cost of Programming Productivity
     • Abstraction
     • Inheritance
     [Figure: class diagram with Person, Student, Support Staff, Professor]

  8. The Cost of Programming Productivity
     • Data Encapsulation
     [Figure: Student with fields Univ. ID, Date of Adm, Faculty, Department, Program, Classes Enr., Grades; Person with fields Name, Address, Date of Birth, Driver Lic., Gender, Citizenship]

  9. A possible data layout
     Student: Univ. ID 4 bytes; Date of Adm 4 bytes; Faculty 1 byte; Department 1 byte; Program 2 bytes; Classes Enr. 4 bytes; Grades 4 bytes; one further 4-byte entry (unlabeled in the transcript; the later slides draw a pointer, •, at the end of each Student record)
     Person: Name 32 bytes; Address 32 bytes; Date of Birth 4 bytes; Driver Lic. 3 bytes; Gender 1 byte; Citizenship 16 bytes
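
The layout on this slide can be written down as two C structures. This is only a sketch: the byte sizes come from the slide, while the field names, the C types, and the choice to store the Person reference as a 4-byte index (so the size stays 24 bytes on 64-bit machines) are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Person record: 32 + 32 + 4 + 3 + 1 + 16 = 88 bytes. */
typedef struct Person {
    char     name[32];
    char     address[32];
    uint32_t date_of_birth;
    char     driver_lic[3];
    uint8_t  gender;
    char     citizenship[16];
} Person;

/* Student record: 4 + 4 + 1 + 1 + 2 + 4 + 4 + 4 = 24 bytes. */
typedef struct Student {
    uint32_t univ_id;
    uint32_t date_of_adm;
    uint8_t  faculty;
    uint8_t  department;
    uint16_t program;
    uint32_t classes_enr;
    uint32_t grades;
    uint32_t person;   /* assumed: 4-byte reference to the Person record */
} Student;

/* The totals match the 24-byte and 88-byte figures quoted on later slides. */
_Static_assert(sizeof(Student) == 24, "Student should occupy 24 bytes");
_Static_assert(sizeof(Person) == 88, "Person should occupy 88 bytes");
```

With natural alignment no padding is needed, so a 128-byte cache line holds a little over five Student records but only one full Person record.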

  10. Data in Memory
      [Figure: memory drawn 8 bytes per row (byte columns 0 to 7) against the memory address. Student records (Univ. ID, Date of Adm., Fa., De, Progr., Classes Enr., Grades) start at address 0; a Person record (Name, Address, Date of Birth, Dr. Lic., Ge, Citizenship) starts at address 8000.]

  11. Assume a Cache Organization
      • POWER5 Cache Organization
        • L1 Data Cache: 32 Kbytes, 128-byte cache lines
        • L2 Cache: 1.44 Mbytes, 128-byte cache lines
        • L3 Cache: 32 Mbytes, 512-byte cache lines

  12. Cache Organization
      [Figure: the cache drawn as a grid of cache lines 0 to 255, each holding bytes 0 to 127]

  13. Example: A search through the data structures
      How many Computing Science students are younger than 23 years old?
      [Figure: Student records (Univ.ID, Adm., F., D., Prg, Class., Grades, •) packed into 128-byte cache lines]

  14. Example: A search through the data structures
      Student structure: for every 24 bytes loaded, the search reads either 1 or 5 bytes.
      [Figure: Student records (Univ.ID, Adm., F., D., Prg, Class., Grades, •) packed into 128-byte cache lines]

  15. Example: A search through the data structures
      [Figure: as on the previous slide, plus a Person record in another cache line: Name at byte 0, Address at 32, DofB at 64, DL. at 68, G, Citizens. at 72]

  16. Example: A search through the data structures
      Person structure: for every 88 bytes loaded, the search reads 4 bytes.
      [Figure: Student records and a Person record (Name, Address, DofB, DL., G, Citizens.) in 128-byte cache lines]
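
The query on these slides can be sketched in C to show which bytes it actually touches. The structures follow the sizes on the data-layout slide; the department code `DEPT_COMPUTING_SCIENCE`, the YYYYMMDD date encoding, and the function name are hypothetical. A real pointer replaces the 4-byte reference here, so on a 64-bit machine `Student` is larger than 24 bytes.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct Person {
    char     name[32];
    char     address[32];
    uint32_t date_of_birth;     /* assumed encoding: YYYYMMDD */
    char     driver_lic[3];
    uint8_t  gender;
    char     citizenship[16];
} Person;

typedef struct Student {
    uint32_t      univ_id;
    uint32_t      date_of_adm;
    uint8_t       faculty;
    uint8_t       department;
    uint16_t      program;
    uint32_t      classes_enr;
    uint32_t      grades;
    const Person *person;
} Student;

enum { DEPT_COMPUTING_SCIENCE = 7 };    /* hypothetical department code */

/* Counts Computing Science students younger than 23 on the date `today`. */
size_t count_young_cs_students(const Student *students, size_t n, uint32_t today)
{
    uint32_t cutoff = today - 230000u;  /* same day, 23 years earlier */
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        /* Reads 1 byte of the Student record. */
        if (students[i].department != DEPT_COMPUTING_SCIENCE)
            continue;
        /* Reads the pointer, then 4 bytes of the 88-byte Person record. */
        if (students[i].person->date_of_birth > cutoff)
            count++;
    }
    return count;
}
```

Every cache line fetched for this loop is mostly filled with fields the query never looks at, which is the waste that splitting aims to remove.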

  17. Data Reshaping for Arrays of Structures
      Student *ListOfStudents;
      ….
      ListOfStudents = (Student*)malloc(….);
      [Figure: the array of Student records (Univ. ID, Date of Adm., Fa., De, Progr., Classes Enr., Grades) shown record by record, and reshaped so that the same field of consecutive records is stored together]

  18. Maximal Structure Splitting
      [Figure: three linked nodes (1, 2, 3), each holding ID, Adm, Fac, Dep, Clas, Grad, next to the split layout: ID1 ID2 ID3 | Adm1 Adm2 Adm3 | Fac1 Fac2 Fac3 | Dep1 Dep2 Dep3 | Clas1 Clas2 Clas3 | Grad1 Grad2 Grad3]
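
A hand-written version of the split layout gives a feel for what the compiler transformation produces. The sketch below converts an array of records into one array per field; the type and function names are hypothetical, and MPADS performs the equivalent change automatically inside memory pools rather than through a copy like this.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Original record layout (fields as drawn on the slide). */
typedef struct Student {
    uint32_t univ_id;
    uint32_t date_of_adm;
    uint8_t  faculty;
    uint8_t  department;
    uint32_t classes_enr;
    uint32_t grades;
} Student;

/* Maximally split layout: one contiguous array per field. */
typedef struct SplitStudents {
    size_t    n;
    uint32_t *univ_id;
    uint32_t *date_of_adm;
    uint8_t  *faculty;
    uint8_t  *department;
    uint32_t *classes_enr;
    uint32_t *grades;
} SplitStudents;

void split_free(SplitStudents *sp)
{
    free(sp->univ_id);     sp->univ_id = NULL;
    free(sp->date_of_adm); sp->date_of_adm = NULL;
    free(sp->faculty);     sp->faculty = NULL;
    free(sp->department);  sp->department = NULL;
    free(sp->classes_enr); sp->classes_enr = NULL;
    free(sp->grades);      sp->grades = NULL;
    sp->n = 0;
}

/* Copies n records into the split layout; returns 0 on success, -1 on failure. */
int split_students(const Student *in, size_t n, SplitStudents *out)
{
    size_t slots = n ? n : 1;   /* avoid a zero-sized allocation */
    out->n = n;
    out->univ_id     = calloc(slots, sizeof *out->univ_id);
    out->date_of_adm = calloc(slots, sizeof *out->date_of_adm);
    out->faculty     = calloc(slots, sizeof *out->faculty);
    out->department  = calloc(slots, sizeof *out->department);
    out->classes_enr = calloc(slots, sizeof *out->classes_enr);
    out->grades      = calloc(slots, sizeof *out->grades);
    if (!out->univ_id || !out->date_of_adm || !out->faculty ||
        !out->department || !out->classes_enr || !out->grades) {
        split_free(out);
        return -1;
    }
    for (size_t i = 0; i < n; i++) {
        out->univ_id[i]     = in[i].univ_id;
        out->date_of_adm[i] = in[i].date_of_adm;
        out->faculty[i]     = in[i].faculty;
        out->department[i]  = in[i].department;
        out->classes_enr[i] = in[i].classes_enr;
        out->grades[i]      = in[i].grades;
    }
    return 0;
}

/* A scan over one field now touches only that field's bytes. */
size_t count_in_department(const SplitStudents *sp, uint8_t dept)
{
    size_t count = 0;
    for (size_t i = 0; i < sp->n; i++)
        if (sp->department[i] == dept)
            count++;
    return count;
}
```

After splitting, a scan of `department` reads 128 useful bytes per 128-byte cache line instead of one byte per record.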

  19. Implementation of Pool Allocation
      • Intercept mallocs and replace them by pool allocation: each structure layout gets its own pool.
      • If a pool is full, another pool can be allocated.
      [Figure: a pool holding the split fields (ID, Adm, Fac, Dep, Clas, Grad) of several linked nodes, with node 7 placed in a second pool]
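
The two bullets above can be sketched as a small bump allocator. This is a simplified stand-in, not the XL runtime: the names, the 4096-byte pool size (suggested by the `0xF…F000` mask on the address-calculation slide), and the fixed cap on the number of pools are assumptions. It hands out whole objects, whereas a split pool would advance by the size of the first field only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { POOL_BYTES = 4096, MAX_BLOCKS = 64 };   /* assumed sizes */

typedef struct Pool {
    size_t obj_size;                   /* bytes handed out per allocation */
    size_t capacity;                   /* objects that fit in one block */
    size_t used;                       /* objects taken from the newest block */
    size_t nblocks;                    /* blocks allocated so far */
    unsigned char *blocks[MAX_BLOCKS]; /* each block is POOL_BYTES-aligned */
} Pool;

void pool_init(Pool *p, size_t obj_size)
{
    assert(obj_size > 0 && obj_size <= POOL_BYTES);
    p->obj_size = obj_size;
    p->capacity = POOL_BYTES / obj_size;
    p->used = 0;
    p->nblocks = 0;
}

/* Replacement for malloc(sizeof(T)) at the intercepted allocation sites. */
void *pool_alloc(Pool *p)
{
    if (p->nblocks == 0 || p->used == p->capacity) {
        /* The current pool is full (or none exists yet): allocate another. */
        if (p->nblocks == MAX_BLOCKS)
            return NULL;
        unsigned char *fresh = aligned_alloc(POOL_BYTES, POOL_BYTES);
        if (fresh == NULL)
            return NULL;
        p->blocks[p->nblocks++] = fresh;
        p->used = 0;
    }
    return p->blocks[p->nblocks - 1] + p->obj_size * p->used++;
}

void pool_destroy(Pool *p)
{
    for (size_t i = 0; i < p->nblocks; i++)
        free(p->blocks[i]);
    p->nblocks = 0;
    p->used = 0;
}
```

Aligning every block to its own size means the base of the pool can later be recovered from any object address by masking off the low bits.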

  20. Implementing Pool Allocation
      • The following types of statements need to be transformed:
        • Memory allocation statements
        • Memory reference statements

  21. Transforming Memory Allocation Statements
      • Extended pointer analysis to maintain a set of allocation sites associated with each alias set.
      • When an alias set is selected for transformation:
        • Replace each associated allocation with a call to the pool allocation function.

  22. Transforming Memory References
      • Update address calculation for loads and stores:
        • Uniform splitting: all fields are the same size
          • Address calculation is simpler
          • Restricts where the technique can be applied, or requires memory padding
        • Non-uniform splitting: fields of different sizes
          • Address calculation is more involved
          • Can be applied more generally

  23. Non-Uniform Example
      struct example {
        type_3 a; /* 3 bytes */
        type_7 b; /* 7 bytes */
        type_5 c; /* 5 bytes */
      };
      How can the compiler find the address to access s->c?
      pool_base = s & 0xF…F000
      index = (s - pool_base) / 3
      field_base = (3+7)*num_structs_per_pool
      s->c = *(s + field_base - 3*index + 5*index)
      s->c = *(s + field_base + (5-3)*index)
      [Figure: a pool with pool_base, s, and field_base marked]
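
The address arithmetic on this slide can be checked with a short C sketch. It assumes a 4096-byte pool aligned to its own size, so that masking the low 12 bits of `s` yields `pool_base`, and it assumes the pool stores all `a` fields first, then all `b` fields, then all `c` fields. The function names and `STRUCTS_PER_POOL` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum { POOL_BYTES = 4096, NUM_FIELDS = 3 };

/* Field sizes of struct example: a = 3, b = 7, c = 5 bytes. */
static const size_t FIELD_SIZE[NUM_FIELDS] = { 3, 7, 5 };

/* 4096 / (3 + 7 + 5) = 273 structures fit in one pool. */
enum { STRUCTS_PER_POOL = POOL_BYTES / (3 + 7 + 5) };

/* Allocates one pool aligned to POOL_BYTES (assumed pool layout). */
unsigned char *new_pool(void)
{
    return aligned_alloc(POOL_BYTES, POOL_BYTES);
}

/* s points at field 0 (a) of some structure; returns the address of `field`. */
unsigned char *field_addr(unsigned char *s, int field)
{
    uintptr_t pool_base = (uintptr_t)s & ~(uintptr_t)(POOL_BYTES - 1);
    size_t index = (size_t)((uintptr_t)s - pool_base) / FIELD_SIZE[0];

    /* Start of the region holding `field`: all earlier fields of all structs. */
    size_t field_base = 0;
    for (int f = 0; f < field; f++)
        field_base += FIELD_SIZE[f] * STRUCTS_PER_POOL;

    /* s + field_base - 3*index + 5*index for field c, as on the slide. */
    return s - FIELD_SIZE[0] * index + field_base + FIELD_SIZE[field] * index;
}
```

For the structure with index 2, `s` sits 6 bytes into the pool and `field_addr(s, 2)` lands at offset 10*273 + 5*2 = 2740, which agrees with `s + field_base + (5-3)*index`.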

  24. Data Transformation Safety
      • How does the compiler decide whether it is safe to transform a given structure?
        • Based on the results of the pointer analysis.

  25. Is it safe to transform a given data structure?
      Structure layout: two structures have the same layout if each field has the same offset and the same length.
      • Build alias set
      • If a pointer P may point to the structure
        • Then all the objects in the points-to set of the alias set of P must have the same layout.
      [Figure: an alias set containing pointers P and Q, and its points-to set containing Data Struct 1 and Data Struct 2]
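
The layout condition stated on this slide is easy to express in code. The sketch below only illustrates that one condition, using a hypothetical `Layout` descriptor; it says nothing about how the XL compiler represents alias sets or what other checks it performs.

```c
#include <stdbool.h>
#include <stddef.h>

enum { MAX_FIELDS = 8 };

/* Hypothetical layout descriptor: offset and length of every field. */
typedef struct Layout {
    size_t nfields;
    size_t offset[MAX_FIELDS];
    size_t length[MAX_FIELDS];
} Layout;

/* Same layout: each field has the same offset and the same length. */
bool same_layout(const Layout *x, const Layout *y)
{
    if (x->nfields != y->nfields)
        return false;
    for (size_t i = 0; i < x->nfields; i++)
        if (x->offset[i] != y->offset[i] || x->length[i] != y->length[i])
            return false;
    return true;
}

/* All objects in the points-to set of an alias set must share one layout. */
bool safe_to_transform(const Layout *const *points_to_set, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (!same_layout(points_to_set[0], points_to_set[i]))
            return false;
    return true;
}
```

If P and Q are in one alias set and may point to Data Struct 1 or Data Struct 2, both structures must pass `same_layout` before either is split.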

  26. Experimental Results - Micro Benchmarks (Speedup)
      [Charts for Power 4 and Power 5. Benchmarks: Linked List 1A, Linked List 1B, Linked List 2, Linked List 2 w/ alloc, Binary Tree, Binary Tree w/ alloc]

  27. Experimental Results - Micro Benchmarks (Instruction Count)
      [Charts for Power 4 and Power 5. Benchmarks: Linked List 1A, Linked List 1B, Linked List 2, Linked List 2 w/ alloc, Binary Tree, Binary Tree w/ alloc]

  28. Experimental Results - Micro Benchmarks (L2 Cache Misses)
      [Charts for Power 4 and Power 5. Benchmarks: Linked List 1A, Linked List 1B, Linked List 2, Linked List 2 w/ alloc, Binary Tree, Binary Tree w/ alloc]

  29. Experimental Study - Olden & LLU (Speedup)
      [Charts for Power 4 and Power 5. Benchmarks: tsp, llu, bh, em3d, health, power]

  30. Active Hardware Prefetch Streams
      [Chart: Active Prefetching Streams from Memory to L2 (in POWER4)]

  31. Related Work
      • Pool Allocation
        • Lattner & Adve - PLDI 2005
          • Data Structure Analysis
      • Array Based Structure Splitting
        • Zhong et al. - PLDI 2004
          • Reference affinity / affinity based splitting
          • Memory Trace
      • Safe Pointer Based Structure Splitting
        • Jeon, Shin and Han - CC 2007
          • Similar to non-uniform splitting
          • Affinity based splitting uses static analysis
          • Regular expression framework
          • Guarantee Safety with regular expressions

  32. Final Remarks
      • Our Compiler-Research Guiding Principles
        • Programming productivity
          • Enables programmers to be efficient
          • Enables easy-to-write/easy-to-maintain programs
        • Execution Time Performance
          • Recover runtime efficiency (time, storage or energy) through
            • Code analysis
            • Improved code generation
            • Knowledge of computer architecture and memory hierarchy

  35. Pointer Analysis Primer
      • The following statement: int *a = malloc(…);
      • Creates:
        • a memory object (A),
        • a pointer (a),
        • and a points-to relation (a,A)
      [Figure: a pointing to A]

  36. Alias Analysis Primer: Andersen's vs. Steensgaard's (Shapiro/Horwitz, POPL 97)
      Program: a = &b;
      Steensgaard (unification-based): S = {(a,b)}
      Andersen: S = {(a,b)}
      [Figure: both points-to graphs show a pointing to b]

  37. Alias Analysis Primer: Andersen's vs. Steensgaard's (Shapiro/Horwitz, POPL 97)
      Program: a = &b; b = &c;
      Steensgaard (unification-based): S = {(a,b); (b,c)}
      Andersen: S = {(a,b); (b,c)}
      [Figure: both points-to graphs show a pointing to b and b pointing to c]

  38. Alias Analysis Primer: Andersen's vs. Steensgaard's (Shapiro/Horwitz, POPL 97)
      Program: a = &b; b = &c; a = &d;
      Steensgaard (unification-based): S = {(a,b); (b,c)}
      What should happen in the Steensgaard analysis?
      Andersen: S = {(a,b); (b,c); (a,d)}
      [Figure: the Andersen graph gains node d; the Steensgaard graph is not yet updated]

  39. Alias Analysis Primer: Andersen's vs. Steensgaard's (Shapiro/Horwitz, POPL 97)
      Program: a = &b; b = &c; a = &d;
      Steensgaard (unification-based): S = {(a,b); (b,c); (a,d); (d,c)}
      Andersen: S = {(a,b); (b,c); (a,d)}
      [Figure: the Steensgaard graph merges b and d into one node (b,d) that points to c; the Andersen graph keeps b and d separate]

  40. Alias Analysis Primer: Andersen's vs. Steensgaard's (Shapiro/Horwitz, POPL 97)
      Program: a = &b; b = &c; a = &d; d = &e;
      Steensgaard (unification-based): S = {(a,b); (b,c); (a,d); (d,c)}
      And now?
      Andersen: S = {(a,b); (b,c); (a,d)}
      [Figure: both graphs as on the previous slide, before the new statement is processed]

  41. Alias Analysis Primer: Andersen's vs. Steensgaard's (Shapiro/Horwitz, POPL 97)
      Program: a = &b; b = &c; a = &d; d = &e;
      Steensgaard (unification-based): S = {(a,b); (b,c); (a,d); (d,c); (d,e); (b,e)}
      Andersen: S = {(a,b); (b,c); (a,d); (d,e)}
      [Figure: the Steensgaard graph has merged nodes (b,d) and (c,e); the Andersen graph keeps a, b, c, d, e separate]
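
The merging shown on the Steensgaard side of these slides can be reproduced with a small union-find sketch. It handles only statements of the form `p = &x`, which is all the example program uses; the variable numbering and function names are hypothetical, and a full Steensgaard analysis also covers copies, loads, and stores.

```c
enum { MAX_VARS = 16, NONE = -1 };
enum { VAR_A, VAR_B, VAR_C, VAR_D, VAR_E };   /* a, b, c, d, e in the program */

static int parent[MAX_VARS];   /* union-find forest over variables */
static int target[MAX_VARS];   /* single pointee of each class, or NONE */

void steens_init(void)
{
    for (int i = 0; i < MAX_VARS; i++) {
        parent[i] = i;
        target[i] = NONE;
    }
}

int steens_find(int x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];   /* path halving */
        x = parent[x];
    }
    return x;
}

/* Merges the classes of x and y and, recursively, the classes they point to. */
void steens_unify(int x, int y)
{
    x = steens_find(x);
    y = steens_find(y);
    if (x == y)
        return;
    int tx = target[x], ty = target[y];
    parent[y] = x;
    if (tx == NONE)
        target[x] = ty;
    else if (ty != NONE)
        steens_unify(tx, ty);
}

/* Processes the statement p = &x. */
void steens_address_of(int p, int x)
{
    int rp = steens_find(p);
    if (target[rp] == NONE)
        target[rp] = steens_find(x);
    else
        steens_unify(target[rp], x);   /* each class keeps a single pointee */
}

int steens_same_class(int x, int y)
{
    return steens_find(x) == steens_find(y);
}

/* Representative of the class that p points to, or NONE. */
int steens_pointee(int p)
{
    int t = target[steens_find(p)];
    return t == NONE ? NONE : steens_find(t);
}
```

Feeding it `a = &b; b = &c; a = &d; d = &e;` merges b with d and then c with e, matching the (b,d) and (c,e) nodes on the last slide, whereas Andersen's analysis would keep all five variables separate.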
