Software-Hardware Cooperative Memory Disambiguation


  1. Software-Hardware Cooperative Memory Disambiguation
     Ruke Huang, Alok Garg, and Michael Huang
     Department of Electrical & Computer Engineering, University of Rochester
     "Software-Hardware Cooperative Memory Disambiguation", Alok Garg, HPCA 2006

  2. Motivation
     • Hiding long latencies
     • Scaling up of many structures
     • Complex, hard to design
     • Consumes more energy
     • Slower
     • Inefficiency in hardware
     • Meticulously keep track of all instructions
     • No prior knowledge of out-of-order execution
     • Simply cross-compare all loads and stores
     [Figure labels: 16%; LQ Size. Configuration: ROB size: 320, SQ size: 48, LQ size: 48]

  3. Software Assistance
     • Global information
     • Statically identify non-conflicting memory accesses
     • Advantages
     • Reduced resource pressure
     • Energy savings
     • Loads not requiring memory disambiguation
     • On average, 43% of dynamic loads in SPEC FP applications

  4. Recent Research
     • Chrysos and Emer (ISCA’98)
     • Sethumadhavan et al. (MICRO’03)
     • Park et al. (MICRO’03)
     • Baugh and Zilles (PACC’04)
     • Akkary et al. (MICRO’03)
     • Gandhi et al. (ISCA’05), etc.
     Hardware-only: Provisioning, recurring overhead
     Cooperative: Consumption, one-time overhead

  5. Outline
     • Cooperative Memory Disambiguation
     • Framework
     • Evaluation
     • Conclusion

  6. Cooperative Memory Disambiguation: Resource-Effective Approach
     • 90% of dynamic loads do not communicate with in-flight stores
     • Many loads do not require memory disambiguation resources
     • Safe loads: a software analyzer can identify them
     • Can exploit hardware-specific information
     • Hardware resources are used only for non-safe loads
     Example:
       int A[1000], B[1000];
       void VecAdd() {
           for (int i = 0; i < 1000; i++)
               A[i] = A[i] + B[i];
       }

  7. Cooperative Memory Disambiguation Framework
     • Software-hardware Interface
     • Decoupled ISA (No compatibility obligations)
     • Software Support
     • Binary to binary translator - alto (Muth et al.)
     • Binary analyzer
     • Identify read-only data loads
     • Identify other general safe loads
     • Architectural Support
     • Light-weight
     [Diagram labels: Source; compiler; Compilation; Original binary; Hardware specific translator; ISA Translator; Hardware specific internal binary; Hardware; Extended instruction set]

  8. General Safe Loads
     • Scope of parser analysis
     • Steady state loop
     • No internal control flow
     • Limited in-flight instructions
     • ROB size, store queue size
     [Diagram labels: Instruction window; Simple loop body (… Load … Store … Branch …); Steady state loop execution over iterations i-2, i-1, i]

  9. General Safe Loads (Cont.): Real example from a SPEC FP application
     One loop from galgel:
       0x120033140: ldl   r31, 256(r3)      ; prefetch
       0x120033144: ldt   f21, 0(r3)        ; Ld1
       0x120033148: lda   r27, -2(r27)      ; r27 = r27-2
       0x12003314c: lda   r3, 16(r3)        ; r3 = r3+16
       0x120033150: ldt   f22, -8(r3)       ; Ld2
       0x120033154: ldt   f23, 0(r11)       ; Ld3
       0x120033158: cmple r27, 0x1, r1      ;
       0x12003315c: lda   r11, 16(r11)      ; r11 = r11+16
       0x120033160: ldt   f24, -8(r11)      ; Ld4
       0x120033164: lds   f31, 240(r11)     ; prefetch
       0x120033168: mult  f20, f21, f21     ;
       0x12003316c: mult  f20, f22, f22     ;
       0x120033170: addt  f23, f21, f21     ;
       0x120033174: addt  f24, f22, f22     ;
       0x120033178: stt   f21, -16(r11)     ; St1
       0x12003317c: stt   f22, -8(r11)      ; St2
       0x120033180: beq   r1, 0x120033140   ;
     (The slide repeats this listing; in the repeat, the load at 0x120033154 is labelled Ld2, which is the numbering used in the analysis below.)
     AddrLd1 = _R3+16*i
     AddrLd2 = _R11+16*i
     AddrSt1 = _R11+16*i
     AddrSt2 = _R11+16*i+8
     Analysis window: 16 iterations
     Address range = _R11+(i-16)*16 to _R11+(i-1)*16+8
     Ld2 is statically determined to be safe
     Ld1 needs run-time evaluation
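
The static check on this slide can be sketched in C as an interval-overlap test between a load and the stores of the preceding iterations. This is only an illustration under stated assumptions, not the authors' analyzer: the `affine_access` type and the `load_safe_vs_store` function are hypothetical names, every access is assumed to be 8 bytes wide (`ldt` and `stt` move 64-bit values), and only accesses that share a base register and a stride are handled. Anything else is conservatively reported as not safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: byte address = base + stride * iteration + disp. */
typedef struct {
    int64_t stride; /* bytes advanced per loop iteration */
    int64_t disp;   /* constant displacement from the shared base */
    int64_t size;   /* access width in bytes */
} affine_access;

/* Half-open byte ranges [a, a + a_size) and [b, b + b_size). */
static bool ranges_overlap(int64_t a, int64_t a_size, int64_t b, int64_t b_size)
{
    return a < b + b_size && b < a + a_size;
}

/*
 * Returns true when the load of iteration i cannot touch the bytes written by
 * the store in any of the `window` preceding iterations (i-1 ... i-window).
 * Assumes both accesses are expressed relative to the same base register.
 */
bool load_safe_vs_store(affine_access ld, affine_access st, int64_t window)
{
    assert(window >= 0 && ld.size > 0 && st.size > 0);
    if (ld.stride != st.stride)
        return false; /* conservative: not analyzed in this sketch */

    for (int64_t k = 1; k <= window; k++) {
        /* Load address minus the address of the store k iterations earlier. */
        int64_t delta = ld.stride * k + ld.disp - st.disp;
        if (ranges_overlap(delta, ld.size, 0, st.size))
            return false;
    }
    return true;
}
```

With the slide's equations, Ld2 (`_R11+16*i`) checked against St1 (`_R11+16*i`) and St2 (`_R11+16*i+8`) over a 16-iteration window reports no overlap, which agrees with the slide's conclusion that Ld2 is statically safe. Ld1 uses a different base register (`_R3`), so it falls outside this check and is left to the run-time comparison.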

  10. General Safe Loads (Cont.): Real example from a SPEC FP application
      Modified code:
        New_entry:
          mark_sq
          if (r3-r11+8>0) or (r3-r11+264<0) then cset CR0, 1
        0x120033144: sldt  f21, 0(r3), [CR0]       ; Ld1 (safe)
        0x12003314c: lda   r3, 16(r3)              ; r3 = r3+16
        0x120033154: sldt  f23, 0(r11), [CR_TRUE]  ; Ld2 (safe)
        0x120033158: cmple r27, 0x1, r1            ;
        0x12003315c: lda   r11, 16(r11)            ; r11 = r11+16
        0x120033174: addt  f24, f22, f22           ;
        0x120033178: stt   f21, -16(r11)           ; St1
        0x12003317c: stt   f22, -8(r11)            ; St2
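
The guard that sets CR0 in the modified code can be written out as a small C predicate. This is a sketch that copies the condition exactly as printed on the slide. The function names are hypothetical, and `r3` and `r11` stand for the register values at the point where the guard runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the slide's condition: (r3-r11+8 > 0) or (r3-r11+264 < 0). */
bool ld1_runtime_safe(int64_t r3, int64_t r11)
{
    int64_t diff = r3 - r11;
    return (diff + 8 > 0) || (diff + 264 < 0);
}

/* Usage example with a few representative register values. */
void ld1_runtime_safe_examples(void)
{
    assert(ld1_runtime_safe(0x1000, 0x1000));        /* r3 at r11: first test holds */
    assert(ld1_runtime_safe(0x1000 - 1024, 0x1000)); /* far below r11: second test holds */
    assert(!ld1_runtime_safe(0x1000 - 128, 0x1000)); /* between the two bounds */
}
```

When the predicate holds, the translated code executes `cset CR0, 1`, and the load tagged `[CR0]` is then treated as safe.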

  11. Safe stores
      • A store is safe if it does not communicate with future loads
      • Safe stores are used to indirectly discover safe loads
      • Un-analyzable store
      • A load is safe if all stores in the SQ are safe
      • Summary of safe load detection
      • Simple loop body
      • All stores must be analyzable
      • Address range calculation
      [Diagram labels: Loop Body (… Load (A) … Store1 (UA) … Store2 (A) … Branch …), repeated as In-flight instructions]
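
The rule "a load is safe if all stores in the SQ are safe" can be illustrated with a short C sketch. The `sq_entry` layout and the function name are assumptions made for this example; the slides do not describe how the store queue records the safe marking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical store-queue entry for illustration only. */
typedef struct {
    bool valid; /* entry holds an in-flight store */
    bool safe;  /* store was marked safe by the software analyzer */
} sq_entry;

/* Returns true when no in-flight store in the queue is non-safe. */
bool load_safe_by_store_queue(const sq_entry *sq, size_t n)
{
    assert(sq != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (sq[i].valid && !sq[i].safe)
            return false; /* one non-safe store is enough to block the load */
    }
    return true;
}
```

An empty queue trivially satisfies the rule, and a single in-flight store that is not marked safe makes the load fall back to normal disambiguation in this model.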

  12. Architectural Support
      • Safe loads
      • Boolean condition registers
      • cset (instruction)
      • Safe stores
      • Scope marker
      • Indirect jumps
      • Flash-reset all condition registers

  13. Outline
      • Cooperative Memory Disambiguation
      • Framework
      • Evaluation
      • Conclusion

  14. Experimental Setup
      • Modified SimpleScalar 3.0b simulator
      • Wattch to estimate dynamic energy consumption
      • SPEC CPU2000 benchmark suite

  15. Breakdown of Safe Loads (FP)
      [Chart; labelled values: 97%, 43%]

  16. Performance Improvement (FP)
      [Chart; labelled value: 40/48%]

  17. Breakdown of Safe Loads (INT)
      [Chart]

  18. Performance Improvement (INT)
      [Chart]

  19. Energy Savings
      [Charts: Floating-point applications; Integer applications]

  20. Conclusions
      • Software assistance improves LSQ efficiency
      • Detects 43% of loads as safe on average
      • Average 10% performance gain
      • Compiler techniques for optimization of micro-architecture resources
      • Future work
      • More powerful static analyzer
      • Manage other micro-architecture resources
      • E.g., register file

  21. Thank you! Questions?

  22. Support for Coherency
      Hash Table: 2-bit
      • Total entries: 512
      • Details: http://www.ece.rochester.edu/~mihuang/PAPERS/hpca06tr.pdf
      [Diagram labels: Table 1; Table 2; Access bit; Invalidation bit]

  23. Read-Only Data Loads
      • Alpha COFF binary header
      • Global pointer (GP)
      • Read-only sections
      • Access address calculation
      • Algorithm - extended constant propagation
      Example: gp=0x120022000; Read-Only Section Start: 0x120023000, End: 0x120024000
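
Once the analyzer has resolved a load's address to the global pointer plus a constant displacement, the read-only test reduces to a range check. The C sketch below is an illustration under assumptions: the `section` type and the function name are hypothetical, the section end is taken to be exclusive, and the constant-propagation step that produces the displacement is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical section descriptor; the range is assumed to be [start, end). */
typedef struct {
    uint64_t start;
    uint64_t end;
} section;

/* Returns true when all `size` bytes at gp + disp lie inside the read-only section. */
bool is_read_only_load(uint64_t gp, int64_t disp, uint64_t size, section ro)
{
    assert(ro.start <= ro.end && size > 0);
    uint64_t addr = gp + (uint64_t)disp; /* wraps like hardware address arithmetic */
    return addr >= ro.start && addr <= ro.end && size <= ro.end - addr;
}
```

Using the slide's values (gp=0x120022000, section 0x120023000 to 0x120024000), an 8-byte load at displacement 0x1000 lands on the first byte of the section and qualifies, while a load at displacement 0 does not.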
