
Support for Symmetric Shadow Memory in Multiprocessors



  1. Support for Symmetric Shadow Memory in Multiprocessors Vijay Nagarajan Rajiv Gupta University of California, Riverside

  2. Runtime Monitoring • Applications of monitoring • Security: DIFT (dynamic information flow tracking) • Debugging: Memcheck, Redux, OnTrac • Performance: speculation • Requirements of monitoring • Shadow Memory (SM): meta-data associated with memory locations • Shadow memory instructions (SMIs): instructions that maintain the meta-data

  3. DIFT: Example • Each word/register is associated with a “taint” value • Data from input channels is considered tainted • The flow of tainted data is tracked • Use of tainted data in a “malicious” fashion is detected
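The taint rules on this slide can be sketched as a toy model. This is only an illustration under stated assumptions: the class `TaintTracker`, its method names, and the choice of an indirect jump as the "malicious" use are hypothetical and do not come from the presentation.

```python
class TaintTracker:
    """Toy DIFT model (hypothetical names): one taint bit per word/register."""

    def __init__(self):
        self.mem = {}      # address -> value (original memory)
        self.shadow = {}   # address -> taint bit (shadow memory)
        self.regs = {}     # register -> (value, taint)

    def read_input(self, addr, value):
        # Data from an input channel is considered tainted.
        self.mem[addr] = value
        self.shadow[addr] = True

    def store_const(self, addr, value):
        # Program constants are untainted.
        self.mem[addr] = value
        self.shadow[addr] = False

    def load(self, reg, addr):
        # Original load plus shadow load of the taint bit.
        self.regs[reg] = (self.mem[addr], self.shadow.get(addr, False))

    def add(self, dst, a, b):
        # Taint flows from either operand to the result.
        va, ta = self.regs[a]
        vb, tb = self.regs[b]
        self.regs[dst] = (va + vb, ta or tb)

    def store(self, addr, reg):
        # Original store plus shadow store of the taint bit.
        value, taint = self.regs[reg]
        self.mem[addr] = value
        self.shadow[addr] = taint

    def jump_indirect(self, reg):
        # Assumed policy: using a tainted value as a jump target is an attack.
        value, taint = self.regs[reg]
        if taint:
            raise RuntimeError("tainted jump target")
        return value
```

Each `load` and `store` here pairs an original access with a shadow access to the same address, which is the pattern later slides call symmetric.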

  4. Shadow Memory Observations • Single vs. multiple shadow values • DIFT associates one taint value with each location • Other applications associate multiple shadow values • DDG computes the dynamic dependence graph on the fly: for each memory word, it maintains the (instruction, instance) pair that last wrote to it • Symmetric SMIs • Original stores (loads) are associated with shadow stores (loads) • Atomic SMIs • An original memory instruction (OMI) and its SMIs must be executed atomically
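A minimal sketch of the DDG bookkeeping described above, assuming a single thread and word-granularity addresses. `DDGTracker` and its fields are hypothetical names used for illustration only.

```python
from collections import defaultdict


class DDGTracker:
    """Toy dynamic dependence graph builder (hypothetical names)."""

    def __init__(self):
        self.last_writer = {}              # shadow memory: addr -> (instr, instance)
        self.instances = defaultdict(int)  # static instruction -> executions so far
        self.edges = []                    # (reader, writer) dependence edges

    def _next_instance(self, instr):
        self.instances[instr] += 1
        return (instr, self.instances[instr])

    def store(self, instr, addr):
        # Shadow store: remember which dynamic instance wrote this word last.
        self.last_writer[addr] = self._next_instance(instr)

    def load(self, instr, addr):
        # Shadow load: look up the last writer and record a dependence edge.
        reader = self._next_instance(instr)
        writer = self.last_writer.get(addr)
        if writer is not None:
            self.edges.append((reader, writer))
        return reader
```

Here the shadow value is a pair rather than a single bit, which is why this kind of tool needs more than one shadow word per memory word.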

  5. Atomic SMIs • [Figure: Proc A executes St1, S St1, St2, S St2 while Proc B executes Ld, S Ld; the interleavings shown are labeled “Inconsistent View” and “Atomicity”]
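The inconsistent view can be reproduced by enumerating interleavings of the two processors' instruction sequences. The sketch below is a simplified sequentially consistent model, not the presenters' methodology; all names are hypothetical, and the shadow value is assumed to equal the data value so that a mismatch is easy to detect.

```python
from itertools import combinations

# Each op is (kind, value). A consistent view means the loaded data value
# and the loaded shadow value come from the same store.
ST1, SST1, ST2, SST2 = ("st", 1), ("sst", 1), ("st", 2), ("sst", 2)
LD, SLD = ("ld", None), ("sld", None)


def run(schedule):
    """Execute one global order of ops; return what Proc B observed."""
    mem = shadow = 0
    val = tag = None
    for kind, value in schedule:
        if kind == "st":
            mem = value
        elif kind == "sst":
            shadow = value
        elif kind == "ld":
            val = mem
        elif kind == "sld":
            tag = shadow
    return val, tag


def interleavings(a_units, b_units):
    """Yield every merge of two unit lists that keeps each list's own order."""
    n = len(a_units) + len(b_units)
    for b_slots in combinations(range(n), len(b_units)):
        a_iter, b_iter = iter(a_units), iter(b_units)
        yield [next(b_iter) if i in b_slots else next(a_iter) for i in range(n)]


def count_inconsistent(a_units, b_units):
    """Return (inconsistent schedules, total schedules)."""
    bad = total = 0
    for order in interleavings(a_units, b_units):
        flat = [op for unit in order for op in unit]
        val, tag = run(flat)
        total += 1
        bad += val != tag
    return bad, total


# Every instruction is its own unit: OMIs and SMIs can be separated.
NON_ATOMIC = count_inconsistent([[ST1], [SST1], [ST2], [SST2]], [[LD], [SLD]])
# Each OMI is grouped with its SMI into one indivisible unit.
ATOMIC = count_inconsistent([[ST1, SST1], [ST2, SST2]], [[LD, SLD]])
```

In this model 7 of the 15 non-atomic interleavings give Proc B a data value and a shadow value from different stores, while none of the 3 atomic interleavings do.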

  6. Robust & Efficient SM • Each SM access involves • Calculating effective and shadow address • Accessing the shadow values • Half-and-Half scheme • Reserve half of virtual space for shadow memory • Efficient SM access • Not Robust [Nethercote and Seward VEE ’07] • Valgrind’s software page-table-like scheme • Robust • Inefficient (Valgrind’s Memcheck causes 22x slowdown) • SM needs to be both efficient and robust
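The two addressing schemes compared on this slide can be contrasted with a small sketch. It assumes a 32-bit address space, one shadow byte per address, and 64 KB second-level chunks; these sizes and all names are illustrative assumptions, not details taken from the presentation.

```python
HALF = 1 << 31  # assumption: 32-bit space, upper half reserved for shadow


def half_and_half_shadow_addr(addr):
    """Half-and-Half: the shadow address is a constant offset away."""
    if addr >= HALF:
        # One way the scheme can fail: the program needs an address
        # that lies inside the half reserved for shadow memory.
        raise ValueError("address falls in the reserved shadow half")
    return addr + HALF


class TwoLevelShadow:
    """Software page-table-like shadow map: robust, but each access
    needs an extra table lookup instead of a single add."""

    SEC_BITS = 16  # assumption: each second-level chunk covers 64 KB

    def __init__(self, default=0):
        self.default = default
        self.primary = {}  # upper address bits -> bytearray of shadow bytes

    def get(self, addr):
        chunk = self.primary.get(addr >> self.SEC_BITS)
        if chunk is None:
            return self.default
        return chunk[addr & ((1 << self.SEC_BITS) - 1)]

    def set(self, addr, value):
        hi = addr >> self.SEC_BITS
        chunk = self.primary.get(hi)
        if chunk is None:
            # Chunks are allocated lazily, so any address can be shadowed.
            chunk = bytearray([self.default]) * (1 << self.SEC_BITS)
            self.primary[hi] = chunk
        chunk[addr & ((1 << self.SEC_BITS) - 1)] = value
```

The first function is cheap but rejects part of the address space; the second accepts any address at the cost of the lookup on every shadow access.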

  7. Research Question • Can we make SMIs and OMIs atomic? • Can we make SM accesses efficient without sacrificing robustness? • Can we do the above with minimal HW support?

  8. Our Approach • Convey atomic block to the processor • Simple ISA support: shadow-start, shadow-end • SMIs implicitly identified • Coupled Coherence • Coherence of SMIs and OMIs is coupled • Enforces the effect of atomicity • OS Support • Couple allocation of original and shadow pages • Efficient addressing without sacrificing robustness

  9. ISA Support • Shadow-start / Shadow-end instructions • OMIs and SMIs enclosed • Conveys atomic block to the processor • Guides actions of cache-coherence protocol • Implicitly distinguishing SMIs • First instruction is an OMI • All others with same VA treated as SMIs • Multiple accesses implicitly assumed to access different shadow values
     EXAMPLE
         shadow-start
         ld reg1, vaddr    // Original load
         ld reg2, vaddr    // 1st shadow load
         ld reg3, vaddr    // 2nd shadow load
         shadow-end
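The implicit identification rule can be sketched as a small classifier over the instructions enclosed by shadow-start and shadow-end. The function name, the tuple encoding, and the "other" label for accesses to a different address are assumptions made for illustration; the slide does not say how such accesses are handled.

```python
def classify(block):
    """Label memory instructions inside one shadow-start/shadow-end block.

    block: list of (opcode, register, vaddr) tuples (hypothetical encoding).
    Returns a list of (kind, shadow_index) pairs.
    """
    labels = []
    omi_vaddr = None
    shadow_index = 0
    for _opcode, _reg, vaddr in block:
        if omi_vaddr is None:
            # The first instruction in the block is the original access.
            omi_vaddr = vaddr
            labels.append(("OMI", 0))
        elif vaddr == omi_vaddr:
            # Later accesses to the same VA are shadow accesses; each one
            # is assumed to target the next shadow value.
            shadow_index += 1
            labels.append(("SMI", shadow_index))
        else:
            # Assumption: accesses to other addresses are left unclassified.
            labels.append(("other", None))
    return labels
```

For the three loads in the slide's example this yields one OMI followed by SMIs numbered 1 and 2, matching the "1st shadow load" and "2nd shadow load" comments.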

  10. Coupled Coherence • Dependence Mirroring • Dependences among SMIs mirror those of the OMIs • If OMI2 → OMI1 then SMI2 → SMI1 • Coupled coherence enforces this • [Figure: Proc A executes St1, S St1, St2, S St2; Proc B executes Ld, S Ld]

  11. Coupled Coherence • Coupled Coherence involves • No Explicit Shadow coherence messages • SMIs do not trigger coherence messages • Shadow stores do not trigger invalidates • Shadow loads do not cause misses • Co-transfer • Data replies of original blocks are piggybacked with shadow blocks • Co-existence • Original blocks and shadow blocks co-exist in the cache • Brought in together • Replaced together

  12. Dependence Mirroring: RAW • [Figure: cache states (Shared, Exc, Inv) of block B and shadow block B’ on Proc A and Proc B; Proc A executes St, S St; Proc B executes Ld, S Ld] • Proc A sends invalidate for B and B’ • Proc B sends read miss for B and B’ • Proc A sends blocks B and B’

  13. Dependence Mirroring: RAW • [Figure: block B on Proc A and Proc B with cache states (Exc, Inv) and a ready bit (0, 1); Proc A executes shadow-st, St, S St, shadow-end; Proc B executes Ld, S Ld] • Proc A sends invalidate for B and B’ • Proc B sends read miss for B and B’ • Proc A waits until the ready bit is set • Proc A sends blocks B and B’

  14. Dependence Mirroring: WAR • [Figure: Proc A executes St1, S St1, St2, S St2; Proc B executes Ld, S Ld] • Proc A sends invalidates • Proc B sends read miss for B and B’ • Proc A sends blocks B and B’

  15. Coupled Coherence • On a cache miss • Original Ld / St • Place read miss for original and shadow block(s) • Write back dirty blocks • Shadow Ld / St • No coherence events • Shadow-start • Set ready bit to 0 • Shadow-end • Set ready bit to 1
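The rules on this slide can be exercised with a heavily simplified bus model. This is a sketch under several assumptions, not the simulated protocol: there is one shadow value per block, execution is sequential, and a request that real hardware would stall is modeled by raising a hypothetical `NotReady` exception. All class and method names are illustrative.

```python
class NotReady(Exception):
    """Raised where real hardware would make the requester wait."""


class Bus:
    def __init__(self):
        self.nodes = []
        self.memory = {}  # block -> (data, shadow)
        self.log = []     # coherence messages seen on the bus

    def read_miss(self, requester, block):
        self.log.append((requester.name, "read-miss", block))
        for node in self.nodes:
            if node is not requester and block in node.lines:
                if not node.ready:
                    raise NotReady(node.name)  # owner is inside an atomic block
                line = node.lines[block]
                # Co-transfer: original and shadow are written back together.
                self.memory[block] = (line["data"], line["shadow"])
                line["excl"] = False
        data, shadow = self.memory.get(block, (0, 0))
        return {"data": data, "shadow": shadow, "excl": False}

    def invalidate(self, requester, block):
        self.log.append((requester.name, "invalidate", block))
        for node in self.nodes:
            if node is not requester and block in node.lines:
                if not node.ready:
                    raise NotReady(node.name)
                del node.lines[block]  # original and shadow leave together


class Node:
    def __init__(self, name, bus):
        self.name, self.bus, self.lines, self.ready = name, bus, {}, True
        bus.nodes.append(self)

    def shadow_start(self):
        self.ready = False  # ready bit 0

    def shadow_end(self):
        self.ready = True   # ready bit 1

    def _line(self, block, exclusive):
        if block not in self.lines:
            self.lines[block] = self.bus.read_miss(self, block)
        line = self.lines[block]
        if exclusive and not line["excl"]:
            self.bus.invalidate(self, block)
            line["excl"] = True
        return line

    def load(self, block):
        return self._line(block, False)["data"]

    def store(self, block, value):
        self._line(block, True)["data"] = value

    def shadow_load(self, block):
        return self.lines[block]["shadow"]  # no coherence event

    def shadow_store(self, block, value):
        self.lines[block]["shadow"] = value  # no coherence event
```

In a read-after-write scenario, a second node that loads the block while the first is between `shadow_start` and `shadow_end` is refused; once the first node finishes, the second receives the data and its matching shadow value with a single read miss, and the shadow accesses add nothing to `Bus.log`.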

  16. Symmetric/General SM • Symmetric SM • Original loads (stores) accompanied by shadow loads (stores) • General SM • Original load can be accompanied by both shadow loads and stores • E.g., Eraser: online race detection • Need to enforce shadow coherence for RAR • Typically no coherence events for RAR • Future Work

  17. Addressing Support • Shadow pages allocated adjacent to original pages • Virtual Memory space unaffected • Retains robustness • OS treats them as a single “superpage” • Swapped in and swapped out together • Address Translation • During address translation, add offset to access shadow page • Provides efficiency • No separate TLB for shadow pages • [Figure: TLB translation of an OMI and an SMI into memory; labels: V.Page, Off, Ph.page, Ori.Page, Shadow Page 1, Shadow Page 2, Shadow Value cnt]
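The translation step can be sketched as follows. The sketch assumes 4 KB pages, that shadow page k sits exactly k physical frames after the original page, and that the shadow index selects the page; these details and the function names are assumptions for illustration, not a description of the implemented hardware.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages


def allocate_superpage(tlb, vpage, next_free_frame, n_shadow):
    """Coupled allocation: reserve the original frame plus n_shadow
    adjacent frames, and map only the original in the TLB."""
    tlb[vpage] = next_free_frame
    return next_free_frame + 1 + n_shadow  # next free physical frame


def translate(tlb, vaddr, shadow_index=0):
    """Translate an OMI (shadow_index 0) or the k-th SMI (shadow_index k).

    The same TLB entry serves both; a shadow access only adds a page offset.
    """
    vpage, offset = divmod(vaddr, PAGE_SIZE)
    frame = tlb[vpage]
    return (frame + shadow_index) * PAGE_SIZE + offset
```

Because the shadow pages occupy no virtual addresses of their own, the application's virtual layout is untouched, and the shadow address costs one addition instead of a software table walk.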

  18. Experiments • Implementation in SESC Simulator • Cycle Accurate, targets MIPS architecture • Shadow-start, Shadow-end instructions • Models cache coherence protocol • Coupled Coherence implementation • Bus based protocol • Models basic OS services • Coupled page allocation • Monitoring Applications • DIFT: Detection of security attacks • DDG: Computes Dynamic dependence graph online • Benchmarks • SPLASH-2

  19. Efficiency of SM • Three versions: • SM • Our SM implementation • ISA support • OS support for address translation • Coupled Coherence protocol for atomicity • VAL:serial • Valgrind’s SM support • Address Translation: involves software page table accesses • Atomicity: Enforced by thread serialization • VAL:lb • Valgrind’s SM support with no atomicity guarantees • Baseline for evaluating our address translation support

  20. Efficiency of SM: DIFT • VAL:serial causes 41x overhead on average • Effect of serialization • SM causes only 7x overhead • Efficient address translation + coupled coherence • Even without serialization, VAL:lb causes 12x overhead • With coupled coherence this reduces to 7x

  21. Efficiency of SM: DDG • VAL:serial causes 78x overhead on average • Effect of serialization • SM causes only 23x overhead • Efficient address translation + coupled coherence • Even without serialization, VAL:lb causes 27x overhead • With coupled coherence this reduces to 23x • Effect not as pronounced as in DIFT
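The relative gains implied by the reported averages for DIFT and DDG can be worked out directly. The numbers are the ones quoted on the two efficiency slides; the dictionary layout and function name are illustrative.

```python
# Average overhead factors reported on the slides.
OVERHEAD = {
    "DIFT": {"VAL:serial": 41, "VAL:lb": 12, "SM": 7},
    "DDG": {"VAL:serial": 78, "VAL:lb": 27, "SM": 23},
}


def ratio_vs_sm(app, baseline):
    """How many times larger the baseline's overhead is than SM's."""
    return OVERHEAD[app][baseline] / OVERHEAD[app]["SM"]


SUMMARY = {
    (app, base): round(ratio_vs_sm(app, base), 2)
    for app in OVERHEAD
    for base in ("VAL:serial", "VAL:lb")
}
```

Relative to SM, VAL:serial's overhead is about 5.86 times larger for DIFT and 3.39 times larger for DDG, while VAL:lb's is about 1.71 times larger for DIFT and only 1.17 times larger for DDG, which is the sense in which the effect is less pronounced for DDG.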

  22. Effect of Coupled Coherence • Performance overhead < 0.6% for DIFT and DDG • Total amount of traffic is about the same • Coupled coherence sees more bursts in traffic

  23. Related Work • Enforcing Atomicity • Valgrind [Nethercote et al. PLDI ’07] through thread serialization • Not efficient • TM [Chung et al. HPCA ’08] can be used • Requires additional HW changes • Support for rollback and re-execution • Address Translation • Valgrind [Nethercote VEE ’07] software page table structure • Proposed application-specific optimizations • Still inefficient • Half-and-Half scheme [Qin et al. MICRO ’07] • Divides virtual address space • Not Robust

  24. Conclusion • SM used extensively for performing monitoring • Performance • Security • Debugging • Support for improving SM performance • ISA Support • Coupled coherence → atomicity • Coupled allocation → efficient addressing • Significant performance advantage • Future Work • Extend the system beyond symmetric SMIs • Look at other techniques for providing atomicity without changes to the coherence protocol

  25. Questions?
