
ABACUS: A Hardware-Based Software Profiler for Modern Processors

Sergey Blagodurov • Sergey Zhuravlev • Alexandra Fedorova, School of Computing Science. Eric Matthews • Lesley Shannon, School of Engineering Science. Simon Fraser University, Vancouver, BC, Canada.





Presentation Transcript


  1. ABACUS: A Hardware-Based Software Profiler for Modern Processors Sergey Blagodurov • Sergey Zhuravlev • Alexandra Fedorova School of Computing Science Eric Matthews • Lesley Shannon School of Engineering Science Simon Fraser University, Vancouver, BC, Canada

  2. Overview • Legendary Introduction to ABACUS • Delicious Profiling Units • Epic Conclusion

  3. Introduction to ABACUS

  4. Introduction to ABACUS

  5. Introduction to ABACUS

  6. Introduction to ABACUS

  7. ABACUS

  8. ABACUS ASPLOS rocks!

  9. ABACUS

  10. Performance comparison • Memory reuse profile • ABACUS average runtime: 48.5 seconds • Simics average runtime: 1 hour 6 minutes [Figure: ABACUS vs. Simics]
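From the two average runtimes reported on this slide, the speedup of ABACUS over Simics for collecting the memory reuse profile is simple arithmetic on those averages (no other data is assumed):

```latex
t_{\text{Simics}} = 1\,\text{h}\;6\,\text{min} = 66 \times 60\,\text{s} = 3960\,\text{s}
\qquad
\frac{t_{\text{Simics}}}{t_{\text{ABACUS}}} = \frac{3960\,\text{s}}{48.5\,\text{s}} \approx 81.6
```

That is, on these averages ABACUS finishes roughly 80 times faster than the simulator.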

  11. Conclusion • ABACUS is a generic profiler that can be easily integrated into modern processors • It can be used by the O/S to obtain runtime information about a thread’s behaviour to make better thread assignments

  12. Thank you! Questions?

  13. Motivation • Future systems will be multi-core and heterogeneous • How does the OS place threads on such an architecture? • Characterize thread behaviour: instruction mix, memory reuse profile, effectiveness of prefetching, memory bandwidth utilization

  14. Motivation (cont'd) • How are these metrics collected? • Offline analysis: code instrumentation, or simulation (e.g., Simics, a software-based instruction set simulator that models systems with full OS support)

  15. Motivation (cont'd) • Why not use current hardware counters? • They are architecture-specific • Not all desired metrics are provided • They help detect symptoms, not causes • They are limited in number and in concurrent use

  16. Goal • Create a hardware profiler that collects thread characteristics at runtime • Imposed constraints: external to the processor, minimally invasive, cycle-accurate, OS-controllable

  17. ABACUS • hArdware-Based Analyzer for the Characterization of User Software • A collection of runtime-configurable profiling units • Collects metrics useful for thread placement • Controllable through the O/S

  18. Hardware Platform • Proof-of-concept system: LEON3, SPARC v8 instruction set architecture, single-core, single-threaded • Test system: OpenSPARC Niagara T1 soft processor, 1 to 4 hardware threads, multi-core, multi-board support

  19. Hardware Platform (cont'd)

  20. ABACUS

  21. External Interface • Bus slave and master modules • Some processing is required on the processor signals • Designed so that only the external interface changes for a different processor or system

  22. Portability • Previously integrated with a LEON3 (SPARC v8 ISA) based system • Differences: AMBA Advanced High-performance Bus (AHB) vs. Processor Local Bus (PLB); processor internals
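The portability claim (only the external interface changes between processors) can be sketched as an adapter layer that turns processor-specific signals into one normalized event type consumed by every profiling unit. This is a minimal sketch under stated assumptions: the class names, the `ProcessorEvent` fields, and the raw signal keys are all hypothetical and do not reflect the real LEON3 or OpenSPARC T1 signal names.

```python
from dataclasses import dataclass
from typing import Mapping, Protocol


@dataclass(frozen=True)
class ProcessorEvent:
    """Normalized event seen by all profiling units (hypothetical field set)."""
    pc: int           # program counter of the retiring instruction
    instruction: int  # raw 32-bit instruction word


class ExternalInterface(Protocol):
    """Only implementations of this layer know a processor's signal layout."""
    def decode(self, raw: Mapping[str, int]) -> ProcessorEvent: ...


class Leon3Interface:
    # Hypothetical signal names; real LEON3 trace signals differ.
    def decode(self, raw: Mapping[str, int]) -> ProcessorEvent:
        return ProcessorEvent(pc=raw["pc"], instruction=raw["inst"])


class T1Interface:
    # Hypothetical: pretend this core exposes a word address (PC >> 2),
    # so the adapter does the "processing on processor signals".
    def decode(self, raw: Mapping[str, int]) -> ProcessorEvent:
        return ProcessorEvent(pc=raw["pc_word"] << 2, instruction=raw["insn"])


def feed(interface: ExternalInterface, raw_samples) -> list[ProcessorEvent]:
    """Downstream units receive identical events regardless of the core."""
    return [interface.decode(raw) for raw in raw_samples]
```

Swapping `Leon3Interface` for `T1Interface` leaves everything downstream of `feed` untouched, which is the property the slide describes.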

  23. Controller • Starts or stops profiling • Can limit profiling to a specific address range • DMA interface for retrieving collected data • Linux device driver support
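A behavioural sketch of the controller's start/stop and address-range features follows. It assumes the range is matched against the PC of each retiring instruction and is half-open `[lo, hi)`; both points, and the `Controller`/`on_retire` names, are illustrative assumptions. The DMA interface and the Linux driver are not modelled.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Tuple


@dataclass
class Controller:
    """Hypothetical model of the ABACUS controller's gating logic."""
    enabled: bool = False
    # Assumed half-open [lo, hi) PC range; None means profile every address.
    address_range: Optional[Tuple[int, int]] = None
    # Profiling units, modelled here as callables that receive a PC.
    units: list[Callable[[int], None]] = field(default_factory=list)

    def start(self) -> None:
        self.enabled = True

    def stop(self) -> None:
        self.enabled = False

    def on_retire(self, pc: int) -> None:
        """Forward a retired instruction's PC to the units if it passes the gates."""
        if not self.enabled:
            return
        if self.address_range is not None:
            lo, hi = self.address_range
            if not lo <= pc < hi:
                return
        for unit in self.units:
            unit(pc)
```

For example, `Controller(address_range=(0x1000, 0x2000), units=[seen.append])` forwards only PCs in that window, and only between `start()` and `stop()`.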

  24. Profiling Units • Operate on one or more processor signals: instruction, PC, cache reuse distance, etc. • Store data in a collection of counters

  25. Profiling Units (cont'd) • Focus on two-dimensional metrics, which give a bigger picture and greater insight • Aim to be as architecture-independent as possible
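A two-dimensional metric can be stored as a rows-by-columns bank of fixed-width counters, for example one row per category and one column per histogram bin. The sketch below is hypothetical: `CounterBank` is an illustrative name, and saturating at `2**width - 1` is an assumption (real hardware might wrap around or raise an overflow flag instead).

```python
class CounterBank:
    """Hypothetical 2-D bank of fixed-width counters for one profiling unit."""

    def __init__(self, rows: int, cols: int, width: int = 32):
        # Largest value a width-bit counter can hold.
        self.max_value = (1 << width) - 1
        self.counts = [[0] * cols for _ in range(rows)]

    def increment(self, row: int, col: int, amount: int = 1) -> None:
        # Assumed saturating behaviour: clamp instead of wrapping on overflow.
        self.counts[row][col] = min(self.counts[row][col] + amount, self.max_value)

    def row_totals(self) -> list[int]:
        """Collapse the second dimension, e.g. total events per category."""
        return [sum(row) for row in self.counts]
```

Keeping the full matrix rather than only per-row totals is what gives the "bigger picture": the totals can always be derived, but the distribution within each row cannot be recovered from them.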

  26. Profile Unit • Behaves like a traditional software profiler • Operates on the program counter [Figure: code space; panels labelled "Range Overlap", "Range Non-Overlap", and "Trace"]
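Like a sampling software profiler, a PC-based profile unit can count how many retired instructions fall into each configured code region. This is a minimal sketch assuming non-overlapping half-open `[start, end)` regions. The slide's figure also labels "Range Overlap" and "Trace", which this sketch does not attempt to model, and the `ProfileUnit` name is illustrative.

```python
from bisect import bisect_right


class ProfileUnit:
    """Hypothetical PC-range profiler: one counter per non-overlapping code region."""

    def __init__(self, regions: list[tuple[int, int]]):
        self.regions = sorted(regions)  # [(start, end), ...], assumed disjoint
        self.starts = [start for start, _ in self.regions]
        self.counts = [0] * len(self.regions)
        self.outside = 0  # PCs that fall in no configured region

    def observe(self, pc: int) -> None:
        # Find the last region whose start is <= pc, then check its end bound.
        i = bisect_right(self.starts, pc) - 1
        if i >= 0 and pc < self.regions[i][1]:
            self.counts[i] += 1
        else:
            self.outside += 1
```

Regions could be set to function address ranges taken from a symbol table, which makes the counts resemble a flat software profile.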

  27. Memory Reuse Unit • Collects a measure of code or data reuse • Uses a least recently used (LRU) stack • Reuse distance is how far the accessed line moves in the LRU stack (its depth before it returns to the top), or a miss • Used in cache contention management

  28. Memory Reuse Unit • Creates a histogram of the cache reuse pattern • Range: [0, set associativity − 1], or cache miss [Figure: reuse profile of a 4-way set-associative cache; x-axis: reuse distance]
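The reuse-distance histogram can be sketched with one LRU stack per cache set: a hit at depth d in the stack records distance d (so distances span 0 to associativity − 1), anything else records a miss, and the accessed line then moves to the top. The 32-byte line size, the modulo set-index mapping, and the class name are assumptions for illustration only.

```python
from collections import Counter


class MemoryReuseUnit:
    """Sketch of per-set LRU-stack reuse-distance profiling."""

    MISS = "miss"

    def __init__(self, num_sets: int, ways: int, line_bytes: int = 32):
        self.num_sets, self.ways, self.line_bytes = num_sets, ways, line_bytes
        self.stacks = [[] for _ in range(num_sets)]  # index 0 = most recently used
        self.histogram = Counter()

    def access(self, address: int) -> None:
        line = address // self.line_bytes
        stack = self.stacks[line % self.num_sets]  # assumed set-index mapping
        tag = line // self.num_sets
        if tag in stack:
            distance = stack.index(tag)  # depth in the LRU stack: 0 .. ways-1
            stack.pop(distance)
            self.histogram[distance] += 1
        else:
            self.histogram[self.MISS] += 1
            if len(stack) == self.ways:
                stack.pop()  # evict the least recently used line
        stack.insert(0, tag)  # accessed line becomes most recently used
```

For example, in a single 4-way set, the line sequence A, B, C, A, A records a distance of 2 for the first reuse of A and 0 for the second. A long tail of misses points to a working set larger than the cache, which is the kind of signal useful for contention-aware thread placement.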

  29. Instruction Mix • Identifies the instruction subset currently in use • Divides instructions into logical categories: load/store, floating point, control flow • Uses an opcode-based table lookup
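An opcode-based table lookup for SPARC v8 can use the `op` field (bits 31:30), then `op2` (bits 24:22) for format-2 instructions or `op3` (bits 24:19) for format-3 arithmetic instructions. The category names and groupings below are assumptions, not ABACUS's actual tables. For instance, floating-point loads and stores are counted as load/store here.

```python
from collections import Counter

# Format 2 (op == 0), keyed by op2. Category choices are illustrative.
FORMAT2_OP2 = {
    0b010: "control_flow",  # Bicc
    0b110: "control_flow",  # FBfcc
    0b111: "control_flow",  # CBccc
    0b100: "integer",       # SETHI (also encodes NOP)
}

# Format 3 arithmetic (op == 2), keyed by op3; unlisted op3 values are integer ops.
FORMAT3_OP3 = {
    0x34: "floating_point",  # FPop1
    0x35: "floating_point",  # FPop2
    0x38: "control_flow",    # JMPL (also used for RET/RETL)
    0x39: "control_flow",    # RETT
    0x3A: "control_flow",    # Ticc
}


def classify(word: int) -> str:
    """Map a 32-bit SPARC v8 instruction word to a category."""
    op = (word >> 30) & 0b11
    if op == 1:
        return "control_flow"  # CALL
    if op == 3:
        return "load_store"    # every op == 3 instruction is a memory access
    if op == 0:
        return FORMAT2_OP2.get((word >> 22) & 0b111, "other")  # e.g. UNIMP
    return FORMAT3_OP3.get((word >> 19) & 0x3F, "integer")


def instruction_mix(words) -> Counter:
    """Histogram of categories over a stream of instruction words."""
    return Counter(classify(w) for w in words)
```

In hardware, the same idea amounts to a small ROM indexed by these opcode bits, whose output selects which counter to increment.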

  30. Latency Unit • Breaks down miss latency into its constituent sources: bus contention, DRAM latency, etc. • For each category, creates a histogram of latency in cycles
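A per-source latency histogram can be sketched as one row of bins per source. This sketch assumes power-of-two bins (bin k holds latencies in [2^k, 2^(k+1)) cycles, bin 0 also takes 0, and the last bin collects everything larger). The binning scheme and the source names are hypothetical.

```python
from collections import defaultdict


class LatencyUnit:
    """Sketch: one histogram of miss latency (in cycles) per latency source."""

    def __init__(self, num_bins: int = 16):
        self.num_bins = num_bins
        self.histograms = defaultdict(lambda: [0] * num_bins)

    def record(self, source: str, cycles: int) -> None:
        # floor(log2(cycles)) via bit_length; 0 and 1 both land in bin 0,
        # and the last bin absorbs all latencies beyond the table's range.
        bin_index = min(max(cycles, 1).bit_length() - 1, self.num_bins - 1)
        self.histograms[source][bin_index] += 1
```

Logarithmic bins keep the counter count small while still separating short bus-contention delays from much longer DRAM accesses.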

  31. Stall Unit • Breaks down cycles per instruction (CPI) • Attributes cycles to their sources: cache misses, Translation Lookaside Buffer (TLB) misses, floating-point busy stalls, etc.
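Attributing cycles to their sources amounts to a CPI stack. Total CPI is `total_cycles / instructions`, and each stall source contributes its stall cycles divided by the instruction count. Whatever remains is treated as the base CPI. This is a sketch of that arithmetic, with illustrative source names, and it assumes the per-source stall counts do not overlap.

```python
def cpi_breakdown(total_cycles: int, instructions: int,
                  stall_cycles: dict[str, int]) -> dict[str, float]:
    """Split CPI into a 'base' part plus one part per stall source.

    Assumes stall cycles from different sources are disjoint, so they can be
    summed and subtracted from the total.
    """
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    stalled = sum(stall_cycles.values())
    if stalled > total_cycles:
        raise ValueError("stall cycles exceed total cycles")
    breakdown = {src: cycles / instructions for src, cycles in stall_cycles.items()}
    breakdown["base"] = (total_cycles - stalled) / instructions
    return breakdown
```

With 2000 cycles for 1000 instructions (CPI 2.0), 600 cache-miss, 100 TLB-miss, and 300 FP-busy stall cycles give contributions of 0.6, 0.1, and 0.3, leaving a base CPI of 1.0.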

  32. Verification • Run the subset of the SPEC CPU2006 benchmarks whose memory usage fits within the board's specifications • Collect metrics with both ABACUS and Simics • Profile for a few billion instructions, a limit imposed by Simics performance

  33. Test Platform • Proof-of-concept system: single-core, single-threaded • XUP V2Pro: 90% slice utilization

  34. Simulation Platform • Simics system • Differences: SPARC v9 ISA (64-bit processor); local filesystem vs. NFS

  35. LEON3 Comparison [Figure: ABACUS vs. Simics]

  36. LEON3 Comparison (cont'd) • DC (data cache) memory reuse profile [Figure: ABACUS vs. Simics]

  37. Resource Usage • Default configuration: 2-way LRU instruction cache, 2-way LRU data cache, 5 instruction types [Figure: resource usage for configurations labelled "32bit counters", "40bit counters", and "32bit counters, Profile Unit added"]
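One plausible reason to offer 40-bit counters alongside 32-bit ones is counter range. This is an inference, not something the slide states. A 32-bit counter runs out after about 4.3 billion events, the same order as the "few billion instructions" profiled during verification, whereas a 40-bit counter holds 256 times more:

```latex
2^{32} = 4\,294\,967\,296 \approx 4.3 \times 10^{9}
\qquad
2^{40} = 1\,099\,511\,627\,776 \approx 1.1 \times 10^{12} = 256 \times 2^{32}
```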

  38. Conclusion • ABACUS is a generic profiler that can be easily integrated into modern processors • It can be used by the O/S to obtain runtime information about a thread’s behaviour to make better thread assignments

  39. Future Plans • Move to a multi-core/multi-threaded system • Measure memory reuse distance independently of the existing cache implementation • Process tracking • Integrate results into the OS scheduler

  40. Questions?
