Cycle Accurate Performance Measurement

Presentation Transcript

  1. Cycle Accurate Performance Measurement Richard Hough, Phillip Jones, Scott Friedman, Roger Chamberlain, Jason Fritts, John Lockwood, and Ron Cytron rh3@wustl.edu http://liquid.arl.wustl.edu/ Funded by NSF Grant ITR-0313203

  2. Outline • Introduction • Motivation • Background • Architecture • Usage • Results • Future Work • Related Work • Conclusion

  3–7. Introduction – What Are We Doing? • Creating a module (the Statistics Module) for capturing cycle-accurate profiles of hardware events during the runtime of programs on real systems [Diagram labels: Statistics Module; Program Runtime; Program Bottlenecks; Memory Accesses; ISA Decoding; Cache Hits]

  8. Background – FPX • Designed and implemented on the FPX platform • The FPX platform: • Was designed for developing pluggable network circuits • Contains a Virtex 2000E FPGA for design deployment • Has a smaller FPGA that serves as the network interface device • Can potentially operate at gigabit line rates

  9. Background – LEON2 • Developed by Gaisler Research • SPARC V8 • Open-source VHDL • Widely used • European Space Agency, etc. • Second in popularity only to the MicroBlaze

  10. Motivation – Why Not Use Software? • Software Profiling Is: • Inaccurate • Many data points estimated • Time slices not absolute • Profiling affects results • Inefficient • Unreasonable for real-system deployment • Ineffective • Difficult to separate OS overhead

  11. Motivation – Why Not Use Simulation? • Simulation is: • Slow • A simple simulation could take 100× longer than running the program on real hardware • Bound by the quality of the model • The model used may be inaccurate • Processors are often tweaked without the documentation being updated [Larus]

  12. Motivation – Why Use FPGAs? • ASICs are expensive • FPGAs provide a good blend of cost and accuracy • Software simulation of processors is incredibly slow • FPGAs allow easy prototyping • Test new caching methods, tweak the ISA, etc.

  13. Motivation – Why Put StatsMod In An FPGA? • The Statistics Module allows you to: • Pull event signals from anywhere in the design • Evaluate both software and hardware optimizations • Tweak the architecture • Integrate hardware-accelerated modules into software solutions • Adjust the software algorithm • Gather repeatable and reliable results

  14. Architecture – Naïve Solution • Suppose we are interested in 10 events within each of 10 methods (address ranges) • The naïve solution implements a dedicated counter for every event/method pair • 10 × 10 = 100 counters! • Not scalable for large systems
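A worked version of the naïve counter arithmetic, assuming the 100 counters come from pairing each of E = 10 events with each of R = 10 monitored address ranges (methods); the symbols E and R are introduced here only for illustration:

```latex
N_{\text{naive}} = E \times R, \qquad
10 \times 10 = 100, \qquad
20 \times 20 = 400
```

Doubling both the number of events and the number of monitored ranges quadruples the counter count, which is why the naïve approach does not scale.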

  15. Architecture – Our Solution • Better approach • Associate counters with events and methods at run time • Covers the same problem space while using less chip area
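To make the idea of binding counters to events and methods at run time concrete, here is a minimal Python behavioral model. It is a sketch under assumptions, not the StatsMod hardware: the names `CounterSlot`, `StatsModel`, `configure`, and `clock`, the event names, and the per-cycle interface (current program counter plus the set of asserted event signals) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CounterSlot:
    """One run-time-configurable counter (hypothetical model, not the StatsMod VHDL)."""
    event: str   # name of the event signal this slot watches (hypothetical names)
    lo: int      # inclusive start address of the monitored range
    hi: int      # inclusive end address of the monitored range
    count: int = 0


class StatsModel:
    """Behavioral sketch: a small fixed pool of counters, each bound at run
    time to one (event, address range) pair instead of one counter per pair."""

    def __init__(self, num_slots: int) -> None:
        self.num_slots = num_slots
        self.slots: list[CounterSlot] = []

    def configure(self, event: str, lo: int, hi: int) -> int:
        """Bind the next free counter to an event and address range; return its index."""
        if len(self.slots) >= self.num_slots:
            raise RuntimeError("no free counter slots")
        self.slots.append(CounterSlot(event, lo, hi))
        return len(self.slots) - 1

    def clock(self, pc: int, active_events: set[str]) -> None:
        """Advance one cycle: increment every slot whose event is asserted
        while the program counter lies inside its address range."""
        for slot in self.slots:
            if slot.event in active_events and slot.lo <= pc <= slot.hi:
                slot.count += 1


# Usage: count cycles spent in one function's address range (made-up addresses).
stats = StatsModel(num_slots=4)
cycles = stats.configure("cycle", 0x1000, 0x10FF)
trace = [(0x1000, {"cycle"}), (0x1004, {"cycle", "dcache_miss"}), (0x2000, {"cycle"})]
for pc, events in trace:
    stats.clock(pc, events)
print(stats.slots[cycles].count)  # 2
```

In this model the number of counters is fixed by `num_slots`; only the (event, range) binding changes between runs, so watching a different method or event costs a reconfiguration rather than another counter.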

  16. Architecture – An In-Depth Look

  17. Architecture – Scalability [Figure labels: Naïve Approach; Address Range Registers; Counters; Events]

  18. Usage

  19. Results – What Do We Get? • The next few slides contain data from the Linpack benchmark running on the FPGA • Linpack is an FPU-intensive benchmark • While the following slides focus on runtime, remember that the graphs could in principle show *any* event

  20. Results [Figure: Linpack runtime profile, 323,686,726 clock cycles]

  21. Results

  22. Results

  23. Results

  24. Future Work – Where Can We Go? • As of a week ago, StatsMod was successfully integrated into a Linux 2.6.11 OS running on LEON • Changes have been made to cleanly separate statistics by process ID • OS, background tasks, threads • A device driver allows any program, including the program being profiled, to gather the statistics
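One way per-process separation could work is sketched below in Python. This is an assumption, not the described implementation: it supposes the OS makes the running process ID visible to the statistics module (for example, on each context switch), and the `PidFilteredCounter` class and its `pid=None` convention are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PidFilteredCounter:
    """Hypothetical per-process counter: counts an event only while the
    configured process ID is the one currently running."""
    event: str
    pid: Optional[int]  # None means "count for every process" (assumed convention)
    count: int = 0

    def clock(self, current_pid: int, active_events: set[str]) -> None:
        """Advance one cycle; current_pid is assumed to be supplied by the OS."""
        if self.event in active_events and (self.pid is None or self.pid == current_pid):
            self.count += 1


# Usage: a made-up schedule where PID 1 (e.g., a background task) preempts PID 42.
mine = PidFilteredCounter("cycle", pid=42)
everyone = PidFilteredCounter("cycle", pid=None)
for running_pid in [42, 42, 1, 42]:
    mine.clock(running_pid, {"cycle"})
    everyone.clock(running_pid, {"cycle"})
print(mine.count, everyone.count)  # 3 4
```

Filtering in the counter itself, rather than subtracting OS time afterwards, is what would let a profile exclude OS and background-task activity outright.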

  25. Future Work – Where Can We Go? • Programs could now potentially collect statistics on themselves and perform runtime introspection • Adjust operation to conserve power, memory accesses, etc. • Deeper integration at the kernel level could affect scheduler decisions • Adds a new dimension for slicing resources • Network activity, device activity, page faults, etc.
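A minimal sketch of the runtime-introspection idea: a program measures how many events each of several interchangeable code variants triggers and keeps the cheapest. `read_counter` is a hypothetical stand-in for reading a counter through the device driver mentioned on slide 24, whose actual interface is not shown in these slides; `events_during` and `pick_cheapest` are illustrative names, and counter wraparound is ignored.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for reading one StatsMod counter via the device driver.
ReadCounter = Callable[[], int]


def events_during(read_counter: ReadCounter, work: Callable[[], None]) -> int:
    """Return how many events the counter recorded while work() ran."""
    before = read_counter()
    work()
    return read_counter() - before


def pick_cheapest(read_counter: ReadCounter, variants: Sequence[Callable[[], None]]) -> int:
    """Run each variant once and return the index of the one that
    triggered the fewest events (e.g., memory accesses)."""
    costs = [events_during(read_counter, v) for v in variants]
    return min(range(len(costs)), key=costs.__getitem__)


# Usage with a fake counter that each variant bumps by its "memory accesses".
fake = {"mem_accesses": 0}


def read_fake() -> int:
    return fake["mem_accesses"]


def row_major() -> None:
    fake["mem_accesses"] += 3


def column_major() -> None:
    fake["mem_accesses"] += 10


print(pick_cheapest(read_fake, [row_major, column_major]))  # 0
```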

  26. Related Work • SnoopP • Developed by Lesley Shannon and Paul Chow at the University of Toronto • Collects timing characteristics of programs running on a MicroBlaze processor • Focuses on clock cycles only • Integrated into the Xilinx EDK

  27. Conclusion In closing, I would like to thank: • Phillip Jones for his hard work and support • Ron Cytron for his mentoring and persistence • Scott Friedman for his work on the web interface • The rest of the Liquid Architecture team • And WISA for the invitation to present

  28. Questions?

  29. Background – Liquid

  30. Usage • Connect to a secure web server that controls the FPGA hardware • Upload the binary executable, its associated map file, and the FPGA programming bitfile • A Perl script parses the map file and provides a graphical interface for selecting the desired address ranges and events • Statistics are tabulated at the end of the program's execution
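The usage flow relies on a Perl script that parses the map file into selectable address ranges. The Python sketch below shows the same kind of step, but the input format is an assumption: it expects `nm -n`-style lines (`ADDRESS TYPE NAME`, hex addresses, `T`/`t` for text symbols) rather than whatever map format the actual toolchain emits, and `function_ranges` is a hypothetical name.

```python
def function_ranges(map_text: str) -> dict[str, tuple[int, int]]:
    """Turn `nm -n`-style lines ("ADDRESS TYPE NAME") into
    {function name: (start, end_inclusive)} address ranges.

    Assumptions: addresses are hex, text symbols have type T or t, and each
    function ends just before the next text symbol starts. The last symbol
    is skipped because its end address is unknown.
    """
    symbols = []
    for line in map_text.splitlines():
        parts = line.split()
        if len(parts) != 3 or parts[1] not in ("T", "t"):
            continue  # not a text-symbol line in the assumed format
        addr, _kind, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()

    ranges: dict[str, tuple[int, int]] = {}
    for (start, name), (next_start, _next_name) in zip(symbols, symbols[1:]):
        if next_start > start:  # skip aliases that share a start address
            ranges[name] = (start, next_start - 1)
    return ranges


# Usage with made-up symbols; the data symbol "table" is ignored.
example = """\
40000000 T _start
40000100 T main
40000180 t helper
40000200 T end_marker
4000a000 D table
"""
for fn, (lo, hi) in function_ranges(example).items():
    print(f"{fn}: 0x{lo:08x}-0x{hi:08x}")
```

Each resulting (start, end) pair is the kind of address range a user would then pair with an event when configuring the counters.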