
The Return of Synthetic Benchmarks

The Return of Synthetic Benchmarks. January 28, 2008. Ajay M. Joshi (UT Austin), Lieven Eeckhout (Ghent University), Lizy K. John (UT Austin). Laboratory of Computer Architecture, Department of Electrical & Computer Engineering, The University of Texas at Austin.


Presentation Transcript


  1. The Return of Synthetic Benchmarks • January 28, 2008 • Ajay M. Joshi (UT Austin), Lieven Eeckhout (Ghent University), Lizy K. John (UT Austin) • Laboratory of Computer Architecture, Department of Electrical & Computer Engineering, The University of Texas at Austin

  2. Outline • The Need for Synthetic Benchmarks • BenchMaker Framework for Benchmark Synthesis • Workload Characteristics Used in Synthesis • Synthetic Benchmark Construction • Evaluation of BenchMaker • Applications • Summary

  3. Benchmark Spectrum • Complete Application Code • Application Suites, e.g. SPEC CPU • Kernel Codes, e.g. Livermore Loops • Synthetic Benchmarks, e.g. Dhrystone, Whetstone • Microbenchmarks, e.g. STREAM • Toy Benchmarks, e.g. Heap sort • One end of the spectrum: Less Development Effort, More Scalable, More Maintainable, Less Representative • Other end of the spectrum: More Development Effort, Less Scalable, Less Maintainable, More Representative

  4. Focus on Simulation Time Reduction • Statistical Sampling [Conte et al., ICCD’96] [Wunderlich et al., ISCA’03] • Representative Sampling [Sherwood et al., ASPLOS’02] • Reduced Input Set [KleinOsowski, CAN’04] • Statistical Simulation & Synthetic Workloads [Oskin et al., ISCA’00] [Eeckhout et al., ISPASS’00] [Nussbaum et al., PACT’01] [Bell et al., ICS’05] • Benchmark Subsetting [Eeckhout et al., PACT’02] [Vandierendonck et al., CAECW’04] [Phansalkar et al., ISPASS’05] [Eeckhout et al., IISWC’05] • Analytical Modeling [Noonburg et al., MICRO’94] [Karkhanis et al., ISCA’04] • Speedup Simulation [Schnarr et al., ASPLOS’98] [Loh et al., SIGMETRICS’01]

  5. Motivation: Benchmarking Challenges • Using Real-World Applications as Benchmarks – Proprietary Nature of Real-World Applications • Single-Point Performance Characterization – Application Benchmarks are Rigid • Applications Evolve Faster than Benchmarks – Benchmark Suites are Costly to Develop, Maintain, and Upgrade • Studying Commercial Workload Performance – Early Design Stage Power/Performance Studies • Usefulness of Synthetic Benchmarks Beyond Simulation Time Reduction

  6. Resurgence of Synthetic Benchmarks (IEEE Computer, August 2003)

  7. Outline • The Need for Synthetic Benchmarks • BenchMaker Framework for Benchmark Synthesis • Workload Characteristics Used in Synthesis • Synthetic Benchmark Construction • Evaluation of BenchMaker • Applications • Summary

  8. Workload Synthesis: Central Idea Just 40 workload characteristics

  9. Modeling Real-World Applications • Microarchitecture-Independent Workload Profiling • Modeling Workload Attributes into Synthetic Workload • [Figure: Real-World Proprietary Workload; Workload Profiler (Binary Instrumentation or Simulation); Workload Synthesizer; Synthetic Benchmark Clone; Experiment Environment (Real Hardware, Execution-Driven Simulator)] • Workload Profile = Workload Attributes + Distribution of Attribute Values

  10. Outline • The Need for Synthetic Benchmarks • BenchMaker Framework for Benchmark Synthesis • Workload Characteristics Used in Synthesis • Synthetic Benchmark Construction • Evaluation of BenchMaker • Applications • Summary

  11. Workload Characteristics as ‘Knobs’

  12. Capturing The Essence of Workloads • Attributes to capture inherent workload behavior – Data Locality: Dominant strides of static Load/Store – Control Flow Predictability: Branch transition rate • Modeling Locality & Control Flow Predictability – Data Locality of Integer, Scientific, and Embedded Workloads effectively modeled using circular streams – Replicating transition-rate of static branches

  13. Modeling Data Access Pattern • Identify streams of data references • A stream: a sequence of memory addresses in an arithmetic progression • Elements of arrays A, B, and C form 3 streams: for (ii = 0; ii < N; ii++) A[ii] = B[ii] + C[ii]; • The three streams: 200, 204, 208, ...; 320, 324, 328, ...; 404, 408, 412, ... • Issuing sequence: 320, 404, 200, 324, 408, 204, ... • Streams are interleaved and may contain noise: 4, 8, 12, 16, 1, 3, 20, 24, 5, 7, 2, 9, 11, 28, ...
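The issuing sequence on this slide can be reproduced with a short sketch. It assumes (this is not stated on the slide) that each loop iteration loads B[ii], loads C[ii], and then stores A[ii], that the arrays start at addresses 200 (A), 320 (B), and 404 (C), and that elements are 4 bytes wide. The function name `issue_sequence` is hypothetical.

```python
def issue_sequence(bases, stride, iterations):
    """Addresses in program order for A[i] = B[i] + C[i].

    Assumed per-iteration order: load B[i], load C[i], store A[i].
    """
    sequence = []
    for i in range(iterations):
        for array in ("B", "C", "A"):
            sequence.append(bases[array] + i * stride)
    return sequence


# Base addresses and the 4-byte stride are assumptions chosen to match the slide.
bases = {"A": 200, "B": 320, "C": 404}
seq = issue_sequence(bases, stride=4, iterations=2)
print(seq)

# With no noise, de-interleaving is just taking every third reference.
stream_b, stream_c, stream_a = seq[0::3], seq[1::3], seq[2::3]
```

Each de-interleaved list is an arithmetic progression, which is exactly the property that makes it a stream.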

  14. Extracting Streams • Reference pattern of static Load/Store instructions – PC-correlated spatial locality: dependence on the address referenced by a nearby Ld/St (programs with pointer-chasing code) – PC-correlated temporal locality: dependence on the previous address generated by the same Ld/St (programs with multidimensional arrays) • Could static Load/Store instructions be natural sources of streams? • Profile every static Load/Store instruction – Number of different strides with which it accesses data
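A minimal sketch of per-instruction stride profiling follows. It assumes the memory trace is available as `(pc, address)` pairs, which the slides do not specify. The names `stride_profile` and `dominant_stride` and the PC values are hypothetical and are not BenchMaker's interface.

```python
from collections import Counter, defaultdict


def stride_profile(trace):
    """Count, for each static load/store (keyed by PC), the strides between
    consecutive addresses that the same instruction generated."""
    last_address = {}
    strides = defaultdict(Counter)
    for pc, address in trace:
        if pc in last_address:
            strides[pc][address - last_address[pc]] += 1
        last_address[pc] = address
    return dict(strides)


def dominant_stride(counter):
    """Return (most frequent stride, fraction of accesses that used it)."""
    stride, count = counter.most_common(1)[0]
    return stride, count / sum(counter.values())


# Hypothetical trace: three static memory instructions, each walking one array.
trace = [
    (0x10, 320), (0x14, 404), (0x18, 200),
    (0x10, 324), (0x14, 408), (0x18, 204),
    (0x10, 328), (0x14, 412), (0x18, 208),
]
profile = stride_profile(trace)
for pc, counter in sorted(profile.items()):
    print(hex(pc), len(counter), dominant_stride(counter))
```

Keying on the PC separates the interleaved streams without any search, which is why static load/store instructions are a natural candidate source of streams.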

  15. Modeling Instruction Level Parallelism • Dependency Distance • Example: ADD R1, R3, R4; MUL R5, R3, R2; ADD R5, R3, R6; LD R4, (R1); SUB R8, R2, R1 • Read-After-Write Dependency Distance = 3 (the first ADD writes R1; the LD reads it three instructions later) • Measure the distribution of dependency distances: up to 1, up to 2, up to 4, up to 8, up to 16, up to 32, > 32
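The dependency-distance measurement can be sketched as below. Instructions are modeled as hypothetical `(destination, sources)` tuples, and the buckets are treated as disjoint ranges (a distance of 3 falls in the "up to 4" bucket). Both choices are assumptions, since the slide does not define the exact encoding or binning.

```python
BUCKET_LIMITS = (1, 2, 4, 8, 16, 32)


def raw_distances(instructions):
    """Read-after-write distances: for each source register, the number of
    instructions since that register was last written."""
    last_writer = {}
    distances = []
    for index, (dest, sources) in enumerate(instructions):
        for reg in sources:
            if reg in last_writer:
                distances.append(index - last_writer[reg])
        if dest is not None:
            last_writer[dest] = index
    return distances


def bucket(distances):
    """Histogram of distances over the buckets <=1, <=2, ..., <=32, >32."""
    hist = {f"<={limit}": 0 for limit in BUCKET_LIMITS}
    hist[">32"] = 0
    for d in distances:
        for limit in BUCKET_LIMITS:
            if d <= limit:
                hist[f"<={limit}"] += 1
                break
        else:
            hist[">32"] += 1
    return hist


# The five-instruction example from the slide.
program = [
    ("R1", ["R3", "R4"]),  # ADD R1, R3, R4
    ("R5", ["R3", "R2"]),  # MUL R5, R3, R2
    ("R5", ["R3", "R6"]),  # ADD R5, R3, R6
    ("R4", ["R1"]),        # LD  R4, (R1)
    ("R8", ["R2", "R1"]),  # SUB R8, R2, R1
]
dists = raw_distances(program)
hist = bucket(dists)
print(dists, hist)
```

Registers with no earlier writer inside the window (R2, R3, R4, R6 here) contribute no distance in this sketch.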

  16. Modeling Control Flow Predictability • Capture the behavior of easy-to-predict and difficult-to-predict branches • Inherent program feature that captures branch behavior • Transition Rate [Haungs et al., HPCA’00] = (number of Taken/Not-Taken transitions) / (number of times executed) • Branches with low transition rate (easier to predict): TTTTTTTTTN, NNNNNNNNNT • Branches with high transition rate (also easier to predict): TNTNTNTNTN • Branches with moderate transition rate (tougher to predict)
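The transition-rate formula on this slide can be checked against its own examples with a small sketch. Outcomes are encoded as a string of `T` and `N` characters, an encoding chosen here for illustration. The function name `transition_rate` is hypothetical.

```python
def transition_rate(outcomes):
    """Number of Taken/Not-Taken direction changes divided by the number of
    times the branch executed."""
    if not outcomes:
        return 0.0
    transitions = sum(prev != cur for prev, cur in zip(outcomes, outcomes[1:]))
    return transitions / len(outcomes)


for pattern in ("TTTTTTTTTN", "NNNNNNNNNT", "TNTNTNTNTN", "TTNTNNTTTN"):
    print(pattern, transition_rate(pattern))
```

The first two patterns give a rate near 0 and the third a rate near 1, and both extremes are regular enough to predict. The last pattern, made up for this example, lands in between.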

  17. Outline • The Need for Synthetic Benchmarks • BenchMaker Framework for Benchmark Synthesis • Workload Characteristics Used in Synthesis • Synthetic Benchmark Construction • Evaluation of BenchMaker • Applications • Summary

  18. Workload Synthesis (1) • Workload Profile: Instruction Mix, Register Dependency Distance, Stride Pattern of Load/Store, Branch Transition Rate, Branch Transition Probabilities • Synthetic Clone Generation • [Figure: nodes A, B, C, D linked by branches (BR) with transition probabilities 0.8/0.2, 1.0, 1.0, 0.1/0.9, laid out as a node sequence inside 1 Big Loop]

  19. Workload Synthesis (2) • Workload Profile: Instruction Mix, Register Dependency Distance, Stride Pattern of Load/Store, Branch Transition Rate, Branch Transition Probabilities • Synthetic Clone Generation • Memory Access Model (Strides) • [Figure: nodes A, B, C, D linked by branches (BR) with transition probabilities 0.8/0.2, 1.0, 1.0, 0.1/0.9, laid out as a node sequence inside 1 Big Loop]

  20. Workload Synthesis (3) • Workload Profile: Instruction Mix, Register Dependency Distance, Stride Pattern of Load/Store, Branch Transition Rate, Branch Transition Probabilities • Synthetic Clone Generation • Memory Access Model (Strides) • Branching Model – Based on Transition Rate • [Figure: nodes A, B, C, D linked by branches (BR) with transition probabilities 0.8/0.2, 1.0, 1.0, 0.1/0.9, laid out as a node sequence inside 1 Big Loop]

  21. Workload Synthesis (4) • Workload Profile: Instruction Mix, Register Dependency Distance, Stride Pattern of Load/Store, Branch Transition Rate, Branch Transition Probabilities • Synthetic Clone Generation • Memory Access Model (Strides) • Branching Model – Based on Transition Rate • Register Assignment • C code with asm & volatile constructs • [Figure: nodes A, B, C, D linked by branches (BR) with transition probabilities 0.8/0.2, 1.0, 1.0, 0.1/0.9, laid out as a node sequence inside 1 Big Loop]

  22. Outline • The Need for Synthetic Benchmarks • BenchMaker Framework for Benchmark Synthesis • Workload Characteristics Used in Synthesis • Synthetic Benchmark Construction • Evaluation of BenchMaker • Applications • Summary

  23. Evaluation of BenchMaker • SPEC CPU2000, SPECjbb2005, and DBT2 workloads • Validated Sim-Alpha Performance Model of Alpha 21264

  24. Performance Correlation • Trade Accuracy for Flexibility – Average Error of 11%

  25. Energy/Power Correlation • Average Error of 13%

  26. Outline • The Need for Synthetic Benchmarks • BenchMaker Framework for Benchmark Synthesis • Workload Characteristics Used in Synthesis • Synthetic Benchmark Construction • Evaluation of BenchMaker • Applications • Summary

  27. Altering Individual Program Characteristics

  28. Interaction of Program Characteristics

  29. Modeling Impact of Benchmark Drift • Increase in Code Footprint (hypothetical) • Increase in Data Footprint from SPEC CPU95 to SPEC CPU2000 for gcc (Model with 7% accuracy)

  30. Summary • Synthetic Benchmarks to Address Benchmarking Challenges • Constructing Synthetic Benchmarks from Hardware-Independent Characteristics • Applications of Synthetic Benchmarks - Altering Program Characteristics - Studying Interaction of Program Characteristics - Modeling Benchmark Drift

  31. Questions? Ajay’s email: ajoshi@ece.utexas.edu
