ARCHITECTURE PERFORMANCE EVALUATION Matthew Jacob

Presentation Transcript

  1. ARCHITECTURE PERFORMANCE EVALUATION Matthew Jacob SERC, Indian Institute of Science, Bangalore

  2. Architecture Performance Evaluation 1. Introduction: Modeling, Simulation 2. Benchmark programs and suites 3. Fast simulation techniques 4. Analytical modeling

  3. Evaluating Computer Systems: When? System not available • Designer: During design • Administrator: Before purchase • Administrator: While tuning/configuring • User: In deciding which system to use System available

  4. Performance Evaluation • Performance measurement • Performance modeling

  5. Performance Evaluation. • Performance measurement • Time, space, power … • Using hardware or software probes • Example: Pentium hardware performance counters • Performance modeling

  6. Performance Evaluation.. • Performance measurement • Time, space, power … • Using hardware or software probes • Example: Pentium hardware performance counters • Performance modeling • Model • Representation of the system under study • A simplifying set of assumptions about how it behaves • Interacts with the outside world • Changes with time through the interactions between its own components

  7. Performance Evaluation…. • Performance measurement • Time, space, power … • Using hardware or software probes • Example: Pentium hardware performance counters • Performance modeling • Kinds of Models • Physical or scale model • Analytical model: Using mathematical equations • Simulation model: Computer based approach; using a computer program to mimic behaviour of system We will first look at Simulation, then at Analytical Modeling

  8. Simulation • Imitation of some real thing, state of affairs, or process (Wikipedia) • Using a system model instead of the actual physical system • The act of simulating something generally entails representing certain key characteristics or behaviours of a selected physical or abstract system • State of the system

  9. State • State of a system • at a moment in time • a function of the values of the attributes of the objects that comprise the system • Example: Consider a coffee shop, where there is a cashier and a coffee dispenser • State can be described by (Number of customers at Cashier, Number of Customers at Coffee dispenser)

  10. State Transition Diagram [Figure: state transition diagram over the states (0,0), (1,0), (0,1), (2,0), (1,1), (3,0), with arcs labelled "New Customer Arrives" and "Customer Departs from System"] • Change of state occurs due to 2 kinds of events • Arrival or Departure of a customer • Can label each state transition arc as A or D

  11. Event • An incident or situation which occurs in a particular place during a particular interval of time • Example: Cashier is busy between times t1 and t2 [Timeline: 0, t1, t2]

  12. Discrete Event • An incident or situation which occurs at a particular instant in time • Example: Cashier becomes busy at time t1 [Timeline: 0, t1] • System state only changes instantaneously at such moments in time • Discrete Event System Model • States • Discrete events and corresponding state changes

  13. Discrete Event Simulation Involves keeping track of • System state • Pending events • Each event has an associated time (Event type, Time) • Simulated time (Simulation Clock)

  14. The DES Algorithm
      Variables: SystemState, SimnClock, PendingEventList
      Initialize variables
      Insert first event into PendingEventList
      while (not done) {
          Delete event E with lowest time t from PendingEventList
          Advance SimnClock to that time t
          Update SystemState by calling event handler of event E
      }
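The loop on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation: the `Simulator` class, its `schedule` and `run` methods, and the `ping` handler are hypothetical names, and the pending event list is assumed to be a binary heap ordered by event time.

```python
import heapq
import itertools


class Simulator:
    """Minimal discrete event simulation engine (illustrative sketch)."""

    def __init__(self):
        self.clock = 0                  # SimnClock
        self.pending = []               # PendingEventList, a min-heap keyed on time
        self._seq = itertools.count()   # tie-breaker: equal-time events run in FIFO order

    def schedule(self, time, handler, *args):
        """Insert an event (time, handler) into the pending event list."""
        heapq.heappush(self.pending, (time, next(self._seq), handler, args))

    def run(self, until=float("inf")):
        """Repeatedly remove the earliest event, advance the clock, call its handler."""
        while self.pending and self.pending[0][0] <= until:
            time, _, handler, args = heapq.heappop(self.pending)
            self.clock = time
            handler(self, *args)


# Usage: an event that reschedules itself one time unit later until time 3.
log = []


def ping(sim, label):
    log.append((sim.clock, label))
    if sim.clock < 3:
        sim.schedule(sim.clock + 1, ping, label)


sim = Simulator()
sim.schedule(0, ping, "tick")   # insert first event
sim.run()
print(log)
```

System state is whatever the handlers choose to update; here it is just the `log` list. A heap keeps the "delete event with lowest time" step at O(log n) per event.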

  15. Example: Cashier at Coffee Shop • Events? • State? • Event Handlers?

  16. Example: Cashier at Coffee Shop • Events? • Arrival of customer, Departure of customer • State? • boolean CashierBusy? • queue CashQueue • Info in each queue item: arrival time of that customer • Operations: EnQueue, DeQueue, IsEmpty • Keeping track of properties of interest • e.g., Cashier utilization, Average wait time in cash queue • Event Handlers?

  17. Example: Handler for Arrival (time t)
      if (CashierBusy?) {
          EnQueue(CashQueue, t)
      } else {
          CashierBusy? = TRUE
          TimeCashierBecameBusy = t
          NumThroughQueue++
          ScheduleEvent(D, t + SERVICETIME)
      }

  18. Example: Handler for Departure (time t)
      if (IsEmpty(CashQueue)) {
          CashierBusy? = FALSE
          TotalCashierBusyTime += (t - TimeCashierBecameBusy)
      } else {
          next = DeQueue(CashQueue)
          NumThroughQueue++
          TotalTimeInQueue += (t - next.arrivaltime)
          ScheduleEvent(D, t + SERVICETIME)
      }
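The arrival and departure handlers can be put together into a small runnable simulation. The sketch below is an illustration under stated assumptions, not the presenter's code: arrival times are supplied as a list, the service time is a constant, the function name `simulate_cashier` is hypothetical, and an arrival that coincides with a departure is processed first.

```python
import heapq
from collections import deque


def simulate_cashier(arrival_times, service_time=3.0):
    """Simulate one cashier; return (cashier utilization, average wait in queue)."""
    # Pending event list of (time, sequence number, kind); kind is "A" or "D".
    pending = [(t, i, "A") for i, t in enumerate(arrival_times)]
    heapq.heapify(pending)
    seq = len(pending)

    cash_queue = deque()        # arrival times of waiting customers
    cashier_busy = False
    time_became_busy = 0.0
    total_busy_time = 0.0
    total_time_in_queue = 0.0
    num_through_queue = 0
    clock = 0.0

    while pending:
        clock, _, kind = heapq.heappop(pending)
        if kind == "A":                                   # handler for Arrival
            if cashier_busy:
                cash_queue.append(clock)
            else:
                cashier_busy = True
                time_became_busy = clock
                num_through_queue += 1                    # served at once, zero wait
                heapq.heappush(pending, (clock + service_time, seq, "D"))
                seq += 1
        else:                                             # handler for Departure
            if not cash_queue:
                cashier_busy = False
                total_busy_time += clock - time_became_busy
            else:
                arrived = cash_queue.popleft()
                num_through_queue += 1
                total_time_in_queue += clock - arrived
                heapq.heappush(pending, (clock + service_time, seq, "D"))
                seq += 1

    utilization = total_busy_time / clock if clock > 0 else 0.0
    avg_wait = total_time_in_queue / num_through_queue if num_through_queue else 0.0
    return utilization, avg_wait


util, wait = simulate_cashier([0, 1, 2, 10], service_time=3.0)
print(f"utilization={util:.3f}, average wait={wait:.2f}")
```

With arrivals at times 0, 1, 2 and 10, the second and third customers wait 2 and 4 time units, so the average wait over four customers is 1.5, and the cashier is busy for 12 of the 13 simulated time units.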

  19. The DES Algorithm
      Variables: SystemState, SimnClock, PendingEventList
      Initialize variables
      Insert first event into PendingEventList
      while (not done) {
          Delete event E with lowest time t from PendingEventList
          Advance SimnClock to that time t
          Update SystemState by calling event handler of event E
      }

  20. Architectural Simulation • Example: Simulation of memory system behaviour during execution of a given program • Objective: Average memory access time, Number of cache hits, etc. • At least 3 different ways to do this

  21. Architectural Simulation. • Trace Driven Simulation • Stochastic Simulation • Execution Driven Simulation

  22. Architectural Simulation.. • Trace Driven Simulation • Trace: A log or record of all the relevant events that must be simulated • Example: (R, 0x1279E, 1B), (R, 0xAB7800, 4B),…

  23. Architectural Simulation… • Trace Driven Simulation • Trace: A log or record of all the relevant events that must be simulated • Example: (R, 0x1279E, 1B), (R, 0xAB7800, 4B),… • Stochastic Simulation • Driven by random number generators • Example: Addresses are uniformly distributed between 0 and 2^32 - 1; 45% of memory operations are Reads

  24. Architectural Simulation…. • Trace Driven Simulation • Trace: A log or record of all the relevant events that must be simulated • Example: (R, 0x1279E, 1B), (R, 0xAB7800, 4B),… • Stochastic Simulation • Driven by random number generators • Example: Addresses are uniformly distributed between 0 and 2^32 - 1; 45% of memory operations are Reads • Execution Driven Simulation • Where you interleave the execution of the program (whose execution is being simulated) with the simulation of the target architecture
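The difference between trace driven and stochastic simulation is mostly in where the memory references come from. The sketch below feeds the same toy cache model from a recorded trace and from a random generator. Everything in it is an assumption made for illustration: a direct-mapped cache with 64 lines of 32 bytes, writes treated like reads, accesses that span two lines ignored, and the hypothetical function names `simulate_cache` and `stochastic_trace`.

```python
import random


def simulate_cache(trace, num_lines=64, line_size=32):
    """Count hits in a direct-mapped cache for a stream of (op, address, size)."""
    tags = [None] * num_lines       # one stored tag per cache line
    hits = 0
    accesses = 0
    for _op, addr, _size in trace:
        block = addr // line_size   # memory block containing the address
        index = block % num_lines   # cache line the block maps to
        tag = block // num_lines
        accesses += 1
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag       # miss: fetch the block into the line
    return hits, accesses


def stochastic_trace(n, read_fraction=0.45, seed=0):
    """Generate n references: uniform addresses in [0, 2^32 - 1], 45% reads."""
    rng = random.Random(seed)
    for _ in range(n):
        op = "R" if rng.random() < read_fraction else "W"
        yield (op, rng.randrange(2**32), 4)


# Trace driven: the first two records come from the slide, the third is added
# here so that one reference reuses a cached block.
trace = [("R", 0x1279E, 1), ("R", 0xAB7800, 4), ("R", 0x1279F, 1)]
print(simulate_cache(trace))

# Stochastic: same cache model, references drawn from the random generator.
print(simulate_cache(stochastic_trace(10_000)))
```

Uniformly random addresses have almost no locality, so the stochastic run reports a hit count near zero; a realistic stochastic model would need address distributions fitted to real programs.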

  25. Example: SimpleScalar • A widely used execution driven architecture simulator (www.simplescalar.com) • Tool set: compiler, assembler, linker, simulation and visualization tools • Facilitates simulation of real programs on a range of modern processors • How fast are they? • Fast functional simulator: 10 MIPS • Detailed out-of-order issue processor with non-blocking caches, speculative execution, branch prediction, etc.: 1 MIPS

  26. SimpleScalar. Program whose execution is being simulated (MIPS) Emulates execution of the instructions of the program Interleaved with updating architectural state and statistics System calls are executed on the host system where the simulation is running (e.g., P4 Linux) From Austin, Larsen, Ernst, IEEE Computer, Feb 2002

  27. What programs are used? • Performance can vary substantially from program to program • To compare architectural alternatives, it would be good if a standard set of programs was used • This has led to some degree of consensus on what programs to use in architectural studies • Benchmark programs

  28. Kinds of Benchmark Programs • Toy Benchmarks • Factorial, Quicksort, Hanoi, Ackermann, Sieve • Synthetic Benchmarks • Dhrystone, Whetstone • Benchmark Kernels • DAXPY, Livermore loops • Benchmark Suites • SPEC benchmarks

  29. Synthetic Benchmarks: Whetstone • Created in Whetstone Lab, UK, 1970s • Synthetic, originally in Algol 60 • Floating point, math libraries Synthetic Benchmarks: Dhrystone • Pun on Whetstone; Weicker (1984) • Integer performance • "Typical" application mix of mathematical and other operations (string handling)

  30. Kernel Benchmarks: Livermore Loops • Fortran DO loops extracted from frequently used programs at Lawrence Livermore National Labs, USA • To assess floating point arithmetic performance • http://www.netlib.org/benchmark/livermorec • Hydro fragment
            DO 1 L = 1, Loop
            DO 1 k = 1, n
          1 X(k) = Q + Y(k) * (R * ZX(k+10) + T * ZX(k+11))
      • ICCG excerpt (Incomplete Cholesky Conjugate Gradient) • Inner product • Banded linear equations • Tri-diagonal elimination, below diagonal • General linear recurrence equations
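For readers unfamiliar with Fortran, the hydro fragment kernel can be sketched in Python as follows. This is only a transliteration for illustration: the function name `hydro_fragment` and the input values are made up, indices are shifted from Fortran's 1-based to Python's 0-based arrays, and an interpreted loop like this says nothing about floating point hardware performance.

```python
import time


def hydro_fragment(y, zx, q, r, t, n, repeats=1):
    """Python version of Livermore loop 1: x[k] = q + y[k]*(r*zx[k+10] + t*zx[k+11])."""
    x = [0.0] * n
    for _ in range(repeats):        # outer loop "DO 1 L = 1, Loop" repeats the kernel
        for k in range(n):          # inner loop "DO 1 k = 1, n" (0-based here)
            x[k] = q + y[k] * (r * zx[k + 10] + t * zx[k + 11])
    return x


n = 4
y = [1.0] * n
zx = [2.0] * (n + 11)               # the kernel reads zx up to index n + 10

start = time.perf_counter()
x = hydro_fragment(y, zx, q=1.0, r=0.5, t=0.25, n=n, repeats=1000)
elapsed = time.perf_counter() - start

print(x)
print(f"elapsed: {elapsed:.6f} s")
```

Each inner iteration performs 3 multiplications and 2 additions, which is why kernels like this are used to probe floating point throughput.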

  31. SPEC Benchmark Suites Standard Performance Evaluation Corporation • "Non-profit corporation formed to establish, maintain and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers" • "Develops suites of benchmarks and also reviews and publishes submitted results from member organizations and other benchmark licensees"

  32. SPEC Consortium Members Acer Inc, Action S.A., AMD, Apple Inc, Azul Systems, Inc, BEA Systems, BlueArc, Bull S.A., Citrix Online, CommuniGate Systems, Dell, EMC, Fujitsu Limited, Fujitsu Siemens, Hewlett-Packard, Hitachi Data Systems, Hitachi Ltd., IBM, Intel, ION Computer Systems, Itautec S/A, Microsoft, NEC – Japan, NetEffect, Network Appliance, NVIDIA, Openwave Systems, Oracle, Panasas, Pathscale, Principled Technologies, QLogic Corporation, The Portland Group, Rackable Systems, Red Hat, SAP AG, Scali, SGI, Sun Microsystems, Super Micro Computer, Inc., SWsoft, Symantec Corporation, Trigence, Unisys

  33. SPEC Benchmark Suites … • CPU • Enterprise Services • Graphics/Applications • High Performance Computing • Java Client/Server • Mail Servers • Network File System • Web Servers

  34. Example: SPEC CPU2000 26 Programs with source code, input data sets, makefiles CINT2000 • gzip C Compression • vpr C FPGA Circuit Placement and Routing • gcc C C Programming Language Compiler • mcf C Combinatorial Optimization • crafty C Game Playing: Chess • parser C Word Processing • eon C++ Computer Visualization • perlbmk C PERL Programming Language • gap C Group Theory, Interpreter • vortex C Object-oriented Database • bzip2 C Compression • twolf C Place and Route Simulator

  35. SPEC CPU2000 … CFP2000 • wupwise Fortran 77 Quantum Chromodynamics • swim Fortran 77 Shallow Water Modeling • mgrid Fortran 77 Multi-grid Solver: 3D Potential Field • applu Fortran 77 Parabolic/Elliptic PDEs • mesa C 3-D Graphics Library • galgel Fortran 90 Computational Fluid Dynamics • art C Image Recognition / Neural Networks • equake C Seismic Wave Propagation Simulation • facerec Fortran 90 Face Recognition • ammp C Computational Chemistry • lucas Fortran 90 Number Theory / Primality Testing • fma3d Fortran 90 Finite-element Crash Simulation • sixtrack Fortran 77 Hi Energy Phys Accelerator Design • apsi Fortran 77 Meteorology: Pollutant Distribution

  36. More Recently: SPEC CINT2006 • perlbench C PERL Programming Language • bzip2 C Compression • gcc C C Compiler • mcf C Combinatorial Optimization • gobmk C Artificial Intelligence: go • hmmer C Search Gene Sequence • sjeng C Artificial Intelligence: chess • libquantum C Physics: Quantum Computing • h264ref C Video Compression • omnetpp C++ Discrete Event Simulation • astar C++ Path-finding Algorithms • xalancbmk C++ XML Processing

  37. SPEC CFP2006 • bwaves Fortran Fluid Dynamics • gamess Fortran Quantum Chemistry • milc C Physics: Quantum Chromodynamics • zeusmp Fortran Physics/CFD • gromacs C/Fortran Biochemistry/Molecular Dynamics • cactusADM C/Fortran Physics/General Relativity • leslie3d Fortran Fluid Dynamics • namd C++ Biology/Molecular Dynamics • dealII C++ Finite Element Analysis • soplex C++ Linear Programming, Optimization • povray C++ Image Ray-tracing • calculix C/Fortran Structural Mechanics • GemsFDTD Fortran Computational Electromagnetics • tonto Fortran Quantum Chemistry • lbm C Fluid Dynamics • wrf C/Fortran Weather Prediction • sphinx3 C Speech recognition

  38. Problem: SPEC program execution duration • In terms of instructions executed • CPU2000 average: ~300 billion • Simulated at a speed of 1 MIPS, this would take about 4 days (3 × 10^11 instructions / 10^6 instructions per second ≈ 3.5 days) • Programs to be simulated are getting larger • SPEC CPU2006: increase in program execution length by an order of magnitude • Even more detailed simulation is needed • System level simulation, which takes the operating system into account: 1000 times slower than SimpleScalar
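A quick back-of-the-envelope calculation shows the scale of the problem, using only the figures on this slide (about 300 billion instructions, 1 MIPS detailed simulation, and a 1000 times slowdown for system level simulation). The helper name `simulation_days` is hypothetical.

```python
SECONDS_PER_DAY = 86_400


def simulation_days(instructions, simulated_instructions_per_second):
    """Wall-clock days needed to simulate a program at a given simulation speed."""
    return instructions / simulated_instructions_per_second / SECONDS_PER_DAY


cpu2000_instructions = 300e9             # CPU2000 average from the slide
detailed_speed = 1e6                     # 1 MIPS detailed simulation
system_level_speed = detailed_speed / 1000

detailed = simulation_days(cpu2000_instructions, detailed_speed)
system_level = simulation_days(cpu2000_instructions, system_level_speed)

print(f"detailed simulation:     {detailed:.1f} days")
print(f"system level simulation: {system_level:.0f} days")
```

At 1 MIPS a single average CPU2000 program needs roughly three and a half days of simulation; at a thousandth of that speed the same run would take on the order of a decade, which is what motivates the sampling techniques that follow.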

  39. Approaches to Address this Problem Purpose of simulation: to estimate program CPI • Use a (small) input data set so that execution time is reduced • Don't simulate entire program execution • Example: Skip the initial 1 billion instructions and then estimate CPI by simulating only the next 1 billion instructions • Simulate (carefully) selected parts of program execution on the regular input data • Example: SimPoint, SMARTS

  40. Reference: Wunderlich, Wenisch, Falsafi and Hoe, "SMARTS: Accelerating Microarchitecture Simulation via Rigorous Statistical Sampling", 30th ISCA (ACM/IEEE International Symposium on Computer Architecture), 2003 • The Problem: A lot of computer architecture research is done through simulation • Microarchitecture simulation is extremely time consuming

  41. Architecture Conferences • ISCA: International Symposium on Computer Architecture • ASPLOS: International Conference on Architectural Support for Programming Languages and Operating Systems • HPCA: International Symposium on High Performance Computer Architecture • MICRO: International Symposium on Microarchitecture

  42. SMARTS Framework Complete program execution Not time line, but instruction line From Wunderlich et al, 30th ISCA 2003

  43. SMARTS Framework Must simulate more than 1 instruction to estimate CPI Let U be the number of instructions simulated in a sample U, Sampling Unit size, Number of instructions that are simulated in detail in each sample From Wunderlich et al, 30th ISCA 2003

  44. SMARTS Framework . U Must simulate more than 1 instruction to estimate CPI Let U be the number of instructions simulated in a sample U, Sampling Unit size, Number of instructions that are simulated in detail in each sample N, Benchmark length in terms of Sampling Units From Wunderlich et al, 30th ISCA 2003

  45. SMARTS Framework .. Systematic Sampling: Every kth sampling unit is simulated in detail From Wunderlich et al, 30th ISCA 2003

  46. SMARTS Framework … Systematic Sampling: Every kth sampling unit is simulated in detail From Wunderlich et al, 30th ISCA 2003

  47. SMARTS Framework …. Systematic Sampling: Every kth sampling unit is simulated in detail W: number of instructions of detailed warming performed before each sample is taken From Wunderlich et al, 30th ISCA 2003

  48. SMARTS Framework ….. Systematic Sampling: Every kth sampling unit is simulated in detail W: number of instructions of detailed warming performed before each sample is taken From Wunderlich et al, 30th ISCA 2003

  49. SMARTS Framework …… Systematic Sampling: Every kth sampling unit is simulated in detail W: number of instructions of detailed warming performed before each sample is taken Functional Warming: Functional simulation + maintenance of selected microarchitecture state (such as cache hierarchy state, branch predictor state) n: total number of samples taken From Wunderlich et al, 30th ISCA 2003
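The sampling scheme built up over the last few slides can be illustrated with a small sketch. It is not the SMARTS implementation: the per-instruction cycle list stands in for a detailed simulator, the function name `systematic_sample_cpi` and the default values of U, k and W are made up, and the 95% confidence interval assumes a normal approximation. Warming only affects the cost accounting here, because the toy cycle list has no microarchitectural state to warm.

```python
import random
import statistics


def systematic_sample_cpi(cycles, U=1000, k=10, W=2000):
    """Estimate CPI by measuring every k-th sampling unit of U instructions.

    cycles[i] is the cycle count of instruction i. Returns the CPI estimate,
    the 95% confidence half-width, the number of samples n, and the number of
    instructions that would need detailed simulation (warming plus measured).
    """
    N = len(cycles) // U                        # benchmark length in sampling units
    sample_cpis = []
    detailed_instructions = 0
    for unit in range(0, N, k):                 # systematic sampling
        start = unit * U
        warming = min(W, start)                 # detailed warming before the sample
        detailed_instructions += warming + U
        sample_cpis.append(sum(cycles[start:start + U]) / U)
    n = len(sample_cpis)
    mean = statistics.fmean(sample_cpis)
    stdev = statistics.stdev(sample_cpis) if n > 1 else 0.0
    half_width = 1.96 * stdev / n ** 0.5        # normal approximation (assumption)
    return mean, half_width, n, detailed_instructions


# Synthetic workload: most instructions take 1 cycle, one in four takes 4.
rng = random.Random(1)
cycles = [rng.choice([1, 1, 1, 4]) for _ in range(200_000)]

estimate, half_width, n, detailed = systematic_sample_cpi(cycles)
true_cpi = sum(cycles) / len(cycles)
print(f"true CPI {true_cpi:.3f}, estimate {estimate:.3f} +/- {half_width:.3f} "
      f"from n={n} samples, {detailed} of {len(cycles)} instructions in detail")
```

Only the sampled units and their warming periods go through detailed simulation; the rest of the instruction stream would be covered by the much faster functional warming.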

  50. Choice of Sample Size U From Wunderlich et al, 30th ISCA 2003