Performance Modeling Bo Faser Lockheed Martin Management & Data Systems Intelligence, Surveillance, and Reconnaissance Systems Litchfield Park, Arizona October 18, 2004
Topics • Performance Modeling Overview • Tool “Demonstration” • System Model Examples • Modeling Storage Systems • Project Overview/Discussion
Performance Modeling • What is it? • Performance modeling is a method of characterizing and understanding system behavior in order to predict system performance. • Generally refers to timeline performance.
Performance Modeling • Why do we do it? • We use performance models to make good design decisions • Perform trade studies (processing power vs. bandwidth, disk storage vs. tape storage, microcoding vs. high level programming, vendor A vs. vendor B, everything vs. cost) • Find bottlenecks in the system (load balance) • Predict performance of different input scenarios/test robustness of system (sensitivity studies) • We use performance models to predict the effect of changes to operational systems • Bottom line: We need to ensure that systems will satisfy timeline performance requirements while taking into account system design constraints.
Performance Modeling • Performance modeling is necessary in all program phases • Proposal Phase • Early Program Phase • Preliminary Design Phase • Detailed Design Phase
Proposal Phase • In the proposal phase, performance modeling is used to estimate, at a very high level, the hardware required to meet the proposal requirements so that the bid can be developed. • Roughly estimate the algorithm's processing load based on past work and engineering estimates • For example, to calculate the number of compute strings: calculate the number of FLOPs (floating point operations) for the estimated number of FFTs in the algorithm, divide by the sustained FLOP rate of a known high performance computer scaled to future performance, and compare the resulting processing time with the timeline requirement. • This is generally done very quickly in a spreadsheet.
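The proposal-phase spreadsheet arithmetic can be sketched as a few lines of Python. This is a minimal sketch, not the author's spreadsheet: it uses the common 5 N log2 N operation count for a complex radix-2 FFT, and every input (FFT size, FFT count, sustained rate, future scaling factor, timeline) is a hypothetical placeholder. It also assumes the FFTs divide evenly across strings with no communication cost.

```python
import math

# Hypothetical placeholder inputs (not taken from any real program).
FFT_SIZE = 8192                # points per FFT
FFTS_PER_JOB = 50_000_000      # estimated number of FFTs in the algorithm
SUSTAINED_GFLOPS_TODAY = 50.0  # sustained rate of a known string today
FUTURE_SCALING = 2.0           # assumed performance growth by delivery
TIMELINE_S = 60.0              # assumed required processing time per job (s)


def fft_flops(n: int) -> float:
    """Common 5 N log2 N estimate for one complex radix-2 FFT of n points."""
    return 5.0 * n * math.log2(n)


def estimate_strings(n, ffts, gflops_today, scaling, timeline_s):
    """Return (seconds of work on one string, strings needed for the timeline).

    Assumes the FFTs divide evenly across strings with no communication cost.
    """
    total_flops = ffts * fft_flops(n)
    sustained_flops = gflops_today * 1e9 * scaling
    seconds_one_string = total_flops / sustained_flops
    return seconds_one_string, math.ceil(seconds_one_string / timeline_s)


if __name__ == "__main__":
    secs, strings = estimate_strings(FFT_SIZE, FFTS_PER_JOB,
                                     SUSTAINED_GFLOPS_TODAY, FUTURE_SCALING,
                                     TIMELINE_S)
    print(f"{secs:.1f} s on one string -> {strings} strings")
```

With these placeholder inputs the job needs about 266 s on one string, so 5 strings are required to meet a 60 s timeline.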
Early Program Phase • In the early part of a program, performance modeling is used to determine the right performance requirements to levy on the system. • There are two ways to look at performance: • Timelines for specific jobs (e.g., job type A must be completed in 5 minutes) • Throughput of the system (e.g., the system must be able to process 1000 type A jobs per day)
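The two views of performance lead to different sizing arithmetic. A throughput requirement fixes the average load on the system, which gives a lower bound on the number of strings; the timeline requirement then has to absorb any queueing on top of the processing time. The sketch below assumes, hypothetically, that one type A job takes 4 minutes on one string and that planners cap utilization at 70%; both numbers are illustrative assumptions.

```python
import math

MINUTES_PER_DAY = 24 * 60


def offered_load(jobs_per_day: float, minutes_per_job: float) -> float:
    """Average number of strings kept busy by the workload (in Erlangs)."""
    return jobs_per_day * minutes_per_job / MINUTES_PER_DAY


def min_strings(jobs_per_day, minutes_per_job, max_utilization=1.0):
    """Smallest string count that sustains the throughput at or below max_utilization."""
    return math.ceil(offered_load(jobs_per_day, minutes_per_job) / max_utilization)


if __name__ == "__main__":
    # Hypothetical: a type A job takes 4 minutes on one string.
    print(min_strings(1000, 4.0))                        # 3 (strings 100% busy)
    print(min_strings(1000, 4.0, max_utilization=0.7))  # 4 (assumed 70% cap)
```

Meeting the throughput number is not enough on its own: with 4-minute jobs, a 5-minute timeline leaves only one minute for waiting in queue, which is why the contention analysis in later phases matters.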
Preliminary Design Phase • The focus of this discussion will be predominantly hardware (high performance computers, storage, and communication) • Usually we have to begin hardware procurement long before we have a well-defined algorithm. • We work with the algorithm group to determine the driving algorithm areas. • Within each area, we determine the driving algorithm functions. • In Synthetic Aperture Radar (SAR) data processing, the driver is usually a combination of FFTs and memory management.
Preliminary Design Phase • Develop the algorithm processing opscon (operational concept) • Determine the areas of the algorithm that are parallelizable. • Definition: In this discussion, a compute string, or just a string, is a high performance computer that has main memory and many processors (generally 8-256 processors per string). • Some functions are not parallelizable (for example, when processing a vector depends on the results of the previous vector). • Many functions are parallelizable in principle, but the communication required makes parallelization infeasible. • Interprocessor communication is becoming less of an issue as compute strings are designed with large main memories that can (theoretically) be accessed by every processor. • Inter-string communication is usually very slow and is therefore avoided.
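One standard way to see why the non-parallelizable functions matter is Amdahl's law, which the slides do not name; it is swapped in here as an illustration. It gives the ideal speedup on p processors when a fraction f of the work parallelizes perfectly. Because it ignores communication entirely, it is an upper bound, and the communication cost noted above can push real speedup well below it. The 90% parallel fraction is a hypothetical value.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Ideal speedup when parallel_fraction of the work parallelizes perfectly.

    Ignores communication cost, so it is an upper bound on real speedup.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / processors)


if __name__ == "__main__":
    # Hypothetical: 90% of the string workload parallelizes.
    for p in (8, 64, 256):
        print(p, round(amdahl_speedup(0.9, p), 2))
```

With a 10% serial portion the speedup can never exceed 10x, however many processors the string has, so moving from 64 to 256 processors buys relatively little.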
Preliminary Design Phase • We need to refine the proposal-phase estimates, where we simply counted FLOPs, by determining clock cycles • Processors can generally perform more than one operation per clock cycle (e.g., four multiplies and two adds may be performed simultaneously). Great, but only if the algorithm can exploit this. • We need to determine how the algorithm maps to the various vendors’ processors. • Modeling this mapping is becoming more difficult as vendors use operation scheduling and out-of-order execution.
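A small example shows why FLOP counts alone can mislead. The sketch below assumes a processor with four multipliers and two adders (as in the example above) and a radix-2 FFT in which each butterfly does one complex multiply (4 real multiplies, 2 real adds) plus two complex adds (4 real adds). It further assumes ideal pipelining with no memory stalls. All of these are simplifying assumptions, not a model of any particular vendor's processor.

```python
import math


def fft_cycles(n, mults_per_cycle=4, adds_per_cycle=2):
    """Ideal cycle count for one radix-2 complex FFT of n points.

    Assumes 4 real mults and 6 real adds per butterfly, multiply and add
    units running fully in parallel, and no memory stalls.
    """
    butterflies = (n // 2) * int(math.log2(n))
    mults = 4 * butterflies
    adds = 6 * butterflies
    # The busier functional unit sets the pace.
    return max(math.ceil(mults / mults_per_cycle),
               math.ceil(adds / adds_per_cycle))


def naive_cycles(n, ops_per_cycle=6):
    """Proposal-style estimate: total FLOPs divided by peak ops per cycle."""
    flops = 10 * (n // 2) * int(math.log2(n))
    return math.ceil(flops / ops_per_cycle)


if __name__ == "__main__":
    print(naive_cycles(1024), fft_cycles(1024))  # 8534 15360
```

Under these assumptions the adders are the bottleneck, so the FFT takes about 1.8 times as many cycles as the simple FLOPs-divided-by-peak estimate suggests.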
Preliminary Design Phase • At this point we have the string processing time. • That can be computed in a spreadsheet. • There is much more to system latency than string processing time alone. • To understand system performance, we need to look at how jobs interact in the system so that we can understand and design for resource contention (queues).
Timeline Performance Models • The type of models that we are talking about are: • Dynamic: Represents system as it changes over time • Stochastic: Has components that are subject to chance • Discrete Event: State of system changes instantaneously at the times that events occur
Timeline Performance Models • Discrete event simulation modeling could, in theory, be done by hand • Example: • The system: • 1 compute string • Interarrival times of jobs are exponentially distributed with mean 2 minutes • Service times are exponentially distributed with mean 1 minute • We want to know the average latency for a job (the time it takes to get through the system) • We will run the simulation through the first three jobs
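This example system is an M/M/1 queue (one server, exponential interarrival and service times, and, assumed here, first-come-first-served order). Standard queueing theory, not covered in the slides, gives closed-form steady-state results for it, which make a useful sanity check on any simulation of the same system. A three-job hand simulation will not match these numbers; only long runs converge toward them.

```python
def mm1_metrics(arrival_rate: float, service_rate: float) -> dict:
    """Steady-state M/M/1 results (rates in jobs per minute)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival_rate must be < service_rate")
    rho = arrival_rate / service_rate              # string utilization
    w = 1.0 / (service_rate - arrival_rate)        # mean time in system (latency)
    wq = rho / (service_rate - arrival_rate)       # mean wait in queue
    return {"utilization": rho, "latency": w, "queue_wait": wq}


if __name__ == "__main__":
    # Mean interarrival 2 min -> 0.5 jobs/min; mean service 1 min -> 1 job/min.
    print(mm1_metrics(0.5, 1.0))
    # {'utilization': 0.5, 'latency': 2.0, 'queue_wait': 1.0}
```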
Timeline Performance Models • Let the interarrival times and service times for the first three jobs be • The event list
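The hand procedure (advance to each arrival, start service when the string is free, record the departure) can be written as a short trace. The interarrival and service values below are hypothetical draws chosen for illustration; they are not the values from the original slide. A job waits whenever it arrives before the string has finished the previous job.

```python
def hand_simulate(interarrival, service):
    """Trace a single-string FIFO queue for a few jobs, as one would by hand.

    Returns the time-ordered event list and each job's latency.
    """
    arrival = 0.0          # arrival time of the current job
    string_free_at = 0.0   # when the compute string finishes its current job
    events, latencies = [], []
    for job, (gap, work) in enumerate(zip(interarrival, service), start=1):
        arrival += gap
        start = max(arrival, string_free_at)   # wait if the string is busy
        departure = start + work
        string_free_at = departure
        latencies.append(departure - arrival)
        events.append((arrival, "arrival", job))
        events.append((departure, "departure", job))
    events.sort()  # ties: "arrival" sorts before "departure"
    return events, latencies


if __name__ == "__main__":
    # Hypothetical draws, chosen only for illustration.
    events, latencies = hand_simulate([0.5, 1.25, 0.75], [2.0, 0.75, 0.25])
    for t, kind, job in events:
        print(f"t={t:5.2f}  {kind:<9} job {job}")
    print("average latency:", sum(latencies) / len(latencies))
```

With these draws the latencies are 2.0, 1.5, and 1.0 minutes, for an average of 1.5 minutes; jobs 2 and 3 both spend time waiting behind job 1.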
Timeline Performance Models • Generally, the systems are too complex, and the number of events we want to observe too large, to do the modeling by hand, so we develop computer simulation programs (models)
Timeline Performance Models • What does a performance model need? • Main program to execute the model, including an initialization routine and an output report generator • Event list • Simulation clock • Routines for advancing the clock, executing events, and generating random variables • Storage for state variables and statistics (the things you want to measure) • Performance modeling can be done in any programming language, but there are packages specifically designed to take care of the bookkeeping for us.
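The components listed above map directly onto a small hand-written simulator. This is a minimal sketch in plain Python (not how SES Workbench implements it) of the one-string example system: a heap serves as the event list, a float as the simulation clock, `random.expovariate` generates the exponential variates, and latencies and busy time are the gathered statistics. The seed and job count are arbitrary choices.

```python
import heapq
import random
from collections import deque


def simulate_mm1(num_jobs, mean_interarrival=2.0, mean_service=1.0, seed=1):
    """Minimal event-scheduling simulation of one compute string (FIFO)."""
    # --- initialization routine -------------------------------------------
    rng = random.Random(seed)          # random-variate generator
    clock = 0.0                        # simulation clock
    event_list = []                    # heap of (time, seq, kind, job)
    seq = 0                            # tie-breaker for simultaneous events

    def schedule(time, kind, job):
        nonlocal seq
        heapq.heappush(event_list, (time, seq, kind, job))
        seq += 1

    busy = False                       # state: is the string busy?
    waiting = deque()                  # state: FIFO queue of job ids
    arrived_at = {}                    # state: arrival time per job
    latencies = []                     # statistic: time in system per job
    busy_time = 0.0                    # statistic: accumulated busy time

    schedule(rng.expovariate(1.0 / mean_interarrival), "arrival", 0)
    next_job = 1

    # --- main loop: advance the clock to the next event and execute it ------
    while event_list:
        time, _, kind, job = heapq.heappop(event_list)
        if busy:
            busy_time += time - clock
        clock = time

        if kind == "arrival":
            arrived_at[job] = clock
            if next_job < num_jobs:
                schedule(clock + rng.expovariate(1.0 / mean_interarrival),
                         "arrival", next_job)
                next_job += 1
            if busy:
                waiting.append(job)
            else:
                busy = True
                schedule(clock + rng.expovariate(1.0 / mean_service),
                         "departure", job)
        else:  # departure
            latencies.append(clock - arrived_at.pop(job))
            if waiting:
                schedule(clock + rng.expovariate(1.0 / mean_service),
                         "departure", waiting.popleft())
            else:
                busy = False

    # --- report generator ---------------------------------------------------
    return {
        "jobs": len(latencies),
        "mean_latency": sum(latencies) / len(latencies),
        "utilization": busy_time / clock,
    }


if __name__ == "__main__":
    print(simulate_mm1(50_000))
```

For this system, queueing theory gives a steady-state mean latency of 2 minutes and a utilization of 0.5, so a long run should land close to those values.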
COTS Modeling Packages • There are numerous commercial off-the-shelf (COTS) modeling packages available • We use Hyperformix Workbench • Discrete event simulation package built on C • Not specific to any industry (very powerful) • Graphical user interface • High-level models can be built very quickly • Models are easy to understand • Models can be run in “animate” mode • Useful for explanation/demonstration • Useful for debug/test • Models can be compiled into executable code that can be run on a platform without the Workbench tool • Provides an easy interface for statistics gathering and reporting
SES Workbench Model • Main module • Dependence arc • Modules are created under the Model tab
SES Workbench Model • Node palette • Submodels
SES Workbench Model • Node types: source node, service node, sink node, declaration node • Response arc (gathers statistics)
SES Workbench Model • Interarrival time: exponentially distributed, mean 2 • Statistics can be tracked for different job types
SES Workbench Model • Queuing discipline
SES Workbench Model • Module time unit • Right-click a module to open its specification
SES Workbench Model • The model runs until there are no events on the event list or until 100 minutes of simulation time have elapsed, whichever comes first • A status report is given every 10 minutes (simulation time)
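The run-control settings above can be mimicked in a generic Python event loop. This is a minimal sketch, not SES Workbench's actual semantics. It assumes that "time limit reached" means the next event lies past the limit, that status reporting is a self-rescheduling "status" event that stops once no other events remain, and that the handler mapping and event names are hypothetical.

```python
import heapq
import itertools


def run(initial_events, handlers, time_limit=100.0, report_interval=10.0):
    """Run until the event list is empty or the next event is past time_limit.

    handlers maps an event kind to a function(clock) returning a list of
    (delay, kind) pairs to schedule. A "status" event fires every
    report_interval and reschedules itself only while other events remain.
    """
    seq = itertools.count()                    # tie-breaker for equal times
    events = [(t, next(seq), kind) for t, kind in initial_events]
    heapq.heapify(events)
    heapq.heappush(events, (report_interval, next(seq), "status"))
    clock, processed, status_log = 0.0, 0, []

    while events and events[0][0] <= time_limit:
        clock, _, kind = heapq.heappop(events)
        if kind == "status":
            status_log.append((clock, processed))
            if events:  # keep reporting only while real work remains
                heapq.heappush(events,
                               (clock + report_interval, next(seq), "status"))
            continue
        processed += 1
        for delay, new_kind in handlers[kind](clock):
            heapq.heappush(events, (clock + delay, next(seq), new_kind))

    return clock, processed, status_log


if __name__ == "__main__":
    # Hypothetical workload: a job event that reschedules itself every 7 minutes.
    clock, processed, log = run([(0.0, "job")], {"job": lambda t: [(7.0, "job")]})
    print(processed, [t for t, _ in log])
```

In the example, the self-rescheduling job never empties the event list, so the 100-minute limit stops the run after 15 job events and 10 status reports. A workload that stops generating events would instead end the run when the event list empties.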
SES Workbench Model • Status is reported at each report interval
SES Workbench Model • The MM1_model.rpt file reports all statistics that have been gathered • Output from the response arc
What kind of systems do we model? • Complex end-to-end systems • Compute-intensive/data-intensive • Functional block diagram: Data Capture, Data Storage, Data Processing, Data Distribution, Data Archive
What kind of questions are we trying to answer? • How many compute servers do we need to meet our timeline requirements? • What is the processor utilization? • What is the average wait in the processor queue? • What is the average wait in the output queue? • What is the maximum amount of bandwidth the disk needs to provide? • What is the optimal compute server configuration? • Should we overlap I/O and processing? • What is the 90th percentile for end-to-end latency? • How much working memory do we need on our processors? • How much data should we leave on disk/tape? • Which algorithm will run the fastest?
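Several of these questions (how many servers, utilization, average queue wait, 90th percentile latency) fall out of one simulation swept over the server count. The sketch below uses hypothetical parameters: jobs arrive on average every 2 minutes, take on average 5 minutes on a string, and both are exponentially distributed. Instead of a full event list it uses a shortcut that, under the assumption of first-come-first-served service and identical strings, gives the same start times: each job, in arrival order, goes to the string that frees up first.

```python
import heapq
import random
import statistics


def simulate_strings(num_strings, num_jobs=20_000, mean_interarrival=2.0,
                     mean_service=5.0, seed=7):
    """FIFO queue feeding num_strings identical compute strings (hypothetical rates)."""
    rng = random.Random(seed)
    free_at = [0.0] * num_strings      # heap: when each string becomes free
    arrival = 0.0
    waits, latencies, busy = [], [], 0.0
    for _ in range(num_jobs):
        arrival += rng.expovariate(1.0 / mean_interarrival)
        service = rng.expovariate(1.0 / mean_service)
        start = max(arrival, heapq.heappop(free_at))  # earliest-free string
        finish = start + service
        heapq.heappush(free_at, finish)
        waits.append(start - arrival)
        latencies.append(finish - arrival)
        busy += service
    horizon = max(free_at)             # time the last job finishes
    return {
        "utilization": busy / (num_strings * horizon),
        "mean_queue_wait": statistics.fmean(waits),
        "p90_latency": statistics.quantiles(latencies, n=10)[-1],
    }


def strings_needed(p90_limit, max_strings=16, **kwargs):
    """Smallest string count whose simulated 90th-percentile latency meets p90_limit."""
    for c in range(1, max_strings + 1):
        if simulate_strings(c, **kwargs)["p90_latency"] <= p90_limit:
            return c
    return None


if __name__ == "__main__":
    for c in range(3, 7):
        print(c, simulate_strings(c))
    print("strings for a 20-minute p90:", strings_needed(20.0))
```

The 20-minute 90th-percentile requirement in the usage line is likewise hypothetical; in practice it would come from the timeline requirements.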
Modeling Process • Inputs: Performance Requirements, Candidate Hardware/Software Specifications, Design Constraints, System Usage Scenario, System Operational Concept • Performance Model • Outputs: Timeline Performance Predictions, Utilization Predictions
The Marketing Pitch • By doing detailed performance modeling you WILL: • Avoid making stupid decisions • Avoid buying more hardware than you need • Avoid buying more bandwidth than you need • Have a tool for determining the best operating concept for your system • Have the ability to “try before you buy” • Give your customers peace of mind that their complex system will meet its requirements.
Why is modeling a cool job? • Understand the system from end to end • Breadth of knowledge, not necessarily depth • Can influence architecture design and opscon design • Get to “see” the system performance before the system is operational • Get to work with the latest HPC technology.