CSE 567M Computer Systems Analysis


Textbook • R. Jain, “Art of Computer Systems Performance Analysis,” Wiley, 1991, ISBN: 0471503363 (Winner of the “1992 Best Computer Systems Book” Award from the Computer Press Association)

Objectives: What You Will Learn • Specifying performance requirements • Evaluating design alternatives • Comparing two or more systems • Determining the optimal value of a parameter (system tuning) • Finding the performance bottleneck (bottleneck identification) • Characterizing the load on the system (workload characterization) • Determining the number and sizes of components (capacity planning) • Predicting the performance at future loads (forecasting).

Basic Terms • System: Any collection of hardware, software, and firmware components. • Metrics: Criteria used to evaluate the performance of the system. • Workloads: The requests made by the users of the system.

Main Parts of the Course • An Overview of Performance Evaluation • Measurement Techniques and Tools • Experimental Design and Analysis

Measurement Techniques and Tools • Types of Workloads • Popular Benchmarks • The Art of Workload Selection • Workload Characterization Techniques • Monitors • Accounting Logs • Monitoring Distributed Systems • Load Drivers • Capacity Planning • The Art of Data Presentation • Ratio Games

Example • Which type of monitor (software or hardware) would be more suitable for measuring each of the following quantities: • Number of Instructions executed by a processor? • Degree of multiprogramming on a timesharing system? • Response time of packets on a network?

Example • The performance of a system depends on the following three factors: • Garbage collection technique used: G1, G2, or none. • Type of workload: editing, computing, or AI. • Type of CPU: C1, C2, or C3. How many experiments are needed? How does one estimate the performance impact of each factor?
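
The first question on this slide can be sketched in code. The slide does not say which experimental design is intended, so the sketch below assumes a full factorial design, in which every level of each factor is combined with every level of the others. The names `factors` and `full_factorial` are illustrative. Fractional designs would need fewer runs.

```python
from itertools import product

# Factor levels as listed on the slide.
factors = {
    "garbage_collection": ["G1", "G2", "none"],
    "workload": ["editing", "computing", "AI"],
    "cpu": ["C1", "C2", "C3"],
}


def full_factorial(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


experiments = full_factorial(factors)
print(len(experiments))  # 3 * 3 * 3 = 27 experiments
print(experiments[0])    # the first combination of levels
```

Each dictionary in `experiments` describes one run of the system under study.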

Example • The average response time of a database system is three seconds. During a one-minute observation interval, the idle time on the system was ten seconds. Using a queueing model for the system, determine the following: • System utilization • Average service time per query • Number of queries completed during the observation interval • Average number of jobs in the system • Probability of number of jobs in the system being greater than 10 • 90-percentile response time • 90-percentile waiting time
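
The slide asks for "a queueing model" without naming one. The sketch below assumes the simplest choice, a single-server M/M/1 queue (Poisson arrivals, exponential service times), and uses the standard M/M/1 formulas. Under a different model the numbers would change.

```python
import math

# Inputs from the slide.
mean_response_time = 3.0     # seconds
observation_interval = 60.0  # seconds
idle_time = 10.0             # seconds

busy_time = observation_interval - idle_time
utilization = busy_time / observation_interval            # rho = 5/6

# M/M/1: R = S / (1 - rho), so S = R * (1 - rho).
service_time = mean_response_time * (1.0 - utilization)   # 0.5 s

# Each completed query keeps the server busy for S seconds on average.
queries_completed = busy_time / service_time               # 100

# M/M/1: mean number in system = rho / (1 - rho).
mean_jobs_in_system = utilization / (1.0 - utilization)   # 5

# M/M/1: P(more than k jobs in system) = rho ** (k + 1).
prob_more_than_10 = utilization ** 11                      # about 0.135

# M/M/1 response time is exponential with mean R.
p90_response = mean_response_time * math.log(10)           # about 6.9 s

# M/M/1 q-percentile of waiting time: max(0, R * ln(100 * rho / (100 - q))).
p90_waiting = max(0.0, mean_response_time * math.log(10 * utilization))  # about 6.4 s

print(f"utilization          = {utilization:.3f}")
print(f"service time         = {service_time:.2f} s")
print(f"queries completed    = {queries_completed:.0f}")
print(f"mean jobs in system  = {mean_jobs_in_system:.1f}")
print(f"P(jobs > 10)         = {prob_more_than_10:.3f}")
print(f"90th pct. response   = {p90_response:.2f} s")
print(f"90th pct. waiting    = {p90_waiting:.2f} s")
```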

Common Mistakes in Evaluation • No Goals • No general purpose model • Goals ⇒ Techniques, Metrics, Workload • Not trivial • Biased Goals • “To show that OUR system is better than THEIRS” • Analysts = Jury • Unsystematic Approach • Analysis Without Understanding the Problem • Incorrect Performance Metrics • Unrepresentative Workload • Wrong Evaluation Technique

Common Mistakes (Cont) • Overlook Important Parameters • Ignore Significant Factors • Inappropriate Experimental Design • Inappropriate Level of Detail • No Analysis • Erroneous Analysis • No Sensitivity Analysis • Ignoring Errors in Input • Improper Treatment of Outliers • Assuming No Change in the Future • Ignoring Variability • Too Complex Analysis

Common Mistakes (Cont) • Improper Presentation of Results • Ignoring Social Aspects • Omitting Assumptions and Limitations

Checklist for Avoiding Common Mistakes • Is the system correctly defined and the goals clearly stated? • Are the goals stated in an unbiased manner? • Have all the steps of the analysis been followed systematically? • Is the problem clearly understood before analyzing it? • Are the performance metrics relevant for this problem? • Is the workload correct for this problem? • Is the evaluation technique appropriate? • Is the list of parameters that affect performance complete? • Have all parameters that affect performance been chosen as factors to be varied?

Checklist (Cont) • Is the experimental design efficient in terms of time and results? • Is the level of detail proper? • Is the measured data presented with analysis and interpretation? • Is the analysis statistically correct? • Has the sensitivity analysis been done? • Would errors in the input cause an insignificant change in the results? • Have the outliers in the input or output been treated properly? • Have the future changes in the system and workload been modeled? • Has the variance of input been taken into account?

Checklist (Cont) • Has the variance of the results been analyzed? • Is the analysis easy to explain? • Is the presentation style suitable for its audience? • Have the results been presented graphically as much as possible? • Are the assumptions and limitations of the analysis clearly documented?

A Systematic Approach to Performance Evaluation • State Goals and Define the System • List Services and Outcomes • Select Metrics • List Parameters • Select Factors to Study • Select Evaluation Technique • Select Workload • Design Experiments • Analyze and Interpret Data • Present Results • Repeat

Criteria for Selecting an Evaluation Technique

Three Rules of Validation • Do not trust the results of an analytical model until they have been validated by a simulation model or measurements. • Do not trust the results of a simulation model until they have been validated by analytical modeling or measurements. • Do not trust the results of a measurement until they have been validated by simulation or analytical modeling.

Selecting Performance Metrics

Selecting Metrics • Include: • Performance: Time, Rate, Resource • Error rate, probability • Time to failure and duration • Consider including: • Mean and variance • Individual and Global • Selection Criteria: • Low-variability • Non-redundancy • Completeness

Case Study: Two Congestion Control Algorithms • Service: Send packets from specified source to specified destination in order. • Possible outcomes: • Some packets are delivered in order to the correct destination. • Some packets are delivered out-of-order to the destination. • Some packets are delivered more than once (duplicates). • Some packets are dropped on the way (lost packets).

Case Study (Cont) • Performance: For packets delivered in order, • Time-rate-resource ⇒ • Response time to deliver the packets • Throughput: the number of packets per unit of time. • Processor time per packet on the source end system. • Processor time per packet on the destination end system. • Processor time per packet on the intermediate systems. • Variability of the response time ⇒ Retransmissions • Response time: the delay inside the network

Case Study (Cont) • Out-of-order packets consume buffers ⇒ Probability of out-of-order arrivals. • Duplicate packets consume network resources ⇒ Probability of duplicate packets • Lost packets require retransmission ⇒ Probability of lost packets • Too much loss causes disconnection ⇒ Probability of disconnect

Case Study (Cont) • Shared Resource ⇒ Fairness • Fairness Index Properties: • Always lies between 0 and 1. • Equal throughput ⇒ Fairness = 1. • If k of n users receive throughput x and the remaining n-k users receive zero throughput, the fairness index is k/n.
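
The formula for the fairness index is not reproduced in this transcript. The sketch below assumes the commonly cited form of Jain's fairness index, f(x1, ..., xn) = (Σ xi)² / (n · Σ xi²), and checks it against the properties listed on the slide. The function name `fairness_index` is illustrative.

```python
def fairness_index(throughputs):
    """Fairness index (sum x)^2 / (n * sum x^2) for a list of user throughputs.

    Assumed form of the index; it is undefined when every throughput is zero.
    """
    total = sum(throughputs)
    sum_of_squares = sum(x * x for x in throughputs)
    if sum_of_squares == 0:
        raise ValueError("fairness index is undefined when all throughputs are zero")
    return (total * total) / (len(throughputs) * sum_of_squares)


# Equal throughput for all users gives a fairness of 1.
print(fairness_index([5, 5, 5, 5]))  # 1.0

# k = 2 of n = 4 users receive x = 2, the rest receive nothing: k/n = 0.5.
print(fairness_index([2, 2, 0, 0]))  # 0.5
```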

Case Study (Cont) • Throughput and delay were found redundant ⇒ Use Power. • Variance in response time is redundant with the probability of duplication and the probability of disconnection • Total: nine metrics.

Commonly Used Performance Metrics • Response time and Reaction time

Response Time (Cont)

Capacity

Common Performance Metrics (Cont) • Nominal Capacity: Maximum achievable throughput under ideal workload conditions. E.g., bandwidth in bits per second. The response time at maximum throughput is too high. • Usable capacity: Maximum throughput achievable without exceeding a pre-specified response-time limit • Knee Capacity: Knee = Low response time and High throughput
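
The three capacities can be read off a set of measured (throughput, response time) points. The sketch below uses made-up measurements, and it locates the knee as the point that maximizes power (throughput divided by response time). That is one common convention and an assumption here, since the slide only describes the knee qualitatively.

```python
# Hypothetical measurements: (throughput in requests/s, response time in s).
measurements = [
    (10, 0.10),
    (20, 0.12),
    (30, 0.15),
    (40, 0.25),
    (45, 0.60),
    (48, 2.00),
]


def nominal_capacity(points):
    """Highest throughput observed, regardless of response time."""
    return max(throughput for throughput, _ in points)


def usable_capacity(points, response_limit):
    """Highest throughput whose response time stays within the limit."""
    acceptable = [t for t, r in points if r <= response_limit]
    return max(acceptable) if acceptable else None


def knee_capacity(points):
    """Throughput at the point of maximum power (assumed knee definition)."""
    best = max(points, key=lambda p: p[0] / p[1])
    return best[0]


print(nominal_capacity(measurements))      # 48
print(usable_capacity(measurements, 0.5))  # 40
print(knee_capacity(measurements))         # 30
```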

Common Performance Metrics (cont) • Turnaround time = the time between the submission of a batch job and the completion of its output. • Stretch Factor: The ratio of the response time with multiprogramming to that without multiprogramming. • Throughput: Rate (requests per unit of time) Examples: • Jobs per second • Requests per second • Millions of Instructions Per Second (MIPS) • Millions of Floating Point Operations Per Second (MFLOPS) • Packets Per Second (PPS) • Bits per second (bps) • Transactions Per Second (TPS)

Common Performance Metrics (Cont) • Efficiency: Ratio of usable capacity to nominal capacity. Or, the ratio of the performance of an n-processor system to that of a one-processor system is its efficiency. • Utilization: The fraction of time the resource is busy servicing requests. Average fraction used for memory.

Common Performance Metrics (Cont) • Reliability: • Probability of errors • Mean time between errors (error-free seconds). • Availability: • Mean Time to Failure (MTTF) • Mean Time to Repair (MTTR) • MTTF/(MTTF+MTTR)
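
The availability ratio from this slide, MTTF/(MTTF+MTTR), can be computed directly. The numbers in the example are hypothetical, and both times must be in the same unit.

```python
def availability(mttf, mttr):
    """Fraction of time the system is up: MTTF / (MTTF + MTTR)."""
    if mttf < 0 or mttr < 0 or mttf + mttr == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf / (mttf + mttr)


# Hypothetical system: fails every 1000 hours on average, 2 hours to repair.
print(f"{availability(1000, 2):.4f}")  # 0.9980
```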

Utility Classification of Metrics

Setting Performance Requirements • Examples: “The system should be both processing and memory efficient. It should not create excessive overhead.” “There should be an extremely low probability that the network will duplicate a packet, deliver a packet to the wrong destination, or change the data in a packet.” • Problems: Non-Specific, Non-Measurable, Non-Acceptable, Non-Realizable, Non-Thorough ⇒ Requirements should be SMART

Case Study 3.2: Local Area Networks • Service: Send frame to D • Outcomes: • Frame is correctly delivered to D • Incorrectly delivered • Not delivered at all • Requirements: • Speed • The access delay at any station should be less than one second. • Sustained throughput must be at least 80 Mbits/sec. • Reliability: Five different error modes. • Different amount of damage • Different level of acceptability.

Case Study (Cont) • The probability of any bit being in error must be less than 1E-7. • The probability of any frame being in error (with error indication set) must be less than 1%. • The probability of a frame in error being delivered without error indication must be less than 1E-15. • The probability of a frame being misdelivered due to an undetected error in the destination address must be less than 1E-18. • The probability of a frame being delivered more than once (duplicate) must be less than 1E-5. • The probability of losing a frame on the LAN (due to all sorts of errors) must be less than 1%.

Case Study (Cont) • Availability: Two fault modes – Network reinitializations and permanent failures • The mean time to initialize the LAN must be less than 15 milliseconds. • The mean time between LAN initializations must be at least one minute. • The mean time to repair a LAN must be less than one hour. (LAN partitions may be operational during this period.) • The mean time between LAN partitioning must be at least one-half a week.

Measurement Techniques and Tools • “Measurements are not to provide numbers but insight” (Ingrid Bucher) • What are the different types of workloads? • Which workloads are commonly used by other analysts? • How are the appropriate workload types selected? • How is the measured workload data summarized? • How is the system performance monitored? • How can the desired workload be placed on the system in a controlled manner? • How are the results of the evaluation presented?

Terminology • Test workload: Any workload used in performance studies. A test workload can be real or synthetic. • Real workload: Observed on a system being used for normal operations. • Synthetic workload: • Similar to real workload • Can be applied repeatedly in a controlled manner • No large real-world data files • No sensitive data • Easily modified without affecting operation • Easily ported to different systems due to its small size • May have built-in measurement capabilities.

Test Workloads for Computer Systems • Addition Instruction • Instruction Mixes • Kernels • Synthetic Programs • Application Benchmarks

Addition Instruction • Processors were the most expensive and most used components of the system • Addition was the most frequent instruction

Instruction Mixes • Instruction mix = instructions + usage frequency • Gibson mix: Developed by Jack C. Gibson in 1959 for IBM 704 systems.

Instruction Mixes (Cont) • Disadvantages: • Complex classes of instructions not reflected in the mixes. • Instruction time varies with: • Addressing modes • Cache hit rates • Pipeline efficiency • Interference from other devices during processor-memory access cycles • Parameter values • Frequency of zeros as a parameter • The distribution of zero digits in a multiplier • The average number of positions of preshift in floating-point add • Number of times a conditional branch is taken

Instruction Mixes (Cont) • Performance Metrics: • MIPS = Millions of Instructions Per Second • MFLOPS = Millions of Floating Point Operations Per Second
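
An instruction mix turns into a MIPS rating through the weighted average instruction time. The sketch below uses a made-up four-class mix. The classes, frequencies, and timings are illustrative only and are not the Gibson mix.

```python
import math

# Hypothetical mix: class -> (fraction of executed instructions, time in microseconds).
mix = {
    "load_store": (0.40, 0.50),
    "add_sub": (0.30, 0.25),
    "branch": (0.20, 0.50),
    "multiply": (0.10, 1.50),
}


def average_instruction_time(mix):
    """Frequency-weighted mean instruction time, in microseconds."""
    total_frequency = sum(freq for freq, _ in mix.values())
    if not math.isclose(total_frequency, 1.0):
        raise ValueError("usage frequencies must sum to 1")
    return sum(freq * time_us for freq, time_us in mix.values())


avg_time_us = average_instruction_time(mix)
# One instruction per microsecond equals one million instructions per second.
mips = 1.0 / avg_time_us

print(f"average instruction time = {avg_time_us:.3f} microseconds")
print(f"rating = {mips:.2f} MIPS")
```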

Kernels • Kernel = nucleus • Kernel = the most frequent function • Commonly used kernels: Sieve, Puzzle, Tree Searching, Ackermann's Function, Matrix Inversion, and Sorting. • Disadvantages: Do not make use of I/O devices

Synthetic Programs • The need to measure I/O performance led analysts to develop exerciser loops • The first exerciser loop was by Buchholz (1969), who called it a synthetic program. • A Sample Exerciser: See program listing Figure 4.1 in the book

Synthetic Programs • Advantages: • Quickly developed and given to different vendors. • No real data files • Easily modified and ported to different systems. • Have built-in measurement capabilities • Measurement process is automated • Repeated easily on successive versions of the operating systems • Disadvantages: • Too small • Do not make representative memory or disk references • Mechanisms for page faults and disk cache may not be adequately exercised. • CPU-I/O overlap may not be representative. • Loops may create synchronizations ⇒ better or worse performance.
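
The idea of an exerciser loop with built-in measurement can be sketched as follows. This is not the program from Figure 4.1 in the book. It is a hypothetical miniature that alternates a CPU burst with a file write and times each part. The function name `exerciser` and its parameters are made up for illustration.

```python
import os
import tempfile
import time


def exerciser(iterations=5, cpu_loops=10_000, io_bytes=4096):
    """Alternate CPU bursts and file writes; return the seconds spent in each."""
    cpu_seconds = 0.0
    io_seconds = 0.0
    block = b"x" * io_bytes
    with tempfile.TemporaryFile() as scratch:
        for _ in range(iterations):
            # CPU burst: a simple arithmetic loop.
            start = time.perf_counter()
            total = 0
            for i in range(cpu_loops):
                total += i * i
            cpu_seconds += time.perf_counter() - start

            # I/O burst: write one block and force it out to the device.
            start = time.perf_counter()
            scratch.write(block)
            scratch.flush()
            os.fsync(scratch.fileno())
            io_seconds += time.perf_counter() - start
    return {
        "cpu_seconds": cpu_seconds,
        "io_seconds": io_seconds,
        "bytes_written": iterations * io_bytes,
    }


result = exerciser()
print(result)
```

Changing `cpu_loops` and `io_bytes` alters the CPU-to-I/O ratio of the load. The disadvantages listed on the slide still apply to a loop this small.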

Application Benchmarks • For a particular industry: Debit-Credit for Banks • Benchmark = workload (Except instruction mixes) • Some Authors: Benchmark = set of programs taken from real workloads • Popular Benchmarks

Sieve • Based on Eratosthenes' sieve algorithm: find all prime numbers below a given number n. • Algorithm: • Write down all integers from 1 to n • Strike out all multiples of k, for k = 2, 3, …, √n. • Example: • Write down all numbers from 1 to 20. Mark all as prime: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20 • Remove all multiples of 2 from the list of primes: 1, 2, 3, 5, 7, 9, 11, 13, 15, 17, 19

Sieve (Cont) • The next integer in the sequence is 3. Remove all multiples of 3: 1, 2, 3, 5, 7, 11, 13, 17, 19 • The next integer in the sequence is 5, and 5 > √20 ⇒ Stop • Pascal Program to Implement the Sieve Kernel: See program listing Figure 4.2 in the book
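
The book's Pascal listing is not reproduced in this transcript. The sketch below is an independent Python version of the same algorithm, not a translation of Figure 4.2. It stops once k exceeds the square root of n. It also starts at 2, since 1 is not prime, and begins striking at k·k, which is a common shortcut because smaller multiples have already been removed.

```python
import math


def sieve(n):
    """Return all primes up to and including n using Eratosthenes' sieve."""
    if n < 2:
        return []
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    for k in range(2, math.isqrt(n) + 1):
        if is_prime[k]:
            # Strike out the multiples of k, starting at k * k.
            for multiple in range(k * k, n + 1, k):
                is_prime[multiple] = False
    return [i for i, prime in enumerate(is_prime) if prime]


print(sieve(20))  # [2, 3, 5, 7, 11, 13, 17, 19]
```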
