
Looking at Data


Presentation Transcript


  1. Looking at Data Dror Feitelson Hebrew University

  2. Disclaimer No connection to www.lookingatdata.com They have neat stuff – recommended But we’ll just use very simple graphics

  3. The Agenda To promote the collection, sharing, and use of real data about computer systems, in order to ensure that our research is relevant to real-life situations (as opposed to doing research based on assumptions)

  4. Computer “Science” • Mathematics = abstract thought • Engineering = building things • Science = learning about the world • Observation • Measurement • Experimentation • The scientific method is also required for the study of complex computer systems (including complexity arising from humans)

  5. Example 1: The Top500 list

  6. The Top500 List • List of the 500 most powerful supercomputers in the world • As measured by Linpack • Started in 1993 by Dongarra, Meuer, Simon, and Strohmaier • Updated twice a year at www.top500.org • Contains data about vendors, countries, and machine types • Egos and politics in the top spots

  7. November 2002 list

  8. Top500 Evolution: Scalar vs. Vector 1993 – 1998: Number of vector machines plummets: MPPs instead of Crays

  9. Top500 Evolution: Scalar vs. Vector 1998 – 2003: Vector machines stabilize • Earth simulator • Cray X1

  10. Top500 Evolution: Scalar vs. Vector 2003 – 2007: Vectors all but disappear What happened?

  11. Top500 Evolution: Parallelism Most attention typically given to largest machines

  12. Top500 Evolution: Parallelism But let’s focus on the smallest ones: we need more and more processors just to stay on the list

  13. Top500 Evolution: Parallelism For vector machines the number of processors needed to stay on the list doubles every 18 months; for microprocessor-based machines it doubles only every 2-3 years So the per-processor performance of microprocessors is improving faster Implication: in 2008 microprocessors finally closed the performance gap
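A back-of-the-envelope check of what these doubling times imply (a sketch; only the 18-month and 2-3-year figures come from the slide, and the list-entry threshold growth rate used below is a hypothetical illustration):

    # The processor count needed to stay on the list satisfies
    #   threshold_growth = count_growth * per_processor_growth
    # so a faster-growing count means slower-growing per-processor performance.

    def annual_growth(doubling_months):
        """Annual growth factor of a quantity that doubles every doubling_months."""
        return 2 ** (12.0 / doubling_months)

    vector_count_growth = annual_growth(18)   # ~1.59x per year
    micro_count_growth = annual_growth(30)    # 2.5 years -> ~1.32x per year

    threshold_growth = 1.9                    # hypothetical annual growth of the entry threshold
    print("vector per-processor growth: %.2fx" % (threshold_growth / vector_count_growth))
    print("micro per-processor growth:  %.2fx" % (threshold_growth / micro_count_growth))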

  14. Historical Perspective Figure from a 1994 report

  15. Top500 Evolution: Parallelism Need more and more processors to stay on the list Implication: performance grows faster than Moore’s law

  16. Top500 Evolution: Parallelism Needing more processors to stay on the list means performance grows faster than Moore’s law Since 2003 the slope has increased due to the slowing of microprocessor improvements

  17. Top500 Evolution: Parallelism BTW: the largest machines stayed flat for 7 years while everything else grew exponentially Implication: indicates difficulty in using and controlling the very largest machines

  18. Example 1: The Top500 list • Example 2: Parallel workload patterns

  19. Parallel Workloads Archive • All large-scale supercomputers maintain accounting logs • Data includes job arrival, queue time, runtime, processors, user, and more • Many are willing to share them (and shame on those who are not) • Collection at www.cs.huji.ac.il/labs/parallel/workload/ • Uses a standard format to ease use
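The archive’s Standard Workload Format (SWF) is plain text: header and comment lines start with ';' and each job is one whitespace-separated record. A minimal reading sketch in Python, assuming the usual SWF field order (job number, submit time, wait time, runtime, allocated processors, ..., user ID in the twelfth field); the file name is a hypothetical placeholder and the field positions should be checked against the format description on the archive site:

    # Minimal reader for a Standard Workload Format (SWF) trace.
    def read_swf(path):
        jobs = []
        with open(path) as f:
            for line in f:
                if line.startswith(';') or not line.strip():
                    continue                          # skip header comments and blank lines
                fields = line.split()
                jobs.append({
                    'submit_time': float(fields[1]),  # seconds since the start of the trace
                    'wait_time':   float(fields[2]),
                    'run_time':    float(fields[3]),
                    'procs':       int(fields[4]),    # processors actually allocated
                    'user':        int(fields[11]),   # anonymized user ID
                })
        return jobs

    jobs = read_swf('some-trace.swf')                 # hypothetical file name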

  20. NASA iPSC/860 trace

  21. Parallelism Assumptions • Large machines have thousands of processors • Cost many millions of dollars • So expected to be used for large-scale parallel jobs (Ok, maybe also a few smaller debug runs)

  22. Parallelism Data

  23. Parallelism Data On all machines 15-50% of jobs are serial Also very many small jobs Implication: bad news: small jobs may block out large jobs Implication: good news: small jobs are easy to pack

  24. Parallelism Data On all machines 15-50% of jobs are serial Also very many small jobs Majority of jobs use power of 2 nodes • No real application requirements • Hypercube tradition • We think in binary Implication: regardless of reason, reduces fragmentation
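Both observations are easy to verify on a trace; a small sketch using the jobs list from the SWF-reading example above (my own helper, not part of the archive tools):

    def is_power_of_two(n):
        return n > 0 and (n & (n - 1)) == 0

    serial = sum(1 for j in jobs if j['procs'] == 1)
    pow2 = sum(1 for j in jobs if is_power_of_two(j['procs']))
    print("serial jobs:     %.1f%%" % (100.0 * serial / len(jobs)))
    print("power-of-2 jobs: %.1f%%" % (100.0 * pow2 / len(jobs)))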

  25. Size-Runtime Correlation • Parallel jobs require resources in two dimensions: • A number of processors • For a duration of time • Assuming the parallelism is used for speedup, we can expect large jobs to run for less time • Important for scheduling, because job size is known in advance Potential implication: scheduling large jobs first also schedules short jobs first!
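A direct way to test this expectation is a rank correlation between job size and runtime; a sketch (it reuses the jobs list from the SWF-reading example and assumes SciPy is available):

    from scipy.stats import spearmanr

    sizes = [j['procs'] for j in jobs]
    runtimes = [j['run_time'] for j in jobs]
    rho, p = spearmanr(sizes, runtimes)
    print("Spearman correlation of size and runtime: %.2f" % rho)
    # A clearly negative rho would support "large jobs run for less time";
    # the data slides that follow suggest the correlation is weak or even positive.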

  26. Size-Runtime Correlation Data

  27. “Distributional” Correlation • Partition jobs into two groups based on size • Small jobs (less than median) • Large jobs (more than median) • Find distribution of runtimes for each group • Measure fraction of support where one distribution dominates the other
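A sketch of this measure as described above (the helper names, the evaluation grid, and the handling of ties at the median are my own choices):

    import numpy as np

    def distributional_dominance(jobs):
        """Split jobs at the median size and measure over what fraction of the
        runtime axis the small-job runtime CDF lies above the large-job CDF."""
        sizes = np.array([j['procs'] for j in jobs])
        runtimes = np.array([j['run_time'] for j in jobs])
        median = np.median(sizes)
        small = np.sort(runtimes[sizes <= median])   # ties at the median go to "small"
        large = np.sort(runtimes[sizes > median])

        def ecdf(sample, x):                         # empirical CDF of a sorted sample at points x
            return np.searchsorted(sample, x, side='right') / len(sample)

        grid = np.unique(runtimes)                   # evaluate on all observed runtimes
        return float(np.mean(ecdf(small, grid) > ecdf(large, grid)))

    frac = distributional_dominance(jobs)
    print("small jobs are stochastically shorter over %.0f%% of the support" % (100 * frac))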

  28. “Distributional” Correlation Implication: large jobs first ≠ short jobs first (maybe even long first)

  29. Example 1: The Top500 list • Example 2: Parallel workload patterns • Example 3: “Dirty” data

  30. Beware Dirty Data • Looking at data is important • But is all data worth looking at? • Errors in data recording • Evolution and non-stationarity • Diversity between different sources • Multi-class mixtures • Abnormal activity • Need to select relevant data source • Need to clean dirty data

  31. Abnormality Example Some users are much more active than others So much so that they single-handedly affect workload statistics • Job arrivals (more) • Job sizes (modal?) Probably not generally representative Implication: we may be optimizing for user 2
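Such dominant users are easy to spot once the trace carries user IDs; a minimal sketch over the jobs list from the parsing example:

    from collections import Counter

    per_user = Counter(j['user'] for j in jobs)
    for user, count in per_user.most_common(5):
        print("user %d: %d jobs (%.1f%% of the trace)"
              % (user, count, 100.0 * count / len(jobs)))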

  32. Workload Flurries • Bursts of activity by a single user • Lots of jobs • All these jobs are small • All of them have similar characteristics • Limited duration (days to weeks) • Flurry jobs may be affected as a group, leading to potential instability (butterfly effect) • This is a problem with evaluation methodology more than with real systems
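A crude detector along these lines counts each user’s jobs in weekly windows and flags unusually busy (user, week) pairs; the threshold below is an arbitrary illustration, not a value from the talk:

    from collections import Counter

    WEEK = 7 * 24 * 3600
    FLURRY_THRESHOLD = 1000          # arbitrary illustration; tune per trace

    activity = Counter((j['user'], int(j['submit_time'] // WEEK)) for j in jobs)
    flurries = [(u, wk, n) for (u, wk), n in activity.items() if n > FLURRY_THRESHOLD]
    for user, week, n in sorted(flurries):
        print("user %d submitted %d jobs in week %d of the trace" % (user, n, week))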

  33. Workload Flurries

  34. Instability Example Simulate scheduling of parallel jobs with EASY scheduler Use CTC SP2 trace as input workload Change load by systematically modifying inter-arrival times Leads to erratic behavior

  35. Instability Example Simulate scheduling of parallel jobs with EASY scheduler Use CTC SP2 trace as input workload Change load by systematically modifying inter-arrival times Leads to erratic behavior Removing a flurry by user 135 solves the problem Implication: using dirty data may lead to erroneous evaluation results
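The load change mentioned here is typically done by multiplying all inter-arrival times by a constant factor, which scales the arrival rate and hence the offered load; a minimal sketch (function name and representation are my own):

    def scale_load(jobs, factor):
        """Divide all inter-arrival gaps by factor, multiplying the offered load by factor."""
        jobs = sorted(jobs, key=lambda j: j['submit_time'])
        scaled, prev_orig, prev_new = [], 0.0, 0.0
        for j in jobs:
            gap = (j['submit_time'] - prev_orig) / factor
            prev_orig = j['submit_time']
            prev_new += gap
            scaled.append(dict(j, submit_time=prev_new))
        return scaled

    heavier = scale_load(jobs, 1.2)    # 20% more offered load than the original trace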

  36. Example 1: The Top500 list • Example 2: Parallel workload patterns • Example 3: “Dirty” data • Example 4: User behavior

  37. Independence vs. Feedback • Modifying the offered load by changing inter-arrival times assumes an open system model • Large user population insensitive to system performance • Jobs are independent of each other • But real systems are often closed • Limited user population • New jobs submitted after previous ones terminate • This leads to feedback from system performance to workload generation

  38. Evidence for Feedback Implication: jobs are not independent, so modifying inter-arrival times is problematic

  39. The Mechanics of Feedback • If users perceive the system as loaded, they will submit fewer jobs • But what exactly do users care about? • Response time: how long they wait for results • Slowdown: how much longer than expected • An answer is needed to create a user model that will react correctly to load conditions

  40. Data Mining • Available data: system accounting log • Need to assess user reaction to momentary condition • The idea: associate the user’s think time with the performance of the previous job • Good performance → satisfied user → continue work session → short think time • Bad performance → dissatisfied user → go home → long think time • “performance” = response time or slowdown
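A sketch of this association on the jobs list from the parsing example (all names are mine; “think time” here is the gap between a job’s termination and the same user’s next submittal, and negative gaps from overlapping submissions are dropped):

    from collections import defaultdict

    by_user = defaultdict(list)
    for j in sorted(jobs, key=lambda j: j['submit_time']):
        by_user[j['user']].append(j)

    pairs = []                           # (response_time, slowdown, think_time) triples
    for user_jobs in by_user.values():
        for prev, nxt in zip(user_jobs, user_jobs[1:]):
            response = prev['wait_time'] + prev['run_time']
            slowdown = response / max(prev['run_time'], 1.0)   # avoid division by zero
            think = nxt['submit_time'] - (prev['submit_time'] + response)
            if think >= 0:               # ignore jobs submitted before the previous one finished
                pairs.append((response, slowdown, think))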

  41. The Data Implication: response time is a much better predictor of user behavior than slowdown

  42. Predictability = Locality • Predicting the future is good • Avoid constraints of on-line algorithms • Approximate performance of off-line algorithms • Ability to plan ahead • Implies a correlation between events • Application behavior characterized by locality of reference • User behavior characterized by locality of sampling

  43. Locality of Sampling Workload attributes are modeled by a marginal distribution But at different times the distributions may be quite distinct Implication: the notion that more data is better is problematic

  44. Locality of Sampling Workload attributes are modeled by a marginal distribution But at different times the distributions may be quite distinct Implication: the assumption of stationarity is problematic

  45. Locality of Sampling Workload attributes are modeled by a marginal distribution But at different times the distributions may be quite distinct Thus the situation changes with time Implication: locality is required to evaluate adaptive systems
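One simple way to see this non-stationarity is to compare an attribute’s distribution in successive time windows against the overall marginal distribution; a sketch using weekly windows and a two-sample Kolmogorov-Smirnov distance (both are my choices, not the measure used in the talk; assumes SciPy):

    import numpy as np
    from scipy.stats import ks_2samp

    WEEK = 7 * 24 * 3600
    sizes = np.array([j['procs'] for j in jobs])
    weeks = np.array([int(j['submit_time'] // WEEK) for j in jobs])

    for wk in np.unique(weeks):
        in_week = sizes[weeks == wk]
        if len(in_week) < 30:
            continue                     # skip sparsely populated weeks
        stat, p = ks_2samp(in_week, sizes)
        print("week %d: KS distance from the marginal distribution = %.2f" % (wk, stat))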

  46. Example 1: The Top500 list • Example 2: Parallel workload patterns • Example 3: “Dirty” data • Example 4: User behavior • Example 5: Mass-count disparity

  47. Variability in Workloads • Changing conditions: locality of sampling, variability between different periods • Heavy-tailed distributions: unique “high weight” samples, which may be so big that they dominate the workload

  48. File Sizes Example USENET survey by Gordon Irlam in 1993 Distribution of file sizes is concentrated around several KB

  49. File Sizes Example USENET survey by Gordon Irlam in 1993 Distribution of file sizes is concentrated around several KB Distribution of disk space is concentrated in files of many MB This is mass-count disparity

  50. File Sizes Example Joint ratio of 11/89: 89% of the files hold only 11% of the bytes, while the other 11% of the files hold 89% of the bytes (a generalization of the 20/80 and 10/90 principles)
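The joint ratio can be computed directly from a list of file sizes by finding the point where the count CDF and the mass CDF sum to one; a sketch with made-up heavy-tailed sample data (assumes NumPy):

    import numpy as np

    def joint_ratio(sizes):
        """Find the point where p of the files hold 1-p of the bytes (the joint ratio)."""
        sizes = np.sort(np.asarray(sizes, dtype=float))
        count_cdf = np.arange(1, len(sizes) + 1) / len(sizes)   # fraction of files up to each size
        mass_cdf = np.cumsum(sizes) / sizes.sum()               # fraction of bytes in those files
        i = np.argmin(np.abs(count_cdf + mass_cdf - 1.0))       # where the two CDFs cross
        return count_cdf[i], mass_cdf[i]

    rng = np.random.default_rng(0)
    sizes = (rng.pareto(1.2, 100000) + 1) * 2048                # made-up heavy-tailed file sizes
    files, bytes_ = joint_ratio(sizes)
    print("%.0f%% of the files hold %.0f%% of the bytes" % (100 * files, 100 * bytes_))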
