This document discusses critical concepts in system performance, focusing on throughput and response time. It explores methods for analyzing performance bottlenecks using critical path analysis in task dependency graphs. The text outlines strategies for partitioning tasks, replicating resources at bottlenecks, and employing different forms of parallelism: inter-request and intra-request parallelism. The piece emphasizes scaling up and scaling out in cloud computing, speed-ups, and the challenges posed by resource contention and skew in workloads. It includes metrics for measuring system efficiency and outlines how to conduct scalability experiments effectively.
Advanced Systems Lab G. Alonso, D. Kossmann Systems Group http://www.systems.ethz.ch
Reading
• Read Chapters 4, 5, and 6 in the textbook
Understanding Performance
• Response Time
  • critical path analysis in a task dependency graph
  • "partition" expensive tasks into smaller tasks
• Throughput
  • queueing network model analysis
  • "replicate" resources at the bottleneck
Response Times
(Chart: response time in msecs vs. number of servers)
Why are response times long?
• Because operations take long
  • cannot travel faster than light
  • delays even in "single-user" mode
  • possibly, "parallelize" long-running operations: "intra-request parallelism"
• Because there is a bottleneck
  • contention of concurrent requests on a resource
  • requests wait in a queue before the resource is available
  • add resources to parallelize requests at the bottleneck: "inter-request parallelism"
Critical Path
• Directed graph of tasks and dependencies
• response time = max { length of path }
• assumptions: no resource contention, no pipelining, ...
• Which tasks would you try to optimize here?
(Diagram: task graph from Start to End with tasks A (3ms), C (9ms), and B (1ms))
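Critical path analysis can be sketched as a longest-path computation over the task graph. This is a minimal illustration, not the course's analysis technique; the task durations mirror the slide's example (A=3ms, B=1ms, C=9ms), but the edge C-depends-on-A is an assumption, since the slide's exact graph is not recoverable.

```python
from graphlib import TopologicalSorter

durations = {"A": 3, "B": 1, "C": 9}          # ms per task (from the slide)
deps = {"A": set(), "B": set(), "C": {"A"}}   # C depends on A (assumed edge)

def critical_path_length(durations, deps):
    """Response time = length of the longest path, assuming
    no resource contention and no pipelining (as on the slide)."""
    finish = {}
    for task in TopologicalSorter(deps).static_order():
        start = max((finish[d] for d in deps[task]), default=0)
        finish[task] = start + durations[task]
    return max(finish.values())

print(critical_path_length(durations, deps))  # 12 ms: the path A -> C
```

Under these assumptions, optimizing B (1ms) is pointless; only tasks on the A→C path shorten the response time.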
Queueing Network Models
• Graph of resources and flow of requests
• The bottleneck resource defines the throughput of the whole system
• (analysis techniques described later in the course)
(Diagram: request flow from Start to End through Server A (3 req/ms), Server B (5 req/ms), and Server C (20 req/ms))
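The bottleneck rule above can be shown in a few lines: when every request must pass through each resource, the slowest service rate caps system throughput. The rates are from the slide; the serial topology is an assumption.

```python
# Service rates from the slide; assuming requests flow through all
# three servers in series (the slide's topology is not fully recoverable).
rates = {"Server A": 3, "Server B": 5, "Server C": 20}  # req/ms

bottleneck = min(rates, key=rates.get)   # slowest resource
system_tput = rates[bottleneck]          # caps the whole system

print(bottleneck, system_tput)  # Server A 3
```

Adding capacity anywhere except Server A would not change the system's throughput, which is why the strategy is "replicate resources at the bottleneck".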
Forms of Parallelism
• Inter-request parallelism
  • several requests handled at the same time
  • principle: replicate resources
  • e.g., ATMs
• (Independent) intra-request parallelism
  • principle: divide & conquer
  • e.g., print pieces of a document on several printers
• Pipelining
  • each "item" is processed by several resources
  • process "items" at different resources in parallel
  • can lead to both inter- & intra-request parallelism
Inter-request Parallelism
(Diagram: requests 1–3 served concurrently by replicated resources, each producing its own response)
Intra-request Parallelism
(Diagram: request 1 is split into sub-requests 1.1–1.3, processed in parallel; results 1.1–1.3 are merged into response 1)
Pipelining (Intra-request)
(Diagram: request 1 is split into pieces that flow through a pipeline of resources and are merged into response 1)
Speed-up
• Metric for intra-request parallelization
• Goal: test the ability of the SUT to reduce response time
  • measure response time with 1 resource
  • measure response time with N resources
  • SpeedUp(N) = RT(1) / RT(N)
• Ideal
  • SpeedUp(N) is a linear function
  • Can you imagine super-linear speed-ups?
Speed Up
(Chart: speed-up vs. number of servers)
Scale-up
• Test how the SUT scales with the size of the problem
  • measure response time with 1 server, unit problem
  • measure response time with N servers, N-unit problem
  • ScaleUp(N) = RT(1) / RT(N)
• Ideal
  • ScaleUp(N) is a constant function
  • Can you imagine super scale-up?
Scale-Up Experiment: Response Time
(Chart: response time in msecs vs. number of servers)
Scale Out
• Test how the SUT behaves with increasing load
  • measure throughput: 1 server, 1 user
  • measure throughput: N servers, N users
  • ScaleOut(N) = Tput(N) / Tput(1)
• Ideal
  • ScaleOut(N) is a linear function
  • scale-out should behave like scale-up: per-user performance stays constant when servers grow with the load
  • (often the terms are used interchangeably, but it is worthwhile to notice the differences)
• Scale-out and scale-down in cloud computing
  • the ability of a system to adapt to changes in load
  • often measured in $ (or at least involving cost)
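The three scalability metrics can be sketched side by side. All measurement numbers below are invented for illustration; note that scale-out is written here as Tput(N)/Tput(1), so that the ideal is linear growth, mirroring speed-up.

```python
def speed_up(rt, n):
    """Fixed problem, N resources; ideal: linear in N."""
    return rt[1] / rt[n]

def scale_up(rt, n):
    """N resources, N-unit problem; ideal: constant 1."""
    return rt[1] / rt[n]

def scale_out(tput, n):
    """N servers, N users; written as Tput(N)/Tput(1) so ideal is linear."""
    return tput[n] / tput[1]

# Made-up measurements (ms and req/s) for an 8-server experiment:
rt_speedup = {1: 80.0, 8: 14.0}     # same problem on 1 vs. 8 servers
rt_scaleup = {1: 80.0, 8: 95.0}     # problem grows with the servers
tput_scale = {1: 120.0, 8: 800.0}   # users grow with the servers

print(round(speed_up(rt_speedup, 8), 2))   # 5.71 (ideal would be 8)
print(round(scale_up(rt_scaleup, 8), 2))   # 0.84 (ideal would be 1)
print(round(scale_out(tput_scale, 8), 2))  # 6.67 (ideal would be 8)
```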
Why is speed-up sub-linear?
(Diagram: the split/merge structure of intra-request parallelism, repeated from the earlier slide)
Why is speed-up sub-linear?
• Cost of the "split" and "merge" operations
  • those can be expensive operations
  • try to parallelize them, too
• Interference: servers need to synchronize
  • e.g., CPUs access data from the same disk at the same time
  • remedy: shared-nothing architecture
• Skew: work is not "split" into equal-sized chunks
  • e.g., some pieces are much bigger than others
  • remedy: keep statistics and plan better
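The first cause can be made concrete with a small model (an illustration I am adding, not from the slides): if the split/merge work is serial, it is not reduced by adding servers, so speed-up flattens out instead of growing linearly.

```python
def modeled_speedup(n, work_ms=90.0, split_merge_ms=10.0):
    """Speed-up when the parallelizable work divides perfectly by n
    but split+merge stays serial (made-up 90ms/10ms breakdown)."""
    rt1 = split_merge_ms + work_ms        # 1 server
    rtn = split_merge_ms + work_ms / n    # N servers, perfect split
    return rt1 / rtn

for n in (1, 2, 4, 8, 64):
    print(n, round(modeled_speedup(n), 2))
# Speed-up approaches (10+90)/10 = 10 as N grows, and never reaches N.
```

This is the same reasoning as Amdahl's law: the serial fraction bounds the achievable speed-up, before interference and skew make things worse still.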
Summary
• Improve response times by "partitioning"
  • divide & conquer approach
  • works well in many systems
• Improve throughput by relaxing the "bottleneck"
  • add resources at the bottleneck
• Fundamental limitations to scalability
  • resource contention (e.g., lock conflicts in a DB)
  • skew and poor load balancing
• Special kinds of experiments for scalability
  • speed-up and scale-up experiments
Metrics and Workloads
• Defining more terms
  • workload
  • parameters
  • ...
• Example benchmarks
  • TPC-H, etc.
• Learn more metrics and traps
Ingredients of an Experiment (rev.)
• System(s) Under Test
  • the (real) systems we would like to explore
• Workload(s) = user model
  • typical behavior of users / clients of the system
• Parameters
  • the "it depends" part of the answer to a performance question
  • system parameters vs. workload parameters
• Test database(s)
  • for database workloads
• Metrics
  • defining what "better" means: speed, cost, availability, ...
System Under Test
• Characterized by its API (services)
  • set of functions with parameters and result types
• Characterized by a set of parameters
  • hardware characteristics, e.g., network bandwidth, number of cores, ...
  • software characteristics, e.g., consistency level for a database system
• Observable outcomes
  • dropped requests, latency, system utilization, ...
  • (results of requests / API calls)
Workload
• A sequence of requests (i.e., API/service calls)
  • including parameter settings of calls
  • possibly, correlation between requests (e.g., sessions)
  • possibly, requests from different geographic locations
• Workload generators
  • simulate a client which issues a sequence of requests
  • specify a "think time" or arrival rate of requests
  • specify a distribution for the parameter settings of requests
• Open vs. closed system
  • number of "active" requests is constant or bounded
  • closed system = fixed #clients, each client with 0 or 1 pending requests
  • warning: people often model a closed system without knowing it!
Closed System
• Load comes from a limited set of clients
• Clients wait for a response before sending the next request
• Load is self-adjusting
• System tends to stability
• Example: database with local clients
Open System
• Load comes from a potentially unlimited set of clients
• Load is not limited by clients waiting
• Load is not self-adjusting (load keeps coming even if the SUT stops)
• Tests the system's stability
• Example: web server
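The closed-system behavior above can be sketched as a tiny simulation: a fixed client population, each client with at most one pending request and a think time between requests. All parameters (4 clients, 50ms think time, 10ms service time, single FIFO server) are invented for illustration.

```python
import heapq
import random

def simulate_closed(n_clients=4, think_ms=50.0, service_ms=10.0,
                    horizon_ms=100_000.0, seed=0):
    """Closed system: each client thinks, issues one request, waits
    for the response, then repeats. Returns throughput in req/s."""
    rng = random.Random(seed)
    server_free_at = 0.0
    completed = 0
    # event = (time a client issues its next request, client id)
    events = [(rng.expovariate(1.0 / think_ms), c) for c in range(n_clients)]
    heapq.heapify(events)
    while events:
        t, c = heapq.heappop(events)
        if t >= horizon_ms:
            break
        start = max(t, server_free_at)        # queue if server is busy
        server_free_at = start + service_ms   # fixed service time
        completed += 1
        # client receives its response, thinks, then issues the next request
        heapq.heappush(events,
                       (server_free_at + rng.expovariate(1.0 / think_ms), c))
    return completed / (horizon_ms / 1000.0)

print(round(simulate_closed(), 1))  # load is capped by the 4 clients
```

Because each client has at most one outstanding request, the offered load adjusts itself to the server's speed; in an open system the arrival process would keep generating requests regardless, which is exactly why it stresses stability.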
Parameters
• Many system and workload parameters
  • e.g., size of cache, locality of requests, ...
• Challenge is to find the ones that matter
  • understanding the system + common sense
• Compute the standard deviation of the metric(s) when varying a parameter
  • if low, the parameter is not significant
  • if high, the parameter is significant
• Important are parameters which generate "cross-over points" between System A and System B when varied
• Careful about correlations: vary combinations of parameters
Test Database
• Many systems involve "state"
  • behavior depends on the state of the database
  • e.g., long response times for big databases
• The database is a "workload parameter"
  • but a very complex one
  • and with complex implications
• Critical decisions
  • distribution of values in the database
  • size of the database (performance when generating the DB)
• Ref.: J. Gray et al.: SIGMOD 1994.
Popular Distributions
• Uniform
  • choose a range of values
  • each value of the range is chosen with the same probability
• Zipf (self-similarity)
  • frequency of a value is inversely proportional to its rank
  • F(V[1]) ~ 2 x F(V[2]) ~ 4x F(V[4]) ...
  • skew can be controlled by a parameter z
  • default: z=1; uniform: z=0; high z corresponds to high skew
• Independent vs. correlated
  • in reality, the values of 2 (or more) dimensions are correlated
  • e.g., people who are good in math are good in physics
  • e.g., a car which is good in speed is bad in price
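A Zipf generator for test-database values can be sketched as follows: the probability of the value with rank r is proportional to 1/r^z, so z=0 gives uniform and z=1 the classic Zipf of the slide. The value range and sample size are arbitrary.

```python
import random

def zipf_sampler(n_values, z=1.0, seed=0):
    """Return a function sampling ranks 1..n_values with P(r) ~ 1/r**z."""
    rng = random.Random(seed)
    values = list(range(1, n_values + 1))          # value = its rank here
    weights = [1.0 / (rank ** z) for rank in values]
    def sample(k):
        return rng.choices(values, weights=weights, k=k)
    return sample

sample = zipf_sampler(1000, z=1.0)
data = sample(100_000)
# With z=1, the rank-1 value should occur roughly twice as often as rank-2,
# matching F(V[1]) ~ 2 x F(V[2]) on the slide.
print(round(data.count(1) / data.count(2), 1))
```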
Multi-dimensional Distributions
(Figure: independent, correlated, and anti-correlated distributions of two dimensions)
Ref.: Börszönyi et al.: "The Skyline Operator", ICDE 2001.
Metrics
• Performance; e.g.,
  • throughput (successful requests per second)
  • bandwidth (bits per second)
  • latency / response time
• Cost; e.g.,
  • cost per request
  • investment
  • fixed cost
• Availability; e.g.,
  • yearly downtime of a single client vs. the whole system
  • % dropped requests (or packets)
Metrics
• How to aggregate millions of measurements?
  • classic: median + standard deviation
  • Why is the median better than the average?
  • Why is the standard deviation so important?
• Percentiles (quantiles)
  • V is the Xth percentile if X% of measurements are < V
  • max ~ 100th percentile; min ~ 0th percentile
  • median ~ 50th percentile
  • percentiles are a good fit for Service Level Agreements
• Mode: most frequent (most probable) value
  • When is the mode the best metric? (Give an example.)
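These aggregates can be illustrated on a made-up latency sample with one outlier: the median is robust where the average is dragged up, and a high percentile exposes the tail that the median hides.

```python
import statistics

latencies_ms = [12, 11, 13, 12, 14, 11, 12, 250, 13, 12]  # one outlier

def percentile(data, p):
    """Nearest-rank percentile: value below which ~p% of measurements fall."""
    s = sorted(data)
    idx = max(0, min(len(s) - 1, int(round(p / 100.0 * len(s))) - 1))
    return s[idx]

print(statistics.median(latencies_ms))  # 12.0: robust to the outlier
print(statistics.mean(latencies_ms))    # 36.0: dragged up by one request
print(percentile(latencies_ms, 99))     # 250: the tail an SLA would catch
```

This is one answer to the slide's question: the median is better than the average because a single slow request distorts the mean, while percentiles make tail latency explicit for Service Level Agreements.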
Amazon Example (~2004)
• Amazon lost about 1% of shopping baskets
  • acceptable, because the incremental cost of IT infrastructure to secure all shopping baskets was much higher than 1% of the revenue
• One day, somebody discovered that they lost the *largest* 1% of the shopping baskets
  • not okay, because those are the premium customers, and they never come back
  • resulting in much more than 1% of the revenue
• Be careful with correlations within results!!!
Where does all this come from?
• Real workloads
  • use traces from an existing (production) system
  • use real databases from the production system
• Synthetic workloads
  • use a standard benchmark
  • invent something yourself
• Tradeoffs
  • a real workload is always relevant
  • a synthetic workload is good to study "corner cases"
  • and makes it possible to vary all "workload parameters"
• If possible, use both!
Benchmarks
• Specify the whole experiment except the SUT
  • sometimes specify settings of "system parameters"
  • e.g., configure the DBMS to run at isolation level 3
• Designed for "is System A better than B?" questions
  • report only one or two numbers as metrics
  • use a complex formula to compute these numbers
  • zero or one workload parameters only
  • standardization and notaries to publish results
• Misused by research and industry
  • implement only a subset
  • invent new metrics and workload parameters
  • violation of "system parameter" settings and fine print
Benchmarks: Good, bad, and ugly
• Good
  • help define a field: give engineers a goal
  • great for marketing and sales people
  • even if misused, a great tool for research and teaching
• Bad
  • benchmark wars are not productive
  • misleading results – huge damage if irrelevant
• Ugly
  • expensive to be compliant (legal fine print)
  • irreproducible results due to complex configurations
  • vendors have complex license agreements (DeWitt clause)
  • a single-number result favors the "elephants"
  • difficult to demonstrate advantages in a "niche"
Benchmarks
• Conjecture: "Benchmarks are a series of tests in order to obtain prearranged results not available on competitive systems." (S. Kelly-Bootle)
• Corollary: "I only trust statistics that I have invented myself." (folklore)
Example Benchmarks
• CPU
  • e.g., "g++", Ackermann, SPECint
• Databases (www.tpc.org)
  • e.g., TPC-C, TPC-E, TPC-H, TPC-W, ...
• Parallel systems
  • NAS Parallel Benchmark, Splash-2
• Other
  • e.g., CloudStone, LinearRoad
• Microbenchmarks
  • e.g., LMBench
SPECint
• Goal: study the CPU speed of different hardware
• SPEC = Standard Performance Evaluation Corporation
  • www.spec.org
• Long history of CPU benchmarks
  • first version: CPU92
  • current version: SPECint2006
• SPECint2006 involves 12 tests (all in C/C++)
  • perlbench, gcc, bzip2, ..., xalancbmk
• Metrics
  • compare running time to a "reference machine"
  • e.g., 2000 secs vs. 8000 secs for gcc gives a score of 4
  • overall score = geometric mean of all 12 scores
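The scoring scheme above is easy to sketch: each test's score is the reference time divided by the measured time, and the overall result is the geometric mean. The gcc numbers are the slide's example; the other two tests and times are invented for illustration.

```python
import math

# Reference-machine times vs. measured times in seconds.
# gcc follows the slide (8000 s reference, 2000 s measured -> score 4);
# the bzip2 and perlbench numbers are made up.
reference = {"gcc": 8000.0, "bzip2": 6000.0, "perlbench": 9000.0}
measured  = {"gcc": 2000.0, "bzip2": 3000.0, "perlbench": 4500.0}

scores = {t: reference[t] / measured[t] for t in reference}
overall = math.prod(scores.values()) ** (1.0 / len(scores))

print(scores["gcc"])        # 4.0, as in the slide's example
print(round(overall, 2))    # 2.52: geometric mean of 4, 2, 2
```

The geometric mean is used instead of the arithmetic mean so that no single test dominates: doubling performance on any one test multiplies the overall score by the same factor.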
SPECint Results
• Visit: http://www.spec.org/cpu2006/results/cint2006.html
TPC-H Benchmark
• Goal: evaluate DBMS + hardware for OLAP
  • find the "fastest" system for a given DB size
  • find the best "speed / $" system for a given size
  • see to which DB sizes the systems scale
• TPC-H models a company
  • orders, lineitems, customers, products, regions, ...
• TPC-H specifies the following components
  • dbgen: DB generator with different scaling factors
  • the scaling factor of the DB is the only workload parameter
  • a mix of 22 queries and 2 update functions
  • execution instructions and metrics
TPC-H Fine Print
• Physical design
  • e.g., you must not vertically partition the DB
  • (many results violate that, e.g., all column stores)
• Execution rules
  • specify exactly how to execute queries and updates
  • specify exactly which SQL variants are allowed
• Results
  • specify exactly how to compute metrics and how to publish results
• The specification is about 150 pages long (!)
TPC-H Results
• Visit: http://www.tpc.org/tpch/results/tpch_price_perf_results.asp
Microbenchmarks
• Goal: "understand the full behavior of a system"
  • not good for "System A vs. System B" decisions
  • good for component tests and unit tests
• Design principles
  • many small and simple experiments, many workload parameters
  • report all results (rather than one big number)
  • each experiment tests a different feature (service)
    • e.g., table scan, index scan, join for a DB
    • e.g., specific function calls, representative parameter settings
  • isolate this feature as much as possible
• Design requires knowledge of the internals of the SUT
• Designed for a specific study; the benchmark is not reusable
How to improve performance?
• Find the bottleneck
• Throw additional resources at the bottleneck
• Find the new bottleneck
• Throw additional resources at the bottleneck
• ...