This document discusses critical concepts in system performance, focusing on throughput and response time. It explores methods for analyzing performance bottlenecks using critical path analysis in task dependency graphs. The text outlines strategies for partitioning tasks, replicating resources at bottlenecks, and employing different forms of parallelism: inter-request and intra-request parallelism. The piece emphasizes scaling up and scaling out in cloud computing, speed-ups, and the challenges posed by resource contention and skew in workloads. It includes metrics for measuring system efficiency and outlines how to conduct scalability experiments effectively.
Advanced Systems Lab G. Alonso, D. Kossmann Systems Group http://www.systems.ethz.ch
Reading
• Read Chapters 4, 5, and 6 in the textbook
Understanding Performance
• Response Time
  • critical path analysis in a task dependency graph
  • "partition" expensive tasks into smaller tasks
• Throughput
  • queueing network model analysis
  • "replicate" resources at the bottleneck
Response Times
(Chart: response time in msecs vs. number of servers)
Why are response times long?
• Because operations take long
  • cannot travel faster than light
  • delays even in "single-user" mode
  • possibly, "parallelize" long-running operations: "intra-request parallelism"
• Because there is a bottleneck
  • contention of concurrent requests on a resource
  • requests wait in a queue before the resource is available
  • add resources to parallelize requests at the bottleneck: "inter-request parallelism"
Critical Path
• Directed graph of tasks and dependencies
• response time = max { length of path }
• assumptions: no resource contention, no pipelining, ...
• Which tasks would you try to optimize here?
(Diagram: task graph from Start to End with tasks A (3ms), C (9ms), and B (1ms))
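Critical path analysis can be sketched as a longest-path computation over the task graph. This is a minimal illustration, not the course's analysis technique; the task durations mirror the slide's example (A=3ms, B=1ms, C=9ms), but the edge C-depends-on-A is an assumption, since the slide's exact graph is not recoverable.

```python
from graphlib import TopologicalSorter

durations = {"A": 3, "B": 1, "C": 9}          # ms per task (from the slide)
deps = {"A": set(), "B": set(), "C": {"A"}}   # C depends on A (assumed edge)

def critical_path_length(durations, deps):
    """Response time = length of the longest path, assuming
    no resource contention and no pipelining (as on the slide)."""
    finish = {}
    for task in TopologicalSorter(deps).static_order():
        start = max((finish[d] for d in deps[task]), default=0)
        finish[task] = start + durations[task]
    return max(finish.values())

print(critical_path_length(durations, deps))  # 12 ms: the path A -> C
```

Under these assumptions, optimizing B (1ms) is pointless; only tasks on the A→C path shorten the response time.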
Queueing Network Models
• Graph of resources and flow of requests
• The bottleneck resource defines the throughput of the whole system
• (analysis techniques described later in the course)
(Diagram: request flow from Start to End through Server A (3 req/ms), Server B (5 req/ms), and Server C (20 req/ms))
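The bottleneck rule above can be shown in a few lines: when every request must pass through each resource, the slowest service rate caps system throughput. The rates are from the slide; the serial topology is an assumption.

```python
# Service rates from the slide; assuming requests flow through all
# three servers in series (the slide's topology is not fully recoverable).
rates = {"Server A": 3, "Server B": 5, "Server C": 20}  # req/ms

bottleneck = min(rates, key=rates.get)   # slowest resource
system_tput = rates[bottleneck]          # caps the whole system

print(bottleneck, system_tput)  # Server A 3
```

Adding capacity anywhere except Server A would not change the system's throughput, which is why the strategy is "replicate resources at the bottleneck".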
Forms of Parallelism
• Inter-request parallelism
  • several requests handled at the same time
  • principle: replicate resources
  • e.g., ATMs
• (Independent) intra-request parallelism
  • principle: divide & conquer
  • e.g., print pieces of a document on several printers
• Pipelining
  • each "item" is processed by several resources
  • process "items" at different resources in parallel
  • can lead to both inter- & intra-request parallelism
Inter-request Parallelism
(Diagram: requests 1–3 served concurrently by replicated resources, each producing its own response)
Intra-request Parallelism
(Diagram: request 1 is split into sub-requests 1.1–1.3, processed in parallel; results 1.1–1.3 are merged into response 1)
Pipelining (Intra-request)
(Diagram: request 1 is split into pieces that flow through a pipeline of resources and are merged into response 1)
Speed-up
• Metric for intra-request parallelization
• Goal: test the ability of the SUT to reduce response time
  • measure response time with 1 resource
  • measure response time with N resources
  • SpeedUp(N) = RT(1) / RT(N)
• Ideal
  • SpeedUp(N) is a linear function
  • Can you imagine super-linear speed-ups?
Speed Up
(Chart: speed-up vs. number of servers)
Scale-up
• Test how the SUT scales with the size of the problem
  • measure response time with 1 server, unit problem
  • measure response time with N servers, N-unit problem
  • ScaleUp(N) = RT(1) / RT(N)
• Ideal
  • ScaleUp(N) is a constant function
  • Can you imagine super scale-up?
Scale-Up Experiment: Response Time
(Chart: response time in msecs vs. number of servers)
Scale Out
• Test how the SUT behaves with increasing load
  • measure throughput: 1 server, 1 user
  • measure throughput: N servers, N users
  • ScaleOut(N) = Tput(N) / Tput(1)
• Ideal
  • ScaleOut(N) is a linear function
  • scale-out should behave like scale-up: per-user performance stays constant when servers grow with the load
  • (often the terms are used interchangeably, but it is worthwhile to notice the differences)
• Scale-out and scale-down in cloud computing
  • the ability of a system to adapt to changes in load
  • often measured in $ (or at least involving cost)
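The three scalability metrics can be sketched side by side. All measurement numbers below are invented for illustration; note that scale-out is written here as Tput(N)/Tput(1), so that the ideal is linear growth, mirroring speed-up.

```python
def speed_up(rt, n):
    """Fixed problem, N resources; ideal: linear in N."""
    return rt[1] / rt[n]

def scale_up(rt, n):
    """N resources, N-unit problem; ideal: constant 1."""
    return rt[1] / rt[n]

def scale_out(tput, n):
    """N servers, N users; written as Tput(N)/Tput(1) so ideal is linear."""
    return tput[n] / tput[1]

# Made-up measurements (ms and req/s) for an 8-server experiment:
rt_speedup = {1: 80.0, 8: 14.0}     # same problem on 1 vs. 8 servers
rt_scaleup = {1: 80.0, 8: 95.0}     # problem grows with the servers
tput_scale = {1: 120.0, 8: 800.0}   # users grow with the servers

print(round(speed_up(rt_speedup, 8), 2))   # 5.71 (ideal would be 8)
print(round(scale_up(rt_scaleup, 8), 2))   # 0.84 (ideal would be 1)
print(round(scale_out(tput_scale, 8), 2))  # 6.67 (ideal would be 8)
```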
Why is speed-up sub-linear?
(Diagram: the split/merge structure of intra-request parallelism, repeated from the earlier slide)
Why is speed-up sub-linear?
• Cost of the "split" and "merge" operations
  • those can be expensive operations
  • try to parallelize them, too
• Interference: servers need to synchronize
  • e.g., CPUs access data from the same disk at the same time
  • remedy: shared-nothing architecture
• Skew: work is not "split" into equal-sized chunks
  • e.g., some pieces are much bigger than others
  • remedy: keep statistics and plan better
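The first cause can be made concrete with a small model (an illustration I am adding, not from the slides): if the split/merge work is serial, it is not reduced by adding servers, so speed-up flattens out instead of growing linearly.

```python
def modeled_speedup(n, work_ms=90.0, split_merge_ms=10.0):
    """Speed-up when the parallelizable work divides perfectly by n
    but split+merge stays serial (made-up 90ms/10ms breakdown)."""
    rt1 = split_merge_ms + work_ms        # 1 server
    rtn = split_merge_ms + work_ms / n    # N servers, perfect split
    return rt1 / rtn

for n in (1, 2, 4, 8, 64):
    print(n, round(modeled_speedup(n), 2))
# Speed-up approaches (10+90)/10 = 10 as N grows, and never reaches N.
```

This is the same reasoning as Amdahl's law: the serial fraction bounds the achievable speed-up, before interference and skew make things worse still.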
Summary
• Improve response times by "partitioning"
  • divide & conquer approach
  • works well in many systems
• Improve throughput by relaxing the "bottleneck"
  • add resources at the bottleneck
• Fundamental limitations to scalability
  • resource contention (e.g., lock conflicts in a DB)
  • skew and poor load balancing
• Special kinds of experiments for scalability
  • speed-up and scale-up experiments
Metrics and Workloads
• Defining more terms
  • workload
  • parameters
  • ...
• Example benchmarks
  • TPC-H, etc.
• Learn more metrics and traps
Ingredients of an Experiment (rev.)
• System(s) Under Test
  • the (real) systems we would like to explore
• Workload(s) = user model
  • typical behavior of users / clients of the system
• Parameters
  • the "it depends" part of the answer to a performance question
  • system parameters vs. workload parameters
• Test database(s)
  • for database workloads
• Metrics
  • defining what "better" means: speed, cost, availability, ...
System Under Test
• Characterized by its API (services)
  • set of functions with parameters and result types
• Characterized by a set of parameters
  • hardware characteristics, e.g., network bandwidth, number of cores, ...
  • software characteristics, e.g., consistency level for a database system
• Observable outcomes
  • dropped requests, latency, system utilization, ...
  • (results of requests / API calls)
Workload
• A sequence of requests (i.e., API/service calls)
  • including parameter settings of calls
  • possibly, correlation between requests (e.g., sessions)
  • possibly, requests from different geographic locations
• Workload generators
  • simulate a client which issues a sequence of requests
  • specify a "think time" or arrival rate of requests
  • specify a distribution for the parameter settings of requests
• Open vs. closed system
  • number of "active" requests is constant or bounded
  • closed system = fixed #clients, each client with 0 or 1 pending requests
  • warning: people often model a closed system without knowing it!
Closed System
• Load comes from a limited set of clients
• Clients wait for a response before sending the next request
• Load is self-adjusting
• System tends to stability
• Example: database with local clients
Open System
• Load comes from a potentially unlimited set of clients
• Load is not limited by clients waiting
• Load is not self-adjusting (load keeps coming even if the SUT stops)
• Tests the system's stability
• Example: web server
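The closed-system behavior above can be sketched as a tiny simulation: a fixed client population, each client with at most one pending request and a think time between requests. All parameters (4 clients, 50ms think time, 10ms service time, single FIFO server) are invented for illustration.

```python
import heapq
import random

def simulate_closed(n_clients=4, think_ms=50.0, service_ms=10.0,
                    horizon_ms=100_000.0, seed=0):
    """Closed system: each client thinks, issues one request, waits
    for the response, then repeats. Returns throughput in req/s."""
    rng = random.Random(seed)
    server_free_at = 0.0
    completed = 0
    # event = (time a client issues its next request, client id)
    events = [(rng.expovariate(1.0 / think_ms), c) for c in range(n_clients)]
    heapq.heapify(events)
    while events:
        t, c = heapq.heappop(events)
        if t >= horizon_ms:
            break
        start = max(t, server_free_at)        # queue if server is busy
        server_free_at = start + service_ms   # fixed service time
        completed += 1
        # client receives its response, thinks, then issues the next request
        heapq.heappush(events,
                       (server_free_at + rng.expovariate(1.0 / think_ms), c))
    return completed / (horizon_ms / 1000.0)

print(round(simulate_closed(), 1))  # load is capped by the 4 clients
```

Because each client has at most one outstanding request, the offered load adjusts itself to the server's speed; in an open system the arrival process would keep generating requests regardless, which is exactly why it stresses stability.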
Parameters
• Many system and workload parameters
  • e.g., size of cache, locality of requests, ...
• Challenge is to find the ones that matter
  • understanding the system + common sense
• Compute the standard deviation of the metric(s) when varying a parameter
  • if low, the parameter is not significant
  • if high, the parameter is significant
• Important are parameters which generate "cross-over points" between System A and System B when varied
• Careful about correlations: vary combinations of parameters
Test Database
• Many systems involve "state"
  • behavior depends on the state of the database
  • e.g., long response times for big databases
• The database is a "workload parameter"
  • but a very complex one
  • and with complex implications
• Critical decisions
  • distribution of values in the database
  • size of the database (performance when generating the DB)
• Ref.: J. Gray et al.: SIGMOD 1994.
Popular Distributions
• Uniform
  • choose a range of values
  • each value of the range is chosen with the same probability
• Zipf (self-similarity)
  • frequency of a value is inversely proportional to its rank
  • F(V[1]) ~ 2 x F(V[2]) ~ 4x F(V[4]) ...
  • skew can be controlled by a parameter z
  • default: z=1; uniform: z=0; high z corresponds to high skew
• Independent vs. correlated
  • in reality, the values of 2 (or more) dimensions are correlated
  • e.g., people who are good in math are good in physics
  • e.g., a car which is good in speed is bad in price
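A Zipf generator for test-database values can be sketched as follows: the probability of the value with rank r is proportional to 1/r^z, so z=0 gives uniform and z=1 the classic Zipf of the slide. The value range and sample size are arbitrary.

```python
import random

def zipf_sampler(n_values, z=1.0, seed=0):
    """Return a function sampling ranks 1..n_values with P(r) ~ 1/r**z."""
    rng = random.Random(seed)
    values = list(range(1, n_values + 1))          # value = its rank here
    weights = [1.0 / (rank ** z) for rank in values]
    def sample(k):
        return rng.choices(values, weights=weights, k=k)
    return sample

sample = zipf_sampler(1000, z=1.0)
data = sample(100_000)
# With z=1, the rank-1 value should occur roughly twice as often as rank-2,
# matching F(V[1]) ~ 2 x F(V[2]) on the slide.
print(round(data.count(1) / data.count(2), 1))
```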
Multi-dimensional Distributions
(Figure: independent, correlated, and anti-correlated distributions of two dimensions)
Ref.: Börszönyi et al.: "The Skyline Operator", ICDE 2001.
Metrics
• Performance; e.g.,
  • throughput (successful requests per second)
  • bandwidth (bits per second)
  • latency / response time
• Cost; e.g.,
  • cost per request
  • investment
  • fixed cost
• Availability; e.g.,
  • yearly downtime of a single client vs. the whole system
  • % dropped requests (or packets)
Metrics
• How to aggregate millions of measurements?
  • classic: median + standard deviation
  • Why is the median better than the average?
  • Why is the standard deviation so important?
• Percentiles (quantiles)
  • V is the Xth percentile if X% of measurements are < V
  • max ~ 100th percentile; min ~ 0th percentile
  • median ~ 50th percentile
  • percentiles are a good fit for Service Level Agreements
• Mode: most frequent (most probable) value
  • When is the mode the best metric? (Give an example.)
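These aggregates can be illustrated on a made-up latency sample with one outlier: the median is robust where the average is dragged up, and a high percentile exposes the tail that the median hides.

```python
import statistics

latencies_ms = [12, 11, 13, 12, 14, 11, 12, 250, 13, 12]  # one outlier

def percentile(data, p):
    """Nearest-rank percentile: value below which ~p% of measurements fall."""
    s = sorted(data)
    idx = max(0, min(len(s) - 1, int(round(p / 100.0 * len(s))) - 1))
    return s[idx]

print(statistics.median(latencies_ms))  # 12.0: robust to the outlier
print(statistics.mean(latencies_ms))    # 36.0: dragged up by one request
print(percentile(latencies_ms, 99))     # 250: the tail an SLA would catch
```

This is one answer to the slide's question: the median is better than the average because a single slow request distorts the mean, while percentiles make tail latency explicit for Service Level Agreements.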
Amazon Example (~2004)
• Amazon lost about 1% of shopping baskets
  • acceptable, because the incremental cost of IT infrastructure to secure all shopping baskets was much higher than 1% of the revenue
• One day, somebody discovered that they lost the *largest* 1% of the shopping baskets
  • not okay, because those are the premium customers, and they never come back
  • resulting in much more than 1% of the revenue
• Be careful with correlations within results!!!
Where does all this come from?
• Real workloads
  • use traces from an existing (production) system
  • use real databases from the production system
• Synthetic workloads
  • use a standard benchmark
  • invent something yourself
• Tradeoffs
  • a real workload is always relevant
  • a synthetic workload is good to study "corner cases"
  • and makes it possible to vary all "workload parameters"
• If possible, use both!
Benchmarks
• Specify the whole experiment except the SUT
  • sometimes specify settings of "system parameters"
  • e.g., configure the DBMS to run at isolation level 3
• Designed for "is System A better than B?" questions
  • report only one or two numbers as metrics
  • use a complex formula to compute these numbers
  • zero or one workload parameters only
  • standardization and notaries to publish results
• Misused by research and industry
  • implement only a subset
  • invent new metrics and workload parameters
  • violation of "system parameter" settings and fine print
Benchmarks: Good, bad, and ugly
• Good
  • help define a field: give engineers a goal
  • great for marketing and sales people
  • even if misused, a great tool for research and teaching
• Bad
  • benchmark wars are not productive
  • misleading results – huge damage if irrelevant
• Ugly
  • expensive to be compliant (legal fine print)
  • irreproducible results due to complex configurations
  • vendors have complex license agreements (DeWitt clause)
  • a single-number result favors the "elephants"
  • difficult to demonstrate advantages in a "niche"
Benchmarks
• Conjecture: "Benchmarks are a series of tests in order to obtain prearranged results not available on competitive systems." (S. Kelly-Bootle)
• Corollary: "I only trust statistics that I have invented myself." (folklore)
Example Benchmarks
• CPU
  • e.g., "g++", Ackermann, SPECint
• Databases (www.tpc.org)
  • e.g., TPC-C, TPC-E, TPC-H, TPC-W, ...
• Parallel systems
  • NAS Parallel Benchmark, Splash-2
• Other
  • e.g., CloudStone, LinearRoad
• Microbenchmarks
  • e.g., LMBench
SPECint
• Goal: study the CPU speed of different hardware
• SPEC = Standard Performance Evaluation Corporation
  • www.spec.org
• Long history of CPU benchmarks
  • first version: CPU92
  • current version: SPECint2006
• SPECint2006 involves 12 tests (all in C/C++)
  • perlbench, gcc, bzip2, ..., xalancbmk
• Metrics
  • compare running time to a "reference machine"
  • e.g., 2000 secs vs. 8000 secs for gcc gives a score of 4
  • overall score = geometric mean of all 12 scores
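The scoring scheme above is easy to sketch: each test's score is the reference time divided by the measured time, and the overall result is the geometric mean. The gcc numbers are the slide's example; the other two tests and times are invented for illustration.

```python
import math

# Reference-machine times vs. measured times in seconds.
# gcc follows the slide (8000 s reference, 2000 s measured -> score 4);
# the bzip2 and perlbench numbers are made up.
reference = {"gcc": 8000.0, "bzip2": 6000.0, "perlbench": 9000.0}
measured  = {"gcc": 2000.0, "bzip2": 3000.0, "perlbench": 4500.0}

scores = {t: reference[t] / measured[t] for t in reference}
overall = math.prod(scores.values()) ** (1.0 / len(scores))

print(scores["gcc"])        # 4.0, as in the slide's example
print(round(overall, 2))    # 2.52: geometric mean of 4, 2, 2
```

The geometric mean is used instead of the arithmetic mean so that no single test dominates: doubling performance on any one test multiplies the overall score by the same factor.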
SPECint Results
• Visit: http://www.spec.org/cpu2006/results/cint2006.html
TPC-H Benchmark
• Goal: evaluate DBMS + hardware for OLAP
  • find the "fastest" system for a given DB size
  • find the best "speed / $" system for a given size
  • see to which DB sizes the systems scale
• TPC-H models a company
  • orders, lineitems, customers, products, regions, ...
• TPC-H specifies the following components
  • dbgen: DB generator with different scaling factors
  • the scaling factor of the DB is the only workload parameter
  • a mix of 22 queries and 2 update functions
  • execution instructions and metrics
TPC-H Fine Print
• Physical design
  • e.g., you must not vertically partition the DB
  • (many results violate that, e.g., all column stores)
• Execution rules
  • specify exactly how to execute queries and updates
  • specify exactly which SQL variants are allowed
• Results
  • specify exactly how to compute metrics and how to publish results
• The specification is about 150 pages long (!)
TPC-H Results
• Visit: http://www.tpc.org/tpch/results/tpch_price_perf_results.asp
Microbenchmarks
• Goal: "understand the full behavior of a system"
  • not good for "System A vs. System B" decisions
  • good for component tests and unit tests
• Design principles
  • many small and simple experiments, many workload parameters
  • report all results (rather than one big number)
  • each experiment tests a different feature (service)
    • e.g., table scan, index scan, join for a DB
    • e.g., specific function calls, representative parameter settings
  • isolate this feature as much as possible
• Design requires knowledge of the internals of the SUT
• Designed for a specific study; the benchmark is not reusable
How to improve performance?
• Find the bottleneck
• Throw additional resources at the bottleneck
• Find the new bottleneck
• Throw additional resources at the bottleneck
• ...