This presentation focuses on the critical aspects of performance optimization in software architecture, specifically through the analysis of file server operations. It highlights the importance of selecting appropriate algorithms and architectures to enhance the speed of processing requests. By measuring and analyzing throughput, response times, and service demands, we uncover key insights into how to effectively manage server loads. The study emphasizes the need for a balanced configuration to improve service levels, accommodating a higher number of users while maintaining response time standards.
CSC407: Software Architecture, Summer 2006: Performance. Greg Wilson, BA 3230, gvwilson@cs.utoronto.ca
Introduction • Getting the right answer is important • Getting the right answer quickly is also important • If we didn’t care about speed, we’d do things by hand • Choosing the right algorithm is part of the battle • Choosing a good architecture is the other part • Only way to tell good from bad is to analyze and measure actual performance
Example: File Server • Dedicated server handing out PDF and ZIP files • One CPU • 4 disks: PDFs on #1 and #2, ZIPs on #3 and #4 • Have to know the question to get the right answer • How heavy a load can it handle? • Would it make more sense to spread all files across all disks?
We Call It Computer Science… • …because it’s experimental • Collect info on 1000 files downloaded in 200 sec
Summary Statistics • Analyze all 1000 downloads in a spreadsheet • Yes, computer scientists use spreadsheets… • We’re justified in treating each type of file as a single class
Modeling Requests • The concurrency level is the number of things of a particular class going on at once • Estimate by adding up total download time for PDF and ZIP files separately, and dividing by the actual elapsed time • NPDF = 731.5/200 = 3.7 • NZIP = 3207.7/200 = 16.1 • Round off: download ratio is 4:1
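To make the estimate concrete, here is a minimal Python sketch of the calculation, assuming per-download records of (file type, download time in seconds); the sample records are placeholders, not the measured data.

```python
# Minimal sketch of the concurrency-level estimate: total download time for a
# class divided by elapsed wall-clock time. Data below is illustrative only.
ELAPSED = 200.0  # observation period in seconds (from the measurements above)

downloads = [    # (file type, download time in seconds) -- placeholder records
    ("pdf", 0.9), ("zip", 3.4), ("pdf", 0.6), ("zip", 2.8), ("zip", 4.1),
]

def concurrency(records, file_type, elapsed):
    """Average number of simultaneous downloads of one class."""
    total = sum(t for kind, t in records if kind == file_type)
    return total / elapsed

print("N_PDF =", round(concurrency(downloads, "pdf", ELAPSED), 1))
print("N_ZIP =", round(concurrency(downloads, "zip", ELAPSED), 1))
```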
Measuring Service Demands • What load does each request put on the disk and CPU? • Create N files of various sizes: 10KB, 100KB, 200KB, …, 1GB • Put them on a single-CPU, single-disk machine • That's doing nothing else • Measure download times as a function of file size σ • TCPU = 0.1046σ − 0.0604 • Hm… (a negative intercept?) • Tdisk = 0.4078σ + 0.2919
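The slides do not show how these fits were produced; a plausible sketch is a least-squares line fit of time against file size. The sample measurements below are made up for illustration and merely stand in for the real timing data.

```python
# Sketch (assumed, not the course's actual script): estimate per-unit-size
# service demands by fitting download time as a linear function of file size.
import numpy as np

# Placeholder measurements; real values would come from timing the downloads.
sizes = np.array([0.01, 0.1, 0.2, 0.5, 1.0, 2.0, 5.0])            # file sizes
cpu_times = np.array([0.05, 0.06, 0.08, 0.11, 0.16, 0.27, 0.58])   # CPU seconds
disk_times = np.array([0.30, 0.33, 0.37, 0.50, 0.70, 1.11, 2.33])  # disk seconds

# Least-squares fit of time = slope * size + intercept for each resource.
cpu_slope, cpu_intercept = np.polyfit(sizes, cpu_times, 1)
disk_slope, disk_intercept = np.polyfit(sizes, disk_times, 1)
print(f"T_CPU  ~= {cpu_slope:.4f} * size + {cpu_intercept:+.4f}")
print(f"T_disk ~= {disk_slope:.4f} * size + {disk_intercept:+.4f}")
```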
Back To The Data • Use Mean Value Analysis to calculate service demands • Remember to divide disk requirements by 2
Observations • After ~20 users, the server saturates • Maximum throughput for PDF files: • 12 files/sec in original configuration • 5 files/sec in balanced configuration • Maximum throughput for ZIP files: • 4.2 files/sec in original configuration • 6.6 files/sec in balanced configuration
Service Level Agreements • SLA requires average download times of 20 sec (ZIP files) and 7 sec (PDF files) • Original configuration: ZIP threshold reached at approximately 100 users, when PDF download time still only ~3 sec • Balanced configuration: ZIP threshold reached at ~165 users, and PDF download time is 6.5 sec • Balanced configuration is strictly superior
How Did We Do That? • Key concern is quality of service (QoS) • Throughput: transactions/second, pages/second, etc. • Response time • And variation in response time • People would rather wait 10 minutes every day than 1 minute on nine days and 20 minutes on the tenth • Availability • 99.99% available = about 4.3 minutes lost every 30 days • That's not good enough for 911
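A quick check of the availability arithmetic:

```python
# 99.99% availability over a 30-day month, expressed as minutes of downtime.
minutes_per_30_days = 30 * 24 * 60                # 43,200 minutes
downtime = (1 - 0.9999) * minutes_per_30_days     # the unavailable fraction
print(f"{downtime:.1f} minutes of downtime per 30 days")   # ~4.3 minutes
```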
A Simple Database Server • Circles show resources • Boxes show queues • Throughput and response times depend on: • Service demand: how much time do requests need from resources? • System load: how many requests are arriving per second? • [Diagram: CPU and disk, each with a queue of waiting requests]
Classes of Model • An open class is specified by the rate at which requests arrive • Throughput is an input parameter • A closed class is specified by the size of the customer population • E.g., total number of queries to be processed, or total number of system users • Throughput is an output • Can also have load-dependent and load-independent resources, mixed models, etc.
Values We Can Measure • T: length of observation period • K: number of resources in the system • Bi: total busy time of resource i in observation period • Ai: number of request arrivals for resource i • A0 is total number of request arrivals for whole system • Ci: number of service completions for resource i • C0 is completions for whole system • In steady state for large T, Ai = Ci
Values We Can Calculate • Si: mean service time at resource i (Bi/Ci) • Ui: utilization of resource i (Bi/T) • Xi: throughput of resource i (Ci/T) • In steady state Ai = Ci, so the arrival rate Ai/T equals the throughput Xi; for the whole system this rate is the arrival rate λ • Vi: average visit count for resource i (Ci/C0)
Utilization Law • Utilization Ui = Bi/T = (Bi/Ci)·(Ci/T) • But Bi/Ci is Si, and Ci/T is the throughput Xi • So Ui = XiSi (= λSi in steady state) • I.e., utilization is throughput times service time, which makes sense
Service Demand Law • Service demand Di is the total average time required per request from resource i • Di = UiT/C0 • I.e., fraction of time busy, times total time, over number of requests • But UiT/C0 = Ui/(C0/T) = Ui/λ • I.e., service demand is utilization over throughput • Ui/X0 = (Bi/T)/(C0/T) = Bi/C0 = (Ci/C0)·(Bi/Ci) = ViSi • So service demand is average number of visits times mean service time per visit
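Putting the measurable and derived quantities together, here is a small sketch of the operational laws for a single resource; the numbers are placeholders rather than course data.

```python
# Operational-law quantities for one resource, from directly measurable values.
T = 200.0      # observation period (seconds)
B_i = 150.0    # busy time of resource i during T (placeholder)
C_i = 1000     # completions at resource i (placeholder)
C_0 = 250      # completions for the whole system (placeholder)

S_i = B_i / C_i        # mean service time per visit
U_i = B_i / T          # utilization
X_i = C_i / T          # resource throughput
X_0 = C_0 / T          # system throughput
V_i = C_i / C_0        # average visits per request
D_i = U_i / X_0        # Service Demand Law: utilization over system throughput...
assert abs(D_i - V_i * S_i) < 1e-9   # ...which equals visits times service time

print(f"U_i = {U_i:.2f}, S_i = {S_i:.3f} s, D_i = {D_i:.3f} s per request")
```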
Little’s Law • Average number of requests being processed at any time = throughput × average time each request stays in the system • So: • 0.5 requests per second (= throughput) • 10 second response time (= time each request stays in system) • There must be an average of 5 requests in the system, i.e., at least 5 servers if each handles one request at a time
Interactive Response Time Law • S clients accessing a database • Each client thinks for Z seconds between requests • Average database response time is R seconds • If M is the average number of clients thinking, and N is the average number of requests at the database, then S = M+N • Little’s Law applied to clients: M = λZ • Little’s Law applied to the database: N = λR • So M+N = S = λ(Z+R) • Or R = S/λ − Z
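A tiny worked example of the law, with assumed numbers:

```python
# Interactive Response Time Law: R = S / X - Z, where S = number of clients,
# X = system throughput, and Z = think time.
def response_time(clients, throughput, think_time):
    """Average response time implied by Little's Law for an interactive system."""
    return clients / throughput - think_time

# E.g., 50 clients, 4 requests/second overall, 10 seconds of think time each:
print(response_time(clients=50, throughput=4.0, think_time=10.0))  # -> 2.5 seconds
```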
The Weakest Link • X0 = Ui/Di ≤ 1/Di for all resources • So X0 ≤ 1/max{Di} • Remember Little's Law: N = R·X0 • I.e., number of concurrent transactions is response time times throughput • But R is at least the sum of the service demands • So N ≥ (ΣDi)·X0 • Or X0 ≤ N/ΣDi • So X0 ≤ min[1/max{Di}, N/ΣDi]
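These two bounds are easy to compute; a minimal sketch with assumed service demands:

```python
# Asymptotic upper bounds on system throughput from per-resource service demands.
def throughput_bound(demands, n_concurrent):
    """Upper bound on X0 given service demands Di (seconds per request)
    and the number of concurrent requests N."""
    bottleneck_bound = 1.0 / max(demands)            # X0 <= 1 / max{Di}
    population_bound = n_concurrent / sum(demands)   # X0 <= N / sum(Di)
    return min(bottleneck_bound, population_bound)

# E.g., demands of 0.08 s (CPU) and 0.20 s (disk), 3 concurrent requests:
print(throughput_bound([0.08, 0.20], 3))   # -> min(5.0, 10.7) = 5.0 requests/sec
```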
Amdahl's Law • Let: • t1 be a program’s runtime on one CPU • tp be its runtime on p CPUs • ß be the algorithm’s serial fraction • Then the speedup is sp = t1/tp = 1/(ß + (1 − ß)/p)
…Amdahl's Law • Example: • Want a speedup of 32 on a 64-processor machine • So 1 − ß must be at least 0.984, i.e., ß can be at most about 0.016 • I.e., about 98.4% of the code must run in parallel • Ouch • What if only half the code can run in parallel? • s64 is only about 1.97 • Ouch again
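A quick check of these numbers using the standard form of Amdahl's Law:

```python
# Amdahl's Law: speedup on p CPUs = 1 / (serial_fraction + (1 - serial_fraction) / p)
def amdahl_speedup(serial_fraction, p):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

print(amdahl_speedup(1 / 63, 64))   # ~32: needs ~98.4% of the code to run in parallel
print(amdahl_speedup(0.5, 64))      # ~1.97: half-serial code barely speeds up
```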
Hockney's Measures • Every pipeline has some startup latency • So characterize pipelines with two measures: • r∞ is the rate achieved on an infinitely long data stream • n1/2 is the data volume at which half that rate is achieved • Improve real-world performance by: • Increasing throughput • Decreasing latency
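In the standard form of Hockney's model (assumed here, since the slide only names the two parameters), processing n items takes roughly (n + n1/2)/r∞, so the achieved rate climbs toward r∞ as the data stream gets longer:

```python
# Achieved rate under Hockney's pipeline model: r(n) = r_inf * n / (n + n_half),
# which equals exactly r_inf / 2 when n == n_half.
def achieved_rate(n, r_inf, n_half):
    return r_inf * n / (n + n_half)

r_inf, n_half = 1000.0, 64       # items/second and items (placeholder values)
for n in (8, 64, 512, 4096):
    print(n, round(achieved_rate(n, r_inf, n_half), 1))
```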
Some Quotations • Philosophers have only interpreted the world in various ways; the point, however, is to change it. • Karl Marx • You cannot manage what you do not measure. • Bill Hewlett • Measure twice, tune once. • Greg Wilson
A Simple CGI • [Diagram: the request path browser → /var/apache/httpd → /local/bin/python → /site/cgi-bin/app.cgi → disk I/O and /usr/bin/psql, annotated with measured times of 5.1, 5.3, 3.3, 2.7, 1.8, 0.7, 0.2, and 0.3 seconds]
How Did I Get These Numbers? • Shut down everything else on the test machine • Use ps and truss on Unix • sysinternals.org has lots of tools to help you find things • Use a script instead of a browser • Insert timers in Python and recompile • Could wrap in a timing script, but that distorts things • Measure import times in my own script • Rely on PostgreSQL's built-in monitors • Use a profiler
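As an illustration of the "wrap it in a timing script" idea (not the course's actual tooling), one way is to time a whole command from the outside; the command shown is a trivial stand-in, and in practice it would be something like curl fetching the CGI URL.

```python
# Hedged sketch: time a command-line request end to end from outside the program.
import subprocess, sys, time

def timed_run(cmd):
    """Run a command and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start

# Demo with a trivial command; a real test might run (hypothetically):
#   ["curl", "-s", "-o", "/dev/null", "http://server/cgi-bin/app.cgi"]
print(f"{timed_run([sys.executable, '-c', 'print(6 * 7)']):.3f} s end to end")
```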
Profiling • A profiler is a tool that can build a histogram showing how much time a program spent where • Can either instrument or sample the program • Both affect the program's performance • The more information you collect, the more distortion there is • Heisenberg's Law • Most can accumulate data over many program runs • Often want to distinguish the first run(s) from later ones • Caching, precompilation, etc.
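For instance, Python's standard-library cProfile is an instrumenting profiler; a minimal example (assumed, not from the slides) that accumulates data over many runs:

```python
# Instrument a stand-in request handler with cProfile across many "requests".
import cProfile
import pstats

def handle_request(n=100_000):
    # Stand-in for the real request handler being profiled.
    return sum(i * i for i in range(n))

profiler = cProfile.Profile()
for _ in range(50):                 # accumulate data over many runs
    profiler.enable()
    handle_request()
    profiler.disable()

# Show the functions where the most cumulative time was spent.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```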
A Simple CGI Revisited • [Same diagram, annotated: the browser/network leg is marked "can't do much about this"; starting /local/bin/python is marked "fork/exec is expensive", with 0.6 s of import time; 0.9 s inside /site/cgi-bin/app.cgi is flagged "what's going on here?"; 1.8 s is spent "waiting our turn at the DB"; and /usr/bin/psql is annotated "how many transactions? are they one class?"]
Room for Improvement • Forking a new Python interpreter for each request is expensive • So keep an instance of Python running permanently beside the web server, and re-initialize it for each request • FCGI/SCGI • Tomcat is usually run this way • The ability to do this is one of the reasons VM-based languages won the server wars
…Room for Improvement • Reimporting the libraries is expensive, too • Rely on cached .pyc files • Or rewrite application around a request-handling loop • Modularity is your friend • Tightly-coupled components cannot be tuned independently • On the other hand, machine-independent code has machine-independent performance
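A hedged sketch of the request-handling-loop idea: pay for interpreter start-up and imports once, then serve many requests from the same process. This is illustrative only; a real deployment would sit behind FCGI/SCGI or a similar interface.

```python
# Illustrative request-handling loop: imports and initialization happen once.
import json          # imported at start-up, not once per request

def handle(request):
    # Stand-in for the real application logic (e.g., querying the database).
    return {"echo": request}

def serve(request_source):
    """Loop over incoming requests, reusing the already-initialized process."""
    for request in request_source:
        print(json.dumps(handle(request)))

serve(["first request", "second request", "third request"])
```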
After Our Changes • [Updated diagram: the /var/apache/httpd round trip is now 2.8 s (was 5.3 s); /local/bin/python accounts for 2.5 s, of which 1.9 s in /site/cgi-bin/app.cgi is marked "this has to be the next target"; the disk I/O and /usr/bin/psql times are unchanged]
When Do You Stop? • An optimization problem on its own • Time invested vs. likely performance improvements • Plan A: stop when you satisfy SLAs • Or beat them—always nice to have some slack • Plan B: stop when there are no obvious targets • Flat performance profiles are hard to improve • Plan C: stop when you run out of time • Plan D: stop when performance is "good enough"
Five Timescales • Human activities fall into natural cognitive categories: • Continuous • Sip of coffee • Fresh pot • Buy some more beans • Harvest time • Tuning a well-written application usually just improves its performance within its category • Revolutions happen when things are moved from one category to another