
Performance



Presentation Transcript


  1. Performance – Week 12, Lecture 1

  2. There are two general measures of performance • The time an individual takes to complete a task – RESPONSE TIME • The number of transactions the system can process in a given time period – THROUGHPUT
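
  The two measures are linked. As a back-of-envelope sketch (using Little's law from standard queueing theory – an addition, not from the slides): if N is the average number of transactions in the system, X the throughput and R the average response time, then

      N = X × R

  so a system sustaining a throughput of 20 transactions per second at an average response time of 0.5 seconds is holding, on average, 20 × 0.5 = 10 transactions in flight at any moment.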

  3. Some general observations • Performance is a design criterion from the start – determine the required response times and throughput before design starts • It is very difficult to estimate, but you still need to know what you are trying to achieve • The 80:20 rule applies: • 80% of the transactions use 20% of the components – as always, the trick is knowing which 20% • 20% of the transactions generate 80% of the work

  4. All aspects of the system affect performance • Technical experts concentrate on getting their own bit right, and tend to leave performance to the end or to someone else • They pass the parcel • Or demand more iron • Layering manages complexity but adds to processing time: every interface, every message constructed, passed and parsed, every database session initiated, costs time

  5. Performance monitoring is a continuous process • The DBA and Network Manager will have plenty of stats to manage their own areas • The application should collect stats on the areas defined in the response time requirements. This could mean: • Identifying the key transactions and queries – the 80% • For each, creating a statistical record recording transaction type, date, time, user ID and start and end times • Collection can affect times as well, so have a switch to turn it on and off • Analysing the stats in terms the user can understand – a report for each user perhaps
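
  A minimal sketch of such a statistical record as a SQL table, with the logging insert and a user-level report (table and column names are illustrative, not from the lecture):

      -- One row per monitored transaction
      CREATE TABLE perf_stats (
          tx_type    VARCHAR(30) NOT NULL,  -- key transaction or query name
          user_id    VARCHAR(20) NOT NULL,  -- who ran it
          start_time TIMESTAMP   NOT NULL,  -- date and time the transaction began
          end_time   TIMESTAMP   NOT NULL   -- when it completed
      );

      -- Written by the application when the stats switch is on
      INSERT INTO perf_stats (tx_type, user_id, start_time, end_time)
      VALUES ('ENROL_STUDENT', 'jsmith', '2002-05-20 10:15:02', '2002-05-20 10:15:03');

      -- Report: average response time per transaction type
      -- (timestamp arithmetic varies between DBMSs)
      SELECT tx_type, AVG(end_time - start_time) AS avg_response
      FROM perf_stats
      GROUP BY tx_type;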

  6. All areas of the system affect performance • User Interface Design • System design • Programming • System architecture • Database implementation • Operating system, Middleware and Server hardware • Network

  7. User Interface Design • The primary objective is making it easy to use and to learn • But efficiency is integral to this • Some points to consider: • Multiple forms take resources and time for the user to re-orientate • Keep the number of processing steps to a minimum • The system should know the user and complete as much of the form as possible, and then allow change

  8. Allow the user to store and reload parameters that are commonly used. Many users have a small number of queries they commonly make, or generate transactions within a specific area • Allow users quick access to functions they commonly use • When the system is checking a completed form, check all parameters and report all mistakes • Minimise the number of “pretties” on a form – they take resources, distract the user and don’t help the task at hand. Keep the form clean

  9. System design • Keep it simple! When you are reviewing your dataflow diagrams, if it looks complex to you or there is a large number of layers, it is probably wrong • Walk it through with someone you respect – not necessarily someone who knows the area – explaining will show up problem areas • Walk it through with the users – prototype or storyboard • $1 spent in design is worth $10 in development or $100 after implementation

  10. The time of day is important • Most users work 9:00 to 5:00 • Transaction peaks are often between 10:00 and 12:00 and between 2:30 and 4:30 • A system serving multiple time zones has a smoothing effect on this factor • Peaks also occur at week or month end • These have to be designed for

  11. Programming • Design issues apply here as well – including the 80:20 rule, and you will not be able to tell which 20% • Most programmers test on small amounts of data and don’t see the inefficient areas of their code • With large databases, poorly written SQL is a common problem. If you are new to this, talk to the DBA. Read the manual sections that apply. Use stored procedures & triggers
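
  A small illustration of the kind of poorly written SQL meant here, using the Student table from later slides (the surname column is hypothetical). Wrapping an indexed column in a function usually stops the optimiser from using a plain B-tree index:

      -- Poor: the function applied to surname defeats the index,
      -- forcing a full scan of Student
      SELECT SID, name FROM Student WHERE UPPER(surname) = 'SMITH';

      -- Better: store the data in a known case (or index an upper-cased
      -- copy of the column) so the index can be used directly
      SELECT SID, name FROM Student WHERE surname = 'Smith';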

  12. System architecture • If multiple servers are even a remote possibility, then the system has to be architected for this • Do all of your estimates over the life of the system and double or treble them • Know the hardware and software platform that the system is to use. Know where the bottlenecks are when the platform scales • Design in middleware if multiple servers are possible • Design components that are logical and do a unit of work • Leave compute-intensive work to the end of the day if time is not of the essence

  13. If the application is to be purchased from a major supplier, they probably have designed it for large volumes • If the supplier is a niche player or the market is small, the system may not be designed for scalability – but they may not tell you • Before any application is purchased from any supplier, performance & scalability must be evaluated and resolved. Visit existing users and get the specifics; don’t rely on the sales rep – he or she may fudge it. Make sure they use the target platform

  14. Database • The development team develop the ER diagrams and then the conceptual & logical schemas. The DBA usually takes over from there and completes the physical schema • Given that the normalisation process is well known, the database tables should be structured correctly

  15. Responsibility of people with different skills • Application – Business Analyst, Programmer • DBMS query processor and indexes – Experienced Programmer • DBMS concurrency control and recovery – Database Administrator • Operating system (file system) – Database Administrator • Hardware (disk etc.) – Database Administrator

  16. Five basic principles of tuning databases • Think globally, fix locally • Partitioning breaks bottlenecks • Start-up costs are high, running costs are low • Render unto server what is due unto server • Be prepared for trade-offs

  17. What are indexes? • Like the index in a book, they are structures that point to records and can be searched much more quickly than scanning the whole table • They are usually B-trees • The key can be one attribute or made up of multiple attributes • There is usually one or more indexes on each table

  18. If we used the following SQL:

      SELECT name
      FROM Student
      WHERE SID = 9912345

  then, if we had an index on SID, the query processor would use the B-tree index to find the record quickly. Otherwise the Student table would have to be scanned to find the record with SID = 9912345.
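  The index itself would be created with something like the statement below (if SID were the primary key, most DBMSs would build such an index automatically; shown only for illustration):

      CREATE INDEX idx_student_sid ON Student (SID);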

  19. Indexes & usage plans • Index definition is not an intuitive process • Indexes made up of more than one attribute need care to design • More indexes are not the answer – there is more to update, and queries are not necessarily faster • Study query optimisation plans for common queries and test alternatives • Different DBMSs will produce different optimisation plans
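
  A sketch of studying an optimisation plan – the command varies by DBMS (EXPLAIN in PostgreSQL and MySQL, EXPLAIN PLAN in Oracle, SHOWPLAN options in SQL Server), and the composite-index columns here are hypothetical:

      -- Ask the optimiser how it intends to execute a common query
      EXPLAIN
      SELECT name FROM Student WHERE SID = 9912345;

      -- A multi-attribute index only helps queries that filter on its
      -- leading column(s), so column order needs care
      CREATE INDEX idx_student_course ON Student (course_code, enrol_year);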

  20. Databases can be tuned for update or for query • If the indexes required for query optimisation affect the update times, consider: • Duplicating some tables, or perhaps the whole database, and tuning them differently • Having minimal indexes on the update tables, and more indexes on the query tables • Updating the query tables on a trigger or subsequent event • Flattening or de-normalising the tables in the query database to optimise execution of common queries • I am a believer in de-normalising and carrying some redundancy if that improves performance
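
  A sketch of updating a query table on a trigger (MySQL-style trigger syntax; the table and column names are hypothetical):

      -- Flattened, de-normalised copy of enrolment data, tuned for queries
      CREATE TABLE enrolment_query (
          SID         INTEGER,
          name        VARCHAR(50),   -- redundant copy carried from Student
          course_code VARCHAR(10),
          enrol_date  DATE
      );

      -- Keep the query table in step with the update tables
      CREATE TRIGGER trg_enrolment_flat
      AFTER INSERT ON Enrolment
      FOR EACH ROW
          INSERT INTO enrolment_query (SID, name, course_code, enrol_date)
          SELECT NEW.SID, s.name, NEW.course_code, NEW.enrol_date
          FROM Student s
          WHERE s.SID = NEW.SID;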

  21. Other database issues • Number and size of memory pools given to databases and processes • Consider the order in which data should be held in a table – it affects update and query times • Avoid records with very high hit rates (“hot spots”) – a major cause of locking • Give programmers a table locking plan

  22. Ensure tables that are often used together, and log files, are on different devices to avoid head thrash (there are also facilities in DBMSs where related records can be held in the same or contiguous pages as another way of overcoming head thrash) • Periodically cause the physical re-organisation of data in the database to ensure free space and to ensure records are held in the correct sequence in contiguous pages
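
  A sketch of separating tables and indexes onto different devices (PostgreSQL-style tablespaces; the paths and names are illustrative – log file placement is usually a DBMS configuration setting rather than SQL):

      -- Map logical storage areas onto different physical devices
      CREATE TABLESPACE data_disk  LOCATION '/disk1/pgdata';
      CREATE TABLESPACE index_disk LOCATION '/disk2/pgdata';

      -- Put a heavily used table and its index on different spindles
      CREATE TABLE Enrolment (
          SID         INTEGER,
          course_code VARCHAR(10)
      ) TABLESPACE data_disk;

      CREATE INDEX idx_enrolment_sid ON Enrolment (SID) TABLESPACE index_disk;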

  23. Platform evaluation • The platform consists of: • Operating systems • Middleware (MOM, TP Monitors, Distributed Component services) • Server computers • Usually best evaluated as a unit • Sometimes all or some of the suppliers of these elements are organisation standards • But the precise platform still needs to be specified and evaluated for suitability for the application

  24. How do we specify and evaluate the platform? • Again, with difficulty • Because of the overall complexity of the relationships between these platform components, benchmarking is usually the approach • That is, how many transactions for this application can the recommended platform process per hour – i.e. what is its throughput?

  25. Benchmarks are not easy • At the time the benchmark needs to be done, the application code is usually not written, so we can’t benchmark the actual application • Setting up quantities of benchmark data that match the structure of the new database is a difficult and time-consuming task • An alternative is to use TPC benchmarks

  26. What are TPC benchmarks? • The Transaction Processing Performance Council is an independent organisation that prepares and audits benchmarks of combinations of operating system, DBMS and server, and publishes those benchmarks in a comparative form • It has been functioning for 10+ years • It specifies a number of benchmarks, related as far as possible to real-world situations • It monitors and audits tests by manufacturers to ensure all conditions are met and the results are comparable • Website is www.tpc.org

  27. Four current benchmarks in use • TPC-C, an order entry transaction processing benchmark with a primary metric of transactions processed per minute – tpmC • TPC-H, a decision support benchmark with a primary metric of queries per hour – QphH • TPC-R, similar to H but allowing greater optimisation based on advance knowledge of the queries – QphR • TPC-W, a transactional web benchmark with a primary metric of Web Interactions Per Second – WIPS

  28. TPC-C • TPC-C simulates an order entry environment • Involves a mix of five transaction types of different complexity • Multiple on-line terminal sessions • Moderate system and application execution time • Significant disk input/output • Transaction integrity (ACID properties) • Non-uniform distribution of data access through primary and secondary keys • Databases consisting of many tables with a wide variety of sizes, attributes, and relationships • Contention on data access and update

  29. TPC-H and TPC-R • Illustrate decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions • Use variable-sized databases • Test queries submitted sequentially and concurrently

  30. TPC-W • A web benchmark simulating a business-oriented transactional web server • Multiple on-line browser sessions • Dynamic page generation with database access and update • The simultaneous execution of multiple transaction types that span a breadth of complexity, with contention on data access and update • On-line transaction execution modes • Databases consisting of many tables with a wide variety of sizes, attributes, and relationships • Transaction integrity (ACID properties)

  31. What’s wrong with the TPC tests? • The main criticism is that they do not match the actual application • But they do: • Provide cross-platform comparison with different combinations of system software and hardware, in a completely objective manner • Because the test application is well documented, it allows some comparison with the subject application • Measure the complete system (with the exception of the network), rather than relying on measures of processor speed such as MIPS and SPEC rates • Provide a very low-cost method of benchmarking

  32. Example – Table 1:

      System                    SPEC rate   tpmC      $/tpmC
      SPARCserver 1000 8-way    10,113      1,079.4   $1,032
      HP 9000 H70               3,757       1,290.9   $961

  • The SPEC rate would suggest the SPARCserver is 2.7 times better (10,113 / 3,757 ≈ 2.7) • The tpmC rate suggests the HP has the higher throughput, at a lower cost per tpmC

  33. Issues with some benchmarks • Suppliers can configure their machines to achieve optimal performance • Many machines use about half the processors their design permits • This suggests that, because of the SMP bottleneck on memory buses, the best performance is achieved at around half the maximum configuration

  34. End-to-end transmission rate is affected by two factors • The bandwidth on each link • The latency at each switch • Of these, latency is becoming the more important • But how can we estimate end-to-end transmission time as part of the estimate for response time?
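
  As a rough model (the notation is an addition, not from the lecture): for a message of size S crossing n links with bandwidths B1..Bn through n-1 switches with latencies L1..Ln-1,

      T(end-to-end) ≈ (S/B1 + ... + S/Bn) + (L1 + ... + Ln-1)

  As bandwidths grow, the fixed per-switch latencies come to dominate the total, which is why latency is becoming the more important factor.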

  35. Determine whether it matters • Isolate the network element and express it as an overhead on transactions executed on the LAN • Carry out tests locally and then across the links that are expected to be a problem • On an Australian frame relay network with relatively low-bandwidth links, we had a network overhead of between 20% and 50% • If the response time is relatively low anyway, say 1-2 seconds, then this overhead may not be noticeable in many systems
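
  A worked example of treating the network element as an overhead on the LAN figure: with a 1-second LAN response time and the 20%-50% overhead measured above,

      T(WAN) = T(LAN) × (1 + overhead) = 1.0 s × 1.5 = 1.5 s   (worst case)

  which is why a response time of 1-2 seconds locally often remains acceptable across the wide-area links.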

  36. Check other applications using the same network – you may not be the first • Talk to the Network Administrator or supplier, and carry out tests that will indicate any links that may be a problem • Simulate the application as closely as possible and test the network • If the network is complex and the load high, then full simulation may be necessary – specialist organisations have test laboratories that can test for sensitivity under load – this can be expensive

  37. Network monitoring has to be continuous • We know that many factors not within our control are going to affect our end-to-end rates • So continuous monitoring is necessary, both by the Network Administrator in general and as part of our application’s response time monitoring • Planning and managing networks is a complex and specialised job, so use the experts, but know enough to understand the issues and to discuss solutions

  38. Conclusion • Accurately estimating response time is difficult, but you still have to plan to achieve a given response time and throughput rate • Clearly defined requirements are absolutely necessary • Be very specific about what the response time refers to • Ensure all technical experts know what is expected • Over-provide for each key element – you will need it • Simulate or test any area that is expected to be a problem • Have fall-back plans
