
Week 11


Presentation Transcript


  1. Week 11 Improving Database Performance

  2. Improving Database Performance So far, we have looked at many aspects of designing, creating, populating and querying a database. We have (briefly) explored ‘optimisation’, which is used to ensure that query execution time is minimised. In this lecture we are going to look at some techniques which are used to improve performance and availability. WHY? Because databases are required to be available, in many installations and applications, 24 hours a day, 7 days a week, 52 weeks every year - think of the ‘user’ demands in e-business

  3. Improving Database Performance There are many ‘solutions’ - including: parallel processors, faster processors, higher-speed communications, more memory, faster disks, more disk units on line, higher-capacity disks … any others?

  4. Improving Database Performance We are going to look at a technique called ‘clustering’ - an architecture for improving ‘power’ and availability. What are the ‘dangers’ to non-stop availability? Try these: • System outages (planned): maintenance, tuning • System outages (unplanned): hardware failure, bugs, virus attacks

  5. Improving Database Performance E-business is not the only focus. Businesses are tending to be ‘global organisations’ - remember one of the early lectures? So what is one of the ‘solutions’? In a single word - clustering. Clustering is based on the premise that multiple processors can provide better, faster and more reliable processing than a single computer

  6. Improving Database Performance However, as in most ‘simple’ solutions in Information Technology, the problem is in the details. How can clustering be achieved? Which technologies and architectures offer the best approach to clustering? • and, what is the measure, or metric, of ‘best’?

  7. Improving Database Performance What are some of the advantages of clustering? • Improved availability of services • Scalability. Clustering involves multiple, independent computing systems which work together as one. When one of the independent systems fails, the cluster software can distribute work from the failing system to the remaining systems in the cluster

  8. Improving Database Performance ‘Users’ normally would not notice the difference • They interact with a cluster as if it were a single server - and, importantly, the resources they require will still be available. Clustering can provide high levels of availability

  9. Improving Database Performance What about ‘scalability’? Loads will (sometimes) exceed the capabilities of the systems which make up the cluster. Additional facilities can be incrementally added to increase the cluster’s computing power and ensure processing requirements are met. As transaction and processing loads become established, the cluster (or parts of it) can be increased in size or number

  10. Improving Database Performance Clustering is NOT a ‘new’ concept. A company named DEC introduced clustering for its VMS systems in the early 1980s - about 20 years ago. Which firms offer clustering packages now? IBM, Microsoft and Sun Microsystems

  11. Improving Database Performance What are the different types of clustering? There are 2 architectures: • Shared nothing and • Shared disk. In a shared-nothing architecture, each system has its own private memory and one or more disks, and each server in the cluster has its own independent subset of the data which it can work on independently, without meeting resource contention from other servers

  12. Improving Database Performance This might explain better: [Diagram: a shared-nothing architecture - CPU 1, CPU 2 … CPU n, each with its own memory (Memory 1, Memory 2 … Memory n) and disks, joined by an interconnection network]

  13. Improving Database Performance As you saw on the previous overhead, in a shared-nothing environment each system has its own ‘private memory’ and one or more disks, and each server in the cluster has its own independent subset of the data which it can work on without meeting resource conflicts from other servers. The clustered processors communicate by passing messages through a network which interconnects the computers

  14. Improving Database Performance Client requests are automatically directed to the system which owns the particular resource. Only one of the clustered systems can ‘own’ and access a particular resource at a time. When a failure occurs, resource ownership can be dynamically transferred to another system in the cluster. Theoretically, a shared-nothing multiprocessor could scale up to thousands of processors - the processors don’t interfere with one another because no resources are shared
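
One way to picture ‘each server owning its own independent subset of the data’ is ordinary hash partitioning. The sketch below is Oracle-style SQL with an invented table, columns and partition count; in a real shared-nothing cluster the data distribution is handled by the cluster software, not by a single CREATE TABLE:

    -- Hypothetical example: spread rows across four partitions by hashing the key,
    -- so each partition (and, conceptually, each node) works on its own subset.
    CREATE TABLE customer_accounts (
        account_id   NUMBER        PRIMARY KEY,
        owner_name   VARCHAR2(100),
        balance      NUMBER(12,2)
    )
    PARTITION BY HASH (account_id)
    PARTITIONS 4;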

  15. Improving Database Performance [Diagram: a ‘shared all’ (shared-disk) environment - CPU 1, CPU 2 … CPU n, each with its own memory (Memory 1, Memory 2 … Memory n), all connected to the same disks through an interconnecting network]

  16. Improving Database Performance In a ‘shared all’ environment, you noticed that all of the connected systems share the same disk devices. Each processor has its own private memory, but all the processors can directly access all the disks. In addition, each server has access to all the data

  17. Improving Database Performance In this arrangement, ‘shared all’ clustering doesn’t scale as effectively as shared-nothing clustering for small machines. All the nodes have access to the same data, so a controlling facility must be used to direct processing and make sure that all nodes have a consistent view of the data as it changes. Attempts by more than one node to update the same data at the same time must be prevented. This can cause performance and scalability problems (similar to the concurrency aspect)
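
The co-ordination needed here is essentially the locking problem you have already met. A minimal sketch in standard SQL (table and values invented; a real shared-disk cluster uses a distributed lock manager rather than plain row locks):

    -- Hypothetical example: the first session (or node) locks the row before
    -- updating it; a second session trying the same row waits until COMMIT.
    SELECT balance
      FROM customer_accounts
     WHERE account_id = 42
       FOR UPDATE;

    UPDATE customer_accounts
       SET balance = balance - 100
     WHERE account_id = 42;

    COMMIT;   -- releases the row lock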

  18. Improving Database Performance Shared-all architectures are well suited to the large-scale processing found in mainframe environments. Mainframes are large processors capable of high workloads. The number of clustered PCs and midrange processors, even with the newer, faster processors, needed to equal the computing power of a few clustered mainframes would be high - about 250 nodes.

  19. Improving Database Performance This chart might help:
  • Shared Disk: quick adaptability to changing workloads; high availability; data need not be partitioned
  • Shared Nothing: high possibility of simpler, cheaper hardware; almost unlimited scalability; data may need to be partitioned across the cluster

  20. Improving Database Performance There is another technique - the InfiniBand architecture - which can reduce bottlenecks at the Input/Output level, and which has the further appeal of reducing the cabling, connector and administrative overheads of the database infrastructure. It is an ‘intelligent’ agent - meaning software. Its main attraction is that it can change the way information is exchanged in applications. It removes unnecessary overheads from a system

  21. Improving Database Performance Peripheral Component Interconnect (PCI) remains a bus-based system - this allows the transfer of data between 2 (yes, 2!) of the members at a time. Many PCI buses cause bottlenecks in the bridge to the memory subsystems. Newer versions of PCI allowed only a minor improvement - only two 64-bit, 66 MHz adapters on the bus. A bus allows only a small number of devices to be interconnected, is limited in its electrical paths, and cannot adapt to meet high availability demands

  22. Improving Database Performance A newer approach, called a fabric, can scale to thousands of devices with parallel communications between each node. A group of vendors (Intel, Microsoft, Sun, IBM, Compaq and others, merging the rival Future I/O and Next Generation I/O efforts) came together in 1999 to form the InfiniBand Trade Association. Their objective was to develop and ensure one standard for communication interconnects and system I/O. One of their early findings was that replacing a bus architecture by a fabric was not the full story - or solution

  23. Improving Database Performance Their early solution needed to be synchronised with software changes. If not, a very high speed network could be developed, but actual application demands would not be met. InfiniBand is comprised of: 1. Host Channel Adapters (HCA) 2. Target Channel Adapters (TCA) 3. Switches 4. Routers. Items 1 and 2 define end nodes in the fabric; 3 and 4 are interconnecting devices

  24. Improving Database Performance The HCA (host channel adapter) manages a connection and interfaces with the fabric. The TCA (target channel adapter) delivers data to a device such as a disk interface, replacing the existing SCSI interface. An InfiniBand switch links HCAs and TCAs into a network. The router allows the interface to other networks AND the translation of legacy components and networks. It can be used for MAN and WAN interfaces.

  25. Improving Database Performance InfiniBand link speeds are identified in multiples of the base 1x (a 0.5 GB/second full-duplex link - 0.25 GB/second in each direction). Other defined sizes are 4x (2 GB/second full duplex) and 12x (6 GB/second full duplex). Just for size: a fast SCSI adapter could accommodate a throughput rate of 160 MB per second; a single InfiniBand 4x adapter can deliver between 300 and 500 MB per second

  26. Improving Database Performance [Diagram: an InfiniBand fabric - end nodes connected through switches and routers]

  27. Improving Database Performance So far we have looked at improving database performance by 1. The use of ‘shared-all’ or ‘shared-nothing’ architectures 2. Implementing an InfiniBand communications interface and network facility

  28. Improving Database Performance Now we are going to look at another option. It’s known as the ‘Federated Database’ environment. So, what is a ‘Federated Database’? Try this: it is a collection of data stored on multiple autonomous computing systems connected to a network. The user is presented with what appears to be one integrated database

  29. Improving Database Performance A federated database presents ‘views’ to users which look exactly the same as views of data from a centralised database. This is very similar to the use of the Internet, where many sites draw on multiple sources - but the user doesn’t see them. In a federated database approach, each data resource is defined (as you have done) by means of table schemas, and the user is able to access and manipulate data

  30. Improving Database Performance The ‘queries’ actually access data from a number of databases at a number of locations. One of the interesting aspects of a federated database is that the individual databases may consist of any DBMS (IBM, Oracle, SQL Server, possibly MS Access), run on any operating system (Unix, VMS, MS-XP) and on different hardware (Hewlett-Packard servers, Unisys, IBM, Sun Microsystems …)
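
In Oracle terms, one simple way to picture this is a view built over database links. The link name, connection details and tables below are invented for illustration; other DBMSs have their own equivalents (gateways, linked servers and so on):

    -- Hypothetical example: each branch keeps its own orders table; a view
    -- makes the local and remote tables look like one table to the user.
    CREATE DATABASE LINK branch_a_link
      CONNECT TO report_user IDENTIFIED BY report_pwd
      USING 'BRANCH_A';   -- net service name of the remote site

    CREATE OR REPLACE VIEW all_orders AS
      SELECT order_id, order_total FROM orders                 -- local database
      UNION ALL
      SELECT order_id, order_total FROM orders@branch_a_link;  -- remote database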

  31. Improving Database Performance However, there are some reservations: acceptable performance requires the inclusion of a smart optimiser using the cost-based technique, which has intelligence about both the distribution of the data (perhaps a global data dictionary) and also the different hardware and DBMS at each accessed site.

  32. Improving Database Performance Another attractive aspect of the federated arrangement is that additional database servers can be added to the federation at any time - and servers can also be deleted. As a general comment, any multisource database can be implemented in either a centralised or federated architecture. In the next few overheads, there are some comments on this

  33. Improving Database Performance The centralised approach has some disadvantages, the major one being that the investment is large, and the return on investment may take many months, or years. The process includes these steps: 1. Concept development and a data model for collecting the data needed to support business decisions and processes 2. Identification of useful data sources (accurate, timely, comprehensive, available …)

  34. Improving Database Performance 3. Obtain a database server platform to support the database (and probably leading to data warehousing). 4. Capture data, or extract data, from the source(s) 5. Clean, format, and transform data to the quality and formats required 6. Design an integrated database to hold this data 7. Load the database (and review quality)

  35. Improving Database Performance 8. Develop systems to ensure that content is current (probably transaction systems). From this point, that database becomes ‘usable’. So, what is different with the Federated Database approach? 1. Firstly, the economics are different - the investment in the large, high-speed processor is not necessary 2. Data is not centralised - it remains with and on the systems used to maintain it

  36. Improving Database Performance 3. The database server can be a mid-range machine, or several servers. 4. Another aspect is that it is most unlikely that a query will regularly need access to all of the individual databases - but with the centralised approach all of the data needs to be ‘central’. 5. Local databases support local queries - that’s probably why the local databases were introduced.

  37. Improving Database Performance The Internet offers the capability of large federations of content servers. Distributed application architectures built around Web servers and many co-operating databases are (slowly) becoming common both • within and • between enterprises (companies). Users are normally unaware of the interfacing and supporting software necessary for federated databases to be accessible

  38. Improving Database Performance Finally, there is another aspect which is used to improve the availability and performance of a database. This occurs at the ‘configuration stage’, which is when the database and its requirements are being ‘created’ - quite different from the ‘create table’ which you have used. It is the responsibility of System Administration and Database Management (and of course Senior / Executive Management)

  39. Improving Database Performance Physical Layouts The physical layout very much influences • How much data a database can hold • The number of concurrent database users • How many concurrent processes can execute • Recovery capability • Performance (response time) • Nature of Database Administration • Cost • Expansion

  40. Improving Database Performance Oracle Architecture Oracle8i and 9i are object-relational database management systems. They combine the capabilities of relational and object-oriented database systems. They are used as database servers for many types of business applications, including • On Line Transaction Processing (OLTP) • Decision Support Systems • Data Warehousing

  41. Improving Database Performance Oracle Architecture In perspective, Oracle is NOT a ‘high end’ application DBMS. A high end system has one or more of these characteristics: • Management of a very large database (VLDB) - probably hundreds of gigabytes or terabytes • Provides access to many concurrent users - in the thousands, or tens of thousands • Gives a guarantee of constant database availability for mission critical applications - 24 hours a day, 7 days a week.

  42. Improving Database Performance Oracle Architecture High end application environments are not normally controlled by Relational Database Management Systems. High end database environments are controlled by mainframe computers and non-relational DBMSs. Current RDBMSs cannot manage very large amounts of data, or perform well under demanding transaction loads.

  43. Improving Database Performance Oracle Architecture There are some guidelines for designing a database with files distributed so that optimum performance, from a specific configuration, can be achieved. The primary aspect which needs to be clearly understood is the nature of the database • Is it transaction oriented? • Is it read-intensive?

  44. Improving Database Performance The key items which need to be understood are • Identifying Input/Output contention among datafiles • Identifying Input/Output bottlenecks among all database files • Identifying concurrent Input/Output among background processes • Defining the security and performance goals for the database • Defining the system hardware and mirroring architecture • Identifying disks which can be dedicated to the database

  45. Improving Database Performance Let’s look at tablespaces. These will be present in some combination:
  • System - the data dictionary
  • Data - standard-operation tables
  • Data_2 - static tables used during standard operation
  • Indexes - indexes for the standard-operation tables
  • Indexes_2 - indexes for the static tables
  • RBS - standard-operation rollback segments
  • RBS_2 - special rollback segments used for data loads
  • Temp - standard-operation temporary segments
  • Temp_user - temporary segments created by a temporary user

  46. Improving Database Performance
  • Tools - RDBMS tools tables
  • Tools_1 - indexes for the RDBMS tools tables
  • Users - user objects in development tables
  • Agg_data - aggregation data and materialised views
  • Partitions - partitions of a table or index segments; create multiple tablespaces for them
  • Temp_Work - temporary tables used during data load processing
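
Creating one of these tablespaces with its own datafile might look like the sketch below - the file path and sizes are invented, and the exact options vary with the Oracle version:

    -- Hypothetical example: a dedicated tablespace and datafile for the
    -- standard-operation tables.
    CREATE TABLESPACE data
      DATAFILE '/u02/oradata/prod/data01.dbf' SIZE 500M
      AUTOEXTEND ON NEXT 50M MAXSIZE 2000M;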

  47. Improving Database Performance (A materialised view stores replicated data based on an underlying query; the data is replicated from within the current database, whereas a snapshot stores data from a remote database. The system optimiser may choose to use a materialised view instead of a query against a larger table if the materialised view will return the same data, and thus improve response time. A materialised view does, however, incur an overhead of additional space usage and maintenance.)
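
A minimal materialised-view sketch in Oracle-style SQL (the table, columns and view name are invented; ENABLE QUERY REWRITE is the clause which lets the optimiser substitute the materialised view for a matching query):

    -- Hypothetical example: pre-compute an aggregate so matching queries can be
    -- answered from the materialised view instead of the large base table.
    CREATE MATERIALIZED VIEW sales_by_region
      BUILD IMMEDIATE
      REFRESH COMPLETE ON DEMAND
      ENABLE QUERY REWRITE
    AS
    SELECT region, SUM(sale_amount) AS total_sales
      FROM sales
     GROUP BY region;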

  48. Improving Database Performance Each of the tablespaces will require a separate datafile. Monitoring of I/O performance among datafiles can only be done after the database has been created, so at this stage the DBA must estimate the I/O load for each datafile (based on what information?). The physical layout planning is commenced by estimating the relative I/O among the datafiles, with the most active tablespace given a weight of 100. Estimate the I/O from the other datafiles relative to the most active datafile
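
Once the database is running, the relative I/O per datafile can be checked from the V$ views. A sketch only - exact columns vary a little between Oracle versions:

    -- Hypothetical example: physical reads and writes recorded per datafile
    -- since instance startup, busiest files first.
    SELECT d.name    AS datafile_name,
           f.phyrds  AS physical_reads,
           f.phywrts AS physical_writes
      FROM v$filestat f,
           v$datafile d
     WHERE f.file# = d.file#
     ORDER BY f.phyrds + f.phywrts DESC;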

  49. Improving Database Performance Assign a weight of 35 to the System tablespace files, and give the index tablespaces a value of one-third of their data tablespaces. RBS may go as high as 70 (depending on the database activity) - between 30 and 50 is ‘normal’. In production, Temp will be used by large sorts. Tools will be used rarely in production - as will the Tools_1 tablespace

  50. Improving Database Performance So, what do we have? Something like this:
  Tablespace   Weight   % of Total
  Data           100       45
  RBS             40       18
  System          35       16
  Indexes         33       15
  Temp             5        2
  Data_2           4        2
  Indexes_2        2        1
  Tools            1        1
  Total         (220)
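
As a check, the ‘% of Total’ column is simply each weight divided by the 220 total, for example:

    Data : 100 / 220 x 100 ≈ 45%
    RBS  :  40 / 220 x 100 ≈ 18%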
