290 likes | 509 Vues
Issues in Database Performance. Performance in Read / write are hardware issues => throw money at it Performance of DB = ability of engine to locate data Factors affecting speed of retrieval Cache (sizing of objects) Access method (table scan / indexes…) Contention between processes
E N D
Issues in Database Performance • Performance in Read / write are hardware issues => throw money at it • Performance of DB = ability of engine to locate data • Factors affecting speed of retrieval • Cache (sizing of objects) • Access method (table scan / indexes…) • Contention between processes • Indirect processes (roll back, archiving…)
Reading data in Oracle 1 – determine how to access block where data is located • read data dictionary stored in separate part of DB • DD may be loaded up in cache (aka row cache) to limit I/O activity 2 - DD indicates preferred access method for block • B tree index, partitioning, hash clusters etc… 3 - Search begins either in full scan or with index until data found
Designing DB for reading • Supply methods for high precision access to data • But some queries will defeat the strategies • E.g. credit cards transactions – monthly report of scattered items • No solution = take off-line
Changing data • Oracle makes hard work of changes • Rollback data (immediate) • Log files (long term) • Changed blocks read and updated in buffer • Released to disk as buffer is cleared • But rollback info generate most I/O operations • In sensitive environments, simultaneous archiving makes it worse (ARCHIVELOG mode)
Indexing Problems • Super-fast indexes need updating as data is changed => DB slows down. • More complex index = more complex update mechanism = more rollback …. • DB physical structure degrades, so does index (eg split blocks) • Performance decreases over time • Rebuild needed (which interferes with operations) INDEX_STATS tells you how big the index has grown
Side effects • Role of DBMS enforce data consistency • A reading process may need an older version of the data • Need to create a private version of the data => processes that should be Read Only require writes Attempt to describe all required operations in executing a query requiring old data
Solution • To create an older version of data • Must apply roll back • find old roll back block (I/O) • Roll back index (I/O) • Find data (I/O) • Roll back data • Read old data • Reverse all changes (multiple I/O) • Significant I/O implications + buffer full of old stuff
Conclusions • Writers and readers DO interfere with each other • The mechanisms used by Oracle to bypass locking have performance side effects • Performance come with minimising I/O • i.e. with good access techniques • Precision of data location • Physical proximity of related data (cf: caching) • However: Techniques to reduce I/O numbers tend to reduce the speed of access!
Indexing example (see figure 3.1): • Alphabetic search using B tree index for name = Oscar Smith • Split the table in sections (eg: half) and read until find start beyond letter “S” + go back one • Do same in branch block • Smith, N is the one • Then look for page with Smith, N in header • Scan for actual entry in index • Read address • Move to table • Read
About Btree indexes • Root and Branch blocks = approx 2% of index (small) • In frequent hit situations, both blocks loaded in Data Buffer all the time • Then only 2 I/O may be required: • One to read leaf block • One to read the table • In practice, read in index and in table may require reading several blocks • ?
Creating and Using Indexes • Important for live access, • Even more for querying multiple tables • Value matching is costly process in RDB • No pointers • Connection purely on comparison basis only • One value against all values in joint field • If link between 2 huge tables, perf is low • All RDBs use some form of indexing • Some complexity involved as index can reduce physical I/O at the cost of logical I/O (CPU time)
Btree indexing • Creating an index means creating a table with X+1 columns • X = number of columns in index • Rowid (added field) [table block + row] • Index is then copied into consecutive blocks • PCTFREE function leaves space for data growth • high value will generate many leaf blocks but reduce occurrence of split blocks in time • Pointer is added to previous and next leaf blocks in header of block
Btree indexing (2) • Then branch layer is built: • If index > one block • Collect all first entries + block address of each leaf block • Write down into the first level branch block (packed) • If branch block is full, initiates second level of branch blocks etc…. • Room is saved in branch blocks: • No forward and backward pointer in branch blocks • Entries are “trimmed” to the bare minimum • First entries are omitted • See figure 6.1
Syntax • CREATE INDEX name ON table name (field1, field2 …) PCTFREE 50; • utility programmes to assess performance of indexes – eg INDEX_STATS View
Updating indexes • Index entries are NEVER changed • Marked as deleted and re-inserted • Space made available cannot be used until after index is re-built • Inserts that don’t fit split the block (rarely 50/50!) • If a blocks becomes empty, it is marked as free, but is never removed • Also, blocks never merge automatically
Some problems • Some situations cannot be addressed with indexes • e.g. In a FIFO processing situation (e.g. a queue), indexes will prove counterproductive • Index may grow to stupid proportions even with small error rate (unsuccessful processing of data) • Every time a transaction is added or processed (deleted) the index must change
Alternative: Bitmap indexing • Imagine following query in huge table Find customers living in London, with 2 cars and 3 children occupying a 4 bed house • Index not useful – why? • Too big • If query changes in any way =>new index needed • Maintaining a set of indexes for each query would just be too costly • Use a bitmap (see table 6.1, 6.2 and 6.3)
Bitmap indexes (2) • Special for data warehouse type DBs • Build one bitmap for each relevant parameter • Combine bitmaps using the “and” SQL keyword • Also possible to use “not”ing of bitmap (see table 6.4, 6.5)
Key points to remember • What is the key advantage of a bitmap index? • What situation does it best suit ? • Bitmaps can also be packed by Oracle compression features • But size is unpredictable – why?
Example: • Table with 1,000,000 rows • Bitmap on one column that can contain one of 8 different values (e.g. city names) • Data is such that all same city together 125,000 times • Write the bitmap • Imagine what compression can be achieved • Data is such that cities are in random order, but same number of each • Same questions
solution First scenario • Bitmap for first city: 125,000 ones and 875,000 zeros [trimmed off] • Size ~ 125,000 bits or approx 18Kbytes • Full bitmap = 156 Kbytes Second scenario: • Bitmap for first city sequences of 1’s and 7 zeros, repeated 125,000 times • Size ~ 1,000,000 bits or approx 140 Kbytes • Full bitmap = 1.12 Mbytes • But BTree index for such data would be around 12MB
Conclusion • Bitmap indexes work best when combined • They are very quick to build • Up to a million rows for 10 seconds • Work best when limited number of values + when high repetitions • Best way to deal with huge volumes => make drastic selection of interesting rows before reading the table • Warning: one entry in a Bitmap = hundreds of records = > locking can be crazy (OLTP systems) • => for datawarehouse type applications (no contention)
What oracle says about it • Use index for queries with low hit ratio or when queries access < 2 - 4% of data • Index maintenance is costly so index “just in case” is silly • Must analyse the type of data when deciding what kind of index • Do NOT use columns with loads of changes in an index • Use indexed fields in “where” statement • Can also write queries with NO_INDEX
Administering indexes • Indexes degrade over time • Should stabilise around 75% efficiency, but don’t • Run stats: Analyse index NAME validate structure Analyse index NAME compute statistics Analyse index NAME estimate statistics sample 1 percent See table 6.6