Enhancing Database Performance and Data Manipulation Techniques

I/O Trap • Reading existing data • Changing existing data • Update existing records • Adding new records • Deleting records • All these involve going to disk => slowest device by considerable margin • Minimise / reschedule physical I/O

Issues in Database Performance • Performance in Read / write are hardware issues => throw money at it • Performance of DB = ability of engine to locate data • Factors affecting speed of retrieval • Cache (sizing of objects) • Access method (table scan / indexes…) • Contention between processes • Indirect processes (roll back, archiving…)

Reading data in Oracle 1 – determine how to access block where data is located • read data dictionary stored in separate part of DB • DD may be loaded up in cache (aka row cache) to limit I/O activity 2 - DD indicates preferred access method for block • B tree index, partitioning, hash clusters etc… 3 - Search begins either in full scan or with index until data found

Designing DB for reading • Supply methods for high precision access to data • But some queries will defeat the strategies • E.g. credit cards transactions – monthly report of scattered items • No solution = take off-line

Changing data • Oracle makes hard work of changes • Rollback data (immediate) • Log files (long term) • Changed blocks read and updated in buffer • Released to disk as buffer is cleared • But rollback info generate most I/O operations • In sensitive environments, simultaneous archiving makes it worse (ARCHIVELOG mode)

Indexing Problems • Super-fast indexes need updating as data is changed => DB slows down. • More complex index = more complex update mechanism = more rollback …. • DB physical structure degrades, so does index (eg split blocks) • Performance decreases over time • Rebuild needed (which interferes with operations) INDEX_STATS tells you how big the index has grown

Side effects • Definition of DB = consistency • A reading process may need an older version of the data • Need to create a private version of the data • Cl: processes that should be Read only require writes Attempt to describe all required operations in finishing to execute query on old data

Effects of read consistency • Must find an older version of data • Must apply roll back • Must find old roll back block (I/O) • Roll back index (I/O) • Find data (I/O) • Roll back data • Read old data • Reverse all changes • Significant I/O implications + buffer full of old stuff

Conclusions • Writers and readers DO interfere with each other • The mechanisms used by Oracle to bypass locking have performance side effects • Performance come with minimising I/O • i.e. with good access techniques • Precision of data location • Physical proximity of related data (cf: caching) • However: Techniques to reduce I/O numbers tend to reduce the speed of access!

Indexing example (see figure): • Alphabetic search using B tree index for name = Oscar Smith • Split the table in sections and read until find start overshoot letter “S”: Smith, N is the one • Then look for page with Smith, N in header • Then scan for actual entry in index • Read pointer • Move to table • Read

About Btree indexes • Root and Branch blocks = approx 2% of index (small) • In frequent hit situations, both blocks loaded in Data Buffer all the time • Then only 2 I/O may be required: • One to read leaf block • One to read the table • In practice, read in index and in table may require reading several blocks • E.g: Several similar names • Index spans two (or more) leaf blocks • Each record is in a different data block

Creating and Using Indexes • Important for live access, but even more for querying with multiple tables • Value matching is costly process in RDB • No pointers • Connection purely on comparison basis only • One row with all other rows • If link between 2 huge tables, perf is low • All RDBs use some form of indexing • Some complexity involved as index can reduce physical I/O while increasing logical I/O (ie CPU time)

Btree indexing • Creating an index means creating a table with X+1 columns • X = number of columns in index • Rowid (added field) [table block + row] • Index is then copied into consecutive blocks • PCTFREE function leaves space for growth of data (but high value will generate many leaf blocks) • Pointer is added to previous and next leaf blocks in header of block

Btree indexing (2) • Then branch layer is built: • If index > one block • Collect all first entries + block address of each leaf block • Write down into the first level branch block (packed) • If branch block is full, initiates second level of branch blocks etc…. • Room is saved in branch blocks: • No forward and backward pointer in branch blocks • Entries are “trimmed” to the bare minimum • First entries are omitted • See figure 6.1

Syntax • CREATE INDEX name ON table name (field1, field2 …) PCTFREE 80; • Oracle has many utility programmes to assess the performance of indexes – use INDEX_STATS (see handout) • Practical problem: is it easy to create a new index for a large table? NO!

Updating indexes • Index entries are NEVER changed • Marked as deleted and re-inserted • Space made available cannot be used until after index is re-built • Inserts that don’t fit split the block (rarely 50/50!) • If a blocks becomes empty, it is marked as free, but is never removed • Also, blocks never merge automatically

Some problems • Some situations cannot be addressed with indexes • e.g. In a FIFO processing situation (e.g. a queue), indexes will prove counterproductive • Index may grow to stupid proportions even with small error rate (unsuccessful processing of data) • Every time a transaction is added or processed (deleted) the index must change

Alternative: Bitmap indexing • Imagine following query in huge table Find customers living in London, with 2 cars and 3 children occupying a 4 bed house • Index not useful – why? • Too big • If query changes in any way =>new index needed • Maintaining a set of indexes for each query would just be too costly • Use a bitmap (see table 6.1, 6.2 and 6.3)

Bitmap indexes (2) • Special for data warehouse type DBs • Build one bitmap for each relevant parameter • Combine bitmaps using the “and” SQL keyword • Also possible to use “not”ing of bitmap (see table 6.4, 6.5)

Key points to remember • What is the key advantage of a bitmap index? • What situation does it best suit ? • Bitmaps can also be packed by Oracle compression features • But size is unpredictable – why?

Example: • Table with 1,000,000 rows • Bitmap on one column that can contain one of 8 different values (e.g. city names) • Data is such that all same city together 125,000 times • Write the bitmap • Imagine what compression can be achieved • Data is such that cities are in random order, but same number of each • Same questions

solution First scenario • Bitmap for first city: 125,000 ones and 875,000 zeros [trimmed off] • Size ~ 125,000 bits or approx 18Kbytes • Full bitmap = 156 Kbytes Second scenario: • Bitmap for first city sequences of 1’s and 7 zeros, repeated 125,000 times • Size ~ 1,000,000 bits or approx 140 Kbytes • Full bitmap = 1.12 Mbytes • But BTree index for such data would be around 12MB

Conclusion • Bitmap indexes work best when combined • They are very quick to build • Up to a million rows for 10 seconds • Work best when limited number of values + when high repetitions • Best way to deal with huge volumes => make drastic selection of interesting rows before reading the table • Warning: one entry in a Bitmap = hundreds of records = > locking can be crazy (OLTP systems) • => for datawarehouse type applications (no contention)

What oracle says about it • Use index for queries with low hit ratio or when queries access < 2 - 4% of data • Index maintenance is costly so index “just in case” is silly • Must analyse the type of data when deciding what kind of index • Do NOT use columns with loads of changes in an index • Use indexed fields in “where” statement • Can also write queries with NO_INDEX

Administering indexes • Indexes degrade over time • Should stabilise around 75% efficiency, but don’t • Run stats: Analyse index NAME validate structure Analyse index NAME compute statistics Analyse index NAME estimate statistics sample 1 percent See table 6.6

Partitioned tables • Partitions / partitioning / partitioned tables • For very large tables • Improve querying • Easier admin • Backup and recovery easier • Optimiser knows when partitioning used • Can use in SQL also

Creating a PT • Create table FRED ( ID number name varchar2(25) age number constraint fred_pk primary key (ID) ) partition by range (age) (partition PART1 values less than (21) partition PART2 values less than (40) partition PART3 values less than (maxvalue)

Warning • Specification of partition is exclusive E.g. partition by range (name) (partition part1 values less than (‘F’) implies that f is excluded • Maxvalue is a general term to pick up anything that failed so far • Works for text as well as number

Hash partition • Only in Oracle 8i and above • Uses a numerical algorithm based on partition key to determine where to place data • Range partition = consecutive values together • Hash = consecutive values may be in different partitions • Also gives more partitions = reduces the risk of contention

What is Hash? • Imagine 8GB table – split in 8 / 1 GB • No intuitively clever way to split data • Or obvious way is totally imbalanced • 1 partition 7BG + 7 140MB • Huge variations in performance • Randomise breakdown of data so objects of similar size • Select one column • Select number of chuncks • Oracle does the rest!

Mechanics of hashing • Each record is allocated into a bucket based on key value – e.g. Name = Joe • Applying the hashing function to the value Joe uniquely returns the bucket number where the record is located: • E.g. using prime number • divide KEY by a prime number • If text, translation into numeric value using ASCII code • use remainder of the division = address on the disk • if record already at same address - pointer to overflow area.

Hash partition - SQL Create table FRED ( Name varchar2(25) primary key, Age number, Years abroad number ) Partition by hash (age) Partitions 2 Store in (Part1_fred, Part2_fred); (Not compulsory)

Sub-partitions Create table FRED ( Name varchar2(25) primary key, Age number, Years abroad number ) Partition by range (years abroad) Subpartition by hash (name) Subpartitions 5 (partition Part1 values less than (1) partition Part2 values less than (3) partition Part3 values less than (6) partition Part4 values less than (MAXVALUE));

Indexing partitions • Performance requirements may mean Partitioned tables should be indexed (separate issue) Create index FRED_NAME on FRED (name) Local Partitions (Part1, Part2, Part3, Part4) • Local means create separate index for each partition of the table • Alternative is to create a global index with values from different partitions • Global indexes cannot be created for Hash partitions

Enhancing Database Performance and Data Manipulation Techniques