160 likes | 308 Vues
This presentation by Asst. Prof. Lipyeow Lim from the University of Hawaii at Manoa explores methodologies to enhance query speed in database management. It covers concepts such as Binary Search Trees, ISAM (Indexed Sequential Access Method), and B+ Trees, detailing their structures, insertion/deletion processes, and how they efficiently handle queries. The session also addresses handling duplicates and introduces the process of Bulk Loading for B+ Trees to optimize performance when managing large datasets.
E N D
ICS 421 Spring 2010Indexing (1) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa Lipyeow Lim -- University of Hawaii at Manoa
How to speed up queries? SELECT* FROMSailors WHEREage>40 Array of Sailor Tuples/Records File of Record for Sailors Lipyeow Lim -- University of Hawaii at Manoa
Binary Search Trees • Given search value • if value < node.value, then follow left pointer • Else follow right pointer • How do generalize each index node to an index page ? • How do we generalize this to search pages of records ? 26 20 34 30 41 28 21 18 20 21 21 26 27 28 30 31 34 37 41 45 Lipyeow Lim -- University of Hawaii at Manoa
Indexes • What do we store in the index nodes ? Let k be the key value for an index entry: • Data record with key value k • <k, rid of data record with key value k> • <k, list of rids of data records with key value k> • What kind of queries does the index support? • Range • Point (or equality) Lipyeow Lim -- University of Hawaii at Manoa
Overflow page index entry Indexed Sequential Access Method (ISAM) P K P K P P K m 0 1 2 1 m 2 • Static (m+1)-way Search Tree Non-leaf Pages Leaf Pages Primary pages Lipyeow Lim -- University of Hawaii at Manoa
ISAM: Example Insert new record with age 98 40 • Store data record at the leaf pages • Do we still need the file of record ? 20 33 51 63 age 40 98 63 10 20 51 33 46 55 97 15 27 37 rating sname sid Lipyeow Lim -- University of Hawaii at Manoa
ISAM Facts • File creation: Leaf (data) pages allocated sequentially, sorted by search key; then index pages allocated, then space for overflow pages. • Index entries: <search key value, page id>; they `direct’ search for data entries, which are in leaf pages. • Search: Start at root; use key comparisons to go to leaf. Cost=O(log F N) ; F = # entries/index pg, N = # leaf pgs • Insert: Find leaf data entry belongs to, and put it there. If full, allocate and put in overflow page • Delete: Find and remove from leaf; if empty overflow page, de-allocate. • Static tree structure: inserts/deletes affect only leaf pages. Lipyeow Lim -- University of Hawaii at Manoa
B+ Tree Index B+ Tree Index • Insert/delete at log F N cost; keep tree height-balanced. (F = fanout, N = # leaf pages) • Minimum 50% occupancy (except for root). Each node contains d <= m <= 2d entries. The parameter d is called the order of the tree. • Supports equality and range-searches efficiently. Index Entries Data Entries/Leaf Pages (“Sequence Set”) Lipyeow Lim -- University of Hawaii at Manoa
B+ Tree: Search Example 13 17 24 30 • Leaf entries store <key,rid> pairs • What is the order ? • Search for: age=5, age=15, age>=24 14 16 19 20 22 24 27 29 2 3 5 7 33 34 38 39 Lipyeow Lim -- University of Hawaii at Manoa
Inserting a new data entry • Find correct leaf L. • Put data entry onto L. • If L has enough space, done! • Else, must splitL (into L and a new node L2) • Redistribute entries evenly, copy upmiddle key. • Insert index entry pointing to L2 into parent of L. • This can happen recursively • To split index node, redistribute entries evenly, but push upmiddle key. (Contrast with leaf splits.) • Splits “grow” tree; root split increases height. • Tree growth: gets wider or one level taller at top. Lipyeow Lim -- University of Hawaii at Manoa
Example: Insert 8* 17 17 pushed up into parent node 13 17 24 30 14 16 19 20 22 24 27 29 5 2 2 5 24 7 3 30 3 13 5 8 7 33 34 38 39 5 copied up to parent node 5 Lipyeow Lim -- University of Hawaii at Manoa
Deleting a data entry • Start at root, find leaf L where entry belongs. • Remove the entry. • If L is at least half-full, done! • If L has only d-1 entries, • Try to re-distribute, borrowing from sibling (adjacent node with same parent as L). • If re-distribution fails, mergeL and sibling. • If merge occurred, must delete entry (pointing to L or sibling) from parent of L. • Merge could propagate to root, decreasing height. Lipyeow Lim -- University of Hawaii at Manoa
Miscellaneous • How do we handle data with duplicates ? • Overflow buckets • Make rid part of the key • Each data entry stores <key, list of rids> • Clustered vsUnclustered indexes Index entries UNCLUSTERED CLUSTERED direct search for data entries Data entries Data entries (Index File) (Data file) Data Records Data Records Lipyeow Lim -- University of Hawaii at Manoa
3* 6* 9* 10* 11* 12* 13* 23* 31* 36* 38* 41* 44* 4* 20* 22* 35* Bulk Loading a B+ Tree • If we have a large collection of records, and we want to create a B+ tree on some field, doing so by repeatedly inserting records is very slow. • Bulk Loadingcan be done much more efficiently. • Initialization: Sort all data entries, insert pointer to first (leaf) page in a new (root) page. Root Sorted pages of data entries; not yet in B+ tree Lipyeow Lim -- University of Hawaii at Manoa
Bulk Loading (cont.) Root 10 20 • Index entries for leaf pages always entered into right-most index page just above leaf level. When this fills up, it splits. (Split may go up right-most path to the root.) • Much faster than repeated inserts, especially when one considers locking! Data entry pages 6 12 23 35 not yet in B+ tree 3* 6* 9* 10* 11* 12* 13* 23* 31* 36* 38* 41* 44* 4* 20* 22* 35* Root 20 10 Data entry pages 35 not yet in B+ tree 6 23 12 38 Lipyeow Lim -- University of Hawaii at Manoa 3* 6* 9* 10* 11* 12* 13* 23* 31* 36* 38* 41* 44* 4* 20* 22* 35*
Creating Indexes • Most DBMS (eg. DB2) supports only B+ tree indexes: CREATE INDEX myIdxONmytable(col1, col3) CREATE UNIQUE INDEX myUniqIdxONmytable(col2, col5) CREATE INDEX myIdxONmytable(col1, col3) CLUSTER • If a primary key is specified in the CREATE TABLE statement, an (unclustered) index is automatically created for the PK. • To create a clustered PK index: • Create table without PK constraint • Create index on PK with cluster option • Alter table to add PK constraint • To get rid of unused indexes: DROP INDEX myIdx; Lipyeow Lim -- University of Hawaii at Manoa