90 likes | 204 Vues
This document explores search strategies and cost estimation methods in database systems, focusing on B+-Trees and the external sort-merge algorithm. It details the search key retrieval process, discussing the logarithmic complexity expected in the worst-case scenario. Additionally, the paper covers the phases of external sorting, including sorting and merging, analyzing their respective costs based on the number of pages and memory constraints. It also examines selectivity and cardinality for query optimization, providing examples of cost functions for various selection methods.
E N D
B+-Trees: Search • If there are n search-key values in the file, • the path is no longer than log f/2(n) (worst case).
External Sort-Merge • Sorting phase: • Sorts nB pages at a time • nB = # of main memory pages buffer • creates nR = b/nBinitialsorted runs on disk • b = # of file blocks (pages) to be sorted • Sorting Cost = read b blocks + write b blocks = 2 b
External Sort-Merge • Merging phase: • The sorted runs are merged during one or more passes. • The degree of merging (dM) is the number of runs that can be merged in each pass. • dM = Min (nB-1, nR) • nP = (logdM(nR)) • nP: number of passes. • In each pass, • One buffer block is needed to hold one block from each of the runs being merged, and • One block is needed for containing one block of the merged result.
External Sort-Merge • Degree of merging (dM) • # of runs that can be merged together in each pass = min (nB - 1, nR) • Number of passes nP = (logdM(nR)) • In our example • dM = 4 (four-way merging) • min (nB-1, nR) = min(5-1, 205) = 4 • Number of passes nP = (logdM(nR)) = (log4(205)) = 4 • First pass: • 205 initial sorted runs would be merged into 52 sorted runs • Second pass: • 52 sorted runs would be merged into 13 • Third pass: • 13 sorted runs would be merged into 4 • Fourth pass: • 4 sorted runs would be merged into 1
External Sort-Merge • External Sort-Merge: Cost Analysis • Disk accesses for initial run creation (sort phase) as well as in eachmergepass is 2b • reads every block once and writes it out once • Initial # of runs is nR = b/nB and # of runs decreases by a factor of nB - 1 in each merge pass, then the total # of merge passes is np = logdM(nR) • In general, the cost performance of Merge-Sort is • Cost = sort cost + merge cost • Cost = 2b + 2b * np • Cost = 2b + 2b * logdM nR • =2b(logdM(nR) + 1)
Catalog Information • Attribute • d: # of distinct values of an attribute • sl (selectivity): • the ratio of the # of records satisfying the condition to the total # of records in the file. • s (selection cardinality) = sl * r • average # of records that will satisfy an equality condition on the attribute • For a key attribute: • d = r, sl = 1/r, s = 1 • For a nonkey attribute: • assuming that d distinct values are uniformly distributed among the records • the estimated sl = 1/d, s = r/d
Using Selectivity and Cost Estimates in Query Optimization • Examples of Cost Functions for SELECT • S1. Linear search (brute force) approach • CS1a = b; • For an equality condition on a key, CS1a = (b/2) if the record is found; otherwise CS1a = b. • S2. Binary search: • CS2 = log2b + (s/bfr) –1 • For an equality condition on a unique (key) attribute, CS2 =log2b • S3. Using a primary index (S3a) or hash key (S3b) to retrieve a single record • CS3a = x + 1; CS3b = 1 for static or linear hashing; • CS3b = 1 for extendible hashing;
Using Selectivity and Cost Estimates in Query Optimization • Examples of Cost Functions for SELECT (contd.) • S4. Using an ordering index to retrieve multiple records: • For the comparison condition on a key field with an ordering index, CS4 = x + (b/2) • S5. Using a clustering index to retrieve multiple records: • CS5 = x + ┌ (s/bfr) ┐ • S6. Using a secondary (B+-tree) index: • For an equality comparison, CS6a = x + s; • For an comparison condition such as >, <, >=, or <=, • CS6a = x + (bI1/2) + (r/2)
Using Selectivity and Cost Estimates in Query Optimization • Examples of Cost Functions for SELECT (contd.) • S7. Conjunctive selection: • Use either S1 or one of the methods S2 to S6 to solve. • For the latter case, use one condition to retrieve the records and then check in the memory buffer whether each retrieved record satisfies the remaining conditions in the conjunction. • S8. Conjunctive selection using a composite index: • Same as S3a, S5 or S6a, depending on the type of index.