1 / 35

Chapter 15 Algorithms for Query Processing and Optimization

Chapter 15 Algorithms for Query Processing and Optimization. ICS 424 Advanced Database Systems Dr. Muhammad Shafique. Outline. Introduction Processing a query SQL queries and relational algebra Implementing basic query operations Heuristics-based query optimization

powa
Télécharger la présentation

Chapter 15 Algorithms for Query Processing and Optimization

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 15Algorithms for Query Processing and Optimization ICS 424 Advanced Database Systems Dr. Muhammad Shafique Query Processing and Optimization

  2. Outline • Introduction • Processing a query • SQL queries and relational algebra • Implementing basic query operations • Heuristics-based query optimization • Overview of query optimization in Oracle Query Processing and Optimization

  3. Material Covered from Chapter 15 • Pages 537, 538, 539 • Section 15.1 • Section 15.2 • Section 15.6 • Section 15.7 • Section 15.9 Query Processing and Optimization

  4. Introduction to Query Processing • Query optimization • The process of choosing a suitable execution strategy for processing a query. • Two internal representations of a query: • Query Tree • Query Graph Query Processing and Optimization

  5. Background Review • DDL compiler • DML compiler • Runtime database processor • System catalog Query Processing and Optimization

  6. Processing a Query • Tasks in processing a high-level query • Scanner scans the query and identifies the language tokens • Parser checks syntax of the query • The query is validated by checking that all attribute names and relation names are valid • An intermediate internal representation for the query is created (query tree or query graph) • Query execution strategy is developed • Query optimizer produces an execution plan • Code generator generates the object code • Runtime database processor executes the code • Query processing and query optimization Query Processing and Optimization

  7. Processing a Query • Typical steps in processing a high-level query • Query in a high-level query language like SQL • Scanning, parsing, and validation • Intermediate-form of query like query tree • Query optimizer • Execution plan • Query code generator • Object-code for the query • Run-time database processor • Results of query Query Processing and Optimization

  8. Query Processing and Optimization

  9. SQL Queries and Relational Algebra • SQL query is translated into an equivalent extended relational algebra expression --- represented as a query tree • In order to transform a given query into a query tree, the query is decomposed into query blocks • Query block: • The basic unit that can be translated into the algebraic operators and optimized. • A query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clause if these are part of the block. • The query optimizer chooses an execution plan for each block Query Processing and Optimization

  10. COMPANY Relational Database Schema (1) Query Processing and Optimization

  11. COMPANY Relational Database Schema (2) Query Processing and Optimization

  12. SQL Queries and Relational Algebra (1) • Example SELECT Lname, Fname FROM EMPLOYEE WHERE Salary > ( SELECT MAX(Salary) FROM EMPLOYEE WHERE Dno = 5 ) • Inner block and outer block Query Processing and Optimization

  13. Translating SQL Queries into Relational Algebra SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > ( SELECT MAX (SALARY) FROM EMPLOYEE WHERE DNO = 5); SELECT LNAME, FNAME FROM EMPLOYEE WHERE SALARY > C SELECT MAX (SALARY) FROM EMPLOYEE WHERE DNO = 5 πLNAME, FNAME(σSALARY>C(EMPLOYEE)) ℱMAX SALARY(σDNO=5 (EMPLOYEE)) Query Processing and Optimization

  14. SQL Queries and Relational Algebra (2) • Uncorrelated nested queries Vs Correlated nested queries • Example Retrieve the name of each employee who works on all the projects controlled by department number 5. SELECT FNAME, LNAME FROM EMPLOYEE WHERE ( (SELECT PNO FROM WORKS_ON WHERE SSN=ESSN) CONTAINS (SELECT PNUMBER FROM PROJECT WHERE DNUM=5) ) Query Processing and Optimization

  15. SQL Queries and Relational Algebra (3) • Example For every project located in ‘Stafford’, retrieve the project number, the controlling department number and the department manager’s last name, address and birthdate. • SQL query: SELECT P.NUMBER,P.DNUM,E.LNAME, E.ADDRESS, E.BDATE FROM PROJECT AS P,DEPARTMENT AS D, EMPLOYEE AS E WHERE P.DNUM=D.DNUMBER AND D.MGRSSN=E.SSN AND P.PLOCATION=‘STAFFORD’; • Relation algebra: PNUMBER, DNUM, LNAME, ADDRESS, BDATE (((PLOCATION=‘STAFFORD’(PROJECT))DNUM=DNUMBER (DEPARTMENT)) MGRSSN=SSN (EMPLOYEE)) Query Processing and Optimization

  16. SQL Queries and Relational Algebra (4) Query Processing and Optimization

  17. Implementing Basic Query Operations • An RDBMS must provide implementation(s) for all the required operations including relational operators and more • External sorting • Sort-merge strategy • Sorting phase • Number of file blocks (b) • Number of available buffers (nB) • Runs --- (b / nB) • Merging phase --- passes • Degree of merging --- the number of runs that are merged together in each pass Query Processing and Optimization

  18. Algorithms for External Sorting (1) • External sorting: • Refers to sorting algorithms that are suitable for large files of records stored on disk that do not fit entirely in main memory, such as most database files. • Sort-Merge strategy: • Starts by sorting small subfiles (runs) of the main file and then merges the sorted runs, creating larger sorted subfiles that are merged in turn. Query Processing and Optimization

  19. Algorithms for External Sorting(2) Query Processing and Optimization

  20. Algorithms for External Sorting (3) • Analysis Number of file blocks = b Number of initial runs = nR Available buffer space = nB Sorting phase: nR = (b/nB) Degree of merging: dM = Min (nB-1, nR); Number of passes: nP = (logdM(nR)) Number of block accesses: (2 * b) + (2 * b * (logdM(nR))) • Example done in the class Query Processing and Optimization

  21. Implementing Basic Query Operations (cont.) • Estimates of selectivity • Selectivity is the ratio of the number of tuples that satisfy the condition to the total number of tuples in the relation. • SELECT (  ) operator implementation • Linear search • Binary search • Using a primary index (or hash key) • Using primary index to retrieve multiple records • Using clustering index to retrieve multiple records • Using a secondary index on an equality comparison • Conjunctive selection using an individual index • Conjunctive selection using a composite index • Conjunctive selection by intersection of record pointers Query Processing and Optimization

  22. Implementing Basic Query Operations (cont.) • JOIN operator implementation • Nested-loop join • Sort-merge join • Hash join • Partition Hash join • Hybrid hash join • PROJECT operator implementation • Set operator implementation • Implementing Aggregate operators/functions • Implementing OUTER JOIN Query Processing and Optimization

  23. Query Processing and Optimization

  24. Buffer Space and Join performance In the nested-loop join, it makes a difference which file is chosen for the outer loop and which for the inner loop. If EMPLOYEE is used for the outer loop, each block of EMPLOYEE is read once, and the entire DEPARTMENT file (each of its blocks) is read once for each time we read in ( nB - 2) blocks of the EMPLOYEE file. We get the following: Total number of blocks accessed for outer file = bE Number of times ( nB - 2) blocks of outer file are loaded = bE/ nB – 2  Total number of blocks accessed for inner file = bD * bE/ nB – 2  Hence, we get the following total number of block accesses: bE + ( bE/ nB – 2  * bD) = 2000 + ( (2000/5)  * 10) = 6000 blocks On the other hand, if we use the DEPARTMENT records in the outer loop, by symmetry we get the following total number of block accesses: bD + ( bD/ nB – 2  * bE) = 10 + ((10/5)  * 2000) = 4010 blocks Query Processing and Optimization

  25. Implementing Basic Query Operations (cont.) • Combining operations using pipelining • Temporary files based processing • Pipelining or stream-based processing • Example: consider the execution of the following query list of attributes( ( c1(R)  ( c2 (S)) Query Processing and Optimization

  26. General Transformation Rules for Relational Algebra Operations • Cascade of : A conjunctive selection condition can be broken up into a cascade (that is, a sequence) of individual  operations:  C1 AND C2 AND ….AND Cn (R) ≡ C1 (C2( …(Cn(R))…) • Commutativity of  : The  operation is commutative: C1(C2(R)) ≡C2(C1(R)) • Cascade of : In a cascade (sequence) of  operations, all but the last one can be ignored • Commuting  with : If the selection condition c involves only those attributes A1, ..., An in the projection list, the two operations can be commuted • And more … Query Processing and Optimization

  27. Heuristic-Based Query Optimization • Outline of heuristic algebraic optimization algorithm • Break up SELECT operations with conjunctive conditions into a cascade of SELECT operations • Using the commutativity of SELECT with other operations, move each SELECT operation as far down the query tree as is permitted by the attributes involved in the select condition • Using commutativity and associativity of binary operations, rearrange the leaf nodes of the tree • Combine a CARTESIAN PRODUCT operation with a subsequent SELECT operation in the tree into a JOIN operation, if the condition represents a join condition • Using the cascading of PROJECT and the commuting of PROJECT with other operations, break down and move lists of projection attributes down the tree as far as possible by creating new PROJECT operations as needed • Identify sub-trees that represent groups of operations that can be executed by a single algorithm Query Processing and Optimization

  28. Heuristic-Based Query Optimization: Example • Query "Find the last names of employees born after 1957 who work on a project named ‘Aquarius’." • SQL SELECT LNAME    FROM EMPLOYEE, WORKS_ON, PROJECT    WHERE PNAME=‘Aquarius’ AND PNUMBER=PNO AND ESSN=SSN AND BDATE.‘1957-12-31’; Query Processing and Optimization

  29. Query Processing and Optimization

  30. Query Processing and Optimization

  31. Query Processing and Optimization

  32. Query Processing and Optimization

  33. Query Processing and Optimization

  34. Overview of Query Optimization in Oracle • Rule-based query optimization: the optimizer chooses execution plans based on heuristically ranked operations. • May be phased out • Cost-based query optimization: the optimizer examines alternative access paths and operator algorithms and chooses the execution plan with lowest estimate cost. • The query cost is calculated based on the estimated usage of resources such as I/O, CPU and memory needed. • Application developers could specify hints to the ORACLE query optimizer. • application developer might know more information about the data. • SELECT /*+ ...hint... */ [rest of query] • SELECT /*+ index(t1 t1_abc) index(t2 t2_abc) */ COUNT(*)FROM t1, t2WHERE t1.col1 = t2.col1; Query Processing and Optimization

  35. Summary • Background review • Processing a query • SQL queries and relational algebra • Implementing basic query operations • Heuristics-based query optimization • Overview of query optimization in Oracle Query Processing and Optimization

More Related