Créer une présentation
Télécharger la présentation

Télécharger la présentation
## Query Processing – Query Trees

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**Evaluation of SQL**• Conceptual order of evaluation • Cartesian product of all tables in from clause • Rows not satisfying where clauses eliminated • Rows grouped by • Groups not satisfying having eliminated • Select clause target list evaluated • If distinct eliminate duplicate rows • Union after each subselect evaluated • Rows sorted in order by**Actual Order of Evaluation**• Order chosen by query optimizer • Determines most efficient way • Syntax checking phase • scan - identify tokens of text • parse- check syntax • validate - check attributes and relation names • Query optimization phase • create internal representation - query tree • identify execution strategies - plan • Maintains statistics for tables and columns, indexes • choose suitable plan in query to optimize query e.g. order of execution of ops, use indexes, etc.**Evaluation cont’d**• Execution phase • query optimizer produces execution plan • code generator generates code • runtime db processor runs query code • To minimize run time • chosen strategy NOT optimal, but reasonably efficient For procedural languages limited need for query optimization**Heuristics in Query Optimization**• Apply select and project before join or other binary operations. Why? • select and project reduce size • strategy is obvious, but challenge was to show could be done with rules**Query Optimization**• Canonical form (initial query tree - conceptual order of evaluation) • Leaf nodes are tables • Internal nodes are operations • Begin by separating select conditions from joins (end up with X) • Combine all selects then all projects • transform to final query tree using general transformation rules for relational algebra**Query tree for SQL query**Select lname From employee, works_on, project Where pname=‘Aquarius’ and pnumber=pno and essn=ssn and bdate > ‘1957-12-31’**General Transformation Rules for Relational Algebra**• Cascade of s • Commutativity of s • Cascade of p • Commuting s with p • Commutativity of |X| (and X) • Commuting s with |X| (or X) • Commuting p with |X| (or X)**General Transformation Rules for Relational Algebra**8. Commutativity of set operations U and ∩ 9. Associativity of |X|, X, U and ∩ 10. Commuting s with set operations 11. The p operation commuted with U 12. Converting a (s, X) sequence into |X|**Outline of a Heuristic Algebraic Optimization Algorithm**• Use Rule 1 break up conjunctive s’s into cascades of s’s • Use Rules 2,4,6, 10 for commutativity of s to: • move s’s as far down tree as possible • Use Rules 5 and 9 for commutativity and associativity of binary operations to: • Place most restrictive s (and |X|) so executed first • fewest tuples, smallest absolute size or smallest selectivity • But make sure no cartesian products result**Outline of a Heuristic Algebraic Optimization Algorithm**• Use Rule 12, combine Cartesian product with s to: • create |X| • Use Rules 3, 4, 7, 11 concerning cascade of p’s and commuting p with other ops to: • move down tree as far as possible • Identify subtrees that represent groups of operations than can be executed by a single algorithm**Summary of Heuristics**• Apply first operations that reduce size of intermediate results • Perform s’s and p’s as early as possible (move down tree as far as possible) • Execute most restrictive s and |X| first (reorder leaf nodes but avoid cartesian products)**Multiple table joins**• Multiple table joins • Query plan identifies most efficient ordering of joins • may not have to materialize actual table resulting from join • instead use pipelining - successive rows output from one step fed into next plan**Converting trees into Query plans**• pipelined evaluation • Forward result from an operation directly to next operation • Result from s, placed in buffer • |X| consumes tuples from buffer • Result from |X| pipelined to p**Query Tree Question**• Should we do a ppname, pnumber then spname = ‘Aquarius’ then ppnumber ? • No, since the operations are done together • the processor would read a row of project, see if pname = ‘Aquarius’ then use pnumber to perform the join.**Algorithms**• DBMS has general access algorithms to implement select, join, or combinations of ops • Don't have separate access routines for each op • Creating temporary files is inefficient • Generate algorithms for combinations of operations • join, select, project - code is created dynamically to implement multiple operations**Materialized table**• Think about what operations require utilizing a materialized table • Input to select? • Input to project? • Input to join? • How is this implemented? Next topic