Optimizing SQL Query Processing & Execution: Heuristic Algebraic Algorithms

Query Processing – Query Trees

Evaluation of SQL • Conceptual order of evaluation • Cartesian product of all tables in from clause • Rows not satisfying where clauses eliminated • Rows grouped by • Groups not satisfying having eliminated • Select clause target list evaluated • If distinct eliminate duplicate rows • Union after each subselect evaluated • Rows sorted in order by

Actual Order of Evaluation • Order chosen by query optimizer • Determines most efficient way • Syntax checking phase • scan - identify tokens of text • parse- check syntax • validate - check attributes and relation names • Query optimization phase • create internal representation - query tree • identify execution strategies - plan • Maintains statistics for tables and columns, indexes • choose suitable plan in query to optimize query e.g. order of execution of ops, use indexes, etc.

Evaluation cont’d • Execution phase • query optimizer produces execution plan • code generator generates code • runtime db processor runs query code • To minimize run time • chosen strategy NOT optimal, but reasonably efficient For procedural languages limited need for query optimization

Heuristics in Query Optimization • Apply select and project before join or other binary operations. Why? • select and project reduce size • strategy is obvious, but challenge was to show could be done with rules

Query Optimization • Canonical form (initial query tree - conceptual order of evaluation) • Leaf nodes are tables • Internal nodes are operations • Begin by separating select conditions from joins (end up with X) • Combine all selects then all projects • transform to final query tree using general transformation rules for relational algebra

Query tree for SQL query Select lname From employee, works_on, project Where pname=‘Aquarius’ and pnumber=pno and essn=ssn and bdate > ‘1957-12-31’

General Transformation Rules for Relational Algebra • Cascade of s • Commutativity of s • Cascade of p • Commuting s with p • Commutativity of |X| (and X) • Commuting s with |X| (or X) • Commuting p with |X| (or X)

General Transformation Rules for Relational Algebra 8. Commutativity of set operations U and ∩ 9. Associativity of |X|, X, U and ∩ 10. Commuting s with set operations 11. The p operation commuted with U 12. Converting a (s, X) sequence into |X|

Outline of a Heuristic Algebraic Optimization Algorithm • Use Rule 1 break up conjunctive s’s into cascades of s’s • Use Rules 2,4,6, 10 for commutativity of s to: • move s’s as far down tree as possible • Use Rules 5 and 9 for commutativity and associativity of binary operations to: • Place most restrictive s (and |X|) so executed first • fewest tuples, smallest absolute size or smallest selectivity • But make sure no cartesian products result

Outline of a Heuristic Algebraic Optimization Algorithm • Use Rule 12, combine Cartesian product with s to: • create |X| • Use Rules 3, 4, 7, 11 concerning cascade of p’s and commuting p with other ops to: • move down tree as far as possible • Identify subtrees that represent groups of operations than can be executed by a single algorithm

Summary of Heuristics • Apply first operations that reduce size of intermediate results • Perform s’s and p’s as early as possible (move down tree as far as possible) • Execute most restrictive s and |X| first (reorder leaf nodes but avoid cartesian products)

Multiple table joins • Multiple table joins • Query plan identifies most efficient ordering of joins • may not have to materialize actual table resulting from join • instead use pipelining - successive rows output from one step fed into next plan

Converting trees into Query plans • pipelined evaluation • Forward result from an operation directly to next operation • Result from s, placed in buffer • |X| consumes tuples from buffer • Result from |X| pipelined to p

Query Tree Question • Should we do a ppname, pnumber then spname = ‘Aquarius’ then ppnumber ? • No, since the operations are done together • the processor would read a row of project, see if pname = ‘Aquarius’ then use pnumber to perform the join.

Algorithms • DBMS has general access algorithms to implement select, join, or combinations of ops • Don't have separate access routines for each op • Creating temporary files is inefficient • Generate algorithms for combinations of operations • join, select, project - code is created dynamically to implement multiple operations

Materialized table • Think about what operations require utilizing a materialized table • Input to select? • Input to project? • Input to join? • How is this implemented? Next topic

Optimizing SQL Query Processing & Execution: Heuristic Algebraic Algorithms

Optimizing SQL Query Processing & Execution: Heuristic Algebraic Algorithms

Presentation Transcript

New Features of the Query/View/Reports System (QVR)

Displaying Selected Data with Queries

More on Rankings

QUERY OPTIMIZATION AND QUERY PROCESSING

Introduction Background Distributed DBMS Architecture Distributed Database Design

Distributed Databases

SPARQL

Adaptive Query Processing

CPS216: Advanced Database Systems Notes 06:Query Execution (Sort and Join operators)

6 . Distributed Query Optimization

Outline

Adaptive Query Processing with Eddies

Structured Query Language (SQL)

Lecture 7: Query Execution

CS 245: Database System Principles

Data Mining 2

5 Min self introduction

Chapter 20

INTRODUCTION TO PEOPLESOFT QUERY

What’s in This Module?