The Query Compiler

The Query Compiler Prepared by : Ankit Patel (226)

References • H. Garcia-Molina, J. Ullman, and J. Widom, “Database System: The Complete Book,” second edition: p.897-913, Prentice Hall, New Jersey, 2008

Compilation of Queries • Compilation means turning a query into a physical query plan, which can be implemented by query engine. • Steps of query compilation : • Parsing • Semantic checking • Selection of the preferred logical query plan • Generating the best physical plan

The Parser • The first step of SQL query processing. • Generates a parse tree • Nodes in the parse tree corresponds to the SQL constructs • Similar to the compiler of a programming language

View Expansion • A very critical part of query compilation. • Expands the view references in the query tree to the actual view. • This introduces several opportunities to optimize the complete query..

Semantic Checking • Checks the semantics of a SQL query. • Examines a parse tree. • Checks : • Attributes • Relation names • Types • Resolves attribute references.

Conversion to a logical query plan • Converts a semantically parsed tree to a algebraic expression. • Conversion is straightforward but subqueries need to be optimized. • One approach is to introduce a two-argument selection that puts the subquery in the condition of the selection, and then apply appropriate transformations for the common special cases.

Algebraic transformation • Many different ways to transform a logical query plan to an actual plan using algebraic transformations. • The laws used for this transformation : • Commutative and associative laws • Laws involving selection • Pushing selection • Laws involving projection • Laws about joins and products • Laws involving duplicate eliminations • Laws involving grouping and aggregation

Estimating sizes of relations • True running time is taken into consideration when selecting the best logical plan. • Two factors the affects the most in estimating the sizes of relation : • Size of relations ( No. of tuples ) • No. of distinct values for each attribute of each relation • Histograms are used by some systems.

Cost based optimizing • Best physical query plan represents the least costly plan. • Factors that decide the cost of a query plan : • Order and grouping operations like joins,unions and intersections. • Nested loop and the hash loop joins used. • Scanning and sorting operations. • Storing intermediate results.

Histograms: Some system keep histograms of the values for a given attribute. This information can be used to obtain better estimates of intermediate relation sizes than the simple methods.

Plan enumeration strategies • Common approaches for searching the space for best physical plan . • Dynamic programming : Tabularizing the best plan for each sub expression • Selinger style programming : sort-order the results as a part of table • Greedy approaches : Making a series of locally optimal decisions • Branch-and-bound : Starts with enumerating the worst plans and reach the best plan

Left-Deep join trees • Left – Deep Join Trees are the binary trees with a single spine down the left edge and with leaves as right children. • This strategy reduces the number of plans to be considered for the best physical plan. • Restrict the search to Left – Deep Join Trees when picking a grouping and order for the join of several relations.

Physical Plans for Selection • Breaking a selection into an index-scan of relation, followed by a filter operation. • The filter then examines the tuples retrieved by the index-scan. • Allows only those to pass which meet the portions of selection condition.

Pipelining versus Materializing • An operator always consumes the result of other operator and is passed through the main memory. • This flow of data between the operators can be controlled to implement “ Pipelining “ . • The intermediate results should be removed from main memory to save space for other operators. • This techniques can implemented using “ materialization “ . • Both the pipelining and the materialization should be considered by the physical query plan generator.

The Query Compiler