330 likes | 489 Vues
BI515: Query Processing in Adaptive Server IQ. Steven Kirk Architect ASIQ Query Engine Group stevek@sybase.com. IQ Query Engine Differences from traditional database engines Differences from traditional query engines Query engine architecture Query optimizer IQ Query Plans
E N D
BI515:Query Processing in Adaptive Server IQ • Steven Kirk • Architect • ASIQ Query Engine Group • stevek@sybase.com
IQ Query Engine Differences from traditional database engines Differences from traditional query engines Query engine architecture Query optimizer IQ Query Plans How to get query plans, and how to read them IQ Query Performance IQ Query Processing Topics
Different Goals than Traditional Database Engines Designed from the ground up to maximize the performance of many users running ad-hoc queries Prioritized IQ Design Goals: Query Performance - very high priority Bulk Update Speed - high priority Small Update Speed - very low priority IQ Query Engine
Different Goals Yeild a Different Design than Traditional Databases Column-oriented storage instead of row-oriented IQ reads only the data for columns referenced Enables efficient data compression Most data pages compressed when written In practice, fully indexed databases are usually 60% to 120% of the raw ASCII data size Single row inserts and deletes are painfully slow IQ Query Engine
Different Design than Traditional Databases (cont.) Column-oriented storage instead of row-oriented Unique row identifiers for the life of a row Bit-map representation for sets of row identifiers Multiple patented bit-mapped index types Assumes majority of columns will have multiple indexes defined IQ Query Engine
Differences from Traditional Query Engines Effects of most predicates local to a table can be combined with each other before any tuples are actually projected from the table. Numerous query operators are available directly within the indexes, including: Grouping and distincting Inter-column comparisons Hash containment IQ Query Engine
Similarities with Traditional Query Engines Once IQ has completed all the work it can in the indexes for each table, tuples are projected and then joined or grouped in the same ways as in other query engines Whenever multiple access paths (or operator algorithms) are available, then a cost-based decision must be made, often based on predicate selectivity estimates IQ Query Engine
IQ Query Engine Architecture IQ Server Front End Shared with ASAnywhere Handles TDS and ODBC Parses Incoming Statements Text Substitution Transformations Cross-DB Decomposition (CIS) Security Checking Java Support Stored Procedures IQ Optimizer Predicate Inference Predicate Selectivity Estimation Join Optimization Grouping Algorithm Selection Subquery Optimization Index Access Selection IQ Run-Time Engine Prefetch Manager Predicate Execution Tuple Projection Join Execution Grouping Execution Sorting Subquery Execution
Predicate Inference Inference of additional predicates via transitive closure Inference of additional local predicates from complex multi-table predicates Elimination of any redundant predicates after join order has been decided IQ Query Optimizer
Predicate Selectivity Estimations Because so many columns are indexed, more predicates have accurate selectivity estimates If no complex indexes are available, or if the predicate expression is too complicated, then we use guesses based on the type of predicate Users can supply their own selectivity estimates thereby overriding the optimizer’s estimates IQ Query Optimizer
Join Optimization Applicable join indexes are considered All possible bushy join plans that avoid cartesian product joins are considered Evaluated join sub-plans are stored in a hash table Alpha-pruning is used to limit the search space IQ Query Optimizer
Join Optimization (cont.) Join Algorithm Selection from among Classic nested-loop Sort-merge Classic hash Nested-loop push-down Hash push-down semi-join Sort-merge push-down semi-join (new in 12.4.2) IQ Query Optimizer
Grouping Algorithm Selection Three different algorithms are compared for each GROUP BY or DISTINCT operator Sort-based Classic hash-based Index-based IQ Query Optimizer
Subquery Optimization Some EXISTS subqueries are flattened into distinct joins Results of uncorrelated subqueries are used directly in the indexes of tables in the parent query blocks Correlated subqueries Are migrated to minimize subquery re-executions Have a 1-value result cache, and outer references are ordered to further minimize re-executions Are optimized to maximize invariant operators IQ Query Optimizer
Index Access Selection The optimizer does a cost-based selection of what order to execute predicates and which index to use for each predicate, considering: Predicate type Indexes available on that column (or columns) Estimated selectivity of this predicate Estimated number of rows passing earlier predicates Whether invariant predicate results can be cached IQ Query Optimizer
IQ uses a Volcano style data-flow query engine Queries are transformed into an upside-down tree of operator nodes, each of which can accept tuples from any nodes immediately below it, and can pass tuples up to the node above it Each node transforms the tuple stream in some way, such as filtering some tuples out, unioning or joining multiple tuple streams together, etc. A ‘Query Plan’ is the textual display of a query tree IQ Query Plans
GROUP BY JOIN LEAF Table: partsupp LEAF Table: supplier IQ Query Plans The Query Tree: The Query: select s_suppkey, count(ps_partkey) from partsupp, supplier where ps_suppkey = s_suppkey and ps_availqty > 100 group by s_suppkey
GROUP BY JOIN LEAF Table: partsupp LEAF Table: supplier IQ Query Plans The Query Tree: The Query Plan: Query Plan: 1 #04: Group By (Hash) Child Node 1: #03 Query_Name: techwave_example_1 Output Vector: 2 entries (12 data bytes) 2 . #03: Join (Hash) Left Child Node: #01 Right Child Node: #02 Valid Join Algorithms: NLJ, NLPDJ, SMJ, SMPDJ, HJ, HPDJ Left Input Table 1: partsupp Right Input Table 1: supplier Join Result Constraint: Many to 1 Condition 1: (partsupp.ps_suppkey = supplier.s_suppkey) Output Vector: 2 entries (8 data bytes) 3 . . #01: Vertical Cursor Table Name: partsupp Table Row Count: 8000 Condition 1 (Invariant): (partsupp.ps_availqty > 100) Condition 1 Selectivity: 0.98951089 Output Vector: 2 entries (8 data bytes) 3 . . #02: Vertical Cursor Table Name: supplier Table Row Count: 100 Output Vector: 1 entries (4 data bytes)
GROUP BY JOIN LEAF Table: partsupp LEAF Table: supplier IQ Query Plans The Query Tree: The Query Plan: Query Plan: 1 #04: Group By (Hash) Child Node 1: #03 Query_Name: techwave_example_1 Output Vector: 2 entries (12 data bytes) 2 . #03: Join (Hash) Left Child Node: #01 Right Child Node: #02 Valid Join Algorithms: NLJ, NLPDJ, SMJ, SMPDJ, HJ, HPDJ Left Input Table 1: partsupp Right Input Table 1: supplier Join Result Constraint: Many to 1 Condition 1: (partsupp.ps_suppkey = supplier.s_suppkey) Output Vector: 2 entries (8 data bytes) 3 . . #01: Vertical Cursor Table Name: partsupp Table Row Count: 8000 Condition 1 (Invariant): (partsupp.ps_availqty > 100) Condition 1 Selectivity: 0.98951089 Output Vector: 2 entries (8 data bytes) 3 . . #02: Vertical Cursor Table Name: supplier Table Row Count: 100 Output Vector: 1 entries (4 data bytes)
Query plans are enabled by setting an option Query plans appear in the server log (.iqmsg) file All user’s query plans plus all other server messages are interleaved in the server log file All plan lines in the server log are prefixed like: 2000-06-20 20:45:23 0001 [20535]: Prefix is composed of: date, time, connection id, message number for query plan lines ‘[20535]’ IQ Query Plans set temporary option Query_Plan = ‘on’
Other useful options with query plans: Query_Name - adds one line to the query plan with the name string supplied; makes it much easier to find your query plan in the midst of the server log file Query_Detail - adds lots of extra info to the query plan that is occasionally useful, like the data types, distinct counts, and indexes available on each column used NoExec - generates a plan without actually executing Query_Plan_After_Run - delays query plan printing so that actual rows projected, etc. can be included in plans IQ Query Plans
Leaf Node: one table or one join index IQ Query Plan Elements 3 . . #01: Vertical Cursor Table Name: partsupp Table Row Count: 8000 Condition 1 (Invariant): (partsupp.ps_availqty > 100) Condition 1 Selectivity: 0.98951089 Output Vector: 2 entries (8 data bytes)
Leaf Node with Query_Plan_After_Run IQ Query Plan Elements 3 . . #01: Vertical Cursor Generated Result Rows: 7916 Table Name: partsupp Table Row Count: 8000 Condition 1 (Invariant): (partsupp.ps_availqty > 100) Condition 1 Selectivity: 0.98951089 Output Vector: 2 entries (8 data bytes)
Leaf Node with Query_Detail IQ Query Plan Elements 3 . . #01: Vertical Cursor Table Name: partsupp Table Row Count: 8000 Condition 1 (Invariant): (partsupp.ps_availqty > 100) Condition 1 Selectivity: 0.98951089 Condition 1 Index: HNG ps_availqty_hng Output Vector: 2 entries (8 data bytes) Output 1: partsupp.ps_partkey Output 1 Data Type: INTEGER (10, 0) Output 1 Base Distincts: 2000 Output 1 Indexes: FP, HG Output 2: partsupp.ps_suppkey Output 2 Data Type: INTEGER (10, 0) Output 2 Base Distincts: 100 Output 2 Indexes: FP, HG Column 1: partsupp.ps_availqty Column 1 Data Type: INTEGER (10, 0) Column 1 Base Distincts: 5497 Column 1 Indexes: FP, HNG, HG Index Output 1 (Projected) : partsupp.ps_partkey Index Output 2 (Projected) : partsupp.ps_suppkey
IQ Query Plan Elements • Join Node 2 . #03: Join (Hash) Left Child Node: #01 Right Child Node: #02 Valid Join Algorithms: NLJ, NLPDJ, SMJ, SMPDJ, HJ, HPDJ Left Input Table 1: partsupp Right Input Table 1: supplier Join Result Constraint: Many to 1 Condition 1: (partsupp.ps_suppkey = supplier.s_suppkey) Output Vector: 2 entries (8 data bytes)
IQ Query Plan Elements • Join Node with Query_Plan_After_Run 2 . #03: Join (Hash) Left Child Node: #01 Right Child Node: #02 Generated Result Rows: 7916 Valid Join Algorithms: NLJ, NLPDJ, SMJ, SMPDJ, HJ, HPDJ Left Input Table 1: partsupp Right Input Table 1: supplier Join Result Constraint: Many to 1 Condition 1: (partsupp.ps_suppkey = supplier.s_suppkey) Hash actual number of keys: 100 Output Vector: 2 entries (8 data bytes)
IQ Query Plan Elements • Join Node with Query_Detail 2 . #03: Join (Hash) Left Child Node: #01 Right Child Node: #02 Valid Join Algorithms: NLJ, NLPDJ, SMJ, SMPDJ, HJ, HPDJ Left Input Table 1: partsupp Right Input Table 1: supplier Join Result Constraint: Many to 1 Condition 1: (partsupp.ps_suppkey = supplier.s_suppkey) Output Vector: 2 entries (8 data bytes) Output 1: supplier.s_suppkey Output 2: partsupp.ps_partkey
IQ Query Plan Elements • Join Node: Hash Push-Down 2 . #03: Join (Hash PushDown) Left Child Node: #01 Right Child Node: #02 Valid Join Algorithms: NLJ, NLPDJ, SMJ, SMPDJ, HJ, HPDJ Left Input Table 1: partsupp Right Input Table 1: supplier Join Result Constraint: Many to 1 Condition 1: (partsupp.ps_suppkey = supplier.s_suppkey) Condition 2 (Pushed): (partsupp.ps_suppkey IN BV(0)) Output Vector: 2 entries (8 data bytes)
IQ Query Plan Elements • Join Node: Sort-Merge Push-Down 2 . #03: Join (Sort-Merge PushDown) Left Child Node: #05 Right Child Node: #06 Valid Join Algorithms: NLJ, NLPDJ, SMJ, SMPDJ, HJ, HPDJ Left Input Table 1: partsupp Right Input Table 1: supplier Join Result Constraint: Many to 1 Condition 1: (partsupp.ps_suppkey = supplier.s_suppkey) Condition 2 (Pushed): (partsupp.ps_suppkey PROBABLY_IN BV(0)) Output Vector: 2 entries (8 data bytes)
IQ Query Plans • Other query plan node types: • Group By - grouping operator seen earlier • Order By - sort operator used by sort-merge joins, subquery outer reference ordering, and sort-based grouping, as well as the ORDER BY clause • Vertical Grouping Cursor - leaf node where grouping is being done in the indexes • Root - only appears when Query_Detail is on; site of internal data type to external data type conversions
IQ Query Plans • Other query plan node types (cont.): • Scrolling Cursor Store - results cache that appears at the top of a query tree to support scrolling cursor access (see the Force_No_Scroll_Cursors option description) • Store - results cache for invariant portions of correlated subqueries • Subquery - executes correlated subqueries, or uncorrelated subqueries that could not be used directly with the parent query blocks indexes • Filter - site of non-pushable non-join predicates
IQ Query Performance: Hints • Use a sensible schema design • Follow the recommendations in the manual for where to create indexes, and what types to create, including HG indexes on all join columns • Define the IQ UNIQUE attribute on all columns with modest expected distinct value counts (< 10000) • Do not use tiny page sizes. For big benchmarks on 1Gb memory machines we often use 128K pages • RTFM and RTFRN
IQ Query Performance: Experimentation • The IQ optimizer usually does a far better job of creating a good query plan than we can, but there will be times when you have a specific query that just runs too slow (IYHO). Things you can try: • If there is any local predicate where the estimated selectivity is way off, supply the actual selectivity. • You can often influence the choice of algorithms by experimenting with the options: Join_Preference, Aggregation_Preference, and Index_Preference • You can force the optimizer to accept your specified join order exactly by setting Join_Optimization off.