DM211 Query Processing Enhancements in Adaptive Server Enterprise 12.0

DM211Query Processing Enhancements in Adaptive Server Enterprise 12.0 • Michael C. Mamet • Systems Consultant / President • XLS, Inc. • mamet@worldnet.att.net

Merge Joins Join Transitive Closure Predicate Factoring and Transformation Like Optimization Presentation Map

The optimizer can now choose Merge Joins in addition to Nested Loop Joins Pre-12.0 releases support only Nested Loop Joins (NLJ) Referred to as Nested iteration in showplan Adaptive Server 12.0 may use a combination of Nested Loop and Merge Joins (MJ) for multi-table join queries The optimizer costs each join to determine the optimal join type (NLJ or MJ) Selection of the appropriate join type is greatly dependent on server and table configuration and the availability of statistics on the join columns Adaptive Server 12.0 introduces Merge Joins...

Merge Join processing: Two tables to be joined are opened for scanning The join columns of each table are compared row by row looking for matching values If a match is found, the results are “merged” into the result set If a duplicate key occurs, the scan must be repositioned so that the matching row can be reread to meet the join condition - This is called “rewinding” Sometimes referred to as Merge Scan Join or Sort Merge Join in database literature What is a Merge Join?

Nested Loop Joins result in a lot of disk I/O when there are many qualifying rows Inner rows are scanned repeatedly for each qualifying row in the outer table, resulting in extra i/o What are the benefits of Merge Joins over Nested Loop Joins? lineitem clustered on l_orderkey partsupp clustered on ps_partkey part clustered on p_partkey

Merge Join processing relies on ordered input on the joining columns Achieved by index access or sorting Merge Joins generally result in less I/O than Nested Loop Joins lineitem sorted on l_partkey partsupp clustered on ps_partkey part clustered on p_partkey

If the join column is already sorted due to an index, a separate sort step is avoided... Data is clustered when there is a clustered index on the join column of an APL table The leaf level is clustered when there is a non-clustered index (or a DOL clustered index) on the join column Clustered indexes on DOL tables are structurally non-clustered Access to clustered data reduces i/o The presence of a B-tree index provides sorted input data

The type of Merge Join chosen depends on the join keys and available indexes: Full Merge Join Left Merge Join Right Merge Join Sort Merge Join There are actually 8 Merge Joins possibilities since each variant can also be done in parallel Parallel Merge Joins generally do not require as many worker processes as Nested Loop Joins ASE 12.0 implements several Merge Join variants

Full Merge Join One step process MJ Table R Table S Scan the indexes on the join keys for both tables and merge the results

The tables to be joined have useful indexes on the join keys No sorting necessary Joined tables are merged by following the indexes The index guarantees that sorted data can be accessed by following the index leaf Full Merge Joins are only possible for the outermost pair of tables in the join order If the join order is {R,S,T,U}, only R and S can be joined by a Full Merge Join Full Merge Join details...

Left Merge Join Step 1 Step 2 LMJ Worktable Sort Table S Table R Worktable Create and populate the worktable Sort the worktable and merge with the outer (left) table

The inner table does not have a useful index on the join column The inner (right) table must be first sorted into a worktable A useful index with the necessary ordering from the left (outer) side is used to perform the merge join Like Full Merge Joins, Left Merge Joins are only possible for the outermost pair of tables in the join order Left Merge Join details...

Right Merge Join Step 1 Step 2 RMJ Worktable Sort Table R Worktable Table S Create and populate the worktable Sort the worktable and merge with the inner (right) table

The outer table does not have a useful index on the join column The outer (left) table must be first sorted into a worktable A useful index with the necessary ordering from the right (inner) side is used to perform the merge join Right Merge Join details...

Sort-Merge Join Step 1 Step 2 Step 3 Worktable1 Worktable2 SMJ Sort Sort Table R Table S Worktable1 Worktable2 Create and populate the worktables Sort the worktables and merge the results

Neither table has an index on the join column, or the optimizer thinks that it is cheaper to sort the data Sort-Merge Joins are chosen when a useful index is not available The base tables are read into separate worktables The worktables are sorted in tempdb The worktables are merged (not the base tables) Subsequent joins are to the worktables Sort-Merge Join details...

select l_orderkey, sum(l_extendedprice * (1 - l_discount)) as revenue, o_orderdate, o_shippriority from customer, orders, lineitem where c_mktsegment = 'BUILDING' and c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate < '1995-03-15' and l_shipdate > '1995-03-15' group by l_orderkey, o_orderdate, o_shippriority order by revenue desc, o_orderdate The optimizer can choose any combination of join types

QUERY PLAN FOR STATEMENT 1 (at line 2). Executed in parallel by coordinating process and 6 worker processes. STEP 1 The type of query is INSERT. The update mode is direct. Executed in parallel by coordinating process and 6 worker processes. FROM TABLE orders Nested iteration. Table Scan. showplan describes the join strategy

... Executed in parallel with a 6-way partition scan. ... TO TABLE Worktable2. Worktable2 created for sort merge join. STEP 2 The type of query is INSERT. The update mode is direct. Executed in parallel by coordinating process and 6 worker processes. If necessary, base table data is sorted into a worktable in tempdb

FROM TABLE customer Merge join (outer table). Parallel data merge using 6 worker processes. Using Clustered Index. Index : customer_custkey_clu Forward scan. Positioning by key. Keys are: c_custkey ASC Using I/O Size 16 Kbytes for data pages. With LRU Buffer Replacement Strategy for data pages. The base table is the outer when a Left Merge Join is chosen...

FROM TABLE Worktable2. Merge join (inner table). Table Scan. Forward scan. Positioning at start of table. Using I/O Size 2 Kbytes for data pages. With LRU Buffer Replacement Strategy for data pages. TO TABLE The results are sorted into worktable3 Worktable3. Worktable3 created for sort merge join. … And the worktable is the inner

STEP 3 The type of query is SELECT (into Worktable1). ... Executed in parallel by coordinating process and 6 worker processes. FROM TABLE Worktable3. Merge join (outer table). Parallel data merge using 6 worker processes. Table Scan. Forward scan. Positioning at start of table. Using I/O Size 2 Kbytes for data pages. The worktable is the outer when a Right Merge Join is chosen...

FROM TABLE lineitem Merge join (inner table). Using Clustered Index. Index : lineitem_lorder_clu Forward scan. Positioning by key. Keys are: l_orderkey ASC ... TO TABLE Worktable1. Parallel work table merge. … And the base table is the inner

The type of Merge Join is not explicitly stated in showplan, however it is easy to recognize When the join type is Full Merge Join, no worktables are created showplan describes the base tables as (inner table) and (outer table) When the join type is Right or Left Merge Join, one of the merged tables a worktable, the other is a base table When the join type is Sort Merge Join, both of the merged tables are worktables Notes on showplan for Merge Joins

Merge Joins are not selected for: Non-equi-join predicates (<, >, <=, >=, != ) Outer Joins Text/image/Java objects in a select list or where clause The inner block of a subquery Declarative Referential Constraints Multi-table updates/deletes or updateable cursors Left/Right/Sort Merge Joins cannot follow an existence join Merge Joins require an equi-join condition: R.c = S.c

ASE 12.0 offers the initial implementation of Merge Joins - with some limitations: The current implementation does not use the Pipe Manager Sorting for sort-merge joins is done in tempdb Make sure the optimizer picks the right plan - if not, there can be excessive I/O to tempdb tempdb requirements are greater - be prepared to extend tempdb if 1105 errors occur on the system segment Pipes are being implemented in a subsequent release The parent writes to the pipe (in memory) and the child reads from it (resulting in less I/O in tempdb) Limitations inherent in the current implementation

Merge Joins are implemented within the constraints of the current Query Processing Engine... As a result, useful ordering properties resulting from a sort-merge operation are not propagated up the join order The optimizer evaluates subsequent join conditions without taking into consideration that the optimal path may benefit from the ordered data from a previous sort step This is called peephole optimization Significant optimizer enhancements scheduled for a subsequent release address this limitation Peephole optimization

Server-level setting: sp_configure “enable sort-merge join and JTC”, 1 Only one sp_configure parameter to enable/disable both Merge Joins and Join Transitive Closure Session-level: set sort_merge {on | off} set jtc {on | off} More on JTC in a little while... Merge Joins are disabled by default, but can be enabled dynamically

Join type can be forced using Abstract Plans in ASE 12.0 To force a Nested Loop Join: ( nl_g_join ( ( derived_table1 ) ( derived_table2 ) … ( derived_tableN )) To force a Merge Join: ( m_g_join ( ( derived_table1 ) ( derived_table2 )) Abstract Plans allow you to “hard code” the query plan without altering the query text - assuming you think you can outsmart the optimizer Forcing a Merge Join using Abstract Plans

sp_configure “number of sort buffers” determines memory allocation for sorting Extremely important for sort-merge join performance The default configuration is only 500 sort buffers and is much too small for most applications Can be set as high as 32,767 buffers, possibly at the expense of other memory (depends on cache configuration) Try various settings to see what works best Can be configured dynamically to suit application needs: sp_configure “number of sort buffers”, 20000 Make sure to configure “number of sort buffers” appropriately!

The optimizer factors in the cost of internal vs. and external sorts If “number of sort buffers” >= the number of pages to be sorted, sorting is done in cache - this is an internal sort If “number of sort buffers” < the number of pages to be sorted, intermediate results are written to disk - this is an external sort Since there are not enough buffers available to perform the entire sorting operation in memory, intermediate results are written to disk The optimizer knows this is more expensive How do sort buffers affect the optimizer’s cost estimates?

F pages of input are read at a time, sorted, and a “merge run” of F pages is written to disk. This is repeated until there are no more input pages. A “merge tree” is created to merge the runs created phase 1 Two phases of an external sort 7 5 9 1 5 7 1 9 1 5 7 9

Assume there are 100 pages to be sorted with only 10 sort buffers available for sorting 10 merge runs of 10 pages each are written out in phase 1 Phase 2 merges the 10 merge runs Each page is read only twice Thus, if the number of pages to be sorted is P, the number of sort buffers necessary to process the table reading each page only twice is sqrt(P) An internal sort is always cheaper than external, but there are not always enough sort buffers available for large tables Example: External sort and number of sort buffers

Merge Joins are very appropriate for DSS where parallel queries provide maximum benefit Parallel Merge Join processing is different from Nested Loop Tables to be merged are horizontally partitioned by identical non-overlapping key ranges based on a distribution map Each worker process is assigned to a sorted range of values called a “partition” (not necessarily a partitioned table) # of partitions depends on availability of worker processes Once partitioning is complete, each worker process executes a “serial” merge join on the range assigned to it Parallel Merge Joins

Worker Process 1 Range 1 - 5 Worker Process 2 Range 6 -10 Worker Process 3 Range 11 - 15 Worker Process 4 Range 16 - 20 Assigning worker processes to partitions • Table R Table S 1 1 6 6 11 11 16 16 2 2 7 7 12 12 17 17 2 3 8 8 13 13 18 18 3 4 9 9 14 14 19 19 5 10 10 15 15 20

Merge runs generated in phase 1 of an external sort are horizontally partitioned based on the distribution map Parallel Merge Join requiring sorting Range 1-5 1 5 1 5 7 5 9 1 7 9 7 9 Range 6-10

The distribution map describes the workload (in pages) for each parallel worker process Contains n partitions, where n = number of worker processes Each partition represents the workload and boundary for a single worker process (one assigned to each range of values) The distribution map is generated based on the availability of distribution statistics (the histogram in sysstatistics) If the join columns have histograms, they are combined and the result is used to generate the distribution map A histogram is an attribute of a column not an index, so make sure to add histograms on non-indexed join columns The Distribution Map

Performance depends on each worker process being assigned an equal amount of work Most accurate distribution map is generated at compile-time, when distribution statistics are available on both of the joined columns If statistics are not available, the distribution map is generated at run-time Distribution map is based on the table with the most rows (heuristic) The distribution of data in the other table is ignored while distributing workload to the worker processes This leads to sub-optimal plans because the workload is not distributed evenly amongst the worker processes Compile vs. Run-time distribution map

Run-time distribution map is often sub-optimal Requires sampling, which is expensive and requires more i/o Relying on the table with the most rows is not usually accurate May require a run-time adjustment Use optdiag to check if there are histograms on the join columns - especially minor index attributes and non-indexed columns If not, make sure to generate histograms with update statistics update statistics table_name (column_name) using n values update index statistics table_name [index_name] using n values Avoid a run-time distribution map by generating statistics

Help the optimizer pick the right plan Make sure there are statistics on the join columns Add additional cells for a more accurate histogram Default is 20 cells - does not accurately represent data skew More cells means a more descriptive histogram, but may increase compile time and procedure cache requirements Useset table countwhen joining > 4 tables Configure a 16K I/O pool Make sure number of sort buffers is optimally configured May need to increase procedure cache if 701 errors occur Merge Join performance tips

Greater parallel degree does not always mean better performance Nested Loop Joins require the product of the parallel degree needed to access each table on a stand-alone basis Parallel degree for Merge Joins is based on the distribution map, not the optimal parallel degree to access each table Start by setting parallel degree = # of partitions in the table with the most partitions Settings less than this may result in serial Nested Loop Joins # of partitions should equal # of devices on target segment Increase parallel degree only if helpful or set it for each query Don’t over-configure parallel degree

Merge Joins Join Transitive Closure Predicate Factoring and Transformation Like Optimization Presentation Map

Join Transitive Closure (JTC) extends two equi-joins with a shared column to a third equi-join between “unshared” columns In other words, if A.a = B.b and B.b = C.c, intuitively A.a = C.c Only considered for equi-joins Additional joins are created for the optimizer to consider and added to those explicitly stated in the query A transitive join may result in a more efficient join order and plan A join order made available by a transitive join may or may not be chosen, but at least the optimizer can consider it What is Join Transitive Closure?

select * from t1, t2, t3 where t1.A = t2.B and t2.B = t3.C and t3.C = 1 By default, the only possible joins are {t1, t2, t3} and {t3, t2, t1} {t1, t3, t2} and {t3, t1, t2} are not possible Note, no relationship between t1.A and t3.C is explicitly stated When JTC is enabled in ASE 12.0, the optimizer knows that since t1.A = t2.B and t2.B = t3.C, then t1.A = t3.C Thus, {t1, t3, t2} is added to the join enumeration process where all join orders are evaluated and the cheapest is chosen All three tables can also limit their scans by the SARG t3.C = 1 JTC Example

Transitive Closure for SARGs: select p_partkey, l_quantity, p_type, l_shipdate from part, lineitem where l_partkey = p_partkey and p_partkey = 1 Transitive Closure for SARGs adds the following SARG: and l_partkey = 1 Transitive Closure for SARGs is beneficial because it allows the optimizer to consider an index on lineitem.l_partkey, if one exists JTC is not the same as Transitive Closure for SARGS (added in 11.X)

dbcc traceon(3604, 302) go select n_name nation, datepart(year,o_orderdate) year, sum(l_extendedprice * (1-l_discount) - ps_supplycost * l_quantity) sum_profit from part, supplier, lineitem, partsupp, orders, nation where s_suppkey = l_suppkey and ps_suppkey = l_suppkey and ps_partkey = l_partkey and p_partkey = l_partkey and o_orderkey = l_orderkey and s_nationkey = n_nationkey and p_name like '%green%' group by n_name, datepart(year,o_orderdate) order by n_name, datepart(year,o_orderdate) desc Enable the 302 trace flag to see if JTC is taking place

In this case, two transitive joins were discovered: Added transitive closure equijoin clause supplier.s_suppkey = partsupp.ps_suppkey Added transitive closure equijoin clause partsupp.ps_partkey = part.p_partkey ******************************* Beginning selection of qualifying indexes for table 'part', (remaining output deleted…) All transitive joins are printed at the top of the 302 output

QUERY PLAN FOR STATEMENT 1 (at line 2). ... STEP 1 ... FROM TABLE part Nested iteration. Table Scan. ... FROM TABLE partsupp Nested iteration. Index : partsupp_pkey showplan lists the tables in the order they are joined

Analysis of the sample query plan • Since partsupp follows part in the showplan output, it is clear that the second transitive join was chosen by the optimizer • As it turns out, the first transitive join was not used • The optimizer may discard a transitive join if it is not helpful • JTC can be disabled using set jtc off • When I disabled JTC, the join order changed because the transitive joins were not generated • The plan that was generated required several worktables which consumed a lot of tempdb (necessary to extend tempdb due to 1105s) • The query ran for much, much longer indicating that the optimal plan was achieved only with JTC enabled

JTC Internals • A transitivity matrix is generated internally listing the connected columns • Assume 1 = true and 0 = false for explicitly joined tables • Tables are not joined to themselves

DM211 Query Processing Enhancements in Adaptive Server Enterprise 12.0

DM211 Query Processing Enhancements in Adaptive Server Enterprise 12.0

Presentation Transcript

Session No. DM131 New Optimizer and Query Execution Options in Adaptive Server Enterprise 12.0

ASE103 Adaptive Server Enterprise Migration to Linux

Parallel Query Processing in SQL Server

Eddies: Continuously Adaptive Query Processing

Adaptive Query Processing: Progress and Challenges

Adaptive Query Processing

Adaptive Query Processing in the Looking Glass

Adaptive Query Processing with Eddies

Adaptive Query Processing for Data Aggregation:

IMA Release 12.0 Enhancements

Adaptive Query Processing

Adaptive Query Processing

Adaptive Query Processing

BI515: Query Processing in Adaptive Server IQ

Adaptive Query Processing

Eddies: Continuously Adaptive Query Processing

Eddies: Continuously Adaptive Query processing

E118 User Created Tempdbs in Adaptive Server Enterprise

Adaptive Server Enterprise: The e-Business Database

Adaptive Query Processing

Adaptive Query Processing: Progress and Challenges

Adaptive Query Processing in Data Stream Systems