Query Tree Disassembly

9 Query Tree Disassembly

Why are Query Plans Important? • Query plans describe every important detail about • how a query was executed, or will be executed. • You suspect (or know) a query runs poorly? • Query plans can help you investigate the issue • You’re not sure you have the best indexes for a query? • Query plans tell you what indexes are used in a query

Query Execution Phases in IQ • Upon submission of a query: • Syntax and Permissions are Checked • Performed by the ASA front-end • Query is Parsed • Broken down into optimizable objects • Query is Optimized • Most efficient execution method that fits within available resource constraints is determined • Query Plan is created • Query is executed • Resources are cleaned up

Sybase IQ Query Execution Server Front End Shared with ASAnywhere (ASA) Handles Connections Parses Incoming Statements Cross-DB Decomposition (CIS) Security Checking Java Support Stored Procedures Optimizer Predicate Inference Predicate Selectivity Estimation Join Optimization Grouping Algorithm Selection Subquery Optimization Index Access Selection Run-Time Engine Prefetch Manager Predicate Execution Tuple (Row) Projection Join Execution Grouping Execution Sorting Subquery Execution

How do I See a Query Plan? • Query Plans are Generated - • In the IQ Message File • As HTML Pages (if requested) in a separate file • Set Option Commands used with Query Plans • Usually set as Temporary Options • Example: Set Temporary Option Query_Detail = ‘On’; • In 12.5 and later, Query_Plan option is ‘ON’ by default • Query Plan appears in the IQ Message File • You may turn this option off, if desired

Database Options for Query Plans • Query_Plan = ‘On’ (default ‘On’) • Writes a basic query plan in the IQ Message File • May not provide enough detail • Query_Plan_As_HTML = ‘On’ (default ‘Off’) • Creates a basic query plan as an external HTML file • Query_Detail = ‘On’ (default ‘Off’) • Adds more detailed information into query plans (Recommended) • Has no effect unless used with Query_Plan or Query_Plan_As_HTML

More Database Options for Query Plans • Query_Plan_After_Run = ‘On’ (default ‘Off’) • Delays creating the query plan until query completes • Query_Timing = ‘On’ (default ‘Off’) • Includes execution times at each stage of the query • Must be used with Query_Plan_After_Run • Query_Name = ‘query_name’ (default ‘’) • Prints name provided in the query plan, and as part of the file name for HTML query plans • NoExec = ‘On’ (default ‘Off’) • Creates query plan but does not execute the query

HTML Query Plans • Since HTML plans are easier to read these will be used • in this presentation • Query_Detail will also be set ‘On’ for all examples • When you specify Query_Plan_As_HTML = ‘On’ • File will be written in the directory with the Database File (.db) unless the Query_Plan_As_HTML_Directory option is set • The File name that is produced - • Prefaced with User Name, Date and Time • Also contains the “Query_Name” if specified • Has a “.html” file extension and can be opened with a browser

Simple Example of HTML Query Plan File • Set temporary option Query_Name =‘stanton’; • Select count(*) from Central_Fact_Table; • HTML file generated in Database File Directory

Query Plan (HTML) Output – Two Parts 1) Query Tree 2) Query Detail

HTML Query Plan – The Query Tree Query_Name Estimated RowsMoving Up the Tree Node Type and Number The Node Number (underlined) is a hot linkto the Nodes below in the Query Detail

Query Tree and Nodes • Query Tree is a visual representation of the Query Plan • Consists of Nodes representing execution plan steps • Nodes are numbered sequentially and identified by type • Some optimizations can eliminate nodes, so there may be gaps in the Node number sequence • Query Tree also includes the estimated number of rows flowing up each link between nodes (or the actual number of rows flowing if Query_Plan_After_Run was ‘On’) • The Query Tree is displayed inverted • In detailed plans, the top node in the Query Tree is the Root Node • Leaf Nodes are typically at the bottom edges of the tree

Leaf Nodes A More Complex Query Tree

Node Detail - General • Header Shows the Node Number and Node Type • Number is a hot link back to this node within the Query Tree • Child Node(s) (if any) • Node(s) that ‘feed’ this node • Hot link to a Child Node (Leaf Nodes rarely have children) • Estimated Result Rows • Number of rows the optimizer thinkswill come out of this Node • Actual Result Rows are shown withQuery_Plan_After_Run = ‘On’

Root Scrolling Cursor Store Filter Group By Sort Hash Order By Store Semi-join Filter Filler Leaf Join Hash (HJ) Hash Pushdown (HJPD) Nested Loop (NL) Nested Loop Pushdown (NLPD) Sort-Merge (SM) Sort-Merge Pushdown (SMPD) Cartesian Subquery Union All Some Node Types (there are others)

Node Details • Some Nodes Contain a Wealth of Information • Detail in other nodes is not as important • Nodes to Concentrate On - • Root • Leaf • Join • Group By • Filter

Root Node (Query_Detail = ‘ON’) • Node Detail • Child Node • Query Name (if any) • User Name (login) • Temp Space Used • Num of Users in Server • Num of CPUs • Database Options Set • Outputs from this node

Examining the Root Node • Effective Number of Users • This is the count of active cursors, but one user can have multiple cursors open simultaneously • Many active users will have an effect on query response • Optimizer may choose a different Query Plan depending upon the number of CPUs, the available IQ cache space, and the effective number of users • Number of CPUs • This is the number of CPUs that IQ believes are available • Because of hyper-threaded CPUs, or the -iqncpus server startup option, this number may not match the actual number of CPUs on the system

Examining the Root Node - 2 • Temp Space Usage • This estimate is the total over the life of the query • The maximum at any point in time may be much lower • Is this estimated usage greater than the Temp Cache Size? • It may indicate Temp Cache is paging to disk • Database Options • A user or DBA may have set options influencing joins, aggregation, or other aspects of optimizer and server behavior • Any options set to non-default values which may affect query performance or behavior are displayed with their current value

Leaf Nodes • A Leaf Node typically represents access to a Table in IQ • Several Types of Leaf Nodes: • (Regular) Leaf • Aggregation Leaf • Grouping Leaf • Distincting Leaf • Ordered Leaf • All run-time access to indexes happen in Leaf Nodes

Leaf Nodes - 2 • All Leaf Nodes have a wealth of information • Table Row Count • Condition(s) - WHERE clause search predicates local to this table • Estimated Selectivity of predicates • Usefulness of the predicate • Index used by the predicate • Columns from this table referenced by this query with: • Data type • Indexes available • Count of NULLs in the column, if any • Count (or estimate) of distinct values in the column • Estimate of distinct value count after all conditions evaluated

Leaf Nodes - 3 • The portion of a table projected by a query • is referred to as the ‘found set’ • The optimizer knows that Table ‘A’ has <N> rows • If there are no predicates on columns in this table, then we are done • We just need the column(s) to be projected from this table • Tuples (Rows) are passed up the Query Tree to the next Node • The found set will be all <N> rows in the table

Leaf Nodes - 4 • The portion of a table projected by a query • is referred to as the ‘found set’ • If there are predicates on this table, then the found set will be the set of rows that satisfy all of those predicates • In this case, the optimizer must determine: • 1) What portion of the table will satisfy each predicate (Selectivity) • 2) Collective effect of all the predicates (Estimated Result Rows) • 3) The best order to execute the predicates (Usefulness) • 4) What effect these predicates will have, both individually and collectively, on the estimate of the number of distinct values for each column projected after all the conditions evaluated

Leaf Nodes - 5 - Selectivity • When NO ‘usable’ index exists on a column in a predicate, • then the optimizer estimates the selectivity based entirely • on the type of predicate (partial list): • Percent of tablePredicate Type (Estimated) Selectivity Equality (=) 20% 0.2000000 Open Range (>) 40% 0.4000000 Between 40% 0.4000000 Like (%) 20% 0.2000000 Inter-column equality (t.a = t.b) 30% 0.3000000 Inter-column comparison (t.a < t.b) 50% 0.5000000 • (You will see these Selectivity estimates in Query Plans)

Leaf Nodes - 6 - Index-based Selectivity • If a ‘useable’ index does exist on a column in a predicate • it may enable more accurate selectivity information • Depends on the type of predicate and the index type(s) • May also depend on the number of values in the column which satisfy that predicate • If an index was usable, the Selectivity will usually look like a precise number without all those trailing zeros: 0.34982376 • Beware: functions used on a column may negate the ability to use an index for execution or for selectivity estimation! • SUBSTRING( t.a, 5, 5 ) = ‘homes’ • May result in a Column Scan • The optimizer will use the default rule for rows returned (20% of the rows for an Equality search)

Leaf Nodes - 7 - Usefulness • Usefulness is used to rank the predicates for execution order • Predicate with the highest value (score) is executed first • Usefulness values run from 0.0 to 10.0 (10.0 is the most useful) • Remaining predicates executed in descending order of Usefulness • What determines a predicate’s Usefulness? • How far the predicate will reduce the found set (Selectivity) • Type of predicate • Index(es) available on the column • Optimized FP indexes on the column • How fast the predicate can be executed • How many resources the predicate execution will require

Leaf Nodes - 8 - Example Query • Single table (147 million rows) - 5 Predicates • What is the best index to use to execute each of them ? • What is the best order to execute them ? • SELECT PAID_AMT • FROM CENTRAL_FACT_TABLE • WHERE ELIG_BEG_DATE >= '2000-01-01' • AND MEMBER_GENDER = 'M' • AND MEMBER_DOB < '1950-01-01' • AND COMPLICATION = 'OUTPATIENT' • AND SERVICE_ROLLUP = 'Neoplasms'

Leaf Nodes - 9 - Predicate Types • Equality and Inequality ( = , != ) • Includes IN and NOT IN lists • Ranges ( <, >, <=, >=, Between) • Including NOT (!) • Like ( % ) or Not Like • Contains • IS [NOT] NULL

Leaf Nodes - 10 - Predicate and Index • Equality and Inequality ( = , != ) • Includes IN and NOT IN lists • Ranges ( <, >, <=, >=, Between) • Like ( % ) or Not Like • Contains • IS [NOT] NULL LF and HG DATE, HNG, LF and HG FP and WD Null Bit

Leaf Nodes - 11 - HG and LF Indexes • These indexes provide exact meta-data to the optimizer: • The number of distinct values for a column • The number of rows for each value • They also provide fast run-time access to the set of cells that contain a particular value

Leaf Nodes - 12 - HG and LF Indexes • When columns with these indexes are used in certain predicates they will provide the optimizer with the exact number of rows that will satisfy the predicate • Equality, inequality, IN, and NOT IN predicates • Range predicates (if the range does not contain too many values) • If a range predicate is too wide to get an exact row count, the LF or HG index will still provide the exact count of distinct values within the range • See the option, HG_SEARCH_RANGE,to change ‘too wide’

Leaf Nodes - 13 - FP(1) and FP(2) • These optimized FP indexes can also provide exact • meta-data to the optimizer: • The number of distinct values for a column • The number of rows for each value • They do not provide fast run-time access to the set of cells • that contain a particular value, as the HG or LF indexes do. • They can dramatically reduce the storage required for a • column, and thereby reduce I/O costs for projection.

Leaf Nodes - 14 - FP(1) and FP(2) • For predicates involving expressions on columns, or for • predicates where index except an FP can be used, these • optimized FP indexes can: • Provide an exact selectivity to the optimizer • Greatly accelerate the execution of such predicates • Examples of predicates affected: • SUBSTR(T.X, 1, 2) = 'TE’ • T.NAME LIKE ‘%B%’

Leaf Nodes - 15 - Minimize_Storage Option • New in 12.5 was a database option, MINIMIZE_STORAGE, • to auto-create optimized FP indexes for all columns, without • requiring IQ UNIQUE be declared on each column. • Late addition to 12.5 Release • Option description did not make first printing of any 12.5 IQ Docs • Look in the Release Bulletin for 12.5 for details • When set ‘ON’, MINIMIZE_STORAGE: • Equivalent to IQ UNIQUE(255) for all columns in tables you create • As data is loaded an FP(1) index will ‘roll over’ first to an FP(2) and then to a flat (unoptimized) FP index as necessary • Highly recommended for all tables < 1000 columns

Leaf Nodes - 16 - FP(1) and FP(2) • NOTE: In 12.6 and later, the optimized FP indexes will be • used to accelerate the evaluation of single column • expressions, even if they are not used in predicates. • See the 12.6 documentation on the option, • FPL_Expression_Memory_KB,for details. • Examples of expressions affected: • SELECT SUBSTR(T.X, 1, 2), (1.0 - T.DISCOUNT)

Leaf Nodes - 17 - Other Indexes • Except for special cases, the other indexes can be used in predicates, but: • They will not provide exact counts of rows to the optimizer • Optimizer will use the default selectivity (discussed earlier) • Later you will see how to help the optimizer avoid bad estimates • Some of the special cases are: • CONTAINS predicate with a WD index • DATEPART function predicate on a column with a DATE index

Leaf Nodes - 18 - Query Example • Back to our example query … • SELECT PAID_AMT • FROM CENTRAL_FACT_TABLE • WHERE ELIG_BEG_DATE >= '2000-01-01' • AND MEMBER_GENDER = 'M' • AND MEMBER_DOB < '1950-01-01' • AND COMPLICATION = 'OUTPATIENT' • AND SERVICE_ROLLUP = 'Neoplasms'

Leaf Nodes - 19 - Example Leaf Node This is a largeLeaf Node ! (about 2 pages) We will examine it in parts

Estimated Result Set Original Table Size Leaf Nodes - 20 - Example Leaf Node: Row Info (Information here is hidden for clarity)

0.199 % of the table is returned That is very useful!! Leaf Nodes - 21 - Example Leaf Node: Predicate Info • Each predicate for the table has been analyzed: • If an HG, LF, FP(1), orFP(2) index exists it provides Selectivity statistics (number rows returned as a Percentage of the Table) • Usefulness was assigned based partly on the selectivity • The smaller the selectivity, the more useful the predicate

Leaf Nodes - 22 - Example Leaf Node: Predicate Info Selectivity Usefulness Index Used

Leaf Nodes - 23 - Example Predicates • Predicates ranked by Usefulness: • Predicate • ========================================== • SERVICE_ROLLUP = 'Neoplasms' .001 9.99 • MEMBER_GENDER = 'M' .389 9.61 • ELIG_BEG_DATE >= '2000-01-01' .764 6.23 • MEMBER_DOB < '1950-01-01' .389 5.61 • COMPLICATION = 'OUTPATIENT' .200 2.80 Selectivity Usefulness This looks like a ‘Guess’ – 20% for an Equality

Leaf Nodes - 24 - Example Estimated Result Rows • In 12.5 and earlier, the ‘Estimated Result Rows’ from this table • is the product of the Selectivity estimates from each predicate • This implies that all predicates are assumed to be uncorrelated • Condition 1: 149,178,536 * 0.001995 = 297,587 rows • Condition 2: 297,587 * 0.389099 = 115,791 rows • Condition 3: 115,791 * 0.764858 = 88,564 rows • Condition 4: 88,564 * 0.389793 = 34,521 rows • Condition 5: 34,521 * 0.200000 = 6,904 rows

Leaf Nodes - 25 - Example Estimated Result Rows • NOTE: In 12.6 and later, the ‘Estimated Result Rows’ from this table • will be the actual number of rows that satisfy the entire set • of predicates, because all simple predicates will be executed • early in the optimization process. • For most predicates, there will no longer need to be an • assumption that predicates are statistically uncorrelated.

Leaf Nodes - 26 - Column Information • There is more detail on each column referenced • This shows how much IQ knows about the column’s data • 8578 distinct values in column • 1096 values satisfy the >= condition • Column stored as FP(2) • HG and Date Index on column elig_beg_date >= '2000-01-01'

Leaf Nodes - 27 - Back to Our Query Example • Was the Optimizer Estimate correct? • Use Query_Plan_After_Run option • Delays the printing of the Query Plan until Query Completes • New details appear in the Query Plan After Run: • Estimated and Actual Row Counts at each Node • Estimated and Actual Temp Space Used • For predicates with multiple algorithms available, the algorithm actually used

Leaf Nodes - 28 - Example - Query Plan After Run • Row Estimate • Off by a factor or 72.6 • Temp Space Usage • Just slightly below actual • Questions … • Why is row estimate inaccurate? • Is this a problem?

Row Estimates • Several Reasons for Wrong Estimates • Method used to derive estimate is statistical, not perfect • Assumes even distribution of data • Assumes predicates are not correlated • One Predicate was a Guess (no useable index on that column) • Importance of HG, LF, FP(1), and FP(2) indexes cannot be stressed too highly • Inaccurate Estimates May NOT be a Problem • Only a problem if those inaccurate estimates cause the optimizer to select a poorly performing query plan • Nodes using HASH tables are sometimes sensitive to row counts • Can be negative performance implications for underestimates • Sort Nodes are not sensitive to inaccurate estimates

Leaf Nodes - 29 - Corrective Action and Results • Created LF Index on the COMPLICATION column for this 149 million row table • Took 110 Seconds (Sun 64 w/ 900 MHZ cpu) • New index is more Selective (and Useful) • Estimates are now much closer to reality • This is about as close as we can get in 12.5

Leaf Nodes - 30 - Query Plan After Index Added

Query Tree Disassembly

Query Tree Disassembly

Presentation Transcript

PC assembly / disassembly workshop

Understanding Simulators Using Tree-Query Language

Product Disassembly

Remanufacturing, disassembly…..

Query Tree Disassembly

Mining Tree-Query Associations in a Graph

Active Disassembly

Mining for Tree-Query Associations in a Graph

Product Disassembly

Engine Disassembly

ideapad U350 Hardware Disassembly

Disassembly

Assembly / Disassembly Machines

A Query-Based Routing Tree in Sensor Networks

No Disassembly Required!

Spatial Query Processing using the R-tree

Product Disassembly

Code Disassembly Techniques

Understanding Simulators Using Tree-Query Language