When Good Optimizers Make Bad Choices (Sometimes)

Overview Introduction – the Nature of the Game The CBO versus the Human “Player” Choices Unavailable to the CBO Objections to Human Tuning, Addressed

Introduction – The Nature of the Game Think of tuning as a game (“Deep Blue” versus Kasparov?): What advantages does the CBO have, versus a human “player” (tuner)? Ability to crunch lots of numbers, well, to explore many options. What disadvantages does the CBO have? Very little time to optimize. Limited information. Fewer options (different, more-limiting rules than apply to a human player). Less strategic sophistication than a good human tuner.

Reasons the CBO Misses Good Choices CBO is not allowed to execute queries to test alternative plans, or to learn more about selectivities than its best guess from statistics (optimizer_dynamic_sampling offers an exception to this rule, though!) CBO searches a tiny fraction of all possible plans, for truly complex SQL, and applies a limited strategy to prune the search. CBO cost function imperfectly predicts the relative costs of alternative plans. CBO lacks insight into the freedom to change corner-case behavior.

Unfair Advantages Humans Enjoy CBO is not allowed to execute queries to test alternative plans, or to learn more about selectivities than its best guess from statistics: While the CBO must try to optimize every query, the human can focus on a couple of dozen queries. We can test alternatives as long as necessary and only choose alternatives that we prove to be faster. With proven, tested performance of a chosen alternative, a human need never lose (although we may “tie” with the CBO).

Unfair Advantages Humans Enjoy CBO searches a tiny fraction of all possible plans, for truly complex SQL, and applies a limited strategy to prune the search: An informed human understanding of the tuning problem is much deeper, enabling more sophisticated strategies to locate the optimum, or a near-optimum, in a huge search space. Kasparov can (sometimes) beat even the brute force of Deep Blue with a superior understanding of strategy, and SQL optimization, so far, favors humans more than chess does.

Unfair Advantages Humans Enjoy CBO cost function imperfectly predicts the relative costs of alternative plans. The cost function really has two parts: It predicts how many index entries, table rows, and blocks of every object the execution plan will reach during execution – this is the logical cost.

Unfair Advantages Humans Enjoy CBO cost function imperfectly predicts the relative costs of alternative plans. The cost function really has two parts: From the logical cost, it must estimate a physical cost, which is mostly just the runtime, but weighted, potentially, in favor of plans that avoid bottlenecks that harm other processes.

Unfair Advantages Humans Enjoy CBO cost function imperfectly predicts the relative costs of alternative plans. The cost function really has two parts, logical and physical cost estimation: A bad logical cost estimate almost guarantees a bad physical cost estimate (often by orders of magnitude!). However, even a good logical cost estimate does not guarantee a good physical cost estimate.

Unfair Advantages Humans Enjoy CBO cost function imperfectly predicts the relative costs of alternative plans. Logical costs: Even with perfect statistics, the selectivity of a complex filter is impossible to predict, and filter selectivities are the main determiner of logical costs. Humans need not predict – we can measure! This is a huge advantage!

Selectivity-Error Examples SQL> @exq8 test1 1 SELECT count(Data_Col) FROM DTow_Talk_Child 2* WHERE Date_Col >= TO_DATE('2005/01/01','YYYY/MM/DD')-10 .1 SELECT STATEMENT c=196, R=1 ..2 SORT AGGREGATE c=_, R=1 ...3 TABLE ACCESS BY INDEX ROWID 1*DTOW_TALK_CHILD c=196, R=10009 ....4 INDEX RANGE SCAN DTOW_TALK_CHILD_DATE_COL: Date_Col c=27, R=10009 SQL> @test1 3 ; COUNT(DATA_COL) --------------- 10000

Selectivity-Error Examples SQL> @exq8 test2 1 SELECT count(Data_Col) FROM DTow_Talk_Child 2 WHERE Date_Col >= TO_DATE('2005/01/01','YYYY/MM/DD')-10 3* AND Code_Col = 'OP' .1 SELECT STATEMENT c=7, R=1 ..2 SORT AGGREGATE c=_, R=1 ...3 TABLE ACCESS BY INDEX ROWID 1*DTOW_TALK_CHILD c=7, R=46 ....4 INDEX RANGE SCAN DTOW_TALK_CHILD_CODE_COL: CODE_COL c=1, R=4600 SQL> @test2 4 ; COUNT(DATA_COL) --------------- 3700 Elapsed: 00:00:00.14

Selectivity-Error Examples SQL> @exq8 test3 1 SELECT count(Data_Col) FROM DTow_Talk_Child 2 WHERE Date_Col <= TO_DATE('2005/01/01','YYYY/MM/DD')-90 3* AND Code_Col = 'OP' .1 SELECT STATEMENT c=7, R=1 ..2 SORT AGGREGATE c=_, R=1 ...3 TABLE ACCESS BY INDEX ROWID 1*DTOW_TALK_CHILD c=7, R=4195 ....4 INDEX RANGE SCAN DTOW_TALK_CHILD_CODE_COL: CODE_COL c=1, R=4600 SQL> @test3 4 ; COUNT(DATA_COL) --------------- 110

Dynamic Sampling in the CBO Dynamic sampling allows the CBO to sample application data at parse time to better estimate selectivities. Dynamic sampling is not the default, except low-level sampling where stats are wholly missing. Dynamic sampling slows parsing and sometimes prevents wrong optimizer choices. It can be activated for the system, the session, or with individual hints.

Now Try Dynamic Sampling! SQL> alter session set optimizer_dynamic_sampling=9; Elapsed: 00:00:00.07 SQL> @exq8 test2 1 SELECT count(Data_Col) FROM DTow_Talk_Child 2 WHERE Date_Col >= TO_DATE('2005/01/01','YYYY/MM/DD')-10 3* AND Code_Col = 'OP' .1 SELECT STATEMENT c=7, R=1 ..2 SORT AGGREGATE c=_, R=1 ...3 TABLE ACCESS BY INDEX ROWID 1*DTOW_TALK_CHILD c=7, R=2900 ....4 INDEX RANGE SCAN DTOW_TALK_CHILD_CODE_COL: CODE_COL c=1, R=4600 COUNT(DATA_COL) --------------- 3700 Elapsed: 00:00:00.59

Dynamic Sampling SQL> alter session set optimizer_dynamic_sampling=9; COUNT(DATA_COL) --------------- 3700 Elapsed: 00:00:00.59 Compare this with default optimizer_dynamic_sampling=1! QL> @test2 4 ; COUNT(DATA_COL) --------------- 3700 Elapsed: 00:00:00.14

Dynamic Sampling Trade-offs In the example, level-9 dynamic sampling still mis-estimated the selectivity of the combined conditions by 22%, did not change the execution plan, and made the parse take 350ms longer. In general, dynamic sampling (even at level 10!) is a great tool for automatically generating potentially-improved execution plans, but higher across-the-board parsing costs explain why this is not the default!

Dynamic Sampling – Having Your Cake and Eating it, Too The cost of dynamic sampling is not a problem for a one-time tuning exercise, but why pay the cost at every parse? Use dynamic sampling to generate more-informed cost-based optimization during the tuning exercise. If the result is great performance, then you’re done, without figuring out the best plan, yourself. Use hints to force the great plan (found in the one-time tuning exercise) without recurring costs for dynamic sampling at every parse! This is like getting Deep Blue to choose your move!

Unfair Advantages Humans Enjoy CBO cost function imperfectly predicts the relative costs of alternative plans. Physical costs: Relative CPU times for different sort of logical I/O vary widely, in hard-to-predict ways. Hit ratios of the different objects involved in SQL execution vary widely, in hard-to-predict ways. Humans need not predict – we can measure! We can even measure physical costs without knowing logical costs! This is a huge advantage!

Unfair Advantages Humans Enjoy CBO cost function imperfectly predicts the relative costs of alternative plans: Humans need not predict – we can measure! This is the moral equivalent to being allowed to “take back” moves in a game of chess, as many moves as we want, as often as we want, as we learn the consequences of those moves. This is a colossal advantage!

Measurement Example 7 … FROM DTow_Talk_Child C, --<plan for test4> 8 DTow_Talk_Parent1 P, --This plan is unchanged by 9 DTow_Type TC, --dynamic sampling!!! 10 DTow_Type TP 11 WHERE C.FKey1_ID=P.PKey_ID 12 AND C.Type_ID=TC.Type_ID 13 AND P.Type_ID=TP.Type_ID 14* and TC.Description='VERY_RARE' .1 SELECT STATEMENT c=3023, R=1 ..2 SORT AGGREGATE c=_, R=1 ...3 HASH JOIN c=3023, R=333333 ....4 TABLE ACCESS FULL 4*DTOW_TYPE c=2, R=3 ....4 HASH JOIN c=3020, R=333333 .....5 TABLE ACCESS FULL 2*DTOW_TALK_PARENT1 c=250, R=100000 .....5 HASH JOIN c=2559, R=333333 ......6 TABLE ACCESS BY INDEX ROWID 3*DTOW_TYPE c=2, R=1 .......7 INDEX RANGE SCAN DTOW_TYPE_DESCRIPTION: DESCRIPTION c=1, R=1 ......6 TABLE ACCESS FULL 1*DTOW_TALK_CHILD c=2555, R=1000000

Measurement Example SQL> @mysid9 SESSION_ID Oracle_PID Client_PID ---------- ------------ ------------ 78 3705 11178 Elapsed: 00:00:00.15 SQL> @test4 15 ; CDATE PDATE TCDESC TPDESC ---------- ---------- ---------- ---------- 1000 1000 1000 1000 Elapsed: 00:00:01.14 SQL> @reads9 LIO PIO ---------------------------- ---------------------------- Logical Reads = 18484 Physical Reads = 0

Measurement Example SQL> @exq8 test6 1 select count(*) from (SELECT /*+ first_rows */ …rownum rn 8 FROM …--<just like test4 SQL> 16* AND TC.Description='VERY_RARE') .1 SELECT STATEMENT c=300252, R=1 ..2 SORT AGGREGATE c=_, R=1 ...3 VIEW 1* c=300252, R=333333 ....4 COUNT c=_, R=_ .....5 TABLE ACCESS BY INDEX ROWID 2*DTOW_TALK_CHILD c=3, R=3 ......6 NESTED LOOPS c=300252, R=333333 .......7 NESTED LOOPS c=252, R=100000 ........8 MERGE JOIN CARTESIAN c=252, R=100000 .........9 TABLE ACCESS BY INDEX ROWID 4*DTOW_TYPE c=2, R=1 ..........10 INDEX RANGE SCAN DTOW_TYPE_DESCRIPTION: DESCRIPTION c=1, R=1 .........9 BUFFER SORT c=250, R=100000 ..........10 TABLE ACCESS FULL 3*DTOW_TALK_PARENT1 c=250, R=100000 ........8 INDEX UNIQUE SCAN DTOW_TYPE_UNQ1: type_id c=_, R=1 .......7 INDEX RANGE SCAN DTOW_TALK_CHILD_FKEY1_ID: fkey1_id c=2, R=10

Measurement Example SQL> @mysid9 SESSION_ID Oracle_PID Client_PID ---------- ------------ ------------ 78 3705 11178 Elapsed: 00:00:00.14 SQL> @test6 -------FIRST_ROWS plan 17 ; COUNT(*) ---------- 1000 Elapsed: 00:00:03.60 SQL> @reads9 LIO PIO ---------------------------- ---------------------------- Logical Reads = 220516 Physical Reads = 0

Measurement Example SQL> @exq8 test7 1 SELECT /*+ leading(TC) use_nl(C P) index(C DTOW_TALK_CHILD_TYPE_ID) 2 index(P DTOW_TALK_PARENT1_UNQ1) */ … --<Otherwise just like test4.sql 15* and TC.Description='VERY_RARE' .1 SELECT STATEMENT c=340711, R=1 ..2 SORT AGGREGATE c=_, R=1 ...3 HASH JOIN c=340711, R=333333 ....4 NESTED LOOPS c=340470, R=333333 .....5 NESTED LOOPS c=7137, R=333333 ......6 TABLE ACCESS BY INDEX ROWID 3*DTOW_TYPE c=2, R=1 .......7 INDEX RANGE SCAN DTOW_TYPE_DESCRIPTION: DESCRIPTION c=1, R=1 ......6 TABLE ACCESS BY INDEX ROWID 1*DTOW_TALK_CHILD c=7135, R=333333 .......7 INDEX RANGE SCAN DTOW_TALK_CHILD_TYPE_ID: type_id c=843, R=333333 .....5 TABLE ACCESS BY INDEX ROWID 2*DTOW_TALK_PARENT1 c=1, R=1 ......6 INDEX UNIQUE SCAN DTOW_TALK_PARENT1_UNQ1: pkey_id c=_, R=1 ....4 TABLE ACCESS FULL 4*DTOW_TYPE c=2, R=3

Measurement Example SQL> @mysid9 SESSION_ID Oracle_PID Client_PID ---------- ------------ ------------ 78 3705 11178 Elapsed: 00:00:00.14 SQL> @test7 -----Hand-tuned plan, with hints 16 ; CDATE PDATE TCDESC TPDESC ---------- ---------- ---------- ---------- 1000 1000 1000 1000 Elapsed: 00:00:00.16 SQL> @reads9 LIO PIO ---------------------------- ---------------------------- Logical Reads = 3012 Physical Reads = 0

Unfair Advantages Humans Enjoy CBO lacks insight into the freedom to change corner-case behavior: SQL constitutes a rigorous, unambiguous spec for what the CBO must deliver, functionally, regardless of cost, without the slightest risk of wrong-rows bugs in even the most obscure corner cases!

Unfair Advantages Humans Enjoy CBO lacks insight into the freedom to change corner-case behavior: Humans understand which corner cases we can safely ignore. Humans understand that even for corner cases we cannot ignore, the current corner-case behavior specified by the SQL may be wrong, and should be changed, if it is wrong, and if it prevents access to a fast execution plan.

Corner-Case Example SQL> @exq8 test5 1 SELECT --This is unchanged by dynamic sampling!!! 2 C.PKey_ID CKey_ID, 3 C.Date_Col CDate, 4 G.PKey_ID GKey_ID 5 FROM DTow_Talk_Child C, 6 DTow_Talk_Generic G 7 WHERE C.PKey_ID=G.Generic01 8* AND C.PKey_ID=654321 .1 SELECT STATEMENT c=2170, R=1 ..2 NESTED LOOPS c=2170, R=1 ...3 TABLE ACCESS BY INDEX ROWID 1*DTOW_TALK_CHILD c=3, R=1 ....4 INDEX UNIQUE SCAN DTOW_TALK_CHILD_UNQ1: pkey_id c=2, R=1 ...3 TABLE ACCESS FULL 2*DTOW_TALK_GENERIC c=2167, R=1

Corner-Case Example SQL> @test5 9 ; CKEY_ID CDATE GKEY_ID ---------- --------- ---------- 654321 21-JAN-04 1654321 Elapsed: 00:00:01.26 SQL> @reads9 LIO PIO ---------------------------- ---------------------------- Logical Reads = 14282 Physical Reads = 0

Corner-Case Example SQL> @exq8 test8 1 SELECT 2 C.PKey_ID CKey_ID, 3 C.Date_Col CDate, 4 G.PKey_ID GKey_ID 5 FROM DTow_Talk_Child C, 6 DTow_Talk_Generic G 7 WHERE TO_CHAR(C.PKey_ID)=G.Generic01 8* AND C.PKey_ID=654321 .1 SELECT STATEMENT c=6, R=1 ..2 NESTED LOOPS c=6, R=1 ...3 TABLE ACCESS BY INDEX ROWID 1*DTOW_TALK_CHILD c=3, R=1 ....4 INDEX UNIQUE SCAN DTOW_TALK_CHILD_UNQ1: pkey_id c=2, R=1 ...3 TABLE ACCESS BY INDEX ROWID 2*DTOW_TALK_GENERIC c=3, R=1 ....4 INDEX RANGE SCAN DTOW_TALK_GENERIC_GENERIC01: GENERIC01 c=2, R=1

Corner-Case Example SQL> @test8 9 ; CKEY_ID CDATE GKEY_ID ---------- --------- ---------- 654321 21-JAN-04 1654321 Elapsed: 00:00:00.15 SQL> @reads9 LIO PIO ---------------------------- ---------------------------- Logical Reads = 9 Physical Reads = 0

Unfair Advantages Humans Enjoy CBO lacks insight into the freedom to change corner-case behavior: View-using queries are the most common example of queries with quirky corner-case behaviors (almost never anticipated by the developers) that trigger expensive execution plans.

Choices Unavailable to the CBO The CBO’s game: Tune fast. Tune at runtime, without executing SQL against application data. Take each SQL spec as gospel. Use only what can be deduced or assumed from dictionary statistics. Use the database as-is.

Choices Unavailable to the CBO Higher-level games for human tuning: Take as long as necessary to tune. Verify fixes! Learn what needs to be learned any way necessary. Consider altering corner-case behavior of the SQL. Use existing denormalizations. Educate end users to avoid unnecessary, hard problems (e.g., search on phone number, not last name.)

Choices Unavailable to the CBO Higher-level games for human tuning: Consider modifying the application: Read fewer rows. Read less frequently, or not at all. Move load off-hours, or into batch. Combine multiple statements into one. Break one statement into multiple statements.

Choices Unavailable to the CBO Higher-level games for human tuning: Consider modifying the database: Add or modify indexes. Denormalize the database (rarely necessary!). Add constraints that secure corner-case issues. Alter the database design.

Objections to Human Tuning, Addressed “The CBO’s assumptions and approximations are fine! It finds a great plan, or at least a good enough plan, 99% of the time.”

Objections to Human Tuning, Addressed “The CBO’s assumptions and approximations are fine! It finds a great plan, or at least a good enough plan, 99% of the time.” YES!

Objections to Human Tuning, Addressed “The CBO’s assumptions and approximations are fine! It finds a great plan, or at least a good enough plan, 99% of the time.” That’s about right!

Objections to Human Tuning, Addressed “The CBO’s assumptions and approximations are fine! It finds a great plan, or at least a good enough plan, 99% of the time.” Corrollary: It’s a waste of time to just tune SQL at random, or to tune all SQL!

Objections to Human Tuning, Addressed “The CBO’s assumptions and approximations are fine! It finds a great plan, or at least a good enough plan, 99% of the time.” However, that can still leave hundreds of badly tuned queries in a complex application suite!

Objections to Human Tuning, Addressed “The CBO’s assumptions and approximations are fine! It finds a great plan, or at least a good enough plan, 99% of the time.” If you find the right SQL to tune (method R or similar, runtime-based methods), you will find that most of that SQL has ready opportunities for improvement! This is not a paradox!

Objections to Human Tuning, Addressed “Most developers do not know enough to outsmart the CBO.” OK, if you are among those developers, learn more! … Or hire someone who already knows how;-)

Objections to Human Tuning, Addressed “The CBO only fails when the database (or application) design is bad.” Even if this was true, so what?! Bad design happens! (Tune in the real world!) We (in this room, today) almost never have the chance to start at, and completely control, the design! Tuning slow SQL is a great way to uncover needed fixes to bad design!

Objections to Human Tuning, Addressed “Even if you can improve the query, today, tuning is a bad idea because hints will over-constrain the CBO in the future.” Less than half of SQL tuning even involves beating the CBO at its own game! Add indexes, change the application, denormalize,… These changes do not constrain the future CBO!

Objections to Human Tuning, Addressed “Even if you can improve the query, today, tuning is a bad idea because hints will over-constrain the CBO in the future.” Even when you beat the CBO at its own game, you don’t always need hints to do it! E.g., convert subqueries to inline views. E.g., convert view-using queries to queries of simple tables. These changes do not constrain the future CBO!

When Good Optimizers Make Bad Choices (Sometimes)

When Good Optimizers Make Bad Choices (Sometimes)

Presentation Transcript

When Good Doctors Go Bad

When Good Projects Go Bad . . .

Managerial Judgment: When Good Managers Make Bad Decisions

When GOOD Maps Go BAD (Cartography)

Flagging: When Good Data Go Bad

When Good Standards Go Bad

Good Choices

When Good Religions Go Bad

When good conventions go bad

When A Good Cluster Goes Bad

When Good Pleural Effusions Go Bad

When Good Doctors Go Bad

Good Choices Make Good Friends

When Good Cells Go Bad!!

Troubleshooting – When good computers go bad!

How To Make A Good Decision

bad Food

When Good Cells Go Bad

WHEN GOOD CODE GOES BAD!