Partitioning with Oracle 11G

Partitioning with Oracle 11G Bert Scalzo, Domain Expert, Oracle Solutions Bert.Scalzo@Quest.com

About the Author … Domain Expert & Product Architect for Quest Software Oracle Background: • Worked with Oracle databases for over two decades (starting with version 4) • Work history includes time at both “Oracle Education” and “Oracle Consulting” Academic Background: • Several Oracle Masters certifications • BS, MS and PhD in Computer Science • MBA (general business) • Several insurance industry designations Key Interests: • Data Modeling • Database Benchmarking • Database Tuning & Optimization • "Star Schema" Data Warehouses • Oracle on Linux – and specifically: RAC on Linux This presentation draws heavily on these areas • Articles for: • Oracle’s Technology Network (OTN) • Oracle Magazine, • Oracle Informant • PC Week (eWeek) • Articles for: • Dell Power Solutions Magazine • The Linux Journal • www.linux.com • www.orafaq.com

Books by Author … Coming in 2008 …

Agenda • Partitioning Benefits • Partitioning History • Partitioning Options • Partitioning Advisor (if you’re licensed) • Typical Data Warehousing Environment • TPC-H “Data Warehouse” Benchmark • Results TPC-H with Various Partition Strategies • What about OLTP Environments and the TPC-C/E • Lessons Learned (and their relevance/application) • Questions & Answers

Partitioning Benefits: Facts • Manageability • Classic “Divide & Conquer” technique • More granular storage allocation options • Keeps otherwise time consumptive options viable • Availability • More granular online/offline options • More granular rebuild/reorganization options • More granular object level backup/restore options • Capacity Management • Enables a “Tiered Storage Architecture” approach • More granular storage cost management decision points • Performance • Partition Pruning • Partition-Wise Joins

Partitioning Benefits: Opinion (Mine) • Manageability 40% • Availability 20% • Capacity Management 20% • Performance 20% • Don’t over-sell/over-expect the performance aspect • Need to experiment for best approach for a database • Better to take longer at the start to get right, because very often it’s far too expensive to change afterwards • Examples demonstrate very positive performance, but better to be conservative and error on the side of caution – then be very pleasantly surprised… Why to Partition

Partition Pruning (Restriction Based) • From Docs: In partition pruning, the optimizer analyzes FROM and WHERE clauses in SQL statements to eliminate unneeded partitions when building the partition access list. This enables Oracle Database to perform operations only on those partitions that are relevant ... • “Divide and Conquer” for performance • Sometimes can yield order of magnitude improvement • But once again, best not to oversell and/or over-expect • Some Potential Issues to be aware of: • SQL*Plus Auto-Trace can sometimes miss partition pruning • “Old Style” Explain Plans via simple SELECT has issues too • Best to always use DBMS_XPLAN and/or SQL_TRACE Note: Trace file analysis much easier these days – SQL Developer + free Hotsos plug-in, metalink trace analysis scripts, Quest Toad DBA

Partition-Wise Join (Multi-Object Based) • From Docs: Partition-wise joins reduce query response time by minimizing the amount of data exchanged among parallel execution servers when joins execute in parallel. This significantly reduces response time & improves the use of both CPU & memory resources. • Different Flavors: • Full – Single to Single • Full – Composite to Single • Full – Composite to Composite • Partial – Single • Partial – Composite • Indexing Strategy Counts • Local Prefixed/Non-Prefixed • Global All of these affect the explain plan

Picture Worth 1000 Words (from Docs) Simple Mantra: Subdivide the work into equally paired chunks, then perform all that work using many parallel processes Make sure not to over-allocate CPU’s – remember there will also be concurrent workload

Partitioning History (from Oracle 11G training+) Oracle 5 Before Tablespaces – we had partitions  Oracle 7 Partition Views – really more of a cheat 

Partitioning Options – Part 1 IOT’s can be partitioned as well in later versions of Oracle, so the basic choices are even more complex than this…

Partitioning Options – Part 2 Prior to 11G: Oracle White Paper: 2007 Partitioning in Oracle Database 11g

Partitioning Options – Part 3 Post 11G: Oracle White Paper: 2007 Partitioning in Oracle Database 11g Very exciting new options…

Partitioning Advisor (if you’re licensed) Advisor Central -> SQL Advisors -> SQL Access Advisor

Typical Data Warehouse Architecture TPC-H

Typical Environments We’ll come back to this picture

TPC-H Benchmark • Industry Standard “Data Warehouse” Benchmark • URL: www.tpc.org/tpch • Spec: http://tpc.org/tpch/spec/tpch2.7.0.pdf • 8 Tables • 22 Queries (answer complex business questions) • Database scaling: • Factor = 1, 10, 30, 100, 300, 1000, 3000, 10000, 30000, 100000 • Size GB = 1, 10, 30, 100, 300, 1000, 3000, 10000, 30000, 100000

TPC-H Data Model Sub-Partitions SF * 6,000,000 SF * 200,000 SF * 800,000 SF * 10,000 Partitions 25 5 SF * 150,000 SF * 1,500,000

TPC-H Permits Partitioning … But what to do, what to do ???

Disclosure Reports http://tpc.org/tpch/results/tpch_perf_results.asp

Disclosure Report – Lots of Info This is where people document exactly what advanced database feature and storage parameters they used – info is invaluable 

Disclosure Report – Appendix B

Sample Expensive Query

Example Explain Plan

Example Explain Plan Explain complete. Plan hash value: 2545634784 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 42533 | 5648K| | 641K (1)| 01:57:41 | | 1 | SORT GROUP BY | | 42533 | 5648K| 105M| 641K (1)| 01:57:41 | |* 2 | HASH JOIN | | 715K| 92M| | 631K (1)| 01:55:51 | | 3 | TABLE ACCESS FULL | H_NATION | 25 | 725 | | 3 (0)| 00:00:01 | |* 4 | HASH JOIN | | 715K| 72M| | 631K (1)| 01:55:51 | | 5 | TABLE ACCESS FULL | H_SUPPLIER | 100K| 781K| | 646 (1)| 00:00:08 | |* 6 | HASH JOIN | | 720K| 68M| 68M| 631K (1)| 01:55:44 | |* 7 | HASH JOIN | | 751K| 59M| 232M| 589K (1)| 01:48:10 | |* 8 | HASH JOIN | | 3004K| 197M| 4984K| 485K (1)| 01:28:57 | |* 9 | TABLE ACCESS FULL| H_PART | 100K| 3808K| | 11805 (1)| 00:02:10 | | 10 | TABLE ACCESS FULL| H_LINEITEM | 60M| 1716M| | 342K (1)| 01:02:52 | | 11 | TABLE ACCESS FULL | H_ORDER | 15M| 200M| | 72122 (1)| 00:13:14 | | 12 | TABLE ACCESS FULL | H_PARTSUPP | 8000K| 122M| | 25945 (1)| 00:04:46 | ---------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("S_NATIONKEY"="N_NATIONKEY") 4 - access("S_SUPPKEY"="L_SUPPKEY") 6 - access("PS_SUPPKEY"="L_SUPPKEY" AND "PS_PARTKEY"="L_PARTKEY") 7 - access("O_ORDERKEY"="L_ORDERKEY") 8 - access("P_PARTKEY"="L_PARTKEY") 9 - filter("P_NAME" LIKE :SYS_B_2)

Method of Attack • Since many data warehouses are utilized for data mining, we can’t always know every possible query likely to run – thus aggregate measure for success • Thus we’ll compare the benchmark’s weighted performance scores for the TPC-H using various partitioning schemes (all within spec of course…) • Goal will be to find the best overall partitioning … • Then we’ll examine some specific explain plans …

10G Sample Test cases • 10G Simple Approach (Just Huge Tables) • Range: ORDER (order date) • Hash: LINEITEM (order key) • 10G Basic Approach (Single Level Partitions) • Range: ORDER (order date) • Hash: LINEITEM (order key) • List: CUSTOMER (nation key) • Hash: PART, SUPPLIER and PARTSUPP (part & supp keys) • 10G Complex Approach (Composite Partitions) • Range-Hash: ORDER (order date & cust key) • Multi-Hash: LINEITEM (part, supp & order keys) • List: CUSTOMER (nation key) • Hash: PART, SUPPLIER and PARTSUPP (part & supp keys)

11G Sample Test cases • 11G Simple Approach (+Interval) • Interval-Hash: ORDER (order date & cust key) • Multi-Hash: LINEITEM (part, supp & order keys) • List: CUSTOMER (nation key) • Hash: PART, SUPPLIER and PARTSUPP (part & supp keys) • 11G Basic Approach (+Virtual) • Interval-Hash: ORDER (virtualized order date & cust key) • Multi-Hash: LINEITEM (part, supp & order keys) • List: CUSTOMER (nation key) • Hash: PART, SUPPLIER and PARTSUPP (part & supp keys) • 11G Complex Approach (+REF) • Interval-Hash: ORDER (virtualized order date & cust key) • REF: LINEITEM (order key) • List: CUSTOMER (nation key) • Hash: PART, SUPPLIER and PARTSUPP (part & supp keys)

Is that It? • No – just six very obvious high-level scenarios • Your selections and actual mileage will vary • Experimentation usually yields the best results • Always trust “empirical results” over conjecture • So improved response-time beats better explain plan • Remember, DW’s usually have unpredictable queries • So don’t tune for just a few queries, look for the best overall and/or more generic performance solution 

Intermediate Results can be Misleading  TPC-H Power score seemingly implies that every partitioning schema is incrementally better TPC-H Throughput score seems to show that non-partitioned is equal to the best

Final Results tell the Real Truth  TPC-H Query/Hour score shows that some partitioning schemes better, and some not TPC-H $/Query/Hour score confirms the inverse in terms of dollars per unit of work

Why such seemingly Opposite Results ??? • Run times and explain plans apply to single measurable operation • Even aggregate & averaged run times don’t relate the entire truth! • Need answer based upon sound mathematics (reliable & repeatable)

TPC-C Benchmark • Historical Industry Standard “OLTP” Benchmark • URL: www.tpc.org/tpchc • Spec: http://tpc.org/tpcc/spec/tpcc_current.pdf • Probably the most used & widely quoted benchmark • But suffers from overly simplistic design & code logic • Generally considered unreliable with modern RDBMS • But still a decent rough “sounding board” for many …. • Being replaced by the newer TPC-E (later slides)

TPC-C Data Model # Terminals/Warehouse (i.e. concurrent users) Base Scaling Unit Clustered Partitions Sub-Partitions

TPC-E Benchmark • Emerging Industry Standard “OLTP” Benchmark • URL: www.tpc.org/tpche • Spec: http://tpc.org/tpce/spec/TPCE-v1.5.1.pdf • Very new and still evolving – but highly promising • Not too many published TPC-E results as of yet … • Design not compromised by RDBMS features • Much more realistic (i.e. real world) in nature  • Nowhere near as easy as the old TPC-C test 

TPC-E Data Model

TPS is Moot, Average Response Time is King tpmC 584 582 579 578 But wait: adding cluster &partitioning yields negative – why ??? Look to Stats Pack, AWR and ADDM Reports to investigate…

Single block reads 32 ms! DISTRICT Table needs clustered

Suggest fix one major item per test iteration, so made choice to address this 1st Single block read 32 ms  Clustering worked, it made SQL the #1 performance issue – as was expected Single block read 5 ms  Partitioning did not shine through just yet, possibly skewed by the first issue

Switch MEMORY_TARGET to SGA/PGA_TARGETS -13% Notice that manual memory management resulted in 13% gain !!! But wait, there’s more (isn’t that almost always the case) …

Now it’s all just SQL, so time for SQL Tuning Advisor & SQL Tuning Sets

SQL Tuning Sets & 11G Results/Client Cache -9% Going to stop – Have quickly reached the “good enough” point 

Architecture Findings Mostly Partition Elimination Mostly Partition Wise Join Design to eliminate per object Design to parallelize across objects

Other Interesting Findings • 64-bit scales much more reliably than 32-bit, even when on same hardware and using the <= 4GB memory model • If you know the application code’s nature and it’s well definable, manual memory management may be better • Using manual SGA/PGA targets with floors yields much more scalable results and also more predictable patterns • Partitioning is not an automatic bonus, must experiment to identify the optimal partitioning scheme per situation • Don’t forget older technologies like clusters, they still can add a positive to the overall equation in certain cases • Don’t forget SQL Tuning Sets & SQL Advisor, or the “explain plans” may not in fact be the best obtainable • Don’t forget 11G’s Result and Client Caches

Thank you Questions and Answers … Presenter: Bert Scalzo E-mail: Bert.Scalzo@Quest.com Note: these slides should be available on Open World web site, but I’ll also make sure to post them on my company’s web site: www.toadworld.com/Experts/BertScalzosToadFanaticism/tabid/318/Default.aspx

Partitioning with Oracle 11G

Partitioning with Oracle 11G

Presentation Transcript

Oracle 11g: SQL

Relational Oracle 11g Tutorial

Oracle Data Guard 11g Release 2 with Oracle Enterprise Manager 11g Grid Control

Oracle 11g Database Architectures

Upgrading to Oracle 11g

Oracle 11g: SQL

Oracle 11g: SQL

Oracle 11g online training | Online Oracle 11g Training in u

Oracle 11G XE

Oracle 11g: SQL

Oracle 11g: SQL

Oracle 11g Online Training | Online Oracle 11g Training

Oracle 11g New Features | Oracle Database 11g Online Training

Oracle 11g certifications

Oracle 11g Administration II

Oracle 11g: SQL

Oracle 11g: SQL

Oracle 11g: SQL

Oracle 11g: SQL

Oracle 11g: SQL

Oracle 11g: SQL

Oracle Partitioning in Oracle Database 11g