1 / 87

Oracle9i: The Data Warehouse Database OracleWorld 2002 #31348

Oracle9i: The Data Warehouse Database OracleWorld 2002 #31348. Ian Abramson IAS Inc. ian@abramson.ca. Agenda. The database Partitioning External Tables Parallel Execution Materialized Views The Optimizer Bit-map join indexes The SQL CUBE and ROLLUP Rolling Windows

akamu
Télécharger la présentation

Oracle9i: The Data Warehouse Database OracleWorld 2002 #31348

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Oracle9i:The Data Warehouse DatabaseOracleWorld 2002 #31348 Ian Abramson IAS Inc. ian@abramson.ca

  2. Agenda • The database • Partitioning • External Tables • Parallel Execution • Materialized Views • The Optimizer • Bit-map join indexes • The SQL • CUBE and ROLLUP • Rolling Windows • Multi-table Inserts • OLAP Overview • Data Mining Overview

  3. Oracle9i for Data Warehousing Oracle 7.3 • Hash Join • Bitmap Indexes • Parallel-Aware Optimizer • Partition Views • Instance Affinity: Function Shipping • Parallel Union All • Asynchronous Read-Ahead • Histograms • Anti-Join Oracle 8.0 Oracle8i • Partitioned Tables and Indexes • Partition Pruning • Parallel Index Scans • Parallel Insert, Update, Delete • Parallel Bitmap Star Query • Parallel ANALYZE • Parallel Constraint Enabling • Server Managed Backup/Recovery • Point-in-Time Recovery • Hash and Composite Partitioning • Resource Manager • Progress Monitor • Adaptive Parallel Query • Server-based Analytic Functions • Materialized Views • Transportable Tablespaces • Direct Loader API • Functional Indexes • Partition-wise Joins • Security Enhancements Oracle9i • List Partitioning • Bitmap Join Index • Dynamic Aggregation Buffersize • Materialized Intermediate Results • Grouping Sets • Concatenated Grouping Sets • Aggregate Pruning • New Analytic Functions • Self-Tuning Execution Memory • System Managed Undo • Dynamic Resizing of Buffer Pool • ETL Infrastructure • and much more ...

  4. Oracle9.2iComplete e-Business Intelligence Infrastructure

  5. Metadata Relational OLAP Data Mining ETL Oracle9.2i Database

  6. Metadata Portal Query & Reporting BI Components Real-time Personalization Oracle9.2i Application ServerRuns All Your Business Intelligence Applications

  7. Oracle9iDB Oracle9iAS Data Warehousing Metadata Portal Query & Reporting ETL BI Components OLAP Real-Time Personalization Data Mining Oracle9i R2 DW Architecture

  8. OLAPEngine DataIntegrationEngine Data Warehouse Engine MiningEngine The Old Way:Everything is a Different Product DataSources

  9. Oracle9i Data Warehousing ETL OLAP Data Mining The New Way: Oracle9i • All aspects of architecture are integrated

  10. The Oracle9i DW Database Computers are useless. They only give you answers Picasso

  11. Partitioning Advantages • Separates data in separate pieces • Partition key defined at creation • Partition pruning • Partition-wise joins • May partition • Tables • Indexes

  12. Partition table Approach • Single table Approach Traditional vs. Partitions

  13. More Advantages • Partitions - separate physical entities • physical attributes (PCTFREE, PCTUSED, INITRANS,MAXTRANS) may vary for different partitions of the same table or index • Different partitions - different tablespaces • minimizes the impact of data corruption • independent back up and recovery of each partition • balance the I/O load

  14. Partition Options • Range • Hash • Composite • List • Range-List

  15. Range Partitioning • Range partitioning - maps rows to partitions based on ranges of column values CREATE TABLE Sales_by_department (Department NUMBER, SalesId NUMBER, Amount NUMBER ) PARTITION BY RANGE ( Department ) PARTITION single_digits VALUES LESS THAN (10) TABLESPACE sd_low, PARTITION double_digits VALUES LESS THAN (100) TABLESPACE dd_middle, PARTITION multiple_digits VALUES LESS THAN (maxvalue) TABLESPACE md_high);

  16. Hash Partitioning • maps rows to partitions based on a hash value of the partitioning key (oracle internally determines that) CREATE TABLE Sales_by_department (Department NUMBER, SalesId NUMBER, Amount NUMBER ) PARTITION BY HASH ( SalesID ) PARTITION hash_name1 TABLESPACE hash_name1_tbls, PARTITION hash_name2 TABLESPACE hash_name2_tbls, PARTITION hash_name3 TABLESPACE hash_name3_tbls);

  17. Composite Partitioning • Partitions data using the range method, and within each partition sub-partitions it using the hash method CREATE TABLE Sales_by_department (Department NUMBER, SalesId NUMBER, Amount NUMBER ) PARTITION BY RANGE ( Department ) SUBPARTITION BY HASH (SalesID) SUPPARTITIONS 2 STORE IN (sub1_tbls, sub2_tbls) (PARTITION single_digits VALUES LESS THAN (10) TABLESPACE sd_tbls, PARTITION double_digits VALUES LESS THAN (100) TABLESPACE dd_tbls, PARTITION multiple_digits VALUES LESS THAN (maxvalue) TABLESPACE md_tbls) );

  18. List Partitions CREATE TABLE sales_list ( salesman_id NUMBER(5), salesman_name VARCHAR2(30), sales_state VARCHAR2(20), sales_amount NUMBER(10), sales_date DATE) PARTITION BY LIST(sales_state) (PARTITION sales_west VALUES('California', 'Hawaii'), PARTITION sales_east VALUES ('New York', 'Virginia', 'Florida'), PARTITION sales_central VALUES('Texas', 'Illinois') PARTITION sales_other VALUES(DEFAULT) );

  19. Range-List Partitions CREATE TABLE quarterly_regional_sales (deptno NUMBER, item_no VARCHAR2(20), txn_date DATE, txn_amount NUMBER, state VARCHAR2(2)) PARTITION BY RANGE (txn_date) SUBPARTITION BY LIST (state) (PARTITION q1_2002 VALUES LESS THAN(TO_DATE('1-APR-2002','DD-MON-YYYY')) (SUBPARTITION q1_2002_northwest VALUES ('OR', 'WA'), SUBPARTITION q1_2002_southwest VALUES ('AZ', 'UT', 'NM'), SUBPARTITION q1_2002_northeast VALUES ('NY', 'VM', 'NJ'), SUBPARTITION q1_2002_southeast VALUES ('FL', 'GA'), SUBPARTITION q1_2002_northcentral VALUES ('SD', 'WI'), SUBPARTITION q1_2002_southcentral VALUES ('NM', 'TX')), PARTITION q2_2002 VALUES LESS THAN(TO_DATE('1-JUL-2002','DD-MON-YYYY')) …

  20. Partition Maintenance Functions • Add • Drop • Exchange • Move • Split and Merge • Truncating • Coalesce (Hash only)

  21. Partition Exchange • Allows you to create data in a separate table and then replace a partition with the table • Validate or don’t validate is the question • Actually exchanges table and partition • Nice for archiving SQL> run 1 alter table sales_transactions 2 exchange partition sales_feb_2000 3 with table load_sales 4 including indexes 5* without validation Table altered.

  22. Partition Notes • Separate partitions - separate tablespaces • Beware of MAXVALUE in rolling-window • Use naming conventions • range partitions --> table_name_YYYY_MM_DD table_name_tbls_YYYY_MM_DD • No global indexes for hash partitions

  23. External Tables • Data resides on operating system • Table definition resides in database (SYSTEM tablespace) • Need to define: • Directory for files • Table definition • Read-only • No indexes • Problems exhibit themselves at SELECT time

  24. Oracle8i: Multiple staging tables and SQL statements Staging Table Staging Table Step 1: Load into staging table Step 2: Transform data using function Step 3: Insert and update into target table TARGET FLAT FILES UPSERT LOAD TRANS- FORM Oracle9i: Single SQL statement Oracle9i: Parallel pipelining of data Oracle9i: ETL Scenario

  25. External Tables the SQL • Create directory: CREATE DIRECTORY data_dir AS 'd:\wkdir'; CREATE DIRECTORY log_dir AS 'c:\TEMP';

  26. CREATE TABLE products_delta( PROD_ID NUMBER(6), PROD_NAME VARCHAR2(50), PROD_DESC VARCHAR2(4000), PROD_SUBCATEGORY VARCHAR2(50), PROD_SUBCAT_DESCVARCHAR2(2002), PROD_CATEGORYVARCHAR2(50), PROD_CAT_DESCVARCHAR2(2002), PROD_WEIGHT_CLASS NUMBER(2), PROD_UNIT_OF_MEASURE VARCHAR2(20), PROD_PACK_SIZE VARCHAR2(30), SUPPLIER_ID NUMBER(6), PROD_STATUS VARCHAR2(20), PROD_LIST_PRICE NUMBER(8,2), PROD_MIN_PRICE NUMBER(8,2) ) ORGANIZATION external (TYPE oracle_loader DEFAULT DIRECTORY data_dir ACCESS PARAMETERS ( RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII BADFILE log_dir:'prod_delta.bad_xt‘ LOGFILE log_dir:'prod_delta.log_xt‘ FIELDS TERMINATED BY "|" LDRTRIM) Location ('prodDelta.dat') ) REJECT LIMIT UNLIMITED NOPARALLEL; External Tables the SQL (2)

  27. Parallel Execution • Do more work at the same time • Best for: • Large table scans • Creation of large indexes • Partition table scans • Bulk inserts, updates and deletes • Aggregations and summarizations • System characteristics • SMP, MPP, Clusters • Sufficient I/O bandwidth • Under-utilized CPU • Sufficient memory

  28. Getting Parallel to Work • In the init.ora parallel_automatic_tuning=TRUE parallel_max_servers=n • 2 * DOP * # of concurrent users parallel_min_servers=n • 0 or ??(your choice) large_pool_size & shared_pool_size mem in bytes = (3 x size x users x groups x connections) • SIZE = PARALLEL_EXECUTION_MESSAGE_SIZE • USERS = the number of concurrent parallel execution users that you expect to have running with the optimal DOP • GROUPS = the number of query server process groups used for each query • A simple SQL statement requires only one group. However, if your queries involve subqueries which will be processed in parallel, then Oracle uses an additional group of query server processes. • CONNECTIONS = (DOP2 + 2 x DOP)

  29. Materialized Views • Provide summary tables • A “Physical” view • Performance gains are significant • Similar to snapshots • Refresh may be FAST (need log on master) • COMPLETE • FORCE • Oracle packages provide help and guidance DBMS_MVIEW.EXPLAIN_MVIEW ('SH.CAL_MONTH_SALES_MV');

  30. Materialized Views • Enable it in the init.ora • Query rewrite QUERY_REWRITE_ENABLED = TRUE QUERY_REWRITE_INTEGRITY=TRUSTED • May enable at session level as well

  31. Creating a Materialized View CREATE MATERIALIZED VIEW cust_sales_mv PCTFREE 0 STORAGE (initial 8k next 8k pctincrease 0) BUILD IMMEDIATE REFRESH FORCE ENABLE QUERY REWRITE AS SELECT c.cust_id, SUM(amount_sold) AS dollar_sales FROM sales s, customers c WHERE s.cust_id= c.cust_id GROUP BY c.cust_id ;

  32. Collecting the Details • Use the DBMS_MVIEW supplied package • Has many other packages to help you with your materialized views. • Today you get one! EXECUTE DBMS_MVIEW.EXPLAIN_MVIEW('SH.CAL_MONTH_SALES_MV');

  33. Verifying a Materialized View SELECT capability_name, possible, SUBSTR(related_text,1,8) AS rel_text, SUBSTR(msgtxt,1,60) AS msgtxt FROM MV_CAPABILITIES_TABLE ORDER BY seq; CAPABILITY_NAME P REL_TEXT MSGTXT --------------- - -------- ------ PCT N (Partition Change Tracking) REFRESH_COMPLETE Y REFRESH_FAST N REWRITE Y PCT_TABLE N SALES no partition key or PMARKER in select list PCT_TABLE N TIMES relation is not a partitioned table REFRESH_FAST_AFTER_INSERT N SH.TIMES mv log must have new values REFRESH_FAST_AFTER_INSERT N SH.TIMESmv log must have ROWID REFRESH_FAST_AFTER_INSERT N SH.TIMESmv log does not have all necessary columns REFRESH_FAST_AFTER_INSERT N SH.SALES mv log must have new values REFRESH_FAST_AFTER_INSERT N SH.SALES mv log must have ROWID REFRESH_FAST_AFTER_INSERT N SH.SALES mv log does not have all necessary columns REFRESH_FAST_AFTER_ONETAB_DML N DOLLARS SUM(expr) without COUNT(expr) REFRESH_FAST_AFTER_ONETAB_DML N see the reason why REFRESH_FAST_AFTER_INSERT is disabled REFRESH_FAST_AFTER_ONETAB_DML N COUNT(*) is not present in the select list REFRESH_FAST_AFTER_ONETAB_DML N SUM(expr) without COUNT(expr) REFRESH_FAST_AFTER_ANY_DML N see the reason why REFRESH_FAST_AFTER_ONETAB_DML is disabled REFRESH_FAST_AFTER_ANY_DML N SH.TIMES mv log must have sequence REFRESH_FAST_AFTER_ANY_DML N SH.SALES mv log must have sequence REFRESH_PCT N PCT is not possible on any of the detail tables in the materialized view REWRITE_FULL_TEXT_MATCH Y REWRITE_PARTIAL_TEXT_MATCH Y REWRITE_GENERAL Y REWRITE_PCT N PCT is not possible on any detail tables

  34. Dimensions • Categorizes data • Provides hierarchy guidance • Allows roll-up and roll-down • Needed for query rewrite • Needed for materialized views • OEM has Dimension Wizard Category Sub-Category Product

  35. Dimensions the SQL CREATE DIMENSION products_dim LEVEL product IS (products.prod_id) LEVEL subcategory IS (products.prod_subcategory) LEVEL category IS (products.prod_category) HIERARCHY prod_rollup ( product CHILD OF subcategory CHILD OF category ) ATTRIBUTE product DETERMINES (products.prod_name, products.prod_desc, prod_weight_class, prod_unit_of_measure, prod_pack_size,prod_status, prod_list_price, prod_min_price) ATTRIBUTE subcategory DETERMINES (prod_subcategory, prod_subcat_desc) ATTRIBUTE category DETERMINES (prod_category, prod_cat_desc);

  36. The Optimizer is a Star • Star Transform must be enabled STAR_TRANSFORMATION_ENABLED=TRUE • Requires Bit Mapped indexes or bit mapped join index on foreign key columns CREATE BITMAP INDEX sales_c_state_bjix ON sales(customers.cust_state_province) FROM sales, customers WHERE sales.cust_id = customers.cust_id LOCAL NOLOGGING COMPUTE STATISTICS; • Cost-based optimizer must be used • Optimizer looks for small set of dimensions to satisfy query, even if large number of rows in fact

  37. The Star SQL SELECT ch.channel_class, c.cust_city, t.calendar_quarter_desc, SUM(s.amount_sold) sales_amount FROM sales s, times t, customers c, channels ch WHERE s.time_id = t.time_id AND s.cust_id = c.cust_id AND s.channel_id = ch.channel_id AND c.cust_state_province = 'CA' AND ch.channel_desc in ('Internet','Catalog') AND t.calendar_quarter_desc IN (‘2001-Q1',‘2001-Q2') GROUP BY ch.channel_class, c.cust_city, t.calendar_quarter_desc;

  38. How the Optimizer Sees It SELECT ch.channel_class, c.cust_city, t.calendar_quarter_desc, SUM(s.amount_sold) sales_amount FROM sales WHERE time_id IN (SELECT time_id FROM times WHERE calendar_quarter_desc IN(‘2001-Q1',‘2001-Q2')) AND cust_id IN (SELECT cust_id FROM customers WHERE cust_state_province='CA') AND channel_id IN (SELECT channel_id FROM channels WHERE channel_desc IN('Internet','Catalog'));

  39. Star Transform Restrictions • The star transform use is restricted: • A hint tells optimizer to not use a bitmap index • Query contains bind variables • Not enough bitmap indexes • The fact table is a remote table • The optimizer may not choose the star: • Tables have a single access path • Tables are too small to be useful • Database is in read-only mode

  40. Sales(Customer.state) Sales(Customer.state) Index key is Customer.State Indexed table is Sales Bitmap Join Indexes Sales Customer CREATE BITMAP INDEX cust_sales_bji ON Sales(Customer.state) FROM Sales, Customer WHERE Sales.cust_id = Customer.cust_id;

  41. Resumable Transactions • Suspend transactions • Ability to resume these transactions • Only errors currently handled: • SQL statements that run out of TEMP space • DML/Export/CREATE TABLE as … • Space limits • Out-of-space transaction • Exceed space quota • Full support of locally managed tablespaces • Great for large DW queries

  42. Resumable the SQL • Enable Resumable transactions ALTER SESSION ENABLE RESUMABLE TIMEOUT 1200; • The Transaction: CREATE TABLE sales_prod_dept ( prod_category, prod_subcategory, cust_id, time_id,channel_id,promo_id, quantity_sold, amount_sold ) NOLOGGING TABLESPACE transfer PARTITION BY LIST (prod_category) ( PARTITION boys_sales values ('Boys'), PARTITION girls_sales values ('Girls'), PARTITION men_sales values ('Men'), PARTITION women_sales values ('Women') ) AS SELECT p.prod_category, p.prod_subcategory, s.cust_id, s.time_id, s.channel_id, s.promo_id, SUM(s.amount_sold) amount_sold, SUM(s.quantity_sold) quantity_sold FROM sales s, products p, times t WHERE p.prod_id=s.prod_id AND s.time_id = t.time_id AND t.fiscal_year=2002 GROUP BY prod_category, prod_subcategory,cust_id, s.time_id, channel_id, promo_id ;

  43. The Proof it Stopped SELECT name, status, error_msg FROM dba_resumable;

  44. The SQL

  45. The SQL • Oracle provides many analytical functions • Cross tabular reports • Ranking functions • Percentile functions • Regression analysis • Moving windows • Ratios within a report • CASE statements

  46. CUBE, ROLLUP and GROUP functions • Efficient data access • Creates matrix of totals • Crosstabs are computed for you • CUBE provides all totals • GROUPING provides guidance of totals

  47. ROLLUP the SQL SELECTchannel_desc, calendar_month_desc, country_id, TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$ FROM sales, customers, times, channels WHERE sales.time_id=times.time_id AND sales.cust_id=customers.cust_id AND sales.channel_id= channels.channel_id AND channels.channel_desc IN ('Direct Sales', 'Internet') AND times.calendar_month_desc IN ('2002-09', '2002-10') AND country_id IN ('CA', 'US') GROUP BY ROLLUP(channel_desc,calendar_month_desc,country_id);

  48. ROLLUP Results CHANNEL_DESC CALENDAR CO SALES$ ------------- -------- -- -------------- Direct Sales 2002-09 CA 1,378,126 Direct Sales 2002-09 US 2,835,557 Direct Sales 2002-09 4,213,683 BY Channel and Month Direct Sales 2002-10 CA 1,388,051 Direct Sales 2002-10 US 2,908,706 Direct Sales 2002-10 4,296,757 BY Month Direct Sales 8,510,440 BY Channel Internet 2002-09 CA 911,739 Internet 2002-09 US 1,732,240 Internet 2002-09 2,643,979 BY Channel and Month Internet 2002-10 CA 876,571 Internet 2002-10 US 1,893,753 Internet 2002-10 2,770,324 BY Month Internet 5,414,303 BY Channel 13,924,743For Everything

  49. CUBE the SQL SELECTchannel_desc, calendar_month_desc, country_id, TO_CHAR(SUM(amount_sold), '9,999,999,999') SALES$ FROM sales, customers, times, channels WHERE sales.time_id=times.time_id AND sales.cust_id=customers.cust_id AND sales.channel_id= channels.channel_id AND channels.channel_desc IN ('Direct Sales', 'Internet') AND times.calendar_month_desc IN ('2002-09', '2002-10') AND country_id IN ('CA', 'US') GROUP BY CUBE (channel_desc,calendar_month_desc,country_id);

  50. CUBE the Results CHANNEL_DESC CALENDAR CO SALES$ -------------------- -------- -- ---------- Direct Sales 2002-09 CA 1,378,126 Direct Sales 2002-09 US 2,835,557 Direct Sales 2002-09 4,213,683 BY Channel and Month Direct Sales2002-10CA 1,388,051 Direct Sales 2002-10 US 2,908,706 Direct Sales 2002-10 4,296,757 BY Channel and Month Direct Sales CA 2,766,177 BY Channel and Country Direct Sales US 5,744,263 Direct Sales 8,510,440 BY Channel Internet 2002-09 CA 911,739 Internet 2002-09 US 1,732,240 Internet 2002-09 2,643,979 BY Channel and Month Internet 2002-10 CA 876,571 Internet 2002-10 US 1,893,753 Internet 2002-10 2,770,324 BY Channel and Month Internet CA 1,788,310 BY Channel and Country Internet US 3,625,993 Internet 5,414,303 BY Channel 2002-09 CA 2,289,865 BY Month and Country 2002-09 US 4,567,797 2002-09 6,857,662 BY Month 2002-10 CA 2,264,622 2002-10 US 4,802,459 2002-10 7,067,081 CA 4,554,487 US 9,370,256 13,924,743 Everything

More Related