550 likes | 666 Vues
Oracle9 i for Data Warehousing John Abrahams Technology Sales Consultant Oracle Nederland. The Old Way: Fragmented Information Supply Chain. OLAP Engine. Data Integration Engine. Data Warehouse Engine. Mining Engine. Protracted implementation and maintenance cycle
E N D
Oracle9i for Data Warehousing John Abrahams Technology Sales Consultant Oracle Nederland
The Old Way:Fragmented Information Supply Chain OLAPEngine DataIntegrationEngine Data Warehouse Engine MiningEngine • Protracted implementation and maintenance cycle • Synchronization and currency issues • Information Management chaos
The New Way: Oracle9i Oracle9i Data Warehousing ETL OLAP Data Mining • Single business intelligence platform • Reduce administration and implementation costs • Faster deployment • Improved scalability and reliability
Oracle9iDatabaseSingle business-intelligence data server Metadata Relational OLAP Data Mining ETL
Oracle9iApplication ServerRuns All Your Business Intelligence Applications Metadata Portal Query & Reporting BI Components Web Site Analysis
Oracle Databasefor Data WarehousingContinuous Innovation Oracle 7.3 Oracle 8.0 • Hash Join • Bitmap Indexes • Parallel-Aware Optimizer • Partition Views • Instance Affinity: Function Shipping • Parallel Union All • Asynchronous Read-Ahead • Histograms • Anti-Join Oracle8i • Partitioned Tables and Indexes • Partition Pruning • Parallel Index Scans • Parallel Insert, Update, Delete • Parallel Bitmap Star Query • Parallel ANALYZE • Parallel Constraint Enabling • Server Managed Backup/Recovery • Point-in-Time Recovery • Summary Management • New Partitioning Schemes • Resource Manager • Progress Monitor • Adaptive Parallel Query • Server-based Analytic Functions • Transportable Tablespaces • Direct Loader API • Functional Indexes • Partition-wise Joins • Security Enhancements • and more ...
Automatic Memory Tuning ETL Infrastructure Change data capture External tables Table functions Upserts Multi-table INSERTs Resumable statements Transportable tablespace enhancements List Partitioning Internal enhancements for: parallel query aggregation cost-based optimization Bitmap Join Indexes Analytic SQL fns Grouping sets FIRST/LAST aggregates Inverse distribution Hypothetical rank Proactive query governing Enhancements to MVs Broader refresh and rewrite capabilities More sophisticated summary advisor Full Outer Joins WITH-clause New Oracle9i RDBMS featuresExtending Oracle’s leadership
Real Application Clusters(RAC) • Exploiting clustered systems • Shared Cache ArchitectureOne database • Avoiding application downtime by single node failure • Allows applications to become • Highly scaleable • Highly available
Shared Cache Data A-Z Cache Fusion means Scalability • Protocol that allows instances to combine their data caches into a shared global cache • Global Cache Service (GCS) coordinates sharing • Key features are • Direct sharing of volatile buffer caches • Efficient inter-node messaging framework • Fast recovery from node failures using cache and CPU resources from all surviving nodes
Managing Large Volumes of Data • Partitioning and parallelism are crucial for VLDB • Parallelism for all operations • DBA operations: loading, index-creation, table-creation, data-modification, backup and recovery • End-user operations: Queries • Unbounded scalability: Real Application Clusters • Partitioning provides ‘incremental’ operations for: • Data loading • Indexing • Referential Integrity • Backup and recovery
How Parallel Execution Works • With serial execution only one process is used • With parallel execution • One parllel execution coordinator process • Many parallel execution servers • Table is dynamically partitioned into granules
VLDB Manageability and Performance Constraints • Table availability: • Large tables are more vulnerable to disk failure • It is too costly to have a large table inaccessible for hours due to recovery • Large table manageability • They take to long to be loaded • Indexes take too long to be built • Partial deletes take hours • Performance considerations • Large table and index scans are costly • Scanning a subset improves performance
Benefits of Partioning • Availability • Partions can be independently managed • Backup and restore operations can be done on individual partitions • Partitions that are unavailable do not affect queries on DML operations on other paritions that use the same table or index • Manageability • A partition can be moved from one tablespace to another • A partition can be dropped, truncated, added • A partition can be divided at user-defined value
Benefits of Partioning (2) Performance • The optimizer eliminates partitions that not have to be scanned • Partitions can be scanned in parallel • Partitions can be load-balanced across physical devices • Join operations can be optimized to “join by the partition”
Partitioning Methods • Range Partitioning • Hash partitioning • List Partioning • Composite Partioning Hashpartitioning Rangepartitioning List Partioning Composite partitioning
Partitioned Indexes • Indexes can be partitioned like tables • Partitioned or nonpartioned indexes can be used with partitioned or nonpartitioned tables • Partioned indexes can be • Global or local • Prefixed or nonprefixed
Rolling Window Operations Partitioned Tables with Local Indexes: ... 96-Jun 01-Feb 96-Apr 96-May 01-Mar
Rolling Window Operations Partitioned Tables with Local Indexes: 1. Load and index new month ... 96-Jun 01-Apr 96-Apr 96-May 01-Feb 01-Mar
Rolling Window Operations Partitioned Tables with Local Indexes: 1. Load and index new month 2. Add new month to table ... 01-Apr 96-Jun 96-Apr 96-May 01-Feb 01-Mar
Rolling Window Operations Partitioned Tables with Local Indexes: 1. Load and index new month 2. Add new month to table 3. Remove old month from table ... 01-Apr 96-Jun 96-Apr 96-May 01-Feb 01-Mar
Rolling Window Operations • Partitioned Tables with Local Indexes: • New data has been loaded with virtually no disruption • Powerful methodology for managing time-based updates to the Warehouse ... 01-Apr 96-Jun 96-May 01-Feb 01-Mar
List Partitioning Same benefits as rolling window: data is partitioned according to business requirements Online Queries Europe Region Americas Region Asia Region 1200 GMT Maintenance
1200 GMT 2000 GMT List Partitioning Same benefits as rolling window: data is partitioned according to business requirements Online Queries Europe Region Americas Region Asia Region Maintenance
1200 GMT 0400 GMT 2000 GMT List Partitioning Same benefits as rolling window: data is partitioned according to business requirements Online Queries Europe Region Americas Region Asia Region Maintenance
Table Compression: What is it? • Tables can be compressed • Compression can also be specified at the partition level • Indexes and index-organized tables are not compressed • Typical compression ratios are 3:1 - 5:1 • Compression is dependent upon the actual data • Compression algorithm based on removing data redundancy • All DDL/DML commands are supported on compressed tables
Table Compression:What isn’t it? • This is not a generic ‘zip’-style compression • Not all tables will have good compression • Compression algorithm guarantees that compression will never increase size of table • Most large DW tables seem to compress well • Compression happens between column/row values, not within column/row values • Long character strings are not compressed unless the exact same string appear multiple times • LOB/BLOB columns are not compressed
Table Compression:How it works • Duplicate values are stored in symbol table for each block <rowid> ‘650-506-7000’‘650-123-4567’ <rowid> ‘650-506-7000’‘650-506-7001’ <rowid> ‘650-506-7000’‘650-456-7890’ <rowid> ‘650-506-7000’‘650-098-7654’ <rowid> ‘650-506-7000’‘650-123-4567’ <rowid> ‘650-506-7001’‘650-123-4567’ <rowid> ‘650-506-7001’‘650-123-4567’ … Uncompressed Compressed <symbol table: <A>= ‘650-506-7000’, <B>=‘650-506-7001’, <C>=‘650-123-4567’> <rowid> <A> ‘650-123-4567’ <rowid> <A> <B> <rowid> <A> ‘650-456-7890’ <rowid> <A> ‘650-098-7654’ <rowid> <A> <C> <rowid> <B> <C> <rowid> <B> <C> …
Table Compression:Usage • Creating compressed tables: • CREATE TABLE T1(id integer) COMPRESS; • Converting tables to compressed tables: • ALTER TABLE T3 MOVE COMPRESS; • Creating compressed tablespaces: • CREATE TABLESPACE tabspace_2 DATAFILE 'diska:tabspace_file2.dat' SIZE 20M DEFAULT COMPRESS STORAGE ( … );
Table Compression: Performance Impact • Queries on compressed tables may observe minor performance degradation • Performance impact depends upon the query • Many queries will be faster • Compression reduces IO but increases CPU utilization • For a set of heterogeneous queries, performance should degrade by no more than 5% • Load and direct-path INSERT performance will be slower • Data must be compressed as it is added to the table
Table Compression: When to Use it • Data warehouses containing large volumes of historical data • Compress all of the older data in a data warehouse • Integrate compression into the ‘rolling window’ paradigm • For example, most recent 3 months of data could be stored uncompressed and the previous 21 months could be stored compressed • Materialized views and other derived data sets • Generally, compression should be applied to data that is infrequently updated
Manage large numbers of users • Key requirements: • Guarantee optimal resource utilization all the time • Provide the appropriate amount of resources to every job or query based on priority and system load • Pro-actively prevent ‘runaway’ queries • Pro-actively prevent system overloading • Managing large numbers of users should be simple and automated
Appropriate Resources to Each Query • CPU • Business-critical processes receive more CPU • Database Resource Manager allows DBA to assign CPU resources to groups of users • Memory • Oracle9i dynamically allocates runtime memory based on current available memory and each query’s requirements • Parallelism • Degree of parallelism is dynamically chosen based on available resources and each query’s requirements
Automatic Runtime Memory Tuning • One parameter: PGA_AGGREGATE_SIZE = <size> • Dynamic allocation of ‘runtime’ memory based upon each query’s requirements • In data-warehouse environments, >50% of a server’s physical memory is typically used for query ‘runtime’ memory • Benefits: • Reduced overall memory usage • Improved throughput • Simplified tuning
Pro-active management of DW Workloads • Predictive Query Governing and Dynamic Re-prioritization: • Queries which are estimated to take longer than an DBA-specified limit will abort or be ‘de-prioritized’ • Automatic Queuing: • A limit can be set on the number of active session for each group of users; queries submitted which exceed this limit will be queued • Via Database Resource Manager
Example Scenario • Power Users • Up to 70% of the CPU resources • Any degree of parallelism • Any query which is expected to take over one hour will be migrated to background • Report Users • Up to 20% of the CPU resources • No parallelism • Limit of 40 concurrent queries • Any query which is expected to take over 20 minutes will be aborted • Background Jobs • Up to 10% of the CPU resources • Any degree of parallelism • Limit of 5 concurrent queries
Fast Query Performance • The best approach for every query • Integrated • Comprehensive Parallel Operations Materialized Views Query Optimizer Partitioning Access & Join Methods
Bitmap Indexes • The most common index type in Oracle DW environments • Bitmap indexes introduced in Oracle 7.3 • Bitmap join indexes introduced in Oracle9i • Oracle has over a dozen patents for bitmap index technology • Oracle provides patented compression technique for bitmap indexes • Bitmap indexes are 3-20x smaller than b-tree indexes • Less storage yields better query performance and more indexed columns
<Blue, <rowid>, 1000100100010010100> <Green, <rowid>, 0001010000100100000> <Red, <rowid>, 0100000011000001001> <Yellow, <rowid>, 0010001000001000010> Bitmap indexes introduction Structure of a bitmap index CREATE BITMAP INDEX PROD_COLOR ON PROD(COLOR) Separate ‘bitmap’ created for each value of the color column A high-level b-tree structure is created so that each bitmap can be located
Bitmap indexes characteristics • Columns with Low-to-Medium Cardinality • ‘Set-based’ manipulation of data • Especially good for large, complex queries • Orders of magnitude performance improvement • Fully integrated within Oracle9i • Created and managed similar to other Oracle9i indexes • Used to accelerate single-table access, joins, and aggregation • Transparently selected by the query optimizer
Bitmap join indexes Sales Customer CREATE BITMAP INDEX cust_sales_bji ON Sales(Customer.state) FROM Sales, Customer WHERE Sales.cust_id = Customer.cust_id;
Bitmap join indexes Sales Customer CREATE BITMAP INDEX cust_sales_bji ON Sales(Customer.state) FROM Sales, Customer WHERE Sales.cust_id = Customer.cust_id; Sales(Customer.state) Index key is Customer.State
Bitmap join indexes Sales Customer CREATE BITMAP INDEX cust_sales_bji ON Sales(Customer.state) FROM Sales, Customer WHERE Sales.cust_id = Customer.cust_id; Sales(Customer.state) Sales(Customer.state) SELECT SUM(SALES.DOLLAR_AMOUNT FROM Sales, Customer WHERE Sales.cust_id = Customer.cust_id AND CUSTOMER.STATE = ‘California’; Index key is Customer.State Indexed table is Sales
Materialized Views CREATE MATERIALIZED VIEW sf_sales AS SELECT * FROM sales WHERE city_name = ’SAN FRANCISCO’ • Currently, indexes provide fast path access to specific data • Materialized views work on the same principle • A Materialized view is an instantiation of a SQL statement - a view with data storage • Materialized views can be partitioned, indexed separately • Used for query rewrite to increase performance • Rewrites are transparent to applications • Rewrites do not require any special privileges CREATE Sales SF_Sales SELECT prod_code FROM sales WHERE city_name = ’SAN FRANCISCO’ SELECT prod_code FROM sf_sales
SQL support for analytic calculations • Why enhance the RDBMS for analytic calculations? • Benefits • Performance • Scalability • Simpler SQL development
Analytic Functions: Examples • Rank • Top 10 sales-reps in each region • Moving Window • Today’s stock price minus 200-day moving average • Period-over-period comparisons • Percentage growth of Jan-99 sales over Jan-98 • Ratio-to-report • January’s sales as a percentage of the entire year’s
Platform for Business Intelligence:Data Warehousing Data Warehousing • Foundation of the Business Intelligence Platform • More data • More users • Faster • Simple management • Oracle9i introduces dozens of new DW features ETL OLAP Data Mining Oracle9i