1 / 15

Chapter 8: The Physical Data Warehouse

Chapter 8: The Physical Data Warehouse. Data Warehousing Lab. 윤 혜 정. 목차. The VLDB Supporting a Star Schema Parallelism and Performance of Physical Data Warehouse Tablespace Segregation Other Physical Considerations / Oracle Internet File System Physical Implementation Approach. The VLDB.

bebe
Télécharger la présentation

Chapter 8: The Physical Data Warehouse

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 8:The Physical Data Warehouse Data Warehousing Lab. 윤 혜 정

  2. 목차 • The VLDB • Supporting a Star Schema • Parallelism and Performance of Physical Data Warehouse • Tablespace Segregation • Other Physical Considerations / Oracle Internet File System • Physical Implementation Approach

  3. The VLDB • What is VLDB? • -managing a 30GB database • -(10 terabyte database and the 30GB is not a true VLDB) • Window of Opportunity • -The amount of time in a 24-hour period that the database is quite and within nightly management task can be applied. • -VLDB define : • cannot be managed during the nightly window of opportunity • Size requires to be into broken • Implementing a VLDB

  4. The VLDB • Tables and Partitions • Partitioning breaks up a single table into smaller pieces. • 3 method to partition table 1. Range Partitioning 2. Hash Partitioning 3. Composite Partitioning / Subpartitioning • Nested Table • A table within table • provide a way to simplify the table structure. • Transportable Tablespaces * move complete tables from one database to another • Source and target database are • Same block size, NLS character, hardware platform • Source tablespace is unique • Transportable tablespace must be self-contained.

  5. Supporting a Star Schema • Systematic Denormalization • violate the rule that relational database architect. • enhance the performance of the warehouse by reducing the need to understand every possible join path. • process of compressing lots of tables into fewer tables • Star schema Implementation • Materialized View -Actually store the data not just the SQL definition of the data -Together with Dimensions Oracle8i has powerful aggregate navigation functions.

  6. Supporting a Star Schema Define and populate materialized View Create materialized view revenue_summary build immediate refresh compete as select r.product_id, r.location_id, sum(r.revenue_dollars) sum_revenue_dollars from revenue r group by r.product_id, r.location_id; REVENUE table is commited, the materialized view is also update Materialized View log Create materialized view log on revenue with rowid (product_id, location_id, revenue_dollars) including new values Materialized view log should be create before the Materialized view itself. DBMS_OLAP execute dbms_olap.recommend_mv(null, 4096000, null, 0)

  7. Supporting a Star Schema • Dimension -used to define a hierarchy of data -database what columns are subsets of other columns Create dimension time_dimension level day is time.day level month is time.month level quarter is time.quarter level year is time.year hierarchy calendar_year ( day child of month child of quarter child of year); Day level Create materialized view revenue_daily_summary build immediate refresh complete as select r.product_id, r.day, sum(r.revenue_dollars) from revenue r group by r.product_id, r.day Month level Create materialized view revenue_monthly_summary build immediate refresh complete as select r.product_id, r.month, sum(r.revenue_dollars) from revenue r group by r.product_id, r.month

  8. Supporting a Star Schema Cost-based Optimizer (비용 기준 옵티마이저) 처리 방법들에 대한 비용을 산정해 보고 그 중에서 가장 적은 비용이 들어가는 처리방법을 선택한다. Gathering Statistics for Cost-based Optimizer • Measurement of the expected resource • Data distribution and storage characteristics influence the choice the optimizer makes. The SQL ANALYZE command • Tables, partitions, subpartitions, indexes, and clusters • Estimate or compute • Estimate table and compute index • Estimate • Random sample of rows • Significantly less time-consuming • Less restrictive lock

  9. Supporting a Star Schema • How to Analyze Object • Using SQL*Plus • Using a procedure Execute dbms_utility.analyze_schema(‘CVAN’,’ESTIMATE’, NULL, 20) • When to Analyze Object • Static objects and Dynamic objects • Where to view Statistics • Stale Statistics • Consistency with Statistic Collection Select table_name, num_rows From user_table Where num_rows is not null Analyze result Table_name num_rows Sale 19009 ITEM 2199 Sql>select count(*) from sale COUNT(*) 894981 Sql>select count(*) from item COUNT(*) 53219

  10. Parallelism and Performance of Physical Data Warehouse • MMP(Massively parallel processor) & SMP(Symmetric parallel processor) • Parallel Query - be controlled by a dispatcher • Parallelism and the Warehouse Oracle8i can parallelize more than 20 operations • Table Scan • Not in • Group by • Select distinct • Aggregation • Order by • Create table as select • Index maintenance • Inserting rows from other tables • Star optimization

  11. Parallelism and Performance of Physical Data Warehouse • Degree of Parallelism - number of query process associated with a single operation • At the Statement level using Hint (Hint used to influence the way the optimizer processes queries) • At the Object level the best place to define degree of parallelism. (example : create table statement includes a parallel (degree n)

  12. Parallelism and Performance of Physical Data Warehouse • Choosing a Degree of Parallelism -defined for the objects and the number of query server processes the instance has available. • Turning on Parallel Query at the Instance Level • PARALLEL_MIN_SERVERS • PARALLEL_MAX_SERVERS • PARALLEL_SERVER_IDLE_TIME If number of CPUs =1 then process query serially Else if number of CPUs >> disk drives upon which object(s) stored then first degree = number of disk drives as degree Else first degree = number of CPUs End if

  13. Tablespace Segregation • System Support Tablespaces • -SYSTEM, ROLLBACK, TEMPORARY, TOOLS, USERS • Application Tablespaces • Estimate the space required • Create Oracle accounts • Separate the data and index containers • Pointing Users at Tablespaces

  14. Other Physical Considerations/ Oracle Internet File System • National Language Support • NLS parameters : reads an initialization parameter file(init.ora) • NLS_DATE_FORMAT • NLS_TERRITORY • Oracle Internet File System To manage unstructed data within the Oracle8i kernel Rolf GrỐenveld’s Rolf Groenveld’s

  15. Physical Implementation Approach Step 1 : Plan the Database Instance  language, space planning and allocation, Security, Performance structures, Object allocation(table, index, partition) Step 2 : Perform the Installation Step 3 : Test the Installation • The cost-based optimizer • Parallel processing • Indexing schemes • The Dimensions and materialized views

More Related