Download
slide1 n.
Skip this Video
Loading SlideShow in 5 Seconds..
The Oracle9i Multi-Terabyte Data Warehouse PowerPoint Presentation
Download Presentation
The Oracle9i Multi-Terabyte Data Warehouse

The Oracle9i Multi-Terabyte Data Warehouse

157 Vues Download Presentation
Télécharger la présentation

The Oracle9i Multi-Terabyte Data Warehouse

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. Session id: The Oracle9i Multi-Terabyte Data Warehouse Jeff ParkerManager Data Warehouse Development Amazon.com

  2. The Challenges • Rapidly evolving business • Growing data volumes • Do more with less

  3. The Challenges • Rapidly evolving business • New international markets • Continual innovation of features on Amazon • Buy it used • Magazine subscriptions • Marketplace Partnerships – Toys R Us, Target • Growing data volumes • Do more with less

  4. The Challenges • Rapidly evolving business • Growing data volumes • 2X growth yearly over the past 5 years • Currently 10 Terabytes of raw data • Do more with less

  5. The Challenges • Rapidly evolving business • Growing data volumes • Do more with less • Innovative use of technology and resources • Throwing money and people at the problem is not an option • Leverage existing investment in Oracle

  6. Addressing the issues • Rapidly evolving business • Denormalize only for performance reasons • Create a solution that allows new datasets to be brought in rapidly to the DW, but without high maintenance costs • Growing data volumes • Do more with less

  7. Addressing the issues • Rapidly evolving business • Growing data volumes • Dual database approach to ETL • Staging database for efficient transformation of large datasets. SQL and hash-joins allow transforms to scale in a non-linear fashion • Second database optimized for analytics • Oracle as an API • Simplifies ETL architecture • Better scalability than traditional ETL tools • Do more with less

  8. Addressing the issues • Rapidly evolving business • Growing data volumes • Do more with less • One DW schema supports all countries • Cut costs by eliminating unneeded software • Data driven Load functionality

  9. The ETL Process • Extract data from source • The Load process • Dimensional Transforms

  10. The ETL Process • Extract data from source • Can create one or more files to be loaded • Must produce Metadata upon which the Load process can depend • The Load Process • Dimensional Transforms

  11. Extract produced Metadata • Describes each field in database type terms • Changes as the dataset changes • Can reference multiple files • Very reliable • No additional overhead

  12. XML Based Metadata <DATA CHARSET="UTF8" DELIMITER="\t" ROWS=”1325987> <COLUMNS> <COLUMN ID="dataset_id" DATA_TYPE="NUMBER" DATA_PRECISION="38" DATA_SCALE="0“/> <COLUMN ID="dataset_name" DATA_TYPE="VARCHAR2" DATA_LENGTH="80“/> <COLUMN ID="CREATION_DATE" DATA_TYPE="DATE" DATE_MASK="YYYY/MM/DD.HH24:MI:SS“/> <COLUMN ID="CREATED_BY" DATA_TYPE="VARCHAR2" DATA_LENGTH="8“/> </COLUMNS> <FILES> <FILE PATHNAME="/flat/datasets_20020923_US.txt.1“/> <FILE PATHNAME="/flat/datasets_20020923_US.txt.2“/> </FILES> </DATA>

  13. The ETL Process • Extract data from source • The Load Process • Makes extensive use of External Tables • MERGE and Bulk Insert • Contains integrated DBA tasks • Every load is tracked in an operational database • Dimensional Transforms

  14. The Load Process

  15. The Load Process • External Tables • access to files on the operating system • Is a building block in a broader ETL process • MERGE & Bulk Insert • Integrated DBA tasks

  16. The External Table • Created by using Metadata from the Extract process • Data is read-only • No indexes • Use DBMS_STATS to set number of rows

  17. Example External Table • Copy the data to the database server • Data must reside in a file system location specified by the DBA’s. - create directoryDAT_DIR as ‘/stage/flat’

  18. Example External Table 2. Create the external table using the DML from the extract. CREATE TABLE XT_datasets_77909 ( dataset_id NUMBER , dataset_name VARCHAR2(80) , creation_date DATE ,created_by VARCHAR2(8) ) ORGANIZATION EXTERNAL( TYPE ORACLE_LOADER DEFAULT DIRECTORY dat_dir ACCESS PARAMETERS( records delimited by newline characterset UTF8 fields terminated by '\t' LOCATION (‘/flat/datasets_20020923_US.txt' )

  19. The External Table • No pre-staging of data • Ability to describe a flat file to Oracle • Handles horizontally partitioned files • Good error messaging

  20. The Load Process • External Tables • MERGE • Can be run in parallel • Combined with external table provides a powerful set of ETL tools • Integrated DBA tasks

  21. MERGE • Allows for update or insert in a single statement • If key value already exists • Yes, update row • No, insert row • MERGE statement is auto-generated • Row level column transforms are supported

  22. MERGE

  23. MERGE example MERGE into DATASETS ds USING ( SELECT ds.dataset_name ,ds.creation_date ,nvl(created_by,’nobody’) as created_by ,sysdate as last_updated FROM XT_datasets_77909 xt ) src On ( xt.dataset_id = ds.dataset_id ) When matched then UPDATE SET ds.dataset_name = src.dataset_name ,ds.creation_date = src.dataset_name ,ds.created_by = src.created_by ,ds.last_updated = sysdate when not matched then INSERT( dataset_name, creation_date, created_by, last_updated ) VALUES( dataset_name, creation_date, created_by, sysdate )

  24. MERGE • Issues we faced • Duplicate records in the dataset • NESTED-LOOPS from external table • Parallelism is not enabled by default • Bulk Load partition determination

  25. The Load Process • External Tables • MERGE • Integrated DBA tasks • Reduces workload required by the DBA team • Streamlines the load process • Eliminates human error

  26. Integrated DBA Tasks • Provided by the DBA team • Managed by the DBA team • ETL team does not need special knowledge of table layout

  27. Integrated DBA Tasks • Truncate Partition developer makes call truncate_partition( ‘TABLE-NAME’, partition-key1, partition-key2, partition-key3 ) DBA utility translates this and executes alter table TABLE-NAME drop partition dbi20020930_101;

  28. Integrated DBA Tasks • Analyze Partition developer makes call analyze_partition( ‘TABLE-NAME’, partition-key1, partition-key2, partition-key3 ) DBA utility translates this and executes dbms_stats.gather_table_stats(ownname , tabname , partname , cascade , estimate_percent, granularity);

  29. Integrated DBA Tasks • Return Partition Name developer makes call get_partition_name( ‘TABLE-NAME’, partition-key1, partition-key2, partition-key3 ) DBA utility translates this and returns the appropriate name of the partition. This is very useful when bulk loading tables.

  30. Integrated DBA Tasks • Partitioning utilities • Helps to streamline the process • Reduces workload of DBA team • Helps to eliminate the problem of double loads for Snapshot tables and partitions

  31. The Load Process • External Tables • Provides access to flat files outside the database • MERGE • Parallel “upsert” simplifies ETL • Row level transforms can be performed in SQL • Integrated DBA tasks • Reduces workload required by the DBA team • Streamlines the load process • Eliminates human error • Loads are repeatable processes

  32. Summary • Reduction in time to integrate new subject areas • Oracle parallelism scales well • Eliminated unneeded software

  33. Summary • Oracle has delivered on the DW promise • Oracle External table combined with MERGE is a viable alternative to other ETL tools • ETL tools are ready today

  34. & Q U E S T I O N S A N S W E R S

  35. Reminder – please complete the OracleWorld session surveyThank you.