1 / 24

Transportation: Loading Warehouse Data

Transportation: Loading Warehouse Data. Chapter 12. Transporting data into the Warehouse. Loading moves the data into the warehouse Loading can be time-consuming: - Consider the load window. - Schedule the task; automates all processes.

fala
Télécharger la présentation

Transportation: Loading Warehouse Data

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Transportation: Loading Warehouse Data Chapter 12

  2. Transporting data into the Warehouse • Loading moves the data into the warehouse • Loading can be time-consuming: - Consider the load window. - Schedule the task; automates all processes. • Initial load moves large volumes • Subsequent refresh moves smaller volumes • Business determines the cycle

  3. Extract Processing Environment • After each time interval, build a new database • Run queries Operational database T1 T2 T3

  4. Warehouse Processing Environment • Build a new database • After each time interval, add changes to database • Archive or pure oldest data • Run queries Operational database T1 T2 T3

  5. First-Time Load • Single event that populates the data with historical data • Involves large volume of data • Employs distinct ETT tasks • Involves large amounts of processing after load Operational database T1 T2 T3

  6. Refresh • Performed according to a business cycle • Simple task • Less data to load than first-time load • Less-complex ETT • Smaller amounts of postload processing Operational database T1 T2 T3

  7. Building the Transportation Process Specification • Techniques and tools • File transfer methods • The load window • Time window for other tasks • First-time and refresh cycle • Connectivity bandwidth

  8. Building the Transportation Process • Test the proposed techniques • Document proposed load • Gain agreement on the process • Monitor • Review • Revise

  9. Granularity Low-level grain - Expensive, high level of processing, more disk, detail High-level grain - Cheaper, less processing, less disk, little detail • Important design and operational issue • Space requirements - Storage - Backup - Recovery - Load

  10. Transportation Techniques • Tools • Utilities and 3GL • Gateways • Customized copy programs • Replication • FTP • Manual

  11. Transportation Technique Considerations • Tools are comprehensive but costly. • Data-movement utilities are fast and powerful. • Gateways are not always the fastest method: - Access other databases - Supply dependent data marts - Support a distributed environment - Provide real-time access if needed

  12. Using SQL*Loader to Load Data Control file Input files • Fastest load mechanism • Direct path • Parallel and unrecoverable • Direct-load INSERT (Oracle8) • Direct-path load API (Oracle8i) SQL*Loader Bad files Log files Discard files

  13. Direct-Path Load API in Oracle8i • Allows ETT and other tools to load Oracle databases efficiently • Permits load behavior to be customized • Gives direct-path load performance • Provided complete access to all direct-load functionality using OCI

  14. More Transportation Technique Considerations • Use customized programs as a last resort • Replication is limited by data-transfer rates

  15. Postprocessing of Loaded Data Transform Extract Transport Create Indexes Generate keys Postprocessing of loaded data Summarize Filter

  16. Indexing Data • Before load: fast index reenablement • During load: adds time to load window • After load: adds time to load window Index Operational databases Staging files Warehouse database

  17. Unique Indexes • Disable constraints load • Enable constraints to create index Disable constraints Load data Enable constraints Create index Catch errors Reprocess

  18. Creating Artificial Keys • Use generalized or derived keys • Maintain the uniqueness of a row • Use an administrative process to assign the key • Concatenate operational key with number: - Easy to maintain - Cumbersome keys - No clean value for retrieval

  19. Creating Unique Keys for Records • Assign a number from a list: - No semantic meaning - Extract operations must reference table assign numbers Update metadata • Verdict 109908 1

  20. Creating Summary Tables • CTAS • pCTAS Summary data Warehouse Data marts

  21. Verifying Data Integrity • Load data into intermediate file • Compare target flash totals with totals before load Flash totals Counts and amounts File 1 Load Load File 1 File 2 File 2 Preserve, inspect, fix, then load Intermediate file Warehouse

  22. Steps for Verifying Data Integrity

  23. Standard Quality Assurance Checks • Load status • Completion of the process • Completeness of the data • Data reconciliation • Violations • Reprocessing • Comparison of counts and amounts

  24. Summary This lesson discussed the following topics: • First-time load considerations • Techniques for transporting data • Tasks involved in the postload processing stage

More Related