1 / 33

Populating a Data Warehouse

Populating a Data Warehouse. Overview. Process Overview Methods of Populating a Data Warehouse Tools for Populating a Data Warehouse Populating a Data Warehouse by Using DTS. Source OLTP Systems. Temporary Data Staging Area. Data Marts. SQL Server. Sales. Oracle. Data Warehouse.

urbain
Télécharger la présentation

Populating a Data Warehouse

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Populating a Data Warehouse

  2. Overview • Process Overview • Methods of Populating a Data Warehouse • Tools for Populating a Data Warehouse • Populating a Data Warehouse by Using DTS

  3. Source OLTP Systems Temporary DataStaging Area Data Marts SQL Server Sales Oracle Data Warehouse Service Other Other Sales Data Hardware Data Validate, Gather , Transform Populate Data Distribute Make Data Consistent Data Warehouse Data Process Overview

  4. Validating Data • Validate and Correct Data at the Source Before You Import It • Determine and Correct Processes That Invalidate Data • Save Invalid Data to a Log for Review

  5. Making Data Consistent • Data Can Be Inconsistent in Several Ways: • Data in each source is consistent, but you want to represent it differently in the data warehouse • Data is represented differently in different sources • You Can Make Data Consistent by: • Translating codes or values to readable strings • Converting multiple versions of the same information into a single representation

  6. Transform Change buyer_name reg_id total_sales buyer_name reg_id total_sales Barr, Adam II 17.60 Barr, Adam 2 17.60 Chai, Sean IV 52.80 Chai, Sean 4 52.80 O’Melia, Erin VI 8.82 O’Melia, Erin 6 8.82 ... ... ... ... ... ... Combine buyer_first buyer_last reg_id total_sales buyer_name reg_id total_sales Adam Barr 2 17.60 Barr, Adam 2 17.60 Sean Chai 4 52.80 Chai, Sean 4 52.80 Erin O’Melia 6 8.82 O’Melia, Erin 6 8.82 ... ... ... ... ... ... ... Calculate buyer_name price_id qty_id buyer_name price_id qty_id total_sales Barr, Adam .55 32 Barr, Adam .55 32 17.60 Chai, Sean 1.10 48 Chai, Sean 1.10 48 52.80 O’Melia, Erin .98 9 O’Melia, Erin .98 9 8.82 ... ... ... ... ... ... ... Transforming Data

  7. Methods of Populating a Data Warehouse • Select the Method of Populating a Data WarehouseThat Suits Your Business Needs • Method 1: Validate, combine, and transform datain a temporary data staging area • Method 2: Validate, combine, and transform data during the loading process • Migrate Data During Periods of Relatively Low System Use

  8. Tools for Populating a Data Warehouse • What Is the Appropriate Tool to Use • Transact-SQL Query • Distributed Query • bcp Utility and the BULK INSERT Statement • DTS

  9. What Is the Appropriate Tool to Use • Format of Source and Destination Data • Location of Source and Destination Data • Import or Export of Database Objects • Frequency of Data Transfer • Interface Preference • Tool Performance

  10. CustomerSummary Customer FullName Johnson, Steve Smith, Douglas FirstName LastName Wilson, Les Steve Johnson Salinger, Paul Douglas Smith Les Wilson Paul Salinger USE northwind_mart SELECT Lastname + ', ' + Firstname As Fullname INTO CustomerSummary FROM Northwind.dbo.Customer Transact-SQL Query

  11. AccountingServer StockServer Sales Local SQL Server Suppliers Table Products Table Sales Item_Dim Table Distributed Query USE northwind_mart SELECT productname, companyname INTO item_dim FROM StockServer.sales.dbo.products p JOIN AccountingServer.sales.dbo.suppliers s ON p.supplierid = s.supplierid

  12. bcp Utility and the BULK INSERT Statement bcp Utililty BCP accounting.dbo.orders in Orderstbl.dat –c –t, -r \n–Smysqlserver –Usa –Pmypassword BULK INSERT Statement BULK INSERT Accounting.dbo.orders FROM 'C:\ordersdir\orderstble.dat' WITH( DATAFILE TYPE = 'char' FIELDTERMINATOR = '|', ROWTERMINATOR = '|\n')

  13. DTS • When to Use DTS • DTS Data Source and Destination Types • OLE DB • ODBC • ASCII text file • DTS Tools • DTS Import and Export wizards • DTS Designer • dtsrun utility • Custom • HTML • Spreadsheet

  14. Populating a Data Warehouse by Using DTS • Building a DTS Package • Transforming Data by Using an ActiveX Script • Transforming Data by Using a Lookup Query • Defining Transactions • Tracking Data Lineage • Creating a DTS Package Programmatically

  15. Building a DTS Package • Mapping Source and Destination Data • Defining Data Transformation Tasks • Creating and Saving a DTS Package • Executing a DTS Package • Scheduling and Securing a DTS Package

  16. Mapping Source and Destination Data • Mapping Columns • Decide which columns to copy • Choose the columns in the target database that map to the source columns • Mapping Data Types • Specify transformation rules • Specify levels of data conversion

  17. Defining Data Transformation Tasks • DTS Packages Contain Tasks • A Task Can: • Execute a Transact-SQL statement • Execute a script • Launch an external application • Transfer SQL Server 7.0 objects • Execute or retrieve results from a DTS package

  18. Creating and Saving a DTS Package • Creating a DTS Package • By using DTS wizards • By using DTS Designer • By using a COM interface exposed by DTS • Saving a DTS Package • COM-structured storage file • Microsoft Repository • SQL Server msdb database

  19. Executing a DTS Package • You Can Execute a DTS Package by Using SQL Server Enterprise Manager or dtsrun Command Prompt Utility • File Storage Location Determines the dtsrun Syntax dtsrun /sAccounts /uJose /nOrdersImport

  20. Scheduling and Securing a DTS Package • Scheduling a DTS Package • Use DTS Import or DTS Export wizards when you save the DTS package to the msdb database • Use SQL Server Enterprise Manager when you usethe dtsrun command prompt utility • Implementing DTS Package Security • Login permissions • Owner and user passwords

  21. Demonstration: Transferring Data by Using DTS

  22. Transforming Data by Using an ActiveX Script • Why Use an ActiveX Script • How to Use an ActiveX Script • Define a function to contain the transformation script • Specify the destination column • Specify the source columns • Use operators and VBScript or JScript functions and control-of-flow statements • Set the return code value for the function • How to Handle Errors with Return Codes

  23. CustomerSummary Customer FullName Johnson, Steve Smith, Douglas FirstName LastName Wilson, Les Steve Johnson Salinger, Paul Douglas Smith Les Wilson Paul Salinger Function Main() DTSDestination(“FullName”) = DTSSource(“Lastname”) + “, ” + DTSSource(“Firstname”) Main = DTSTransformStat_OK End Function Examples of ActiveX Scripts

  24. Demonstration: Transforming Data by Using an ActiveX Script

  25. Source Data State_lookup Abbreviation State FL Florida Lookup Table WY Wyoming AR Arkansas Transform Customer_source Customer_dim Destination Data Name State Name State D. Smith FL D. Smith Florida L. Wilson WY L. Wilson Wyoming P. Salinger AR P. Salinger Arkansas Transforming Data by Using a Lookup Query

  26. Implementing a Lookup Query • Set Up Connections to Source, Destination, and Lookup Tables • Create a Task, and Specify the Source and Destination • Add a Lookup Query Definition • Map the Source and Destination Columns, andCall the Lookup Query from the ActiveX Script

  27. Defining Transactions • You Specifically Must Add a Step or Task to the Transaction • You Can Specify When a Transaction Commits • DTS Only Supports One Transaction Per Package • MS DTC Must Be Running • The Data Provider for the Data Destination Must Support Transactions

  28. 1 2 3 Tracking Data Lineage • Using Data Lineage • Tracks history of data at package and table row levels • Provides audit trail of data transformation and DTS package execution • Implementing Data Lineage • Create the table columns in the data warehouse • Add data lineage variables to the DTS package • Map data lineage source and destination columns • Viewing Data Lineage

  29. Demonstration: Defining Transactions and Tracking Data Lineage

  30. DTS Package Steps Steps Steps Steps Steps Precedence Constraints Create Process Send Mail Bulk Insert Transfer Objects Steps Steps Steps Steps Steps Steps Connections Global Variables Tasks Execute SQL Data-driven Query Custom ActiveX Data Pump Source Columns Destination Creating a DTS Package Programmatically

  31. Correct and Validate Data at the Source Use a Temporary Data Storage Area Use an ActiveX Script or a Transact-SQL Script to Transferand Transform Data Save and Store DTS Packages in the Microsoft Repository to Maintain Data Lineage Recommended Practices

  32. Lab A: Populating a Data Warehouse

  33. Review • Process Overview • Methods of Populating a Data Warehouse • Tools for Populating a Data Warehouse • Populating a Data Warehouse by Using DTS

More Related