1 / 24

Data Warehousing Designing the data warehouse

Data Warehousing Designing the data warehouse. Instructor: Peter Rawsthorne. Class 2 review. Go to class2.ppt Completion of install TSQL Managing the Project Database design of the data warehouse. OLTP vs. DSS. OLTP Normalization Entity Relationships Classic SDLC DSS Star Schema

tonya
Télécharger la présentation

Data Warehousing Designing the data warehouse

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data WarehousingDesigning the data warehouse Instructor: Peter Rawsthorne

  2. Class 2 review • Go to class2.ppt • Completion of install • TSQL • Managing the Project • Database design of the data warehouse

  3. OLTP vs. DSS • OLTP • Normalization • Entity Relationships • Classic SDLC • DSS • Star Schema • Snowflake Schema • Very Iterative

  4. Prerequisites • The OLTP data model • Technology selection • Sizing the data warehouse • bytes per row • number of rows per table (fact & dimensions) • time intervals • Collecting informational requirements

  5. Technology selection • the volume of data to be accommodated, • the speed with which data is needed, • the history of the organization, • which level of data is being built, • how many users there will be, • what kind of analysis is to be performed, • the cost of technology, etc.

  6. Building the data model • The corporate data model • Other sources • The mid-level data • Transforming the data model • The first iteration • The system of record • Transforming the data • Scheduling the transformation

  7. The corporate data model • the major subjects of the enterprise, • the relationships between the subjects, • the creation of an ERD (entity relationship diagram), • for each major subject area: • the keys(s) of the subject, • the attributes of the subject, • the subtypes of the subject, • the connectors of one subject area to the next, • the grouping of attributes.

  8. The corporate data model

  9. Other sources • functional decomposition, • data/process matrices, • data flow diagrams, • state transition diagrams, • HIPO charts, • pseudocode, et al.

  10. mid-level model

  11. Each subject has a mid-level

  12. Transforming the data model • the removal of purely operational data, • the addition of an element of time to the key structure of the data warehouse if one is not already present, • the addition of appropriate derived data, • the transformation of data relationships into data artifacts, • accommodating the different levels of granularity found in the data warehouse, • merging like data from different tables together,

  13. Removing operational data

  14. Adding the element of time

  15. Adding derived data

  16. Creating relationship artifacts • An artifact of a relationship is merely that part of a relationship that is obvious and tangible at the moment the snapshot of data is made for the data warehouse.

  17. MOC See P02_DWS.PPT power point presentation

  18. Exercise • 30 minutes • Break into groups of three • Use on of the existing SQL7 databases and create a star schema logical design using this database • Be prepared to present your design.

  19. The first iteration • HOW MUCH DATA TO LOAD? • THE FIRST ITERATION SHOULD CONTAIN DATA THAT IS LARGE ENOUGH TO BE MEANINGFUL AND SMALL ENOUGH TO BE QUICKLY DOABLE. • FISHING IN THE RIGHT POND • Classic functional areas; • finance, • accounting, • sales, • marketing • SELECTING A SUBJECT AREA • Marketing campaign success • Inventory turn-over

  20. System of Record • The system of record is the data that is the: • most complete, • most accurate, • most timely, • has the best structural conformance to the data model, and • is nearest to the point of operational entry.

  21. The identification of the system of record is the most complex activity in the building of the first iteration of data warehouse development. There are MANY circumstances that must be accounted for in the specification of the system of record: multiple sources of data, no sources of data under certain cases, reformatting of data, summarization of data, conversion of data, recalculation of data, alteration of key structure, restructuring of data attributes, conditional merging of data, and so forth. System of Record

  22. Transforming the data

  23. Scheduling the transformations

  24. Scheduling the transformations • The frequency of execution of transformation depends on many things, such as: • how much data will be transformed, • whether a delta file is being used, • the complexity of the transformation, • the business need to see the updated data warehouse, • the availability of the operational data, • how frequently system checkpoints are being done, • the amount of summarization and aggregation that needs to be done, and so forth.

More Related