1 / 25

Lecture 5

Lecture 5. Themes in this session Building and managing the data warehouse Data extraction and transformation Technical issues. Building the data warehouse. Basic guidelines for the creation of a DW. Create corporate sponsors and plan thoroughly

reed
Télécharger la présentation

Lecture 5

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 5 Themes in this session • Building and managing the data warehouse • Data extraction and transformation • Technical issues

  2. Building the data warehouse

  3. Basic guidelines for the creation of a DW • Create corporate sponsors and plan thoroughly • Determine a scalable architectural framework for the DW • Identify and document all assumptions, conflicts and issues at the start of the project • Choose methodology and tools which are compatible with the organisation • Take the continuous nature of the DW life cycle into account • Ensure a thorough data analysis and resolve all data conflicts • Learn from your experience and above all, learn from your mistakes

  4. The data warehouse life cycle Investigation On-going data administration Analysis of current environment Implementation Identify requirements Development Identify architecture Data warehouse design

  5. Creating a strategy for a data warehousing • Identification of current information strategy (usually an implicit strategy) • Internal appraisal of the strategy • External appraisal of the strategy • Perform a strategic gap analysis • Enumerate strategic alternatives which can be achieved through the application of a DW • Choose among alternatives on the basis of the organisations mission, objectives and resources • Formulate a strategic statement which specifies the organisation’s utilisation of its information resources and the role the DW is to play in this utilisation

  6. Building a business case for a data warehouse • Describe the AS-IS situation • Identify business goals that the DW will help to fulfil • Identify business problems that the DW can be used to solve • Describe the TO-BE situation • Explain how the data warehouse will be used to evolve the organisation from the AS-IS to the TO-BE situation • Calculate the expected ROI for the data warehouse • identify all costs associated with the data warehouse • quantify all the benefits between the AS-IS and TO-BE situation

  7. Creating a project plan • What is the project scope? • determine the scope of the data • determine the role of technology • determine any temporal considerations • What is the business reason? • identify the essential purpose of the project • identify business drivers • What are the critical success factors for the project? • identify critical objectives • identify critical tasks and activities • What are the resource constraints on the project? • Determine the need for resources • determine the resource availability

  8. Output from the planning phase • A set of activities to be performed and a set of requirements on these activities (performance metrics) • Documented business drivers • Definition of scope of data • Defined temporal scope • Business reasons • The overall approach • Participants and their roles • Assumptions and constraints • Project management strategy

  9. Data warehouse development activities Architecture definition Decision maker needs Source system analysis Transform design Physical database design Warehouse development Data modelling End-user access development End-user access design Subject area analysis Planning and project initiation Warehouse populate and implement End-user access definition

  10. Project delivery tactics • First things first • Market the project • Adopt a customer-focused orientation • Deliver everything well Note: the project must always be able to show progress and have the ability to deliver business value

  11. Focus areas for the management of a data warehouse • Monitor and manage data warehouse activity • Monitor and manage data warehouse data • Monitor and manage security in the data warehouse • Monitor and manage the data warehouse data model • Monitoring and managing data warehouse metadata • Monitoring and managing the integration and transformation interface • Monitoring and managing the demands of the data warehouse’s business environment

  12. Staffing requirements for initial data warehouse development and subsequent DW management • Data warehouse management and maintenance • Data warehouse administrator • Data warehouse organisational change manager • Database administrator • Data warehouse maintenance developers • Metadata Manager • Analysis and design • Business requirements analysts • User groups • Data warehouse architect • Data procurement • Data quality analyst • Data acquisition developer • Data access developer • IS executive sponsor

  13. Data warehouse end-user roles and responsibilities • Support roles • Iteration sponsors • Subject matter experts • User support technician • User types • Unlimited ad hoc user access users • Limited ad hoc access users • Predefined application users • Data warehouse initial and ongoing end-user staffing • Subject matter experts • User support technician

  14. Data extraction and transformation

  15. Data Quality (1) • Data should be accurate • Data should be stored according to data type • Data should have high integrity • Data should be consistent • Databases should be well designed • Data should not be redundant • Data should follow business rules • Data should correspond to established domains

  16. Data quality (2) • Data should be timely • Data should be well understood • Data should be integrated • Data should satisfy the needs of the business • Users should be satisfied with the data and the information derived from the data • Data should be complete • There should be no duplicate records • There should be no data anomalies

  17. A four-phase process to achieve high data quality • Data investigation • parsing • lexical analysis • pattern investigation • data typing • Data conditioning and standardisation • Data integration • Data Survivorship and formatting

  18. Fundamental types of data transformation • Simple transformation • data type conversion • date/time format conversions • field decoding • Cleansing and scrubbing • valid values • complex reformatting • Integration • simple field level mappings • complex integration • Aggregation and summarisation

  19. Other transformations • Operating system conversions • Hardware architecture conversions • affects the structure of data • affects the structure of programs running against the data • affects the computer operations needed for each environment • the available software which makes the different environments run • Application conversions

  20. Types of source system extracts • Point-in-time snapshots • scheduled at specific points in time • efficient method for users to pinpoint specific points in time or ranges of time • unfortunately requires nearly a complete read of all operational sources of data • Significant business events • non-predetermined events drive the capture of data • captured as a snapshot of relevant data entities • triggered when a completion event is performed • Delta data • see next slide...

  21. Types of source system extracts - Delta data Delta data is both new and changed data, it represents changes from one point in time to the next • Delta data can be captured in a number of ways: • Operational events • Changed data capture • date last modified • Point-in-time comparisons

  22. Types of data warehouse updates • Insert • Full replace • Partial replace • Update • Update plus insert • Insert with update • Replace and insert

  23. Copy management • What to extract? • When to extract? • How to extract? • Transformation requirements? • What to transform? • How to transform? • What to update? • When to update? • How to update? • How to generate the necessary metadata?

  24. Technical issues

  25. Major questions affecting the choice of the technical solution • Connectivity and interoperability? • Need for parallel processing ability? • Scalability? • Standards? • Single vendor/multi-vendor • Vendor stability and service? • In house competency? • Compatibility with existing systems architecture? • Compatability with IT strategy? • Functionality vs Cost?

More Related