1 / 52

DATAWAREHOUSE

DATAWAREHOUSE. Introduction. Information. Data. What is Data Warehousing?. A process of transforming data into information and making it available to users in a timely enough manner to make a difference [Forrester Research]. What are the users saying.

Télécharger la présentation

DATAWAREHOUSE

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. DATAWAREHOUSE Introduction www.notesvillage.com

  2. Information Data What is Data Warehousing? A process of transforming data into information and making it available to users in a timely enough manner to make a difference [Forrester Research] www.notesvillage.com

  3. What are the users saying... • Data should be integrated across the enterprise • Summary data has a real value to the organization • Historical data holds the key to understanding data over time • What-if capabilities are required www.notesvillage.com

  4. What is a Data Warehouse? A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a what they can understand and use in a business context. [Barry Devlin] www.notesvillage.com

  5. Data, Data everywhereyet ... • I can’t find the data I need • data is scattered over the network • many versions, subtle differences • I can’t get the data I need • need an expert to get the data • I can’t understand the data I found • available data poorly documented • I can’t use the data I found • results are unexpected • data needs to be transformed from one form to other www.notesvillage.com

  6. Which are our lowest/highest margin customers ? Who are my customers and what products are they buying? What is the most effective distribution channel? What product prom--otions have the biggest impact on revenue? Which customers are most likely to go to the competition ? What impact will new products/services have on revenue and margins? A producer wants to know…. www.notesvillage.com

  7. Necessity • Need for Strategic Information • Retain consumer base • Increase customer base by 20% • Enhance consumer satisfaction level • Launch two new products • Information Crisis • How to maintain lots of data in the organization • IT resources and strategy is not effective for data • Opportunities and Risks: • Quick decisions • Managerial analysis • Crisis management www.notesvillage.com

  8. History of DSS • Ad hoc reports • Small applications • Information centers • Decision support system www.notesvillage.com

  9. Disadvantages • Many adhoc requests . • Supplementary report problems • No interactivity of the user • Flexible informations. www.notesvillage.com

  10. OPERATIONAL VS INFORMATIONAL • Current Values • Optimized for transaction • High frequency • Read Update Delete • Large user • Response time is very fast • Archived value • Optimized for complex queries • Low • Read • Adhoc,Random • Relatively more several seconds www.notesvillage.com

  11. Scenerio 1 • ABC Pvt Ltd is a company with branches at Mumbai, Delhi, Chennai and Banglore. The Sales Manager wants quarterly sales report. Each branch has a separate operational system. www.notesvillage.com

  12. Scenario 1 : ABC Pvt Ltd. Mumbai Delhi Sales per item type per branch for first quarter. Sales Manager Chennai Banglore www.notesvillage.com

  13. Solution 1:ABC Pvt Ltd • Extract sales information from each database. • Store the information in a common repository at a single site. www.notesvillage.com

  14. Solution 1:ABC Pvt Ltd . Mumbai Data Warehouse Report Delhi Query & Analysis tools Sales Manager Chennai Banglore www.notesvillage.com

  15. Scenario 2 : One Stop Shopping Data Entry Operator Operational Database Report Wait Management Data Entry Operator www.notesvillage.com

  16. Solution 2 • Extract data needed for analysis from operational database. • Store it in warehouse. • Refresh warehouse at regular interval so that it contains up to date information for analysis. • Warehouse will contain data with historical perspective. www.notesvillage.com

  17. Solution 2 Data Entry Operator Report Operational database Data Warehouse Extract data Manager Transaction Data Entry Operator www.notesvillage.com

  18. Scenario 3 Cakes & Cookies is a small,new company.President of the company wants his company should grow.He needs information so that he can make correct decisions. www.notesvillage.com

  19. Solution 3 • Improve the quality of data before loading it into the warehouse. • Perform data cleaning and transformation before loading the data. • Use query analysis tools to support adhoc queries. www.notesvillage.com

  20. Solution 3 Expansion sales Data Warehouse Query and Analysis tool President time Improvement www.notesvillage.com

  21. Definitions of a Data Warehouse “A subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process” 1. - W.H. Inmon “A copy of transaction data, specifically structured for query and analysis” 2. - Ralph Kimball www.notesvillage.com

  22. Data Warehouse • For organizational learning to take place, data from many sources must be gathered together and organized in a consistent and useful way – hence, Data Warehousing (DW) • DW allows an organization (enterprise) to remember what it has noticed about its data • Data Mining techniques make use of the data in a DW www.notesvillage.com

  23. What is Data Warehouse?? • Inmons’s definition A data warehouse is -subject-oriented, -integrated, -time-variant, -nonvolatile collection of data in support of management’s decision making process www.notesvillage.com

  24. Subject-oriented • Data warehouse is organized around subjects such as sales, product, customer. • It focuses on modeling and analysis of data for decision makers. • Excludes data not useful in decision support process. www.notesvillage.com

  25. Integrated Data • Integration • Data Warehouse is constructed by integrating multiple heterogeneous sources. • Data Preprocessing are applied to ensure consistency. RDBMS Data Warehouse Legacy System www.notesvillage.com Flat File

  26. Integration • In terms of data. • encoding structures. • Measurement of • attributes. • physical attribute. • of data • naming conventions. • Data type format www.notesvillage.com

  27. Time-variant • Provides information from historical perspective e.g. past 5-10 years • Data is stored as snapshots • Every key structure contains either implicitly or explicitly an element of time • Allows analysis for the past • Relates information to the present • Enables forecasts for the future www.notesvillage.com

  28. Nonvolatile • Data once recorded cannot be updated. • Data warehouse requires two operations in data accessing • Initial loading of data • Access of data www.notesvillage.com

  29. Data Granularity • Operational system is kept at the lowest level • Data is summarized at different levels • The lower the level the more data granularity. www.notesvillage.com

  30. Operational v/s Information System www.notesvillage.com

  31. Data Mart • A Data Mart is a smaller, more focused Data Warehouse – a mini-warehouse. • A Data Mart typically reflects the business rules of a specific business unit within an enterprise. www.notesvillage.com

  32. Data Mart Data Mart Data Mart Data Warehouse to Data Mart Decision Support Information Data Warehouse Decision Support Information Decision Support Information www.notesvillage.com

  33. www.notesvillage.com

  34. Meta-data Repository Layer Warehouse Management Layer Data Marts vs Data Warehouses Internet/Intranet Layer 11 direct queries virtual queries Operational Data Layer ad hoc queries Virtual DW Core DW Layer Data Feed/Data Mining/Indexing Layer Coarse DW Presentation/ Desktop Access Layer Data Staging and Quality Layer Data Access Layer Central DW 2a Data Mart Layer External Data Layer Distributed DW 3 2b Non-operational Data Layer 1 4 7 2c 6 5 8 9 10 Application Messaging (Transport) Layer www.notesvillage.com

  35. Operational vs. InformationalSystems Operational Systems Information Delivery System Data Warehouse Data Marts Informational Systems www.notesvillage.com

  36. Virtual Date Warehouse • A Virtual Data Warehouse approach is often chosen when there are infrequent demands for data and management wants to determine if/how users will use operational data. • One of the weaknesses of a Virtual Data Warehouse approach is that user queries a made against operational DBs. • One way to minimize this problem is to build a “Query Monitor” to check the performance characteristics of a query before executing it. www.notesvillage.com

  37. A Coarse Data Warehouse is often chosen when the organization has a relatively clean/new operational system and management wants to make the operational data more easily available for just that system. • A Central Data Warehouse is often chosen when the organization has a clear understanding about it Information Access needs and wants to provide “quality”, “integrated” , information to its knowledge workers • A Distributed Data Warehouse is similar in most respects to a Central Data Warehouse, except that the data is distributed to separate mini-Data Warehouses (Data Marts )on local or specialized servers www.notesvillage.com

  38. Meta-data Repository Layer Warehouse Management Layer Central Data Warehouse Internet/Intranet Layer 11 direct queries virtual queries Operational Data Layer ad hoc queries Virtual DW Core DW Layer Data Feed/Data Mining/Indexing Layer Coarse DW Presentation/ Desktop Access Layer Data Staging and Quality Layer Data Access Layer Central DW 2a Data Mart Layer External Data Layer Distributed DW 3 2b Non-operational Data Layer 1 4 7 2c 6 5 8 9 10 Application Messaging (Transport) Layer www.notesvillage.com

  39. Top-down approach • Single, Central storage of data about the content • Centralized Rule and Content • May seek quick result if implemented with Iteration • Needs high level cross functional skills • High outlay with proof of concepts • High exposure to risk/Failures www.notesvillage.com

  40. www.notesvillage.com

  41. Data Warehouse Architectures • Generic Two-Level Architecture • Independent Data Mart • Dependent Data Mart and Operational Data Store • Logical Data Mart and Real-Time Data Warehouse • Three-Layer architecture All involve some form of extraction, transformation and loading (ETL) www.notesvillage.com

  42. Process Architecture • These interrelated sub-processes are sometimes referred to as an “ETL” process. • 1)Extract- Since data for the data warehouse can come from different sources and may be of different types, the plan to extract the data along with appropriate compression and encryption techniques is an important requirement for consideration. • 2)Transform- Transformation of data with appropriate conversion, aggregation and cleaning besides de-normalization and surrogate key management is also an important process to be planned for building a data warehouse. • 3)Load- Steps to be considered to load data with optimization by considering the multiple areas where the data is targeted to be loaded and retrieved is also an important part of the data warehouse architecture plan www.notesvillage.com

  43. Data Model Architecture • In Data Model Architecture (also known as Dimensional Data Model), there are 3 main data modeling styles for enterprise warehouses: • 3rd Normal Form - Top Down Architecture, Top Down Implementation • Federated Star Schemas - Bottom Up Architecture, Bottom Up Implementation • Data Vault - Top Down Architecture, Bottom Up Implementation . www.notesvillage.com

  44. Technology Architecture • Scalability and flexibility is required in all facets. The extent of these features are largely depend upon organizational size, business requirements, nature of business etc. • Technology or Technical architecture primary evolved from derivations from the process architecture, meta data management requirements based on business rules and security levels implementations and technology tool specific evaluation. • Besides these, the Technology architecture also looks into the various technology implementation standards in database management, database connectivity protocols (ODBC, JDBC, OLE DB etc), Middleware (based on ORB, RMI, COM/DOM etc.), Network protocols (DNS, LDAP etc) and other related technologies. www.notesvillage.com

  45. Information Architecture • Information Architecture is the process of translating the information from one form to another in a step by step sequence so as to manage the storage, retrieval, modification and deletion of the data in the data warehouse. www.notesvillage.com

  46. Resource Architecture • Resource architecture is related to software architecture in that many resources come from software resources. Resources are important because they help determine performance. • Workload is the other part of the equation. If you have enough resources to complete the workload in the right amount of time, then performance will be high. www.notesvillage.com

  47. Figure 11-2: Generic two-level data warehousing architecture L One, company-wide warehouse T E Periodic extraction  data is not completely current in warehouse www.notesvillage.com

  48. Scrub/Cleanse…uses pattern recognition and AI techniques to upgrade data quality Figure 11-10: Steps in data reconciliation (cont.) Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data www.notesvillage.com

  49. Data marts: Mini-warehouses, limited in scope L T E Separate ETL for each independent data mart Data access complexity due to multiple data marts Figure 11-3 Independent data mart data warehousing architecture www.notesvillage.com

  50. Figure 11-4 Dependent data mart with operational data store:a three-level architecture ODS provides option for obtaining current data L T E Simpler data access Single ETL for enterprise data warehouse (EDW) Dependent data marts loaded from EDW www.notesvillage.com

More Related