1 / 96

Dátové sklady

Dátové sklady. Pokročilé dátové technológie Genči. Literatúra. [1] Lacko L.: Datové sklady, analýza OLAP a dolování dát s pří klady … . Computer Press. Brno. 2003 [2] Paulraj Ponniah: Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals. 2001. John Wiley & Sons, Inc.

rae-miranda
Télécharger la présentation

Dátové sklady

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Dátové sklady Pokročilé dátové technológie Genči

  2. Literatúra [1] Lacko L.: Datové sklady, analýza OLAP a dolování dát s příklady … . Computer Press. Brno. 2003 [2] Paulraj Ponniah: Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals. 2001. John Wiley & Sons, Inc. ISBNs: 0-471-41254-6 (Hardback); 0-471-22162-7 (Electronic)

  3. Literatúra (pokr.) [3] Ralph Kimball, Margy Ross: The Data Warehouse Toolkit. Second Edition. 2002. Wiley Computer Publishing. [4] W. H. Inmon: Building theData WarehouseThird Edition. 2002. John Wiley & Sons, Inc.

  4. Literatúra (pokr.) • [5] Inmon W., Strauss D., Neushloss G.:DW 2.0: THE ARCHITECTURE FOR THE NEXT GENERATION OF DATA WAREHOUSING, Paperback, 400 pages, ISBN-13: 978-0-12-374319-0, MORGAN KAUFFMAN

  5. Informácie • Podľa firemnej literatúry ORACLE sa údaje stávajú informáciami, ak • máme údaje; • vieme, že máme údaje; • vieme, kde máme tieto údaje; • máme k nim prístup; • zdroju údajov môžeme dôverovať.

  6. Hierarchia informačných úrovní Múdrosť Znalosti Informácie Údaje

  7. Motivácia • Exekutíva potrebuje informácie (napr.) kvôli rozhodnutiu: • kde postaviť ďalší sklad; • ktorú produktovú líniu rozvíjať; • ktorý tržný segment by mal byť posilnený • t.j. potrebuje realizovať strategické rozhodnutia a pre ne potrebuje strategickú informáciu

  8. Strategická informácia • Nemôžu ju poskytnúť OLTP systémy • Neslúži pre denno-denné riadenie spoločnosti • Dôležitá pre zdravý vývoj a prežitie spoločnosti • Kritické rozhodnutia závisia od správnej (korektnej, patričnej) strategickej informácie

  9. Požadované vlastnosti strategickej informácie

  10. „Vstup“ dát

  11. „Výstup“ informácií

  12. Protirečenia • Organizácie majú veľké množstvo dát ale • IT zdroje a systémy nie sú schopné efektívnym spôsobom toto množstvo dát premeniť na strategickú informáciu

  13. Informačná kríza • Nie kvôli nedostatku dát, ale preto, že dáta nie sú použiteľné pre strategické rozhodovanie • Dôvody: • Údaje sú v spoločnostiach rozložené naprieč mnohými typmi nekompatibilných štruktúr a systémov • Údaje sú v spoločnostiach uložené v rôznych nezlúčiteľných systémoch, viacerých platformách a rozmanitých štruktúrach

  14. These operational systems (order processing, inventory control, claims processing, outpatient billing, ...) are not designed or intended to provide strategic information. • If we need the ability to provide strategic information, we must get the information from altogether different types of systems. • Only specially designed decision support systems or informational systems can provide strategic information.

  15. Rozdiely

  16. Processing Requirements in the New Environment Most of the processing in the new environment for strategicinformation will have to be analytical. There are four levels of analytical processing requirements: • Running of simple queries and reports against current and historical data • Ability to perform “what if ” analysis in many different ways • Ability to query, step back, analyze, and then continue the process to any desired length • Spot historical trends and apply them for future results

  17. Data warehousing concept • Take all the data you already have in the organization, clean and transform it, and then provide useful strategic information.

  18. Data warehousing concept One of the most important approaches to the integration of data sources is based on a data warehouse architecture. In this architecture, data coming from multiple external data sources (EDSs) are extracted, filtered, merged, and stored in a central repository, called a data warehouse (DW). Data are also enriched by historical and summary information. From a technological point of view, a data warehouse is a huge database from several hundred GB to several dozens of TB. Thanks to this architecture, users operate on a local, homogeneous, and centralized data repository that reduces access time to data. Moreover, a data warehouse is independent of EDSs that may be temporarily unavailable. However, a data warehouse has to be kept up to date with respect to the content of EDSs, by being periodically refreshed.

  19. Bližší pohľad na DWH

  20. Functional definition of the data warehouse The data warehouse is an informational environment that: • Provides an integrated and total view of the enterprise • Makes the enterprise’s current and historical information easily available for decision making • Makes decision-support transactions possible without hindering operational systems • Renders the organization’s information consistent • Presents a flexible and interactive source of strategic information

  21. DWH – zmes technológií

  22. Bill Inmon’s definition Bill Inmon, considered to be the father of Data Warehousing provides the following definition: • “A Data Warehouse is a subject oriented, integrated, nonvolatile, and time variant collection of data in support of management’s decisions.”

  23. The data in the data warehouse is • Separate • Available • Integrated • Time stamped • Subject oriented • Nonvolatile • Accessible

  24. Subject-oriented

  25. Integrated Data

  26. Integrated Data (2) Before the data from various disparate sources can be usefully stored in a data warehouse, you have to: • remove the inconsistencies; • standardize the various data elements; • make sure of the meanings of data names in each source application.

  27. Integrated Data (3) • Before moving the data into the data warehouse, you have to go through a process of transformation, consolidation, and integration of the source data. • Here are some of the items that would need standardization: • Naming conventions • Codes • Data attributes • Measurements

  28. Time-Variant Data • For an operational system, the stored data contains the current values. • The data in the data warehouse is meant for analysis and decision making. • A data warehouse, because of the very nature of its purpose, has to contain historical data, not just current values. Data is stored as snapshots over past and current periods. Every data structure in the data warehouse contains the time element.

  29. Time-Variant Data (2) The time-variant nature of the data in a data warehouse • Allows for analysis of the past • Relates information to the present • Enables forecasts for the future

  30. Nonvolatile Data

  31. Data Granularity

  32. DATA WAREHOUSES AND DATA MARTS

  33. OVERVIEW OF THE COMPONENTS

  34. Štruktúra DWH

  35. Source data component • Production systems • Internal data (spreadsheets) • Archived data (tapes) • External data (stocks, interest rates, …)

  36. Data Staging Component • Data Extraction. • Data Transformation. • Data Loading.

  37. Data Movement to the data Warehouse

  38. Information Delivery Component

  39. METADATA IN THE DATA WAREHOUSE

  40. WHY METADATA IS IMPORTANT Users to compose and run the query can have several important questions: • Are there any predefined queries I can look at? • What are the various elements of data in the warehouse? • Is there information about unit sales and unit costs by product? • How can I browse and see what is available? • From where did they get the data for the warehouse? From which source systems? • How did they merge the data from the telephone orders system and the mail orders system? • How old is the data in the warehouse? • When was the last time fresh data was brought in? • Are there any summaries by month and product?

  41. Metadata in a data warehouse contains the answers to questions about the data in the data warehouse.

  42. Different definitions for metadata • Data about the data • Table of contents for the data • Catalog for the data • Data warehouse atlas • Data warehouse roadmap • Data warehouse directory • Glue that holds the data warehouse contents together • Tongs to handle the data • The nerve center

  43. Metadata in OLTP • In operational systems we do not really have any easy and flexible methods for knowing the nature of the contents of the database. • There is no great need for user-friendly interfaces to the database contents. • The data dictionary or catalog is meant for IT uses only.

  44. Metadata in DWH • Users need sophisticated methods for browsing and examining the contents of the data warehouse. • Users need to know the meanings of the data items. • Users have to prevent them from drawing wrong conclusions from their analysis through their ignorance about the exact meanings. • Without adequate metadata support, users of the larger data warehouses are totally handicapped.

  45. Types of Metadata • Metadata in a data warehouse fall into three major categories: • Operational Metadata • Extraction and Transformation Metadata • End-User Metadata

  46. Operational Metadata • Data for the data warehouse comes from several operational systems of the enterprise. • These source systems contain different data structures. • The data elements selected for the data warehouse have various field lengths and data types. • In selecting data from the source systems for the data warehouse, you split records, combine parts of records from different source files, and deal with multiple coding schemes and field lengths. • When you deliver information to the end-users, you must be able to tie that back to the original source data sets. • Operational metadata contain all of this information about the operational data sources.

  47. Extraction and Transformation Metadata • Extraction and transformation metadata contain data about the extraction of data from the source systems, namely, the extraction frequencies, extraction methods, and business rules for the data extraction. Also, this category of metadata contains information about all the data transformations that take place in the data staging area.

More Related