1 / 39

Background of OLAP , DW, Data Mining

Background of OLAP , DW, Data Mining. Introductory Concepts. Related Terms OLAP Data warehouse Data mart OLAP cube, multidimensional cube Star schema Fact, dimension Dimensional modeling Data mining. Introductory Concepts. Relevance of Terms. OLAP. Data Mining. Data Warehouse.

yul
Télécharger la présentation

Background of OLAP , DW, Data Mining

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Background ofOLAP, DW, Data Mining

  2. Introductory Concepts • Related Terms • OLAP • Data warehouse • Data mart • OLAP cube, multidimensional cube • Star schema • Fact, dimension • Dimensional modeling • Data mining

  3. Introductory Concepts • Relevance of Terms OLAP Data Mining Data Warehouse Star Schema Multidimensional Cube

  4. Introductory Concepts • OLAP(On-Line Analytical Processing) • Complex query processing for decision making • Report generation for the summary/aggregate /statistics over source data • Advanced data analysis techniques • OLTP(On-Line Transaction Processing) • Simple transaction processing for regular (daily) business operations • Manipulating databases maintained for the organization’s activities • Optimized for simple operations (select,insert ...)

  5. Introductory Concepts 총장, 교육부, 국회 본부 현업 부서 Transaction 요청 통계 자료 요청 정보 시스템 보고서 요청 업무 데이타베이스 OLTP DB

  6. Introductory Concepts • Solution (1) 총장, 교육부, 국회 본부 현업 부서 Transaction 요청 통계 자료 요청 정보 시스템 통계/보고서 보고서 요청 생성 모듈 업무 데이타베이스 OLTP DB

  7. Introductory Concepts • Solution (2) 총장, 교육부, 국회 본부 현업 부서 통계 자료 요청 Transaction 요청 보고서 요청 임의의 통계 정보 시스템 생성 도구 OLAP 시스템 업무 데이타베이스 OLTP DB

  8. Introductory Concepts • Solution (2) 총장, 교육부, 국회 본부 현업 부서 통계 자료 요청 Transaction 요청 보고서 요청 OLAP 시스템 정보 시스템 업무 데이타베이스 분석용/요약 자료 OLTP DB Data Warehouse

  9. Introductory Concepts • OLAP systems/tools • Present summary/aggregate data for a large OLTP databases in several ways to decision makers • Manipulate DW over several dimensions dynamically to find useful information • Data warehouse • A collection of data which are used for OLAP • Summary/aggregate data extracted from source OLTP databases which can support every possible OLAP requests. • Represented by star schema

  10. Introductory Concepts • Relevance of Terms OLAP Data Mining Data Warehouse Star Schema Multidimensional Cube

  11. Introductory Concepts • OLTP(Online Transaction Processing) • Typical daily business query and update processing • OLAP (Online Analytical Processing) • Complex query processing or report generation • Advanced data analysis techniques • ROLAP: Relational OLAP • MOLAP: Multi-dimensional OLAP • Data Warehouse • An enterprise-wide data repository for decision support • Data Mart • A smaller targeted DW for a business process • Star Schema • A DB structure for data warehouse

  12. Data Warehouse • DW is an integrated repository of data that is put into a form that can be easily understood, interpreted, and analyzed by the people who need to use it to make decisions • Data are extracted from operational systems, then cleansed, integrated, transformed, and aggregated, into a read-only database that is optimized for decision making • Data Warehouse is a • Subject-oriented • Integrated • Time-variant • Non-volatile • Collection of data in support of management’s decision-making process (W.H. Inmon)

  13. Data Warehouse • Motivation for DW • An enterprise-wide repository of data, information, knowledge, and meta-data • Gather all the information into a single place for in-depth analysis • Decouple such analysis from OLTP systems • Transform the data into information • Provide right information in the right format at the right time • Perform sophisticated analysis of data • Perform trend analysis, time series analysis, risk analysis, etc. • Perform DSS exploration such as alternative formation, alternatives testing, decision-making, etc. • Discover/visualize hidden facts, patterns, correlations, rules, exceptions using data mining techniques

  14. Data Warehouse • OLTP queries • How many shoes did we sell last month? • What are the age, address, phone of a certain student (e.g., Hong Gildong)

  15. Data Warehouse • OLAP queries • How many size 10 shoes in red did we sell last month in the Midwest, the Northeast, compared with the same month last year, actual vs. budget? • What are the top 25 brands, by products, styles, and regions, for this period for total US based on sales dollars? • How much promotional expenses did we spend on customers who purchased less than $100 worth of products? • How much discount should we offer to boost the sales volume significantly? • Find the correlation between buying patterns of products of type A and those of type B. • What are sales trends? • What percent of the market do we own? • How are our defect rates improving? • Are our profits are increasing or decreasing?

  16. Comparison of OLTP and Data Warehouse • DWs require a new query-centric view of the data

  17. Data Warehouse • Example Benefits of Data Warehousing & Data Mining • Fast sophisticated report generations • 4-15 times faster delivery • Young men buy beer on Friday nights when they buy diapers • More athletic shoes are sold on Friday evenings and Saturdays than during the rest of the week combined • More athletic shoes are sold when white tube socks are prominently displayed as part of a 2-for-1 sale. • In a retail chain, potato chip purchases were accompanied by a soda purchase in half the cases. That figure increases to 75% when there is a marketing promotion. • Blue Cross found some providers had superior treatment success rates for some fatal diseases • Victoria Secret found a particular incentive was ineffective, saving $300K per week.

  18. Data Warehouse • Database Design for DW • Objective of a DW • Creating a database optimized for decision support • Limitations of ER model for DW applications • Normalized data model support large numbers of transactions with very few records. • ER models tend to be very complex and difficult to navigate. • ER model identifies first entities, then relationships • Four basic requirements of a warehouse design • The schema must be simple. • The data must be clean, consistent, and accurate • Query processing must be fast • Load the data into the warehouse quickly • Two types of data representation in DW • Star Schema • Multi-dimensional array

  19. Star Schema • Dimensional Model (Star Schema) • A database schema for data warehousing • Initially developed to simplify SQL queries (by Ralph Kimball) • Consists of a few central fact tables and many dimensiontables • Dimension = analysis criteria • Fact = measurements aggregated over dimensions to be analyzed • Simplifies end-user query processing and high query performance • Used to reduce joins by OLAP/Relational engines • Relatively few tables and well-defined join paths

  20. Star Schema

  21. Star Schema • Dimensional Model (Star Schema) consists of: • Fact table: • Stores all transactions or factual data that are analyzed • Typically numeric measures • From millions to more than billion rows • Example: Revenue, Actuals, Budgets, Sales, Orders, Bookings, Claims • Dimension table: • Attributes about facts • Supports grouping, browsing, constraining • Provides the entry points into the DW • Example: Time, Customer, Promotions, Demographics, LifeStyles, Products, Stores, Markets

  22. Star Schema • Dimension hierarchy • Each dimension table has several attributes • These attributes may have a hierarchy • Different level of aggregation for fact data • User may want aggregate data by some level of the dimension hierarchy Year State Month County day City Street

  23. Star Schema • The Strengths of Dimensional Modeling (Kimball 98) • Provides a predictable, standard framework • Report writers, query tools, and UIs could take advantage of DM structure • Supports presentation and performance • Simplifies the understanding and navigation of metadata • Is robust against unexpected changes in user behavior • Logical design is independent of the use of the schema • All dimensions are equally entry points to the system • Can be changed gracefully • Could add new unanticipated facts of the same grain • Could add new dimensions • Add new unanticipated dimensional attributes • Break existing dimension records down to a lower level of granularity • Availability of common modeling situations • Slowly changing dimensions • Heterogeneous products • Pay-in-advance databases • Factless facts as in event-handling • Availability of administrative utilities and SW processes that manage and use aggregates

  24. Star Schema • Variations of Star Schema • Star Schema • All dimensions are denormalized • Wide dimensions and deep facts • Snowflake Schema • All dimensions are normalized into 3NF

  25. Star Schema • Summary of DW and Star Schema • DW is databases specially maintained for analytical processing • DW is organized by facts and dimensions • Star schema is a way of representing DW schema • Cube, Data Mining??

  26. Multidimensional Cube • Motivation of Cube • Similar OLAP queries can be executed repeatedly to find some valuable information. • Users want to view data in several different perspectives • It is time-consuming to calculate these data which use similar facts and dimensions • Cube is a special data structure to meet this kind of OLAP queries’ requirements dim1 dim7 fact1 dim1 dim6 dim2 fact1 dim2 dim7 dim3 dim8 fact2 dim1 fact1 dim7 dim4 dim9 fact3 dim5 dim10 dim1 fact1

  27. Multidimensional Cube • Multidimensional Cube • Maintain every possible pairs of dimensions and aggregation of fact data • Types of OLAP • ROLAP : maintaining cube by relational table • MOLAP : maintaining cube by multidimensional array dim2 dim7 Query1 dim1 dim6 fact1 dim2 dim7 Query2 dim3 dim8 fact2 dim1 dim4 dim9 Query3 fact3 dim5 dim10

  28. Multidimensional Cube • Multidimensional Cube • Maintain every possible pairs of dimensions and aggregation of fact data • Types of OLAP • ROLAP : maintaining cube by relational table • MOLAP : maintaining cube by multidimensional array Query1 dim1 dim6 fact1 dim2 dim7 Query2 dim3 dim8 fact2 dim4 dim9 Query3 fact3 dim5 dim10

  29. Multidimensional Cube • Multidimension Cube • Consist of several dimensions and facts • Its schema can be represented by star schema • Two different perspectives • Cube is a part of data warehouses • Cube is a data structure for OLAP tool which is extracted from data warehouses

  30. Introductory Concepts • OLAP Operations • Drill down. View the ata in a more specialized level within a dimension • Show me total sales for by quarter • Show me total sales for this quarter by department • Show me total sales for this quarter, dept 22, by state • Roll-up. Converse of drill-down • Drilling down is adding new headers from the dimension tables • Rolling up is moving row headers • Slice and dice. Slice (dice) data on columns (rows), eliminating the rest of the display. Keep a subset of dimensions for selected values. • Pivot. Switching the arrangement of the dimensions • Cross-tab. A matrix report with X and Y axis • Other Operations • Ranking and computed attribute definitions • Multiple granularity aggregates (e.g., cube operator)

  31. Introductory Concepts • Relevance of Terms OLAP Data Mining Data Warehouse Star Schema Multidimensional Cube

  32. Data Warehouse

  33. OLAP tools

  34. Data Mining • What is Data Mining? • Is the process of knowledge discovery (useful information, patterns, groups, etc.) from a large of data. • Is the principle of soring through large amounts of data and picking out relevant information (by Wikipedia). • Is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. • Is the science of extracting useful information from large data sets or databases. • Data may be stored in files, databases, webs, data warehouses, etc.

  35. Data Mining • Data Mining algorithm Mining Algorithm Knowledge Information Large input data Data Warehouse Multidimen- sional Cube OLAP tools

  36. Summary • Relevance of Terms OLAP Data Mining Data Warehouse Star Schema Multidimensional Cube

More Related