1 / 41

Chapter 3

Data Warehouse Fundamentals. Chapter 3. Trends In Data Warehousing. Paul K Chen. 1. Data Warehousing is Becoming Mainstream. In the early stages, four significant factors drove many companies to move into data warehousing: Fierce competition Government deregulation

abra-daniel
Télécharger la présentation

Chapter 3

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Warehouse Fundamentals Chapter 3 Trends In Data Warehousing Paul K Chen 1

  2. Data Warehousing is Becoming Mainstream In the early stages, four significant factors drove many companies to move into data warehousing: • Fierce competition • Government deregulation • Need to revamp internal processes • Imperative for customized marketing

  3. Walmart vs. Amazon.com • Walmart is the US company most quoted for the successful application & deployment of Data Warehousing technology. • Walmart filed lawsuit against Amazon.com for its unlawful way of pirating its DW technology by hiring away its DA personnel by offering hefty stock option to these people.

  4. Significant Factors These significant factors reflect the new trends in data warehousing: • Multiple Data Types • Data Visualization • Parallel Processing • Query Tools • Browser Tools • Data Fusion • Multidimensional Analysis • Agent Technology • E-Business- ERP, KM, CRM

  5. Decision Making and Data Warehousing “A data warehouse is the data, processes, tools, and facilities to manage and deliver complete, timely, accurate, and understandable business information to authorized individuals for effective decision making.” • Structured Data • Includes traditional relational databases • Typically internal and enterprise-owned • Predetermined • Unstructured Data • Includes articles, reports, images, and videos • Utilizes external data and expert opinion • Ad hoc 3

  6. Decision Making and Data Warehousing • Management Systems • Extend relational databases to store and support multimedia • User-defined types (UDT) and functions (UDF) in SQL-3 • Specialized Servers • Used for data which is incompatible with relational databases (e.g., Streaming video servers) • Objects may be linked to a relational database • Search Engines • Query by Image Content (shape, color, texture, etc) • Text retrieval on free-text documents • Audio and video searching

  7. Decision Making and Data Warehousing ~ The trend is toward unstructured data and ad hoc warehouses. ~ ~ Trend toward multimedia. ~ 4

  8. Types of Decision Support Tools • Data Inquiry • A request for a set of data based on some search criteria • Data Interpretation • Manipulation and visualization of a set of data (statistical analysis) • Multidimensional Analysis (OLAP) • n-dimensional spreadsheet analysis • Information Discovery • Pattern recognition, trends • Browsers • Search metadata catalogs • Search information object lists • Launch analysis tools 5

  9. File-based Processing 6

  10. Types of Decision Support Tools ~ Trend toward utilization of the Web, facilitated by Java. ~ 7

  11. Data Warehouse Architectures • Single Level • Decision support tools access operational data directly • Feasible only with “clean” data • Valid for unstructured data • Two Level Reconciled • Scrubbed operational data supporting ad hoc queries • Two Level Derived • Summarized data • Three Level • Maintains both scrubbed operational data, and summarized data. 10

  12. Data Warehouse Architectures ~ Trend toward multidimensional data. ~ 11

  13. Data Stores and Access Enablers • Specialized Multidimensional Databases • Data is peregrinated and loaded into multidimensional databases • Long loading times but quick response • Relational-like Stores • Indexing is used to proved pseudo-multidimensional functionality • Relational Data Stores • An extra semantic layer generates multidimensional data on the fly • Hybrids • Details are stored in a traditional relational format • A subset is cached in a multidimensional data structure 12

  14. Database Management System (DBMS) 13

  15. Data Stores and Access Enablers • ~ Trend toward multidimensional data. 14

  16. Metadata • Integrated Components • All components (sources, stores, etc) use a common metadata repository to maintain their metadata • Standardized Metadata Interchange • Components keep their own metadata • Components use a common interchange information model and syntax to share metadata • Synchronized Metadata Interchange • Metadata changes are updated automatically across all components • Building of Business Metadata • Manually entered, free-text, plain language descriptions

  17. Metadata ~ Trend toward better metadata, exchanged between systems. ~

  18. Middleware - Gluing the Warehouse Together • Definition: software that shields users and developers from differences in services and resources used by applications • Data warehouses often have heterogeneous databases, operating systems, networks, hardware, applications

  19. Business Issues for Middleware • Role of middleware • Assist developer in data extraction/transformation and populating DW • Assist business user in accessing DW • Therefore needed at different points in life cycle • Types • Copy management: data extraction, transformation, replication, and propagation • Gateways: DB and independent gateways • Program-to program: RPCs, TP monitors, ORBs • Message-oriented

  20. Data Quality • Preprocessing Ownership • Source application owners know their data • Warehouse owners still must integrate the entire system • Automated Preprocessing Tools • Specialized packages • Generalized tools using pattern processing, lexical analysis, and statistical matching to reconcile a wide range of data sources • Custom programming • Reliability and Credibility of External Data • Quality ratings • Posted statistical meta-information (sample size, randomness, etc) 15

  21. Data Quality Trend toward better understanding of data quality. ~ 16

  22. Significant Trends- Multiple Data Types Image Spatial Structured Numeric Video Data Warehouse Repository Structured Text 17 Audio Unstructured Documents

  23. Significant Trends- Data Visualization • More Chart Types-Pie chart, scatter plot • Interactive Visualization • Chart Manipulation • Drill Down

  24. Significant Trends- Parallel Processing • Aims to solve decision-support problems using multiple nodes working on the same problem. • Performs many database operations simultaneously, splitting individual tasks into smaller parts so that tasks can be spread across multiple processors. • Parallel DBMSs must be capable of running parallel queries, parallel data loading, table scanning, and data archiving, and back up.

  25. Significant Trends- Parallel Processing • Shared memory architecture (SMP) • All the servers share all the data • Shared nothing architecture (MPP) • Each server has its own partition of data

  26. Significant Trends- Query Tools, Browse Tools • Flexible Presentation –online results and report generator • Aggregate Awareness • Crossing Subject Areas • Multiple Heterogeneous Sources • Integration • Overcoming SQL Limitations • Data Fusion

  27. Significant Trends- Integrating ERP and Data Warehouse • Option 1: Companies implement the data warehouse solutions of the ERP vendor with the currently available functionality and await the enhancements. • Option 2: Companies implement customized data warehouse and use third-party tools to extract data from the ERP datasets. Retrieving and loading data from the proprietary ERP datasets is not easy. • Option 3: It is a hybrid approach that combines the functionalities provided by the vendor’s data warehouse with additional functionalities from third-party tools.

  28. Significant Trends- Integrating KM and Data Warehouse What’s KM? • It is a systematic process for capturing, integrating, organizing, and communicating knowledge accumulated by employees. • It is a vehicle to share corporate knowledge so that employees may be more more effective and be productive in their work. • A knowledge management system must store all such knowledge in a knowledge repository.

  29. Significant Trends- Integrating KM and Data Warehouse A specific corporate scenario: • Sales have dropped in the South region. • Your marketing VP is able to discern this from your data warehouse by running some queries and doing some preliminary analysis. If he or she has access to a document prepared by an analyst explaining why the sales are low and suggesting remedial action. • Knowledge must be linked to the sales result to provide context to the sales numbers from the data warehouse.

  30. Significant Trends- Integrating KM and Data Warehouse An airplane sales scenario: The following information is essential For a successful pitch for airplane sales. • Model configuration • Production schedule (Delivery schedule) • Part replacement • Warranty Knowledge obtained from the knowledge management system can provide context to the information received from the data warehouse to understand the story behind the above information.

  31. Summary of Trends • Ad Hoc Questions • Multidimensional Analysis (OLAP) • Web-Enabled Data Warehouse • Multimedia • Middleware • Metadata Interchange • Integrating ERP with Data Warehouse • Integrating KM with Data Warehouse

  32. Complete E-Business Suite– A Review ERP EAI Marketing Sales Projects Financial Services One Database Order Mgt Procurement Human Resources Customer Relationship(CRM) Manufacturing Supply Chain (SCM)

  33. Information System Categories

  34. Information System Categories

  35. Data Warehouse & ERP • ERP = Enterprise Resource Planning – A software solution that addresses enterprise needs taking the process view of an organization to meet the organization goals tightly integrating all the functions of an organization. -- It integrates all the departments and functions across a company into a single computer system that can serve all those different departments’ particular needs.

  36. WHY ERP? • Business Customer satisfaction Business development – new areas, products and services Ability to face competition Efficient processes required for company’s growth • IT Present software does not met business needs. Legacy systems difficult to maintain Obsolete hardware/software difficult to maintain

  37. How ERP? • ERP Combines various department systems into a single, integrated software program that runs off a single database so that the various departments can more easily share information and communicate with each other. • The best part of ERP is the way in which it improves the order fulfillment process that is taking the customer order and process it into an invoice and revenue. • It doesn’t handle the front-end that is handled by CRM (Customer Relationship Management).

  38. How ERP?(cont’d) • When a customer service representative enters a customer order to an ERP system, he has all the information necessary to complete the order such as customer’s credit rating and order history from the finance module, the company’s inventory levels from the warehouse module and the shipping dock’s trucking schedule from the logistics module. • How it’s being done: It integrates the financial information and customer order information . It does so by integrating the following: Database Application Interfaces Tools BPR

  39. How ERP? (cont’d) • It standardizes and speeds up the manufacturing process. This saves time, increases productivity and reduces head count. • It reduces the inventory. Due to the information available about all the orders it helps to maintain the right level of stock and smoothes the manufacturing process.

  40. Data Warehouse & EAI • What is EAI? EAI refers to Enterprise Application Integration. EAI is the merging of applications and data from various new and legacy systems within a business. Various means are employed to accomplish EAI, including middleware, in order to unify IT resources, maximize new ERP investments, diminish errors and get everyone on the same page. EAI enables companies to link their existing software applications with each other and with portals. EAI provide the ability to get their applications to exchange critical data. EAI is usually close to the top of any CIO's list of concerns. There are different approaches to EAI. Some rely on linking specific applications with tailored code, but most rely on generic solutions, typically called middleware. XML, combined with SOAP and UDDI is a kind of middleware.

  41. E-Business ~ Trend toward better understanding as well as consolidation of internal processes and data ~ ~ Trend toward web-enabled data warehouse. ~

More Related