1 / 21

Basic Concepts of Datawarehousing An Overview

Basic Concepts of Datawarehousing An Overview. Prasanth Gurram. How to answer these Business Queries?. How did my revenue improve in the past 5 years?. What are the slow movers in my product line?. Which channel costs me more and pays less?. Which of my Sales Agents

Télécharger la présentation

Basic Concepts of Datawarehousing An Overview

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Basic Concepts of Datawarehousing An Overview Prasanth Gurram

  2. How to answer these Business Queries? How did my revenue improve in the past 5 years? What are the slow movers in my product line? Which channel costs me more and pays less? Which of my Sales Agents are doing better? Strategic Planning / Budgeting Currency Risk, Interest Rate Risk, Liquidity Risk What is Defaulter’s Profile? Who are my profitable customers? What is the sales distribution region wise?

  3. DSS Decision Support Systems (DSS) are interactive computer-based systems intended to help decision makers utilize data and models to identify and solve problems and make decisions. Data Warehouse is the foundation of DSS process. It is a Strategy and a Process for Staging Corporate Data. Enable users to get a “Business View” of the data Facilitate Data based Decision Making that would drive and improve the Business Discover “Hidden Trends”

  4. Driving Forces for DSS Business Speed Reform Customers RESULT: COMPETITION Technology

  5. Scenario without DSS • Unavailability of Tools and Techniques for acquisition of data from various sources for answering business questions and making decisions, in earlier days • Intensive efforts in data formatting than data analysis • Static and inflexible report generation • Time-lag in accessing the information at central place

  6. OLTP v/s DSS Environment • OLTP Environment • get data IN • large volumes of simple transaction queries • continuous data changes • low processing time • mode of processing • transaction details • data inconsistency • mostly current data • DSS Environment • get information OUT • small number of diverse queries • periodic updates only • high processing time • mode of discovery • subject oriented - summaries • data consistency • historical data is relevant

  7. OLTP v/s DSS Environment • OLTP Environment • high concurrent usage • highly normalized data structure • static applications • automates routines • DSS Environment • low concurrent usage • fewer tables, but more columns per table • dynamic applications • facilitates creativity

  8. Benefits for Business User • Flexible Information Access • High Availability • Ease of Use • Quality & Completeness of Data • Focus on Information Processing • Information Base for Knowledge Discovery

  9. Available line of technology • Advances in dbms technology • Data warehousing • On-line analytical processing • Data mining

  10. Datawarehouse • Data warehouses store large volumes of data which are frequently used by DSS.It is maintained separately from the organization’s operational databases • Data warehouse is subject-oriented,integrated, time-variant, and nonvolatile collection of data • Subject-oriented : Contains information regarding objects of interest for decision support: Sales by region, by product, etc. • Itegrated: Data are typically extracted from multiple, heterogeneous data sources (e.g., from sales, inventory, billing DBs etc.). • Time-variant: Contain historical data, longer horizon than operational system. • Nonvolatile : Data is not (or rarely) directly updated.

  11. Datawarehouse • Is the enabling technology that facilitates improved business decision-making • It’s a process, not a product • A technique for assembling and managing a wide variety of data from multiple operational systems for decision support and analytical processing It’s a journey not a destination...

  12. DW Components S T A G IN G A R E A Extraction Cleansing FS1 DM1 FS2 DM2 N E T W O R K . . . DW ODS Transmission DMn OLAP ANALYSIS FSn Legacy System Data Mart Population Aggregation Summarization Transformation Knowledge Discovery Metadata Layer

  13. Operational Process • Data extraction • Data Cleansing and Transformation • Data Load and refresh • Build derived data and views • Service queries • Administer the warehouse

  14. Extraction Process ( Data Capturing ) Feed System Application Incremental Data Business Transactions Control Metadata • Extract the incremental data from feed system • Store the extracted data into a temporary area • Extract data from multiple, heterogeneous, and external sources Data Capturing Process

  15. Extraction Process (Data Transmission ) Feed System Side Staging area Network Cloud Incremental Data Incremental Data FTP • Transmit the extracted data from Feed system to Staging area • Periodicity of transmission ( daily / weekly ) depends upon the feed system

  16. Cleansing Process Process Metadata Cleansing Rules Good Clean data Raw data (Staging Area) Cleansing Process Control Metadata Bad Cleansing Reports • Detect errors in the data and rectify them when possible • Mark it Good/Bad • Generate the cleansing Reports and mail to the DWA and Feed System representatives

  17. Transformation Process • Process Metadata • Mapping Detail • Transformation Rule Clean Operational Data Transformation Process Control Metadata • Transform the cleaned Operational Data into DSS Data • Load the DSS data into ODS • ODS contains the current DSS data at the lowest level of granularity Operational Data Store

  18. Summarization Process ODS DW Summarization Process Weekly Monthly Yearly Control Metadata • Summarizeand aggregate ODS data and Populate to the Warehouse • Periodicity of Summarization Process depends upon the level of summarization at Warehouse ( weekly, monthly, daily )

  19. Metadata • Data about Data • Used to maintain Datawarehouse • Control data • Static: • Roles, permissions, naming standards, source system names, • Locations, target names, transformation and mapping rules • Dynamic: • Scheduling, scripts, load statistics, space usage, • Backup statistics • Business data • Business rules,Who validates data,Who controls,How they validate

  20. DW Components/Tools • Extraction/transformation/load tool (family of tools including data modeling tool, extraction tool, Meta data repository, and DW administration tools) • Meta data exchange architecture (API used to integrate all components of DW with central Meta data) • Target databases (relational, multidimensional, hybrid) • Data access and analysis tools for end users • Database servers, operating systems, networks

  21. DW Tools

More Related