1 / 14

Data Warehousing

Data Warehousing. Willem Visser RW334. Somebody is watching!. Everybody seems to be recording your every move Loyalty cards Cookies Facebook, Twitter,… Check out Collusion plug-in for Firefox They want to know how to market to you Same is true in business

sanaa
Télécharger la présentation

Data Warehousing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Warehousing Willem VisserRW334

  2. Somebody is watching! • Everybody seems to be recording your every move • Loyalty cards • Cookies • Facebook, Twitter,… • Check out Collusion plug-in for Firefox • They want to know how to market to you • Same is true in business • Know your data, know your business

  3. Data Warehousing • Integrated repository of data to understand your business • Separate from the Operational Database • Supports decision making • Subject oriented • Time variant • Non-volatile

  4. Features • Only necessary data to allow modeling and decision making • Coming from potentially many sources • Time component • Even though operationally there might not be • Data doesn’t change after loading • No operational updates • Periodic refresh

  5. Operational vs Warehouse • Operational • Optimized for on-line transactional processing • OLTP • Warehouse • Optimized for online analytic processing • OLAP • Complex queries • Very large volumes

  6. Data cube • Multi dimensional data • Not just 3D (but mostly shown as such)

  7. Lattice of Cuboids all 0-D(apex) cuboid country product date 1-D cuboids product,date product,country date, country 2-D cuboids 3-D(base) cuboid product, date, country • How much of the cube is materialized before the query: • Full (complete cuboid) • None (materialized on the fly) • Partial Slide by Dr. Hany Saleeb

  8. OLAP • “Querying and presenting text and numeric data from data warehouses in a dimensional cube-style” • Slicing a dimension:Per region, per product, per period • Drill-down:Country  Region  Town  Suburb • Drill-up, drill-around, etc. • Visualization Slide by Cor Winkler

  9. other sources Extract Transform Load Refresh Operational DBs Multi-Tiered Architecture Monitor & Integrator OLAP Server Metadata Analysis Query Reports Data mining Serve Data Warehouse Data Marts Data Sources Data Storage OLAP Engine Front-End Tools Slide by Dr. Hany Saleeb

  10. Steps • Data extraction: • get data from multiple, heterogeneous, and external sources • Data cleaning: • detect errors in the data and rectify them when possible • Data transformation: • convert data from legacy or host format to warehouse format • Load: • sort, summarize, consolidate, compute views, check integrity, and build indices and partitions • Refresh • propagate the updates from the data sources to the warehouse

  11. SALES FACT TABLE PRODUCT DIMENSION TIME DIMENSION time_key (FK)‏ product_key (FK)‏ store_key (FK)‏ promo_key (FK)‏ dollars units cost product_key (PK)‏ SKU description brand category package_type size flavor time_key (PK)‏ SQL_date day_of_week week_number month PRODUCT STORE DIMENSION store_key (PK)‏ store_ID store_name address district region PROMOTION DIMENSION promotion_key (PK)‏ promotion_name promotion_type price_treatment ad_treatment display_treatment coupon_type Total Cost $ 1,058 $ 2,200 $ 650 $ 1,848 $ 2,350 $ 580 District Atherton Atherton Atherton Belmont Belmont Belmont Brand Clean Fast More Power Zippy Clean Fast More Power Zippy Total Dollars $ 1,233 $ 2,239 $ 848 $ 2,097 $ 2,428 $ 633 Gross Profit $ 175 $ 39 $ 198 $ 249 $ 78 $ 53 Star-Schema Example Slide by Cor Winkler

  12. Data Warehouse Usage • Three kinds of data warehouse applications • Information processing • supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs • Analytical processing • multidimensional analysis of data warehouse data • supports basic OLAP operations, slice-dice, drilling, pivoting • Data mining • knowledge discovery from hidden patterns • supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools. • Differences among the three tasks Slide by Dr. Hany Saleeb

  13. Information Dashboards Slide by Cor Winkler

  14. Information ExploitationBusiness Intelligence (BI) R Value of decision Intelligent agents Make it happen!Automatic response to business triggers Data Mining What might happen? Obscure data relationships and trends Analysis Why it happened? Dynamic slice&dice Complexity → Reporting What happened? Historical info ← # of users Complexity → Slide by Cor Winkler

More Related