1 / 26

Chapter 14 – Physical Design

Chapter 14 – Physical Design. Summary of Topics Design Process The Foundation for Design Design Objectives Designing a Data Mart Components of the Data Architecture Design Example. Chapter 14 – Physical Design. Design Process Iterative process with following steps

morrisa
Télécharger la présentation

Chapter 14 – Physical Design

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 14 – Physical Design Summary of Topics • Design Process • The Foundation for Design • Design Objectives • Designing a Data Mart • Components of the Data Architecture • Design Example

  2. Chapter 14 – Physical Design Design Process • Iterative process with following steps • ID user requirements and project scope • Develop subject area data model(s) • Develop data warehouse logical model • Develop data warehouse architecture • Design the physical database • Populate user-oriented repository/directory • Identify sources of data • Cleanse and integrate data from legacy systems • Populate the data warehouse • Test for user satisfaction (quality, performance) • Rework design as needed

  3. Chapter 14 – Physical Design The Foundation for Design • Data model is the foundation • 3rd normal form model recommended for depth of understanding, though later de-normalization is common • Top-down, global model ideal, but • Lengthy duration • Corporate funding requirements • Bottom-up may be more practical • Build up ,one subject area at a time • Integrate into complete model over time

  4. Chapter 14 – Physical Design Design Objectives • Define in terms of business objectives • Involves tradeoffs among • Performance • Flexibility • Scalability • Ease of Administration • Data Integrity • Data Consistency • Data Availability • User Satisfaction

  5. Chapter 14 – Physical Design Design Objectives: Performance • Response time in DW typically > OLTP • Important to manage user expectations • Poor performance may result from • Inadequate hardware • Inflexible data architecture • Poor physical design • Unrealistic user expectations • Build performance bottom-up (Fig 14.1) • DBMS Installation parameters • Too few concurrent users • Poor selection of installation parameters • Database Design and Optimization • Application design, SQL quality/efficiency • Query efficiency • Tune performance from top-down

  6. Figure 14.1Building Good Performance

  7. Chapter 14 – Physical Design Server Scalability • Defn: ability to handle increase in # users • Old mainframes known for poor scalability • Many adopt multi-server environment Server Flexibility • Achieve using data model and metadata as basis for physical design • May include giving users flexibility to handle analysis, query, reporting needs • Must accommodate change in today’s business environment

  8. Chapter 14 – Physical Design Designing a Data Mart • Defn: A warehouse developed to meet requirements of a specific dept., group • A subset of a corporate warehouse • Caveat: may result in isolated islands of information

  9. Chapter 14 – Physical Design Components of the Data Architecture • May include the following elements • Legacy Systems • Relational format, Model, Metadata questionable • Data Warehouse • Historical detail, summary, external data • Specialized data subsets, multidimensional data • Data Marts • Tailored to specific dept. or workgroup • Personal Data Warehouse • Draw from warehouse, but not part of it • May contain external, personal data • Operational Data Store (See Fig. 14.2) • Optional • Shared between operational and warehouse env’s • Source of detail data for warehouse • Typically updateable

  10. Figure 14.2Data Flow Across the Corporation

  11. Chapter 14 – Physical Design Types of Data Structures in the Warehouse • Detail Data • Summary Data • Multidimensional Data • Data Subsets • Specialized Data Caches • Replicated Data • Archived Data • Design choices depend on expected data usage, performance requirements. • Iterative process

  12. Chapter 14 – Physical Design Detail Data • Most basic level of data • Subsets, summaries derived from detail • Should include only the data needed for decision support • Include the element of time • In theory, all queries & reports could be run against detail data but usually denormalized for performance. • Advantages of denormalization • Better performance • More accessible • Spreadsheet view of data (intuitive) • Downside: Implementation size

  13. Chapter 14 – Physical Design Indexing for Performance • Warehouse typ. > #indexes than OLTP • Very beneficial to performance, but • Large # of indexes degrade load time • Place indexes on primary, foreign keys to start • Add others as needed & as feasible • Number of indexes varies with inversely with table size

  14. Chapter 14 – Physical Design Partitioning Data • Can centralize on mainframe/server, or • May partition across multiple servers by workgroup, dept, subject area, application • Choice depends on expected pattern of use • Horizontal partitioning: departmental data on dept. servers • Vertical partitioning: implement columnar subset of normalized table across servers

  15. Chapter 14 – Physical Design Parallel Processing • Key design issue • Appropriate database design to achieve potential for improved performance • Shared Resources • Disk • Memory • Advantages, Disadvantages • Choices based on expected use patterns

  16. Chapter 14 – Physical Design Summary Levels • Summaries may offload detail processing • Separate tables or multidimensional D/B • Key design tradeoffs: • Minimize I/O • Provide access to right data @ right time • Reasonable cost • Recommend incremental/spiral approach to build critical summary levels first

  17. Chapter 14 – Physical Design External Data • Example: government or industry-supplied demographic data • Individual queries may need to merge external, personal, and warehouse data • Document ext. data with metadata • Data quality may be questionable Data Replication • Can improve performance, availability in client/server environment • Reduce network traffic • Redundant availability if primary server down

  18. Chapter 14 – Physical Design Data Placement Guidelines • Do • Place tables commonly joined on same server • Consider replicating tables across servers if joins will cause severe network traffic issues • Consider placing shared detail data on centralized source • Don’t • Place frequently joined tables on same device (e.g. same disk) • Place indexes on same devices as table on which they are built • All these point to reducing network traffic and device I/O.

  19. Chapter 14 – Physical Design RAID Technology • Not a Dept. Of Defense Program!! • “Redundant Arrays of Inexpensive Disks” • Potential solution for Fault Tolerance • Techniques include: • Disk Mirroring (2 disks, same controller) • Disk Duplexing (2 disks, 2 controllers) • Parity/Error Detection and Correction • Disk Striping

  20. Chapter 14 – Physical Design RAID Levels • RAID 0: Sector interleave, no error checking (no redundancy) • RAID 1: Mirroring (duplicate copy) • RAID 2: Bit interleave with error correction codes on multiple drives • RAID 3: Bit interleave with error correction on single drive • RAID 4: Sector interleave with dedicated parity drive • RAID 5: Sector interleave, parity stored on all drives

  21. Chapter 14 – Physical Design Archived Data • Can’t store all the data all the time !! • Need to determine data retention needs • Issues with archiving and retrieval • What if data format changes? • What if we need archive data but no room in current table?

  22. Chapter 14 – Physical Design OLAP and Multidimensional Data • Multidimensional Model • Data stored as facts and dimensions rather than rows and columns • A.K.A. “Cubic”, “Star” model • Major “Fact” tables, usually business related • Smaller “dimension” tables, descriptive data • Dimension table: perspectives on factual data • “by” criteria • Sales by salesperson vs. sales by region • Fact : quantitative or factual data • Usually a large denormalized table • Minimizes number of joins

  23. Chapter 14 – Physical Design Designing Multidimensional Tables • Identify Requirements from analyst’s view (how will data be accessed?) • Identify “facts”, “dimensions” • Identify the roll-up (aggregation) levels • Design fact and dimension tables • Determine data retention, sizing requirements • Identify partitioning requirements • Validate the design • Modify as requirements changes

  24. Chapter 14 – Physical Design Design Example • Figures 14.6 : Partial E-R Diagram

  25. Chapter 14 – Physical Design Figure 14.7: Transformation of E-R Diagram into Multidimensional Model • Entities mapped into dimensions • Product • Location • Time • Entities that belong to more than one dimension become facts

  26. Chapter 14 – Physical Design • Figure 14.10: Modified Star Schema • Note Level ID – tags NULL entries • e.g. regional level data -> STORE_ID = NULL • Includes attribute descriptions

More Related