1 / 22

Chapter 17: Additional Slides

Chapter 17: Additional Slides. June 10, 2014. Outline. Physical Data Management Fragments Distributed Query Processing Transactions Logical Data Management Transparency Conceptual Data Management. Physical Data Management Fragments. What is a Fragment?

jonny
Télécharger la présentation

Chapter 17: Additional Slides

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 17: Additional Slides June 10, 2014

  2. Outline • Physical Data Management • Fragments • Distributed Query Processing • Transactions • Logical Data Management • Transparency • Conceptual Data Management

  3. Physical Data ManagementFragments • What is a Fragment? • Vertical subset (project operation) • Horizontal subset (restrict operation) • Mixed Fragment (combination of project and restrict) • A fragment may be allocated to a single or multiple sites • Fragments may be replicated where • there is a primary fragment as a single site • Copies of the fragment are placed at multiple sites (secondary)

  4. Physical Data ManagementDistributed Query Processing • Involves both local (intra site) and global (inter site) optimization. • Multiple optimization objectives • The weighting of communication costs versus local processing costs depends on network characteristics. • There are many more possible access plans for a distributed query.

  5. Physical Data ManagementDistributed Query Processing cont’d • Local vs. Global query processing • In Local, • Queries are performed at a central server (single site) • In Global, • Must decide which sites to access for the fragments • May need to move fragments from site to site • Multiple optimization is needed for Global • Due to the multiple sites and access plans • Many possible access plans for Global • Choosing the best one may be difficult

  6. Physical Data ManagementDistributed Query Processing cont’d • Communication Costs • Communication Time (CT) • Fixed Message Delay (MD) • Variable Transmission Time (TT) • CT = MD + TT • MD = Number of Messages * Delay per message • TT = Number of bits/Data rate

  7. Physical Data ManagementDistributed Query Processing cont’d • Global Query Example (p. 632) • List the order number, order date, product number, product name, product price, and order quantity for eastern orders with a specified customer number, date range, and product color. • Four possible access plans

  8. Physical Data ManagementDistributed Query Processing cont’d • Access Plan 1 • Move the Product table to the Tulsa site where the query is processed

  9. Physical Data ManagementDistributed Query Processing cont’d • Access Plan 2 • Restrict the Product table at the Denver Site • Then move result to the Tulsa site to execute the remainder of the query

  10. Physical Data ManagementDistributed Query Processing cont’d • Access Plan 3 • Perform join and restrictions of Eastern-Orders and Eastern Order-lines fragments at the Tulsa site • Then move result to Denver site to join with Product Table

  11. Physical Data ManagementDistributed Query Processing cont’d • Access Plan 4 • Restrict the Product table at the Denver site • Move product numbers to Tulsa and do restrict/join • Then move result back to Denver to combine with Product table to get product names

  12. Physical Data ManagementDistributed Query Processing cont’d • Obviously many different access plans can be used to answer the same query • Need to investigate • actual network costs • the local processing costs at each site to determine which access plan is the best

  13. Physical Data ManagementTransactions – 2 Phase Commit Protocol • 2 Phase Commit (2PC) • Ensures that all transactions are Atomic • One site is selected as a Coordinator while other sites are Participants • Each Participant site execute a different part of the transaction • Two phases: Voting Phase and Decision Phase • Figure 17.18, page 634

  14. Physical Data ManagementTransactions – 2 Phase Commit Protocol • Several Complications IF • Failures during recovery and Timeouts • Log records are lost • Coordinator fails • Etc. • Several methods to resolve these, but out of scope for this class

  15. Logical Data ManagementTransparency • Transparency is related to data independence. • With transparency, users can write queries with no knowledge of the distribution, and distribution changes will not cause changes to existing queries and transactions. • Without transparency, users must reference some distribution details in queries and distribution changes can lead to changes in existing queries.

  16. Logical Data ManagementFragmentation Transparency • Fragmentation transparency provides the highest level of data independence. • Users formulate queries and transactions without knowledge of fragments (locations, or local formats). • If fragments change, queries and transactions are not affected. • Table 17.6, p 626

  17. Logical Data ManagementLocation Transparency • Location transparency provides a lesser level of data independence than fragmentation transparency. • Users need to reference fragments in formulating queries and transactions. • However, knowledge of locations and local formats is not necessary. • Table 17.7, p 627

  18. Conceptual Data ManagementSchema Integration • Multiple types of schemas may exist to describe the same dataset • Integrate multiple schemas into a single schema • Best explained using an Exercise

  19. Conceptual Data ManagementSchema Integration - Exercise Engineering Database – Relational Schema E(eno, ename, title), p.k. = eno J(jno, jname, budget, loc, cname), p.k. = jno G(eno, jno, resp, dur), p.k. = eno, jno S(title, sal), p.k. = title First, find the common entities and relationships between schemas Employee Database – CODASYL Schema Department(dept-name, budget, manager) Employee(e#, name, address, title, salary) Department Employs Employee (1:N relationship) • Database III – E/R Model • Entities Engineer(Engineer No, name, title, salary) Project(PNo, project name, budget, location) Client(Client Name, Address) • Relationships Engineer Works_In Project : (Responsibility, Duration) Project Contract_By Client : (Contract Date)

  20. Conceptual Data ManagementSchema Integration - Exercise Engineering Database – Relational Schema E(eno, ename, title), p.k. = eno J(jno, jname, budget, loc, cname), p.k. = jno G(eno, jno, resp, dur), p.k. = eno, jno S(title, sal), p.k. = title Second, draw the conceptual diagram for the common entities and relationships Employee Database – CODASYL Schema Department(dept-name, budget, manager) Employee(e#, name, address, title, salary) Department Employs Employee (1:N relationship) • Database III – E/R Model • Entities Engineer(Engineer No, name, title, salary) Project(PNo, project name, budget, location) Client(Client Name, Address) • Relationships Engineer Works_In Project : (Responsibility, Duration) Project Contract_By Client : (Contract Date)

  21. Conceptual Data ManagementSchema Integration • No single correct solution to this exercise • It may be seen that there are multiple solutions to this problem • Considered a very HARD problem • Often hard to find the best synonyms especially from a large set of schemas

  22. Weekly Exercise • Questions 2, 4, and 5

More Related