1 / 47

Distributed Databases

Distributed Databases. Data in database is distributed among many computers. Why do this? Distributed nature of some applications, a bank with many branches, for example. Global users may share data with local users. Local users may experience increased performance, reliability, availability.

kenyon
Télécharger la présentation

Distributed Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Distributed Databases • Data in database is distributed among many computers. • Why do this? • Distributed nature of some applications, a bank with many branches, for example. • Global users may share data with local users. • Local users may experience increased performance, reliability, availability. • Able to integrate different types of databases.

  2. Configuration Client Client . . . Server Server Site 1 Site 2 Site n Network

  3. DBMS Requirements for DDBs • Persistence • Not a problem • Querying • Global vs. Local queries. • Network communication changes query optimization heuristic. • Versioning • As long as network is up, this is no problem

  4. Requirements II • Concurrency Control • Problem: what if a client from site 1 requests a write lock on site 2, then site 1 crashes? • Site 2 has a write lock which will not be released. • Security • If site 1 requests data from site 2, can site 2’s server expect site 1 to protect its data from invalid users?

  5. Requirements III • Integrity • Not a problem as long as network is up. • Transactions • Should there be one log or a distributed logging mechanism? • If site 1 makes a change to site 2, who logs it? • Recovery • If site 2 needs to run the recovery algorithm, where are the transactions to examine?

  6. Types of Distributed Databases: • There is a hierarchy of types of Distributed DB’s. • What follows is a “tree” of DDB types:

  7. Distributed DB Systems Homogeneous Heterogeneous Autonomous Non-Autonomous Integrated via Systems Integrated via Gateways Partial DBMS Functionality Full DBMS Functionality Federated Unfederated Loose Integration Tight Integration Single-Scheme Multi-Scheme

  8. Homogeneous Heterogeneous Autonomous Non-Autonomous Distributed DB Systems Homogeneous: Same DBMS on all sites Heterogeneous: Different DBMS’s and even different types Autonomous: All control to local users Non-Autonomous: No control to local users This distributes control rather than data ADV: Local users gain control of their data. DIS: System integration more difficult. .

  9. Distributed DB Systems Homogeneous Heterogeneous Autonomous Non-Autonomous Integrated via Systems Integrated via Gateways Systems: complete DBMS’s Gateways: simple “hooks” to permit linkage

  10. Distributed DB Systems Homogeneous Heterogeneous Autonomous Non-Autonomous Integrated via Systems Integrated via Gateways Partial DBMS Functionality Full DBMS Functionality Federated Unfederated Federated: Local users exist. Unfederated: no local users. How much of DBMS functionality is available?

  11. Distributed DB Systems Homogeneous Heterogeneous Autonomous Non-Autonomous Integrated via Systems Integrated via Gateways Partial DBMS Functionality Full DBMS Functionality Federated Unfederated Loose Integration Tight Integration Single-Scheme Multi-Scheme How many global schemas exist? Loose Integration: no global schema; Tight: with global schema.

  12. Building Distributed DB Systems • Normal DBMS has 3 levels: • Internal Level • Conceptual Level • External Level • Distributed complexity comes from • Global vs. local users • Global vs. local schemas

  13. Normal DBMS Level Diagram External View 1 External View 2 External View n Conceptual Level Internal Level

  14. Distributed DBMS Diagram • Each Server Site will have its own normal 3 level version. • Global Users will have an external and conceptual level. • Conceptual level will not be mapped to a global Internal level, but to the Server Sites. • How?

  15. Global External View 1 Global External View n Global Conceptual Level Fragmentation Schema Allocation Schema Server Site 1 Server Site 2 Server Site n Loc Ext 1 Loc Ext n Loc Ext 1 Loc Ext n Loc Ext 1 Loc Ext n Concep. Level Concep. Level Concep. Level Int Level Int Level Int Level

  16. New Levels: • Fragmentation Schema: • How the tables that appear at the global conceptual level are broken up into table fragments (more on this next). • Allocation Schema: • How the fragments are allocated to the server sites.

  17. Fragmentation • Horizontal Fragmentation • A table is split so that some of its rows go into one fragment and some into another. • Vertical Fragmentation • A table is split so that some of its columns go into one fragment and some into another. • Don’t forget the primary key in all fragments! • Mixed Fragmentation • Both Horizontal and Vertical Fragmentation.

  18. Data Replication • Full Replication -- replicate all data on all machines • Fast querying time • Updates require synchronization • Partial Replication -- replicate some elements, but not all. • Optimize query time while minimizing synchronization • Problem: how to decide?

  19. Data Replication II • Question: how to keep copies current? • Synchronous Replication: • A transaction wishing to update a replicated table must update all copies before it can commit. • Asynchronous Replication: • Copies are updated only periodically

  20. Data Replication III • If all copies aren’t updated, what is updated? • Primary Copy Data Replication: • There is one copy designated the primary copy that is updated; these changes propagate to the other copies periodically. • Equal Participation Replication (Peer-to-Peer): • Any copy may be updated any time.

  21. Data Replication IV • With Equal Participation, how are concurrent updates restricted to ensure no two copies have incompatible updates? • In general, it is impossible to guarantee that no two updates will be changing the same piece of data. • In practice, it may be possible: a company with offices in Chicago and Peoria may allow each site a complete copy of the employee table, but allow them to change only there own ones.

  22. Distributed DB Example • I have a consulting firm with offices is Chicago, Peoria, and Carbondale. • I want to manage my employees from one central location, but each site has its own local users. • Chicago and Peoria have two users: payroll and managers, while Carbondale has only one user.

  23. Example Partitioning • Chicago: • payroll: EmpCP(ID#, Name, Salary, #deducts) • MGRs: EmpCM(ID#, Job, Mgr-ID#) • Peoria: • payroll: EmpPP(ID#, Salary, #deducts) • MGRs: EmpPM(ID#, Name, Job, Mgr-ID#) • Carbondale: • EmpDS(ID#, Name, Salary, #deducts, Job, Mgr-ID#)

  24. Example: Global Schema • Global Conceptual Level: • EmpGlobal(ID#, Name, Salary, #deducts, Job, Mgr-ID#) • Fragmentation Schema: • Three horizontal fragments (one for each location) • Two of these horizontal fragments are then fragmented vertically. • This is a mixed scheme.

  25. Example: Global Schema II • Global Allocation Schema: • Chicago: two fragments, one to Payroll, one to Managers. • Peoria: two fragments, one to Payroll, one to Managers. • Carbondale: just one fragment.

  26. Example: Architecture Steps • Examine User’s External View • Determine access rights • Examine Ext/Conceptual Level Mapping • Determine how view is mapped to tables. • Examine Conceptual Level • Determine what tables are needed • Examine Fragmentation Schema • Determine how tables are fragmented

  27. Example: Architecture Steps II • Exame Allocation Schema • Determine where fragments are located • Request data from local sites • Merge Vertical Fragments with a join • Merge Horizontal Fragments with a union

  28. Example: Global Querying: • Global user wishes to retrieve all the information from all offices: • Logically: EmpCP  EmpCM  EmpPP  EmpPM  EmpDS

  29. Query Processing • Example: • Emp(Name, SSN, Bdate, Addr, Sex, Salary, Mgr, Date) • 10,000 Records, each is 100 bytes • Name is 30 bytes, SSN: 9, DNO: 4 • AT SITE 1 • Dept(D#, Dname, MgrSSn, MgrStartDate) • 100 records, 35 bytes each • D#: 4 byts, Dname: 10 bytes, Mgr: 9 Bytes • AT SITE 2

  30. Querying • FROM SITE 3: • “Retrieve every employee’s name, and the name of his/her department.” • Easy in SQL: • Select Ename, DnameFrom Emp, DeptWhere Dno = D# • Hard part: • Where to join them?

  31. Querying II: Where to join? • Where should the join be done? • Let’s look at each possibility, and calculate total number of bytes that need to be transferred to do the join there.

  32. Joining at Site 1: • Site 2: 3,500 bytes to site 1 • Join done at site 1 • Result: send 400,000 bytes to site 3 • Total: 403,500 bytes transferred

  33. Joining at Site 2: • Site 1: 1,000,000 bytes to site 2 • Join done at Site 2 • Result: send 400,000 bytes to site 3 • Total: 1,400,000 bytes transferred

  34. Joining at Site 3: • Site 1: 1,000,000 bytes to Site 3 • Site 2: 3,500 bytes to Site 3 • Join done at Site 3 • Total: 1,003,500 bytes transferred

  35. Query Analysis • Joining at Site 1: • Total: 403,500 bytes transferred • Joining at Site 2: • Total: 1,400,000 bytes transferred • Joining at Site 3: • Total: 1,003,500 bytes transferred • PICK JOINING AT SITE 1

  36. Query with Optimization • Now, optimize query: • (EN,DN) [(EN,D#) (Emp)  (D#,DN) (Dept)] • Projection on Emp has 340,000 bytes • Projection on Dept has 1,400 bytes • Result will have 400,000 bytes.

  37. Optimized Join • Join at Site 1: • 1,400 + 400,000 = 401,400 bytes transferred. • Join at Site 2: • 340,000 + 400,000 = 740,000 bytes transferred. • Join at Site 3: • 340,000 + 1,400 = 341,400 bytes transferred. • DO JOIN AT SITE 3.

  38. Distributed Transactions • From the previous example: suppose I wish to add a new department and also add some employees to it. • This requires updating the Dept table at Site 2 and also the Emp table at site 1. THESE UPDATES MUST BE IN THE SAME TRANSACTION.

  39. Distributed Transaction Analysis • Transactions are broken down into “Sub-transactions” which can be executed at one site. • Each Sub-Transaction is maintained at the primary site of its parent transaction, as well as at the child site (host site of data being updated). • Next Question: • How to commit and rollback whole transaction?

  40. Commit and Rollback Problems • To commit, all actions of the transaction must be completed. • In the distributed environment, it means all sub-transactions must be completed before the whole transaction can be completed. • How does main transaction processor (coordinator) know if all sub-transactions at their sites (subordinates) have been completed?

  41. New Commit Protocol • Two-Phase Commit: • When the main transaction is completed, the coordinator sends a “prepare to commit” message to all subordinates. This is also written to its transaction log. • These sites write a “prepare sent” message to their transaction logs and return either “Yes” or “No” back to coordinator. This signals whether or not their sub-transaction was completed.

  42. New Commit Protocol II • If coordinator receives all “Yes” responses from all subordinates: • It issues a commit to its transaction log and sends “Commit” to all subordinates. • If coordinator receives at least one “No”, or it times-out waiting for a response from any site: • it must rollback the entire transaction. • What does the subordinates do next?

  43. New Commit Protocol II • If subordinate receives a rollback request: • It performs this action on its transaction log and sends an acknowledgement back to the coordinator. • If it receives a commit: • It writes this to its transaction log and sends an acknowledgement back to the coordinator. • If coordinator receives all acknowledgements: • It writes an end-of-transaction notice to its log.

  44. Distributed Recovery • Problem: a site crashes and then comes back up. • What to do about transactions for which it was the coordinator? • What to do about transactions for which it was a subordinate? • What to do if a site you are communicating with goes down?

  45. Dist. Recovery -- Coordinator • If a transaction has been committed • the send a commit to each subordinate. • Same idea for a rollback (abort). • If no commit or rollback appears in the log • It must be undone, but the site doesn’t know if it is coordinator or subordinate. • To decide, wait for a coordinator or subordinate request.

  46. Dist. Recovery -- Subordinate • If a transaction has been prepared and committed • send an ACK to coordinator. • Same idea for a rollback (abort). • If a PREPARE is found, but no commit or rollback • Contact coordinator and ask for a commit or rollback.

  47. Dist. Recovery -- Subordinate II • If no PREPARE appears in the log, • It must be undone, but the site doesn’t know if it is coordinator or subordinate. • To decide, wait for a coordinator or subordinate request.

More Related