1 / 18

Distributed Databases

Distributed Databases. Logical next step in geographically dispersed organisations goal is to provide location transparency starting point = a set of decentralised DBs located in different places, developed for the specific information needs of each site

lovie
Télécharger la présentation

Distributed Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Distributed Databases • Logical next step in geographically dispersed organisations • goal is to provide location transparency • starting point = a set of decentralised DBs located in different places, developed for the specific information needs of each site • Aim: to integrate these decentralised DBs into a coherent DDB

  2. Advantages of Distributed DBs: • Increased reliability of systems and availability of data • Local control preserved • Modular growth possible at each site and at new sites • Optimised communication costs • Faster response times

  3. Control in normal DBs • transaction control: ability of the DBMS to ensure the successful completion of transactions • commit transactions • roll-back to previous state • concurrency control: ability of the DBMS to arbitrate between concurrent uses of data: • simultaneous access • simultaneous update • deletion

  4. Control in Distributed DBs • Different portions of the overall database reside at different locations • these portions are controlled by different processors running sometimes different DBMSs • common schema means queries can involve any portion of the DB residing at any location

  5. Options for Distributed DBs • Issue of physical design (data structure) • performance of the DB (response time...) depends upon good design • There are a number of options: • data replication • horizontal partitioning • vertical partitioning • combinations of the above

  6. Data replication • store a separate copy of the full tables in each location • if a copy is stored at every site: Full Replication • Advantages: • reliability • fast response • Disadvantages • storage requirements • complexity and cost of updating

  7. Horizontal partitioning • some of the rows of the tables are stored in one location; others are stored at other locations • eg: customers banking out of a particular branch • Advantages: • efficiency • local optimisation • security • Disadvantages: • inconsistent speed access • backup vulnerability

  8. Vertical partitioning • some columns are projected into base relationship at different sites • all relations share a common domain so the full table can be reconstructed • Advantages: • tailor-made support for functional areas • same as horizontal partitioning • Disadvantages: • some queries might be very slow • users must understand some design issues

  9. Combinations of the three methods • most of the time, companies will use different methods • each method is efficient in certain situations + some other security requirements • eg: local customers, information originating at a certain site, shared processes that require the same data at all sites • it is a design issue to try to identify the optimal distribution - data at the sites where it is used most

  10. Distributed DBMS • additional roles to play in the case of a distributed DB • determine the location where data to be retrieved is located • translate the request into the language used by the local DBMS • deal with normal data management functions, security matters, locking, query optimisation...

  11. Heterogeneous Distributed DBMS • a different DBMS running at each site • a master DBMS controlling the interactions amongst the parts • not practical today (compatibility) • more often, each DBMS follows the same data architecture

  12. Problems with global transactions • DBMSs can be radically different - relational versus network • only some state-of-the-art commercial products have translating capabilities • one alternative solution is to put some essential data and the directory of the data locations on a central server • Real distributed DBMS solve these problems for the users with the help of the NOS

  13. Commit Protocol • to ensure the integrity of the data in update operations • well defined procedure based on the exchange of messages (“ok” or “not ok”) • each global transaction can either be complete (and completed) or aborted • Two-phase commit: • site originating the transaction sends requests to all sites involved in the update • all sites attempt to process their part of the transaction without committing the data (temp files) • they notify the first site whether OK or not • the first site collects all OKs and sends order to commit the data

  14. Timestamping • Alternative to locking (possibility of deadlocks) • ensures that transactions are processed in serial order so locking in not needed • All updated records carry the timestamp of the transactions that modified them • if new transaction attempts to update a record with an earlier timestamp = OK • If new transaction ...with a later stamp, update access is denied, the transaction is re-stamped and is re-started

  15. Example: Record in a DB 168 Record update: 170 OK Updated record 170 Record Update: 165 Denied Record Update: 170 Transaction re-started (ie: do it again) Updated record 170 +++: costly deadlock situations are avoided ----: transactions may sometimes be restarted even though they did not conflict with previous ones.

  16. Effect of design on speed • how to design fast queries • simple example with two sites in relational DB: • supplier (Supplier#, ...,City): 10,000 records stored in Detroit • part (part#, .., colour): 100,000 records stored in Chicago • Shipment (supplier#,..., Part#): 1,000,000 records stored in Detroit • each record is 100 characters long + there are 10 red parts • data transmission is 10,000 character/second, 1 second delay in any communication • data processing negligible • Write the SQL statement • Imagine how the query can be carried out between the two sites

  17. SQL statement select supplier.supplier# from supplier, part, shipment where supplier.city = ‘Cleveland’ and supplier.supplier# = shipment.supplier# and shipment.part# = part.part# and part.color = ‘Red’

  18. Conclusions • Reasonably easy to optimise query with two tables • Very complex with more than two (try with 30!) • Rules: • Queries must be broken down into components isolated at different sites (minimise communication time and traffic) • Determine which site has the potential to yield FEWER selected records • Move preliminary results to site where rest of the work can be performed (ie: try to move as few records as possible)

More Related