Distributed Databases


  1. Distributed Databases

  2. Agenda • Definition • Reasons for distributing • Pieces and parts • Distribution options and schemes • Distributed design • Allocation and replication • Concurrency, recovery and security

  3. A distributed system … • Stores data at multiple sites • Is connected by a communications network • Has a DBMS at each site

  4. Reasons for distributing data • Match the information uses of the organization • Increase reliability and availability • Combine data sharing and local control • Improve performance • Make it easier to add new sites

  5. Disadvantages of distributing data • Database and supporting systems are difficult to manage and control • Increased opportunity for security problems • Varied standards • Increased development and operating costs

  6. DDBs require methods for • Communication between sites • Knowing what data is where • Planning queries and transactions • Maintaining consistency of replicated copies on updates • Recovering from site/link failures • Coordinating software and record structures if local DBMSs aren’t homogeneous

  7. Parts of a distributed network • Multiple sites • Network between sites • Transaction processor(s) - (requestor) • Data processor(s) - (data storage)

  8. DDB Types • Single site processing, single site data • Multiple site processing, single site data • Multiple site processing, multiple site data • Homogeneous • Heterogeneous • Multidatabase

  9. Multidatabases • Integrate pre-existing resources • Local database maintains autonomy • Global schema

  10. MDB issues • Site autonomy • Differences in data representation • Missing or conflicting data • Heterogeneous local databases • Global constraints • Global query processing • Concurrency • Security • Local node requirements • Global schema

  11. Transparency • Data distribution • Transaction • Failure • Performance

  12. Distributed design • Where are possible sites? • Fragmentation: base tables vs. horizontal vs. vertical; disjoint vs. overlapping • Allocation • Replication: selective vs. complete

  13. Fragmentation • Sites: Admin (UK), HR (Austin), IS (RTP)
      Projects (ID, Name, StartDate, Dept): (27, Upgrade, 2002-10-08, 20) • (17, Test int, 2002-10-11, 40) • (42, remove, 2000-08-01, 20)
      Projects, vertical fragment (ID, Dept): (27, 20) • (17, 40) • (42, 20)
      Department (ID, Name, Dept): (10, Administration, Austin) • (20, Information Sys, RTP) • (40, Human Resources, Austin)
      Department, horizontal fragment for Austin: (10, Administration, Austin) • (40, Human Resources, Austin)
      Department, horizontal fragment for RTP: (20, Information Sys, RTP)
      Employee (ID, Name, Dept): (15, Andrea Smith, 10) • (17, Peter Wilson, 40) • (42, Anan Gopal, 20)
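
The fragmentation slide can be illustrated with a minimal Python sketch. The rows are the Department and Projects rows from the slide; the helper names `horizontal_fragment` and `vertical_fragment` are hypothetical and chosen only for this example. A horizontal fragment keeps whole rows that match a condition, and a vertical fragment keeps selected columns of every row.

```python
# Department rows from the slide: (ID, Name, Dept), where Dept holds the site.
department = [
    {"ID": 10, "Name": "Administration", "Dept": "Austin"},
    {"ID": 20, "Name": "Information Sys", "Dept": "RTP"},
    {"ID": 40, "Name": "Human Resources", "Dept": "Austin"},
]

# Projects rows from the slide: (ID, Name, StartDate, Dept).
projects = [
    {"ID": 27, "Name": "Upgrade", "StartDate": "2002-10-08", "Dept": 20},
    {"ID": 17, "Name": "Test int", "StartDate": "2002-10-11", "Dept": 40},
    {"ID": 42, "Name": "remove", "StartDate": "2000-08-01", "Dept": 20},
]


def horizontal_fragment(rows, predicate):
    """Keep whole rows that satisfy the predicate (a selection)."""
    return [dict(row) for row in rows if predicate(row)]


def vertical_fragment(rows, columns):
    """Keep only the named columns of every row (a projection).

    The key column should be included so fragments can be rejoined later.
    """
    return [{col: row[col] for col in columns} for row in rows]


# Horizontal fragments of Department, one per site.
austin = horizontal_fragment(department, lambda r: r["Dept"] == "Austin")
rtp = horizontal_fragment(department, lambda r: r["Dept"] == "RTP")

# Vertical fragment of Projects holding only the key and the department.
project_depts = vertical_fragment(projects, ["ID", "Dept"])
```

Here the two horizontal fragments are disjoint and together cover every Department row, which corresponds to the "disjoint" option on the design slide.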

  14. Allocation schemes • Nonredundant best fit • Look at usage of fragments • Put fragment at site with most references • All beneficial sites (Example) • Cost = local updates + remote updates • Benefit = (remote update – local update) * number of queries
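
A rough sketch of the two allocation schemes, in Python. The usage figures, the site names as dictionary keys, and the per-access costs `local_cost` and `remote_cost` are all invented for illustration. The slide's "remote update – local update" is read here as the per-access cost difference between a remote and a local access, which is an assumption about the slide's intent.

```python
def best_fit_site(references):
    """Nonredundant best fit: the single site with the most references."""
    return max(references, key=references.get)


def all_beneficial_sites(usage, local_cost, remote_cost):
    """Return every site where keeping a copy has benefit greater than cost.

    usage maps site -> {"queries": n, "updates": n}.
    Cost    = local updates + remote updates applied to the copy (weighted).
    Benefit = (remote cost - local cost) * number of queries at the site.
    The cost weighting is an assumption, not taken from the slide.
    """
    total_updates = sum(u["updates"] for u in usage.values())
    chosen = []
    for site, u in usage.items():
        local_updates = u["updates"]
        remote_updates = total_updates - local_updates
        cost = local_updates * local_cost + remote_updates * remote_cost
        benefit = (remote_cost - local_cost) * u["queries"]
        if benefit > cost:
            chosen.append(site)
    return chosen


# Hypothetical usage of one fragment at three sites.
usage = {
    "Austin": {"queries": 100, "updates": 5},
    "RTP": {"queries": 10, "updates": 20},
    "UK": {"queries": 40, "updates": 2},
}

references = {site: u["queries"] + u["updates"] for site, u in usage.items()}
single_site = best_fit_site(references)
replica_sites = all_beneficial_sites(usage, local_cost=1, remote_cost=3)
```

With these made-up numbers, best fit places the fragment at one site only, while the all-beneficial-sites rule may place copies at several sites, so the second scheme is the one that introduces replication.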

  15. Replication • How much data should be replicated? • Where? • Issues of dealing with multiple copies

  16. Queries • Minimize amount of data transfer • Use local copies where possible • Minimize size of tables transferred • Semi join
      Table 1 (site 1): (A1, b3) • (A2, b2) • (A3, b12) • (A4, b5) • (A5, b1)
      Table 2 (site 2): (A1, c5) • (A5, c1) • (A6, c43)
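
One way to picture the semi join with the slide's two tables, sketched in Python. The slide gives no column names, so the first value of each row is treated as the join key; that choice, the function name `semi_join`, and the assumption that keys are unique in Table 1 are all made for this example only.

```python
# Rows from the slide. The first value in each row is assumed to be the join key.
table1 = [("A1", "b3"), ("A2", "b2"), ("A3", "b12"), ("A4", "b5"), ("A5", "b1")]  # site 1
table2 = [("A1", "c5"), ("A5", "c1"), ("A6", "c43")]  # site 2


def semi_join(site1_rows, site2_rows):
    """Join the two tables at site 2 while shipping as little data as possible."""
    # Step 1: site 2 sends only its join-key values to site 1.
    keys_sent = {key for key, _ in site2_rows}

    # Step 2: site 1 returns only the rows whose key appears in that set.
    rows_returned = [row for row in site1_rows if row[0] in keys_sent]

    # Step 3: site 2 finishes the join locally (keys assumed unique in table 1).
    lookup = dict(rows_returned)
    joined = [(key, lookup[key], c) for key, c in site2_rows if key in lookup]
    return keys_sent, rows_returned, joined


keys_sent, rows_returned, joined = semi_join(table1, table2)
```

Only two of the five Table 1 rows cross the network in this run, instead of the whole table, which is the point of the "minimize size of tables transferred" bullet.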

  17. Concurrency • Locks for reading and updating become trickier if there is more than one copy around. Must either: • Notify all copies that there is a lock • Have one place to check if something is locked • Two-phase commit
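
A minimal in-memory sketch of two-phase commit, in Python. The `Participant` class and `two_phase_commit` function are hypothetical names for this illustration; a real protocol also needs durable logging, timeouts, and handling of coordinator failure, none of which is modeled here.

```python
class Participant:
    """One site taking part in a distributed transaction (illustrative only)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # whether this site will vote yes
        self.state = "idle"

    def prepare(self):
        # Phase 1: the site votes and, on a yes vote, promises it can commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit at every site only if every site votes yes; otherwise abort all."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]

    # Phase 2: broadcast the global decision.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


ok_sites = [Participant("Austin"), Participant("RTP")]
ok_result = two_phase_commit(ok_sites)

mixed_sites = [Participant("Austin"), Participant("UK", can_commit=False)]
mixed_result = two_phase_commit(mixed_sites)
```

The second run shows the all-or-nothing property: a single no vote causes every site, including the one that voted yes, to abort.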

  18. Recovery • Need to consider different situations: • If one site fails in the middle of a transaction, before a commit • If a communication link fails (the site may still be reachable through another route, though it will take longer) • How to bring a down site up to date with updates made while it was down
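
The slide leaves open how a down site is brought up to date. One possible approach, sketched here in Python as an assumption rather than the presenter's method, is to replay a log of sequence-numbered updates that the site missed. The log format `(sequence, key, value)` and the function name `catch_up` are made up for this example.

```python
def catch_up(replica, last_applied, log):
    """Replay logged updates the replica missed while it was down.

    replica      -- dict holding the recovering site's copy of the data
    last_applied -- sequence number of the last update the site saw
    log          -- iterable of (sequence, key, value) updates kept elsewhere
    Returns the sequence number of the last update now applied.
    """
    for seq, key, value in sorted(log, key=lambda entry: entry[0]):
        if seq > last_applied:
            replica[key] = value
            last_applied = seq
    return last_applied


# Hypothetical state: the site went down after applying update 1.
replica = {"x": 1}
log = [(1, "x", 1), (3, "y", 7), (2, "x", 5)]
last_seen = catch_up(replica, last_applied=1, log=log)
```

Sorting by sequence number before replaying keeps the result the same even if log entries arrive out of order, and skipping entries at or below `last_applied` makes a repeated replay harmless.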

  19. Security and access control • Open (maximized sharing) • Closed (need to know) • Content independent vs. content dependent • Statistical control • Context dependent control • Where does security checking take place?
