
Distributed Databases

Distributed Databases. Arijit Sengupta. Learning Objectives. Understand the concept and necessity of distributed databases Understand the types of distributing data: Replication Vs Fragmentation Understand the process of improving availability


Presentation Transcript


  1. Distributed Databases Arijit Sengupta

  2. Learning Objectives • Understand the concept and necessity of distributed databases • Understand the ways of distributing data: replication vs. fragmentation • Understand the process of improving availability • Understand the method for query processing in distributed environments

  3. Introduction • Data is stored at several sites, each managed by a DBMS that can run independently. • Distributed Data Independence: Users should not have to know where data is located (extends Physical and Logical Data Independence) • Distributed Transaction Atomicity: Users should be able to write transactions that access several sites just as they write local transactions

  4. Types of distributed databases • Homogeneous • Same structure of data • Same DBMS • Sites aware of one another • Reduced autonomy of individual sites • Heterogeneous • Potentially different schema • Potentially different DBMS • Sites may not be aware of one another • No reduced autonomy of sites

  5. Distributed data storage • Replication • Identical replicas of tables in each site • Higher availability • Higher parallelism • Increased update overhead • Fragmentation • Tables are fragmented across sites, only partial data available in each site • Vertical fragmentation: subsets of columns only available (PK must be in all sites – why?) • Query results obtained by joining individual results • Horizontal fragmentation: subsets of rows available in each site. • Query results obtained by performing union on individual results.
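The two reconstruction rules on this slide (join for vertical fragments, union for horizontal fragments) can be sketched in a few lines of Python. This is an illustrative sketch only: the sample `sailors` rows, the site names, and the helpers `rebuild_horizontal` and `rebuild_vertical` are assumptions made up for the example, not part of the original deck. It also hints at why the primary key must be present at every site under vertical fragmentation: without it there is nothing to join the column subsets on.

```python
# Hypothetical Sailors table: (sid, name, rating); sid is the primary key.
sailors = [
    (22, "Dustin", 7),
    (31, "Lubber", 8),
    (58, "Rusty", 10),
]

# Horizontal fragmentation: each site holds a subset of the rows.
site_a_rows = [row for row in sailors if row[2] < 8]
site_b_rows = [row for row in sailors if row[2] >= 8]

# Vertical fragmentation: each site holds a subset of the columns,
# and every fragment keeps the primary key (sid).
site_a_cols = [(sid, name) for sid, name, _ in sailors]
site_b_cols = [(sid, rating) for sid, _, rating in sailors]


def rebuild_horizontal(*fragments):
    """Reconstruct the table as the union of its row fragments."""
    return sorted(set().union(*fragments))


def rebuild_vertical(left, right):
    """Reconstruct the table by joining column fragments on the key (first field)."""
    right_by_key = {sid: rest for sid, *rest in right}
    return sorted(
        (sid, *rest, *right_by_key[sid])
        for sid, *rest in left
        if sid in right_by_key
    )


print(rebuild_horizontal(site_a_rows, site_b_rows))
print(rebuild_vertical(site_a_cols, site_b_cols))
```

Both calls should give back the original three rows, one via union and one via a key join.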

  6. Why distributed databases? • Availability • Robustness • Detect failures • Reconfigure system for continued functionality • Recover when failure is corrected

  7. Distributed Query Processing • Cost of data transmission over the network • Evaluation of gains by the process of transformation

  8. Distributed Joins • Example: London holds 500 pages of the Sailors table; Paris holds 1000 pages of the Reserves table • Method 1: Fetch as Needed (page-oriented nested loops, Sailors as outer): • Cost: 500 D + 500 * 1000 (D+S) • D is cost to read/write page; S is cost to ship page. • If query was not submitted at London, must add cost of shipping result to query site. • Can also do index nested loops (INL) at London, fetching matching Reserves tuples to London as needed. • Method 2: Ship to One Site: Ship Reserves to London. • Cost: 1000 S + 4500 D (sort-merge (SM) join; cost = 3*(500+1000)) • If result size is very large, may be better to ship both relations to result site and then join them!
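To make the two cost formulas on this slide concrete, the arithmetic can be written out as below. This is a rough sketch under stated assumptions: the function names are invented for illustration, and the unit costs `D = 1` and `S = 1` are placeholders chosen only so the numbers are easy to compare; real values depend on the disks and the network.

```python
SAILORS_PAGES = 500    # stored at London
RESERVES_PAGES = 1000  # stored at Paris


def fetch_as_needed_cost(outer_pages, inner_pages, d, s):
    """Read the outer relation once; for every outer page, read and ship
    every page of the remote inner relation."""
    return outer_pages * d + outer_pages * inner_pages * (d + s)


def ship_to_one_site_cost(shipped_pages, local_pages, d, s):
    """Ship one relation whole, then join locally with a sort-merge join
    costed at 3 * (M + N) page I/Os, as on the slide."""
    return shipped_pages * s + 3 * (shipped_pages + local_pages) * d


# Assumed unit costs, for illustration only.
D, S = 1, 1

fetch_cost = fetch_as_needed_cost(SAILORS_PAGES, RESERVES_PAGES, D, S)
ship_cost = ship_to_one_site_cost(RESERVES_PAGES, SAILORS_PAGES, D, S)

print(fetch_cost)  # 500*1 + 500*1000*(1+1) = 1000500
print(ship_cost)   # 1000*1 + 3*(1000+500)*1 = 5500
```

With these placeholder unit costs, shipping Reserves once is far cheaper than fetching its pages repeatedly, because the nested-loops term multiplies the two page counts.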

  9. Semijoin • At London, project Sailors onto join columns and ship this to Paris. • At Paris, join Sailors projection with Reserves. • Result is called reduction of Reserves with respect to Sailors. • Ship reduction of Reserves to London. • At London, join Sailors with reduction of Reserves. • Idea: Trade off the cost of computing and shipping the projection, and of computing and shipping the reduction, against the cost of shipping the full Reserves relation. • Especially useful if there is a selection on Sailors, and answer desired at London.
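The semijoin steps above can be traced on a small in-memory example. The sketch below is illustrative only: the sample rows, the join column `sid`, and the function `semijoin_at_london` are assumptions for the example, and "shipping" is simulated by passing Python values around rather than by any network call.

```python
# London holds Sailors (sid, name); Paris holds Reserves (sid, bid).
sailors = [(22, "Dustin"), (31, "Lubber"), (58, "Rusty")]
reserves = [(22, 101), (22, 102), (58, 103), (95, 104)]


def semijoin_at_london(sailors, reserves):
    # Step 1 (London): project Sailors onto the join column; ship to Paris.
    join_keys = {sid for sid, _ in sailors}

    # Step 2 (Paris): join the projection with Reserves. The surviving
    # Reserves rows form the reduction of Reserves with respect to Sailors.
    reduction = [row for row in reserves if row[0] in join_keys]

    # Step 3: ship the reduction (not the full Reserves table) to London.
    # Step 4 (London): join Sailors with the reduction. sid is assumed
    # to be the key of Sailors, so a dict lookup is enough.
    name_by_sid = dict(sailors)
    result = [(sid, name_by_sid[sid], bid) for sid, bid in reduction]
    return join_keys, reduction, result


join_keys, reduction, result = semijoin_at_london(sailors, reserves)
print(len(reserves), len(reduction))  # rows in Reserves vs. rows actually shipped
print(result)
```

Here the Reserves row with `sid` 95 never leaves Paris, which is the saving the technique aims for. The saving grows when a selection on Sailors shrinks the set of join keys further.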

  10. Updating Distributed Data • Synchronous Replication: All copies of a modified relation (fragment) must be updated before the modifying transaction commits. • Data distribution is made transparent to users. • Asynchronous Replication: Copies of a modified relation are only periodically updated; different copies may get out of sync in the meantime. • Users must be aware of data distribution. • Current products follow this (asynchronous) approach.
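The difference between the two replication styles can be illustrated with a toy in-memory model. This is a simplified sketch, not how any particular product works: the class `ReplicatedTable` and its methods are hypothetical, copies are plain dictionaries, and the commit protocol that a real synchronous scheme needs to keep copies consistent across failures is left out.

```python
class ReplicatedTable:
    """Toy model: copy 0 is treated as the primary, the rest as secondaries."""

    def __init__(self, n_copies):
        self.copies = [dict() for _ in range(n_copies)]
        self.pending = []  # changes not yet applied to the secondary copies

    def write_sync(self, key, value):
        # Synchronous: every copy is updated before the write returns.
        for copy in self.copies:
            copy[key] = value

    def write_async(self, key, value):
        # Asynchronous: update the primary now, queue the change for later.
        self.copies[0][key] = value
        self.pending.append((key, value))

    def propagate(self):
        # Periodic step that brings the secondary copies up to date.
        for key, value in self.pending:
            for copy in self.copies[1:]:
                copy[key] = value
        self.pending.clear()


table = ReplicatedTable(n_copies=3)
table.write_sync("rating", 7)
after_sync = [copy["rating"] for copy in table.copies]

table.write_async("rating", 9)
before_propagate = [copy["rating"] for copy in table.copies]

table.propagate()
after_propagate = [copy["rating"] for copy in table.copies]

print(after_sync, before_propagate, after_propagate)
```

Between `write_async` and `propagate`, a reader of a secondary copy sees the old value, which is why users must be aware of the distribution under asynchronous replication.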

  11. Data Warehousing and Replication • A hot trend: Building giant “warehouses” of data from many sites. • Enables complex decision support queries over data from across an organization. • Warehouses can be seen as an instance of asynchronous replication. • Source data typically controlled by different DBMSs; emphasis on “cleaning” data and removing mismatches ($ vs. rupees) while creating replicas. • Procedural Capture with application-driven Apply works best for this environment.

  12. Summary • Distributed Databases improve availability and parallelism • Issues with query processing and updates • Useful for Data warehousing
