Distributed Database research group

In the name of God • Distributed Database research group • Instructor: Dr. M. Rahgouzar • Samira Tasharofi • Reza Basseda

Outline • Introduction • Distributed Data Storage • Distributed Transaction • Commit Protocols • Concurrency Control in Distributed Database • Availability • Distributed Query Processing • Heterogeneous Distributed Databases • Directory Systems • Conclusion • Acknowledgement

Introduction • Distributed computing • Consists of a number of processing elements (not necessarily homogeneous) that are interconnected by a computer network and that cooperate in performing their assigned tasks • Distributed Database • Database whose relations reside on different sites • Database some of whose relations are replicated at different sites • Database whose relations are split between different sites

Introduction • Distributed database management system • Software system that permits the management of a distributed database • Advantages • Local autonomy • Improved performance (by proper fragmentation) • Improved reliability/availability (by replication) • Greater expandability • Greater shareability

Introduction (Cont.) • Disadvantages • Higher complexity • Higher software and hardware cost • Synchronization and coordination among the sites • Higher maintenance overhead in case of replication • Greater security problems

Distributed Data Storage • On Dynamic Fragmentation of Distributed Databases Using Partial Replication – D. Pinto, G. Torres, 2002 • Fragmentation • Three kinds of fragmentation: vertical, horizontal, and mixed • Solution • Horizontal fragmentation with partial replication • RBy2(bound) Algorithm • 1. For each query requested, a slave computer increments a counter (ctr) for the user who made the request. • 2. If ctr reaches the bound (a parameter of the algorithm), this computer is a candidate to have a set of records replicated and follows steps 3 and 4; otherwise it goes to step 5. • 3. Request the set of records that the user is asking for and save them in the slave database. • 4. Reset the user's local counter to zero. • 5. End. • The local database is accessed before the query is sent to the master computer
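The numbered steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `Slave` class, its dictionary-backed "databases", and the use of a single key to stand in for "a set of records" are all hypothetical.

```python
from collections import defaultdict


class Slave:
    """Hypothetical slave node following the counter-based replication steps."""

    def __init__(self, master, bound):
        self.master = master         # dict standing in for the master database
        self.bound = bound           # the algorithm's `bound` parameter
        self.local = {}              # records replicated on this slave
        self.ctr = defaultdict(int)  # per-user request counters

    def query(self, user, key):
        self.ctr[user] += 1                     # step 1: count the request
        if self.ctr[user] >= self.bound:        # step 2: bound reached?
            self.local[key] = self.master[key]  # step 3: replicate the record
            self.ctr[user] = 0                  # step 4: reset the counter
        # Step 5 (end): answer from the local database when possible,
        # and only otherwise from the master.
        if key in self.local:
            return self.local[key]
        return self.master[key]
```

With `bound=2`, the second request by the same user replicates the record, so a later request for it can be answered locally even if the master is unreachable.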

Distributed Data Storage (Cont.) • The algorithm is based on two techniques • Slave-master search, to provide fast access to the database for user queries • Slave-master replication (two computers), to provide availability • Keeps the database available even when the connection between the slave and master databases is broken • Tested on a local network • Future Work • Check how the algorithm behaves over a dial-up connection

Distributed Data Storage (Cont.) • Transparent Data Relocation in Highly Available Distributed Systems, S. Voulgaris, M. van Steen, A. Baggio, and G. Ballintijn, 2002 • Management issue of distributed services: redistributing non-replicated data among the servers that make up a distributed service • Goal: redistribute the data without disrupting the service’s availability • Basis of the solution • Ship the data records that need to be relocated to their new hosting server • Update the servers’ mapping information to reflect the new configuration of the distributed service

Distributed Data Storage (Cont.) • Solution for a Single Redistribution • Initialization • Distribute the new mapping M’ • Record Relocation • Termination • Replace M with M’ • Solution for Overlapping Redistributions • Per-server Sequential Redistribution • Redistribution R2 is applied only after R1 has completed • Per-server Mixed but Ordered Redistributions • The server ships each record as soon as possible, based on the virtual mapping with first preference • Direct Shipping to Final Destination
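The three phases of a single redistribution can be illustrated with a small Python sketch. The function names, the dictionary representation of servers and mappings, and the lookup rule (try the host under M’ first, then the host under M) are assumptions made for illustration; the slides do not specify them.

```python
def lookup(servers, mapping, new_mapping, key):
    """Find a record while a redistribution may be in progress (assumed rule)."""
    for host in (new_mapping.get(key), mapping.get(key)):
        if host is not None and key in servers[host]:
            return servers[host][key]
    raise KeyError(key)


def redistribute(servers, mapping, new_mapping):
    """Single redistribution: M is `mapping`, M' is `new_mapping`."""
    # Initialization: in this sketch, "distributing M'" is just passing it in.
    # Record relocation: ship every record whose hosting server changes.
    for key, old_host in list(mapping.items()):
        new_host = new_mapping[key]
        if new_host != old_host:
            servers[new_host][key] = servers[old_host].pop(key)
    # Termination: M' replaces M.
    mapping.clear()
    mapping.update(new_mapping)
```

Because `lookup` consults both mappings, a record stays reachable before, during, and after relocation, which is the availability property the slide emphasizes.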

Distributed Data Storage (Cont.) • Advantages • Low delays in servicing client requests during a configuration change • Adds no significant processing load to the servers involved and terminates in a timely fashion • Conceptual simplicity • Sequential and concurrent versions

Transaction Management • Ensuring Relaxed Atomicity for Flexible Transactions in Multidatabase Systems, A. Zhang, M. Nodine, B. Bhargava, O. Bukhres, ACM • Global transaction: a set of subtransactions, where each subtransaction is a transaction accessing the data items at a single local site • Flexible global transaction: specifies • Execution ordering dependencies between two subtransactions • Alternative dependencies between two subsets of subtransactions • Eliminates the prepare-to-commit stage • Subtransaction types • Retriable • Compensable • Pivot • A global transaction can contain at most one pivot subtransaction

Transaction Management (Cont.) • Semi-atomicity in flexible transactions: weaker than the atomicity of global transactions • A flexible transaction is semi-atomic if either • all its subtransactions in one ≺-rpo commit, and all attempted subtransactions not in the committed ≺-rpo are either aborted or have their effects undone, or • no partial effects of its subtransactions remain permanent in the local databases • Ensuring semi-atomicity • Using retry and compensation techniques • Advantages of flexible transactions • Extends the scope of global transaction management beyond that offered by the traditional global transaction model • The blocking caused by two-phase commit can be prevented
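How retry and compensation can work together is easier to see in code. The following Python sketch is a simplification under assumptions that are not in the slides: subtransactions run sequentially in the order compensable, then pivot, then retriable; alternative subsets are not modeled; and `run_flexible` and its tuple format are hypothetical names.

```python
def run_flexible(subtransactions, max_retries=5):
    """Run (kind, action, compensate) triples; return True if all commit.

    kind is "compensable", "pivot", or "retriable"; action() returns True
    on commit and False on abort; compensate() undoes a committed action.
    """
    kinds = [kind for kind, _, _ in subtransactions]
    assert kinds.count("pivot") <= 1, "at most one pivot subtransaction"
    undo_stack = []  # compensating actions for committed subtransactions
    for kind, action, compensate in subtransactions:
        if kind == "retriable":
            # Retry: a retriable subtransaction is assumed to commit eventually.
            if not any(action() for _ in range(max_retries)):
                raise RuntimeError("retriable subtransaction did not commit")
        elif action():
            if kind == "compensable":
                undo_stack.append(compensate)
        else:
            # Compensation: undo committed work so no partial effects remain.
            for undo in reversed(undo_stack):
                undo()
            return False
    return True
```

If the pivot aborts, every committed compensable subtransaction is undone in reverse order; once the pivot has committed, only retriable subtransactions remain, so the transaction can always be driven to completion.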

Transaction Management (Cont.) • Global Scheduling for Flexible Transactions in Heterogeneous Distributed Database Systems, A. Zhang, M. Nodine, B. Bhargava • Global serializability • A global schedule S is globally serializable if the projection of committed local, flexible, and surplus transactions in S is conflict equivalent to some serial execution of these transactions • Compensation-interference free • For any subtransaction $t_j$ that is serialized between a subtransaction $t_i$ and its compensating transaction $ct_i$ in S, $WC(t_i) \cap AC(t_j) = \emptyset$
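The compensation-interference-free condition is a set-disjointness test, which the following Python sketch makes concrete. It treats WC(ti) and each AC(tj) as plain sets of data item names; how those sets are derived from a schedule is left to the paper, and the function name is hypothetical.

```python
def compensation_interference_free(wc_ti, ac_between):
    """Check WC(ti) ∩ AC(tj) = ∅ for every tj serialized between ti and cti.

    wc_ti      -- set of data items standing in for WC(ti)
    ac_between -- iterable of sets, one AC(tj) per intervening subtransaction
    """
    return all(wc_ti.isdisjoint(ac_tj) for ac_tj in ac_between)
```

For example, `compensation_interference_free({"x"}, [{"y"}, {"x", "z"}])` is false, because the second intervening subtransaction touches `x`.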

Transaction Management (Cont.) • F-serializability • Prevents flexible transactions that are serialized between a flexible transaction and its compensating subtransactions from affecting any data items that have been updated by that flexible transaction (globally serializable and compensation-interference free) • Avoids unnecessary aborts and compensations • Scheduling Protocols • Stored Subtransaction Execution Graph (SSEG) • Avoids cascading aborts

Commit Protocols • A Two-Phase Commit Protocol for Mobile Wireless Environment, N. Nouali, A. Doucet, H. Drias, 2005 • If traditional 2PC is executed in a mobile environment, disconnections increase the number of (possibly wrong) abort decisions, because a fixed host (FH) that tries to communicate with a disconnected mobile host (MH) treats the attempt as a failure • Disconnections are not exceptions but part of the normal mode of operation, so they should not be treated as failures

Commit Protocols (Cont.) • The case of a mobile client and fixed servers, with a fixed coordinator • To guard against unforeseeable breakdowns, the client must force-write the identity and location information of the coordinator (commit-BS) just before sending the commit request • Only one force-write is needed to record the coordinator information during the entire execution of the atomic commitment protocol (ACP)

Commit Protocols (Cont.) • The case of a mobile client and mobile servers • The participant-agent is responsible for transmitting the result to the participant at reconnection time, for keeping logs, and for recovering in case of failure • When a participant MH (mobile participant) registers with a new BS, it informs its participant-agent of its new location • The workload is shifted to the fixed part of the network, which preserves processing power and communication resources and minimizes traffic cost over the wireless links

Commit Protocols (Cont.) • Reducing the Latency of Non-Blocking Commitment using Optimism and Replication, R. J. Peris, M. P. Martinez, G. Alonso, S. Arevalo, 2001 • Two-phase commit (2PC): blocking • Three-phase commit (3PC) • Sends too many messages • Low performance • Flushing log records adds to the overall latency, because messages cannot be sent or answered before the log is written • The delay is reduced by allowing sites to send messages instead of flushing log records

Commit Protocols (Cont.) • Solution • Use the main memory of a replicated group as stable memory instead of a mirrored log with careful writes • Two groups are involved • The group of participating transaction managers (the TM group) • A replicated group that provides the commit service and acts as coordinator (the CS group) • A participant does not wait to flush its log; instead, it uniformly multicasts its vote together with its log entry

Commit Protocols (Cont.) • If the message corresponds to the last vote and all votes were yes, the transaction is optimistically committed, and this fact is communicated to the TM group • The optimistic commit changes the locks held by the transaction to opt-mode • A transaction that holds opt-mode locks is not allowed to commit until the transaction that released them definitively commits
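The vote-handling rule for a member of the CS group can be sketched as follows. This is a minimal single-process Python illustration, not the protocol itself: group communication, uniform multicast, and opt-mode locking are not modeled, the `CommitServer` name and the string outcomes are hypothetical, and aborting as soon as any "no" vote has been seen is an assumption taken from ordinary atomic commitment.

```python
class CommitServer:
    """Hypothetical CS-group member that collects votes for one transaction."""

    def __init__(self, participants):
        self.participants = set(participants)
        self.votes = {}  # participant -> "yes" or "no"
        self.log = {}    # participant -> log entry kept in (replicated) memory

    def on_vote(self, participant, vote, log_entry):
        # The vote arrives together with the participant's log entry,
        # so the participant did not have to flush its log first.
        self.votes[participant] = vote
        self.log[participant] = log_entry
        if "no" in self.votes.values():
            return "abort"
        if set(self.votes) == self.participants:
            # Last vote received and all votes were yes.
            return "optimistic-commit"
        return "pending"
```

The outcome becomes `"optimistic-commit"` exactly when the last vote arrives and every vote was yes, matching the rule stated on the slide.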

Commit Protocols (Cont.) • The message is optimistically delivered right away, without waiting for the message to stabilize (i.e., without waiting for it to be received by all members of the group)

Commit Protocols (Cont.) • The PROMPT Real-Time Commit Protocol, J. R. Haritsa, K. Ramamritham, R. Gupta, IEEE, 1999 • Distributed commit processing can have considerably more effect than distributed data processing on real-time performance • PROMPT • New commit protocol for real-time distributed transactions • Borrowers must not continue to execute while their associated lenders have not yet received their decisions; this is handled by an additional bit and message that inform the master about a cohort's borrowing state and the completion of its borrowing

Commit Protocols (Cont.) • Features • Controlled optimistic access to uncommitted data • Reduces data inaccessibility and priority inversion • Active abort • Cohorts inform the master as soon as they decide to abort locally • Silent kill • Aborts due to deadline misses that occur before the master has initiated the commit protocol are implemented silently • Healthy lending • Health factor (HF): derived from the transaction's deadline • HF is used to decide whether a transaction may lend its data (best choice)
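A health-factor test for lending might look like the Python sketch below. The slide only says that the health factor depends on the transaction's deadline; the formula used here (time left until the deadline divided by the minimum time still needed for commit processing), the threshold `min_hf`, and both function names are assumptions for illustration.

```python
def health_factor(now, deadline, min_commit_time):
    """Assumed definition: time left relative to the time commit still needs."""
    return (deadline - now) / min_commit_time


def may_lend(now, deadline, min_commit_time, min_hf=1.0):
    # A lender that is unlikely to meet its deadline does not lend its data,
    # so borrowers are not dragged into its abort.
    return health_factor(now, deadline, min_commit_time) >= min_hf
```

With a deadline at time 10 and 5 time units of commit work remaining, a transaction may lend at time 0 but not at time 8 under the assumed threshold.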

Commit Protocols (Cont.) • One-Phase Real-Time Commit Protocols, P. Saha, 1999 • Compares one-phase protocols (e.g., EP) with PROMPT, the best-performing two-phase real-time commit protocol • For parallel distributed transactions, EP outperforms PROMPT • For sequential distributed transactions, EP performs rather poorly • Under high workloads, EP performs better • Future Work • Addressing security considerations in Multi-Level Secure (MLS) distributed RTDBS • Combining EP and PROMPT

Concurrency Control in Distributed Databases • Distributed Concurrency Control Performance: A Study of Algorithms, Distribution, and Replication • Michael J. Carey & Miron Livny • The paper presents distributed concurrency control algorithms and evaluates their performance under several conditions. It first describes classic concurrency control algorithms such as 2PL, wound-wait, and basic timestamp ordering, and discusses the structure of a distributed concurrency control algorithm. It then proposes a basic model of a DDB, runs experiments with those algorithms, and evaluates the model.
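As a reminder of how one of the classic algorithms named above resolves lock conflicts, here is the wound-wait rule as a small Python function. The function name and the returned strings are illustrative; timestamps are assumed to be start times, so a smaller timestamp means an older transaction.

```python
def wound_wait(requester_ts, holder_ts):
    """Decide what happens when `requester` wants a lock that `holder` owns."""
    if requester_ts < holder_ts:
        # An older requester "wounds" (aborts) the younger lock holder.
        return "wound holder"
    # A younger requester waits for the older holder to finish.
    return "wait"
```

Because a transaction only ever waits for an older one, no cycle of waiting transactions can form, which is why the scheme prevents deadlock.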

Concurrency Control in Distributed Databases (Cont.) • Concurrency Control in Distributed Database Systems • PHILIP A. BERNSTEIN & NATHAN GOODMAN • The paper gives methodological proofs for distributed concurrency control algorithms and evaluates them theoretically. It defines serializability in a DDB and a formal language for formulating DDB transactions. It then expresses each classic algorithm in this formal language and proves its correctness and completeness.

Concurrency Control in Distributed Databases (Cont.) • Concurrency Control in Distributed Object-Oriented Database Systems • Kjetil Nørvåg, Olav Sandstå, and Kjell Bratbergsengen • The simulation results in this paper compare the performance and response times of two concurrency control algorithms, timestamp ordering and two-phase locking. The simulations were run with different numbers of nodes, network types, data declustering, and workloads. The results show that for a mix of short and long transactions, throughput is significantly higher for a system with a timestamp ordering scheduler than for a system with a two-phase locking scheduler. • Implementing Atomic Actions on Decentralized Data • DAVID P. REED • A general survey of concurrency control in DDBs that describes the classic methods.

Concurrency Control in Distributed Databases (Cont.) • Dynamic Voting Algorithms for Maintaining the Consistency of a Replicated Database • SUSHIL JAJODIA & DAVID MUTCHLER • The best known pessimistic algorithm, voting, is a “static” algorithm, meaning that all potential distinguished partitions can be listed in advance. It presents a dynamic extension of voting called dynamic voting. This algorithm permits updates in a partition provided it contains more than half of the up-to-date copies of the replicated file. It also presents an extension of dynamic voting called dynamic voting with linearly ordered copies (abbreviated as dynamic-linear). These algorithms are dynamic because the order in which past distinguished partitions were created plays a role in the selection of the next distinguished partition.
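The update rule described above can be sketched in Python. This is a simplified illustration rather than the paper's full algorithm: each copy is represented by a hypothetical pair `(version, cardinality)`, where `cardinality` is assumed to record how many copies took part in the most recent update that the copy saw, and the tie-breaking of the dynamic-linear variant is not modeled.

```python
def can_update(partition):
    """partition: list of (version, cardinality) pairs for reachable copies."""
    latest = max(version for version, _ in partition)
    # Cardinalities recorded by the up-to-date copies in this partition.
    current = [card for version, card in partition if version == latest]
    # Updates are allowed only with more than half of the up-to-date copies.
    return len(current) > current[0] / 2


def apply_update(partition):
    """Return the new state of every copy in the partition after an update."""
    latest = max(version for version, _ in partition)
    return [(latest + 1, len(partition))] * len(partition)
```

Starting from five copies at `(1, 5)`, a partition of three may update and moves to `(2, 3)`; afterwards two of those three still form a majority of the up-to-date copies, although static voting over all five copies would refuse them.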

Concurrency Control in Distributed Databases (Cont.) • Deadlock Detection in Distributed Databases • EDGAR KNAPP • This paper is concerned only with deadlock detection. Recent developments in distributed deadlock detection algorithms are surveyed, with special emphasis on their relation to distributed DBSs. The paper introduces a uniform framework for discussing these algorithms. The abstraction achieved this way allows the algorithms to be discussed in terms of the underlying theoretical concepts, instead of just giving a phenomenological description of how they work.

Concurrency Control in Distributed Databases (Cont.) • MODELS OF A VERY LARGE DISTRIBUTED DATABASE • Mark Blakey

Concurrency Control in Distributed Databases (Cont.) • Performance Study of a Centralized Concurrency Control Algorithm for Distributed Database Systems using SIMULA • K. Viswanathan Iyer & L. M. Patnaik • One objective of this paper is to elaborate the simulation methodology using SIMULA. Detailed studies have been carried out on a centralized CC algorithm and its modified version. The results compare well with a previously reported study on these algorithms. Here, additional results concerning the update intensiveness of transactions and the degree of conflict are obtained. The degree of conflict is quantitatively measured and is seen to be a useful performance index. The paper appears to aim at quantifying the effectiveness of concurrency control algorithms, focusing on the behavior of a class of performance indices.

Availability • Maintaining Availability in Partitioned Replicated Databases • AMR EL ABBADI & SAM TOUEG • It describes a new replica control protocol that allows the accessing of data in spite of site failures and network partitioning. It claims that this protocol provides the database designer with a large degree of flexibility in deciding the degree of data availability, as well as the cost of accessing data. • Providing High Availability Using Lazy Replication • RIVKA LADIN & BARBARA LISKOV & SANJAY GHEMAWAT • This paper describes a new technique that supports causal order. An operation call is executed at just one replica; updating of other replicas happens by lazy exchange of “gossip” messages—hence the name “lazy replication.” The replicated service continues to provide service in spite of node failures and network partitions.

Distributed Query Processing • Query Brokers for Distributed and Flexible Query Evaluation • Tuyet-Trinh Vu & Christine Collet • This paper presents an approach to designing the query processor of a DDB using hierarchical mediators and query brokers, which translate a global query in the DDB context into local queries • Query Decomposition, Optimization and Processing in Multidatabase Systems • Cem Evrendilek & Asuman Dogac • This paper proposes an approach to decomposing queries in an optimized manner. The cost of processing each subquery at every site is calculated dynamically, and these costs are used to find the minimum-cost plan.

Distributed Query Processing (Cont.) • Dynamically Distributed Query Evaluation • Trevor Jim & Dan Suciu • This paper presents an approach to evaluating queries dynamically over the web and a directory system. It provides a language for expressing information requirements over a multidatabase system and uses this language to define a DDB and its queries formally. It then proposes an algorithm for dynamic query evaluation in a DDB and uses this formalism to prove the algorithm's correctness.

Distributed Query Processing (Cont.) • Database Connectivity Using an Agent-Based Mediator System • Larry M. Stephens & Michael N. Huhns • This paper presents an agent-based approach to managing a DDB. It uses agents as proactive components that hold a knowledge base (KB) about the system and react to topology changes in order to manage query processing, concurrency control, … It uses KQML and a specific coordination strategy for this system.

Distributed Query Processing (Cont.) • Optimizing Equijoin Queries In Distributed Databases Where Relations Are Hash Partitioned • DENNIS SHASHA & TSONG-LI WANG • It studies the optimization problem that arises when the query processor must repartition the relations and intermediate results participating in a multijoin query. Using estimates of the sizes of intermediate relations, it shows (1) optimum solutions for closed chain queries; (2) the NP-completeness of the optimization problem for star, tree, and general graph queries; and (3) effective heuristics for these hard cases.
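To see why repartitioning matters for equijoins, consider the Python sketch below. It is not the paper's optimization algorithm; it only illustrates the underlying mechanism: once both relations are hash partitioned on the join attribute, matching tuples land on the same site and each site can join locally. The function names and the list-of-dicts representation of relations are assumptions.

```python
def hash_partition(rows, attr, n_sites):
    """Assign each row to a site by hashing its value of `attr`."""
    sites = [[] for _ in range(n_sites)]
    for row in rows:
        sites[hash(row[attr]) % n_sites].append(row)
    return sites


def local_equijoin(r_part, s_part, attr):
    """Hash join of two co-located fragments on `attr`."""
    index = {}
    for s in s_part:
        index.setdefault(s[attr], []).append(s)
    return [{**r, **s} for r in r_part for s in index.get(r[attr], [])]


def distributed_equijoin(r, s, attr, n_sites):
    # (Re)partition both relations on the join attribute, then join per site.
    r_sites = hash_partition(r, attr, n_sites)
    s_sites = hash_partition(s, attr, n_sites)
    result = []
    for r_part, s_part in zip(r_sites, s_sites):
        result.extend(local_equijoin(r_part, s_part, attr))
    return result
```

A relation that is already partitioned on the join attribute needs no data movement, which is why the choice of which relations and intermediate results to repartition drives the cost of a multijoin query.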

Conclusion • DDB is a mature topic, and many models have been proposed for its problem areas, such as concurrency control, availability, … • Various models are used to prove the correctness of its approaches to concurrency control, … • It faces many classic DB problems from a new viewpoint because of distribution • Legacy system and wrapper design to …

Conclusion (Cont.) • High availability through increased replication vs. performance of transaction management and concurrency control: a trade-off • Open problems in DDB concurrency control • Improving recent approaches to increase availability without sacrificing performance • Improving distributed query evaluation over large distributions

Conclusion (Cont.) • Tuple routing over a distribution and data distribution with high performance and performance evaluation factors in DDB • Using autonomous components to manage a DDB and using AI in peer-to-peer DDB

Acknowledgement

References • Silberschatz et al., “Database System Concepts”, 4th edition, McGraw-Hill, 2002
