1 / 17

Replicated Distributed Programs

Replicated Distributed Programs. Eric C. Cooper University of California Berkeley, 1985 4. 21. 2004 Presented by Roh, HyunGul hgroh@camars.kaist.ac.kr. Contents. Introduction Overview A Model of Replicated Distributed Programs Replicated Procedure Calls Performance analysis

Télécharger la présentation

Replicated Distributed Programs

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Replicated Distributed Programs Eric C. Cooper University of California Berkeley, 1985 4. 21. 2004 Presented by Roh, HyunGul hgroh@camars.kaist.ac.kr

  2. Contents • Introduction • Overview • A Model of Replicated Distributed Programs • Replicated Procedure Calls • Performance analysis • Synchronization • Transaction • Binding • Summary 2

  3. Introduction • Want highly available distributed programs • Despite failures of some of its component • Fault-tolerant • Nonstop • Replicated Distributed Programs • Replication ; von Neumann(1956) • How in this paper? • Replication on per-module basis; • Flexible & not burdening the programmer • Provide transparency to programmer • Fundamental mechanism • Troupes, or replicated modules • Replicated procedure call, many-to-many 3

  4. Replicated DistributedPrograms Active agent as single thread State information Troupe module procedures Overview - A model of Replicated Distributed Program - How can replications be added? How can controls be transferred? Which protocols are used? • What are problems issued? • Synchronization problem • Transaction • Binding distributed & replicated Programs • Reconfiguration & Recovery from partial failure 4

  5. Replicated DistributedPrograms Active agent as single thread State information Troupe module procedures Distributed Modules & Treads • Modules • Packaging state information & procedures • Separating the interface to that abstraction from its implementation • Threads • “A thread of control is an abstraction intended to capture the notion of an active agent in a computation” • Particular thread runs in exactly on module at a given time • Multiple threads in same module • Moving among modules • Implementation • Provide location transparency • Module; implemented by server • Thread; implemented by using RPC to transfer control from server to server 5

  6. Adding Replication to Distributed Programs • Partial failures of the distributed program • Masking failures is replication • Replication transparency (RT) • Terminology • Troupes; replicated module • Troupe members; the replicas • Assumptions • Troupe members; execute on fail-stop processor • if not => Byzantine agreement • How is replication transparency in troupe model guaranteed? • Deterministic troupe: a set of replicas of a deterministic modules • (input → unique output) • Troupe consistent (TC) ; When all its members are in the same state • In the absence of application-specific knowledge, TC ⇔ RT • All troupes is deterministic ⇒ guarantee RT troupe Replicated Procedure Calls Troupe member Troupe member Troupe member 6

  7. client server Call P P: proc Call P P: proc Call P P: proc Replicated Procedure Calls(1/3) • RPC (remote procedure call) : distributed programs can be written as local programs • Replicated Procedure Calls: When modules are replaced by troupes, natural generalization of RPC • Server & Client? • Server troupe: have a procedure module • Client troupe: caller • How? The Circus Paired Message Protocol • Previous works by author • Characteristic • Paired msgs (e.g. call and return) • Reliably delivered • Variable length • Call sequence numbers • Based on the RPC • Use UDP, the DARPA User Datagram Protocol • Connectionless but retransmission 7

  8. server client P: proc Call P P: proc Call P Call P P: proc P: proc Call P [Many-to-one call] Replicated Procedure Calls(2/3) • RPPCs are implemented on the top of the paired message layer • One-to-Many calls • Each client troupe member to the entire server troupe • The client will normally wait for all return msgs from the server troupe • Many-to-One calls • Each server troupe member handles from the entire client troupe • Two problems: • Distinguish unrelated call • How many other call msgs to expect as part of the same replicated call • (Author’s previous work ) [One-to-many call] 8

  9. client server Call P P: proc Call P P: proc Call P P: proc Replicated Procedure Calls(3/3) • Many-to-Many calls • Client: call msgs to entire server troupes • Server: return msgs to entire client troupes • Waiting for Message to arrive • Since troupes are assumed to be deterministic, all msgs will be identical • When should computation proceed? • only after the entire set has arrived • First come • only after the entire set has arrived • error detection, error correction • Expensive execution time • First come • Determined by fastest member of each troupe • Send return to un-received members as soon as fastest member call got • Crashes and Partitions • crash detection: Probing & timeout • Network partition: Which member receive Majority of the expected set of message? • Collators • A function that maps a set of msgs into a single result 9

  10. Performance Analysis • Measuring the cost of replicated procedure calls as a function of the degree of replication • Six VAX-11/750 by a single 10 Mb/s Ethernet 10

  11. The Synchronization Problem for troupes • Multiple threads of control • If they want same resource? • Serializability can be achieved by any of a number of concurrency control algorithm • When Server module is a troupes; • Serialized by each server troupe member • Serialized by the same order 11

  12. Replicated Transactions • The Transaction mechanism • Guaranteeing serializability & atomicity • Conventional transaction • The permanence of committed updates • Crash recovery algorithm • Correctness condition for conventional transactions • serializability • Troupe consistency must also be preserved • Existing concurrency control algorithm for replicated DB • Require communication among replicas • Can’t be used in troupe model • One well-known multiple-copy concurrency control algorithm • two-phase locking with unanimous update 12

  13. A Troupe Commit Protocol • Optimistic • “concurrent transaction are unlikely to conflict” • Detect un-serialized transaction • transform such un-serialized transaction into a deadlock • Essential property • Two troupe members succeed in committing two transaction iff both troupe members attempt to commit the transactions in the same order ready_to_commit Client troupe C Server troupe S Client troupe C’ S1 (T,T’) (T, T’) T’ T S2 (T,T’) (T,T’) (1) (2) Commit or not Commit or not 13

  14. Binding Agents for Distributed Programs • “A binding agent is a mechanism the enables programs to import and export modules by interface name” • [lookups, registration, deletion] can be provided by a general purpose name server • Clients cache the result of lookups • The classic cache invalidation problem • Garbage collection: obsolete registration information 14

  15. Binding Agents for Replicated Programs • Import and export troupes rather than single modules • Binding agent must manipulate sets of module addresses rather than single addresses • More complicated cache invalidation problem • Troup ID as a form of incarnation number 15

  16. Reconfiguration & Recovery from Partial Failure • Detect crash by timeout • Replace crashed troupe • New troupe member (add_troupe_member) • State consistent with that of the other members • Be registered with the binding agent • “get_state” procedure 16

  17. Summary • Details are invisible • Replication transparency • Transfer of control; replicated procedure calls • Circus paired protocol • Many-to-many • Serializability for concurrency control • Binding & reconfiguration 17

More Related