Design of a New WAN-oriented Group Communication Service. Formal Approach. Roger Khazan, MIT LCS, April 1, 2002
Distributed Environment • Processes communicate by sending messages • Asynchrony • can’t distinguish failed and slow processes • Failures and Recoveries • of processes and communication • Dynamic changes • sets of participating processes • network connectivity and topology • other (e.g., mobility, etc.)
Modern Distributed Applications • Highly available servers • Web; Video-on-Demand • Collaborative computing • Military command and control • Shared white-board, shared editor, etc. • Online strategy games • Distributed sensors and monitoring
Group Communication (GC) Useful “Building Block” • Group Abstraction • client processes interact in a group • dynamic: join/leave/fail/partition/merge • Group Multicast • various reliability and ordering properties • Group Membership • informs each process of current membership • generates “views”
GC Abstraction • Client to Group Communication: Send(Grp, Msg), Join/Leave(Grp) • Group Communication to client: Deliver(Msg), View(Grp, Members, Id)
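The four interface events on this slide (Send, Join/Leave, Deliver, View) can be sketched as a toy in-memory service. This is only an illustration: `LocalGroupBus`, the callback names, and the view numbering are hypothetical, everything runs in one process, and no failures, partitions, or ordering guarantees are modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class View:
    group: str
    members: FrozenSet[str]
    view_id: int


class LocalGroupBus:
    """Toy stand-in for a GC service (hypothetical; single process, no failures)."""

    def __init__(self) -> None:
        # group -> {process name: (on_deliver callback, on_view callback)}
        self._members: Dict[str, Dict[str, Tuple[Callable, Callable]]] = {}
        self._last_view_id: Dict[str, int] = {}

    def join(self, grp: str, proc: str, on_deliver: Callable, on_view: Callable) -> None:
        self._members.setdefault(grp, {})[proc] = (on_deliver, on_view)
        self._install_view(grp)

    def leave(self, grp: str, proc: str) -> None:
        self._members.setdefault(grp, {}).pop(proc, None)
        self._install_view(grp)

    def send(self, grp: str, msg: object) -> None:
        # Deliver(Msg) at every current member, including the sender.
        for on_deliver, _ in list(self._members.get(grp, {}).values()):
            on_deliver(msg)

    def _install_view(self, grp: str) -> None:
        # View(Grp, Members, Id): a new id for every membership change.
        view_id = self._last_view_id.get(grp, 0) + 1
        self._last_view_id[grp] = view_id
        view = View(grp, frozenset(self._members[grp]), view_id)
        for _, on_view in list(self._members[grp].values()):
            on_view(view)
```

A client would register its two callbacks on `join` and then call `send`; the service calls back with delivered messages and with each new view.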
Virtual Synchrony [Birman, Joseph 87] • Integration of Multicast and Membership • Synchronization of Messages and Views • Includes many different properties • One key property: Processes that go together through same views, deliver same sets of messages. • Powerful abstraction for state-machine replication
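The key property above can be checked on recorded executions. The sketch below assumes a hypothetical trace format (a list of `("view", id)` and `("deliver", msg)` events per process) and tests only this one property: processes that move together from one view to the same next view must have delivered the same set of messages in the earlier view.

```python
def deliveries_by_transition(trace):
    """Map (view, next_view) -> set of messages delivered while in `view`."""
    result = {}
    current, delivered = None, set()
    for kind, value in trace:
        if kind == "view":
            if current is not None:
                result[(current, value)] = frozenset(delivered)
            current, delivered = value, set()
        elif kind == "deliver":
            delivered.add(value)
    return result


def satisfies_virtual_synchrony(traces):
    """traces: process name -> list of events. Checks only the key property."""
    seen = {}
    for trace in traces.values():
        for transition, msgs in deliveries_by_transition(trace).items():
            # The first process to make a transition fixes the expected set.
            if seen.setdefault(transition, msgs) != msgs:
                return False
    return True
```

A process that moves to a different next view (for example, after a partition) is not constrained by this check, which matches the "go together" wording of the property.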
Talk Roadmap • Introduction • Group Communication in WANs • Sound Theoretical Foundation
WAN: The Challenge • High and unpredictable message latency • Frequent & often transient network changes • Existing GC systems designed for LANs: • once started, unable to respond to changes • protocols take several communication rounds • Performance issues • Number of communication rounds matters • View changes are costly • Scalability issues • Large groups
Our Contributions [KK ICDCS00, Submitted to SICOMP] Design of a GC system for WANs: • WAN-oriented Architecture • New Efficient Algorithm Foundation for the Xpand GC system for WANs implemented at MIT and HUJI
Recall: GC Abstraction • Client to Group Communication: Send(Grp, Msg), Join/Leave(Grp) • Group Communication to client: Deliver(Msg), View(Grp, Members, Id) Virtual Synchrony = Membership + Communication
Scalable Architecture • Decouple Membership and Communication • Small group of membership servers manages membership for a large group of clients [Anker et al. ’98, Keidar et al. ’00]
Virtual Synchrony = Membership + Communication However, existing designs... • Communication and Membership algorithms are intertwined • Designs that separate VS and Membership are still tightly coupled • VS controls membership • Communication both ways • One waits for another • Challenge: Effective Decoupling [Figure: Virtual Synchrony, MCAST, MBRSHP; labels: send, deliver, view, block]
We succeeded in effective decoupling of VS and Membership • Membership service is free to reconfigure • VS reacts dynamically • VS executes in parallel with view formation • One-way communication • Low message overhead • Critical for scalability & performance • Adopted by the Moshe system [KSMD00, MIT+UCSD] [Figure: Virtual Synchrony, MCAST, MBRSHP; labels: start, view]
Talk Roadmap • Introduction • Group Communication in WANs • WAN-oriented Architecture • Efficient Algorithm for VS • Sound Theoretical Foundation
Recall: Virtual Synchrony • Integration of Multicast and Membership • Synchronization of Messages and Views • Key property: Processes that go together through same views, deliver same sets of messages.
Example: Virtual Synchrony VS algorithm executes: r learns it missed m and delivers m
Virtual Synchrony: How To? • Before moving into new view: • Exchange synch messages to agree which application messages to deliver in old view • Need to know which synch messages to use, since there may be several view proposals
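One way to picture the agreement step is as a "cut" computation. The sketch below is an assumption for illustration, not the algorithm from the talk: each synch message is taken to report, per sender, how many of that sender's messages the reporting process commits to deliver in the old view, and the agreed cut is the per-sender maximum over the synch messages in use. The names `agreed_cut` and `missing` are hypothetical.

```python
def agreed_cut(sync_cuts):
    """sync_cuts: process -> {sender: count of sender's msgs it commits to deliver}.

    Returns the per-sender maximum, i.e. what everyone moving together must deliver.
    """
    cut = {}
    for reported in sync_cuts.values():
        for sender, count in reported.items():
            cut[sender] = max(cut.get(sender, 0), count)
    return cut


def missing(local_counts, cut):
    """Sequence numbers (1-based) a process still has to recover and deliver."""
    gaps = {}
    for sender, needed in cut.items():
        have = local_counts.get(sender, 0)
        if have < needed:
            gaps[sender] = list(range(have + 1, needed + 1))
    return gaps
```

For example, if p and q have delivered two messages from p while r has delivered only one, r learns from the cut that it missed p's second message and must deliver it before installing the new view.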
Existing Solutions • Limit Reconfiguration • Do not allow joins during reconfiguration • When someone wants to join: • first, deliver view without joiner; • then, start new reconfiguration. • Use common id to identify synch msgs for same view proposal
Problems with Existing Solutions • Limited Reconfiguration • Obsolete views delivered to application • Creates overhead • Limits usefulness of virtual synchrony • Use of common id to identify synch msgs • Pre-agreement or dissemination is required • Costly, especially in WANs Clients care how quickly a new, stable view is delivered after mbrshp event (ME) occurs
Our Goals • Quick response to changes in connectivity • no obsolete views • do not postpone view formation • Quick synchronization and view delivery • more efficient algorithm for Virtual Synchrony • decouple VS from the membership protocol • execute two protocols in parallel
Our Idea • Issue locally unique id to each process, when starting to form new views • Tag synch msgs with these local ids • View includes vector of latest local ids • View is a triple: e.g., < 4, {p, q, r}, [8, 9, 3] > • Procs use sync msgs identified by view • Hence, procs use right sync msgs
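The view triple and the selection of synch messages can be sketched as follows. The slide's example view < 4, {p, q, r}, [8, 9, 3] > is represented with the vector stored as a member-to-id mapping. `SyncBuffer` and its methods are hypothetical names, and requiring a synch message from every member of the view is a simplifying assumption.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass
class View:
    view_id: int
    members: FrozenSet[str]
    start_ids: Dict[str, int]  # latest local id issued to each member


class SyncBuffer:
    """Stores synch messages tagged with the sender's local id."""

    def __init__(self) -> None:
        self._msgs: Dict[tuple, object] = {}  # (sender, local_id) -> payload

    def receive(self, sender: str, local_id: int, payload: object) -> None:
        self._msgs[(sender, local_id)] = payload

    def select(self, view: View) -> Optional[Dict[str, object]]:
        """Return the synch msgs identified by `view`, or None if any is missing."""
        chosen = {}
        for member in view.members:
            key = (member, view.start_ids[member])
            if key not in self._msgs:
                return None  # keep waiting; older synch msgs are not usable
            chosen[member] = self._msgs[key]
        return chosen
```

Because the view itself names the local ids, no common identifier has to be agreed on or disseminated before the synch messages are sent; stale synch messages from abandoned view proposals are simply never selected.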
Virtual Synchrony Algorithm Summary • Single round of communication • after the final change in the membership • previous solutions: at least two rounds (so at least half the rounds are saved) • Executes in parallel with view formation • reacts dynamically to changes in membership • These innovations are critical for WANs • Can work with the optimistic, single-round membership service, Moshe [MIT & UCSD] • Implemented as part of Xpand [MIT & HUJI]
Talk Roadmap • Introduction • Group Communication in WANs • WAN-oriented architecture • Efficient Algorithm for VS • Sound Theoretical Foundation
Fundamental Issue in Distributed Computing • Creating sound algorithms and systems • Difficult, because environment is complex • design, specification, reasoning about behavior • establishing characteristics, e.g., correctness, fault tolerance • The only remedy is being precise and formal • the community understands this • need cost-effective, usable, and understandable approaches • However, widely used approaches do not facilitate precision and rigor • E.g., CORBA, DCE, and Java/Jini – not enough
One of my key areas of expertise • Precise, formal, clear designs • correctness, reliability, fault tolerance, performance, availability, etc. • State-of-the-art techniques + Invent new ones • Highlights of the approach: • compositional approach • formulate problems as precisely defined services • specify both interface and behavior • distinguish between safety, liveness, performance, fault tolerance • incremental design and modeling • rigorous proofs and analysis • Exposes inherent problems and subtle points • leads to better, more robust and efficient systems
Group Communication: State of the Art • Systems: • Horus, Ensemble, Sphynx, Totem, Transis,... • Applications: • Highly available servers, banking, collaborative computing, interactive online games • Specifications and semantics: • often imprecise, ambiguous, confusing, ... • Algorithms and Systems: • ambiguous descriptions, often buggy (e.g., Horus) • Lack of theoretical foundations!
My contribution: Theoretical foundation for a new GC system for WANs • Useful semantics • application-driven: replicated service [KFL DISC98] • Precise specifications of the GC properties • separate safety and liveness properties • Precise, modular algorithm description • clear which part implements which property • separate safety critical parts from optimizations • Rigorous verification • assertional proofs • Formal performance analysis
New general techniques • Inheritance-based technique for incremental construction of specs, algorithms, and proofs • generic framework for reuse of proofs • analogous and complementary to OO SW reuse • critical for cost-effectiveness and scalability • [KKLS ICSE00; TOSEM01]
Typical Protocol Stack for Virtual Synchrony (“Cornell Approach”) • Self Delivery • VS Delivery • Within-View RFIFO
Are authenticated messages delivered in FIFO order? • Authentication: auth-send, auth-deliver • FIFO Comm.: fifo-send, fifo-deliver
New general techniques • Compositional performance analysis • performance of the entire system is expressed as a composition of performance properties of individual components • Inheritance-based technique for incremental construction of specs, algorithms, and proofs • generic framework for reuse of proofs • analogous and complementary to OO SW reuse • critical for cost-effectiveness and scalability • [KKLS ICSE00; TOSEM01]
My Research Interests • Fault-tolerant distributed and parallel systems • Sound foundation; rigorous, precise approaches • Correctness, reliability, security, performance, etc. • Traditional and highly dynamic environments • e.g., peer-to-peer, ad hoc, mobile, wireless, etc. • Adaptive, dynamic systems • Theory: specs, designs, verification, analysis,... • Empirical studies, simulations, development,... • Utilize feedback cycles among different phases • Goal: synergy of theory and practice
Scalable Formal Methods [KKLS: ICSE 00, TOSEM 01]
State of the Art Software Engineering • Managing complexity of software systems • Modularity: interacting system components • Incremental techniques: OO, inheritance • Formal modeling and verification • Modularity: compositional theorems • Incremental techniques: lag behind • Limited scalability • Not sufficiently cost-effective
Our Approach, in a Nutshell • OO SWE Techniques + Formal Modeling and Verification Techniques = Incremental Techniques for Specifying, Modeling, and Verifying Systems
Incremental Spec and Proof Reuse • Parents: System A implements Specification S • Children: does System A’ implement Specification S’? • Prove that A’ implements S’ by relying on the proof that A implements S, but without repeating the reasoning of that proof.
Why Not Immediate [Figure: trace sets labeled Traces of S, Traces of A, Traces of S’, Traces of A’]
Inheritance-based Methodology • Formal Framework for incremental modeling and verification (simulation proofs) • Two modification constructs: • Specialization and interface extension • “Proof Reuse” Theorem • defines simulation between children • reuses and extends simulation between parents • requires proving conditions only about the extension • involves reasoning only about modifications
Some statistics Modeling and Verification (I/O Automaton Model): • Env. specification: Mbrshp (~25 loc), RFIFO (~25 loc) • Service spec: incremental, 4 steps, ~50 loc total • Algorithm: ~120 loc, 15 actions, ~10 data structures • Verification: incremental simulations, ~12 pp., ~20 invariants Implementation: • VS library [Tarashchansky] (C++, ~9K loc) • linked with application • membership service [KSMD] (C++, ~20K loc) • socket interface with members • reliable FIFO service [Anker, et al] (C++, ~4K loc) • linked with VS; uses IP multicast, recovers lost msgs
Contributions: Formal Foundations for a new GCS • Formal design, verification, and analysis • large-scale system • Scalable WAN-oriented architecture • Efficient Algorithm for Virtual Synchrony • only one round, in parallel with view formation • responds immediately to connectivity changes • Scalable formal methods: • incremental modeling and verification • compositional approach to performance analysis
My Approach • Useful abstractions and generic services • application-driven; compositional • Algorithms • environment-driven; separate fast and slow paths • Formal and rigorous • modeling, verification, performance analysis • Supporting theories and methodologies • assist in design, specification, development, etc. • Feedback cycles