This paper discusses the key enablers of fault-tolerance in the Legion system: a flexible and extensible protocol stack, objects, a reflective architecture, computation graphs, and an exception propagation model.
 
                
Fault-Tolerance Enablers in Legion Anh Nguyen-Tuong February 18, 1997
Fault-Tolerance • Fault-tolerance is one of the Ten Challenges - “SHEEMSPRAF” • “Millions of hosts, billions of objects” • Very high probability of failure • scale • networks • Writing distributed/parallel fault-tolerant applications is hard
Short-Term Reality • Literature is full of FT protocols • Most of these are never implemented • FT protocols are difficult to • understand • write correctly • reuse • FT protocols are thus the domain of experts
Key Enabling Technologies “Use the FORCE” • Flexible and extensible protocol stack • Objects • Reflective architecture • Computation graphs • Exception propagation model
Flexible & Extensible Protocol Stack • Event-based abstraction for building & extending the protocol stack • n.b.: actually more like a protocol graph • Fault-tolerance “wrapping” technology • Fundamental events for FT • messageIn, messageOut, messageError • methodIn, methodOut, methodError
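The event-based stack and the FT "wrapping" idea above can be sketched roughly as follows. This is a minimal illustration, not Legion's actual interface: `ProtocolStack`, `register`, `announce`, and `MessageLog` are hypothetical names, and only the event names `messageOut` and `messageError` come from the slide.

```python
from collections import defaultdict


class ProtocolStack:
    """Hypothetical event-based stack: handlers are registered per event name."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, event, handler, front=False):
        # front=True lets a fault-tolerance layer "wrap" the existing
        # handlers by running before them.
        if front:
            self._handlers[event].insert(0, handler)
        else:
            self._handlers[event].append(handler)

    def announce(self, event, data):
        # Pass the data through every handler in order; a handler that
        # returns None consumes the event and stops further processing.
        for handler in list(self._handlers[event]):
            data = handler(data)
            if data is None:
                return None
        return data


class MessageLog:
    """Illustrative FT wrapper: logs outgoing messages, notes which ones fail."""

    def __init__(self, stack):
        self.sent = []
        self.resent = []
        stack.register("messageOut", self.on_out, front=True)
        stack.register("messageError", self.on_error, front=True)

    def on_out(self, msg):
        self.sent.append(msg)
        return msg  # let the rest of the stack deliver it

    def on_error(self, msg):
        if msg in self.sent:
            self.resent.append(msg)  # a real protocol would retransmit here
        return msg


stack = ProtocolStack()
delivered = []
# Stand-in for the normal delivery layer of the stack.
stack.register("messageOut", lambda m: delivered.append(m) or m)
log = MessageLog(stack)

stack.announce("messageOut", "B.yo(3)")
stack.announce("messageError", "B.yo(3)")
```

The delivery layer is unaware of the logging layer; the wrapper is added purely by registering handlers for the fundamental events.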
Objects • Legion architecture is object-oriented • n.b.: does not imply OO language! • Advantages of objects for fault-tolerance • unit of failure • communication via method invocation • semantic information available to FT designers • generic framework & services
Generic Framework & Services • Framework for rollback-recovery FT protocols • Exploit semantics • Replication Service • stateless & worm objects • transparent replication of objects
Reflective Architecture • Introspective system: dynamic access to reflective information • access to their own implementations • protocol stacks & method invocations • access to their calling environments • similar to Unix shell variables • access to the future of the computation • access to semantic information • access to generic attributes via a Prolog-like interface
Computation Graphs • Graphs have first class status • enables generic FT components that manipulate graphs • e.g. replicators & voters • enables development of new FT protocols by encapsulating information about the future of a computation • enables flexible exception propagation model
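Graphs
D.yo() {
  x = C.foo(A.bar(2), B.yo(3));
  print x;
}
[Figure: computation graph for D.yo(): the constants 2 and 3 feed A.bar() and B.yo(), whose results feed C.foo(), whose result is returned to D (D.retVal())]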
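One way to picture the graph for `x = C.foo(A.bar(2), B.yo(3))` as a first-class value is the sketch below. The `Node` class, `evaluate`, and `invocation_order` are hypothetical helpers, and the method bodies are invented purely so the example runs; the slides do not say what `A.bar`, `B.yo`, or `C.foo` compute.

```python
class Node:
    """One method invocation in a computation graph (illustrative only)."""

    def __init__(self, name, func, *inputs):
        self.name = name
        self.func = func
        self.inputs = inputs  # other Node instances or constant values


def evaluate(node):
    # Resolve data dependencies first, then run the invocation itself.
    args = [evaluate(i) if isinstance(i, Node) else i for i in node.inputs]
    return node.func(*args)


def invocation_order(node, seen=None):
    # Because the graph is ordinary data, generic tools can inspect the
    # "future" of the computation without running it.
    seen = [] if seen is None else seen
    for i in node.inputs:
        if isinstance(i, Node):
            invocation_order(i, seen)
    if node.name not in seen:
        seen.append(node.name)
    return seen


# Assumed method bodies, chosen only for the demonstration.
a_bar = Node("A.bar", lambda v: v * 10, 2)
b_yo = Node("B.yo", lambda v: v + 1, 3)
c_foo = Node("C.foo", lambda p, q: p + q, a_bar, b_yo)

order = invocation_order(c_foo)
x = evaluate(c_foo)
print(x)
```

Here the graph can be walked (`invocation_order`) or executed (`evaluate`) by code that knows nothing about the objects involved, which is what makes generic FT components possible.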
Generic Voting • Replication(in: Graph, out: Graphs) • [Figure: a 1st-class graph is replicated and the replicas' results pass through a voter V]
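A rough sketch of a replicator and voter working on graphs as opaque values follows. It assumes a graph can be treated as a zero-argument callable and uses simple majority voting; `replicate`, `vote`, and `run_replicated` are hypothetical names, and the slide does not specify the voting rule.

```python
from collections import Counter


def replicate(graph, copies):
    """Replication(in: Graph, out: Graphs), with a graph modeled as a callable."""
    return [graph for _ in range(copies)]


def vote(results):
    # Strict majority vote over the replica results (an assumed policy).
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority among replicas")
    return value


def run_replicated(replicas):
    return vote([replica() for replica in replicas])


def good():
    return 24


def bad():
    return -1  # simulates a replica that returns a wrong value


replicas = replicate(good, 3)
replicas[1] = bad  # inject one faulty replica
result = run_replicated(replicas)
```

Neither `replicate` nor `vote` inspects what the graph computes, so the same pair could wrap any invocation whose results are comparable.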
Legion Environments • Environment • Graph annotation • List of generic items <String : Data> • Method invocations carry the calling environment • Automatic propagation • Hidden dynamic parameters • useful for library writers • fault-tolerance, debugger, security, exceptions... • [Figure: each invocation along the call chain carries the same environment: “debugger” : dLOID, “console” : cLOID]
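The "hidden dynamic parameters" idea can be approximated inside a single process with Python's `contextvars`. This is only an analogy: in Legion the environment travels with remote method invocations, whereas here it follows ordinary function calls. The helper names `environment` and `lookup` are hypothetical; the keys and values come from the slide.

```python
import contextvars
from contextlib import contextmanager

# Holds the current <String : Data> environment; never mutated in place.
_environment = contextvars.ContextVar("environment", default={})


@contextmanager
def environment(**items):
    # Extend the caller's environment for the duration of the block.
    token = _environment.set({**_environment.get(), **items})
    try:
        yield
    finally:
        _environment.reset(token)


def lookup(key):
    return _environment.get().get(key)


def c_foo():
    # The callee never receives the environment as an explicit argument.
    return lookup("console")


def d_yo():
    return c_foo()


with environment(debugger="dLOID", console="cLOID"):
    seen = d_yo()

outside = lookup("console")
```

Inside the `with` block, `c_foo` sees the console entry even though `d_yo` never passes it along; outside the block the entry is gone.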
Legion Exception Propagation Model • “Exception” is a misnomer • security violations, communication errors, IDL errors, resource acquisition errors... • Basic failure detector • communication error detection & notification is oftentimes sufficient! • Exception propagation (not handling) • enables programming language specific exception handling models
Exception Propagation • Key feature of model: • Associate Legion exceptions with computation graphs! • Flexible enough to handle • Backward error propagation • masking • Forward error propagation • Generic • does not have to use “call chain”
Exception Propagation Graphs • Propagate to caller • Forward propagate • error token propagates forward through graph
D.yo() {
  PL_Exception watch(&x);
  x = C.foo(A.bar(2),B.yo(3));
  print x;
  if (watch.exceptionRaised())
    // x is not valid
    …
}
[Figure: graph of D.yo with nodes A.bar, B.yo, C.foo, D.ret, shown once with “excTracker” : computation graph and once with “excTracker” : D; exceptions are marked at two points]
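Forward propagation of an error token through a graph might look like the sketch below. `ErrorToken`, `Node`, and `evaluate` are hypothetical; the method bodies and the failure in `B.yo` are invented for the demonstration. The final check plays the role of `watch.exceptionRaised()` in the slide's pseudocode.

```python
class ErrorToken:
    """Stands in for a value that could not be produced."""

    def __init__(self, source, reason):
        self.source = source
        self.reason = reason


class Node:
    def __init__(self, name, func, *inputs):
        self.name = name
        self.func = func
        self.inputs = inputs


def evaluate(node):
    args = [evaluate(i) if isinstance(i, Node) else i for i in node.inputs]
    for arg in args:
        if isinstance(arg, ErrorToken):
            return arg  # forward the token; skip this invocation entirely
    try:
        return node.func(*args)
    except Exception as exc:
        return ErrorToken(node.name, str(exc))


called = []


def failing_b(value):
    raise ConnectionError("B unreachable")


def c_body(p, q):
    called.append("C.foo")
    return p + q


a_bar = Node("A.bar", lambda v: v * 10, 2)
b_yo = Node("B.yo", failing_b, 3)
c_foo = Node("C.foo", c_body, a_bar, b_yo)

x = evaluate(c_foo)
exception_raised = isinstance(x, ErrorToken)
if exception_raised:
    print("x is not valid:", x.source, x.reason)
```

The token produced at `B.yo` flows along the graph's edges instead of unwinding a call chain, so `C.foo` is never invoked and the consumer of `x` learns where the failure originated.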
Exception Graph • Generic graphs possible! • “excTracker” : FD (Failure Detector) • Exception: B cannot communicate with C • [Figure: graph of D.yo with nodes A.bar, B.yo, C.foo, D.ret and a Failure Detector FD]
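The point that exceptions need not follow the call chain can be sketched by letting an environment entry name the recipient. In this illustration the `"excTracker"` key comes from the slide, while `Tracker`, `invoke`, and the way the failure is simulated are assumptions.

```python
class Tracker:
    """Any object that can receive exception notifications."""

    def __init__(self, name):
        self.name = name
        self.reports = []

    def notify(self, source, reason):
        self.reports.append((source, reason))


def invoke(name, body, env):
    # On a communication error, notify whoever the environment designates
    # as "excTracker" rather than the caller.
    try:
        return body()
    except ConnectionError as exc:
        env["excTracker"].notify(name, str(exc))
        return None


def b_yo():
    # Simulated failure: B cannot reach C to deliver its result.
    raise ConnectionError("B cannot communicate with C")


caller_d = Tracker("D")
fd = Tracker("FD")
env = {"excTracker": fd}  # designate the failure detector, not the caller

result = invoke("B.yo", b_yo, env)
```

Swapping `fd` for `caller_d` in the environment would give propagation to the caller with no change to `invoke`.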
Wrap-Up • Target audience for FT enablers are FT protocol designers • Encapsulate expert knowledge in reusable form • generic framework & services • generic components • Encourage reuse of FT protocols
Status • Key building blocks in place • Flexible & extensible protocol stack • Objects • Reflection • Computation graphs • Exception model • Need to populate with concrete implementations • Mentat exception handling (Legion 0.5) • FT protocols • Method Based Logging (UCSD) • 2 Phase Commit • Coordinated Checkpointing • Replication for stateless objects