CS 603 Mid-Semester Review

CS 603 Mid-Semester Review, March 4, 2002

One or Two Day Review? • One day: Skim material and Test Overview • What to do with Wednesday? • More on replication • Start on distributed processes • Two day: Discuss material to date • Wednesday: • Finish Review • Work out sample question

Basics • Why do we want distributed systems? • Scaling • Heterogeneity • Geographic Distribution • What is a distributed system? • Transparency vs. Exposing Distribution • Hardware Basics • Communication Mechanisms

Basic Software Concepts • Hiding vs. Exposing • Distribution – Distributed OS • Location, but not distribution – Middleware • None – Network OS • Concurrency Primitives • Semaphores • Monitors • Distributed System Models • Client-Server • Multi-Tier • Peer to Peer

Communication Mechanisms • Shared Memory • Enforcement of single-system view • Delayed consistency: δ-Common Storage • Message Passing • Reliability and its limits • Stream-oriented Communications • Remote Procedure Call • Remote Method Invocation

RPC Example: DCE • Language / Platform Independent • Implementation Issues: • Data Conversion • Underlying Mechanisms • Fault Tolerance Approaches
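To make the data-conversion issue concrete, here is a minimal sketch of client and server stubs for a hypothetical remote procedure `add(a, b)` taking two signed 32-bit integers. It assumes a single canonical wire format (network byte order, via Python's `struct` module), which is the approach Sun XDR takes. DCE's own NDR encoding works differently: the sender labels data with its native format and the receiver converts only when needed ("receiver makes right").

```python
import struct

# Canonical wire format for the hypothetical call add(a, b):
# two signed 32-bit integers in network (big-endian) byte order.
WIRE_FORMAT = "!ii"


def marshal_add_args(a: int, b: int) -> bytes:
    """Client stub: convert native integers into the canonical wire format."""
    return struct.pack(WIRE_FORMAT, a, b)


def unmarshal_add_args(payload: bytes) -> tuple[int, int]:
    """Server stub: convert wire bytes back into native integers."""
    a, b = struct.unpack(WIRE_FORMAT, payload)
    return a, b


if __name__ == "__main__":
    wire = marshal_add_args(1, -2)
    print(wire.hex())                # 00000001fffffffe
    print(unmarshal_add_args(wire))  # (1, -2)
```

Because both stubs agree on one byte order, the call works between big- and little-endian hosts; the cost is that two little-endian machines still convert twice, which receiver-makes-right avoids.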

Java RMI • Supports remote invocation of Java objects • Key: Java Object Serialization • Stream objects over the wire • Language specific • Advantages • True object-orientation: Objects as arguments and values • Mobile behavior: Returned objects can execute on caller • Integrated security • Built-in concurrency (through Java threads) • Disadvantage – Java only • Implementation / Use • Registry

SOAP • Goal: RPC protocol that works over wide area networks • Interoperable • Language independent • Problem: Firewalls • Solution: HTTP/XML • Client side: Ability to generate HTTP calls and listen for response • Server: • Listen for HTTP • Bind to procedure • Respond with HTTP • SOAP message format and use mechanisms
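A minimal sketch of the SOAP message format, assuming SOAP 1.1 and a hypothetical `lookup` procedure in a made-up namespace `urn:example:directory`. The client wraps the call in an `Envelope`/`Body`; the server parses the body to find which procedure to bind to. Only the envelope namespace is standard. The HTTP transport (in SOAP 1.1, typically a POST with `Content-Type: text/xml` and a `SOAPAction` header) is left out.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "urn:example:directory"  # hypothetical application namespace
ET.register_namespace("soap", SOAP_ENV)


def build_request(method: str, params: dict[str, str]) -> bytes:
    """Client side: wrap a procedure call in Envelope/Body, ready to POST over HTTP."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{method}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8")


def parse_request(message: bytes) -> tuple[str, dict[str, str]]:
    """Server side: recover the procedure name and arguments to bind to."""
    envelope = ET.fromstring(message)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    if body is None or len(body) == 0:
        raise ValueError("not a SOAP request: missing or empty Body")
    call = body[0]
    method = call.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
    params = {child.tag.split("}", 1)[1]: child.text for child in call}
    return method, params


if __name__ == "__main__":
    request = build_request("lookup", {"cn": "Jane Doe"})
    print(request.decode("utf-8"))
    print(parse_request(request))
```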

Naming Requirements • Disambiguate only • Access resource given the name • Build a name to find a resource • Do humans need to use name? • Static/Dynamic Resource • Performance Requirements

Naming Approaches • Scope • Global vs. Hierarchical • Unique ID vs. Non-Unique Description • Namespaces • URN, URI, URL • Registries

Registry Example: X.500 • Goal: Global “white pages” • Lookup anyone, anywhere • Developed by the telecommunications industry • ISO standard directory for OSI networks • Idea: Distributed Directory • Application uses Directory User Agent to access a Directory Access Point

Directory Information Base (X.501) • Tree structure • Root is entire directory • Levels are “groups” • Country • Organization • Individual • Entry structure • Unique name • Built from tree • Attributes: Type/value pairs • Schema enforces type rules • Alias entries
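As a sketch of "unique name, built from tree", the toy directory below gives each entry a relative distinguished name (RDN) that only has to be unique among its siblings; the full name is the path of RDNs from the root. It assumes LDAP-style string names (leaf first, comma separated) and single-valued attributes, and it ignores schema checking, aliases, and escaping. `Entry`, `lookup`, and the sample entries are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entry:
    """One node of a toy Directory Information Tree (DIT)."""
    rdn: str  # relative distinguished name, unique among its siblings
    attributes: dict[str, str] = field(default_factory=dict)
    children: dict[str, Entry] = field(default_factory=dict)

    def add(self, rdn: str, **attributes: str) -> Entry:
        """Create a child entry; this schema-free sketch only enforces RDN uniqueness."""
        if rdn in self.children:
            raise ValueError(f"duplicate RDN {rdn!r} under {self.rdn!r}")
        child = Entry(rdn, dict(attributes))
        self.children[rdn] = child
        return child


def lookup(root: Entry, dn: str) -> Entry:
    """Resolve a leaf-first DN by walking down from the root.

    Splitting on commas is a simplification: escaped commas are not handled.
    """
    node = root
    for rdn in reversed([part.strip() for part in dn.split(",")]):
        node = node.children[rdn]  # KeyError if no such entry exists
    return node


# Root = entire directory; levels below it: country, organization, individual.
root = Entry("")
org = root.add("C=US").add("O=Example University")
org.add("CN=Jane Doe", SN="Doe", TITLE="Professor")
```

Two people with the same CN can coexist as long as they sit under different parents, because uniqueness is only required of the full path.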

X.500 • Directory Entry: • Organization level – CN=Purdue University, L=West Lafayette • Person level – CN=Chris Clifton, SN=Clifton, TITLE=Associate Professor • Directory Operations • Query, Modify • Authorization / Access control • To directory • Directory as mechanism to implement access control for others

X.500 – Distributed Directory • Directory System Agent • Referrals • Replication • Cache vs. Shadow copy • Access control • Modifications at Master only • Consistency • Each entry must be internally consistent • DSA giving copy must identify as copy

X.500 Subsets • LDAP • X.500 without OSI • Intended for use over IP • Active Directory • Microsoft’s answer to LDAP • Extensible “default” naming schema • Limited replication facilities

Clock Synchronization • Definition: All nodes agree on time • What do we mean by time? • What do we mean by agree? • Lamport Definition: Events • Events partially ordered • Clock “counts” the order

Event-based definition (Lamport ’78) • Define a partial order on events • A → B: A “happened before” B. Smallest relation such that: • If A and B are in the same process and A occurs first, A → B • If A is the sending of a message and B is the receipt of that message, A → B • If A → B and B → C, then A → C • Clock: C(x) is the time x occurs: • C(x) = Ci(x) where x runs on node i • Clocks correct if ∀ a,b: a → b ⇒ C(a) < C(b)

Lamport Clock Implementation • Node i increments Ci between any two successive events • If event a is the sending of a message m from i to j, • m contains timestamp Tm = Ci(a) • Upon receiving m, set Cj ≥ current Cj and > Tm • Can now define a total ordering (a on node i, b on node j). a ⇒ b iff: • Ci(a) < Cj(b), or • Ci(a) = Cj(b) and Pi < Pj
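The two implementation rules translate almost directly into code. This sketch (class and method names are illustrative) keeps one integer clock per process and stamps each event with the pair (clock, process id), so Python's lexicographic tuple comparison gives the total ordering: compare clock values first, break ties by process id.

```python
from dataclasses import dataclass

Stamp = tuple[int, int]  # (Lamport clock value, process id)


@dataclass
class Process:
    pid: int
    clock: int = 0

    def local_event(self) -> Stamp:
        self.clock += 1  # tick between any two successive events
        return (self.clock, self.pid)

    def send(self) -> Stamp:
        # Sending is an event too; the message carries Tm = the new clock value.
        self.clock += 1
        return (self.clock, self.pid)

    def receive(self, tm: int) -> Stamp:
        # New value is >= the old clock and strictly greater than Tm.
        self.clock = max(self.clock, tm) + 1
        return (self.clock, self.pid)


p1, p2, p3 = Process(1), Process(2), Process(3)
a = p1.local_event()  # (1, 1)
m = p1.send()         # (2, 1): message timestamp Tm = 2
b = p2.receive(m[0])  # (3, 2)
c = p3.local_event()  # (1, 3): concurrent with a
assert a < m < b      # clock order respects happened-before
assert a < c          # equal clock values: tie broken by process id
```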

What if we want “wall clock” time? • Ci must run at correct rate: • ∃ κ ≪ 1 such that | dCi(t)/dt − 1 | < κ • Synchronized: • ∃ small ε such that ∀ i,j: | Ci(t) − Cj(t) | < ε • Assume transmission time between μ and μ+ξ • Algorithm: Upon receiving message m, set Cj(t) = max(Cj(t), Tm + μ) • Theorem: Assume every τ seconds a message with unpredictable delay less than ξ is sent over every arc, and let d be the diameter of the network. Then ∀ t ≥ t0 + τd, ε ≈ d(2κτ + ξ)
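A small sketch of the receipt rule and the theorem's bound, assuming a fixed minimum delay μ and treating d as the network diameter. Function names and the example numbers are made up for illustration.

```python
def receive_adjust(local_clock: float, tm: float, mu: float) -> float:
    """Receipt rule: advance the clock to at least Tm + mu; never move it backwards."""
    return max(local_clock, tm + mu)


def skew_bound(d: int, kappa: float, tau: float, xi: float) -> float:
    """Approximate bound eps ~ d(2*kappa*tau + xi) for network diameter d,
    drift bound kappa, message period tau, and delay uncertainty xi
    (the approximation assumes mu + xi << tau)."""
    return d * (2 * kappa * tau + xi)


# A slow clock reading 100.0 receives Tm = 100.5 with minimum delay mu = 0.25.
print(receive_adjust(100.0, 100.5, 0.25))  # 100.75
# A clock that is already ahead is left alone.
print(receive_adjust(101.0, 100.5, 0.25))  # 101.0
# Hypothetical numbers: d = 2 hops, kappa = 1e-6, tau = 60 s, xi = 10 ms.
print(skew_bound(2, 1e-6, 60.0, 0.01))
```

With these made-up values the bound is roughly 20 ms, dominated by the delay uncertainty ξ rather than by drift (2κτ is only 0.12 ms).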

Clock Synchronization: Limits • Best Possible: Delay Uncertainty • Actually ε(1 – 1/n) • Synchronization with Faults • Faulty clock • Communication Failure • Malicious processor • Worst case: Can only synchronize if fewer than 1/3 of processors are faulty • Better if clocks can be authenticated

Real example: NTP • I doubt you need to review this...

Process Synchronization • Problem: Shared Resources • Model as sequential or parallel process • Assumes global state! • Alternative: Mutual Exclusion when Needed • Coordinator approach • Token Passing • Timestamp
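A minimal sketch of the coordinator approach, assuming reliable messages and a coordinator that never fails; process ids and method names are hypothetical. Each entry to the critical section costs three messages (request, grant, release), and the FIFO queue is what addresses starvation and fairness. The single coordinator remains the weak point for scaling and failures.

```python
from collections import deque
from typing import Optional


class Coordinator:
    """Centralized mutual exclusion: one process grants the resource and
    queues other requesters in FIFO order."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None
        self.waiting: deque[str] = deque()

    def request(self, pid: str) -> bool:
        """Handle a REQUEST message; True means a GRANT is sent immediately."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)  # no reply yet: the requester blocks
        return False

    def release(self, pid: str) -> Optional[str]:
        """Handle a RELEASE message; returns the process granted next, if any."""
        if pid != self.holder:
            raise ValueError(f"{pid!r} does not hold the resource")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


if __name__ == "__main__":
    c = Coordinator()
    print(c.request("A"))  # True: A enters
    print(c.request("B"))  # False: B waits
    print(c.release("A"))  # B: granted next
```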

Mutual Exclusion • Requirements • Does it guarantee mutual exclusion? • Does it prevent starvation? • Is it fair? • Does it scale? • Does it handle failures?

CS 603 Mid-Semester Review, March 6, 2002

Mutual Exclusion: Colored Ticket Algorithm • Goals: • Decentralized • Fair • Fault tolerant • Space Efficient • Idea: Numbered Tickets • Next number gets resource • Problem: Unbounded Space • Solution: Reissue blocks
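The numbered-ticket idea on its own looks like this. It is a single-address-space sketch with hypothetical names, not the colored-ticket algorithm: it shows why tickets give first-come, first-served fairness and why the counters need unbounded space, but not how reissuing blocks of tickets fixes that.

```python
import itertools


class TicketDispenser:
    """Numbered tickets: each arrival takes the next number, and the holder
    of the number being served uses the resource. Both counters grow
    without bound, the space problem the colored-ticket scheme addresses."""

    def __init__(self) -> None:
        self._next_ticket = itertools.count()  # 0, 1, 2, ... forever
        self.now_serving = 0

    def take_ticket(self) -> int:
        return next(self._next_ticket)

    def may_enter(self, ticket: int) -> bool:
        return ticket == self.now_serving

    def leave(self) -> None:
        self.now_serving += 1  # the next number gets the resource


if __name__ == "__main__":
    d = TicketDispenser()
    first, second = d.take_ticket(), d.take_ticket()
    print(d.may_enter(first), d.may_enter(second))  # True False
    d.leave()
    print(d.may_enter(second))                      # True
```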

Multi-Resource Mutual Exclusion • New Problem: Deadlock • Processes using all resources • Each needs additional resource to proceed • Dining Philosophers Problem • Coordinated vs. truly distributed solutions • Problems with deterministic solutions • Probabilistic solution – Lehmann & Rabin • Starvation / fairness properties
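A simulation sketch of the probabilistic idea, following the basic Lehmann and Rabin algorithm as commonly described: flip a coin to pick which fork to wait for first, take it when free, then take the other fork only if it is free; otherwise put the first fork back and flip again. The random scheduler here is an assumption and is far kinder than the adversarial schedulers the paper considers, so a run illustrates the protocol rather than proving anything. This basic version rules out deadlock with probability 1 but not starvation of an individual philosopher; the paper's refined ("courteous") variant targets that.

```python
import random

THINKING, TRYING, HOLDING_FIRST, EATING = range(4)


def simulate(n: int = 5, steps: int = 10_000, seed: int = 0) -> list[int]:
    """Run n philosophers under a random scheduler; return meals per philosopher."""
    rng = random.Random(seed)
    holder = [None] * n  # holder[k]: philosopher holding fork k, or None
    state = [THINKING] * n
    first = [0] * n      # fork each philosopher chose to pick up first
    meals = [0] * n

    for _ in range(steps):
        i = rng.randrange(n)            # scheduler picks one philosopher to step
        left, right = i, (i + 1) % n    # the two forks philosopher i needs
        if state[i] == THINKING:
            first[i] = rng.choice((left, right))  # coin flip: which fork first
            state[i] = TRYING
        elif state[i] == TRYING:
            if holder[first[i]] is None:          # wait until chosen fork is free
                holder[first[i]] = i
                state[i] = HOLDING_FIRST
        elif state[i] == HOLDING_FIRST:
            second = right if first[i] == left else left
            if holder[second] is None:
                holder[second] = i
                state[i] = EATING
            else:                                 # other fork busy: put first back
                holder[first[i]] = None
                first[i] = rng.choice((left, right))  # and flip again
                state[i] = TRYING
        else:                                     # EATING: finish, release both
            meals[i] += 1
            holder[left] = holder[right] = None
            state[i] = THINKING
        # Safety check: neighbours never eat at the same time.
        assert not any(state[k] == EATING and state[(k + 1) % n] == EATING
                       for k in range(n))
    return meals


if __name__ == "__main__":
    print(simulate())  # meals per philosopher; exact counts depend on the seed
```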

Distributed Transactions • ACID properties • Issues: • Commit Protocols • Fault Tolerance (why is this enough?) • Failure Models and Limitations • Mechanisms: • Two-phase commit • Three-phase commit

Two-Phase Commit (Lampson ’76, Gray ’79) • Central coordinator initiates protocol • Phase 1: • Coordinator asks if participants can commit • Participants respond yes/no • Phase 2: • If all votes yes, coordinator sends Commit; otherwise, Abort • Participants respond when done • Blocks on failure • Participants must replace coordinator • If participant and coordinator fail, wait for recovery • While blocked, transaction must remain Isolated • Prevents other transactions from completing
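A failure-free sketch of the two phases, with in-process objects standing in for remote participants (`Participant`, `prepare`, and `finish` are hypothetical names). It deliberately omits timeouts and logging: a participant that has voted yes and sits in `ready` cannot decide on its own if the coordinator disappears, which is the blocking problem noted above.

```python
from enum import Enum


class Vote(Enum):
    YES = "yes"
    NO = "no"


class Participant:
    """Hypothetical in-process participant; a real one would be remote."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "initial"

    def prepare(self) -> Vote:
        # Phase 1: a YES vote is a promise to commit if told to.
        if self.can_commit:
            self.state = "ready"
            return Vote.YES
        self.state = "aborted"  # a NO voter may abort unilaterally
        return Vote.NO

    def finish(self, decision: str) -> None:
        # Phase 2: apply the coordinator's decision, then acknowledge.
        self.state = "committed" if decision == "commit" else "aborted"


def two_phase_commit(participants: list[Participant]) -> str:
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(v is Vote.YES for v in votes) else "abort"
    # Phase 2: broadcast the decision (a real coordinator logs it durably first).
    for p in participants:
        p.finish(decision)
    return decision
```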

Transaction Model • Transaction Model • Global Transaction State • Reachable State Graph • Local states potentially concurrent if a reachable global state contains both local states • Concurrency set C(s) is all states potentially concurrent with s • Sender set S(s) = {local states t | t sends m and s can receive m} • Failure Model • Site failure assumed when expected message not received in time • Independent Recovery
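To make C(s) concrete, the sketch below computes concurrency sets from an explicit list of reachable global states. The list is an illustrative assumption worked out by hand for failure-free two-phase commit with one participant (coordinator states q1, w1, c1, a1; participant states q2, w2, c2, a2; no timeouts; channel contents omitted), not the output of a state-space explorer. Local state names are assumed unique across sites so they can serve as keys.

```python
from collections import defaultdict

# Reachable (coordinator, participant) global states of failure-free
# two-site 2PC, listed by hand.
REACHABLE = [
    ("q1", "q2"), ("w1", "q2"), ("w1", "w2"), ("w1", "a2"),
    ("c1", "w2"), ("c1", "c2"), ("a1", "a2"),
]


def concurrency_sets(reachable):
    """Map each local state s to C(s): the local states of other sites that
    appear together with s in at least one reachable global state."""
    conc = defaultdict(set)
    for global_state in reachable:
        for i, s in enumerate(global_state):
            for j, t in enumerate(global_state):
                if i != j:
                    conc[s].add(t)
    return dict(conc)


if __name__ == "__main__":
    for state, others in sorted(concurrency_sets(REACHABLE).items()):
        print(state, sorted(others))
```

Here C(w2) = {w1, c1}: a participant waiting for the decision can coexist with a coordinator that has already committed, so it cannot safely abort on its own after a timeout.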

Problems with 2-PC • Blocking on failure • 3-PC as solution • Theorems on recovery limits • Independent recovery: No two-site failure • Non-independent recovery • Anything short of total failure okay • Recovery protocol for total failure

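3PC (assuming timeout on receipt of message) • Transitions labeled received message / sent message; “-” means nothing is sent • Coordinator: • q1 → w1: xact request / start xact • w1 → a1: no / abort • w1 → p1: yes / pre-commit • p1 → c1: ack / commit • Participant: • q2 → a2: start xact / no • q2 → w2: start xact / yes • w2 → a2: abort / - • w2 → p2: pre-commit / ack • p2 → c2: commit / -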
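The 3PC state diagram (coordinator q1, w1, p1, a1, c1; participant q2, w2, p2, a2, c2) can be written as a transition table, which makes it easy to replay runs. This sketch assumes one participant, so "yes" from it means all votes are yes, and it does not model timeouts, which the termination protocol handles. The table layout and helper names are hypothetical.

```python
# Each transition: (from_state, message received, message sent, to_state); "-" = none.
COORDINATOR = [
    ("q1", "xact request", "start xact", "w1"),
    ("w1", "no", "abort", "a1"),
    ("w1", "yes", "pre-commit", "p1"),
    ("p1", "ack", "commit", "c1"),
]
PARTICIPANT = [
    ("q2", "start xact", "no", "a2"),
    ("q2", "start xact", "yes", "w2"),
    ("w2", "abort", "-", "a2"),
    ("w2", "pre-commit", "ack", "p2"),
    ("p2", "commit", "-", "c2"),
]


def step(machine, state, received, send=None):
    """Fire the transition enabled by `received`; `send` chooses between
    options with the same input (the participant's vote)."""
    options = [t for t in machine
               if t[0] == state and t[1] == received
               and (send is None or t[2] == send)]
    if len(options) != 1:
        raise ValueError(f"no unique transition from {state} on {received!r}")
    _, _, sent, new_state = options[0]
    return new_state, sent


def run_commit():
    """Failure-free run with one participant that votes yes."""
    c, p = "q1", "q2"
    c, msg = step(COORDINATOR, c, "xact request")   # -> w1, sends start xact
    p, msg = step(PARTICIPANT, p, msg, send="yes")  # -> w2, votes yes
    c, msg = step(COORDINATOR, c, msg)              # -> p1, sends pre-commit
    p, msg = step(PARTICIPANT, p, msg)              # -> p2, sends ack
    c, msg = step(COORDINATOR, c, msg)              # -> c1, sends commit
    p, msg = step(PARTICIPANT, p, msg)              # -> c2, sends nothing
    return c, p
```

Compared with 2PC, the extra p1/p2 "prepared" states mean no participant in w2 can be concurrent with a committed site.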

Termination Protocol • If participant times out in w2 or p2: • Elect new coordinator (if the coordinator were alive, it would have committed/aborted) • New coordinator requests state of all processes. Termination rules: • If any aborted, broadcast abort • If any committed, broadcast commit • If all w2, broadcast abort • If any p2, send pre-commit and enter state p1 • Complete failure protocol
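The termination rules amount to a decision function over the states the new coordinator collects. This sketch (function name hypothetical) checks them in priority order; treating a participant still in q2 like w2 is an added assumption, justified because no site can have committed before every participant voted yes and reached p2.

```python
def termination_decision(states: list[str]) -> str:
    """Rules for a newly elected 3PC coordinator, given the local states
    (q2, w2, p2, a2, c2) reported by the surviving participants."""
    if "a2" in states:
        return "abort"       # someone aborted: broadcast abort
    if "c2" in states:
        return "commit"      # someone committed: broadcast commit
    if "p2" in states:
        return "pre-commit"  # send pre-commit, enter p1, commit once acks arrive
    return "abort"           # everyone still in w2 (or, by assumption, q2)


if __name__ == "__main__":
    print(termination_decision(["w2", "p2", "w2"]))  # pre-commit
    print(termination_decision(["w2", "w2"]))        # abort
```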

Test Basics • Mechanics: Open book/notes • No electronic aids • Two questions • Each multi-part • Will include scoring suggestions • Underlying question: Do you understand the material? • No need to regurgitate the “best in literature” answer • Reasonable self-designed solution fine • Key: Do you really understand your answer? • Can you build CORRECT distributed systems?

Sample Question: Clock Synchronization • Develop a synchronization protocol for a four-processor system with fully connected processors. • Linear envelope of real time • Bounded difference between clocks on correct processors • Time set to 0 when the protocol begins (but not synchronized) • Assume: • Clocks don't drift • Messages take between time 0 and e • At most one faulty processor • No authentication • Discuss the correctness of your algorithm, including the types of faults handled. • Scoring: • Protocol: Up to five points • Argument for correctness: 2 points; full 2 points requires a believable proof sketch • Faults supported / not supported: 1-3 points; 3 points requires a proof sketch that it handles the supported faults and examples showing failure with unsupported fault types
