
Detour: Distributed Systems Techniques



  1. Detour: Distributed Systems Techniques
  • Paxos overview (based on Lampson's talk)
  • Google: Paxos Made Live (only briefly)
  • ZooKeeper: wait-free coordination service by Yahoo!
  CSci8211: Distributed Systems: Paxos & ZooKeeper

  2. Paxos: Basic Ideas

  3. Paxos: Agent States & Invariants

  4. Paxos: Leaders

  5. Paxos Algorithm

  6. Paxos Algorithm in Plain English
  • Phase 1 (prepare):
  • A proposer selects a proposal number n and sends a prepare request with number n to a majority of acceptors.
  • If an acceptor receives a prepare request with number n greater than that of any prepare request it has seen, it responds YES to that request with a promise not to accept any more proposals numbered less than n, and includes the highest-numbered proposal (if any) that it has accepted (see the sketch below).
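A minimal sketch of the acceptor side of Phase 1, assuming single-process in-memory state; the names `Acceptor`, `promised_n`, and `accepted` are illustrative, not from the slides:

```python
# Acceptor's Phase 1 (prepare) logic: promise not to accept proposals
# numbered below n, and report the highest-numbered proposal accepted so far.

class Acceptor:
    def __init__(self):
        self.promised_n = -1      # highest prepare number promised so far
        self.accepted = None      # (n, v) of highest-numbered accepted proposal, if any

    def on_prepare(self, n):
        """Handle a prepare(n) request from a proposer."""
        if n > self.promised_n:
            self.promised_n = n
            return ("YES", self.accepted)   # promise + highest accepted proposal (or None)
        return ("NO", None)
```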

  7. Paxos Algorithm in Plain English …
  • Phase 2 (accept):
  • If the proposer receives a YES response to its prepare request from a majority of acceptors, it sends an accept request to each of those acceptors for a proposal numbered n with a value v, where v is the value of the highest-numbered proposal among the responses (or a value of the proposer's own choosing if the responses reported no proposals); see the sketch below.
  • If an acceptor receives an accept request for a proposal numbered n, it accepts the proposal unless it has already responded to a prepare request having a number greater than n.
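A sketch of Phase 2, extending the `Acceptor` from the Phase 1 sketch above; `choose_value` is the proposer-side rule quoted in the bullet, and all names are illustrative:

```python
def on_accept(acceptor, n, v):
    """Acceptor side: accept (n, v) unless a higher-numbered prepare was already promised."""
    if n >= acceptor.promised_n:
        acceptor.promised_n = n
        acceptor.accepted = (n, v)
        return "ACCEPTED"
    return "REJECTED"

def choose_value(own_value, prepare_responses):
    """Proposer side: pick the value to send in the accept request.

    prepare_responses: list of (n, v) pairs or None, one per YES reply
    received from a majority of acceptors in Phase 1.
    """
    reported = [r for r in prepare_responses if r is not None]
    if reported:
        return max(reported)[1]   # value of the highest-numbered accepted proposal
    return own_value              # otherwise the proposer may propose its own value
```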

  8. Paxos's Properties (Invariants)
  • P1: Any proposal number is unique.
  • P2: Any two sets (majorities) of acceptors have at least one acceptor in common (illustrated below).
  • P3: The value sent out in Phase 2 is the value of the highest-numbered proposal of all the responses in Phase 1.
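A small check illustrating P2: with majority-sized quorums drawn from the same acceptor set, any two quorums must overlap (the five-acceptor set here is just an example):

```python
# P2 illustration: any two majorities of the same acceptor set intersect.
from itertools import combinations

acceptors = {"a1", "a2", "a3", "a4", "a5"}
majority_size = len(acceptors) // 2 + 1     # 3 of 5

for q1 in combinations(acceptors, majority_size):
    for q2 in combinations(acceptors, majority_size):
        assert set(q1) & set(q2), "two majorities always share an acceptor"
print("P2 holds for every pair of majorities of", len(acceptors), "acceptors")
```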

  9. The Paxos Atomic Broadcast Algorithm
  • Leader based: each process has an estimate of who the current leader is
  • To order an operation, a process sends it to its current leader
  • The leader sequences the operation and launches a consensus algorithm (Synod) to fix the agreement (see the sketch below)
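A sketch of the leader's role, assuming one consensus (Synod) instance per log slot; `run_synod` is a hypothetical stand-in for the two-phase algorithm above:

```python
def run_synod(instance, proposed_value):
    """Placeholder for one consensus (Synod) instance: a real implementation
    runs Phase 1 and Phase 2 against a majority of acceptors for this slot."""
    return proposed_value

class Leader:
    def __init__(self):
        self.next_slot = 0
        self.log = {}                         # slot -> chosen operation

    def order(self, operation):
        """Sequence a client operation and fix agreement on its slot."""
        slot = self.next_slot
        self.next_slot += 1
        chosen = run_synod(instance=slot, proposed_value=operation)
        self.log[slot] = chosen               # may differ if the slot was already decided
        return slot, chosen
```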

  10. Failure-Free Message Flow
  [Figure: client C sends a request to S1; Phase 1 ("prepare"/"ack") and Phase 2 ("accept") messages flow among servers S1 … Sn; a response returns to C.]

  11. Message Flow: Take 2 w/ Optimization
  [Figure: optimized flow between client C and servers S1 … Sn, with the Phase 1 ("prepare"/"ack") round moved ahead of the client request so only the Phase 2 ("accept") round remains on the request path.]

  12. Highlights of Paxos Made Live
  • Implements Paxos in a large, practical distributed system
  • Has to consider many practical failure scenarios (e.g., disk failures) as well as efficiency issues, and "prove" the implementation correct
  • Key features/mechanisms:
  • Multi-Paxos: run multiple instances of Paxos to achieve consensus on a series of values, e.g., in a replicated log (see the sketch below)
  • Master & master leases
  • (Global) epoch numbers (to handle master crashes)
  • Group membership: handle dynamic changes in the # of servers
  • Snapshots to enable faster recovery (& catch-up)
  • Handling disk corruption: a replica w/ a corrupted disk rebuilds its log by participating as a non-voting member until it catches up
  • & good software engineering: runtime checking & testing, etc.
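A sketch of the Multi-Paxos idea under a master lease, assuming the master has already established its proposal number via Phase 1; `send_accept_to_majority` and the lease length are hypothetical placeholders:

```python
import time

def send_accept_to_majority(slot, n, value):
    """Placeholder: send accept(n, value) for this log slot to a majority of
    replicas and wait for their acks (network details omitted)."""
    pass

class MultiPaxosMaster:
    def __init__(self, proposal_n, lease_seconds=10):
        self.proposal_n = proposal_n                 # established once via Phase 1
        self.lease_expiry = time.time() + lease_seconds
        self.log = []                                # replicated log of chosen values

    def append(self, value):
        """Choose a value for the next log slot, skipping Phase 1 while the lease holds."""
        if time.time() >= self.lease_expiry:
            raise RuntimeError("lease expired: renew the lease / re-run Phase 1")
        slot = len(self.log)
        send_accept_to_majority(slot, self.proposal_n, value)   # Phase 2 only
        self.log.append(value)
        return slot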

  13. Paxos Made Live: Architecture

  14. Paxos Made Live: Client API

  15. Highlights of ZooKeeper
  • ZooKeeper: wait-free coordination service for processes of distributed applications
  • wait-free: asynchronous (no blocking) and no locking
  • with guaranteed FIFO client ordering and linearizable writes
  • provides a simple & high-performance kernel for building more complex primitives at the client
  • e.g., rendezvous, read/write locks, etc.
  • this is in contrast to Google's Chubby (distributed lock) service, or Amazon's Simple Queue Service, …
  • For target workloads (2:1 to 100:1 read/write ratios), can handle 10^4 – 10^5 transactions per second
  • Key ideas & mechanisms:
  • A distributed-file-system-like hierarchical namespace to store data objects ("shared states"): a tree of znodes
  • but with simpler APIs for clients to coordinate processes

  16. ZooKeeper Service Overview
  • server: process providing the ZooKeeper service
  • client: user of the ZooKeeper service
  • clients establish a session when they connect to ZooKeeper and obtain a handle thru which to issue requests
  • znode: each associated w/ a version #, & can be of two types (see the sketch below)
  • regular: created/deleted explicitly
  • ephemeral: deleted explicitly, or automatically when the session that created it terminates
  • a znode may have a sequential flag: created w/ a monotonically increasing counter attached to the name
  • watch (on a znode): one-time trigger associated with a session to notify a change in the znode (or its child subtree)
  [Figure: ZooKeeper's hierarchical namespace (data tree)]
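A minimal sketch of these znode types and a one-time watch using kazoo, a Python ZooKeeper client; the ensemble address and paths are placeholders, and the choice of kazoo is an assumption, not from the slides:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")   # placeholder ensemble address
zk.start()                                  # establishes the session

# Regular znode: created/deleted explicitly.
zk.ensure_path("/app/config")
zk.ensure_path("/app/members")

# Ephemeral + sequential znode: removed automatically when this session ends,
# with a monotonically increasing counter appended to the name.
member = zk.create("/app/members/m-", b"", ephemeral=True, sequence=True)

# One-time watch: the callback fires once on the next change to the znode.
def on_change(event):
    print("config changed:", event)

data, stat = zk.get("/app/config", watch=on_change)
print("member znode:", member, "config version:", stat.version)
```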

  17. ZooKeeper Client API
  • Each client runs a ZooKeeper library:
  • exposes the ZooKeeper service interface thru client APIs
  • manages the network connection ("session") between client & server
  • ZooKeeper APIs:
  • Each API has both a synchronous and an asynchronous version (see the sketch below)
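A sketch of the synchronous vs. asynchronous call styles as they appear in kazoo; the paper describes the same split for ZooKeeper's own API, while the `*_async` naming and paths here are specific to this assumed client library:

```python
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()
zk.ensure_path("/app/config")

# Synchronous version: blocks until the server replies.
data, stat = zk.get("/app/config")

# Asynchronous version: returns a handle immediately; the client can issue
# further requests and collect the result (or register a callback) later.
async_result = zk.get_async("/app/config")
data2, stat2 = async_result.get()           # blocks here, not at issue time
```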

  18. ZooKeeper Primitive Examples
  • Configuration Management:
  • E.g., two clients A & B share a configuration and can communicate directly w/ each other
  • A makes a change to the configuration & notifies B (but the two servers' configuration replicas may be out of sync!)
  • Rendezvous
  • Group Membership
  • Simple Lock (w/ & w/o the herd effect; see the sketch below)
  • Read/Write Locks
  • Double Barrier
  Yahoo! and other services using ZooKeeper:
  • Fetch Service ("Yahoo! crawler")
  • Katta: a distributed indexer
  • Yahoo! Message Broker (YMB)
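A sketch of the simple-lock recipe without the herd effect (each waiter watches only its predecessor), again using kazoo; the `/lock` path and helper names are illustrative, and production code would normally use the client library's built-in lock recipe:

```python
import threading
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()
zk.ensure_path("/lock")

def acquire_lock():
    """Create an ephemeral sequential znode; the lowest sequence number holds the lock."""
    me = zk.create("/lock/l-", b"", ephemeral=True, sequence=True)
    my_name = me.split("/")[-1]
    while True:
        children = sorted(zk.get_children("/lock"))
        if children[0] == my_name:
            return me                                  # we hold the lock
        # Watch only the znode immediately before ours (avoids the herd effect).
        predecessor = children[children.index(my_name) - 1]
        gone = threading.Event()
        if zk.exists("/lock/" + predecessor, watch=lambda ev: gone.set()):
            gone.wait()                                # woken when the predecessor disappears

def release_lock(lock_znode):
    zk.delete(lock_znode)                              # also released if our session dies
```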

  19. ZooKeeper Implementation
  • converts writes into idempotent transactions (see the sketch below)
  • ensures linearizable writes: each write is handled by a leader, which broadcasts the change to the other servers via Zab, an atomic broadcast protocol
  • ensures client ordering via a pipelined architecture that allows multiple pending requests
  • the server handling a client request relies on a simple majority quorum to decide on a proposal, then delivers the state change to the client
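A sketch of what "converting writes into idempotent transactions" means: the leader evaluates a conditional write against its current state and emits a transaction that records the resulting data and version, so re-applying it is harmless. The state layout and names here are illustrative, not ZooKeeper's actual data structures:

```python
def to_txn(state, path, new_data, expected_version):
    """Leader side: turn setData(path, data, version) into an idempotent txn."""
    node = state[path]                          # current {"data", "version"} at the leader
    if node["version"] != expected_version:
        return ("error", "bad version")         # request rejected, no txn generated
    # The txn carries the *result*, not the condition, so applying it twice
    # leaves the same state.
    return ("setDataTxn", {"path": path, "data": new_data,
                           "version": expected_version + 1})

def apply_txn(state, txn):
    """Replica side: applying a committed txn is unconditional and idempotent."""
    kind, body = txn
    if kind == "setDataTxn":
        state[body["path"]] = {"data": body["data"], "version": body["version"]}

# Usage: a conditional write becomes a txn every replica can replay safely.
state = {"/app/config": {"data": b"v1", "version": 3}}
txn = to_txn(state, "/app/config", b"v2", expected_version=3)
apply_txn(state, txn)
apply_txn(state, txn)                           # replay changes nothing further
```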

  20. ZooKeeper and Zab
  • Zab: the atomic broadcast protocol used by ZooKeeper to ensure transaction integrity, primary-order (PO) causality, total order, and agreement (among replicated processes)
  • Leader (primary instance)-based: only the leader can abcast
  • Atomic 2-phase broadcast: abcast + abdeliver => transaction committed; otherwise considered "aborted"

  21. More on Zab
  • Zab atomic broadcast ensures primary-order causality:
  • "causality" defined only w.r.t. the primary instance
  • Zab also ensures strict causality (i.e., a total ordering):
  • if a process delivers two transactions, one must precede the other in the PO causality order
  • Zab assumes a separate leader election/selection process (with a leader selection oracle)
  • processes: a leader (starting w/ a new epoch #) and followers
  • Zab uses a 3-phase protocol w/ quorums (similar to Raft); see the skeleton below:
  • Phase 1 (Discovery): agree on a new epoch # and discover the history
  • Phase 2 (Synchronization): synchronize the history of all processes using a 2PC-like protocol, committing based on a quorum
  • Phase 3 (Broadcast): commit each new transaction via a 2PC-like protocol, committing based on a quorum
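A high-level skeleton of the three phases as listed above; the `Follower` stub, the history representation, and the message names are illustrative placeholders, not the real protocol messages:

```python
class Follower:
    def __init__(self):
        self.history = (0, 0, [])                  # (last_epoch, last_zxid, delivered log)
    def ack_epoch(self, epoch):                    # Phase 1: report accepted epoch/history
        return self.history
    def accept_history(self, epoch, hist):         # Phase 2: adopt the leader's history
        self.history = (hist[0], hist[1], list(hist[2]))
        return True
    def propose(self, epoch, txn):                 # Phase 3: ack a proposed transaction
        return True
    def commit(self, epoch, txn):                  # Phase 3: deliver (abdeliver) the txn
        self.history[2].append(txn)

def zab_leader(new_epoch, followers, quorum):
    # Phase 1 (Discovery): agree on the new epoch and collect histories from a quorum.
    histories = [f.ack_epoch(new_epoch) for f in followers[:quorum]]
    initial_history = max(histories)               # most up-to-date (epoch, zxid) wins
    # Phase 2 (Synchronization): a quorum must adopt that history before broadcasting.
    acks = sum(f.accept_history(new_epoch, initial_history) for f in followers)
    assert acks >= quorum, "cannot synchronize without a quorum"
    # Phase 3 (Broadcast): 2PC-like; commit each txn once a quorum has acked it.
    def broadcast(txn):
        if sum(f.propose(new_epoch, txn) for f in followers) >= quorum:
            for f in followers:
                f.commit(new_epoch, txn)
    return broadcast

# Usage: three followers, quorum of two.
followers = [Follower() for _ in range(3)]
abcast = zab_leader(new_epoch=1, followers=followers, quorum=2)
abcast(("create", "/x"))
```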

  22. PO Causality & Strict Causality
  [Figure: (a) transactions in PO causality order but not in "causal order"; (b) transactions in PO causality order but not in "strict causality" order]
