
Hermes Reliable Replication Protocol

The presentation slides, as they appeared at ASPLOS'20, for the paper "Hermes: A Fast, Fault-Tolerant and Linearizable Replication Protocol".


Presentation Transcript


  1. Hermes: A Fast, Fault-tolerant and Linearizable Replication Protocol. Antonios Katsarakis, V. Gavrielatos, S. Katebzadeh, A. Joshi*, B. Grot, V. Nagarajan, A. Dragojevic† (University of Edinburgh, *Intel, †Microsoft Research). hermes-protocol.com

  2. Distributed datastores. In-memory stores with a read/write API; the backbone of online services. They need both high performance and fault tolerance, which mandates data replication.

  3. Replication 101. Typically 3 to 7 replicas. Consistency comes in two flavors: weak consistency offers performance but nasty surprises; strong consistency is programmable and intuitive. Reliable replication protocols provide strong consistency even under faults, and they define the actions to execute reads and writes → these determine a datastore's performance. Can reliable protocols provide high performance?

  4. Paxos. The gold standard for strong consistency and fault tolerance, but low performance: both reads and writes require inter-replica communication, taking multiple RTTs over the network. Its common-case performance (i.e., no faults) is as bad as its worst case (under faults). State-of-the-art reliable protocols instead exploit failure-free operation for performance.

  5. Performance of state-of-the-art protocols (ZAB, CRAQ). ZAB: local reads from all replicas → fast; but writes serialize on the leader → low throughput. CRAQ: local reads from all replicas → fast; but writes traverse the length of the chain from head to tail → high latency. Both achieve fast reads but poor write performance.

  6. Key protocol features for high performance. Goal: low latency + high throughput. Reads: local from all replicas. Writes: fast (minimize network hops), decentralized (no serialization points), and fully concurrent (any replica can service a write). Existing replication protocols are deficient on the write side: none combine local reads with fast, decentralized, fully concurrent writes.

  7. Enter Hermes. A broadcast-based, invalidating replication protocol inspired by multiprocessor cache-coherence protocols. Each object is either Valid or Invalid at each replica. Fault-free operation, e.g., for write(A=3): 1. The coordinator (the replica servicing the write) broadcasts Invalidations; from this point no stale reads can be served → strong consistency. 2. Followers acknowledge the Invalidation. 3. Once all Acks arrive, the write commits and the coordinator broadcasts Validations; all replicas can then serve local reads for this object again. The result is the strongest consistency, linearizability: local reads from all replicas, and a Valid object always holds the latest value (see the sketch below). But what about concurrent writes?
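
Below is a minimal, single-process Python sketch of the fault-free write and local-read paths described above, following the slides' Invalidation/Ack/Validation flow. Message delivery is simulated with direct method calls, and all names (Replica, on_inv, on_val) are illustrative rather than taken from the Hermes implementation; timestamps are deferred to the next slide.

```python
# Sketch of Hermes' fault-free operation (no timestamps yet).
VALID, INVALID = "Valid", "Invalid"

class Replica:
    def __init__(self, node_id):
        self.node_id = node_id
        self.store = {}        # key -> [value, state]
        self.peers = []        # the other replicas

    # --- coordinator side: any replica may coordinate a write ---
    def write(self, key, value):
        self.store[key] = [value, INVALID]
        acks = 0
        for p in self.peers:                 # 1. broadcast Invalidations
            acks += p.on_inv(key, value)     # 2. followers Acknowledge
        assert acks == len(self.peers)
        self.store[key][1] = VALID           # write commits: one round trip
        for p in self.peers:                 # 3. broadcast Validations
            p.on_val(key)

    # --- follower side ---
    def on_inv(self, key, value):
        # Once Invalid, the object cannot serve reads: no stale read possible.
        self.store[key] = [value, INVALID]
        return 1                             # the Ack

    def on_val(self, key):
        self.store[key][1] = VALID

    # --- read path: local, from any replica, for Valid objects only ---
    def read(self, key):
        value, state = self.store[key]
        assert state == VALID, "Invalid: a real replica stalls for the Validation"
        return value

# Wire up three replicas and exercise a write followed by local reads.
replicas = [Replica(i) for i in range(3)]
for r in replicas:
    r.peers = [p for p in replicas if p is not r]
replicas[0].write("A", 3)
assert all(r.read("A") == 3 for r in replicas)
```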

  8. Concurrent writes = challenge. How do we efficiently order concurrent writes to an object, e.g., racing write(A=3) and write(A=1)? Solution: store a logical timestamp (TS) along with each object. Upon a write, the coordinator increments the TS and sends it with its Invalidations (e.g., Inv(TS1), Inv(TS4)); upon receiving an Invalidation, a follower updates the object's TS; when two writes to the same object race, node IDs order them (see the sketch below). Broadcast + Invalidations + TS → high-performance writes.
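
The following sketch shows one way to realize the slide's write ordering: a Lamport-style (version, node_id) timestamp per object, where the coordinator bumps the version and node IDs break ties between racing writes. The TS class and function names are illustrative, not the paper's code.

```python
# Sketch of per-object logical timestamps for ordering concurrent writes.
from dataclasses import dataclass

@dataclass(frozen=True, order=True)
class TS:
    version: int    # incremented by the coordinator on every write
    node_id: int    # tie-breaker between writes racing at the same version

def next_ts(current: TS, my_node_id: int) -> TS:
    """Coordinator: bump the object's TS and ship it with the Invalidations."""
    return TS(current.version + 1, my_node_id)

def accept_inv(local_ts: TS, inv_ts: TS) -> bool:
    """Follower: apply an incoming Invalidation only if its TS wins; the
    losing write of a race is simply superseded."""
    return inv_ts > local_ts   # tuple order: version first, then node_id

# Two coordinators race on the same object, both starting from TS(3, 0):
a = next_ts(TS(3, 0), my_node_id=1)   # TS(4, 1)
b = next_ts(TS(3, 0), my_node_id=2)   # TS(4, 2)
assert accept_inv(a, b) and not accept_inv(b, a)  # node 2's write wins everywhere
```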

  9. Writes in Hermes (Broadcast + Invalidations + TS) are: 1. Decentralized: fully distributed write ordering at the endpoints. 2. Fully concurrent: any replica can coordinate a write, and writes to different objects proceed in parallel. 3. Fast: writes commit in 1 RTT and never abort. Awesome! But what about fault tolerance?

  10. Handling faults in Hermes. Problem: a failure in the middle of a write can permanently leave a replica in the Invalid state. Solution: send the write's value along with each Invalidation → early value propagation, which gives any invalidated replica everything it needs to complete the write itself (see the sketch below).
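
A sketch of early value propagation, extending the Replica sketch above: because every Invalidation already carries the new value, a replica left Invalid by a crashed coordinator can replay the write itself. The replay name and attachment are illustrative; in full Hermes the replay reuses the original write's timestamp, so concurrent replays of the same write stay idempotent (omitted here for brevity).

```python
def replay(self, key):
    """Run by a follower once it suspects the coordinator failed mid-write:
    it re-coordinates the value it received in the Invalidation, so the
    write completes despite the failure."""
    value, state = self.store[key]
    if state == INVALID:
        for p in self.peers:            # re-broadcast the same Invalidation
            p.on_inv(key, value)
        self.store[key][1] = VALID
        for p in self.peers:            # and Validate as usual
            p.on_val(key)

Replica.replay = replay                 # attach to the earlier Replica sketch
```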
