1 / 87

ZooKeeper Tutorial

ZooKeeper Tutorial. Flavio Junqueira Benjamin Reed Yahoo! Research. hCps://cwiki.apache.org/confluence/display /ZOOKEEPER /EurosysTutorial. Eurosys 2011 ‐ Tutorial. 1. Used for 14-848 Discussion, 11/6/2017 by Gregory Kesden. Plan for today. First half Part 1

Télécharger la présentation

ZooKeeper Tutorial

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. ZooKeeperTutorial Flavio Junqueira BenjaminReed Yahoo!Research hCps://cwiki.apache.org/confluence/display/ZOOKEEPER/EurosysTutorial Eurosys 2011 ‐Tutorial 1 Used for 14-848 Discussion, 11/6/2017 by Gregory Kesden

  2. Plan fortoday • Firsthalf • Part1 • Motivation andbackground • Part2 • How ZooKeeper works onpaper • Secondhalf • Part3 • Share some practicalexperience • Programmingexercises • Part4 • Somecaveats • Wrapup Eurosys 2011 ‐Tutorial 2

  3. ZooKeeperTutorial Part 1 Fundamentals

  4. Yahoo!Portal Search E‐mail Finance Weather News Eurosys 2011 ‐Tutorial 4

  5. Yahoo!: Workloadgenerated • Homepage • 38 million users a day(USA) • 2.5 billion users a month(USA) • Websearch • 3 billion queries amonth • E‐mail • 90 million actualusers • 10min/visit Eurosys 2011 ‐Tutorial 5

  6. Yahoo!Infrastructure • Lots ofservers • Lots ofprocesses • High volumes ofdata • Highly complex soawaresystems • … and developers are meremortals • Yahoo! Lockport DataCenter Eurosys 2011 ‐Tutorial 6

  7. Coordinationisimportant Eurosys 2011 ‐Tutorial 7

  8. Coordinationprimitives • Semaphores • Queues • Leaderelection • Groupmembership • Barriers • Configuration Eurosys 2011 ‐Tutorial 8

  9. Even small ishard… Eurosys 2011 ‐Tutorial 9

  10. A simplemodel • Workassignment • Master assignswork • Workers execute tasks assigned bymaster Master Worker Worker Worker Worker Eurosys 2011 ‐Tutorial 10

  11. Mastercrashes • Single point offailure • No work isassigned • Need to select a newmaster Master Worker Worker Worker Worker Eurosys 2011 ‐Tutorial

  12. Workercrashes • Not as bad… Overall system stillworks • – Does not work if there aredependencies • Some tasks will never beexecuted • Need to detect crashedworkers • Master Worker Worker Worker Worker Eurosys 2011 ‐Tutorial

  13. Worker does not receiveassignment • Same problem asbefore • Some tasks may not be executed • Need to guarantee that worker receives assignment • Master Worker Worker Worker Worker Eurosys 2011 ‐Tutorial

  14. Fault‐tolerant distributedsystem Master CoordinationService Master Worker Worker Worker Worker Worker Worker Eurosys 2011 ‐Tutorial

  15. Fault‐tolerant distributedsystem Master CoordinationService Master Worker Worker Worker Worker Worker Worker Eurosys 2011 ‐Tutorial

  16. Fullydistributed CoordinationService Worker Worker Worker Worker Worker Worker Eurosys 2011 ‐Tutorial

  17. Fallacies of distributedcomputing The network isreliable. Latency iszero. Bandwidth isinfinite. The network issecure. Topology doesn'tchange. There is oneadministrator. Transport cost iszero. The network is homogeneous. Peter Deutsch,http://blogs.sun.com/jag/resource/Fallacies.html Eurosys 2011 ‐Tutorial

  18. One morefallacy • You know who is alive Eurosys 2011 ‐Tutorial

  19. Why is itdifficult? • FLP impossibilityresult • Asynchronoussystems • Consensus is impossible if a single process cancrash • Fischer, Lynch, Paterson, ACM PODS,1983 • According to Herlihy, we do needconsensus • Wait‐free synchronization • Wait‐free: completion in a finite number ofsteps • Universal object: equivalent to solving consensus forn • processes • Herlihy, ACM TOPLAS,1991 Eurosys 2011 ‐Tutorial

  20. Why is itdifficult? • CAPprinciple • – Can’t obtain availability, consistency, and parOtiontolerancesimultaneously Availability ParOtiontolerance Consistency Gilbert, Lynch, ACM SIGACT NEWS, 2002 Eurosys 2011 ‐Tutorial 20

  21. The case for a coordinationservice • Many impossibilityresults • Many fallacies to stumbleupon • Several common requirements across applications • Duplicating isbad • Duplicating poorly is evenworse • Coordination service • Implement it once andwell • Share by a number ofapplications Eurosys 2011 ‐Tutorial

  22. Currentsystems • Chubby,Google • Lockservice • Burrows, USENIX OSDI,2006 • Centrifuge,Microsoft • Leaseservice • Adya et al., USENIX NSDI,2010 • ZooKeeper,Yahoo! • Coordinationkernel • On Apache since2008 • Hunt et al., USENIX ATC,2010 Eurosys 2011 ‐Tutorial

  23. Example – Bigtable,HBase • Sparse column‐oriented datastorage • Tablet: range ofrows • Unit ofdistribution • Architecture • Master • Tabletservers Eurosys 2011 ‐Tutorial

  24. Example – Bigtable,HBase • Masterelection • Tolerate mastercrashes • Metadatamanagement • ACLs, Tabletmetadata • Rendezvous • Find tabletserver • Crashdetection • Live tabletservers Eurosys 2011 ‐Tutorial

  25. Example – Webcrawling • Fetchingservice Master • – Fetch Web pages for search engine • Masterelection • Assignwork • Metadatamanagement • Politenessconstraints • Shards • Crashdetection • Liveworkers Fetcher Fetcher Fetcher Eurosys 2011 ‐Tutorial

  26. And moreexamples… • GFS – Google FileSystem • Masterelection • File systemmetadata • KaCa ‐ Document indexingsystem • Shardinformation • Index version coordination • Hedwig – Pub‐Subsystem • Topicmetadata • Topicassignment Eurosys 2011 ‐Tutorial

  27. Summary of Part1 • Large infrastructures requirecoordination • Fallacies of distributedcompuOng • Theory results: FLP,CAP • Coordination services • Examples • Websearch • Storagesystems Eurosys 2011 ‐Tutorial

  28. ZooKeeperTutorial Part 2 Theservice

  29. ZooKeeperIntroduction • Coordinationkernel • Does not export concreteprimitives • Recipes to implementprimitives • File system basedAPI • Manipulate small data nodes:znodes Eurosys 2011 ‐Tutorial 29

  30. ZooKeeper:Overview Ensemble ClientApp Follower ZooKeeper ClientLib Session Leader atomically broadcast updates Leader ClientApp Follower ZooKeeper ClientLib Session Follower Replicated system ClientApp Follower ZooKeeper ClientLib Session Eurosys 2011 ‐Tutorial 30

  31. ZooKeeper: Readoperations Ensemble Read operations processed locally ClientApp Read“x” X =10 Follower ZooKeeper ClientLib Leader ClientApp Follower ZooKeeper ClientLib Follower ClientApp Follower ZooKeeper ClientLib Eurosys 2011 ‐Tutorial 31

  32. ZooKeeper: Writeoperations Ensemble ClientApp Write“x”,11 X =11 Follower ZooKeeper ClientLib X =11 Leader ClientApp X =11 Follower ZooKeeper ClientLib Follower ClientApp Follower ZooKeeper ClientLib Replicates across aquorum Eurosys 2011 ‐Tutorial 32

  33. ZooKeeper: Semantics ofSessions • A prefix of operations submitted through a session areexecuted • Upondisconnection • Client lib tries to contact anotherserver • Before session expires: connect to newserver • Server must have seen a transaction id at least as large as thesession Eurosys 2011 ‐Tutorial 33

  34. ZooKeeper:API • Create znodes:create • – Persistent, sequential,ephemeral • Read and modify data: setData,getData • Read the childrenofznode: getChildren • Check if znode exists:exists • Delete a znode:delete Eurosys 2011 ‐Tutorial 34

  35. ZooKeeper:API • Order • Updates: Totally ordered,linearizable • FIFO order for clientoperations • Read: sequentiallyordered • write(x,10) • Client1: • write(x,11) • Client2: • write(x,10) write(x,11) • Sequen7al: Eurosys 2011 ‐Tutorial 35

  36. ZooKeeper:API • Order • Updates: Totally ordered,linearizable • FIFO order for clientoperations • Read: sequentiallyordered write(x,10) read(x) Client1: read(x) Client2: write(x,10) read(x) read(x) Sequen7al: Eurosys 2011 ‐Tutorial 9

  37. ZooKeeper:Example 1‐ create“/C‐”, “Ci”, sequential, ephemeral 2‐ getChildren“/” 3‐ If not leader, getData“first node” Client1 (C1) / C1/C‐1 Client2 (C2) C3 /C‐2 Client3 (C3) C2/C‐3 Eurosys 2011 ‐Tutorial 10

  38. ZooKeeper: Znodechanges • Znodechanges • Data isset • Node is created ordeleted • Etc… • To learn of znodechanges • Set awatch • Upon change, client receives anotification • Notification ordered before newupdates Eurosys 2011 ‐Tutorial

  39. ZooKeeper: Watches getData “/foo”,true Client return10 / 10 /foo Eurosys 2011 ‐Tutorial

  40. ZooKeeper: Watches Client / 11 /foo setData “/foo”,11 Client returnok Eurosys 2011 ‐Tutorial

  41. ZooKeeper: Watches Client / notification 11 /foo Client Eurosys 2011 ‐Tutorial

  42. Watches, Locks, and the herdeffect • Herdeffect • Large number of clients wake upsimultaneously • Loadspikes • Undesirable Eurosys 2011 ‐Tutorial

  43. Watches, Locks, and the herdeffect Client1 (C1) Client2 (C2) / C1 /C‐1 C2 /C‐2 Client3 (Cn) /C‐m Cn Eurosys 2011 ‐Tutorial

  44. Watches, Locks, and the herdeffect Client1 (C1) Client2 (C2) / C2 /C‐2 Client3 (Cn) /C‐m Cn Eurosys 2011 ‐Tutorial

  45. Watches, Locks, and the herdeffect Client1 (C1) / Client2 (C2) notification C2 /C‐2 Client3 (Cn) /C‐m Cn notification Eurosys 2011 ‐Tutorial

  46. Watches, Locks, and the herdeffect • Asolution • Use order ofclients • Eachclient • Determines the znode z preceding its own znode in the sequentialorder • Watchz • A single notification is generated upon a crash • Disadvantage for leaderelection • One client is notified of a leaderchange Eurosys 2011 ‐Tutorial

  47. Linearizability • Correctnesscondition • Informaldefinition • Order of operations is equivalent to a sequential execution • Equivalent order satisfies real time precedenceorder write(x,10) read(x) Client1: read(x) Client2: write(x,10) read(x) read(x) Sequential: Eurosys 2011 ‐Tutorial

  48. Linearizability • Correctnesscondition • Informaldefinition • Order of operations is equivalent to a sequential execution • Equivalent order satisfies real time precedenceorder write(x,10) read(x) Client1: read(x) Client2: write(x,10) read(x) read(x) Sequen7al: Eurosys 2011 ‐Tutorial

  49. Linearizability • Correctnesscondition • Informaldefinition • Order of operations is equivalent to a sequential execution • Equivalent order satisfies real time precedenceorder write(x,10) read(x) Client1: read(x) Client2: write(x,10) read(x) read(x) Sequen7al: Eurosys 2011 ‐Tutorial

  50. Linearizability • Is it important? Itdepends… • Implements universal object • Herlihy’sresult • Implement consensus for nprocesses Eurosys 2011 ‐Tutorial

More Related