
The Chubby Lock Service for Loosely-coupled Distributed Systems

Presentation Transcript


  1. The Chubby Lock Service for Loosely-coupled Distributed Systems • Mike Burrows @ OSDI ’06 • Presented by Mosharaf Chowdhury

  2. Problem • (Un)Locking-as-a-service • Leader election • Synchronization • … • Uses Paxos to solve the distributed consensus problem in an asynchronous environment
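
The slide's leader-election use is the classic Chubby pattern: clients race to acquire a lock on an agreed-upon file, the winner becomes primary and writes its identity into that file, and everyone else reads the file to discover the primary. Below is a minimal Python sketch of that pattern; the LockService stand-in and its method names (try_acquire, set_contents, get_contents) are invented for illustration and are not Chubby's actual client API.

class LockService:
    """Toy in-process stand-in for a lock service (illustration only)."""
    def __init__(self):
        self._locks = {}      # path -> current lock holder
        self._contents = {}   # path -> file contents (bytes)

    def try_acquire(self, path, holder):
        # Non-blocking attempt to take an exclusive, advisory lock.
        if path in self._locks:
            return False
        self._locks[path] = holder
        return True

    def set_contents(self, path, data):
        self._contents[path] = data

    def get_contents(self, path):
        return self._contents.get(path, b"")


def elect_primary(service, me, lock_file="/ls/cell/myservice/leader"):
    """Whoever acquires the lock is primary and advertises itself in the file."""
    if service.try_acquire(lock_file, me):
        service.set_contents(lock_file, me.encode())
        return me, True
    # Lost the race: read the lock file to discover the current primary.
    return service.get_contents(lock_file).decode(), False


if __name__ == "__main__":
    svc = LockService()
    print(elect_primary(svc, "replica-a"))   # ('replica-a', True)
    print(elect_primary(svc, "replica-b"))   # ('replica-a', False)

The paper additionally uses lock sequencers so that downstream servers can reject requests from a primary whose lock has since been lost; that refinement is omitted from this sketch.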

  3. Overview • Goals/Non-goals: primary goals are availability, reliability, and usability & deployability; performance is secondary; storage capacity is a non-goal • Use cases: planned usage by GFS, BigTable, and MegaStore; also heavily used as an internal name service and as a MapReduce rendezvous point

  4. System Structure • [Diagram: a Chubby cell of five servers, four replicas (R) and one elected master (M), with a proxy between clients and the cell] • Reads are satisfied by the master alone • Writes are acknowledged only after a majority of replicas have been updated (toy sketch below)
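
The read/write rule on this slide can be shown with a toy model: the master serves reads from its own copy and acknowledges a write only once a majority of the cell has applied it. This sketch illustrates the majority-acknowledgement rule only; it glosses over Paxos, logging, and recovery, and every name in it is invented for illustration.

class Replica:
    """One member of the toy cell."""
    def __init__(self, name):
        self.name = name
        self.store = {}
        self.up = True

    def apply(self, key, value):
        # Returns True if this replica accepted (applied) the update.
        if not self.up:
            return False
        self.store[key] = value
        return True


class Master(Replica):
    """The elected master: serves reads itself, pushes writes to the replicas."""
    def __init__(self, name, replicas):
        super().__init__(name)
        self.replicas = replicas          # the other members of the cell

    def read(self, key):
        # Reads are satisfied by the master alone.
        return self.store.get(key)

    def write(self, key, value):
        # Count the master's own copy plus every replica that applied it,
        # and acknowledge only once a majority of the whole cell has the update.
        acks = int(self.apply(key, value))
        acks += sum(r.apply(key, value) for r in self.replicas)
        cell_size = len(self.replicas) + 1
        return acks >= cell_size // 2 + 1


if __name__ == "__main__":
    replicas = [Replica(f"r{i}") for i in range(4)]   # 4 replicas + 1 master
    master = Master("m", replicas)
    replicas[0].up = replicas[1].up = False           # two replicas are down
    print(master.write("/ls/cell/foo", b"bar"))       # True: 3 of 5 is a majority
    print(master.read("/ls/cell/foo"))                # b'bar'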

  5. How to Use it? • UNIX-file-system-like interface; the API is modeled in a similar way (usage sketch below) • Read/write locks on each file/directory • Locks are advisory and coarse-grained • Event notification, delivered after the corresponding action
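
A sketch of what the file-system-style interface looks like from the client's side. The operation shapes (open a handle on a UNIX-style path, read contents and metadata, take an advisory lock, receive an event after the change has happened) follow the slide, but this in-memory stand-in and its Python method names are invented for illustration and do not match Chubby's real client library.

class Handle:
    """A handle on a Chubby-style node (file or directory) in the toy cell."""
    def __init__(self, cell, path, on_event=None):
        self.cell, self.path, self.on_event = cell, path, on_event

    def acquire(self):
        # Advisory, coarse-grained lock: nothing stops a client that ignores it.
        self.cell.locks.add(self.path)

    def release(self):
        self.cell.locks.discard(self.path)

    def get_contents_and_stat(self):
        node = self.cell.nodes.setdefault(self.path, {"data": b"", "gen": 0})
        return node["data"], node["gen"]

    def set_contents(self, data):
        node = self.cell.nodes.setdefault(self.path, {"data": b"", "gen": 0})
        node["data"], node["gen"] = data, node["gen"] + 1
        # Events are delivered after the corresponding action has taken place.
        if self.on_event:
            self.on_event("contents-modified", self.path)


class Cell:
    """Toy in-memory cell exposing a UNIX-like namespace such as /ls/<cell>/..."""
    def __init__(self):
        self.nodes, self.locks = {}, set()

    def open(self, path, on_event=None):
        return Handle(self, path, on_event)


if __name__ == "__main__":
    cell = Cell()
    h = cell.open("/ls/foo-cell/myapp/config",
                  on_event=lambda ev, p: print("event:", ev, "on", p))
    h.acquire()
    try:
        h.set_contents(b"threads=8")      # notification fires after the write
        print(h.get_contents_and_stat())  # (b'threads=8', 1)
    finally:
        h.release()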

  6. Cogs • KeepAlives: piggyback event notifications, cache invalidations, etc. • Leases/timeouts: master lease plus a conservative client-side local lease • Failover handling (lease sketch below)
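
The lease/KeepAlive machinery can be sketched as a small client-side state machine: each KeepAlive reply extends a conservative local lease; if the local lease expires (for example during a master failover) the client enters a jeopardy grace period with its cache disabled, and it only reports the session as lost once the grace period also runs out. The 12 s lease and 45 s grace period below are the defaults mentioned in the paper; everything else is simplified for illustration, with no real RPCs and no piggybacked events or invalidations.

import time

LEASE_SECONDS = 12   # default master lease length mentioned in the paper
GRACE_SECONDS = 45   # grace period the client allows while a new master is found


class Session:
    """Client-side view of a Chubby-style session lease (simplified)."""
    def __init__(self):
        self.local_lease_expiry = time.monotonic() + LEASE_SECONDS
        self.jeopardy = False

    def on_keepalive_reply(self, lease_seconds):
        # Each KeepAlive reply extends the conservative local lease.
        # (In Chubby the reply also piggybacks events and cache invalidations.)
        self.local_lease_expiry = time.monotonic() + lease_seconds
        self.jeopardy = False

    def check(self):
        now = time.monotonic()
        if now < self.local_lease_expiry:
            return "healthy"
        if now < self.local_lease_expiry + GRACE_SECONDS:
            # Local lease ran out, e.g. during a master failover: the client
            # disables its cache and waits for the new master to answer.
            self.jeopardy = True
            return "jeopardy"
        return "expired"   # grace period exhausted: report the session as lost


if __name__ == "__main__":
    s = Session()
    print(s.check())               # healthy
    s.local_lease_expiry -= 20     # simulate missed KeepAlive replies
    print(s.check())               # jeopardy (still inside the grace period)
    s.local_lease_expiry -= 100
    print(s.check())               # expired
    s.on_keepalive_reply(LEASE_SECONDS)
    print(s.check())               # healthy again after a KeepAlive reply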

  7. Developers are … Human • Despite attempts at education, our developers regularly write loops that retry indefinitely when a file is not present, or poll a file by opening it and closing it repeatedly when one might expect they would open the file just once. • Our developers sometimes do not plan for high availability in the way one would wish. • However, mistakes, misunderstandings, and the differing expectations of our developers lead to effects that are similar to attacks. • Our developers are confused by non-intuitive caching semantics, so we prefer consistent caching. • Developers also fail to appreciate the difference between a service being up and that service being available to their applications. • A lock-based interface is more familiar to our programmers.

  8. Hit or Miss? • Scalable: a single cell handles ~90,000 clients, and proxies and partitioning allow further scaling • Available: 61 outages in total, 52 of them under 30 s (almost no impact on applications); 1 outage was due to overload • Reliable: data loss on 6 occasions, 4 due to software errors (since fixed) and 2 due to operator error

  9. More Numbers • Naming-related activity: 60% of file opens and 46% of stored files • Each cached file is used by ~10 clients on average, out of ~230k cached files • 14% of the cache consists of negative caches for names • KeepAlives account for 93% of traffic • KeepAlives are sent over UDP instead of TCP
