The Chubby Lock Service for Loosely-coupled Distributed Systems
The Chubby Lock Service for Loosely-coupled Distributed Systems Mike Burrows, Google Inc. Presented by Xin (Joyce) Zhan
Outline • Design • System structure • Locks, caching, failovers • Scaling mechanism • Use and observations • As name service • Failover problems
Lock service for distributed systems • Synchronize access to shared resources • Other uses • Primary election, meta-data storage, name service • Reliability, availability
System Structure • Set of replicas • Periodically elected master • Master lease • Paxos protocol • All client requests are directed to master • updates propagated to replicas • Replace failed replicas • master periodically polls DNS
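Because non-master replicas redirect clients to the current master, locating the master can be sketched as asking each replica in turn. This is a minimal illustration in Python; `find_master` and the `ask` callback are assumed names, not Chubby's real client API.

```python
def find_master(replicas, ask):
    """Ask each replica (found via DNS) who currently holds the master lease.

    `ask(replica)` returns the master's address if that replica knows it,
    or None; non-master replicas redirect clients to the master."""
    for replica in replicas:
        master = ask(replica)
        if master is not None:
            return master
    return None  # no master elected yet; the client retries later


# Usage: replicas answer from their last-known master lease.
state = {"r1": None, "r2": "r3", "r3": "r3"}  # r3 holds the master lease
print(find_master(["r1", "r2", "r3"], state.get))
```

In the real system the replica list itself comes from DNS, and the master periodically polls DNS to replace failed replicas.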
Design • Store small files • Event notification mechanism • Consistent caching • Advisory locks (vs. mandatory) • conflict only when others attempt to acquire the same lock • Coarse-grained locks • survive lock server failures
Design - File Interface • Eases distribution • /ls/foo/wombat/pouch • Node meta-data includes Access Control Lists • Handles • analogous to UNIX file descriptors • support use across master changes
Design - Sequencer for locks • Delayed / out-of-order messages • introduce sequence numbers into interactions that use locks • lock holder requests a sequencer and passes it to the file server, which validates it • Alternative • lock-delay
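The sequencer idea above can be sketched as follows: the lock holder carries a token naming the lock, its mode, and a generation number, and the file server rejects requests whose generation is stale. The class and field names here are illustrative assumptions, not the paper's concrete format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequencer:
    lock_name: str
    mode: str        # "exclusive" or "shared"
    generation: int  # bumped each time the lock is re-acquired


class FileServerView:
    """A server's view of the latest lock generation it has observed."""

    def __init__(self):
        self.current = {}  # lock_name -> latest known generation

    def observe(self, seq):
        self.current[seq.lock_name] = seq.generation

    def validate(self, seq):
        # Reject requests from a holder whose lock has since been lost.
        return self.current.get(seq.lock_name) == seq.generation


server = FileServerView()
s1 = Sequencer("/ls/cell/db-master", "exclusive", 7)
server.observe(s1)
assert server.validate(s1)  # current holder's requests are accepted
assert not server.validate(Sequencer("/ls/cell/db-master", "exclusive", 6))
```

This guards against the delayed/out-of-order messages mentioned above: a request stamped with an old generation is detected and refused rather than silently applied.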
Design - Events • Client subscribes when creating handle • Delivered async via up-call from client library • Event types • file contents modified • child node added / removed / modified • Chubby master failed over • handle / lock have become invalid • lock acquired / conflicting lock request (rarely used)
Design - Caching • Clients cache file data and meta-data • Consistent, write-through • Invalidation • master keeps a list of what clients may have cached • master sends invalidations piggybacked on KeepAlive replies • clients flush changed data, acknowledge via KeepAlive • master proceeds with the modification only after invalidations are acknowledged • Clients also cache open handles and locks
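The invalidate-before-write rule above can be shown in a minimal sketch, assuming a simplified in-process model (the `Master`/`Client` classes and method names are illustrative, not Chubby's protocol messages):

```python
class Client:
    def __init__(self):
        self.cache = {}

    def invalidate(self, path):
        # On invalidation the client flushes its copy (the real ack
        # piggybacks on the next KeepAlive).
        self.cache.pop(path, None)


class Master:
    def __init__(self):
        self.data = {}
        self.cachers = {}  # path -> clients that may hold a cached copy

    def read(self, client, path):
        self.cachers.setdefault(path, set()).add(client)
        value = self.data.get(path)
        client.cache[path] = value
        return value

    def write(self, path, value):
        # Invalidate every possible cacher first; only then apply the write.
        for c in self.cachers.pop(path, set()):
            c.invalidate(path)
        self.data[path] = value


m, c = Master(), Client()
m.data["/ls/cell/file"] = "v1"
m.read(c, "/ls/cell/file")
m.write("/ls/cell/file", "v2")
assert "/ls/cell/file" not in c.cache  # stale copy was flushed before commit
```

The design choice is to keep the cache strictly consistent: no client can observe a stale value once the write commits, at the cost of delaying writes behind invalidation round-trips.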
Design - Sessions • Session maintained through KeepAlives • handles, locks, and cached data remain valid • lease • Lease timeout advanced on • creation of a session • a master fail-over • a master response to a KeepAlive RPC
Design - KeepAlive • Master responds close to lease timeout • Client sends another KeepAlive immediately • Client maintains local lease timeout • conservative approximation • When local lease expires • disable cache • session in jeopardy, client waits in grace period • cache enabled on reconnect • Application informed about session changes • Jeopardy/safe/expired event
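The jeopardy/safe/expired distinction above can be sketched as a pure function of the client's conservative local lease timer. The function name is an assumption; the 45-second grace period is the default reported in the Chubby paper.

```python
SAFE, JEOPARDY, EXPIRED = "safe", "jeopardy", "expired"
GRACE_PERIOD = 45.0  # seconds; Chubby's default grace period


def session_state(now, local_lease_expiry):
    """Classify the session from the client's local (conservative) lease."""
    if now < local_lease_expiry:
        return SAFE
    if now < local_lease_expiry + GRACE_PERIOD:
        # Cache disabled; the client waits, hoping a master responds in time.
        return JEOPARDY
    # The application is informed that the session is gone for good.
    return EXPIRED


assert session_state(10.0, 20.0) == SAFE
assert session_state(30.0, 20.0) == JEOPARDY
assert session_state(70.0, 20.0) == EXPIRED
```

The grace period is what lets applications ride out a fast master fail-over without seeing an error: the session only goes from jeopardy to safe (cache re-enabled) or to expired, and the library reports each transition as an event.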
Design - Failovers • In-memory state discarded • sessions, handles, locks, etc. • Lease timer “stops” • Fast master election • clients reconnect before their leases expire • Slow master election • clients flush caches, enter grace period • New master reconstructs a conservative approximation of the previous master's in-memory state
Design - Failovers Steps of a newly-elected master: • Pick a new epoch number • Respond only to master-location requests • Build in-memory state for sessions / locks from the database • Respond to KeepAlives • Emit fail-over events to sessions, flush caches • Wait for acknowledgements / sessions to expire • Allow all operations to proceed • Allow clients to use handles created before the fail-over • Delete ephemeral files w/o open handles after an interval
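The first step's epoch number is what lets the new master distinguish fresh calls from ones addressed to a dead predecessor. A minimal sketch, with assumed names (not Chubby's RPC layer):

```python
class NewMaster:
    def __init__(self, epoch):
        self.epoch = epoch  # strictly greater than any earlier epoch

    def handle(self, request_epoch, op):
        # A call stamped with an old epoch was meant for a previous master;
        # refuse it so the client learns the new epoch and retries.
        if request_epoch != self.epoch:
            return ("STALE_EPOCH", self.epoch)
        return ("OK", op())


m = NewMaster(epoch=8)
assert m.handle(7, lambda: "read") == ("STALE_EPOCH", 8)
assert m.handle(8, lambda: "read") == ("OK", "read")
```

Without the epoch check, a request delayed across the fail-over could be executed twice or against reconstructed state it was never meant for.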
Design - Backup and Mirroring • Master writes snapshots every few hours • GFS server in different building • Collection of files mirrored across cells • /ls/global/master mirrored to /ls/cell/slave • Mostly for configuration files • Chubby’s own ACLs • Files advertising presence / location • pointers to Bigtable cells
Design - Scaling Mechanisms • ~90,000 clients may communicate with one cell • Regulate the number of Chubby cells • clients use a nearby cell • Increase lease times • Client caching • Protocol-conversion servers
Scaling - Proxies • Proxies pass requests from clients to a cell • Reduce KeepAlive and read traffic • not writes, but writes are << 1% of the workload • KeepAlive traffic is by far the most dominant • Overheads: • an additional RPC for writes / first-time reads • increased probability of unavailability
Scaling - Partitioning • Namespace of a cell partitioned between servers • N partitions, each with a master and replicas • Node D/C stored on partition P(D/C) = hash(D) mod N • meta-data for D may be on a different partition • Little cross-partition communication • Reduces read/write traffic, but not necessarily KeepAlive traffic
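The routing rule P(D/C) = hash(D) mod N can be illustrated directly: hashing only the directory part D means all children of one directory land on the same partition. The hash choice (MD5 here) is illustrative; the paper does not specify one.

```python
import hashlib
import posixpath


def partition(path, n):
    """Return the partition serving node D/C, i.e. hash(D) mod N."""
    parent = posixpath.dirname(path)  # D in D/C
    # A stable hash (unlike Python's built-in hash(), which is salted).
    h = int(hashlib.md5(parent.encode()).hexdigest(), 16)
    return h % n


# All children of one directory are served by the same partition:
assert partition("/ls/cell/dir/a", 4) == partition("/ls/cell/dir/b", 4)
```

Note the caveat from the slide: the meta-data for D itself lives under D's *parent*, so it may be served by a different partition than D's children.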
Use and Observations • Many files used for naming • Config, ACL, and meta-data files common • 10 clients use each cached file, on avg. • Few locks held, no shared locks • KeepAlives dominate RPC traffic
Use as Name Service • DNS uses TTL values • entries must be refreshed within that time • huge (and variable) load on DNS server • Chubby’s caching uses invalidations, no polling • client builds up needed entries in cache • name entries further grouped in batches
Failover problems • Master writes sessions to the database when created • overload when many processes start at once • Instead, store a session at its first modification / lock acquisition etc. • Active sessions recorded with some probability on KeepAlive • spreads writes out in time • a young read-only session may be discarded in a fail-over
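The probabilistic recording above can be sketched in a few lines; the function name and the probability value are assumptions for illustration, not figures from the paper.

```python
import random


def maybe_record(session, db, p=0.1, rng=random.random):
    """Record a not-yet-recorded session with probability p on a KeepAlive.

    Spreading recording across many KeepAlives smooths the database write
    load, instead of a burst when many processes start at once."""
    if not session["recorded"] and rng() < p:
        db.append(session["id"])
        session["recorded"] = True


db = []
s = {"id": "s1", "recorded": False}
maybe_record(s, db, rng=lambda: 0.0)   # draw below p: session gets recorded
assert s["recorded"] and db == ["s1"]
maybe_record({"id": "s2", "recorded": False}, db, rng=lambda: 0.99)  # above p
assert db == ["s1"]                    # not recorded this time
```

The cost, as the slide notes, is that a young read-only session may never have been recorded when a fail-over happens, so the new master discards it.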
Failover problems • New design – do not record sessions in database • recreate them like handles after fail-over • new master waits full lease time before operations proceed
Lessons learnt • Developers rarely consider availability • should plan for short Chubby outages • Fine-grained locking not essential • Poor API choices • handles acquiring locks cannot be shared • RPC use affects transport protocols • forced to send KeepAlives by UDP for timeliness