
Distributed Systems CS 15-440

An overview of Google Chubby, a library and infrastructure for synchronization in distributed systems, and of ordered communication. Covers the architecture, interfaces, and caching mechanisms of Chubby.


Presentation Transcript


  1. Distributed Systems CS 15-440: Google Chubby and Message Ordering • Recitation 4, Sep 29, 2011 • Majd F. Sakr, Vinay Kolar, Mohammad Hammoud

  2. Today… • Last recitation session: • Google Protocol Buffers and Publish-Subscribe • Today’s session: • Google Chubby • A Google library and infrastructure for synchronization • Ordered Communication • Ordering events and enforcing ordering while communicating • Announcement: • Project 1 due on Oct 3rd

  3. Overview • Recap • Google Chubby • Ordered Communication

  4. Recap: Google Physical Infrastructure • Google has created a large distributed system from commodity PCs, organized in a hierarchy (Commodity PC → Rack → Cluster → Data Center) • Rack: approx. 40 to 80 PCs, with one Ethernet switch (internal = 100 Mbps, external = 1 Gbps) • Cluster: approx. 30 racks (around 2400 PCs), with 2 high-bandwidth switches (each rack is connected to both switches for redundancy) • Placement and replication are generally done at the cluster level

  5. Recap: Google Data Center Architecture (To avoid clutter, the Ethernet connections are shown only from one of the clusters to the external links.)

  6. Recap: Google System Architecture

  7. Recap: Google Infrastructure

  8. Overview • Recap • Google Chubby • Ordered Communication

  9. Google Chubby • Google Chubby offers coordination and storage services to other services (e.g., to the Google File System) • It provides coarse-grained distributed locks to synchronize distributed activities in a large-scale, asynchronous environment • It can be used to support the election of a primary in a set of replicas • It can be used as a name service within Google • It provides a file system offering reliable storage of small files • Chubby is an all-in-one package consisting of a file system, a locking service, a naming service, and an election facilitator!

  10. Chubby Interface • Chubby provides an abstraction based on a file-system concept: every data object is a file • Files are organized into a hierarchical namespace • Example: /ls/chubby_cell/directory_name/…/file_name, where ls stands for Lock Service and chubby_cell is an identifier naming the instance of Chubby
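To make the naming scheme concrete, here is a small Python sketch that splits a Chubby-style name into the cell identifier and the path inside that cell. The helper `parse_chubby_path` and the cell name `cell-a` are hypothetical illustrations, not part of Chubby's client library.

```python
def parse_chubby_path(path):
    """Split '/ls/<cell>/<rest>' into (cell, '/<rest>'). Hypothetical helper."""
    parts = path.strip("/").split("/")
    # The first component must be 'ls' (lock service) followed by a cell name.
    if len(parts) < 2 or parts[0] != "ls":
        raise ValueError(f"not a Chubby-style name: {path!r}")
    cell = parts[1]
    rest = "/" + "/".join(parts[2:])     # path within the cell ('/' for the cell root)
    return cell, rest


print(parse_chubby_path("/ls/cell-a/dir/file"))
```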

  11. Chubby as a File System and a Locking Service • The interface provides an easy mechanism to store small files • Chubby provides the following interfaces: • General interfaces • File-system interfaces • Locking-service interfaces

  12. Chubby – General Interfaces • Chubby provides interfaces for opening, closing, and deleting a file in its namespace • Open call: opens a file or directory and returns a handle • The client can specify whether the file is opened for reading, writing, or locking • Close call: relinquishes the handle • Delete call: removes the file or directory
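A toy, in-memory model of the three general calls might look like the following. The class `ToyNamespace`, its method names, and the `create` flag are assumptions made for illustration. They do not reproduce Chubby's actual API, and the model has no network, replication, or permission checking.

```python
import itertools


class ToyNamespace:
    """In-memory stand-in for a Chubby cell's namespace (hypothetical API)."""

    MODES = {"read", "write", "lock"}      # what a client may open a file for

    def __init__(self):
        self.nodes = {}                    # path -> file contents (bytes)
        self.handles = {}                  # handle id -> (path, mode)
        self._ids = itertools.count(1)     # source of fresh handle ids

    def open(self, path, mode="read", create=False):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode {mode!r}")
        if path not in self.nodes:
            if not create:
                raise FileNotFoundError(path)
            self.nodes[path] = b""
        handle = next(self._ids)
        self.handles[handle] = (path, mode)
        return handle

    def close(self, handle):
        del self.handles[handle]           # relinquish the handle

    def delete(self, path):
        del self.nodes[path]               # remove the file from the namespace
```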

  13. Chubby – File-System Interfaces • Chubby provides two services: • Whole-file reading and writing operations • Single atomic operations read and write the complete data in the file • Chubby can be used to store small files (but not large files) • Access control • Each file is associated with an Access Control List (ACL) • The ACL can be read and set through the interface
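The sketch below illustrates whole-file, atomic reads and writes guarded by a per-file ACL. `ToyFile`, its methods, the rights names `read` and `write`, and the 256 KiB size cap are all hypothetical. The cap only stands in for the "small files only" rule.

```python
import threading


class ToyFile:
    """Small file with atomic whole-file reads/writes and an ACL (hypothetical sketch)."""

    MAX_BYTES = 256 * 1024                  # assumed cap standing in for "small files only"

    def __init__(self, acl):
        self._lock = threading.Lock()
        self._data = b""
        self._acl = {user: set(rights) for user, rights in acl.items()}

    def _check(self, user, right):
        if right not in self._acl.get(user, set()):
            raise PermissionError(f"{user} lacks {right!r} permission")

    def get_contents(self, user):
        self._check(user, "read")
        with self._lock:                    # the whole file is returned in one step
            return self._data

    def set_contents(self, user, data):
        self._check(user, "write")
        if len(data) > self.MAX_BYTES:
            raise ValueError("file too large for this toy store")
        with self._lock:                    # the whole file is replaced in one step
            self._data = bytes(data)

    def get_acl(self):
        return {user: set(rights) for user, rights in self._acl.items()}

    def set_acl(self, user, rights):
        self._acl[user] = set(rights)
```

There is no partial read or seek, by design. A client always sees either the old contents or the new contents in full.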

  14. Chubby – Locking Service Interfaces • In Chubby, a file can be opened as a lock • The owner of the lock holds the handle to the file • Chubby provides three interfaces: • Acquire: acquires the lock, blocking until it is available • Release: releases the lock • TryAcquire: a non-blocking variant of the Acquire call • Chubby provides advisory locks, not mandatory locks • Advantage: extra flexibility and resilience • Disadvantage: the programmer has to manage conflicts
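The following sketch contrasts Acquire, TryAcquire, and Release. It also shows what "advisory" means: holding the lock does not stop another client from touching the file's data. `AdvisoryLockNode` and its methods are hypothetical names built on Python's `threading.Lock`. They are not Chubby's implementation.

```python
import threading


class AdvisoryLockNode:
    """A file that can also be used as an advisory lock (hypothetical sketch)."""

    def __init__(self):
        self._mutex = threading.Lock()
        self.owner = None                 # client currently holding the lock
        self.data = b""                   # file contents; NOT protected by the lock

    def acquire(self, client):
        self._mutex.acquire()             # blocks until the lock is free
        self.owner = client

    def try_acquire(self, client):
        if not self._mutex.acquire(blocking=False):   # non-blocking attempt
            return False
        self.owner = client
        return True

    def release(self, client):
        if self.owner != client:
            raise RuntimeError(f"{client} does not hold the lock")
        self.owner = None
        self._mutex.release()
```

Because the lock is advisory, correctness depends on every client agreeing to call `acquire`/`try_acquire` before using the data. This is the "programmer has to manage conflicts" trade-off.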

  15. Summary of Chubby Interfaces

  16. Chubby Architecture • A Chubby instance (or Chubby cell) is the first level of the hierarchy inside Chubby, directly under /ls: /ls/chubby_cell/directory_name/…/file_name • A Chubby instance is implemented as a small number of replicated servers (typically 5) with one designated master • Clients access these replicas using the Chubby library • The library uses Protocol Buffers to communicate • Replicas are placed at failure-independent sites • Typically, they are placed within the same cluster but not within the same rack
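Why an odd number such as 5? Per the Chubby paper, the replicas elect the master through a consensus protocol, and a candidate must win votes from a majority of them. The cell therefore keeps working as long as a majority of replicas is up. The helpers below (hypothetical names) do that arithmetic.

```python
def majority(n_replicas):
    """Smallest number of replicas that forms a strict majority."""
    return n_replicas // 2 + 1


def tolerated_failures(n_replicas):
    """Replicas that can fail while a majority can still be formed."""
    return n_replicas - majority(n_replicas)


for n in (3, 4, 5, 6):
    print(n, majority(n), tolerated_failures(n))
```

With 5 replicas a majority is 3, so up to 2 replicas can fail. A sixth replica would raise the majority to 4 without tolerating any extra failure, which is why small odd counts are the usual choice.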

  17. Chubby Namespace Architecture • The hierarchical namespace of directories and files/locks is maintained in a database at each replica • The replicated databases are kept consistent by a consensus protocol that uses operation logs • Logs can be used to reconstruct the state of the system • Problem: logs can become too large over time • Solution: Chubby periodically takes a snapshot of the system state and erases the old logs
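Here is a minimal sketch of the log-plus-snapshot idea. It assumes a simple key-value view of the namespace and a hypothetical `snapshot_every` threshold; real systems snapshot based on size or time. The consensus protocol that agrees on each log entry is omitted, and `SnapshottingLog` is an illustrative name.

```python
class SnapshottingLog:
    """Operation log that is periodically folded into a snapshot (hypothetical sketch)."""

    def __init__(self, snapshot_every=3):
        self.snapshot = {}                 # namespace state as of the last snapshot
        self.log = []                      # operations applied since that snapshot
        self.snapshot_every = snapshot_every

    def state(self):
        # Reconstruct the current state: start from the snapshot, replay the log.
        current = dict(self.snapshot)
        for kind, path, value in self.log:
            if kind == "set":
                current[path] = value
            elif kind == "delete":
                current.pop(path, None)
        return current

    def apply(self, op):
        self.log.append(op)
        if len(self.log) >= self.snapshot_every:
            self.snapshot = self.state()   # capture the state...
            self.log.clear()               # ...and erase the now-redundant log entries
```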

  18. Chubby Session • A Chubby session is the relationship between a client and a Chubby cell • KeepAlive messages maintain the session
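A session can be modelled as a lease that each KeepAlive extends. The sketch below assumes a fixed lease length (12 seconds is an arbitrary example value) and passes time in explicitly so it is easy to test. The `Session` class is hypothetical.

```python
class Session:
    """Client-to-cell session kept alive by KeepAlive messages (hypothetical sketch)."""

    def __init__(self, lease_seconds=12.0, now=0.0):
        self.lease_seconds = lease_seconds        # assumed example lease length
        self.expires_at = now + lease_seconds

    def is_alive(self, now):
        return now < self.expires_at

    def keep_alive(self, now):
        if not self.is_alive(now):
            raise RuntimeError("session expired; the client must start a new one")
        self.expires_at = now + self.lease_seconds  # each KeepAlive extends the lease
```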

  19. Client Caching and Consistency • Clients cache file data, metadata, and open handles • Cache consistency: • Whenever a mutation is about to occur, the operation is blocked until all cached copies are invalidated • Invalidation messages are piggybacked on KeepAlive messages • Disadvantages: • Cached copies are invalidated rather than updated simultaneously • An operation cannot progress until all cached copies are invalidated • Advantages: • Simple and elegant for small files and locks
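The blocking invalidation protocol can be simulated in a few lines. In this sketch (all names hypothetical), the master records which clients cache each file. It parks a write until every such client has dropped its copy during a KeepAlive exchange, and only then applies it. Reads that arrive mid-invalidation are served but not cached.

```python
class Client:
    def __init__(self, name):
        self.name = name
        self.cache = {}                       # path -> cached file contents


class Master:
    """Master that invalidates every cached copy before applying a write (sketch)."""

    def __init__(self):
        self.files = {}                       # path -> current contents
        self.cachers = {}                     # path -> clients holding a cached copy
        self.pending = {}                     # path -> (new value, clients yet to invalidate)

    def read(self, client, path):
        value = self.files.get(path)
        if path not in self.pending:          # no new cached copies while a write waits
            client.cache[path] = value
            self.cachers.setdefault(path, set()).add(client)
        return value

    def write(self, path, value):
        # Block the write until every current cacher has invalidated its copy.
        waiting = self.cachers.pop(path, set())
        if path in self.pending:              # a newer write inherits the older one's waiters
            waiting |= self.pending[path][1]
        self.pending[path] = (value, waiting)
        self._maybe_commit(path)

    def keep_alive(self, client):
        # Invalidations are piggybacked on the KeepAlive exchange with this client.
        for path, (_, waiting) in list(self.pending.items()):
            if client in waiting:
                client.cache.pop(path, None)  # client drops its stale copy
                waiting.discard(client)       # master records the acknowledgement
                self._maybe_commit(path)

    def _maybe_commit(self, path):
        value, waiting = self.pending[path]
        if not waiting:                       # all caches invalidated: apply the mutation
            self.files[path] = value
            del self.pending[path]
```

The write in this model stalls for as long as the slowest caching client takes to send its next KeepAlive. This is the "cannot progress until all cached copies are invalidated" cost listed on the slide.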

  20. Chubby Architecture Diagram

  21. Overview • Recap • Google Chubby • Ordered Communication

  22. Ordered Communication • In several applications, the ordering of events is vital • For example, consider a flight-booking system (Figure: a client sends Reserve and then Cancel to a server over time; the timeline is annotated "Prices 15% Off") • If Cancel overtakes Reserve, the server cancels the reservation before booking it, even though both messages are reliably delivered! • We will study how to ensure ordered delivery of events in group communication

  23. Ordered Multicast – An Example • An example where total ordering is necessary • In an e-commerce application, the bank database has been replicated across many servers • Let us consider a 2-replica scenario with Event 1 = Add $1000 and Event 2 = Add interest of 5%, both replicas starting at Bal=1000 • Replica 1 applies Event 1, then Event 2: Bal=1000 → Bal=2000 → Bal=2100 • Replica 2 applies Event 2, then Event 1: Bal=1000 → Bal=1050 → Bal=2050 • The updates from Event 1 and Event 2 should be performed in the same order on every replicated server; otherwise the data is inconsistent.
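The arithmetic on this slide can be checked directly. This short Python sketch (function names are illustrative) applies the same two updates in both orders, using integer arithmetic to avoid floating-point rounding:

```python
def deposit(balance):
    return balance + 1000                    # Event 1: add $1000


def add_interest(balance):
    return balance * 105 // 100              # Event 2: add 5% interest (integer dollars)


def apply_in_order(balance, events):
    for event in events:
        balance = event(balance)
    return balance


replica_1 = apply_in_order(1000, [deposit, add_interest])   # 1000 -> 2000 -> 2100
replica_2 = apply_in_order(1000, [add_interest, deposit])   # 1000 -> 1050 -> 2050
print(replica_1, replica_2, replica_1 == replica_2)         # 2100 2050 False
```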

  24. Three Types of Ordering • FIFO Order • Causal Order • Total Order

  25. FIFO Ordering (Figure: timeline of processes P1, P2, P3 exchanging messages F1, F2, F3, C1, C2, C3, T1, T2) • FIFO Order • If a correct process multicasts a message m before m’, then no correct process delivers m’ if it has not already delivered m • In the example, F1 and F2 are in FIFO order • Drawback: • FIFO order does not specify any order for messages multicast by different processes • e.g., F1 and F3 can be delivered in any order
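FIFO order is commonly implemented with per-sender sequence numbers and a hold-back queue. The sketch below assumes each sender numbers its multicasts 1, 2, 3, and so on. `FifoReceiver` is a hypothetical name.

```python
class FifoReceiver:
    """Delivers each sender's messages in the order that sender multicast them (sketch)."""

    def __init__(self):
        self.next_seq = {}                    # sender -> next sequence number expected
        self.holdback = {}                    # sender -> {seq: payload} arrived too early
        self.delivered = []

    def receive(self, sender, seq, payload):
        pending = self.holdback.setdefault(sender, {})
        pending[seq] = payload
        expected = self.next_seq.get(sender, 1)
        # Deliver as long as the next expected message from this sender is available.
        while expected in pending:
            self.delivered.append(pending.pop(expected))
            expected += 1
        self.next_seq[sender] = expected
```

Messages from different senders are never held back for each other. For example, F3 from one process can be delivered before F1 from another, which is exactly the drawback noted above.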

  26. Causal Ordering (Figure: the same timeline of P1, P2, P3 with messages F1 to F3, C1 to C3, T1, T2) • Causal Order • If process Pi multicasts a message mi and Pj multicasts mj, and if mi → mj (where → is Lamport’s happened-before relation), then any correct process that delivers mj will deliver mi before mj • Relationship between FIFO and causal order: • Causal order implies FIFO order, but FIFO order does not imply causal order • In the example, C1 and C3 are in causal order • Drawback: • The happened-before relation between mi and mj must be induced before communication

  27. Total Ordering (Figure: the same timeline of P1, P2, P3 with messages F1 to F3, C1 to C3, T1, T2) • Total Order • If process Pi multicasts a message mi and Pj multicasts mj, and if one correct process delivers mi before mj, then every correct process delivers mi before mj • In the example, T1 and T2 are in total order • Drawback: • Total order does not imply FIFO or causal order

  28. Totally Ordered Multicast • Totally ordered multicast is a multicast communication paradigm that ensures that all messages are delivered in the same order at all receivers • Approach: • Process Pi sends a timestamped multicast message msgi to all the receivers in the group • At the sender, the message is buffered in a local queue queuei • Any incoming message at Pj is queued in queuej according to its timestamp, and acknowledged to every other process (Figure: Lamport clock values at Process 1, Process 2, and Process 3 as messages and acknowledgements are exchanged)

  29. Totally Ordered Multicast (cont’d) • A receiver delivers a message to the application if: • the message is at the head of the queue, and • the message has been acknowledged by every other process • Assumptions in totally ordered multicast: • Communication is reliable • Messages transmitted from the same sender are not delivered out of order
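The approach on slides 28 and 29 can be simulated in Python. A few assumptions are not from the slides: ties between equal timestamps are broken by process id, a sender's own multicast counts as its acknowledgement, and the network is a set of reliable FIFO channels whose interleaving is chosen at random. `Process` and `run` are hypothetical names.

```python
import heapq
import random
from collections import deque


class Process:
    """Group member running a Lamport-clock totally ordered multicast (sketch)."""

    def __init__(self, pid, n, net):
        self.pid, self.n, self.net = pid, n, net
        self.clock = 0              # Lamport logical clock
        self.queue = []             # min-heap of (timestamp, sender, payload)
        self.acks = {}              # (timestamp, sender) -> pids known to have acked
        self.delivered = []         # payloads handed to the application, in order

    def _send_to_others(self, msg):
        for dst in range(self.n):
            if dst != self.pid:
                self.net[(self.pid, dst)].append(msg)

    def multicast(self, payload):
        self.clock += 1
        self._enqueue(self.clock, self.pid, payload)   # buffer own copy locally
        self._send_to_others(("MSG", self.clock, self.pid, payload))

    def receive(self, msg):
        kind, ts, sender, body = msg
        self.clock = max(self.clock, ts) + 1           # Lamport receive rule
        if kind == "MSG":
            self._enqueue(ts, sender, body)
            self.clock += 1                            # sending the ack is an event
            self._send_to_others(("ACK", self.clock, self.pid, (ts, sender)))
        else:                                          # body identifies the acked message
            self.acks.setdefault(body, set()).add(sender)
        self._try_deliver()

    def _enqueue(self, ts, sender, payload):
        heapq.heappush(self.queue, (ts, sender, payload))
        # The multicast itself counts as the sender's acknowledgement.
        self.acks.setdefault((ts, sender), set()).add(sender)

    def _try_deliver(self):
        others = set(range(self.n)) - {self.pid}
        while self.queue:
            ts, sender, payload = self.queue[0]
            if not others <= self.acks.get((ts, sender), set()):
                break                                  # head not yet acked by everyone
            heapq.heappop(self.queue)
            self.delivered.append(payload)


def run(n=3, seed=0):
    # One FIFO channel per ordered pair of processes (reliable, no reordering).
    net = {(s, d): deque() for s in range(n) for d in range(n) if s != d}
    procs = [Process(p, n, net) for p in range(n)]
    for p in procs:
        p.multicast(f"update-from-P{p.pid}")
    rng = random.Random(seed)
    while any(net.values()):
        # Interleave channels randomly; each channel still delivers in FIFO order.
        src, dst = rng.choice([k for k, q in net.items() if q])
        procs[dst].receive(net[(src, dst)].popleft())
    return [p.delivered for p in procs]
```

For any seed, every process's `delivered` list comes out identical even though the channels are interleaved differently. That is the total-order guarantee. The FIFO-channel assumption matters: it ensures that a process's earlier, lower-timestamped message reaches each receiver before that process's acknowledgement of a later one.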

  30. Application of Vector Clocks: Causally Ordered Multicast • In causally ordered communication, a message m is delivered to an application only if all messages that causally precede m have already been delivered • Vector clocks allow an implementation of causally ordered multicast • Here, a multicast message is delivered to an application in causal order • Under some criteria, causally ordered multicast is weaker than totally ordered multicast • If two messages are not related to each other, it does not matter in which order they are delivered to the application
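Causality between two timestamped events can be read off their vector timestamps. a → b holds exactly when every entry of VC(a) is less than or equal to the corresponding entry of VC(b) and the two vectors differ. If neither precedes the other, the events are concurrent, and causal order leaves their delivery order free. Here is a small sketch with hypothetical helper names:

```python
def happened_before(a, b):
    """True if the event stamped a happened before the event stamped b (a -> b)."""
    return len(a) == len(b) and all(x <= y for x, y in zip(a, b)) and a != b


def concurrent(a, b):
    """True if the events are distinct and neither happened before the other."""
    return a != b and not happened_before(a, b) and not happened_before(b, a)


print(happened_before((1, 0, 0), (1, 1, 0)))   # True
print(concurrent((1, 0, 0), (0, 1, 0)))        # True
```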

  31. Causally Ordered Multicast – An Example

  32. Causally Ordered Multicast – Approach • Clocks are adjusted only when sending and delivering messages • When Process Pi sends a message m: • VCi[i] = VCi[i] + 1 • ts(m) = VCi • When Pj delivers a message m with timestamp ts(m): • VCj[k] = max(VCj[k], ts(m)[k]) for all k • When Pj receives a message m (with timestamp ts(m)) from Pi, it delivers the message to the application only if: • ts(m)[i] = VCj[i] + 1, i.e., m is the next message that Pj was expecting from Pi • ts(m)[k] <= VCj[k] for all k != i, i.e., Pj has seen all the messages that had been seen by Pi when it sent m • Otherwise, m is held back until both conditions hold
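The delivery rules on this slide translate almost line for line into code. In this sketch (hypothetical class name), messages are `(sender, ts, payload)` tuples, a sender is assumed to deliver its own message immediately, and messages that fail either condition wait in a hold-back queue:

```python
class CausalProcess:
    """Causally ordered multicast with vector clocks, following slide 32 (sketch)."""

    def __init__(self, pid, n):
        self.pid = pid
        self.vc = [0] * n                      # VC_pid: messages delivered per process
        self.holdback = []                     # received but not yet deliverable
        self.delivered = []

    def send(self, payload):
        self.vc[self.pid] += 1                 # VCi[i] = VCi[i] + 1
        self.delivered.append(payload)         # assumed: sender delivers its own message
        return (self.pid, tuple(self.vc), payload)   # ts(m) = VCi

    def _deliverable(self, sender, ts):
        if ts[sender] != self.vc[sender] + 1:  # next message expected from sender?
            return False
        return all(ts[k] <= self.vc[k]         # seen everything the sender had seen?
                   for k in range(len(ts)) if k != sender)

    def receive(self, msg):
        self.holdback.append(msg)
        progress = True
        while progress:                        # one delivery may unblock held-back ones
            progress = False
            for m in list(self.holdback):
                sender, ts, payload = m
                if self._deliverable(sender, ts):
                    self.holdback.remove(m)
                    self.vc = [max(a, b) for a, b in zip(self.vc, ts)]
                    self.delivered.append(payload)
                    progress = True
```

For example, suppose P1 replies to a message from P0. A third process that receives the reply first holds it back until P0's original message arrives, and then delivers both in causal order.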

