Distributed Data Structures: Overview & Implementation Techniques

CS 294-8 Distributed Data Structures http://www.cs.berkeley.edu/~yelick/294

Agenda • Overview • Interface Issues • Implementation Techniques • Fault Tolerance • Performance

Overview • Distributed data structures are an obvious abstraction for distributed systems. Right? • What do you want to hide within one? • Data layout? • When communication is required? • # and location of replicas • Load balancing

Distributed Data Structures • Most of these are containers • Two fundamentally different kinds: • Those with iterators or the ability to look at all container elements • Arrays, meshes, databases*, graphs* and trees* (sometimes) • Those with only single-element ops • Queue, directory (hash table or tree), all *’d items above

DDS in Ninja • Described in Gribble, Brewer, Hellerstein, Culler • A distributed data structure (DDS) is a self-managing layer for persistent data. • High availability, concurrency, consistency, durability, fault tolerance, scalability • A distributed hash table is an example • Uses two-phase commits for consistency • Partitioning for scalability

Scheduling Structures • In serial code, most scheduling is done with a stack (often implicit), a FIFO queue, or a priority queue • Do all of these make sense in a distributed setting? • Are there others?

Distributed Queues • Load balancing (work stealing…) • Push new work onto a stack • Execute locally by popping from the stack • Steal remotely by removing from the bottom of the stack (FIFO)
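
The push/pop/steal discipline above can be sketched as a double-ended queue. This is a minimal single-process illustration, not a real distributed scheduler: the class name `WorkStealingDeque` and the coarse lock are assumptions made for the example.

```python
from collections import deque
import threading


class WorkStealingDeque:
    """One worker's task pool: the owner uses the top, thieves use the bottom."""

    def __init__(self):
        self._tasks = deque()
        # Coarse lock for simplicity (an assumption of this sketch).
        self._lock = threading.Lock()

    def push(self, task):
        """Owner pushes new work onto the top of the stack."""
        with self._lock:
            self._tasks.append(task)

    def pop(self):
        """Owner executes locally by popping the newest task (LIFO)."""
        with self._lock:
            return self._tasks.pop() if self._tasks else None

    def steal(self):
        """A remote worker removes the oldest task from the bottom (FIFO)."""
        with self._lock:
            return self._tasks.popleft() if self._tasks else None


local = WorkStealingDeque()
for task in ["t1", "t2", "t3"]:
    local.push(task)

own = local.pop()       # the owner takes the most recently pushed task
stolen = local.steal()  # a thief takes the oldest task
```

Taking from opposite ends keeps the owner and the thieves from contending for the same task most of the time.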

Interfaces (1) • Blocking atomic interfaces: operations happen between invocation and return • Internally each operation performs locking or other form of synchronization • Non-blocking “atomic” interfaces: operation happens sometime after invocation • Often paired with completion synchronization • Request/response for each operation • Wait for all “my” operations to complete • Wait for all operations in the world to complete
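
The difference between the two interface styles can be sketched with futures standing in for completion handles. `AsyncTable` and its method names are hypothetical. A thread pool stands in for remote nodes, and a single client is assumed, so "my" operations are simply everything in `_pending`.

```python
from concurrent.futures import ThreadPoolExecutor, wait
import threading


class AsyncTable:
    """Hypothetical table offering both blocking and non-blocking puts."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()  # internal synchronization
        self._pool = ThreadPoolExecutor(max_workers=4)
        self._pending = []             # this client's outstanding operations

    def _apply_put(self, key, value):
        with self._lock:
            self._data[key] = value

    def put(self, key, value):
        """Blocking atomic: the update has happened by the time this returns."""
        self._apply_put(key, value)

    def put_async(self, key, value):
        """Non-blocking: the update happens some time after this returns."""
        handle = self._pool.submit(self._apply_put, key, value)
        self._pending.append(handle)
        return handle

    def wait_mine(self):
        """Completion synchronization: wait for all of this client's operations."""
        wait(self._pending)
        self._pending.clear()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def close(self):
        self._pool.shutdown(wait=True)


table = AsyncTable()
table.put("a", 1)
handles = [table.put_async(k, v) for k, v in [("b", 2), ("c", 3)]]
handles[0].result()  # request/response: wait for one specific operation
table.wait_mine()    # wait for everything this client issued
snapshot = {k: table.get(k) for k in "abc"}
table.close()
```

Waiting for all operations in the world would need a global barrier across clients, which this single-client sketch does not attempt.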

Interfaces (2) • Non-atomic interface: use external synchronization • Undefined under certain kinds of (or all) concurrency • May be paired with bracketing synchronization • Acquire-insert-lock, insert, insert, Release-insert-lock • Begin-transaction… • Operations with no semantics (no-ops) • Prefetch, Flush copies, … • Operations that allow for failures • Signal “failed”
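
Bracketing synchronization can be illustrated with a container whose insert does no locking of its own, so callers must hold an external lock around a batch of inserts. `BracketedBag` and `worker` are hypothetical names, and threads in one process stand in for remote clients.

```python
import threading


class BracketedBag:
    """Non-atomic container: insert() is only safe inside the bracket."""

    def __init__(self):
        self.items = []
        self.insert_lock = threading.Lock()  # the "insert-lock" callers must hold

    def insert(self, item):
        # Read-modify-write with no internal locking: concurrent callers
        # could lose updates if they skipped the bracket.
        current = self.items
        self.items = current + [item]


def worker(bag, start):
    with bag.insert_lock:                 # Acquire-insert-lock
        for i in range(start, start + 100):
            bag.insert(i)                 # insert, insert, ...
    # Release-insert-lock happens when the with-block exits.


bag = BracketedBag()
threads = [threading.Thread(target=worker, args=(bag, n * 100)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The synchronization cost is paid once per batch rather than once per insert, which is the usual reason for exposing a non-atomic interface.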

DDS Interfaces • Contrast: • RDBMSs provide ACID semantics on transactions • Distributed file systems: NFS weak, Frangipani and AFS stronger • DDS: • All operations on elements are atomic (indivisible, all or nothing) • This seems to mean that the hash table operations that involve a single element are atomic • One-copy equivalence: replication of elements is invisible • No transaction across elements or operations

Implementation Strategies (1) • Two simple techniques • Partitioning: • Used when the d.s. is large • Used when writes/updates are frequent • Replication: • Used when writes are infrequent and reads are very frequent • Used to tolerate failures • Full static replication is extreme; dynamic partial replication is more common • Many hybrids and variations

Implementation Strategies (2) • Moving data to computation good for: • dynamic load balancing • I.e., idle processors grab work • smaller objects in ops involving > 1 object • Moving computation to data good for: • large data structures • Other?

DDS: Distributed Hash Table • Operations include: • Create, Destroy • Put, Get, and Remove • Built with storage “bricks” • Each manages a single-node, network-visible hash table • Contain a buffer cache, lock manager, network stubs and skeletons • Data is partitioned, and partitions are replicated • Replica groups are used for each partition

DDS: Distributed Hash Table • Operations on elements: • Get – use any replica in the appropriate group • Put or remove – update all replicas in the group using two-phase commit • The DDS library is the commit coordinator • If an individual node crashes during the commit phase, it is removed from the replica group • If the DDS library fails during the commit phase, individual nodes will coordinate: if any have committed, all must
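
The get and put rules above can be sketched with in-memory objects standing in for bricks. This is a simplified illustration under stated assumptions, not the Ninja implementation: `Brick`, `put`, `get`, and the `crash_before_commit` flag are hypothetical; a failed prepare simply aborts the update; and recovery when the coordinator itself fails is not modeled.

```python
import random


class Brick:
    """Hypothetical single-node store holding one replica of a partition."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.staged = {}
        self.alive = True
        self.crash_before_commit = False  # test hook to simulate a crash

    def prepare(self, key, value):
        if not self.alive:
            return False
        self.staged[key] = value
        return True

    def commit(self, key):
        if self.crash_before_commit:
            self.alive = False
        if not self.alive:
            return False
        self.data[key] = self.staged.pop(key)
        return True

    def abort(self, key):
        self.staged.pop(key, None)


def put(group, key, value):
    """Coordinator side of a two-phase commit over one replica group."""
    # Phase 1: every replica must agree to the update.
    if not all(brick.prepare(key, value) for brick in group):
        for brick in group:
            brick.abort(key)
        return False
    # Phase 2: commit; a replica that crashes here is dropped from the group.
    for brick in list(group):
        if not brick.commit(key):
            group.remove(brick)
    return True


def get(group, key):
    """Reads may use any replica in the group."""
    return random.choice(group).data.get(key)


group = [Brick("b0"), Brick("b1"), Brick("b2")]
first_ok = put(group, "k", "v1")
first_read = get(group, "k")

group[1].crash_before_commit = True  # b1 crashes during the commit phase
second_ok = put(group, "k", "v2")
survivors = [brick.name for brick in group]
second_read = get(group, "k")
```

Because every surviving replica holds the same value after a put, a read from any replica returns the same answer, which is what makes the replication invisible to callers.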

DDS: Hash Table • [Figure: the key 110011 is resolved through a binary trie (branches labeled 0 and 1) using the DP map and the RG map]
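
One way to read the figure is as a two-step lookup: the DP map (taken here to be a data-partitioning map) is a trie walked with bits of the key until a leaf names a partition, and the RG map (taken to be a replica-group map) gives the bricks for that partition. The sketch below makes several assumptions: bits are consumed least significant first, and the map contents, partition names, and brick names are invented for the example.

```python
# DP map: nested dicts form a binary trie; a string leaf names a partition.
# Partition names record the key suffix that leads to them (illustrative only).
DP_MAP = {
    "0": "part-0",
    "1": {"0": "part-01", "1": "part-11"},
}

# RG map: partition name -> bricks in its replica group (hypothetical names).
RG_MAP = {
    "part-0": ["brick-a", "brick-b"],
    "part-01": ["brick-c", "brick-d"],
    "part-11": ["brick-e", "brick-f"],
}


def find_partition(key_bits, dp_map=DP_MAP):
    """Walk the trie with the key's bits, least significant bit first."""
    node = dp_map
    for bit in reversed(key_bits):
        node = node[bit]
        if isinstance(node, str):  # reached a leaf
            return node
    raise KeyError(f"key {key_bits} ran out of bits before reaching a leaf")


def replicas_for(key_bits):
    """Resolve a key to the replica group that stores it."""
    return RG_MAP[find_partition(key_bits)]


partition = find_partition("110011")
bricks = replicas_for("110011")
```

With a trie, a single partition can be split by replacing its leaf with a new subtree, without touching the rest of the map.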

Example: Aleph Directory • Maps names to mobile objects • Files, locks (?), processes,… • Interested in performance at scale, not reliability • Two basic protocols: • Home: each object has a fixed “home” PE that keeps track of cache copies • Arrow: based on path-reversal idea

Path Reversal Find

Path Reversal
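
The path-reversal idea behind the arrow protocol can be sketched as follows. Each node keeps one pointer toward the object, and a find request follows the pointers, flipping each one back toward the requester as it passes. The function name, the dictionary representation, and the four-node chain are assumptions for illustration; message passing is simulated by a loop in one process.

```python
def arrow_find(link, requester):
    """Follow arrows from the requester to the current owner, reversing them.

    link[v] is the neighbor of v in the direction of the object, or v itself
    if v owns it. Returns (previous_owner, number_of_request_messages).
    """
    if link[requester] == requester:
        return requester, 0              # the requester already owns the object

    sender, node = requester, link[requester]
    link[requester] = requester          # the requester becomes the new owner
    messages = 1
    while True:
        nxt = link[node]
        link[node] = sender              # reverse the arrow toward the requester
        if nxt == node:                  # this node was the owner
            return node, messages
        sender, node = node, nxt
        messages += 1


# Nodes 0-1-2-3 in a chain, with the object initially at node 3.
link = {0: 1, 1: 2, 2: 3, 3: 3}
first = arrow_find(link, 0)    # node 0 requests the object
after_first = dict(link)
second = arrow_find(link, 2)   # then node 2 requests it
```

After each find, every arrow again leads to the new owner, so later requests need no central lookup.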

Aleph Directory Performance • Aleph is implemented as Java packages on top of RMI (and UDP?) • Run on small systems (up to 16 nodes) • Assumed that “home” centralized solution would be faster at this scale • 2 messages to request; 2 to retrieve • Arrow was actually faster • Log2 p to request; 1 to retrieve • In practice, only 2 to request (counter ex.)

Hybrid Directory Protocol • Essentially the same as the “home” protocol, except • Link waiting processors into a chain (across the processors) • Each keeps the id of the processor ahead of it in the chain • Under high contention, resource moves down the chain • Performance: • Faster than home and arrow on counter benchmark and some others…

How Many Data Structures? • Gribble et al. claim: • “We believe that given a small set of DDS types (such as a hash table, a tree, and an administrative log), authors will be able to build a large class of interesting and sophisticated servers.” • Do you believe this? • What does it imply about tools vs. libraries?

Administrivia • Gautam Kar and Joe L. Hellerstein speaking Thursday • Papers online • Contact me about meeting with them • Final projects: • Send mail to schedule meeting with me • Next week: • Tuesday: guest lecture by Aaron Brown on benchmarks; related to Kar and Hellerstein work. • Still to come: Gray, Lamport, and Liskov
