
D3S: Debugging Deployed Distributed Systems


Presentation Transcript


  1. D3S: Debugging Deployed Distributed Systems Xuezheng Liu et al., Microsoft Research, NSDI 2008 Presenter: Shuo Tang, CS525@UIUC

  2. Debugging distributed systems is difficult • Bugs are difficult to reproduce • Many machines executing concurrently • Machines/network may fail • Consistent snapshots are not easy to get • Current approaches • Multi-threaded debugging • Model-checking • Runtime-checking

  3. State of the Art • Example: a distributed reader-writer lock • Log-based debugging • Step 1: add logs, e.g. void ClientNode::OnLockAcquired(…) { … print_log(m_NodeID, lock, mode); } • Step 2: collect the logs • Step 3: write checking scripts
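
To make Step 3 concrete, here is a minimal C++ sketch of the kind of checking script a developer would have to write by hand over the collected logs. The LogEntry layout is an assumption, not from the paper, and the sketch presumes the per-node logs have already been merged into one consistent global order, which is exactly the part the next slide points out is hard to get.

    #include <map>
    #include <set>
    #include <string>
    #include <vector>

    // Hypothetical entry produced by the print_log(...) call above; the real
    // log format is not given in the slides.
    struct LogEntry {
        int node;
        std::string lock;
        char mode;        // 'S' shared, 'E' exclusive
        bool acquired;    // true = OnLockAcquired, false = OnLockReleased
    };

    // Report every lock that is ever held exclusively by one node while any
    // other node holds it at the same time. Assumes `log` is globally ordered.
    std::set<std::string> FindConflicts(const std::vector<LogEntry>& log) {
        std::map<std::string, std::map<int, char>> holders;   // lock -> {node -> mode}
        std::set<std::string> conflicts;
        for (const auto& e : log) {
            auto& h = holders[e.lock];
            if (!e.acquired) { h.erase(e.node); continue; }
            h[e.node] = e.mode;
            bool exclusive = false;
            for (const auto& [node, mode] : h) { (void)node; exclusive |= (mode == 'E'); }
            if (exclusive && h.size() > 1) conflicts.insert(e.lock);
        }
        return conflicts;
    }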

  4. Problems • Too much manual effort • Difficult to anticipate what to log • Too much? • Too little? • Checking a large system is challenging • A central checker cannot keep up • Snapshots must be consistent

  5. D3S Contribution • A simple language for writing distributed predicates • Programmers can change what is being checked on-the-fly • Failure tolerant consistent snapshot for predicate checking • Evaluation with five real-world applications

  6. D3S Workflow [Figure: running processes expose their states; checkers evaluate the predicate "no conflicting locks" over the exposed states and report a violation when a conflict appears]

  7. Glance at D3S Predicate
     V0: exposer { ( client: ClientID, lock: LockID, mode: LockMode ) }
     V1: V0 { ( conflict: LockID ) } as final
     after (ClientNode::OnLockAcquired) addtuple ($0->m_NodeID, $1, $2)
     after (ClientNode::OnLockReleased) deltuple ($0->m_NodeID, $1, $2)

     class MyChecker : vertex<V1> {
         virtual void Execute( const V0::Snapshot & snapshot ) {
             …. // Invariant logic, written in sequential style
         }
         static int64 Mapping( const V0::tuple & t ); // guidance for partitioning
     };
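
The invariant logic is elided ("….") in the slide. Purely as an illustration of what a sequential, centralized check over one snapshot could look like, here is a self-contained C++ sketch with stand-in types; it is not the paper's code, and the stand-in types merely mirror the tuple fields declared in V0/V1 above.

    #include <map>
    #include <utility>
    #include <vector>

    // Stand-in types mirroring the fields declared in V0/V1 above; in D3S
    // these come from the predicate script, so this sketch is illustrative only.
    using ClientID = int;
    using LockID   = int;
    enum LockMode { SHARED, EXCLUSIVE };
    struct LockState { ClientID client; LockID lock; LockMode mode; };

    // Conflict rule for a reader-writer lock, checked over one snapshot: a lock
    // is violated if some client holds it EXCLUSIVE while any other client holds
    // it at all. Each reported LockID corresponds to a V1 (conflict: LockID) tuple.
    std::vector<LockID> CheckSnapshot(const std::vector<LockState>& snapshot) {
        std::map<LockID, std::pair<int, int>> counts;  // lock -> (shared, exclusive) holders
        for (const auto& s : snapshot)
            (s.mode == EXCLUSIVE ? counts[s.lock].second : counts[s.lock].first)++;
        std::vector<LockID> conflicts;
        for (const auto& [lock, c] : counts)
            if (c.second > 1 || (c.second == 1 && c.first > 0))
                conflicts.push_back(lock);
        return conflicts;
    }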

  8. D3S Parallel Predicate Checker [Figure: lock clients expose states individually, e.g. (C1, L1, E), (C2, L3, S), (C5, L1, S), …; the exposed states are partitioned by the key LockID and snapshots SN1, SN2, … are reconstructed, so one checker receives (C1, L1, E), (C5, L1, S) for L1 while another receives (C2, L3, S)]
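
A plausible body for the Mapping hook declared in the predicate script, which is what drives this partitioning, might look as follows; the field-access syntax and the body are assumptions, not taken from the paper.

    // Assumed body for the Mapping hook declared above: tuples with the same
    // return value are routed to the same checker instance, so every holder of
    // a given lock lands on the same checker and the check stays local.
    static int64 Mapping(const V0::tuple& t) {
        return static_cast<int64>(t.lock);   // partition by LockID ($1 in the script)
    }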

  9. Summary of Checking Language • Predicate • Any property calculated from a finite number of consecutive state snapshots • Highlights • Sequential programs (with a mapping function for partitioning) • Reuse of application types in the script and C++ code • Binary instrumentation • Support for reducing the overhead (in the paper) • Incremental checking • Sampling in time or over snapshots

  10. Snapshots • Use Lamport clocks • Instrument the network library • 1,000 logical clock ticks per second • Problem: how does the checker know whether it has received all necessary states for a snapshot?
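
The instrumentation itself is not shown in the slides; a generic Lamport-clock update of the kind such instrumentation performs (not D3S's actual code) is sketched below.

    #include <algorithm>
    #include <cstdint>

    // Single-threaded sketch of Lamport clock maintenance done by instrumenting
    // the network library; a real implementation would need synchronization.
    // Per the slide, D3S also ticks the clock at a fixed rate (1,000 logical
    // ticks per second) so checkers can group exposed states into time slots.
    static uint64_t lamport_clock = 0;

    uint64_t OnSend() {                    // called before a message is sent
        return ++lamport_clock;            // the value is piggybacked on the message
    }

    void OnReceive(uint64_t msg_ts) {      // called when a message arrives
        lamport_clock = std::max(lamport_clock, msg_ts) + 1;
    }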

  11. Consistent Snapshot • Membership • What if a process has no state to expose for a long time? • What if a checker fails? [Figure: processes A and B expose timestamped states, e.g. { (A, L0, S) } at ts=2, { (B, L1, E) } at ts=6, { (A, L1, E) } at ts=16; the checker tracks the membership M(t) and buffers states, reusing the last reported state of a process that exposed nothing new (SA(6)=SA(2), SB(10)=SB(6)) before running check(6) and check(10); once B's failure is detected, M(16)={A} and check(16) proceeds without B]
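
The completeness rule this figure illustrates can be sketched as follows (illustrative C++, not D3S's implementation): a snapshot at logical time t becomes checkable once every process in M(t) has reported some state with a timestamp at or beyond t.

    #include <cstdint>
    #include <map>
    #include <set>

    using ProcessID = int;

    // A member that reported nothing new keeps its last exposed state, and
    // members detected as failed are dropped from the membership M(t).
    bool SnapshotComplete(uint64_t t,
                          const std::set<ProcessID>& membership_at_t,
                          const std::map<ProcessID, uint64_t>& latest_report) {
        for (ProcessID p : membership_at_t) {
            auto it = latest_report.find(p);
            if (it == latest_report.end() || it->second < t)
                return false;              // still waiting for p's state at or after t
        }
        return true;                       // safe to run the checker on snapshot(t)
    }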

  12. Experimental Method • Debugging five real systems • Can D3S help developers find bugs? • Are predicates simple to write? • Is the checking overhead acceptable? • Case: a Chord implementation (i3) • Uses predecessor and successor lists to stabilize • Bugs show up as "holes" and overlaps in the ring

  13. Chord Overlay • Consistency vs. availability: you cannot get both • A global measure of both factors • Shows the tradeoff quantitatively for performance tuning • Capable of checking detailed key coverage • Perfect ring: • No overlap, no hole • Aggregated key coverage is 100%
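
One way to compute such an aggregated key-coverage measure from the exposed (predecessor, successor) state is sketched below; this is illustrative C++ over a simplified 64-bit ring, not the predicate used in the paper.

    #include <cstdint>
    #include <vector>

    // Each node exposes the key range (pred, self] it believes it owns; summing
    // the range lengths over the ring gives the aggregated key coverage.
    // A result of 1.0 with no overlapping ranges is a perfect ring; less than
    // 1.0 indicates holes, more than 1.0 indicates overlaps.
    struct OwnedRange { uint64_t pred; uint64_t self; };   // keys in (pred, self]

    double KeyCoverageFraction(const std::vector<OwnedRange>& ranges) {
        const long double kRingSize = 18446744073709551616.0L;   // 2^64
        long double covered = 0;
        for (const auto& r : ranges)
            covered += static_cast<long double>(r.self - r.pred); // unsigned wrap = ring distance
        return static_cast<double>(covered / kRingSize);
    }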

  14. Summary of Results [Table of results for the checked systems, grouped into data center apps and wide area apps; the table contents are not preserved in this transcript]

  15. Overhead (PacificA) • Less than 8%, in most cases less than 4%. • I/O overhead < 0.5% • Overhead is negligible in other checked systems

  16. Related Work • Log analysis • Magpie[OSDI’04], Pip[NSDI’06], X-Trace[NSDI’07] • Predicate checking at replay time • WiDS Checker[NSDI’07], Friday[NSDI’07] • P2-based online monitoring • P2-monitor[EuroSys’06] • Model checking • MaceMC[NSDI’07], CMC[OSDI’04]

  17. Conclusion • Predicate checking is effective for debugging deployed, large-scale distributed systems • D3S enables: • Changing what is monitored on the fly • Checking with multiple parallel checkers • Specifying predicates in a sequential, centralized manner

  18. Thank You • Thanks to the authors for providing some of the slides

  19. PNUTS: Yahoo!'s Hosted Data Serving Platform Brian F. Cooper et al. @ Yahoo! Research Presented by Ying-Yi Liang * Some slides come from the authors' version

  20. What is the Problem • The web era: web applications • Users are picky – low latency, high availability • Enterprises are greedy – high scalability • Things move fast – new ideas expire very soon • Two ways of developing a cool web application • Making your own fire: quick and cool, but tiring and error-prone • Using huge "powerful" building blocks: wonderful, but the market will have shifted away by the time you are done • Neither way scales very well… • Something is missing – an infrastructure specially tailored to web applications!

  21. Web Application Model • Object sharing: blogs, Flickr, Picasa Web Albums, YouTube, … • Social: Facebook, Twitter, … • Listings: Yahoo! Shopping, del.icio.us, news • They require: • High scalability, availability, and fault tolerance • Acceptable latency for geographically distributed requests • A simplified query API • Some consistency (weaker than sequential consistency)

  22. PNUTS – DB in the Cloud
      CREATE TABLE Parts (
        ID VARCHAR,
        StockNumber INT,
        Status VARCHAR
        …
      )
      [Figure: a table of records such as (A, 42342, E), (B, 42521, W), (C, 66354, W), (D, 12352, E), (E, 75656, C), (F, 15677, E), replicated across regions]
      • Parallel database • Geographic replication • Indexes and views • Structured, flexible schema • Hosted, managed infrastructure

  23. Basic Concepts [Figure: a PNUTS table – each row is a record identified by a primary key, each column is a field, and records are grouped into tablets]

  24. A view from 10,000-ft

  25. PNUTS Storage Architecture [Figure: clients issue requests through a REST API to routers; the other components are the tablet controller, the message broker, and the storage units]

  26. Geographic Replication [Figure: the per-region stack of routers, tablet controller, message broker, and storage units is replicated across Region 1, Region 2, and Region 3, with clients entering through the REST API]

  27. In-region Load Balance [Figure: tablets are moved between storage units within a region to balance load]

  28. Data and Query Models • Simplified relational data model: tables of records • Typed columns • Typical data types plus a blob type • Inter-table relationships are not enforced • Operations: selection and projection (no joins, aggregation, …) • Access options: point access, range query, multiget

  29. Record Assignment [Figure: the router holds an interval mapping from tablet boundaries to storage units – MIN–Canteloupe → SU1, Canteloupe–Lime → SU3, Lime–Strawberry → SU2, Strawberry–MAX → SU1 – and directs records (Apple, Avocado, Banana, Blueberry, Canteloupe, Grape, Kiwi, Lemon, Lime, Mango, Orange, Strawberry, Tomato, Watermelon) to Storage units 1–3 accordingly]
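
The router lookup this figure implies can be sketched with an ordered map keyed by each tablet's lower bound; this is illustrative C++, not PNUTS code.

    #include <map>
    #include <string>

    // The interval table maps the lower bound of each tablet's key range to the
    // storage unit that currently holds the tablet, so finding the owner of a
    // key is a single ordered-map lookup. A MIN tablet with an empty-string
    // lower bound must be present so Lookup never steps before the first entry.
    class RouterSketch {
    public:
        void AddTablet(const std::string& low_key, const std::string& storage_unit) {
            tablet_map_[low_key] = storage_unit;
        }
        // Returns the storage unit whose tablet range contains `key`.
        std::string Lookup(const std::string& key) const {
            auto it = tablet_map_.upper_bound(key);   // first tablet starting after key
            --it;                                     // step back to the tablet covering key
            return it->second;
        }
    private:
        std::map<std::string, std::string> tablet_map_;   // tablet lower bound -> storage unit
    };

With the figure's intervals loaded (an empty-string lower bound for MIN mapped to SU1, then Canteloupe → SU3, Lime → SU2, Strawberry → SU1), Lookup("Grape") returns SU3, matching the figure.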

  30. Single Point Update [Figure: numbered message flow (1–8) for writing key k – the write is routed through the routers and message brokers to the storage unit responsible for k, a sequence number for key k is assigned, and SUCCESS is returned to the client]

  31. Range Query [Figure: a range query such as Grapefruit…Pear is split by the router against the interval mapping (MIN–Canteloupe → SU1, Canteloupe–Lime → SU3, Lime–Strawberry → SU2, Strawberry–MAX → SU1) into the sub-ranges Grapefruit…Lime and Lime…Pear, which are served by the corresponding storage units]

  32. Relaxed Consistency • ACID transactions / sequential consistency: too strong • Non-trivial overhead in asynchronous settings • Users can tolerate stale data in many cases • Go hybrid: eventual consistency plus mechanisms for stronger consistency • Use versioning to cope with asynchrony [Figure: a record's timeline – the record is inserted (v.1), updated repeatedly (v.2 … v.8), and eventually deleted, all within Generation 1]

  33. Relaxed Consistency: read_any() [Figure: version timeline v.1 … v.8 of Generation 1 – the call may return any version, stale or current]

  34. Relaxed Consistency: read_latest() [Figure: the call always returns the current version]

  35. Relaxed Consistency: read_critical("v.6") [Figure: the call returns a version at least as new as v.6]

  36. Relaxed Consistency: write() [Figure: the write installs a new current version at the end of the timeline]

  37. Relaxed Consistency: test_and_set_write(v.7) [Figure: the write is applied only if the current version is still v.7; otherwise it fails with ERROR]
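
Putting slides 33–37 together, a hedged sketch of how a client might use this per-record API follows; only the call names come from the slides, while the record layout and signatures are stand-ins, not the real PNUTS client library.

    #include <string>

    struct Record { std::string value; std::string version; };

    // Read-modify-write without transactions: re-read and retry when the
    // version has moved on (the ERROR case on slide 37).
    template <typename PnutsTable>
    void IncrementCounter(PnutsTable& t, const std::string& key) {
        for (;;) {
            Record r = t.read_latest(key);                     // most recent version, via the record's master
            std::string updated = std::to_string(std::stoi(r.value) + 1);
            if (t.test_and_set_write(key, r.version, updated)) // applied only if still at r.version
                return;
        }
        // Other calls named on the slides, by contrast:
        //   t.read_any(key)          -> fastest, may return a stale version
        //   t.read_critical(key, v)  -> some version at least as new as v
        //   t.write(key, value)      -> blind write, ordered at the record's master
    }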

  38. Membership Management • Record timelines should be coherent at every replica • Updates must be applied to the latest version • Use mastership • On a per-record basis • Only one replica holds the mastership at any time • All update requests are sent to the master to get ordered • Routers and YMB (Yahoo! Message Broker) maintain mastership information • A replica receiving frequent write requests acquires the mastership • Leader election service provided by ZooKeeper
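
An illustrative C++ sketch of this per-record mastership rule, including the hand-off to a replica that keeps originating the writes, is below; the types and the hand-off threshold are assumptions, not the paper's values.

    #include <string>

    // All updates are forwarded to the record's master region to be ordered;
    // if some other region keeps originating the writes, mastership migrates
    // to it. The threshold of 3 consecutive remote writes is an assumption.
    struct RecordMeta {
        std::string master_region;
        std::string last_writer_region;
        int consecutive_remote_writes = 0;
    };

    void OnWriteOriginatedIn(RecordMeta& m, const std::string& origin_region) {
        if (origin_region == m.master_region) {
            m.consecutive_remote_writes = 0;          // already ordered at the master
            return;
        }
        m.consecutive_remote_writes =
            (origin_region == m.last_writer_region) ? m.consecutive_remote_writes + 1 : 1;
        m.last_writer_region = origin_region;
        if (m.consecutive_remote_writes >= 3)          // assumed threshold
            m.master_region = origin_region;           // migrate mastership closer to the writers
    }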

  39. ZooKeeper • A distributed system is like a zoo: someone needs to be in charge of it. • ZooKeeper is a highly available, scalable coordination service • ZooKeeper plays two roles in PNUTS • Coordination service • Publish/subscribe service • Guarantees: • Sequential consistency; single system image • Atomicity (as in ACID); durability; timeliness • A tiny kernel for upper-level building blocks

  40. ZooKeeper: High Availability • High availability via replication • A fault-tolerant persistent store • Providing sequential consistency

  41. ZooKeeper: Services • Publish/subscribe service • Content stored in ZooKeeper is organized as a directory tree of znodes • Publish: write to a specific znode • Subscribe: read a specific znode • Coordination via automatic name resolution • By appending a sequence number to names • CREATE("/…/x-", host, EPHEMERAL | SEQUENCE) • "/…/x-1", "/…/x-2", … • Ephemeral nodes: znodes that live only as long as the creating session

  42. ZooKeeper Example: Lock
      1) id = create("…/locks/x-", SEQUENCE | EPHEMERAL);
      2) children = getChildren("…/locks", false);
      3) if (id is the lowest znode in children) exit();        // lock acquired
      4) test = exists(child immediately preceding id, true);   // sets a watch
      5) if (test == false) goto 2);
      6) wait for the watch notification on "…/locks";
      7) goto 2);

  43. ZooKeeper Is Powerful • Many core services in distributed systems are built on ZooKeeper • Consensus • Distributed locks (exclusive, shared) • Membership • Leader election • Job tracker binding • … • More information at http://hadoop.apache.org/zookeeper/

  44. Experimental Setup • Production PNUTS code • Enhanced with ordered table type • Three PNUTS regions • 2 west coast, 1 east coast • 5 storage units, 2 message brokers, 1 router • West: Dual 2.8 GHz Xeon, 4GB RAM, 6 disk RAID 5 array • East: Quad 2.13 GHz Xeon, 4GB RAM, 1 SATA disk • Workload • 1200-3600 requests/second • 0-50% writes • 80% locality

  45. Scalability

  46. Sensitivity to R/W Ratio

  47. Sensitivity to Request Dist.

  48. Related Work • Google BigTable/GFS • Fault tolerance and consistency via Chubby • Strong consistency – but Chubby is not scalable • Lacks geographic replication support • Targets analytical workloads • Amazon Dynamo • Unstructured data • Peer-to-peer style solution • Eventual consistency • Facebook Cassandra (still kind of a secret) • Structured storage over a peer-to-peer network • Eventual consistency • "Always writable" property – writes succeed even in the face of failures

  49. Discussion • Can all web applications tolerate stale data? • Is doing replication entirely across the WAN a good idea? • Single-level router vs. a B+-tree-style router hierarchy • Tiny service kernel vs. stand-alone services • Is relaxed consistency just right, or too weak? • Is exposing record versions to applications a good idea? • Should security be integrated into PNUTS? • Using the pub/sub service as undo logs
