790 likes | 930 Vues
Distributed Queuing and Beyond. Srikanta Tirthapura Brown University. App 1: Distributed Multiplayer Game. Buy horses. Build castle. Easy Implementation: Centralized. Build castle. Scalability Problems?. Alternate Peer-to-Peer Implementation. Build castle. Multicast moves to all players.
E N D
Distributed Queuing and Beyond Srikanta Tirthapura Brown University
App 1: Distributed Multiplayer Game Buy horses Build castle
Easy Implementation: Centralized Build castle Scalability Problems?
Alternate Peer-to-Peer Implementation Build castle Multicast moves to all players
Concurrent moves ordered consistently B: Attack treasury A: Move jewels A Move jewels B A: Move jewels Attack treasury B: Attack treasury Needs Ordered Multicast!
App 2: Synchronizing Access to Mobile Object Object resides at one node at a time
Concurrent Requests for the object Need the file Me too! Me too! Me too!
Distributed Queuing • Distributed Logical Queue in Network • Processors issue requests to join the queue • Request joins the Queue by locating and informing the current tail about itself
Join Queue = Inform Tail New tail
Minimal Knowledge about Queue • Each element of queue only knows its successor (if there is one) • Sometimes need to know the predecessor • In general, immediate neighbors • Can still do useful things with this much knowledge!
Application to Mobile Objects • Accesses (maybe concurrent) to the object are queued • Object passes from request to successor
Application to Ordered Multicast • Old solutions use counting (token numbers) to order messages • We propose using Queuing, which might be faster (Tirthapura, Herlihy and Wattenhofer, 2001)
Traditional solution to Ordered Multicast: Counting Counter Count = 2 Count = 1 Count = 3 Counting can be done in a distributed manner
Ordering Using Counting Count = 1 Count = 3 Count = 2
Our Solution: Ordered Multicast using Queuing Single Distributed Queue of Messages
Ordering using Queuing Behind none Behind green Behind red
Queuing vs Counting • Both can be done in a distributed fashion • Distributed Queuing might be faster than Distributed Counting • Evidence: Most existing counting techniques can be “short-circuited” to queue faster than they can count - Combining Trees, Combining Funnels
Fault Tolerance • Very important issue • No single solution fits all • Tradeoff: Scalability versus Strength of Fault Tolerance Guarantees • Self-stabilization(Tirthapura and Herlihy, DISC 2001)
This Talk • Examine Arrow Protocol, a popular solution for distributed queuing • Performs well in practice • No previous systematic analysis • We give the first theoretical explanation of the concurrent performance
Roadmap • Distributed Queuing Protocols • Our contribution: Theoretical Analysis of Arrow Protocol • Our contribution: Further Applications of Queuing • Related Work, Conclusions and Future Work
Centralized Queuing tail
Centralized Queuing tail
Problems • Scalability: Central Processor is a bottleneck (“hot-spot”) • No locality: Repeated requests from the same node have to travel all the way to the central node and back
Arrow Protocol • Invented by Kerry Raymond (1989), applied to Distributed Directories by Herlihy and Demmer (1998) • Path Reversal on a Spanning Tree of the Network
New Request tail
Path Reversal tail
Path Reversal tail
Path Reversal tail
Experiments • Herlihy and Warres compared Home-based (centralized) and Arrow Protocols for object management [JavaGrande 99] • Arrow protocol outperformed home-based even for small system sizes • We did experiments on a larger scale (80 processor SP-2 machine, MPI)
Can we explain this? • Is there a fundamental reason why it performs well under concurrency? • We provide one such explanation..
Previous Work: Sequential Case • Demmer and Herlihy (1998) and Peleg and Reshef (1998) analyzed the Sequential Case • Requests spaced very far apart in time, next request issued after previous one queued
Our Analysis: Concurrent Case • How does the protocol perform when many nodes attempt to join the queue at the same time? • We analyze the combined cost of all the requests (Tirthapura, Herlihy and Wattenhofer, ACM PODC 2001)
Model • Graph with Nodes = processors, Edges = Communication Links • Edge weight = Delay of Communication Link • Latency = time till the request is queued (time till it informs predecessor)
One-shot Concurrent Problem • All requests occur at same time • An arbitrary subset of nodes R (of size r) request to join the queue at time zero • No more requests added later
Cost Metric • l(v) is latency of node v’s request • Cost of protocol is sum of latencies
Example 3 concurrent requests
bad good
Queuing Order is Important • Good queuing orders produce short paths- Latency to join the queue is small - Distance to successor is small • r requests, r! possible queuing orders • How good an order does Arrow produce? • Remember that Arrow is distributed and has to order requests on the fly!
Competitive Analysis • Compare an on-line distributed algorithm to an optimal “off-line” algorithm • Off-line algorithm has global information and gets synchronization for free(No on-line algorithm can do better) • Competitive ratio is worst case ratio between cost of on-line and off-line
Our Theorem The competitive ratio of Arrow for one-shot case is upper bounded by (log r) s- r is number of concurrent requests - s is the stretch of the (pre-selected) spanning tree T w.r.t. G Stretch of a spanning tree is the overhead of routing on tree versus routing on graph.
Roadmap of proof Two Parts: • Upper bound on Cost of Arrow: Nearest Neighbor characterization of order of queuing • Lower bound on Cost of Off-line algorithm