Overlay Neighborhoods for Distributed Publish/Subscribe Systems
Reza Sherafat Kazemzadeh
Supervisor: Dr. Hans-Arno Jacobsen
SGS PhD Thesis Defense, University of Toronto, September 5, 2012
Content-Based Pub/Sub
[Figure: publishers (P) and subscribers (S) connected through the pub/sub overlay]
Thesis Contributions: Overlay Neighborhoods
List of publications:
• [ACM Surveys] Dependable publish/subscribe systems (being submitted)
• [Middleware’12] Opportunistic Multi-Path Publication Forwarding in Pub/Sub Overlays
• [ICDCS’12] Publiy+: A Peer-Assisted Pub/Sub Service for Timely Dissemination of Bulk Content
• [SRDS’11] Partition-Tolerant Distributed Publish/Subscribe Systems
• [SRDS’09] Reliable and Highly Available Distributed Publish/Subscribe Service
• [ACM Transactions on Parallel and Distributed Systems] Reliable Message Delivery in Distributed Publish/Subscribe Systems Using Overlay Neighborhoods (being submitted)
• [Middleware Demos/Posters’12] Introducing Publiy (being submitted)
Part I: Dependability in Pub/Sub Systems
Publications:
• [SRDS’11] Partition-Tolerant Distributed Publish/Subscribe Systems
• [SRDS’09] Reliable and Highly Available Distributed Publish/Subscribe Service
• [ACM Transactions on Parallel and Distributed Systems] Reliable Message Delivery in Distributed Publish/Subscribe Systems Using Overlay Neighborhoods (being submitted)
• [ACM Surveys] Dependable publish/subscribe systems (being submitted)
• [Middleware Demos/Posters’12] Introducing Publiy (being submitted)
Challenges of Dependability in Content-Based Pub/Sub Systems
The “end-to-end principle” is not applicable in a pub/sub system:
• Publishers and subscribers (the endpoints) are loosely coupled
• An endpoint cannot distinguish a lost message from a filtered one. This is especially true in content-based systems that support flexible publication filtering
[Figure: publisher P sends publications through the pub/sub middleware to subscriber S; some are delivered (✓) and some are filtered out because they do not match the subscription (✗), so a loss cannot be differentiated from a filtered publication]
Overlay Neighborhoods
Primary network: an initial spanning tree
• Brokers maintain neighborhood knowledge
• This knowledge allows brokers to transform the overlay in a controlled manner
d-neighborhood knowledge (d is a configuration parameter):
• Knowledge of other brokers within distance d
• Knowledge of forwarding paths within the neighborhood
[Figure: nested 1-, 2-, and 3-neighborhoods of a broker]
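The d-neighborhood knowledge described above can be illustrated with a breadth-first search over the primary tree. This is a minimal sketch, not the thesis implementation: the function name `d_neighborhood`, the adjacency-dict representation, and the example tree are all assumptions made for illustration.

```python
from collections import deque

def d_neighborhood(tree, broker, d):
    """Return {neighbor: forwarding path} for every broker within distance d."""
    paths = {broker: [broker]}
    queue = deque([broker])
    while queue:
        current = queue.popleft()
        # A path with k+1 brokers has length k; stop expanding at distance d.
        if len(paths[current]) - 1 == d:
            continue
        for nxt in tree[current]:
            if nxt not in paths:
                paths[nxt] = paths[current] + [nxt]
                queue.append(nxt)
    del paths[broker]  # a broker is not its own neighbor
    return paths

# Hypothetical primary tree: chain A-B-C-D-E with an extra leaf F under C.
tree = {
    "A": ["B"], "B": ["A", "C"], "C": ["B", "D", "F"],
    "D": ["C", "E"], "E": ["D"], "F": ["C"],
}
neighborhood_of_a = d_neighborhood(tree, "A", 3)
```

With d = 3, broker A knows B, C, D, and F together with the tree path to each, while E (distance 4) stays outside its neighborhood.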
Publication Forwarding Algorithm
• Received publications are placed on a FIFO message queue and kept until processing is complete
• All known subscriptions interested in publication p are identified through matching
• The forwarding paths of p within the downstream neighborhood are identified
• p is sent to the closest available brokers towards the matching subscribers
[Figure: p arrives from upstream, is queued, and is forwarded downstream through the d-neighborhood towards subscribers S]
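The forwarding steps above can be sketched as a loop over a FIFO queue. This is an illustrative sketch only: the name `process_queue`, the representation of a subscription as a `(predicate, downstream path)` pair, and the sample data are assumptions, not the system's actual data structures.

```python
from collections import deque

def process_queue(queue, subscriptions, available):
    """Drain the FIFO queue; return (publication, next hops) in arrival order."""
    sent = []
    while queue:
        pub = queue[0]  # stays queued until processing is complete
        hops = []
        for predicate, path in subscriptions:
            if predicate(pub):
                # Closest available broker on the path towards the subscriber.
                hop = next((b for b in path if b in available), None)
                if hop is not None and hop not in hops:
                    hops.append(hop)
        sent.append((pub, hops))
        queue.popleft()
    return sent

# Hypothetical subscriptions with their downstream paths (closest broker first).
subscriptions = [
    (lambda p: p["STOCK"] == "IBM", ["B", "C"]),
    (lambda p: p["CHANGE"] > -8, ["B", "D"]),
    (lambda p: p["STOCK"] == "MSFT", ["E"]),
]
pub = {"STOCK": "IBM", "CHANGE": -2}
all_up = process_queue(deque([pub]), subscriptions, {"B", "C", "D", "E"})
b_down = process_queue(deque([pub]), subscriptions, {"C", "D", "E"})
```

When B is available, both matching subscriptions share the single next hop B. When B is unavailable, the publication goes directly to C and D instead.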
When There Are Failures
• A broker reconnects the overlay by creating new links to the neighbors of the failed brokers
• Publications in the message queue are retransmitted, bypassing the failed neighbors
• Multiple concurrently failed neighbors (up to d-1) are bypassed in the same way
[Figure: publisher P reaches subscribers S over links that bypass the failed brokers]
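The d-1 bound can be made concrete with a small sketch. It assumes a broker knows only the first d brokers on a downstream path (its d-neighborhood) and picks the closest one that has not failed; `bypass_target` and `retransmit` are hypothetical helper names.

```python
def bypass_target(downstream_path, failed, d):
    """Closest non-failed broker within distance d, or None if none is known."""
    for broker in downstream_path[:d]:  # brokers at distance 1..d
        if broker not in failed:
            return broker
    return None

def retransmit(queued_pubs, downstream_path, failed, d):
    """Pair each queued publication with the bypass target, if one exists."""
    target = bypass_target(downstream_path, failed, d)
    if target is None:
        return []  # publications stay queued until a broker recovers
    return [(pub, target) for pub in queued_pubs]

path = ["B", "C", "D", "E"]  # hypothetical path, closest broker first
```

With d = 3, failures of B and C (d-1 brokers) are bypassed by linking to D. If B, C, and D all fail, no broker within the neighborhood is reachable and the publications remain queued.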
Impact of Mass Failures on Throughput
Experiment setup: 500 brokers, with failures injected at random brokers
Measurement interval: 2 minutes (the aggregate publish rate changes depending on the number of failures)
[Figure: expected number of deliveries without failures versus actual deliveries with failures; annotation: low deliveries with d=1]
Part II: Opportunistic Multi-Path Publication Forwarding
Publications:
• [Middleware’12] Opportunistic Multi-Path Publication Forwarding in Pub/Sub Overlays
Problems in Existing Pub/Sub Systems
• Forwarding paths in the overlay are constructed in a “fixed end-to-end” manner (little or no path diversity)
• This results in a high number of “pure forwarding” brokers
• Low yield (the ratio of messages delivered to messages sent is small), and therefore low efficiency
[Figure: path from P to S across brokers A, B, C, D, and E, with brokers marked ✓ or ✗]
Multi-Path Forwarding in a Nutshell
• Actively utilize neighborhoods
[Figure: broker A and its soft links]
Different Forwarding Strategies
• Conventional systems (Strategy 0): 6 messages in total
• Forwarding strategy 1: 5 messages in total
• Forwarding strategy 2: 3 messages in total
[Figure: publication p forwarded among brokers A, B, and C under each strategy]
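The message totals on this slide translate directly into relative savings over the conventional strategy. The short calculation below uses only the three totals from the slide; the helper name `savings` is an illustrative choice.

```python
# Total messages per strategy, as given on the slide.
totals = {"Strategy 0": 6, "Strategy 1": 5, "Strategy 2": 3}

def savings(total, baseline):
    """Fraction of messages saved relative to the baseline strategy."""
    return (baseline - total) / baseline

baseline = totals["Strategy 0"]
for name, total in totals.items():
    print(f"{name}: {total} msgs, {savings(total, baseline):.0%} fewer than Strategy 0")
```

In this example, Strategy 1 sends one message fewer than Strategy 0 (about 17% fewer), and Strategy 2 sends half as many.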
Maximum System Throughput
Experiment setup: 250 brokers, publish rate of 72,000 msgs/min
• S1 outperforms S0 by 60%
• S2 outperforms S0 by 90%
Part III: Bulk Content Dissemination in Pub/Sub Systems
Publications:
• [ICDCS’12] Publiy+: A Peer-Assisted Publish/Subscribe Service for Timely Dissemination of Bulk Content
Application Scenarios Involving Bulk Content Dissemination
• Replication within a CDN
• Social networks
• File synchronization
• P2P file sharing
• Distribution of software updates
Fast replication of content (video clips, pictures):
• Scalability
• Reactive delivery
• Selective delivery
Hybrid Architecture: A Case for a Peer-Assisted Design
Control layer (for metadata):
• P/S broker overlay
• Distributed repository maintaining users’ subscriptions
Data layer (for actual data):
• Subscribers form a peer swarm
• Peers exchange blocks of data
[Figure: subscribers send Subscribe requests to brokers in the control layer and exchange data in the data layer]
Scalability w.r.t. Number of Subscribers
Network setup: 300 and 1000 clients; 1 source publishing 100 MB of content
Conclusion
• We introduced the notion of overlay neighborhoods in distributed pub/sub systems
• Neighborhoods expose brokers’ knowledge of nearby neighbors and of the publication forwarding paths that cross these neighborhoods
• We used neighborhoods in different ways:
  • Passive use of neighborhoods for ensuring reliable and ordered delivery
  • Active use of neighborhoods for multi-path publication forwarding
  • Bulk content dissemination
EXTRAS BONUS SLIDES if needed!
Content-Based Publish/Subscribe
Stock quote dissemination application:
• Trader 1: sub = [STOCK=IBM]
• Trader 2: sub = [CHANGE>-8%]
[Figure: publishers (P) and subscribers (S) attached to brokers in NY, London, and Toronto]
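Content-based matching for the two trader subscriptions above can be sketched as predicate evaluation over a publication's attributes. The tuple encoding `(attribute, operator, value)`, the `matches` function, and the sample quotes are assumptions chosen for illustration.

```python
import operator

# Supported comparison operators (an assumed, minimal set).
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}

def matches(publication, subscription):
    """True if every (attribute, op, value) predicate holds for the publication."""
    for attr, op, value in subscription:
        if attr not in publication or not OPS[op](publication[attr], value):
            return False
    return True

trader1 = [("STOCK", "=", "IBM")]   # sub = [STOCK=IBM]
trader2 = [("CHANGE", ">", -8.0)]   # sub = [CHANGE>-8%]

ibm_quote = {"STOCK": "IBM", "CHANGE": -2.5}
msft_quote = {"STOCK": "MSFT", "CHANGE": -9.0}
```

The IBM quote matches both subscriptions, while the MSFT quote matches neither and would be filtered out for both traders.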
System Architecture
Tree dissemination networks: one path from source to destination
• Pros:
  • Simple, loop-free
  • Preserves publication order (difficult for non-tree content-based P/S)
• Cons:
  • Trees are highly susceptible to failures
Primary tree: the initial spanning tree that is formed as brokers join the system
• Brokers maintain neighborhood knowledge
• This allows brokers to reconfigure the overlay on the fly after failures
∆-neighborhood knowledge: ∆ is a configuration parameter that ensures handling of ∆-1 concurrent failures (worst case)
• Knowledge of other brokers within distance ∆ (join algorithm)
• Knowledge of routing paths within the neighborhood (subscription propagation algorithm)
[Figure: nested 1-, 2-, and 3-neighborhoods of a broker]
Overlay Disconnections
When there are d or more concurrent failures:
• Publication delivery may be interrupted
• No publications are lost
[Figure: a failed chain of d brokers; brokers within each subtree remain connected, but the subtrees are disconnected from each other]
Experimental Evaluation
Studied various aspects of the system’s operation:
• Impact of failures/recoveries on delivery delay
• Impact of failures on other brokers
• Size of d-neighborhoods
• Likelihood of disconnections
• Impact of disconnections on system throughput (discussed next)
Publication Forwarding in Absence of Overlay Fragments
• Forwarding only uses subscriptions accepted by brokers
• Steps in forwarding of publication p:
  • Identify the anchors of accepted subscriptions that match p
  • Determine the active connections towards the matching subscriptions’ anchors
  • Send p on those active connections and wait for confirmations
  • If there are local matching subscribers, deliver to them
  • If no downstream matching subscriber exists, issue a confirmation towards P
  • Once confirmations arrive, discard p and send a confirmation towards P
[Figure: p travels from publisher P across brokers A to E to subscriber S and is delivered to local subscribers; confirmations (conf) flow back towards P]
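The confirmation steps above can be sketched with a small in-memory model. It is a simplification under stated assumptions: synchronous method calls stand in for network messages, `downstream` holds only the active connections towards matching subscribers, and the `Broker` class and its method names are hypothetical.

```python
class Broker:
    def __init__(self, name, has_local_match=False):
        self.name = name
        self.has_local_match = has_local_match
        self.upstream = None
        self.downstream = []  # active connections towards matching anchors
        self.pending = {}     # publication -> brokers still to confirm
        self.delivered = []

    def connect(self, child):
        child.upstream = self
        self.downstream.append(child)

    def receive(self, pub, log):
        if self.has_local_match:
            self.delivered.append(pub)  # deliver to local subscribers
        if not self.downstream:
            self._confirm(pub, log)     # nothing further downstream
            return
        self.pending[pub] = {b.name for b in self.downstream}
        for b in self.downstream:
            b.receive(pub, log)

    def on_conf(self, pub, sender, log):
        self.pending[pub].discard(sender)
        if not self.pending[pub]:
            del self.pending[pub]       # all confirmed: discard p
            self._confirm(pub, log)

    def _confirm(self, pub, log):
        log.append(self.name)           # record who issued a confirmation
        if self.upstream is not None:
            self.upstream.on_conf(pub, self.name, log)

# Hypothetical chain A -> B -> C with a matching subscriber at C.
a, b, c = Broker("A"), Broker("B"), Broker("C", has_local_match=True)
a.connect(b)
b.connect(c)
log = []
a.receive("p", log)
```

In this run the confirmations are issued in the order C, B, A, and each broker drops its copy of p only after every downstream broker has confirmed.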
Publication Forwarding in Presence of Overlay Partitions
• Key forwarding invariant to ensure reliability: no stream of publications is delivered to a subscriber after being forwarded by brokers that have not accepted its subscription
• Case 1: Subscription s has been accepted with no pid. It is safe to bypass intermediate brokers
[Figure: publications p from P bypass failed brokers on the way to S and are delivered to local subscribers; confirmations (conf) flow back]
Publication Forwarding (cont’d)
• Case 2: Subscription s has been accepted with some pid
• Case 2a: The publisher’s local broker has accepted s, and we ensure that all intermediate forwarding brokers have also done so. It is safe to deliver publications from sources beyond the partition
• Depending on when the link highlighted in the figure was established, either recovery or subscription propagation ensures that C accepts s prior to receiving p
[Figure: publications p flow from P to S past the partition; confirmations (conf) flow back]
Publication Forwarding (cont’d)
• Case 2: Subscription s has been accepted with some pid tags
• Case 2b: The publisher’s broker has not accepted s. It is unsafe to deliver publications from this publisher (invariant)
[Figure: p is tagged with a pid on its way from P to S (shown as p*); s was accepted at S with the same pid tag]
Overlay Fragments
• When the primary tree is set up, brokers communicate with their immediate neighbors in the primary tree through FIFO links
• Overlay fragments: broker crashes or link failures create “fragments”, and some neighboring brokers “on the fragment” become unreachable from their neighbors
• Active connections: at each point in time, brokers try to maintain a connection to their closest reachable neighbor in the primary tree
• Only active connections are used by brokers
[Figure: brokers A to F between P and S, with D failed (x); the fragment detector creates an active connection to E and the fragment identifier pid1=<C, {D}>; brokers are labeled as on the fragment or beyond the fragment]
Overlay Fragments: 2 Adjacent Failures
• What if there are more failures, particularly adjacent failures?
• If ∆ is large enough, the same process can be used for larger fragments
[Figure: D and E have failed; active connection to F; fragment identifiers pid1=<C, {D}> and pid2=<C, {D, E}>; brokers are labeled as on the fragment or beyond the fragment]
Overlay Fragments: ∆ Adjacent Failures
• Worst-case scenario: ∆-neighborhood knowledge is not sufficient to reconnect the overlay
• Brokers “on” and “beyond” the fragment are unreachable
[Figure: D, E, and F have failed; no new active connection; fragment identifiers pid1=<C, {D}>, pid2=<C, {D, E}>, and pid3=<C, {D, E, F}>]
Fragments
Brokers are connected to their closest reachable neighbors and are aware of nearby fragment identifiers.
• How does this affect end-to-end connectivity?
For any pair of brokers, a fragment on the primary path between them is:
• An “island” if the end-to-end brokers are reachable through a sequence of active connections
• A “barrier” if the end-to-end brokers are unreachable through any sequence of active connections
[Figure: two source-to-destination paths across brokers A to F, one crossing an island fragment and one crossing a barrier fragment]
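The island/barrier distinction can be sketched for a single primary path. The sketch makes a simplifying assumption that is not stated on the slide: a fragment can be bypassed by an active connection exactly when it contains fewer than ∆ adjacent failed brokers, in line with the ∆-1 bound from the earlier slides. The function name `classify_fragments` and the sample path are hypothetical, and the path endpoints are assumed to be alive.

```python
def classify_fragments(primary_path, failed, delta):
    """Label each maximal run of failed brokers on the path as island or barrier."""
    fragments, run = [], []
    for broker in primary_path:
        if broker in failed:
            run.append(broker)
        elif run:
            fragments.append(run)
            run = []
    if run:
        fragments.append(run)
    # Fewer than delta adjacent failures can be bypassed within the neighborhood.
    return [(frag, "island" if len(frag) < delta else "barrier")
            for frag in fragments]

path = ["A", "B", "C", "D", "E", "F", "G"]  # hypothetical primary path
two_failures = classify_fragments(path, {"D", "E"}, delta=3)
three_failures = classify_fragments(path, {"D", "E", "F"}, delta=3)
```

With ∆ = 3, the fragment {D, E} is an island because C can still connect to F, while the fragment {D, E, F} is a barrier because no broker beyond it lies within C's neighborhood.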
Store-and-Forward
• A copy is first preserved on disk
• Intermediate hops send an ACK to the previous hop after preserving the copy
• ACKed copies can be dismissed from disk
• Upon failures, unacknowledged copies survive and are retransmitted after recovery
• This ensures reliable delivery but may cause delays while the machine is down
[Figure: a message travels hop by hop from source to destination, with an ack returned by each hop]
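The store-and-forward steps above can be sketched as follows. This is an in-memory illustration under stated assumptions: a dict stands in for the disk, a boolean return value stands in for the ACK, and the `Hop` class with its `accept` and `flush` methods is hypothetical.

```python
class Hop:
    def __init__(self, name, next_hop=None):
        self.name = name
        self.next_hop = next_hop
        self.up = True
        self.disk = {}       # msg id -> payload; stands in for persistent storage
        self.delivered = {}  # filled only at the final hop

    def accept(self, msg_id, payload):
        """Preserve a copy, then return True as the ACK to the previous hop."""
        if not self.up:
            return False                     # no ACK: the sender keeps its copy
        if self.next_hop is None:
            self.delivered[msg_id] = payload
            return True
        self.disk[msg_id] = payload          # preserve before forwarding
        self.flush()
        return True

    def flush(self):
        """(Re)transmit every copy the next hop has not yet ACKed."""
        if self.next_hop is None:
            return
        for msg_id in list(self.disk):
            if self.next_hop.accept(msg_id, self.disk[msg_id]):
                del self.disk[msg_id]        # ACKed copies are dismissed

# Hypothetical chain A -> B -> C, with B down when the message is first sent.
c = Hop("C")
b = Hop("B", c)
a = Hop("A", b)
b.up = False
a.accept("m1", "payload")
kept_while_down = dict(a.disk)
b.up = True
a.flush()                                    # retransmission after recovery
```

While B is down, A keeps the unacknowledged copy and nothing reaches C. After B recovers, the retransmission delivers the message and both intermediate copies are dismissed, which shows the delay this approach incurs during downtime.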
Mesh-Based Overlay Networks [Snoeren et al., SOSP 2001]
• Use a mesh network to concurrently forward messages on disjoint paths
• Upon failures, the message is delivered using alternative routes
• Pros: minimal impact on delivery delay
• Cons: imposes additional traffic and the possibility of duplicate delivery
[Figure: a message forwarded from source to destination over disjoint paths]
Replica-Based Approach [Bhola et al., DSN 2002]
• Replicas are grouped into virtual nodes
• Replicas have identical routing information
• We compare against this approach
[Figure: physical machines grouped into virtual nodes]
Problems with a Single Overlay Tree
• A tree provides no routing diversity
• Overloaded root: all traffic goes through a single broker
• Underutilization: not all available capacity is effectively used
• Single-path connectivity is not suitable for diverse forwarding patterns
[Figure: tree with an overloaded root and unutilized bandwidth capacity]
Related Work: Structured Topologies
• A topology is an interconnection between brokers:
  • The topology is relatively stable: long-term connections
  • Most commonly a global or per-publisher spanning tree
• Topology adaptation changes the topology based on:
  • Traffic patterns [1,2]: optimize a cost function
  • Maintaining the acyclic property by adding and removing links
• Advantages:
  • A fixed topology enables high-throughput connections
  • Routes may be improved from a “coarse-grained”, system-wide perspective
• Disadvantages:
  • Routes may never be optimal for individual broker pairs
  • Introduces pure forwarding brokers
  • Diversity of routing is not accounted for
[Figure: Tree A is reconfigured into Tree A’]
[1] Virgillito, A., Beraldi, R., Baldoni, R.: On event routing in content-based publish/subscribe through dynamic networks. In: FTDCS (2003)
[2] Virgillito, A., Beraldi, R., Baldoni, R.: On event routing in content-based publish/subscribe through dynamic networks. In: FTDCS (2003)