Scaling Construction of Low Fan-out Overlays for Topic-based Publish/Subscribe Systems
Chen Chen (1), joint work with Roman Vitenberg (3) and Hans-Arno Jacobsen (1,2)
(1) Department of Electrical and Computer Engineering, University of Toronto
(2) Department of Computer Science, University of Toronto
(3) Department of Informatics, University of Oslo
ICDCS 2011
Example: pub/sub
[Figure: publications <IBM, price = 100> and <Microsoft, price = 50> are delivered to subscribers according to their interests; two subscribers are interested in IBM and one in Microsoft]
Pub/Sub
• A communication paradigm
  • Subscribers express their interests
  • Publishers disseminate messages
• Many applications and industry standards
  • Application integration, financial data dissemination, RSS feed distribution, business process management
  • WS-Notification, WS-Eventing, OMG's Data Distribution Service for Real-Time Systems
• Topic-based pub/sub
  • TIBCO RV
  • Google's GooPS
Two directions for pub/sub
• Design of routing protocols: the design of protocols so that publications and subscriptions are sent most efficiently across the overlay network
  • G. Li et al., ICDCS’08
  • M. Castro et al., JSAC’02
• Construction of overlay: the construction of the overlay topology such that network traffic is minimized
  • Chockler et al., PODC’07
  • Onus et al., INFOCOM’09
Desirable properties for overlays
[Figure: example overlay on five nodes with interests V1 {b,c,d}, V2 {a}, V3 {b,d}, V4 {a,b}, V5 {a,c}]
• Low average node degree
• Low maximum node degree
• Low diameter
• Topic-connectivity
• Efficiency to construct
• Adaptability to churn
• Ease of distributed implementation
Our contributions
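Topic-connected overlay (TCO)
[Figure: an overlay G on nodes V1 {b,c,d}, V2 {a}, V3 {b,d}, V4 {a,b}, V5 {a,c}, shown together with its sub-overlays Ga and Gb]
• An overlay is topic-connected if, for every topic t, the nodes interested in t induce a connected sub-overlay Gt
• Sub-overlay Ga is topic-connected
• Sub-overlay Gb is NOT topic-connected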
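The topic-connectivity test on this slide can be sketched as a breadth-first search restricted to the subscribers of one topic. This is a minimal illustration, not the authors' code: the function names are hypothetical, the interests follow one reading of the five-node figure, and the edge list is invented so that Ga is connected while Gb is not.

```python
from collections import deque

def is_topic_connected(interests, edges, topic):
    """True if the nodes interested in `topic` induce a connected subgraph."""
    members = {v for v, topics in interests.items() if topic in topics}
    if len(members) <= 1:
        return True
    # Adjacency lists restricted to the subscribers of this topic.
    adj = {v: set() for v in members}
    for u, w in edges:
        if u in members and w in members:
            adj[u].add(w)
            adj[w].add(u)
    start = next(iter(members))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for n in adj[v]:
            if n not in seen:
                seen.add(n)
                queue.append(n)
    return seen == members

def is_tco(interests, edges):
    """True if every topic's sub-overlay is connected."""
    topics = set().union(*interests.values())
    return all(is_topic_connected(interests, edges, t) for t in topics)

# Interests as read from the slide's figure (an interpretation of its labels).
interests = {
    "v1": {"b", "c", "d"}, "v2": {"a"}, "v3": {"b", "d"},
    "v4": {"a", "b"}, "v5": {"a", "c"},
}
# Hypothetical edges, chosen only to reproduce the Ga / Gb situation.
edges = [("v2", "v5"), ("v4", "v5"), ("v1", "v3")]

connected_a = is_topic_connected(interests, edges, "a")
connected_b = is_topic_connected(interests, edges, "b")
```

With these edges, the subscribers of a (v2, v4, v5) are linked through v5, while v4 has no path to the other subscribers of b inside Gb.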
MinMax-TCO
[Figure: two overlays on the same five nodes V1 {b,c,d}, V2 {a}, V3 {b,d}, V4 {a,b}, V5 {a,c}; in one, V5 has 3 edges; in the other, V1 has 4 edges]
MinMax-TCO problem and GM-M algorithm [Onus, 2009]
• Minimum Maximum Degree Topic-Connected Overlay (MinMax-TCO) problem
  • Given a set of nodes V, a set of topics T, and Interest: V × T → {true, false}, construct a topic-connected overlay G with minimum maximum degree.
• Theorem: MinMax-TCO is NP-complete
• GM-M algorithm (MinMax-ODA)
  • always greedily adds an edge which 1) has the largest edge contribution, and 2) increases the maximum node degree minimally
  • logarithmic approximation ratio
  • time complexity
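The greedy rule on this slide can be sketched as follows. This is a naive illustration under stated assumptions, not the published GM-M implementation: "edge contribution" is taken to be the number of topics whose components the edge would merge, the two criteria are combined by first minimizing the resulting maximum degree and then maximizing contribution, and ties are broken by node order. The names `DisjointSet` and `greedy_minmax_tco` are hypothetical.

```python
from itertools import combinations

class DisjointSet:
    """Union-find over the subscribers of a single topic."""
    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def greedy_minmax_tco(interests):
    nodes = sorted(interests)
    topics = set().union(*interests.values())
    comps = {t: DisjointSet(v for v in nodes if t in interests[v]) for t in topics}
    degree = {v: 0 for v in nodes}
    edges = []

    def contribution(u, w):
        # Number of topics for which u and w are still in different components.
        return sum(1 for t in interests[u] & interests[w]
                   if comps[t].find(u) != comps[t].find(w))

    while True:
        current_max = max(degree.values(), default=0)
        best_key, best_edge = None, None
        for u, w in combinations(nodes, 2):
            c = contribution(u, w)
            if c == 0:
                continue
            # Assumed ordering: smallest resulting max degree, then largest contribution.
            key = (max(current_max, degree[u] + 1, degree[w] + 1), -c)
            if best_key is None or key < best_key:
                best_key, best_edge = key, (u, w)
        if best_edge is None:      # no edge merges anything: overlay is topic-connected
            return edges, degree
        u, w = best_edge
        edges.append(best_edge)
        degree[u] += 1
        degree[w] += 1
        for t in interests[u] & interests[w]:
            comps[t].union(u, w)

# Interests as read from the five-node example figure (an interpretation).
interests = {
    "v1": {"b", "c", "d"}, "v2": {"a"}, "v3": {"b", "d"},
    "v4": {"a", "b"}, "v5": {"a", "c"},
}
edges, degree = greedy_minmax_tco(interests)
```

The sketch rescans all node pairs in every iteration, which is consistent with the point that the number of nodes dominates the running time.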
Why divide-and-conquer
• GM-M's running time is high
  • time complexity
  • 487 minutes for |V| = 1000, |T| = 100, uniform distribution*
• The number of nodes is the dominant factor
• To improve running time, reduce the size of the node set
• Divide-and-conquer based on the node set V
* Under the uniform distribution, every node is equally likely to be interested in any given topic.
Divide-and-conquer (DC)
[Figure: an example overlay on nodes V0 to V14, each labeled with its topic set over {a,b,c,d}]
• Divide the overlay based on V
• Conquer each sub-TCO with GM-M
• Combine via cross-TCO links
Challenges for divide
• Divide the MinMax-TCO problem into several sub-overlay construction problems
• Node clustering
  • Nodes with similar interests are placed together
  • High runtime cost
  • Not trivial to decentralize
  • Outputs with varying sizes
• Random partitioning
  • Each node flips a coin and is assigned to one of the partitions
  • Fast
  • Easy to tune
  • Straightforward to decentralize
  • However, it may lose correlation among nodes due to randomness
  • Maximum node degree is very sensitive to random partitioning
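Random partitioning as described on this slide can be sketched in a few lines. The function name `random_partition`, the `seed` parameter, and the choice of a uniform draw over the p partitions are assumptions made for illustration.

```python
import random

def random_partition(nodes, p, seed=None):
    """Assign each node independently to one of p partitions."""
    rng = random.Random(seed)
    partitions = [[] for _ in range(p)]
    for v in nodes:
        # Assumption: the "coin flip" is a uniform choice among the p partitions.
        partitions[rng.randrange(p)].append(v)
    return partitions

parts = random_partition(range(100), p=4, seed=1)
```

Each node decides on its own partition without looking at any other node, which is why this step is straightforward to decentralize; it also shows why partition contents ignore interest correlation entirely.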
Bad case for random partitioning
[Figure: a hierarchy of interests in which vall subscribes to {t1, ..., t8}, Va1 to {t1, t2, t3, t4}, Va2 to {t5, t6, t7, t8}, Vb1 to Vb4 to the pairs {t1, t2}, {t3, t4}, {t5, t6}, {t7, t8}, and V1 to V8 to one topic each; shown next to an overlay over the same nodes]
• Random partitioning may increase the degrees of individual nodes by a factor of
Poor performance of DC for MinMax-TCO
Pub/sub workloads
• The number of nodes |V|: from 1000 to 8000
• The number of topics |T|: from 100 to 1000
• The subscription size: from 50 to 150 on average
• Topic popularity
  • Uniform: [Chockler, 2007]
  • Zipf: feed popularity distribution in RSS [Liu, 2005]
  • Exponential: stock popularity in NYSE [Tock, 2005]
Learn from workloads
Observations
• Increased maximum node degree occurs when a node subscribes to a large number of topics
• “Pareto 80-20” rule:
  • most nodes subscribe to a relatively small number of topics
  • only a relatively small number of nodes might be interested in a large number of topics
Basic idea
• Special treatment for those nodes interested in many topics
Bulk nodes
• Given (V, T, Int), the bulk node set B is the subset of V containing the nodes v whose subscription size |Tv| is large relative to η, where Tv is the set of topics subscribed to by node v and η is the bulk subscriber threshold
• The lightweight node set is L = V – B
• The bulk subscriber threshold η can be determined based on historical results
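The split into bulk and lightweight nodes can be sketched as below. The exact comparison against η is not recoverable from the slide, so this sketch assumes that η is an absolute topic count and that a node is bulk when |Tv| ≥ η; the function name `split_bulk_lightweight` is hypothetical.

```python
def split_bulk_lightweight(interests, eta):
    """Return (B, L): bulk nodes and lightweight nodes, with L = V - B."""
    # Assumption: a node is "bulk" when it subscribes to at least eta topics.
    bulk = {v for v, topics in interests.items() if len(topics) >= eta}
    lightweight = set(interests) - bulk
    return bulk, lightweight

# Interests as read from the five-node example figure (an interpretation).
interests = {
    "v1": {"b", "c", "d"}, "v2": {"a"}, "v3": {"b", "d"},
    "v4": {"a", "b"}, "v5": {"a", "c"},
}
bulk, lightweight = split_bulk_lightweight(interests, eta=3)
```

With η = 3 in this toy input, only v1 qualifies as a bulk node; lowering η moves more nodes out of the randomly partitioned lightweight set.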
Challenges for combine
• Combine multiple sub-TCOs into one by adding cross-TCO links as bridges
• Not all nodes need to participate
• How to select node subsets for cross-TCO links?
  • small subsets: increased node degrees
  • large subsets: degraded time efficiency
Representative set
• Given a TCO (V, T, Int, E), a representative set (rep set) is a subset of V that covers all of V’s topics λ times.
[Figure: a topic-connected overlay on nodes V1 {b,c,d}, V2 {a}, V3 {b,d}, V4 {a,b}, V5 {a,c}, shown with two rep sets highlighted]
• {v3, v5} is a 1-rep set, which covers all topics {a,b,c,d}
• {v1, v2, v3, v5} is a 2-rep set; every topic in {a,b,c,d} is covered twice
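The two rep sets named on this slide can be checked mechanically. This is a small sketch with a hypothetical helper `is_rep_set`; it assumes that a topic with fewer than λ subscribers only needs to be covered by all of its subscribers, and it uses the interests as read from the figure.

```python
from collections import Counter

def is_rep_set(interests, candidates, lam):
    """True if `candidates` covers every topic at least lam times."""
    subscribers = Counter(t for topics in interests.values() for t in topics)
    cover = Counter(t for v in candidates for t in interests[v])
    # Assumption: a topic with fewer than lam subscribers needs all of them.
    return all(cover[t] >= min(lam, n) for t, n in subscribers.items())

# Interests as read from the five-node example figure (an interpretation).
interests = {
    "v1": {"b", "c", "d"}, "v2": {"a"}, "v3": {"b", "d"},
    "v4": {"a", "b"}, "v5": {"a", "c"},
}
one_rep = is_rep_set(interests, {"v3", "v5"}, lam=1)
two_rep = is_rep_set(interests, {"v1", "v2", "v3", "v5"}, lam=2)
```

Both checks hold for this input, while {v3, v5} alone fails for λ = 2 because each topic is covered only once.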
Representative nodes
• Representative nodes (rep-nodes)
  • Represent the interests of all the nodes
  • Can function as bridges to determine cross-TCO links
• Coverage factor λ: for tuning the size of the rep set
• Observation: for typical pub/sub workloads and sufficiently large partitions, minimal rep sets tend to be several times smaller than the total number of nodes.
• How to find a minimal rep set Rλ for (V, T, Int)?
  • Linearly reducible to the classic set cover problem: NP-complete
  • Greedy algorithm: always add a node with the largest number of topics that are not yet λ-covered
    • achieves a logarithmic approximation ratio
    • can be implemented efficiently
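The greedy rule for building a rep set can be sketched as a set-cover style loop. This is an illustration rather than the authors' implementation: `greedy_rep_set` is a hypothetical name, ties are broken by node name, and a topic with fewer than λ subscribers is assumed to need only as many rep-nodes as it has subscribers.

```python
def greedy_rep_set(interests, lam=1):
    """Greedily pick nodes until every topic is lam-covered."""
    subscribers = {}
    for topics in interests.values():
        for t in topics:
            subscribers[t] = subscribers.get(t, 0) + 1
    # need[t]: how many more rep-nodes topic t still requires (assumed cap: its subscriber count).
    need = {t: min(lam, n) for t, n in subscribers.items()}
    rep = []
    pool = sorted(interests)
    while any(n > 0 for n in need.values()):
        # Pick the node covering the most topics that are not yet lam-covered.
        best = max(pool, key=lambda v: sum(1 for t in interests[v] if need[t] > 0))
        pool.remove(best)
        rep.append(best)
        for t in interests[best]:
            if need[t] > 0:
                need[t] -= 1
    return rep

# Interests as read from the five-node example figure (an interpretation).
interests = {
    "v1": {"b", "c", "d"}, "v2": {"a"}, "v3": {"b", "d"},
    "v4": {"a", "b"}, "v5": {"a", "c"},
}
rep1 = greedy_rep_set(interests, lam=1)
rep2 = greedy_rep_set(interests, lam=2)
```

On this input the greedy choice for λ = 1 is {v1, v2}, a different but equally small 1-rep set than the {v3, v5} shown on the previous slide; for λ = 2 it selects the same four nodes as the slide's 2-rep set.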
Divide-and-Conquer with Bulk and Lightweight Rep-nodes (DCBR-M)
[Figure: an example overlay on nodes V0 to V20, each labeled with its topic set over {a,b,c,d,e,f,g,h}]
Design of DCBR-M algorithm
• Different parameters for tuning the algorithm:
  • The bulk subscriber threshold η (divide, combine): bulk nodes vs. lightweight nodes
  • The coverage factor λ (combine): time efficiency vs. the quality of the TCO
  • The number of lightweight partitions p (divide, conquer)
    • p = |L| (one node per partition): combine only
    • p = 1 (all nodes in one partition): conquer only
• How to decentralize DCBR-M
  • Nodes autonomously organize themselves into random partitions
  • Different partitions construct inner edges in parallel
  • Different partitions compute rep sets in parallel
  • Bulk nodes and rep-nodes communicate and compute outer edges
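The way the three parameters η, λ and p drive the divide, conquer and combine phases can be sketched end to end. This is a simplified, centralized illustration under several assumptions, not the DCBR-M algorithm as published: bulk nodes are those with at least η topics, partitioning is a uniform random draw, the same naive greedy edge rule is used for inner and outer edges, and all function names are hypothetical.

```python
import random
from itertools import combinations

def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def greedy_connect(interests, candidates, parent, degree, edges):
    """Add edges among `candidates` until no edge merges any topic component."""
    candidates = sorted(candidates)
    while True:
        current_max = max(degree.values(), default=0)
        best_key, best_edge = None, None
        for u, w in combinations(candidates, 2):
            gain = sum(1 for t in interests[u] & interests[w]
                       if find(parent[t], u) != find(parent[t], w))
            if gain == 0:
                continue
            key = (max(current_max, degree[u] + 1, degree[w] + 1), -gain)
            if best_key is None or key < best_key:
                best_key, best_edge = key, (u, w)
        if best_edge is None:
            return
        u, w = best_edge
        edges.append(best_edge)
        degree[u] += 1
        degree[w] += 1
        for t in interests[u] & interests[w]:
            parent[t][find(parent[t], u)] = find(parent[t], w)

def rep_set(interests, part, lam):
    """Greedy lam-rep set of one partition."""
    count = {}
    for v in part:
        for t in interests[v]:
            count[t] = count.get(t, 0) + 1
    need = {t: min(lam, n) for t, n in count.items()}
    rep, pool = set(), sorted(part)
    while any(need.values()):
        best = max(pool, key=lambda v: sum(need[t] > 0 for t in interests[v]))
        pool.remove(best)
        rep.add(best)
        for t in interests[best]:
            need[t] = max(0, need[t] - 1)
    return rep

def dcbr_m_sketch(interests, eta, lam, p, seed=0):
    rng = random.Random(seed)
    # Divide: bulk nodes aside (assumed rule |Tv| >= eta), lightweight nodes into p partitions.
    bulk = {v for v in interests if len(interests[v]) >= eta}
    parts = [[] for _ in range(p)]
    for v in sorted(set(interests) - bulk):
        parts[rng.randrange(p)].append(v)
    topics = set().union(*interests.values())
    parent = {t: {v: v for v in interests if t in interests[v]} for t in topics}
    degree = {v: 0 for v in interests}
    edges = []
    bridges = set(bulk)
    for part in parts:
        greedy_connect(interests, part, parent, degree, edges)  # conquer: inner edges
        bridges |= rep_set(interests, part, lam)
    greedy_connect(interests, bridges, parent, degree, edges)   # combine: outer edges
    return edges, degree

# Interests as read from the five-node example figure (an interpretation).
interests = {
    "v1": {"b", "c", "d"}, "v2": {"a"}, "v3": {"b", "d"},
    "v4": {"a", "b"}, "v5": {"a", "c"},
}
edges, degree = dcbr_m_sketch(interests, eta=3, lam=1, p=2, seed=0)
```

In this sketch the per-partition loop is sequential, but its iterations share no candidate edges, which mirrors the slide's point that partitions can build inner edges and rep sets in parallel. Only the final combine call needs the bulk nodes and rep-nodes together.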
Theoretical analysis of DCBR-M
• DCBR-M generates a TCO whose maximum node degree is asymptotically the same as that of the TCO output by GM-M, under a realistic assumption for typical pub/sub workloads.
• The running time of DCBR-M is
  • Considerable speedup when |B| and |R| are small
Evaluation for DCBR-M (1)
Evaluation for DCBR-M (2)
Evaluation for DCBR-M (3)
Conclusion
Backup
Related work
• Construction of the overlay
  • MinAvg-TCO, Chockler et al., PODC’2007
  • MinMax-TCO, Onus et al., INFOCOM’2009
  • Low-TCO, Onus et al., ICDCS’2010
  • DC for MinAvg-TCO, Chen et al., ICDCS’2010
• Design of routing protocols
  • G. Li et al., ICDCS’2008
  • M. Castro et al., JSAC’2002
Minimal Number of Links
• A typical pub/sub system combines a number of protocols, many of which maintain per-link state
• A node must constantly monitor the availability of each of its neighbors (heartbeats and keep-alive state)
• If the links are maintained using TCP, there is the cost of connection state for each link
• The more links there are, the fewer topics can be routed over each individual link, thereby diminishing cross-topic aggregation benefits
• If a sequential-diff-based compression scheme is used, there is an extra cost associated with a history table
DCBR-M vs DC
• MinMax-TCO vs MinAvg-TCO: fundamentally different problems
  • Average node degree is a “global” property; maximum node degree possesses both “global” and “local” properties.
• DC for MinAvg-TCO does not directly apply to MinMax-TCO.
  • MinMax-TCO is more sensitive to divide, conquer and combine.
  • Different algorithm design, theoretical analysis, and experiments.