Corona: A High Performance Publish-Subscribe System for the World Wide Web

CorONA: Cornell Online News Aggregator • Authors: V. Ramasubramanian, R. Peterson and E.G. Sirer (Cornell University) • Presenter: Sara Salahi (Northwestern University)

Motivation • Abundance of frequently changing information on the Web: • Weblogs, wikis, news sites etc. • Increased need to notify users of updates • Ideally want: • Fast update detection • Optimal bandwidth utilization • Existing protocols do not provide users with automatic notification of updates

Background • Publish-Subscribe Systems • Publishers, subscribers and infrastructure • Topic based vs. Content based • Fundamental drawbacks of preceding systems: • Require substantial changes in the way publishers serve content • Expect subscribers to learn sophisticated query languages • Incompatible with the current Web architecture

Background • Micronews Systems • Micronews feeds: short descriptions of frequently updated information in XML-based formats (e.g. RSS) • Feed readers, tag clouds (pub-sub model) • Commercial services disseminate micronews updates to users • Main disadvantages: • Fragile centralized servers • Relentless polling to detect updates • Corona Improvements: • Shares updates between peers • Cooperative polling reduces update latencies

Background • Overlay Networks • Large number of structured overlays that organize networks • Rings, hyperdimensional cubes, butterfly structures, de Bruijn graphs, skip-lists etc. • Corona is easily layered on structured overlays with uniform node degree (includes all of the above listed overlays)

Corona: The Big Picture • Topic based pub-sub system which interoperates with current Web architecture (URLs = “channels”) • Cooperative polling of channels by geographically distributed nodes • “…n nodes polling with same polling interval and randomly distributed polling times can detect updates n times faster if they share updates with each other.” • Optimization problem • Tradeoff between update performance and network load
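
The quoted speed-up can be checked with a small calculation. The sketch below is illustrative only: the function name `mean_detection_delay` is hypothetical, and it assumes evenly staggered poll offsets as an idealization of the randomly distributed polling times in the quote. The 30-minute interval is borrowed from the evaluation setup.

```python
def mean_detection_delay(phases, interval):
    """Average wait from an update (uniform in time) to the next poll.

    phases: poll offsets within one polling interval, one per node.
    interval: polling interval shared by all nodes.
    """
    points = sorted(p % interval for p in phases)
    # Gaps between consecutive polls, wrapping around the interval.
    gaps = [b - a for a, b in zip(points, points[1:])]
    gaps.append(interval - points[-1] + points[0])
    # An update landing in a gap of length g waits g/2 on average,
    # and lands in that gap with probability g/interval.
    return sum(g * g for g in gaps) / (2 * interval)


interval = 30.0  # minutes (assumed, taken from the evaluation slide)
single = mean_detection_delay([0.0], interval)
shared = mean_detection_delay([i * interval / 10 for i in range(10)], interval)
print(single, shared)
```

With one poller the average delay is half the interval; with ten evenly staggered pollers that share updates, it is one tenth of that.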

Analytical Modeling • Pastry: underlying substrate, organizes network into a ring • Routing table, DAG rooted at each node • Node can reach another node in log_b N hops (b: fanout, N: number of nodes) • Corona assigns nodes in well-defined wedges • Optimal wedge size determined by analysis of the global performance-overhead tradeoff

Analytical Modeling • Channel with polling level L • Polled by nodes with at least L matching prefix digits in their identifiers (polling level 0: all nodes in system poll for the channel) • Polling level quantifies performance-overhead tradeoff • Channel with polling level L has: • N/b^L nodes polling it (τ: polling interval) • Cooperatively detects updates in (τ/2)(b^L/N) time on average • Collective load placed on server of the channel is τ(N/b^L)
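
The polling-level rule and the two estimates above can be sketched as follows. This is a minimal illustration, not Corona's code: the function names are hypothetical, identifiers are treated as digit strings, and the base b = 16 used in the note after the code is an assumption (the slides do not state it).

```python
def shared_prefix_digits(a, b):
    """Number of leading digits two identifier strings have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def polls_channel(node_id, channel_id, level):
    """A node polls a channel when at least `level` prefix digits match."""
    return shared_prefix_digits(node_id, channel_id) >= level


def pollers(n_nodes, base, level):
    """Expected number of nodes in the polling wedge: N / b^L."""
    return n_nodes / base ** level


def detection_time(n_nodes, base, level, interval):
    """Average cooperative detection time: (tau/2) * (b^L / N)."""
    return (interval / 2) * (base ** level / n_nodes)
```

For example, with N = 1024, an assumed b = 16 and τ = 30 minutes, level 1 gives 64 pollers, and level 2 gives 4 pollers with an average detection time of 3.75 minutes.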

Analytical Models • Corona Lite • Minimize average update detection time • Bound load placed on content servers • Overall update performance = average of the update detection time of each channel weighted by # of clients subscribed to the channels • Target network load - the total # of subscriptions in the system • Corona Fast • Achieve target average update detection time • Minimize load placed on content servers • Maintains stable performance through changes in workload • Corona Fair • Minimize average update detection time w.r.t. expected update frequency • Bound load on content servers • Incorporates update rate of channels into tradeoff to achieve a fairer distribution of update performance between channels • Defines a modified update performance metric as the ratio of the update detection time and the polling interval of the channel
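
For Corona Lite, the objective and the load bound can be evaluated for a candidate assignment of polling levels. The sketch below rests on assumptions: the function names are hypothetical, load is counted as polls per polling interval, and the target is read as "no more polls than there are subscriptions".

```python
def weighted_detection_time(levels, subscribers, n_nodes, base, interval):
    """Subscriber-weighted average of per-channel detection times."""
    total = sum(subscribers)
    weighted = sum(
        q * (interval / 2) * (base ** l / n_nodes)
        for l, q in zip(levels, subscribers)
    )
    return weighted / total


def total_pollers(levels, n_nodes, base):
    """Polls issued per polling interval, summed over all channels."""
    return sum(n_nodes / base ** l for l in levels)


def within_lite_budget(levels, subscribers, n_nodes, base):
    """Assumed Corona Lite bound: total polls <= total subscriptions."""
    return total_pollers(levels, n_nodes, base) <= sum(subscribers)


subscribers = [900, 90, 10]  # made-up subscription counts
print(within_lite_budget([0, 1, 2], subscribers, 1024, 16))
print(within_lite_budget([1, 1, 2], subscribers, 1024, 16))
print(weighted_detection_time([1, 1, 2], subscribers, 1024, 16, 30.0))
```

Polling the most popular channel from every node (level 0) exceeds the budget in this made-up workload, while level 1 keeps the total well under it.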

Decentralized Optimization • Honeycomb – determines optimal polling levels • f_i(l) and g_i(l) define performance & cost for channel i as a function of polling level l • NP-hard, so an approximate solution is computed using a Lagrange multiplier • Due to monotonicity, optimal solution L* is bounded by same minima as approximated solutions L_d* and L_u* • Honeycomb aggregates global tradeoff factors • Channels grouped in tradeoff clusters, f_i/g_i • # clusters/polling level is limited by a constant (Tradeoff_Bins) • Cluster aggregation overhead (memory state, network bandwidth) limited by size of routing table
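
To get a feel for the tradeoff being optimized, here is a centralized greedy heuristic. It is explicitly not Honeycomb and does not use Lagrange multipliers or tradeoff clusters: it simply lowers the level of whichever channel buys the most performance per extra poller. The name `greedy_levels`, the forms of f and g, and the workload in the usage note are assumptions for illustration.

```python
def greedy_levels(subscribers, n_nodes, base, max_level, budget):
    """Pick a polling level per channel under a total-poller budget.

    Starts every channel at max_level (smallest wedge) and repeatedly
    lowers the level of the channel with the best gain/cost ratio.
    """
    levels = [max_level] * len(subscribers)

    def f(i, l):  # performance term: subscriber-weighted detection time
        return subscribers[i] * base ** l / n_nodes

    def g(l):  # cost term: number of pollers at level l
        return n_nodes / base ** l

    load = sum(g(l) for l in levels)
    while True:
        best, best_ratio = None, 0.0
        for i, l in enumerate(levels):
            if l == 0:
                continue  # already polled by every node
            extra = g(l - 1) - g(l)
            if load + extra > budget:
                continue  # lowering this level would exceed the budget
            ratio = (f(i, l) - f(i, l - 1)) / extra
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            return levels
        load += g(levels[best] - 1) - g(levels[best])
        levels[best] -= 1
```

With subscriber counts `[900, 90, 10]`, N = 1024, b = 16 and a maximum level of 2, a budget of 1000 pollers yields levels `[1, 1, 1]`, and raising the budget to 1200 lets the most popular channel drop to level 0.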

System Management • Channel has unique identifier and one or more owner nodes managing it • Primary owner is Corona node with numerically closest identifier to channel’s identifier • Additional owners are F closest neighbors • Tolerate failures • Like all P2P systems, problem occurs if more than F adjacent nodes fail at once • Fixed because users can easily re-subscribe • Owners inform subscribers of updates and keep track of channel-specific factors that affect performance tradeoffs
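
Owner selection can be sketched as a ranking of nodes by distance to the channel identifier. The helper names and the tiny 256-value identifier space are hypothetical, and ranking the additional owners by distance to the channel identifier on a circular space is an assumption about what "closest neighbors" means here.

```python
def ring_distance(a, b, id_space):
    """Shortest distance between two identifiers on a circular ID space."""
    d = abs(a - b) % id_space
    return min(d, id_space - d)


def channel_owners(node_ids, channel_id, f, id_space):
    """Return the primary owner followed by up to f additional owners."""
    ranked = sorted(
        node_ids,
        key=lambda n: (ring_distance(n, channel_id, id_space), n),
    )
    return ranked[: f + 1]


owners = channel_owners([10, 60, 120, 200, 250], channel_id=5, f=2, id_space=256)
print(owners)
```

Here node 10 is the primary owner, and node 250 ranks second because the identifier space wraps around.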

System Management • Cooperative Polling • Optimization Phase • Corona nodes apply optimization algorithm on tradeoff data • Maintenance Phase • Changes to polling levels communicated to peer nodes in routing table via maintenance messages • Aggregation Phase • Enables nodes to receive new aggregates of tradeoff factors • Polls for a channel at different nodes are randomly distributed over time

Update Dissemination • Version numbers • Deltas • Studies show that amount of change in content update is typically tiny – 6.8% • Difference engine used to identify new information • When delta is generated by a node, all other nodes in channel’s polling wedge are updated • “Simultaneously” detected deltas • Primary owner makes sure latest delta is used and ignores redundant deltas
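
A rough sketch of delta generation and duplicate suppression follows. It uses Python's standard `difflib` as a stand-in for the difference engine, which is an assumption, and the `PrimaryOwner` class and its `submit` method are hypothetical names, not Corona's interface.

```python
import difflib


def delta(old_lines, new_lines):
    """Lines present in the new content but not in the old content."""
    diff = difflib.ndiff(old_lines, new_lines)
    return [line[2:] for line in diff if line.startswith("+ ")]


class PrimaryOwner:
    """Keeps the latest version of a channel and drops redundant deltas."""

    def __init__(self):
        self.version = 0
        self.deltas = []

    def submit(self, base_version, added_lines):
        """Accept a delta computed against base_version; True if applied."""
        if base_version != self.version or not added_lines:
            return False  # stale or redundant: another node reported it first
        self.version += 1
        self.deltas.append(added_lines)
        return True


old = ["item A", "item B"]
new = ["item C", "item A", "item B"]
owner = PrimaryOwner()
first = owner.submit(0, delta(old, new))   # first detecting node
second = owner.submit(0, delta(old, new))  # "simultaneous" duplicate
print(first, second, owner.version)
```

The second submission is ignored because it was computed against a version the owner has already moved past.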

User-Interface http://www.cs.cornell.edu/people/egs/beehive/corona/

Implementation • Layered on Pastry • Corona handles orphan channels • Tradeoff factors are aggregated into slack cluster prior to optimization • Reliance on IM • Can’t log in from all nodes simultaneously • Prevent malicious nodes from generating spurious updates • Publish digitally signed certificates • Use threshold-cryptography to generate certificate for content

Evaluation • Compare Corona performance against legacy RSS performance • Real-life RSS traces are used • The tradeoff parameters are extrapolated to a larger scale: • 1024 nodes • 100,000 channels • 5,000,000 subscribers • Polling interval – 30 minutes

Evaluation • Figures: Network Load on Content Servers; Number of Pollers per Channel; Average Update Detection Time; Update Detection Time per Channel

Evaluation • Figures: Update Detection Time per Channel (two panels) • Overall summary

Deployment • A set of 60 PlanetLab nodes • Corona-Lite scheme is used • 7500 RSS feeds from www.syndic8.com • 150,000 subscriptions • Polling interval – 30 minutes

Deployment Results • Figures: Average Update Detection Time; Total Polling Load on Servers

Conclusions/Future Work • Corona is a topic based pub-sub system which interoperates with the current Web architecture and network overlays • Fast update detection time achieved by: • Cooperative polling of channels by geographically distributed nodes • Shared updates between peers • Do all updates need to be shared? • Measure average time to deliver updates to subscribers? • Maybe optimize the polling interval depending on the rate of updates in a channel? • Need to run a better simulation with the IM interface to see the true overhead of having multiple nodes logged in at once

Thank you!
