This paper discusses the efficient dissemination of RSS documents through the G-ToPSS architecture, addressing the limitations of traditional polling-based systems. It highlights the increasing use of web publishing tools and the need for a subscription language and matching algorithm suited to filtering RSS. The proposed solution adopts a publish/subscribe interaction model and introduces a graph-based query language for filtering RSS content. Future work involves extending the G-ToPSS prototype to support full RDF language features and optimizing constraint processing during matching.
CMS-ToPSS: Efficient Dissemination of RSS Documents
Milenko Petrovic, Haifeng Liu, Hans-Arno Jacobsen
University of Toronto
VLDB 2005
Information Dissemination
• Easy-to-use web publishing tools (blogs, wikis) are fueling an increase in the number of web publishers
• RSS is frequently used to disseminate updates to interested users
  • e.g., CNN.com, Yahoo! News, Amazon.com, MSN Search (beta)
Problem: polling-based architecture
[Figure: RSS readers, RSS publishers, and an RSS aggregator]
VLDB05
Solution!
[Figure: current RSS dissemination architecture vs. G-ToPSS RSS dissemination architecture]
Interaction Model: Publish/Subscribe
[Figure: publishers send RSS feeds to a broker; subscribers register queries over all RSS feeds; the broker matches the RSS feeds against the queries]
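A minimal sketch of the publish/subscribe interaction model, in Python. Subscriptions are simplified here to plain predicates over an RSS item (a dict), and the `Broker` class with its `subscribe`/`publish` methods is a hypothetical illustration, not part of G-ToPSS. It only shows the shift in responsibility: the broker matches each item once when it arrives and pushes it to interested subscribers, instead of readers repeatedly polling every publisher.

```python
from collections.abc import Callable
from dataclasses import dataclass, field

# Hypothetical simplification: a subscription is a predicate over an RSS item.
# G-ToPSS itself uses graph-pattern queries rather than Python predicates.
Predicate = Callable[[dict], bool]


@dataclass
class Broker:
    """Hypothetical push-based broker for illustration only."""
    subscriptions: dict[str, list[Predicate]] = field(default_factory=dict)
    inboxes: dict[str, list[dict]] = field(default_factory=dict)

    def subscribe(self, subscriber: str, query: Predicate) -> None:
        # Register a query once; the subscriber then waits for deliveries.
        self.subscriptions.setdefault(subscriber, []).append(query)
        self.inboxes.setdefault(subscriber, [])

    def publish(self, item: dict) -> None:
        # Matching happens at the broker when the item arrives, so
        # subscribers never have to poll publishers for changes.
        for subscriber, queries in self.subscriptions.items():
            if any(query(item) for query in queries):
                self.inboxes[subscriber].append(item)


broker = Broker()
broker.subscribe("alice", lambda item: "RSS" in item.get("title", ""))
broker.subscribe("bob", lambda item: item.get("source") == "CNN.com")
broker.publish({"title": "Filtering RSS at scale", "source": "blog"})
broker.publish({"title": "Evening news", "source": "CNN.com"})
# alice receives only the first item, bob only the second.
print(broker.inboxes)
```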
Research challenges
• Need a subscription (query) language suitable for filtering RSS documents
• Need an efficient matching algorithm based on a graph representation
  • Structural matching
  • Constraint matching
• Scalability to a large number of subscriptions and a high publishing rate
Subscription Scalability
Memory Scalability
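Matching Semantics
[Figure: a publication graph and a subscription graph]
Publication:
• PAPER17 -AUTHOR-> “Arno Jacobsen”
• PAPER17 -CONFERENCE-> SIGMOD
• PAPER17 -YEAR-> “2001”
• PAPER17 -LOCATION-> “California”
Subscription:
• ?y -AUTHOR-> “Arno Jacobsen”
• ?y -CONFERENCE-> SIGMOD
• ?y -YEAR-> ?z
• Constraints: ?y <= Publication, ?z > 2000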
Data Model (RSS Documents)
• Publications are represented as directed graphs with node and edge labels
• Node labels are typed:
  • literal value
  • class
• Edge labels are typed:
  • class
• Classes can be related through an ontology with multiple inheritance
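One way to picture this data model is the minimal Python sketch below, written under stated assumptions: a publication is a set of directed, labeled edges; each node is either a literal value (`LiteralValue`) or a class instance (`Resource`); and the ontology maps each class to its direct superclasses, so a class may have several parents. All Python names and the class names `ConferencePaper`, `PeerReviewed`, and `Conference` are hypothetical, and edge labels are kept as plain strings rather than typed classes. The sample data is the publication from the matching-semantics example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LiteralValue:
    """Node labeled with a literal, e.g. "2001"."""
    value: str


@dataclass(frozen=True)
class Resource:
    """Node that is an instance of an ontology class (hypothetical fields)."""
    name: str
    cls: str


# Hypothetical ontology with multiple inheritance: class -> direct superclasses.
ONTOLOGY: dict[str, list[str]] = {
    "Publication": [],
    "PeerReviewed": [],
    "ConferencePaper": ["Publication", "PeerReviewed"],
    "Conference": [],
}


def is_subclass(cls: str, ancestor: str, ontology: dict[str, list[str]]) -> bool:
    """True if cls equals ancestor or reaches it by following superclass links."""
    stack, seen = [cls], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(ontology.get(current, []))
    return False


# The publication graph as a set of (source node, edge label, target node) edges.
# Assumption: PAPER17 is typed as ConferencePaper and SIGMOD as Conference.
paper = Resource("PAPER17", "ConferencePaper")
publication = {
    (paper, "AUTHOR", LiteralValue("Arno Jacobsen")),
    (paper, "CONFERENCE", Resource("SIGMOD", "Conference")),
    (paper, "YEAR", LiteralValue("2001")),
    (paper, "LOCATION", LiteralValue("California")),
}

print(is_subclass(paper.cls, "Publication", ONTOLOGY))  # True
```

Under these assumptions, a class constraint such as `?y <= Publication` could be evaluated with `is_subclass`, and multiple inheritance simply means the search follows every parent of a class.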
Query Language (GQL)
• Queries are represented as directed graph patterns with node and edge labels
• Node labels are variables
• Variables can be constrained by:
  • classes
  • class instances and literal values
• Edge labels are class instances
• Mapping (matching) semantics:
  • A pattern graph maps to a data graph if the topology (structure) of the two graphs matches and all variable constraints are satisfied
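The mapping semantics can be illustrated with a small backtracking matcher in Python. This is a generic sketch under stated assumptions, not the G-ToPSS matching algorithm: graphs are sets of `(source, edge_label, target)` string triples, pattern terms starting with `?` are variables, each variable may carry a predicate constraint, and `match`, `CLASS_OF`, and `PUBLICATION_CLASSES` are hypothetical names. The example encodes the publication and subscription from the matching-semantics slide and assumes PAPER17 belongs to a subclass of Publication.

```python
from collections.abc import Callable
from typing import Optional

Triple = tuple[str, str, str]
Constraint = Callable[[str], bool]


def is_var(term: str) -> bool:
    """Pattern terms starting with '?' are variables; others are constants."""
    return term.startswith("?")


def match(pattern: list[Triple], data: set[Triple],
          constraints: dict[str, Constraint],
          binding: Optional[dict[str, str]] = None) -> Optional[dict[str, str]]:
    """Return a binding that maps every pattern edge onto a data edge with the
    same label while satisfying all variable constraints, or None."""
    binding = binding or {}
    if not pattern:
        return binding
    (src, label, dst), rest = pattern[0], pattern[1:]
    for d_src, d_label, d_dst in data:
        if d_label != label:
            continue  # edge labels must be identical
        trial = dict(binding)
        ok = True
        for term, node in ((src, d_src), (dst, d_dst)):
            if is_var(term):
                if trial.setdefault(term, node) != node:
                    ok = False  # variable already bound to a different node
                elif term in constraints and not constraints[term](node):
                    ok = False  # variable constraint violated
            elif term != node:
                ok = False  # constant labels must match exactly
        if ok:
            result = match(rest, data, constraints, trial)
            if result is not None:
                return result
    return None  # no consistent mapping for this pattern edge


# Assumption: PAPER17 is an instance of a hypothetical subclass of Publication.
CLASS_OF = {"PAPER17": "ConferencePaper"}
PUBLICATION_CLASSES = {"Publication", "ConferencePaper"}

publication = {
    ("PAPER17", "AUTHOR", "Arno Jacobsen"),
    ("PAPER17", "CONFERENCE", "SIGMOD"),
    ("PAPER17", "YEAR", "2001"),
    ("PAPER17", "LOCATION", "California"),
}
subscription = [
    ("?y", "AUTHOR", "Arno Jacobsen"),
    ("?y", "CONFERENCE", "SIGMOD"),
    ("?y", "YEAR", "?z"),
]
constraints = {
    "?y": lambda node: CLASS_OF.get(node) in PUBLICATION_CLASSES,  # ?y <= Publication
    "?z": lambda node: node.isdigit() and int(node) > 2000,          # ?z > 2000
}

print(match(subscription, publication, constraints))
# {'?y': 'PAPER17', '?z': '2001'}
```

Because only the pattern's edges have to be mapped, the publication's extra LOCATION edge does not prevent a match in this sketch; changing the year to "1999" makes the `?z > 2000` constraint fail, and `match` returns `None`.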
Conclusion and Future Work
• Proposed a prototype for graph-based metadata filtering
• G-ToPSS supports a high matching rate for an expressive subscription language
• Extend G-ToPSS with full RDF language features
• Optimize constraint processing during matching