
Cooperative XPath Caching


Presentation Transcript


  1. Cooperative XPath Caching K Sarath Kumar

  2. Talk Outline • Introduction • Storage and Querying in P2P scenario • XML Caching • Building a Cooperative Cache • Indexing XML Cache • Cache Operations • Cache Replacement • Experimental Evaluation • Results

  3. Introduction • XML is playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere • In a distributed scenario, how do we share XML data with low • response latency • computational cost • Usage scenarios • Deploying web services • Accessing XML sites from the Web

  4. Storage and Querying in P2P scenario • Distributed Hash Table (DHT) • Interface for looking up content in P2P networks • Operations • put(key, data) • lookup(key) • Advantages • Decentralization • Scalability
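The put/lookup interface above can be sketched with a single-process stand-in for a DHT, where one dict plays the role of the whole network; the class and method names beyond put/lookup are illustrative, not from the paper.

```python
import hashlib

class ToyDHT:
    """Single-process stand-in for a DHT.

    A real DHT (e.g. Chord) partitions the hashed key space across peers;
    here one dict plays the whole network. Illustrative sketch only.
    """
    def __init__(self):
        self._table = {}

    def _hash(self, key):
        # DHTs hash keys consistently to decide which peer stores the data.
        return hashlib.sha1(key.encode()).hexdigest()

    def put(self, key, data):
        self._table[self._hash(key)] = data

    def lookup(self, key):
        return self._table.get(self._hash(key))

dht = ToyDHT()
dht.put("/bookstore/book/title", ["result fragment"])
print(dht.lookup("/bookstore/book/title"))  # ['result fragment']
```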

  5. Storage and Querying in P2P scenario • Query Caching • Popular queries • Results of previous queries • Cache hit • Cache miss • Cache management • Replacement strategies

  6. XML Caching Contributions of the paper • Propose and evaluate two ways of building a distributed XML caching scheme • A prefix-based approach for indexing path queries • Replacement strategies • Experimental evaluation of factors affecting cache performance

  7. XML Caching Problem Formulation • Assume a network of N peers that pose queries on XML documents • Documents are located at a large number of widely distributed nodes • Data sources need not coincide with the peers posing queries

  8. XML Caching • Each of the N peers offers storage space Cp for caching query results • Focus is on dynamic organization and management of the “overall” cache content so that the most popular items are indexed efficiently

  9. Building a cooperative cache • Build a cooperative cache that contains a set of queries along with their results • We consider queries that are linear path expressions (LPEs) • Example LPE: /bookstore/book/title • Cache misses have to be minimized, as they require data to be brought from remote data sources [Figure: example XML tree with elements Bookstore, Book, Author, Title, Name, Address, Price]

  10. Building a cooperative cache • Results of the current query may be added to the cache • Cache content is indexed to achieve efficiency • Query subsumption can be exploited in building the index • If Q1 subsumes Q2, then the results of Q1 contain the results of Q2 and can thus be used to answer it • Example: Q1: /Bookstore/ subsumes Q2: /Bookstore/Book/Title/ [Figure: example XML tree with elements Bookstore, Book, Author, Title, Name, Address, Price]
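For linear path expressions, subsumption reduces to a step-wise prefix test; a minimal sketch (function names are illustrative, and predicates and // descendant steps are ignored):

```python
def path_steps(lpe):
    """Split a linear path expression into its element steps."""
    return [s for s in lpe.split("/") if s]

def subsumes(q1, q2):
    """True if q1's result contains q2's result, i.e. q1 is a step-wise
    prefix of q2. Comparing whole steps avoids false positives such as
    /Book being taken as a prefix of /Bookstore."""
    s1, s2 = path_steps(q1), path_steps(q2)
    return len(s1) <= len(s2) and s2[:len(s1)] == s1

print(subsumes("/Bookstore/", "/Bookstore/Book/Title/"))  # True
print(subsumes("/Bookstore/Book/Title/", "/Bookstore/"))  # False
```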

  11. Sharing cache content • Two alternatives for sharing cache content • a) Index Cache • Loosely-coupled approach • Cache results of the query at the peer that posted it • Index the results to help other peers locate them • b) Data Cache • Tightly-coupled approach • Each peer is assigned a specific part of query space • Results of each query are cached at the peer which is responsible for the corresponding part of the query space

  12. Indexing XML Cache • Index – Prefix Trie • A trie is a tree for indexing and storing strings • Trie nodes are labeled with prefixes of the indexed strings • Each node corresponds to a distinct prefix of the data • Actual strings are stored in the leaf nodes with which they share a common prefix • Node labeling • The root node is labeled with the empty string

  13. Indexing XML Cache [Figure: a prefix trie whose root is labeled with the empty string ⟨⟩ and whose children are labeled /x/x1, /x/x2, …, /x/xn; each leaf stores the queries sharing its label as a prefix, e.g. {Q1, Q3, Q7, …}]

  14. Indexing and Storing XML Cache Properties of the Prefix Trie • A query Q is indexed/stored at the unique leaf node whose label is either a prefix of Q or Q⊥ • Each leaf node has a predefined storage capacity C; on exceeding C, the leaf is split • On a split, new leaves are created and the queries are redistributed among them • Leaves in the index/store must hold at least C−k queries (k is a predefined number); leaves whose size falls below this are merged • Each trie node records the labels of its parent and children (if any)
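The leaf-capacity and split behaviour can be sketched as an in-memory trie. This is a local approximation only: the paper distributes these nodes over a DHT, and the names, the tiny capacity, and the use of an empty step for exhausted queries (playing the role of Q⊥) are all illustrative.

```python
CAPACITY = 2  # leaf capacity C; tiny here so splits are visible

class TrieNode:
    """A node of the prefix trie. Leaves store queries; an overflowing
    leaf splits by the next path step and redistributes its queries."""
    def __init__(self, label=""):
        self.label = label
        self.is_leaf = True
        self.children = {}   # next path step -> TrieNode
        self.queries = {}    # query -> cached result (leaves only)

def _steps(q):
    return [s for s in q.strip("/").split("/") if s]

def insert(node, query, result, depth=0):
    if node.is_leaf:
        node.queries[query] = result
        # Split on overflow, unless every stored query is already exhausted.
        if (len(node.queries) > CAPACITY
                and any(len(_steps(q)) > depth for q in node.queries)):
            node.is_leaf = False
            old, node.queries = node.queries, {}
            for q, r in old.items():
                insert(node, q, r, depth)   # redistribute into children
        return
    s = _steps(query)
    step = s[depth] if depth < len(s) else ""   # "" marks exhausted queries
    child = node.children.setdefault(step, TrieNode(node.label + "/" + step))
    insert(child, query, result, depth + 1)

def lookup(node, query, depth=0):
    """Follow path steps down to the leaf whose label is a prefix of the query."""
    if node.is_leaf:
        return node.queries.get(query)
    s = _steps(query)
    step = s[depth] if depth < len(s) else ""
    child = node.children.get(step)
    return lookup(child, query, depth + 1) if child else None

root = TrieNode()
for q in ["/a/b", "/a/c", "/x/y"]:
    insert(root, q, "result of " + q)
print(root.is_leaf, sorted(root.children))  # False ['a', 'x']
print(lookup(root, "/a/b"))                 # result of /a/b
```

The third insertion overflows the root leaf, which splits by the first path step, matching the slide's "new leaf will be created and queries redistributed" rule.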

  15. Indexing and Storing XML Cache Distributing the Prefix-Trie Index • The nodes of the prefix-trie index are distributed among the peers using a DHT

  16. Cache Operations • Distributed Cache Operations • I. Index Cache • Each peer • caches locally the results of its own queries • publishes the corresponding query in the DHT index. • Local hit – found in local cache • Global hit – found from querying index • Disadvantage • Redundancy

  17. Cache Operations • Index Cache Operations – Caching a new query • The peer stores the results of query Q in its cache and inserts Q into the trie index • Q must be indexed at the leaf having a prefix of Q as its label, so a lookup is required first • The lookup may have three outcomes • If a leaf has a prefix of Q as its label – insert Q there • If an internal node having Q as its label is located – create a NULL leaf and insert Q there • Otherwise, a new leaf has to be created

  18. Cache Operations • Index Cache Operations – Cache lookup algorithm • First look up Q itself in the DHT • If it is not found, look up the leaf L having a prefix of Q as its label (costly, as the label of L is not known in advance!) • If the outcome of the lookup is an internal node, check the labels of its children • If one of the children is a NULL leaf indexing a query Q′ that subsumes Q, then Q′ is followed to the peer caching it • If the outcome of a lookup for a prefix is a leaf, check whether the leaf indexes Q or any other query subsuming it • In any other case, the lookup fails

  19. Cache Operations • II. Data Cache Operations • Along with the indexed query, each leaf also stores its result • Tight control over the distribution of cache content – redundancy can be avoided • If a new query Q subsumes a query Q′ existing in the cache, replace Q′ with Q • Query lookup • Similar to the Index Cache, except when the lookup for Q returns a NULL leaf; in that case, a DHT lookup for Q⊥ is done • If a NULL leaf is located, visit all leaves in the subtrie starting from its parent

  20. Cache Operations • Prefix lookup alternatives • For a query Q of length L • a) Looking up all prefixes of Q in parallel (SP) • b) Looking up all prefixes of Q sequentially, starting from the longest one (SS) • c) Binary search on the prefix lengths of query Q (BI) • If the currently located prefix is an internal node, the lookup proceeds with longer prefixes • If the current prefix is neither an internal node nor a leaf, the lookup proceeds with shorter prefixes
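The BI strategy can be sketched against a dict standing in for the DHT, with labels as keys and a marker saying whether the node is internal or a leaf; the function names and the index representation are assumptions, and the paper's actual DHT messages are abstracted away.

```python
def prefixes(query):
    """All step-wise prefixes of a linear path query, shortest first."""
    s = [x for x in query.strip("/").split("/") if x]
    return ["/" + "/".join(s[:i]) for i in range(1, len(s) + 1)]

def bi_lookup(index, query):
    """Binary search on the prefix lengths of a query (the BI strategy).

    `index` maps a node label to "internal" or "leaf"; a missing label
    means no such trie node exists. Returns the leaf label, or None.
    """
    ps = prefixes(query)
    lo, hi = 0, len(ps) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        kind = index.get(ps[mid])
        if kind == "leaf":
            return ps[mid]
        if kind == "internal":
            lo = mid + 1    # trie extends deeper: try longer prefixes
        else:
            hi = mid - 1    # no such node: try shorter prefixes
    return None

print(bi_lookup({"/a": "internal", "/a/b": "leaf"}, "/a/b/c"))  # /a/b
```

This needs O(log L) DHT lookups instead of the L lookups of SP/SS, which is the point of the BI alternative.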

  21. Cache Replacement • Why? • With the insertion of new queries, the maximum cache capacity Cp offered by a peer may be exceeded • Basis? • A utilization value (UV) is maintained for each cached query and updated whenever it is used for answering a query • How? • Maintain access statistics of cached queries • Optimized by maintaining statistics of fragments • Example: if Q = /X/Y/Z is posed, the statistics of Q1 = /X/Y/Z/A/B and Q2 = /X/Y/Z/F/G are also updated
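Assuming UVs are plain access counters (an illustrative choice, not the paper's exact statistic), the fragment-statistics update in the example can be sketched as:

```python
def update_stats(uv, posed):
    """Bump the UV of every cached query whose path extends the posed
    query, as in the slide's example: posing /X/Y/Z also updates the
    statistics of /X/Y/Z/A/B and /X/Y/Z/F/G."""
    prefix = posed.rstrip("/") + "/"
    for q in uv:
        if q == posed or q.startswith(prefix):
            uv[q] += 1

uv = {"/X/Y/Z/A/B": 0, "/X/Y/Z/F/G": 0, "/X/W": 0}
update_stats(uv, "/X/Y/Z")
print(uv)  # {'/X/Y/Z/A/B': 1, '/X/Y/Z/F/G': 1, '/X/W': 0}
```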

  22. Cache Replacement • Proactive Replacement • Problem: a split on leaf overflow happens regardless of the UVs of the paths stored/indexed in it → overhead of splitting stale paths (with low UVs) • Solution: check leaves for paths with low UVs • Each peer defines its own UV threshold τ and replaces paths with UV lower than τ • τ is defined locally by each peer as a percentage of the average of all the UVs of the paths stored at the peer
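A sketch of proactive replacement under the assumption that τ is a fixed fraction of the peer's average UV; the fraction parameter and function names are illustrative, not from the paper.

```python
from statistics import mean

def proactive_replace(leaf_queries, uv, tau_factor=0.5):
    """Evict stale paths before a split.

    `uv` maps each cached query to its utilization value. The threshold
    tau is computed locally as a fraction of the average UV of the paths
    stored at this peer; paths with UV below tau are dropped.
    """
    if not uv:
        return leaf_queries
    tau = tau_factor * mean(uv.values())
    return {q: r for q, r in leaf_queries.items() if uv.get(q, 0.0) >= tau}

cache = {"/a/b": "r1", "/a/c": "r2", "/a/d": "r3"}
uvs = {"/a/b": 10.0, "/a/c": 1.0, "/a/d": 7.0}
# average UV = 6.0, so tau = 3.0 and "/a/c" (UV 1.0) is evicted
print(sorted(proactive_replace(cache, uvs)))  # ['/a/b', '/a/d']
```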

  23. Experimental Evaluation Experimental Setup

  24. Experimental Evaluation • Cooperative vs. Individual caching • In individual caching peers do not share their query results with any other peer. • It is found that cache hits increase linearly with increase in query overlap for the cooperative cache. • If query workload is kept constant, as number of peers participating in cache sharing increases the hit ratio also increases linearly.

  25. Experimental Evaluation • Is it beneficial to build the cache? • ρ = Bc / Bi • Bc = cost of transferring data from the cooperative cache for answering the queries • Bi = the corresponding cost when the same queries are answered from remote hosts • As query overlap increases, ρ decreases in both approaches, since the hit ratio increases

  26. Experimental Evaluation • Index Cache vs. Data Cache • a) Cache Hit Ratio • Increasing the leaf capacity decreases the cache hit ratio in DC, since fewer splits mean the content is less well distributed over the DHT • In IC, the cache hit ratio depends only on the peer query workload

  27. Experimental Evaluation • Index Cache vs. Data Cache • b) Lookup Cost • hLC: number of hops for locating the leaf of the distributed prefix trie that stores/indexes the results of the query • nLC: the network bandwidth needed for transferring the final result fragment of a query to the peer that posed it, in the case of a hit • As local cache hits increase, lookup costs decrease in the Index Cache, while they remain constant in the Data Cache

  28. Results • 1) Cooperative caching significantly increases the cache hit ratio compared to individual caching. • 2) Data Cache achieves higher hit ratios than Index Cache, when there is substantial query imbalance among the peers. • 3) Replacement decisions based on the local view of each peer do not significantly deteriorate cache performance compared to a global cache.

  29. Results • 4) Proactive replacement decreases the maintenance cost in Data Cache without affecting its hit ratio • 5) Cooperative caching is scalable with the number of peers as long as there is workload locality.

  30. References • http://en.wikipedia.org/wiki/XML • http://www.w3.org/XML/ • L. Chen, S. Wang, and E. A. Rundensteiner. A Fine-Grained Replacement Strategy for XML Query Cache. In WIDM, 2002. • Hari Balakrishnan, M. Frans Kaashoek, David Karger, Robert Morris, and Ion Stoica. Looking up data in P2P systems. In Communications of the ACM, 2003. • B. Mandhani and D. Suciu. Query Caching and View Selection for XML Databases. In VLDB, 2005. • G. Skobeltsyn and K. Aberer. Distributed Cache Table: Efficient Query-Driven Processing of Multi-Term Queries in P2P Networks. In P2PIR, 2006.

  31. Thank you
