Distributed Query Processing and Catalogs for Peer-to-Peer Systems

Professor: Iluju Kiringa. Students: Fan Yang, Libin Cai

Agenda • About P2P • Mutant Query Plan • Distributed Catalog • Intentional Statements • Security and Privacy • Conclusions

About P2P • Advantages: • Ease of deployment • Ease of use • Fault tolerance • Scalability • Limitations: • Weak query capabilities • No infrastructure for distributed queries • Limitations in index scalability and result quality

A query example User Bob wants to see a movie tonight. Bob visits his favorite portal, BobsPortal.com, and uses a GUI front end to come up with an XML query [2]:
    FOR $r in document("film_reviews")//review,
        $g in document("preferences")//genre,
        $s in document("film_showings")/showing[date = "15 March 2002"]
    WHERE $r/genre = $g AND $r/title = $s/title
    RETURN <film> { $r/title } { $r/rating } { $s/theater } </film>
The query draws on three XML documents: film reviews, preferences, and film showings.
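To make the join in Bob's query concrete, here is a minimal Python sketch that evaluates the same logic over three tiny in-memory XML documents. The sample data and the function name `films_tonight` are invented for illustration. The sketch uses a plain nested loop and does not model the distributed evaluation discussed in the following slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents: element names follow the query, the data is invented.
REVIEWS = ET.fromstring(
    "<film_reviews>"
    "<review><title>Alpha</title><genre>comedy</genre><rating>4</rating></review>"
    "<review><title>Beta</title><genre>horror</genre><rating>5</rating></review>"
    "</film_reviews>"
)
PREFERENCES = ET.fromstring("<preferences><genre>comedy</genre></preferences>")
SHOWINGS = ET.fromstring(
    "<film_showings>"
    "<showing><title>Alpha</title><date>15 March 2002</date><theater>Roxy</theater></showing>"
    "<showing><title>Alpha</title><date>16 March 2002</date><theater>Bijou</theater></showing>"
    "</film_showings>"
)


def films_tonight(reviews, preferences, showings, date):
    """Join reviews with preferred genres and with showings on the given date."""
    liked = {g.text for g in preferences.iter("genre")}
    results = []
    for r in reviews.iter("review"):
        if r.findtext("genre") not in liked:
            continue  # WHERE $r/genre = $g
        for s in showings.iter("showing"):
            same_title = s.findtext("title") == r.findtext("title")
            if s.findtext("date") == date and same_title:
                # RETURN title, rating, theater
                results.append(
                    (r.findtext("title"), r.findtext("rating"), s.findtext("theater"))
                )
    return results
```

Calling `films_tonight(REVIEWS, PREFERENCES, SHOWINGS, "15 March 2002")` returns one tuple per matching film and theater.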

A query example (cont’) The logical query plan [2] • Three elements: • Regular query operators: select, join • Pseudo-operators: document, display • References to XML fragments • Query processing: logical query plan → physical query plan → query processing algorithm executed

Advent of Mutant Query Plan • Why MQP? • Can cope with incomplete metadata • Can decentralize query optimization and execution • Respects the autonomy and the local policies of sites • Adapts to server and network conditions even while being evaluated • What is an MQP? • An algebraic query plan graph, encoded in XML, that can contain: • References to resource locations (URLs) • References to abstract resource names (URNs) • Verbatim XML fragments • Each MQP is tagged with a target, the destination for the result once the MQP is fully evaluated.
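As an illustration of "a query plan graph, encoded in XML", the following Python sketch builds a small plan and inspects it. The element and attribute names (`mqp`, `target`, `select`, `urn`, `data`) and the target address are assumptions made for this example, not the encoding used in [1].

```python
import xml.etree.ElementTree as ET

# Hypothetical XML encoding of an MQP; tag and attribute names are assumptions.
MQP_XML = """
<mqp target="http://bobsportal.example/results">
  <display>
    <select predicate="price &lt;= 10">
      <urn name="urn:ForSale:Portland-CDs"/>
    </select>
  </display>
</mqp>
""".strip()


def unresolved_urns(plan):
    """List the abstract resource names that still need to be resolved."""
    return [node.get("name") for node in plan.iter("urn")]


def is_fully_evaluated(plan):
    """A plan is done when only verbatim data remains under <display>."""
    children = list(plan.find("display"))
    return len(children) == 1 and children[0].tag == "data"


plan = ET.fromstring(MQP_XML)
```

A server receiving `plan` could call `unresolved_urns(plan)` to see what it may be able to resolve, and forward the result to `plan.get("target")` once `is_fully_evaluated(plan)` holds.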

Mutant Query Processing [1]

Mutant Query Plan Example • Garage Sale example: • Query: CDs for $10 or less in the Portland area. • MQP: • Regular query operators: select, join • Pseudo-operator: display • Constant piece of XML • URNs [1]

Mutant Query Plan Example (cont’) (a) Resolution and rewriting (b) Reduction [1]
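The resolution and reduction steps can be sketched in Python as a function that each server applies to the plan before passing it on. This is a simplified sketch under stated assumptions: plans are nested tuples rather than XML, the URN names and CD data are invented, and a server only resolves names to data it holds locally (resolving a URN to URLs and optimizer-driven rewriting are left out).

```python
def mutate(plan, local_data):
    """Resolve URNs this server knows, then reduce sub-plans that are now pure data."""
    kind = plan[0]
    if kind == "urn":
        name = plan[1]
        # Resolution: replace an abstract name with the data it stands for.
        return ("data", local_data[name]) if name in local_data else plan
    if kind == "data":
        return plan
    if kind == "union":
        children = [mutate(child, local_data) for child in plan[1]]
        if all(child[0] == "data" for child in children):
            # Reduction: every input is data, so the union can be evaluated.
            return ("data", [item for child in children for item in child[1]])
        return ("union", children)
    if kind == "select":
        _, predicate, child = plan
        child = mutate(child, local_data)
        if child[0] == "data":
            return ("data", [item for item in child[1] if predicate(item)])
        return ("select", predicate, child)
    if kind == "display":
        return ("display", mutate(plan[1], local_data))
    raise ValueError(f"unknown plan node: {kind}")


def cheap(cd):
    return cd["price"] <= 10


# Hypothetical URNs and data held by two different servers.
plan = ("display", ("select", cheap,
                    ("union", [("urn", "urn:example:seller-1"),
                               ("urn", "urn:example:seller-2")])))
server1 = {"urn:example:seller-1": [{"title": "X", "price": 8},
                                    {"title": "Y", "price": 15}]}
server2 = {"urn:example:seller-2": [{"title": "Z", "price": 10}]}

after_first_hop = mutate(plan, server1)              # still holds an unresolved URN
after_second_hop = mutate(after_first_hop, server2)  # fully evaluated
```

After the first hop the plan still contains the second URN, so the select cannot be reduced. After the second hop only the data under `display` remains, and the plan can be sent to its target.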

Comparison between a pipelined plan and a mutant plan (a) Pipelined plan (b) Mutant plan [2]

Distributed Catalogs • Question: • How do peers find out which resources are available at other peers? • Answer: build distributed catalogs to route queries efficiently • Procedures: • Peers use multi-hierarchic namespaces to categorize data; • Data providers use multi-hierarchic namespaces to describe the data they serve; • Data consumers use them to formulate queries.

Multi-hierarchic Namespaces • Multi-hierarchic namespace: the set of categorization hierarchies relevant to an application domain. [1] • Interest area example: second-hand armchairs in the Portland area: [USA/OR/Portland, Furniture/Chairs] • Figure: a multi-hierarchic namespace with two categorization dimensions and two highlighted interest areas: (a) Vancouver-Portland furniture, (b) items in Portland [1]
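One way to reason about interest areas is to treat each category as a slash-separated path and each interest area as one path per dimension. The Python sketch below rests on that assumption and on using `*` for the top of a hierarchy. The function names are made up for this example.

```python
def is_ancestor_or_self(a, b):
    """True if category path `a` equals `b` or lies above it in the hierarchy."""
    if a == "*":
        return True  # "*" stands for the top of the hierarchy
    pa, pb = a.split("/"), b.split("/")
    return pb[:len(pa)] == pa


def covers(area_a, area_b):
    """area_a covers area_b if it is at least as general in every dimension."""
    return all(is_ancestor_or_self(a, b) for a, b in zip(area_a, area_b))


def overlaps(area_a, area_b):
    """Two areas overlap if, in every dimension, one category contains the other."""
    return all(
        is_ancestor_or_self(a, b) or is_ancestor_or_self(b, a)
        for a, b in zip(area_a, area_b)
    )


armchairs = ("USA/OR/Portland", "Furniture/Chairs")
portland_items = ("USA/OR/Portland", "*")
```

Here `covers(portland_items, armchairs)` holds because "items in Portland" is more general than "Portland armchairs" in the second dimension and equal in the first.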

Peer Roles

Resource Resolution • Authoritative Server • Strives to know about all base servers within its interest area. • Through an authoritative index or meta-index server, the known base servers in a particular interest area can be discovered. • Resource Resolution • Seeks an authoritative index or meta-index server • Recursively follows the index references • Finds all the relevant base servers and data items • Resolves the URN

Example of Resource Resolution • URN: urn:ForSale:Portland-CDs • URLs: http://10.1.2.3:9020/, http://10.2.3.4:9020/ • Interest area: [USA/OR/Portland, Music/CDs] • Authoritative meta-index server A: [USA, *] • Index server B: [USA, Music] • Index server C: [USA/OR, Music] • Index server G: replaces the URN with URLs • Route: query plan → A → B → C → … → G → http://10.1.2.3:9020/, http://10.2.3.4:9020/
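The recursive lookup can be sketched in Python as follows. The directory below reuses the server names from the example, but several details are assumptions: the slide elides the hops between C and G, so the sketch links C directly to G; G's interest area is assumed to equal the target area; and the URLs are written in host:9020 form, which is an assumption about what the slide intends.

```python
def covers(area, target):
    """True if every category in `area` is equal to or above the one in `target`."""
    def ancestor_or_self(a, b):
        pa, pb = a.split("/"), b.split("/")
        return a == "*" or pb[:len(pa)] == pa
    return all(ancestor_or_self(a, b) for a, b in zip(area, target))


# Hypothetical directory: "refs" are index references, "urls" are base servers.
SERVERS = {
    "A": {"area": ("USA", "*"), "refs": ["B"], "urls": []},
    "B": {"area": ("USA", "Music"), "refs": ["C"], "urls": []},
    "C": {"area": ("USA/OR", "Music"), "refs": ["G"], "urls": []},  # C -> G is assumed
    "G": {"area": ("USA/OR/Portland", "Music/CDs"), "refs": [],
          "urls": ["http://10.1.2.3:9020/", "http://10.2.3.4:9020/"]},
}


def resolve(server, target, servers, seen=None):
    """Follow index references from `server` and collect URLs relevant to `target`."""
    seen = set() if seen is None else seen
    if server in seen:
        return []  # guard against reference cycles
    seen.add(server)
    entry = servers[server]
    if not covers(entry["area"], target):
        return []  # this server's interest area does not include the target
    urls = list(entry["urls"])
    for ref in entry["refs"]:
        urls.extend(resolve(ref, target, servers, seen))
    return urls
```

Starting at the meta-index server, `resolve("A", ("USA/OR/Portland", "Music/CDs"), SERVERS)` walks A, B, C, and G and returns the two base-server URLs that replace the URN in the plan.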

Intentional Statements • Purposes: • How can index and meta-index servers convey the relationships between the data they cover? • How can mutant queries use this information to make intelligent choices about completeness, currency, and latency tradeoffs? • Intentional statements: • Describe relationships between index and meta-index servers; they can be expressed using coordination formulas. • Examples:
Server R replicates everything from server S for the Portland category of the Location hierarchy: base[Portland, *]@R = base[Portland, *]@S
The only Oregon sporting goods information that R holds is for Portland and Eugene golf clubs at S: base[Oregon, Sporting Goods]@R = base[Portland, Golf Clubs]@S ∪ base[Eugene, Golf Clubs]@S
R indexes several base servers: index[Oregon, Golf Clubs]@R = base[Oregon, Golf Clubs]@S ∪ base[Oregon, Golf Clubs]@T ∪ base[Oregon, Golf Clubs]@U

Utilizing Intentional Statements (cont’) • Process: • Whenever a server registers an interest area with the meta-index server, it provides intentional statements. • Servers can then use this information when binding and routing MQPs. • Assumptions: • Meta-index server M knows about servers R and S. • Interest areas: R [Portland, Recreation], S [Oregon, Sporting Goods] • M receives an MQP that contains the resource name [Portland, Golf Clubs]. • Without further information, the name could be bound to: base[Portland, Golf Clubs]@R ∪ base[Portland, Golf Clubs]@S • If M knows the intentional statement base[Portland, Sporting Goods]@R = base[Portland, Sporting Goods]@S, then it could instead bind to: base[Portland, Golf Clubs]@R | base[Portland, Golf Clubs]@S • Conclusion: the MQP could be routed to either R or S, but it need not go to both.
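The binding decision made by M can be sketched in Python. The sketch assumes a concrete hierarchy that the slide only implies (Portland under Oregon, Golf Clubs under Sporting Goods under Recreation), stores an equality statement as a hypothetical `(area, server, server)` triple, and only handles the two-server case shown here.

```python
def covers(area, resource):
    """True if every category in `area` is equal to or above the one in `resource`."""
    def ancestor_or_self(a, b):
        pa, pb = a.split("/"), b.split("/")
        return a == "*" or pb[:len(pa)] == pa
    return all(ancestor_or_self(a, b) for a, b in zip(area, resource))


def label(area):
    """Render an area using only the last component of each category path."""
    return ", ".join(path.split("/")[-1] for path in area)


def bind(resource, servers, equivalences):
    """Bind a resource name to servers: '|' for alternatives, '∪' when all are needed."""
    candidates = sorted(n for n, area in servers.items() if covers(area, resource))
    terms = [f"base[{label(resource)}]@{n}" for n in candidates]
    interchangeable = any(
        covers(area, resource) and {a, b} == set(candidates)
        for area, a, b in equivalences
    )
    return (" | " if interchangeable else " ∪ ").join(terms)


# Assumed hierarchy placement for the categories named on the slide.
SERVERS = {
    "R": ("Oregon/Portland", "Recreation"),
    "S": ("Oregon", "Recreation/Sporting Goods"),
}
EQUIVALENCES = [(("Oregon/Portland", "Recreation/Sporting Goods"), "R", "S")]
GOLF_CLUBS = ("Oregon/Portland", "Recreation/Sporting Goods/Golf Clubs")
```

With no statements, `bind(GOLF_CLUBS, SERVERS, [])` yields a union over R and S. With `EQUIVALENCES` it yields the two servers as alternatives, so the MQP can go to either one.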

Utilizing Intentional Statements (cont’) • Queries do not run instantly: • Suppose server R replicates everything S has for Portland, possibly keeps additional data about Portland, and polls S every 30 minutes, so its replica can be up to 30 minutes out of date. • Intentional statement: base[Portland, *]@R ⊇ base[Portland, *]@S{30} • A binding for the resource [Portland, CDs] might then be: base[Portland, CDs]@R{30} | (base[Portland, CDs]@R ∪ base[Portland, CDs]@S){0} • Explanation: one can get an answer quickly by routing the MQP to R alone, but that answer could be up to 30 minutes out of date. By routing the MQP to both R and S, one gets a complete and current answer. • Conclusions: • It is impossible to guarantee that queries run instantly. • There are compromises among latency, completeness, and currency. • Replication can’t be both scalable and instantaneous.
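The tradeoff in this binding can be sketched in Python as a choice among alternatives, each annotated with its maximum staleness in minutes. The representation and the rule "fewer servers means lower latency" are assumptions made for this example, not part of [1].

```python
# Each alternative: (servers to visit, maximum staleness in minutes).
# Mirrors base[Portland, CDs]@R{30} | (base[...]@R ∪ base[...]@S){0}.
ALTERNATIVES = [
    (("R",), 30),
    (("R", "S"), 0),
]


def choose_route(alternatives, max_staleness_minutes):
    """Pick the alternative that meets the currency bound with the fewest servers.

    Assumption: visiting fewer servers means lower latency.
    Returns None if no alternative is current enough.
    """
    acceptable = [alt for alt in alternatives if alt[1] <= max_staleness_minutes]
    if not acceptable:
        return None
    return min(acceptable, key=lambda alt: len(alt[0]))[0]
```

A query that tolerates hour-old data would be routed to R alone, while one that needs data no more than five minutes old would visit both R and S.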

What else could be in MQPs • Accumulating catalog and statistics information • Maintaining provenance • Rewards system • Meta-index updating • Detection of spoofing

Security and Privacy • Issues: • With MQPs, partial results may be divulged to undesirable servers • Solutions: • MQPs need to incorporate ordering and transfer policies • Encrypt data or data elements with a public key • MQPs can allow answers to be obtained under the given server security policies

Conclusions • Enable peers to independently optimize and partially evaluate queries without global knowledge, and with a minimum of coordination overhead.

References • [1] V. Papadimos, D. Maier, and K. Tufte. Distributed Query Processing and Catalogs for Peer-to-Peer Systems. OGI School of Science & Engineering, Oregon Health & Science University. • [2] V. Papadimos and D. Maier. Distributed Queries without Distributed State. In Proc. of WebDB 2002, pages 95-100.

Thanks! Questions?...
