
AmbientDB Relational Query Processing in a P2P Network

AmbientDB Relational Query Processing in a P2P Network. Peter Boncz and Caspar Treijtel LEE BYUNGIL PL Lab. Hongik University 2004.11.14. Outline. 1. Introduction 1.1 Goal 1.2 Assumptions 1.3 Example: Collaborative Filtering in a P2P Database 1.4 Overview


Presentation Transcript


  1. AmbientDB: Relational Query Processing in a P2P Network Peter Boncz and Caspar Treijtel LEE BYUNGIL PL Lab. Hongik University 2004.11.14

  2. Outline 1. Introduction 1.1 Goal 1.2 Assumptions 1.3 Example: Collaborative Filtering in a P2P Database 1.4 Overview 2. AmbientDB Architecture 2.1 Data Model 2.2 Query Execution in AmbientDB 2.3 Dataflow Execution 2.4 Executing the Collaborative Filtering Query 3. DHTs in AmbientDB 3.1 Example: Approximated Collaborative Filtering 4. Conclusion

  3. 1. Introduction (1) • AmbientDB • A new peer-to-peer (P2P) DBMS prototype • Developed at CWI (Centrum voor Wiskunde en Informatica) • Distributed over an ad-hoc P2P network • Global query algebra • Multi-wave stream processing plans • Ambient Intelligence (AmI) • Digital environments in which multimedia services are sensitive to people’s needs

  4. Music Playlist Scenario • amP2P player • Log: meta-information • Homogeneous • Content: AmbientDB instance or external sources • Heterogeneous • AmbientDB • Its collection • Only meta-information

  5. 1.1 Goal • Full relational database functionality • Cooperate in an ad-hoc way with other AmbientDB devices • Propose • A general architecture for AmbientDB • Complex query processing in an ad-hoc P2P network

  6. 1.2 Assumptions (1) • Upscaling (flexibility) • The number of cooperating devices is potentially large • Home environment and ad-hoc P2P network • Downscaling • Devices often have few resources (CPU, memory, network, battery) • Schema integration • All devices operate under a common global schema • Data placement • Data placement is determined by the user • Network failure • Resilience of Chord • While a query runs, the routing tree stays intact

  7. Chord

  8. 1.2 Assumptions (2) • Distributed database • Participating sites are known a priori • Not so in AmbientDB • Federated database • Static integration of heterogeneous schemas • Mobile database • Centralized database server and client (mobile node) • P2P file sharing system • Non-centralized and ad-hoc topologies • Simple keyword text search

  9. Example Music Schema • The global schema “AMP2P” in AmbientDB • distributed table • On the global level • The union of all horizontal fragments of these tables

  10. 1.3 Example: Collaborative Filtering in a P2P Database (1) • amP2P player • Access to a local content repository (digital music collection) • AmbientDB instance • Share all music content in the “home zone” • Only share the meta-information in the huge P2P network

  11. 1.3 Example: Collaborative Filtering in a P2P Database (2) • Memory-based implicit voting scheme • Predicted vote for the active user a for item j • v_{i,j} = the vote of user i on item j • w(a,i) = weight function defined on the active user a and user i • v̄_i = average vote for user i • k = normalizing factor • weight(user_a, user_i) • Number of times the example song has been fully played by user i • Refined form • Negative information (skipped songs)
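
The formula itself did not survive in the transcript. The sketch below assumes the slide showed the standard memory-based prediction, p(a,j) = v̄_a + k · Σ_i w(a,i) · (v_{i,j} − v̄_i), and that k is chosen so that the absolute weights sum to one. The function names, the dictionary layout and that choice of k are illustrative assumptions, not taken from the slides.

```python
from typing import Dict

Votes = Dict[str, Dict[str, float]]  # user -> item -> vote (hypothetical layout)


def mean_vote(votes: Votes, user: str) -> float:
    """Average vote of one user over all items that user has voted on."""
    items = votes[user]
    return sum(items.values()) / len(items)


def predicted_vote(votes: Votes, weights: Dict[str, float],
                   active: str, item: str) -> float:
    """p(a,j) = mean(v_a) + k * sum_i w(a,i) * (v_ij - mean(v_i)).

    Assumption: k = 1 / sum_i |w(a,i)| over the users who voted on the item.
    """
    others = [u for u in votes if u != active and item in votes[u]]
    norm = sum(abs(weights.get(u, 0.0)) for u in others)
    if norm == 0:
        # No usable neighbours: fall back to the active user's own average.
        return mean_vote(votes, active)
    k = 1.0 / norm
    deviation = sum(weights.get(u, 0.0) * (votes[u][item] - mean_vote(votes, u))
                    for u in others)
    return mean_vote(votes, active) + k * deviation


votes = {
    "a": {"x": 2.0, "y": 4.0},
    "b": {"x": 1.0, "z": 5.0},
    "c": {"x": 3.0, "z": 3.0},
}
# In the slide's scheme, w(a,i) would be how often user i fully played the example song.
weights = {"b": 1.0, "c": 3.0}
prediction = predicted_vote(votes, weights, "a", "z")
```

With this toy data, user b votes above their own average on item z and user c votes exactly at their average, so the prediction ends up slightly above the active user's mean vote.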

  12. Collaborative Filtering Query in SQL

  13. 1.4 Overview • General architecture • Includes the data model • Query execution • Three-level query execution process • DHT (Distributed Hash Table) • Global table indices • Optimize the query • Related work & future work • Conclusion

  14. AmbientDB Architecture

  15. 2. AmbientDB Architecture • Distributed query processor • Executes queries on all ad-hoc connected devices • P2P protocol • Chord • Scalable lookup and routing scheme • P2P IP overlay networks made out of unreliable connections • Query node = root • A small number of connections per node • Simultaneous bi-directional communication and query processing • DHTs: global table indices • Local DB component • Local table • Embedded database • External data source: wrapper component (distributed database system) • Schema integration engine • Meta-data translation • Using view-based schema mappings

  16. AmbientDB Routing Tree Using IP Overlay

  17. 2.1 Data Model (1) • Standard relational data model & algebra as query language • Queries are formulated against global tables • Answered from the local node, a limited set of nodes, or all reachable nodes • Converging answer • Query locally • Re-issue iteratively over more nodes

  18. 2.1 Data Model (2) • Abstract Table • LT (Local Table) • Each node has a private schema • Global schema: global table T • All participating nodes Ni carry a table instance Ti • In the query node • Ti may be accessed as an LT • DT (Distributed Table) • Q: set of nodes that participate in some global query • The union of local table instances

  19. 2.1 Data Model (3) • PT (Partitioned Table) • Specialization of the DT • All participating tuples in each Ti are disjoint between all nodes • Advantage over DT • Exact query answers can often be computed in an efficient distributed fashion • By broadcasting a query and letting each node compute a local result without need for communication • Attaching a bitmap index Ti.Q to each local table Ti • “Virtual” column #NODEID • Indicates on which node each tuple stored in a DT/PT is located • Allows location-specific query restrictions

  20. LT, DT and PT
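
The three table kinds can be mimicked in a few lines of Python. This is only a sketch under assumptions: the node names, the (user, song) rows and both helper functions are hypothetical, and a real AmbientDB node would of course not hold every local table in one dictionary.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, str]  # (user, song): a made-up two-column schema

# Each node Ni carries its own instance Ti of the global table (an LT at that node).
local_tables: Dict[str, List[Row]] = {
    "n1": [("ann", "song1"), ("bob", "song2")],
    "n2": [("bob", "song2"), ("cat", "song3")],
}


def distributed_table(tables: Dict[str, List[Row]], q: List[str]) -> List[tuple]:
    """DT: union of the local instances of all nodes in Q.

    Duplicates across nodes are kept; the node name plays the role of #NODEID.
    """
    return [(node, *row) for node in q for row in tables[node]]


def partitioned_table(tables: Dict[str, List[Row]],
                      bitmaps: Dict[str, List[bool]],
                      q: List[str]) -> List[tuple]:
    """PT: like a DT, but only tuples whose participation bit is set take part."""
    return [(node, *row)
            for node in q
            for row, bit in zip(tables[node], bitmaps[node])
            if bit]


q = ["n1", "n2"]
dt = distributed_table(local_tables, q)
# Participation bitmaps Ti.Q: the copy of ("bob", "song2") on n2 does not participate.
bitmaps = {"n1": [True, True], "n2": [False, True]}
pt = partitioned_table(local_tables, bitmaps, q)
# A location-specific restriction simply filters on the virtual #NODEID column.
only_n2 = [row for row in dt if row[0] == "n2"]
```

The DT contains the duplicated tuple twice, while the PT contains it once, which is what lets each node compute an exact local result without talking to the others.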

  21. 2.2 Query Execution in AmbientDB (1) • Three-level translation • Abstract level • User query • Selection, join, aggregation, sort • Lists • (List<Type>) • List instances • <a,b,c> • Concrete level • Table parameters, return value • Partition, union • Execution level • Wave-plans

  22. The Abstract Global Algebra

  23. The Concrete Global Algebra

  24. 2.2 Query Execution in AmbientDB (2) • Starting at the leaves • Abstract query plan -> concrete • Concrete operators have concrete result types • The process continues to the root of the query graph • Local result table, hence LT • Local concrete variant of all abstract operators • All tables -> LT • Concrete union • (T1)-> LT • More efficient alternative query plans

  25. 2.2 Query Execution in AmbientDB (3) • select, aggr, order support distributed execution (dist) • Execute in all nodes on their local partition (LT) of a PT or a DT • Produce again a distributed result (PT or DT) • Broadcast the query through the routing tree • The result is again dispersed over all nodes as a PT or DT • aggr_merge = aggr_local(union_merge(DT)) : LT • Reduces the fragments to be collected in the query node • Saves considerable bandwidth
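
The bandwidth argument for the merge variant can be illustrated with a small simulation. The sketch assumes a count aggregate, a routing tree given as a parent-to-children dictionary, and hypothetical function names; it is not the AmbientDB implementation, only a picture of partial aggregates being combined on the way up the tree.

```python
from collections import Counter
from typing import Dict, List


def aggr_local(rows: List[str]) -> Counter:
    """Count occurrences per key inside one node's local table."""
    return Counter(rows)


def aggr_merge(node: str,
               tree: Dict[str, List[str]],
               tables: Dict[str, List[str]]) -> Counter:
    """Merge partial aggregates bottom-up along the routing tree.

    Every node forwards one small Counter to its parent instead of raw tuples.
    """
    total = aggr_local(tables[node])
    for child in tree.get(node, []):
        total += aggr_merge(child, tree, tables)
    return total


# Hypothetical routing tree rooted at the query node.
tree = {"root": ["a", "b"], "a": ["c"]}
# Hypothetical local log tables: one song id per fully played song.
tables = {
    "root": ["s1"],
    "a": ["s1", "s2"],
    "b": ["s2"],
    "c": ["s1"],
}
merged = aggr_merge("root", tree, tables)

# Reference result: ship every tuple to the query node, then aggregate there.
all_rows = [row for rows in tables.values() for row in rows]
centralized = aggr_local(all_rows)
```

Both strategies give the same counts. The difference is that the merged plan sends at most one entry per distinct key over each tree edge, whereas the centralized plan sends every log tuple to the root.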

  26. 2.2 Query Execution in AmbientDB (4) • join variants • Broadcast join (LT, T1)->T1 • Foreign-key join (T1,DT)->T1 • Referential integrity to minimize communication • Split join (LT1,T1)->T1 • Reduces bandwidth consumption • O(T*N) -> O(T*log(N)) • partition • A special operator that performs duplicate elimination • Creates a PT from a DT by creating a tuple participation bitmap at all nodes • To be able to use the dist operators • We should convert a DT to a PT
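
What the partition operator has to produce can be shown with a centralized stand-in. In the sketch below a single `seen` set replaces whatever distributed duplicate-elimination protocol AmbientDB actually uses, and the table contents and the function name are made up for illustration.

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[str, str]  # (user, song): a made-up two-column schema


def partition(tables: Dict[str, List[Row]], q: List[str]) -> Dict[str, List[bool]]:
    """Build one participation bitmap per node so that each distinct tuple
    participates exactly once across all nodes in Q.

    Assumption: nodes are visited in the order given by q, and the first node
    holding a tuple keeps it.
    """
    seen: Set[Row] = set()
    bitmaps: Dict[str, List[bool]] = {}
    for node in q:
        bits: List[bool] = []
        for row in tables[node]:
            bits.append(row not in seen)  # True only for the first occurrence
            seen.add(row)
        bitmaps[node] = bits
    return bitmaps


tables = {
    "n1": [("ann", "song1"), ("bob", "song2")],
    "n2": [("bob", "song2"), ("cat", "song3")],
}
bitmaps = partition(tables, ["n1", "n2"])
participating = sum(bit for bits in bitmaps.values() for bit in bits)
```

After this step every local table has its bitmap, so the dist variants of select, aggr and order can run on each node without counting the shared tuple twice.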

  27. Mappings

  28. 2.3 Dataflow Execution (1) • Query processing paradigm • Routing tree using TCP connections is used to pass bi-directional tuple streams • Multiple such waves run simultaneously (upward and downward) • Third translation phase • Concrete query plan -> wave-plans • Concrete operator • One or more waves (local dataflow algebra operators)

  29. 2.3 Dataflow Execution (2) • dist plans for select, aggr, order and foreign-key join • buffer-to-buffer local operator in each node, without further communication • broadcast join • Propagates a tuple wave through the network • split • Split(<true,true>,<c1,c1>) • Ordered -> effectively forming a DT/PT • scan-select, quick-sort, merge-join, heap-based top-N, ordered aggregation • All stream-based • Require little memory
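
Why stream-based operators need little memory is easy to see for heap-based top-N. The sketch below is a generic Python version using the standard `heapq` module, not AmbientDB's operator; the (key, score) tuple layout is an assumption.

```python
import heapq
from typing import Iterable, List, Tuple


def top_n(stream: Iterable[Tuple[str, int]], n: int) -> List[Tuple[str, int]]:
    """Return the n highest-scoring (key, score) pairs from a tuple stream.

    Only n entries are held in memory at any time, however long the stream is.
    """
    if n <= 0:
        return []
    heap: List[Tuple[int, str]] = []  # min-heap: the smallest kept score is at heap[0]
    for key, score in stream:
        if len(heap) < n:
            heapq.heappush(heap, (score, key))
        elif (score, key) > heap[0]:
            # The new tuple beats the weakest kept tuple, so swap them.
            heapq.heapreplace(heap, (score, key))
    return [(key, score) for score, key in sorted(heap, reverse=True)]


def play_counts():
    """Hypothetical incoming tuple stream, produced one tuple at a time."""
    yield from [("a", 3), ("b", 9), ("c", 1), ("d", 7)]


best_two = top_n(play_counts(), 2)
```

Because the input is consumed as a generator, the operator never materializes the full stream, which matches the slide's point about resource-poor devices.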

  30. The Dataflow Algebra

  31. 2.4 Executing the Collaborative Filtering Query (1)

  32. 2.4 Executing the Collaborative Filtering Query (2)

  33. 2.4 Executing the Collaborative Filtering Query (3) • Problems • Query 1 • Large list of all users that have ever listened to the example song • Hogs resources from all nodes in the network • Query 2 • Basically sends all log records to the query node for aggregation • Can be executed more efficiently in an AmbientDB enriched with DHTs

  34. 3. DHTs in AmbientDB (1) • Useful lookup structures for large-scale P2P applications • Reduce the number of nodes involved in answering a query • Involving many nodes • Decreases query performance • Creates an overload in the average query frequency • Gnutella (does not use DHTs or global indices) • Easy to locate popular music • Difficult to locate less well-known songs

  35. 3. DHTs in AmbientDB (2) • To enable the query optimizer to automatically accelerate selection queries using such DHTs • DHT indices can be exploited by a query optimizer to accelerate lookup queries • Special form of a PT, as the partitions are disjoint • select_chord(DHT) : LT • Dataflow level • Route a message to the Chord finger on which the selection key-value hashes • Retrieve all corresponding tuples as an LT via a direct TCP/IP transfer • Non-complete index
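
The idea behind a Chord-backed selection can be sketched as a key hashed onto an identifier ring and stored at the first node whose identifier follows it. The code below assumes a 16-bit ring, SHA-1 hashing and a globally known, sorted list of node identifiers. Real Chord reaches the same node through finger tables in O(log N) hops, which this sketch does not simulate, and all names are illustrative.

```python
import hashlib
from bisect import bisect_left
from typing import Dict, List

M = 16  # identifier bits on the ring (assumption)


def ring_id(value: str) -> int:
    """Hash a key onto the identifier ring."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** M)


def successor(node_ids: List[int], key_id: int) -> int:
    """First node id >= key_id, wrapping around the ring; node_ids must be sorted."""
    idx = bisect_left(node_ids, key_id)
    return node_ids[idx % len(node_ids)]


Index = Dict[int, Dict[str, List[tuple]]]  # node id -> key -> tuples


def dht_insert(index: Index, node_ids: List[int], key: str, row: tuple) -> None:
    """Store a tuple at the node responsible for the key's hash."""
    owner = successor(node_ids, ring_id(key))
    index.setdefault(owner, {}).setdefault(key, []).append(row)


def select_chord(index: Index, node_ids: List[int], key: str) -> List[tuple]:
    """Fetch all tuples for one key from the single responsible node."""
    owner = successor(node_ids, ring_id(key))
    return index.get(owner, {}).get(key, [])


node_ids = sorted(ring_id(name) for name in ["n1", "n2", "n3", "n4"])
index: Index = {}
dht_insert(index, node_ids, "song1", ("ann", "song1"))
dht_insert(index, node_ids, "song1", ("bob", "song1"))
dht_insert(index, node_ids, "song2", ("cat", "song2"))
hits = select_chord(index, node_ids, "song1")
```

A selection on the indexed key therefore touches one node instead of broadcasting through the whole routing tree, and the result comes back as a local table at the query node.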

  36. DT and DHT in AmbientDB

  37. 3.1 Example: Approximated Collaborative Filtering (1) • HISTO • Static histogram of fully-listened-to songs per user • Reduce the histogram computation cost of query

  38. Optimized collaborative filtering query in SQL

  39. 3.1 Example: Approximated Collaborative Filtering (2)

  40. 3.1 Example: Approximated Collaborative Filtering (3)

  41. Network Bandwidth Compared

  42. 4. Conclusion • Full query processing architecture • Executing queries in a declarative, optimizable language, over an ad-hoc P2P network • DHT • Efficient global indices
