Data Management for Peer-to-Peer Computing: A Vision

Ali Rahbari

Outline • P2P Data Networks • Why P2P Databases are Different • A P2P Database Scenario • A logic for P2P Databases • Propagation Strategy • Architecture and Implementation Issues

P2P Data Networks: Basic Notions • Node • Database, file system, etc. • P2P network • Indexed nodes with equal participant rights • Services • Query answering • Query, result, and update propagation • Locality • No global schema, no centralized control • Nodes have only a partial view of the world • Autonomy • Nodes are largely independent in their language, content, etc.

Roles for P2P DBs? • Peers come and go, but must still be able to interoperate. • To us, the big question is how to cope with DBs that • are incomplete, overlapping, and mutually inconsistent • dynamically appear and disappear • have limited connectivity. • Scenario • Databases of medical patients • Complete integration is likely to be infeasible • But dynamic integration of DBs relevant to one patient could have high value.

A Model for P2P Databases • Each peer is a node with a database. It exchanges data and services with acquaintances (i.e. other peers). • The set of acquaintances changes often, due to • site availability • changing usage patterns • Peers are fully autonomous. • No global control or central server.

A Motivating Scenario (D: doctor, P: pharmacist, H: hospital) • A patient may be described in several DBs, which can use different patient ID formats, disease descriptions, etc. • When a patient is admitted to the hospital, H becomes acquainted with D • The acquaintance is dropped when treatment is over • When the doctor prescribes a drug, D becomes acquainted with P • A patient is injured skiing, so more DBs get involved (e.g. a ski clinic)
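The scenario above can be illustrated with a minimal sketch of peers whose acquaintance sets change over time. The `Peer` class and its method names are hypothetical, and acquaintance is assumed to be symmetric; the slides do not prescribe an interface.

```python
class Peer:
    """A peer with a name and a changing set of acquaintances (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.acquaintances = set()

    def acquaint(self, other):
        # Assumption: acquaintance is recorded on both sides.
        self.acquaintances.add(other.name)
        other.acquaintances.add(self.name)

    def drop(self, other):
        # discard() is a no-op if the acquaintance was already gone.
        self.acquaintances.discard(other.name)
        other.acquaintances.discard(self.name)


doctor, pharmacist, hospital = Peer("D"), Peer("P"), Peer("H")

hospital.acquaint(doctor)      # the patient is admitted to the hospital
doctor.acquaint(pharmacist)    # the doctor prescribes a drug
after_prescription = sorted(doctor.acquaintances)

hospital.drop(doctor)          # treatment is over
after_treatment = sorted(doctor.acquaintances)
```

No central registry is involved: each peer only knows its own acquaintance set.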

Proposal: Local Relational Model (LRM) • A logic for P2P data integration • Instead of a global schema, each peer has • coordination formulas: each specifies semantic interdependencies between two acquaintances • binary domain relations: each specifies how symbols in one database translate to symbols in an acquaintance’s database. • Each expression in a coordination formula is relative to just one participating database • Use coordination formulas and domain relations for query and update processing.

A Coordination Formula • p: pharmacist DB, medication(PrescriptionID, PatientID, Prod) • d: doctor DB, treatment(TreatmentID, PatientID, Description, Type), where Type $\in$ {“hospital”, “home”} • $\forall (i:x).A(x)$ means that for all $x$ in the domain of database $i$, $A(x)$ is true. • A coordination formula: $\forall (p:y).\forall (p:z).\bigl(p: \exists x.\,\mathit{medication}(x, y, z) \rightarrow d: \exists w.\,\mathit{treatment}(w, y, z, \text{“home”})\bigr)$ • “There’s a row in treatment in the doctor DB for each row in medication in the pharmacist DB”
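As a rough illustration, the coordination formula on this slide can be checked against two small in-memory tables. The sample rows and the `violations` helper are hypothetical, and the sketch assumes identity domain relations between p and d (values are compared directly, with no translation).

```python
# Hypothetical sample rows; the column layout follows the slide's schemas.
medication = [  # pharmacist DB p: (PrescriptionID, PatientID, Prod)
    ("rx1", "pat1", "aspirin"),
    ("rx2", "pat2", "insulin"),
]
treatment = [  # doctor DB d: (TreatmentID, PatientID, Description, Type)
    ("t1", "pat1", "aspirin", "home"),
    ("t2", "pat2", "insulin", "hospital"),
]


def violations(medication, treatment):
    """Return the (y, z) pairs from p that have no matching "home" row in d."""
    home = {(pid, desc) for (_, pid, desc, typ) in treatment if typ == "home"}
    prescribed = {(pid, prod) for (_, pid, prod) in medication}
    return sorted(prescribed - home)


missing = violations(medication, treatment)
```

An empty result would mean the formula holds for this pair of databases; here `pat2`'s prescription has only a "hospital" treatment, so it is reported.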

Domain Relation • A row $\langle d_1, d_2 \rangle$ in domain relation $r_{ik}$ specifies that value $d_1$ in DB$_i$ corresponds to value $d_2$ in DB$_k$ • $r_{ik}$ may be partial • $r_{ik}$ and $r_{ki}$ need not be symmetric • Example: DB$_i$ contains lengths in meters and DB$_k$ in kilometers (total but not symmetric) • $r_{ik}(x) = \mathrm{roundToClosestK}(x)$, so $r_{ik}(653) = 1$ and $r_{ik}(453) = 0$ • $r_{ki}(x) = x \cdot 1000$, so $r_{ki}(1) = 1000$
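The meters/kilometers example can be written out as a short sketch. The function names mirror the slide's $r_{ik}$ and $r_{ki}$; rounding halves upward is an assumption, since the slide only says "closest". The dictionary `r_partial` is a hypothetical illustration of a partial relation.

```python
def r_ik(meters):
    """Map a length in meters (DB i) to the closest whole kilometer (DB k)."""
    return (meters + 500) // 1000  # assumption: halves round up


def r_ki(kilometers):
    """Map a length in kilometers (DB k) back to meters (DB i)."""
    return kilometers * 1000


# Total but not symmetric: a round trip does not recover the original value.
round_trip = r_ki(r_ik(653))

# A partial domain relation, sketched as a dict: unmapped values have no image.
r_partial = {"one": "uno"}
unmapped = r_partial.get("two")
```

Here `round_trip` is 1000 rather than 653, which is what "not symmetric" means in practice.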

Queries • A query is a coordination formula of the form $A(x) \rightarrow i: q(x)$, where • $A(x)$ is a coordination formula • $x$ has $n$ variables • $i$ is the database against which the query is posed • $q$ is a new $n$-ary predicate symbol • A relational space is a pair $\langle db, r \rangle$, where $db$ is a set of DBs and $r$ associates an $r_{ik}$ with each pair of DBs • $\langle db, r \rangle \models f$ means that the relational space $\langle db, r \rangle$ satisfies the coordination formula $f$ • The answer to a query: $\{ d \in \mathrm{dom}_i \mid \langle db, r \rangle \models (\exists (i:x).A(x) \wedge i: x = d) \}$

Interpreting a Query • A query: ((i: P(x)  j: R(y))  k: S(x,y)) → h: q(x,y) • Evaluate P, R, S in i, j, k (respectively) • Map these results via $r_{ih}$, $r_{jh}$, $r_{kh}$ to sets $s_i$, $s_j$, $s_k$ • And then compute (($s_i$  $s_j$)  $s_k$)
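The first two steps on this slide (local evaluation, then mapping through a domain relation) might look roughly like the sketch below. It does not cover the final combination step. The helper `evaluate_and_map`, the sample data, and the choices to model a domain relation as a dict and to drop values with no counterpart are all assumptions made for illustration.

```python
def evaluate_and_map(rows, predicate, domain_relation):
    """Evaluate `predicate` over local rows, then translate each value
    through `domain_relation`; rows with untranslatable values are dropped."""
    mapped = set()
    for row in rows:
        if not predicate(row):
            continue
        try:
            mapped.add(tuple(domain_relation[value] for value in row))
        except KeyError:
            pass  # the domain relation is partial: no counterpart in h
    return mapped


# Hypothetical data: DB i stores patient ids as "i-<n>", DB h as plain ints.
rows_i = [("i-1",), ("i-2",), ("i-3",)]
r_ih = {"i-1": 1, "i-2": 2}  # "i-3" has no counterpart in h

s_i = evaluate_and_map(rows_i, lambda row: True, r_ih)
```

The resulting set `s_i` is expressed entirely in the vocabulary of database h, which is what allows results from different peers to be combined there.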

P2P Databases: Proposed Solution • Coordinate query and update exchange between autonomous DBs using: • Coordination Formulas • Specify semantic interdependencies between data from two nodes • table to table: Cust corresponds to Customer • column to column: name(Cust) corresponds to nm(Customer) • Binary Domain Relations • Specify how the symbols used in one database translate to symbols used in another database • ‘one’ translates to ‘uno’ • CAN$1.00 translates to US$0.65 • Keep AUTONOMY and COORDINATION, as much as possible

What’s New in the Solution? • No global schema, no central registry, no form of control • No need of system restructuring when new nodes come and old ones go away • We do not integrate, we COORDINATE. • Integration is built at design time • coordination happens at runtime

Propagation Strategy: Basic Notions • Acquaintance • Pair of nodes that have coordination formulas and binary domain relations with respect to each other • Acquaintances can exchange data and services • Interest Group • Set of mutually acquainted nodes with related content • Group Manager (GM) • Node of an Interest Group dedicated to managing the group and query propagation • The GM has higher stability requirements and must be permanently active • Query Scope (QS) • Set of nodes that are expected to answer a given query; the QS is defined by the Group Manager

Query Propagation Strategy • 1. User submits query Q at node 1 • 2. Node 1 defines the query topic • 3. Node 1 sends the Group Manager (GM) a request to define the Query Scope (QS) for the query and topic • 4. GM computes and sends back the QS; in the figure's example, QS = (2, 4, 6, 8, 9, 11) • 5. Node 1 sends the query to its acquaintances in the QS and reports this fact to the GM (“nodes 2 and 4 are reached”) • 6. Nodes 2 and 4 send their answers (Res2, Res4) to node 1 • 7. Nodes propagate the query to their acquaintances in the QS and report this fact to the GM (“node 6 is reached”, “node 8 is reached”) • 8. And so on • 9. Nodes that do not propagate any further report this fact to the GM (“no more propagation from 8”, “no more propagation from 9”) • 10. Propagation stops when “no more propagation” has been received from all boundary nodes • [Figure: network of nodes 1 to 11 and the GM, annotated with the messages quoted above]
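The propagation steps can be simulated with a small sketch. The `GroupManager` class, the `propagate` function, and the acquaintance graph are hypothetical (the slide's figure does not give the edges in text form); only the query scope (2, 4, 6, 8, 9, 11) is taken from the slide. The simulation is sequential and collects answers in one list, whereas real peers would exchange messages asynchronously.

```python
from collections import deque


class GroupManager:
    """Tracks which nodes were reached and which have stopped propagating."""

    def __init__(self, scope):
        self.scope = set(scope)
        self.reached = set()
        self.finished = set()

    def query_scope(self):
        return set(self.scope)

    def report_reached(self, nodes):
        self.reached |= set(nodes)

    def report_no_more(self, node):
        self.finished.add(node)


def propagate(origin, acquaintances, gm):
    """Breadth-first propagation of a query from `origin` within the scope."""
    scope = gm.query_scope()
    seen = {origin}
    queue = deque([origin])
    answers = []
    while queue:
        node = queue.popleft()
        targets = [n for n in acquaintances.get(node, [])
                   if n in scope and n not in seen]
        if targets:
            gm.report_reached(targets)   # e.g. "nodes 2 and 4 are reached"
            seen.update(targets)
            queue.extend(targets)
            answers.extend(f"Res{n}" for n in targets)
        else:
            gm.report_no_more(node)      # e.g. "no more propagation from 8"
    return answers


# Hypothetical acquaintance graph; nodes 3, 5 and 10 lie outside the scope.
acquaintances = {1: [2, 3, 4], 2: [6, 9], 4: [8], 6: [11], 3: [5], 8: [10]}
gm = GroupManager(scope=[2, 4, 6, 8, 9, 11])
answers = propagate(1, acquaintances, gm)
```

In this run the boundary nodes are 8, 9 and 11. The loop ends when the queue is empty, which coincides with the GM having heard "no more propagation" from each of them.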

Implementation Architecture • A classic multi-database system, with • A protocol for adding/dropping acquaintances • LRM query processing (domain mapping logic) that can cope with chains of acquaintances • Dynamic approach to materialized view creation • Tools to help a user establish an acquaintance

Architecture • P2P Layer • Add-on that provides the P2P functionality • Local Data Source (LDS) • Database • File system • User Interface • User queries • Results • Query Manager (QM) and Update Manager (UM) • Responsible for query and update propagation • Manage coordination and correspondence rules, acquaintances, and interest groups • Wrapper • Provides a translation layer between the QM and UM on one side and the LDS on the other

Summary • Why P2P databases are different • A P2P database scenario • A logic for P2P databases (LRM) • Coordination formulas and domain relations • Query semantics • Architecture and implementation issues

