Data Management for Peer-to-Peer Computing: A Vision Ali Rahbari
Outline • P2P Data Networks • Why P2P Databases are Different • A P2P Database Scenario • A logic for P2P Databases • Propagation Strategy • Architecture and Implementation Issues
P2P Data Networks: Basic Notions • Node • Database, file system, etc. • P2P network • Indexed nodes with equal participant rights • Services • Query answering • Query, result, and update propagation • Locality • No global schema, no centralized control • Nodes have only a partial view of the world • Autonomy • Nodes are largely independent in their language, content, etc.
Roles for P2P DBs? • Peers come and go, but must still be able to interoperate. • To us, the big question is how to cope with DBs that • are incomplete, overlapping, and mutually inconsistent • dynamically appear and disappear • have limited connectivity. • Scenario • Databases of medical patients • Complete integration is likely to be infeasible • But dynamic integration of DBs relevant to one patient could have high value.
A Model for P2P Databases • Each peer is a node with a database. It exchanges data and services with acquaintances (i.e., other peers). • The set of acquaintances changes often, due to • site availability • changing usage patterns • Peers are fully autonomous. • No global control or central server.
A Motivating Scenario [Figure: peers D (Doctor), P (Pharmacist), H (Hospital), and a Ski Clinic] • A patient may be described in several DBs • But the databases can use different patient id formats, disease descriptions, etc. • When a patient is admitted to the hospital, H becomes acquainted with D • The acquaintance is dropped when treatment is over • When the doctor prescribes a drug, D becomes acquainted with P • A patient is injured skiing, so more DBs get involved (e.g., the Ski Clinic)
Proposal: Local Relational Model (LRM) • A logic for P2P data integration • Instead of a global schema, each peer has • coordination formulas – each specifies semantic interdependencies between two acquaintances • binary domain relations – each specifies how symbols in one database translate to symbols in an acquaintance’s database • Each expression in a coordination formula is relative to just one participating database • Coordination formulas and domain relations are used for query and update processing.
A Coordination Formula • p: pharmacist DB with medication(PrescriptionID, PatientID, Prod) • d: doctor DB with treatment(TreatmentID, PatientID, Description, Type), where Type ∈ {“hospital”, “home”} • $\forall(i:x).A(x)$ means that A(x) is true for all x in the domain of database i. • A coordination formula: $\forall(p:y).\forall(p:z).(p:\exists x.\,medication(x,y,z) \rightarrow d:\exists w.\,treatment(w,y,z,\text{"home"}))$ • “For each row in medication in the pharmacist DB, there is a row in treatment in the doctor DB for the same patient and product, of type ‘home’”
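One way to read this coordination formula is as a check over the two tables. The minimal Python sketch below assumes things the slides do not specify: tables held as in-memory sets of tuples, invented sample rows and ID formats, and a hypothetical domain relation r_pd, given as a set of (pharmacist value, doctor value) pairs, used to translate PatientID and Prod before comparing.

```python
# Hypothetical sample data; row values and ID formats are invented.
# Pharmacist DB p: medication(PrescriptionID, PatientID, Prod)
medication = {
    ("rx1", "P-017", "aspirin"),
    ("rx2", "P-042", "insulin"),
}
# Doctor DB d: treatment(TreatmentID, PatientID, Description, Type)
treatment = {
    ("t1", 17, "aspirin", "home"),
    ("t2", 42, "insulin", "hospital"),
}
# Hypothetical domain relation r_pd: pairs (value in p, value in d).
r_pd = {("P-017", 17), ("P-042", 42),
        ("aspirin", "aspirin"), ("insulin", "insulin")}


def images(value, relation):
    """All target-DB values related to `value` (empty if r_pd has no entry)."""
    return {target for (source, target) in relation if source == value}


def violations(medication, treatment, r_pd):
    """Medication rows with no matching 'home' treatment row in d.

    Under this sketch's reading, the formula holds iff the list is empty.
    """
    home = {(pid, desc) for (_tid, pid, desc, kind) in treatment if kind == "home"}
    bad = []
    for row in sorted(medication):
        _rx, patient, prod = row
        matched = any((pid, desc) in home
                      for pid in images(patient, r_pd)
                      for desc in images(prod, r_pd))
        if not matched:
            bad.append(row)
    return bad


print(violations(medication, treatment, r_pd))
```

Here rx2 is reported because the doctor DB records the insulin treatment with type “hospital”, not “home”.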
Domain Relation • A row $\langle d_1, d_2\rangle$ in domain relation $r_{ik}$ specifies that value $d_1$ in $DB_i$ corresponds to value $d_2$ in $DB_k$ • $r_{ik}$ may be partial • $r_{ik}$ and $r_{ki}$ need not be symmetric • Example: $DB_i$ contains lengths in meters and $DB_k$ in kilometers (total but not symmetric) • $r_{ik}(x) = \mathrm{roundToClosestK}(x)$: $r_{ik}(653) = 1$, $r_{ik}(453) = 0$ • $r_{ki}(x) = 1000x$: $r_{ki}(1) = 1000$
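The meters/kilometers example can be written as two Python functions. This is a sketch assuming non-negative integer lengths; the half-up rule for roundToClosestK is an assumption, since the slide does not say how ties are rounded.

```python
def r_ik(meters: int) -> int:
    """Meters in DB_i -> kilometers in DB_k, rounded to the closest km.

    Integer arithmetic rounds halves up (assumes meters >= 0), unlike
    Python's round(), which rounds halves to even.
    """
    return (meters + 500) // 1000


def r_ki(km: int) -> int:
    """Kilometers in DB_k -> meters in DB_i."""
    return km * 1000


print(r_ik(653), r_ik(453), r_ki(1))   # 1 0 1000

# Both directions are total, but the relation is not symmetric:
# 653 maps to 1, yet 1 maps back to 1000, not 653.
print(r_ki(r_ik(653)))                 # 1000
```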
Queries • A query is a coordination formula of the form $A(x) \rightarrow i:q(x)$, where • $A(x)$ is a coordination formula • x is a tuple of n variables • i is the database against which the query is posed • q is a new n-ary predicate symbol • A relational space is a pair $\langle db, r\rangle$, where db is a set of DBs and r associates an $r_{ik}$ with each pair of DBs • $\langle db, r\rangle \models f$: the relational space $\langle db, r\rangle$ satisfies the coordination formula f • The answer to a query: $\{d \in dom_i^n \mid \langle db, r\rangle \models \exists(i:x).(A(x) \wedge i:x = d)\}$
Interpreting a Query • A query: $((i:P(x) \wedge j:R(y)) \vee k:S(x,y)) \rightarrow h:q(x,y)$ • Evaluate P, R, S in i, j, k (respectively) • Map these results via $r_{ih}$, $r_{jh}$, $r_{kh}$ to sets $s_i$, $s_j$, $s_k$ • Then compute $(s_i \times s_j) \cup s_k$
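The evaluation steps can be sketched in Python, assuming the query's connectives are conjunction and disjunction so that the final step is (s_i × s_j) ∪ s_k. The local results and the domain relations (sets of pairs) are hypothetical; values with no image under a partial domain relation are simply dropped.

```python
def map_unary(values, rel):
    """Images of single values under a domain relation (a set of pairs)."""
    return {b for (a, b) in rel if a in values}


def map_binary(pairs, rel):
    """Images of (x, y) tuples, mapping each component through `rel`."""
    return {(bx, by)
            for (x, y) in pairs
            for (ax, bx) in rel if ax == x
            for (ay, by) in rel if ay == y}


def answer(P_i, R_j, S_k, r_ih, r_jh, r_kh):
    """Evaluate ((i:P(x) AND j:R(y)) OR k:S(x,y)) -> h:q(x,y) in DB h's symbols."""
    s_i = map_unary(P_i, r_ih)
    s_j = map_unary(R_j, r_jh)
    s_k = map_binary(S_k, r_kh)
    cross = {(x, y) for x in s_i for y in s_j}   # conjunction: x, y independent
    return cross | s_k                           # disjunction: union


# Hypothetical local results and domain relations.
P_i = {"a1", "a2"}
R_j = {"b1"}
S_k = {("c1", "c2")}
r_ih = {("a1", "x1"), ("a2", "x2")}
r_jh = {("b1", "y1")}
r_kh = {("c1", "x3"), ("c2", "y3")}

print(sorted(answer(P_i, R_j, S_k, r_ih, r_jh, r_kh)))
# [('x1', 'y1'), ('x2', 'y1'), ('x3', 'y3')]
```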
P2P Databases: Proposed Solution • Coordinate query and update exchange between autonomous DBs using: • Coordination Formulas • Specify semantic interdependencies between data from two nodes • table to table: Cust ↔ Customer • column to column: name(Cust) ↔ nm(Customer) • Binary Domain Relations • Specify how the symbols used in one database translate to symbols used in another database • ‘one’ → ‘uno’ • CAN$1.00 → US$0.65 • Keep AUTONOMY and COORDINATION, as much as possible
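A minimal Python sketch of applying a column correspondence and a value mapping together when moving a Cust row into Customer's vocabulary. Rows are assumed to be dicts; only name to nm and the CAN$1.00 to US$0.65 rate come from the slide, while the balance/bal columns and rounding to whole cents are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Column correspondences Cust -> Customer. name -> nm is from the slide;
# balance -> bal is a hypothetical extra column.
COLUMN_MAP = {"name": "nm", "balance": "bal"}

# Fixed rate from the slide's example: CAN$1.00 -> US$0.65.
CAD_TO_USD = Decimal("0.65")


def cad_to_usd(amount: Decimal) -> Decimal:
    """Value mapping for money, rounded to whole cents (assumed)."""
    return (amount * CAD_TO_USD).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def identity(value):
    return value


# Per-column value translators; mapped columns not listed keep their values.
VALUE_MAP = {"balance": cad_to_usd}


def translate_row(row: dict) -> dict:
    """Rewrite a Cust row using Customer's column names and value symbols.

    Columns without a correspondence are dropped, since the acquaintance
    has no counterpart for them.
    """
    out = {}
    for column, value in row.items():
        if column in COLUMN_MAP:
            translate = VALUE_MAP.get(column, identity)
            out[COLUMN_MAP[column]] = translate(value)
    return out


print(translate_row({"name": "Ann", "balance": Decimal("100.00"), "phone": "555-0100"}))
# {'nm': 'Ann', 'bal': Decimal('65.00')}
```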
What’s New in the Solution? • No global schema, no central registry, no form of central control • No need for system restructuring when new nodes come and old ones go away • We do not integrate, we COORDINATE. • Integration is built at design time • Coordination happens at runtime
Propagation Strategy: Basic notions • Acquaintance • Pair of nodes that have coordination formulas and binary domain relations with respect to each other • Acquaintances can exchange data and services • Interest Group • Set of nodes with related content that are linked by acquaintances • Group Manager (GM) • Node of an Interest Group that is dedicated to group management and query propagation management • The GM has higher stability requirements and must be permanently active • Query Scope • Set of nodes that are supposed to answer a given query; the Query Scope is defined by the Group Manager
Query Propagation Strategy [Figure: Group Manager (GM) and nodes 1 to 11; the GM returns the query scope QS = (2, 4, 6, 8, 9, 11); node 1 receives answers Res2 and Res4; reports to the GM include “nodes 2 and 4 are reached”, “node 6 is reached”, “node 8 is reached”, “no more propagation from 8”, “no more propagation from 9”] • User submits query Q • Node defines query topic • Node sends the Group Manager (GM) a request to define the Query Scope (QS) • GM computes and sends back QS • Node 1 sends the query to its acquaintances in QS and reports this fact to GM • Nodes 2 and 4 send answers to node 1 • Nodes propagate the query to their acquaintances in QS and report this fact to GM • And so on… • Nodes that do not propagate any further report this fact to GM • Propagation stops when “no more propagation” has been received from all boundary nodes
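The protocol can be approximated by a single-process Python simulation. Everything here is an illustrative assumption: the acquaintance graph is invented (the figure's edges are not recoverable), the GM returns a fixed scope instead of choosing nodes by topic, and all answers are collected at the initiator, whereas the slide only states that nodes 2 and 4 answer node 1. The GM's termination test follows the slide: stop once every node holding the query has reported either “reached” or “no more propagation”.

```python
from collections import deque

# Hypothetical acquaintance graph (edges invented for illustration).
ACQUAINTANCES = {
    1: {2, 3, 4}, 2: {1, 6, 10}, 3: {1, 5}, 4: {1, 8, 11},
    5: {3}, 6: {2, 7, 9}, 7: {6}, 8: {4}, 9: {6}, 10: {2}, 11: {4},
}


class GroupManager:
    """Hypothetical GM: hands out the query scope and tracks reports."""

    def __init__(self, scope):
        self.scope = set(scope)
        self.expected = set()   # nodes known to hold the query
        self.reported = set()   # nodes that sent any report
        self.boundary = set()   # nodes that reported "no more propagation"

    def query_scope(self, query, topic):
        # A real GM would select nodes by topic; this sketch uses a fixed scope.
        return set(self.scope)

    def start(self, initiator):
        self.expected.add(initiator)

    def report_reached(self, sender, nodes):
        self.reported.add(sender)
        self.expected |= nodes

    def report_no_more(self, sender):
        self.reported.add(sender)
        self.boundary.add(sender)

    def done(self):
        # Over once every node holding the query has reported.
        return self.expected <= self.reported


def propagate(initiator, query, topic, graph, gm):
    """Simulate scoped propagation; returns {node: answer} at the initiator."""
    scope = gm.query_scope(query, topic)
    gm.start(initiator)
    seen = {initiator}
    answers = {}
    frontier = deque([initiator])
    while frontier:
        node = frontier.popleft()
        targets = {n for n in graph[node] if n in scope and n not in seen}
        if targets:
            seen |= targets
            gm.report_reached(node, targets)
            for t in sorted(targets):
                answers[t] = f"Res{t}"   # stand-in for t's local answer
                frontier.append(t)
        else:
            gm.report_no_more(node)
    return answers


gm = GroupManager(scope={2, 4, 6, 8, 9, 11})
answers = propagate(1, "Q", "topic", ACQUAINTANCES, gm)
print(sorted(answers), sorted(gm.boundary), gm.done())
# [2, 4, 6, 8, 9, 11] [8, 9, 11] True
```

Nodes 3, 5, 7, and 10 are never contacted because they lie outside the query scope, which is the point of having the GM compute QS before propagation starts.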
Implementation Architecture • A classic multi-database system, with • A protocol for adding/dropping acquaintances • LRM query processing (domain mapping logic) that can cope with chains of acquaintances • Dynamic approach to materialized view creation • Tools to help a user establish an acquaintance
Architecture • P2P Layer • Add-on providing P2P functionality • Local Data Source (LDS) • Database • File system • User Interface • User queries • Results • Query Manager (QM) and Update Manager (UM) • Responsible for query and update propagation • Manage coordination and correspondence rules, acquaintances, and interest groups • Wrapper • Provides a translation layer between the QM/UM and the LDS
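A rough Python outline of how the Wrapper and Query Manager might fit together, including adding and dropping acquaintances. All class and method names are hypothetical, the local data source is an in-memory dict, and the sketch queries acquaintances only one hop away; chains of acquaintances, update propagation, and interest groups are not modeled.

```python
from abc import ABC, abstractmethod


class Wrapper(ABC):
    """Translation layer between the Query/Update Managers and a Local Data Source."""

    @abstractmethod
    def fetch(self, predicate: str) -> set:
        """Return the local tuples for `predicate` (hypothetical interface)."""


class InMemoryWrapper(Wrapper):
    """Wraps a 'database' held as a dict of predicate name -> set of tuples."""

    def __init__(self, tables: dict):
        self.tables = tables

    def fetch(self, predicate: str) -> set:
        return set(self.tables.get(predicate, set()))


class QueryManager:
    """Answers queries locally and from direct acquaintances (one hop only)."""

    def __init__(self, wrapper: Wrapper):
        self.wrapper = wrapper
        self.acquaintances = {}   # name -> (peer QueryManager, tuple mapping)

    def add_acquaintance(self, name, peer, mapping):
        self.acquaintances[name] = (peer, mapping)

    def drop_acquaintance(self, name):
        self.acquaintances.pop(name, None)

    def answer(self, predicate: str) -> set:
        result = self.wrapper.fetch(predicate)
        for peer, mapping in self.acquaintances.values():
            # Translate the peer's tuples into this node's symbols.
            result |= {mapping(t) for t in peer.wrapper.fetch(predicate)}
        return result


doctor = QueryManager(InMemoryWrapper({"prescribed": {(17, "aspirin")}}))
pharmacist = QueryManager(InMemoryWrapper({"prescribed": {("P-042", "insulin")}}))

# Hypothetical mapping: pharmacist patient IDs like "P-042" -> doctor IDs like 42.
doctor.add_acquaintance("P", pharmacist, lambda t: (int(t[0][2:]), t[1]))
print(sorted(doctor.answer("prescribed")))   # [(17, 'aspirin'), (42, 'insulin')]

doctor.drop_acquaintance("P")
print(sorted(doctor.answer("prescribed")))   # [(17, 'aspirin')]
```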
Summary • Why P2P databases are different • A P2P database scenario • A logic for P2P databases (LRM) • Coordination formulas and domain relations • Query semantics • Architecture and implementation issues