The role of a Mediator in R-GMA
Manfred Oevers (IBM), Andrew Cooke (Heriot Watt), Laurence Field (RAL), Steve Fisher (RAL), James Magowan (IBM), Werner Nutt (Heriot Watt), Howard Williams (Heriot Watt)
Schema & Contributions
Contributions are Views
SELECT * FROM CPULoad WHERE Country = 'UK' AND Site = 'RAL'
SELECT * FROM CPULoad WHERE Country = 'UK' AND Site = 'GLA'
The Scenario
G: a relational schema (for a virtual database)
q: queries posed against G
p: producers, associated with views on G
Currently views have the form: SELECT * FROM r WHERE < ??? >
The Mediator: how to match q with the p’s
A Concise Notation
• CREATE TABLE cpuLoad(Loc,M,L)
• SELECT Loc,M FROM cpuLoad WHERE Loc='RAL' AND L >= 70
• (RAL,M) | cpuLoad(RAL,M,L) & L >= 70
Satisfiability
The Problem: “For all locations give me all machines with a cpu load L >= 70”
q: (Loc, M) | cpuLoad(Loc, M, L) & L >= 70
p1: (ral, M, L) | cpuLoad(ral, M, L) & L >= 80
p2: (hw, M, L) | cpuLoad(hw, M, L) & L >= 50
p3: (gla, M, L) | cpuLoad(gla, M, L) & L <= 20
The Query Plan:
(Loc, M) | p1(Loc, M, L) U (Loc, M) | p2(Loc, M, L) & L >= 70
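The reasoning behind this plan can be sketched in a few lines of Python. This is only an illustration, not R-GMA code: the `View` class and `plan_for_min_load` function are hypothetical names, and each producer view is assumed to restrict L to a single closed interval. A view is dropped when its condition combined with the query's condition is unsatisfiable, and it needs an extra selection when its condition is weaker than the query's.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class View:
    """Hypothetical producer view: cpuLoad rows for one location with lo <= L <= hi."""
    name: str
    loc: str
    lo: float = -math.inf
    hi: float = math.inf


def plan_for_min_load(views, threshold):
    """Return (view name, needs_filter) for each view relevant to 'L >= threshold'."""
    plan = []
    for v in views:
        if v.hi < threshold:
            # View condition AND (L >= threshold) is unsatisfiable: skip the view.
            continue
        # If the view may hold rows below the threshold, a selection is still needed.
        needs_filter = v.lo < threshold
        plan.append((v.name, needs_filter))
    return plan


producers = [
    View("p1", "ral", lo=80),  # L >= 80
    View("p2", "hw", lo=50),   # L >= 50
    View("p3", "gla", hi=20),  # L <= 20
]

print(plan_for_min_load(producers, 70))
# [('p1', False), ('p2', True)]
```

As on the slide, p1 is used as is, p2 is used with the extra selection L >= 70, and p3 is left out.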
Satisfiability (issues)
Implementation:
• What are suitable sources? This involves checking satisfiability of constraints - a task for the Registry?
• Who computes “load L >= 70”? The Mediator? Or the Producer?
• What are the capabilities of a Producer? Which are relevant? Where are these recorded?
Completeness
The Problem: “Find all machines that are not in USA and have diskspace S > 100”
q: M | DiskSpace(M, S) & S > 100 & NOT InUSA(M)
p1: (M, S) | DiskSpace(M, S)
p2: M | InUSA(M)
The Query Plan:
M | p1(M, S) & S > 100 & NOT p2(M)
Completeness (issues)
Implementation:
• What if p1 doesn’t know about all machines? We might not get all answers for our query (“incompleteness”).
• What if p2 doesn’t know about all US machines? We might get answers that don’t satisfy our query (“incorrect” answers).
• What is the yardstick for completeness?
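Both failure modes can be seen on a toy data set. The sketch below is illustrative only: the machine names and disk space values are invented, and `run_plan` is a hypothetical helper that evaluates the query plan over whatever the two producers happen to publish.

```python
def run_plan(p1_rows, p2_machines, min_space=100):
    """Evaluate M | p1(M, S) & S > min_space & NOT p2(M) over the producers' data."""
    return {m for m, s in p1_rows if s > min_space and m not in p2_machines}


# Invented "ground truth" for illustration only.
all_disk_space = {("a", 150), ("b", 200), ("c", 300), ("d", 50)}
all_us_machines = {"b"}

expected = run_plan(all_disk_space, all_us_machines)
print(sorted(expected))  # ['a', 'c']

# p1 does not know about machine c: an answer is lost (incompleteness).
partial_p1 = {("a", 150), ("b", 200), ("d", 50)}
print(sorted(run_plan(partial_p1, all_us_machines)))  # ['a']

# p2 does not know that b is in the USA: a wrong answer appears (incorrectness).
partial_p2 = set()
print(sorted(run_plan(all_disk_space, partial_p2)))  # ['a', 'b', 'c']
```

The asymmetry comes from the negation: missing data in p1 only removes answers, while missing data in p2 adds answers that should have been excluded.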
Projection Views (1)
Popular queries stored by an Archiver ar may involve projection, e.g. “all machines with disk space S >= 50”
ar: M | DiskSpace(M, S) & S >= 50
The Problem: “get all machines with S >= 30”
q: M | DiskSpace(M, S) & S >= 30
Can we compute answers for q, even though no diskspace values are stored?
Projection Views (2)
Query Plan:
• In all possible instances of this database, machines stored in ar have diskspace S >= 50
• Thus, ar provides certain answers to query q
What if the values 50 / 30 are swapped?
Projection Views (3)
“all machines with disk space S >= 30”
ar: M | DiskSpace(M, S) & S >= 30
The Problem: “get all machines with S >= 50”
q: M | DiskSpace(M, S) & S >= 50
• In some instances, all machines in ar will be correct answers to q… in others, not.
• Thus, ar would not provide certain answers.
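For this particular pair of view and query, both of the form “S >= threshold”, the two cases reduce to comparing the thresholds. A minimal sketch, assuming the archiver stores only machine names and that `certain_answers` is a hypothetical helper rather than part of R-GMA:

```python
def certain_answers(archived_machines, view_min, query_min):
    """Machines guaranteed to satisfy S >= query_min.

    The archiver holds only machine names for which S >= view_min was true;
    the S values themselves are not stored.
    """
    if view_min >= query_min:
        # Every archived machine has S >= view_min >= query_min in all instances.
        return set(archived_machines)
    # Some instance may have view_min <= S < query_min, so nothing is guaranteed.
    return set()


ar = {"m1", "m2"}  # invented machine names
print(sorted(certain_answers(ar, view_min=50, query_min=30)))  # ['m1', 'm2']
print(sorted(certain_answers(ar, view_min=30, query_min=50)))  # []
```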
Computing certain answers can be costly (1)
[Figure: Link(x,y) graph over the machines ral007, ibm747, hw666 and gla999, with Diskspace labels 24, 90, ? (unknown) and 10]
The Problem:
q: X | Link(X,Y) & DiskSpace(Y, S1) & S1 >= 50 & Link(Y,Z) & DiskSpace(Z, S2) & S2 < 50
Is ral007 a certain answer?
Computing certain answers can be costly (2)
The Problem: “Find all machines that are linked to another with a diskspace >= 50, which is in turn linked to one with a diskspace < 50.”
q: X | Link(X,Y) & DiskSpace(Y, S1) & S1 >= 50 & Link(Y,Z) & DiskSpace(Z, S2) & S2 < 50
Is ral007 a certain answer?
The Answer: It is! But we have to reason about all cases...
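The case analysis can be made concrete with a small sketch. Note the assumptions: the edges of the Link graph and the pairing of disk space values with machines are not recoverable from the slide text, so the `LINKS` and `KNOWN` data below are hypothetical and chosen only so that the case reasoning is visible. Because the query compares disk space only against 50, one representative value on each side of that threshold covers every possible value of the unknown.

```python
# ASSUMPTION: hypothetical link structure and value pairing, for illustration only.
LINKS = {
    ("ral007", "ibm747"),
    ("ral007", "hw666"),
    ("ibm747", "hw666"),
    ("hw666", "gla999"),
}
KNOWN = {"ral007": 24, "ibm747": 90, "gla999": 10}
UNKNOWN = "hw666"


def answers(space):
    """Evaluate q: X | Link(X,Y) & S1 >= 50 & Link(Y,Z) & S2 < 50 on one instance."""
    return {
        x
        for (x, y) in LINKS
        for (y2, z) in LINKS
        if y == y2 and space[y] >= 50 and space[z] < 50
    }


def certain_answers():
    """Answers that hold whatever the unknown disk space turns out to be."""
    cases = [49, 50]  # one representative below the threshold, one at or above it
    results = [answers({**KNOWN, UNKNOWN: value}) for value in cases]
    return set.intersection(*results)


print(certain_answers())  # {'ral007'}
```

Under these assumptions ral007 qualifies in both cases, but through different machines: via ibm747 when hw666 is below 50, and via hw666 when it is not. No single case yields the result on its own, which is what makes certain answers expensive to compute.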
Early Conclusions (1)
First Problem: Semantics
• What are the answers we expect from our queries? Certain answers? A subset of these?
• So far we have not looked at time, which will raise further questions.
• We need to clarify what producer views mean. (Completeness? To what degree?)
Semantics are not too difficult when there are no projection views (or aggregation).
Query planning techniques exist for special cases, e.g. select/project/join views and queries without comparisons (<, >, …).
Early Conclusions (2)
The Mediator needs Helpers
• Who decides which sources are relevant for a query? The Registry? The Mediator? (but higher network load)
• Can Producers do selections? Joins? (several producers may be attached to one DBMS)
Early Conclusions (3)
What will the Mediator do?
• Construct a set of logical plans (a logical plan = a query over some producers)
• Identify logical plans that are feasible (e.g. input bindings: “no phone no. without a name”)
• Construct an execution plan: which concrete operations, when (e.g. selection, sort-merge join, ...); joining becomes complex!
• Choose the best / cheapest plan
• Execute the plan
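The feasibility step with input bindings can be sketched as follows. This is an illustration under stated assumptions, not the Mediator's actual design: the `SOURCES` table, its attribute names and the helper functions are all hypothetical. Each source declares which attributes must already be bound before it can be called and which attributes it returns, so a phone lookup that needs a name is only feasible after a source that supplies names.

```python
from itertools import permutations

# ASSUMPTION: hypothetical sources, as name -> (required inputs, provided outputs).
SOURCES = {
    "names": (set(), {"name"}),       # can be called with no bound inputs
    "phones": ({"name"}, {"phone"}),  # "no phone no. without a name"
}


def is_feasible(order, initially_bound=()):
    """Check that every source in the order has its required inputs bound when called."""
    bound = set(initially_bound)
    for source in order:
        required, provided = SOURCES[source]
        if not required <= bound:
            return False
        bound |= provided
    return True


def feasible_orders(sources):
    """Enumerate all orderings of the sources and keep the feasible ones."""
    return [order for order in permutations(sources) if is_feasible(order)]


print(feasible_orders(["names", "phones"]))
# [('names', 'phones')]
```

Enumerating all orderings is exponential in the number of sources, so this only works for very small plans. Choosing the cheapest plan would additionally need a cost model, which is left out here.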