The Rise and Fall and Rise of Dependency Theory Part II: The Rise from the Ashes

Ronald Fagin, IBM Almaden Research Center

Dependencies were Considered Harmful
• Dependencies were undesirable
• Except for keys and referential integrity constraints
• Database normalization eliminated dependencies
• BCNF: each FD is a logical consequence of keys
• 4NF: each MVD is a logical consequence of keys
• 5NF: each JD is a logical consequence of keys
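The BCNF condition above can be tested mechanically. The following is a minimal sketch, assuming the standard reformulation that an FD X → Y follows from the keys exactly when it is trivial or X is a superkey. The function names `closure` and `is_bcnf` and the encoding of FDs as pairs of frozensets are illustrative choices, not part of the talk.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD when its left side is covered and it adds something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_bcnf(schema, fds):
    """True if every nontrivial given FD has a superkey of schema on its left side."""
    for lhs, rhs in fds:
        if rhs <= lhs:  # trivial FD, always allowed
            continue
        if closure(lhs, fds) != set(schema):
            return False
    return True


# Hypothetical schema R(A, B, C) with A -> B and B -> C: B is not a superkey.
chain = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print(is_bcnf("ABC", chain))
# With only A -> BC, the single FD has the key A on its left side.
print(is_bcnf("ABC", [(frozenset("A"), frozenset("BC"))]))
```

The sketch checks only the FDs it is given, which is enough when testing a whole schema against its own FD set.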

But then: • Dependencies took on a new, very positive role!

Data Integration and Data Exchange
Data integration: Describe data in a global schema in terms of data in local schemas
Data exchange: Describe data in a target schema in terms of data in a source schema, and actually produce the target database

Data Integration and Data Exchange
These are old, but recurrent, database problems
• Phil Bernstein – 2003: “Data exchange is the oldest database problem”
• EXPRESS: IBM San Jose Research Lab – 1977
  • for transforming data between hierarchical databases
• The universal relation model is an early case of data integration
We will focus mainly on data exchange

Schema Mappings & Data Exchange
[Diagram: source schema S and target schema T related by Σ; source instance I, target instance J]
• Schema Mapping M = (S, T, Σ)
  • Source schema S, Target schema T
  • High-level, declarative assertions Σ that specify the relationship between S and T
• Data Exchange via the schema mapping M = (S, T, Σ): Transform a given source instance I to a target instance J, so that <I, J> satisfy the specifications Σ of M

Schema Mapping Specification Language
The relationship between source and target is typically given by source-to-target tgds
φ(x) → ∃y ψ(x, y)
where
• φ(x) is a conjunction of atoms over the source
• ψ(x, y) is a conjunction of atoms over the target
Example: (Student(s) ∧ Enrolls(s,c)) → ∃t ∃g (Teaches(t,c) ∧ Grade(s,c,g))
There may also be target tgds and egds:
(Grade(s,c,g) ∧ Grade(s,c,g’)) → (g = g’)
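To make the meaning of a source-to-target tgd concrete, here is a minimal sketch of checking whether a pair <I, J> satisfies one. The encoding is an assumption made for illustration: an instance is a set of facts `(relation, values)`, an atom is `(relation, variables)`, and the helper names `matches` and `satisfies` are hypothetical.

```python
def matches(atoms, instance, binding=None):
    """Yield every extension of binding that maps each atom onto a fact of instance."""
    binding = dict(binding or {})
    if not atoms:
        yield binding
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for fact_rel, values in instance:
        if fact_rel != rel or len(values) != len(args):
            continue
        new = dict(binding)
        # setdefault binds a fresh variable or returns the value it already has.
        if all(new.setdefault(a, v) == v for a, v in zip(args, values)):
            yield from matches(rest, instance, new)


def satisfies(source, target, body, head):
    """True if every match of body in source extends to a match of head in target."""
    for b in matches(body, source):
        if next(matches(head, target, b), None) is None:
            return False
    return True


# The Student/Enrolls tgd from the slide; t and g are the existential variables.
body = [("Student", ("s",)), ("Enrolls", ("s", "c"))]
head = [("Teaches", ("t", "c")), ("Grade", ("s", "c", "g"))]
I = {("Student", ("ann",)), ("Enrolls", ("ann", "db"))}
J_good = {("Teaches", ("bob", "db")), ("Grade", ("ann", "db", "A"))}
J_bad = {("Teaches", ("bob", "db"))}
print(satisfies(I, J_good, body, head), satisfies(I, J_bad, body, head))
```

The sample data (ann, bob, db) is invented for the example. Atoms in this sketch contain only variables; constants in atoms would need an extra case.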

New Role of Dependencies • In data exchange, dependencies play a crucial role in describing how to transform data from one format to another

Solutions in Schema Mappings
Definition: Schema Mapping M = (S, T, Σ). If I is a source instance, then a solution for I is a target instance J such that <I, J> satisfy Σ
Fact: In general, for a given source instance I,
• there may be no solutions at all, or
• there may be multiple solutions; in fact there may be infinitely many solutions

Universal Solutions in Data Exchange
• [Fagin, Kolaitis, Miller, Popa – ICDT 2003] introduced universal solutions as the “best” solutions in data exchange
• By definition, a solution is universal if it has homomorphisms to all other solutions
• Thus, it is a “most general” solution
• Constants: entries in source instances
• Variables (labeled nulls): entries besides constants in target instances
• Homomorphism h: J1 → J2 between target instances:
  • h(c) = c, if c is a constant
  • If P(a1,…,am) is in J1, then P(h(a1),…,h(am)) is in J2
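The homomorphism definition above can be sketched as a small backtracking search. This is an illustration under stated assumptions: facts are `(relation, values)` pairs, and labeled nulls are strings starting with an underscore (the `is_null` convention and the name `find_homomorphism` are hypothetical). The search is exponential in the worst case.

```python
def is_null(value):
    """Assumed convention: labeled nulls are strings that start with '_'."""
    return isinstance(value, str) and value.startswith("_")


def find_homomorphism(j1, j2):
    """Return a dict mapping the nulls of j1 into j2, or None if no homomorphism exists."""
    facts = list(j1)

    def extend(i, h):
        if i == len(facts):
            return h
        rel, args = facts[i]
        for rel2, vals in j2:
            if rel2 != rel or len(vals) != len(args):
                continue
            new, ok = dict(h), True
            for a, v in zip(args, vals):
                if is_null(a):
                    ok = new.setdefault(a, v) == v  # nulls map consistently
                else:
                    ok = a == v                     # h(c) = c for constants
                if not ok:
                    break
            if ok:
                result = extend(i + 1, new)
                if result is not None:
                    return result
        return None

    return extend(0, {})


# A solution with nulls maps into a fully concrete one, but not the other way around.
general = {("Teaches", ("_t", "db")), ("Grade", ("ann", "db", "_g"))}
concrete = {("Teaches", ("bob", "db")), ("Grade", ("ann", "db", "A"))}
print(find_homomorphism(general, concrete))
print(find_homomorphism(concrete, general))
```

In this example the instance with nulls is the "more general" of the two, matching the intuition on the slide.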

How to Obtain a Universal Solution?
• Answer: Use our old friend the chase!
Theorem [Fagin, Kolaitis, Miller, Popa – ICDT 2003]: If there is a solution, then the chase produces a universal solution
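A minimal sketch of the chase follows, restricted to source-to-target tgds only. Target tgds and egds, which the theorem also covers, are deliberately left out. The fact encoding, the `_N1, _N2, …` naming of fresh nulls, and the function names are assumptions made for illustration.

```python
import itertools


def matches(atoms, instance, binding=None):
    """Yield every extension of binding that maps each atom onto a fact of instance."""
    binding = dict(binding or {})
    if not atoms:
        yield binding
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for fact_rel, values in instance:
        if fact_rel != rel or len(values) != len(args):
            continue
        new = dict(binding)
        if all(new.setdefault(a, v) == v for a, v in zip(args, values)):
            yield from matches(rest, instance, new)


def chase(source, tgds):
    """Chase source with s-t tgds given as (body, head) pairs; returns a target instance."""
    target = set()
    fresh = itertools.count(1)
    for body, head in tgds:
        for b in matches(body, source):
            if next(matches(head, target, b), None) is not None:
                continue  # the head is already satisfied for this match
            ext = dict(b)
            for _, args in head:
                for a in args:
                    if a not in ext:  # existential variable: invent a labeled null
                        ext[a] = f"_N{next(fresh)}"
            target |= {(rel, tuple(ext[a] for a in args)) for rel, args in head}
    return target


tgd = ([("Student", ("s",)), ("Enrolls", ("s", "c"))],
       [("Teaches", ("t", "c")), ("Grade", ("s", "c", "g"))])
I = {("Student", ("ann",)), ("Enrolls", ("ann", "db"))}
print(chase(I, [tgd]))
```

With s-t tgds alone a single pass suffices, because chase steps never add source facts and so cannot trigger new matches of a tgd body.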

Standard schema mappings
• [Fagin, Kolaitis, Miller, Popa – ICDT 2003] define a weakly acyclic set of tgds
• [Deutsch, Tannen – ICDT 2003] have a slightly more restrictive notion
• Let a standard schema mapping be one specified by s-t tgds, target egds, and a weakly acyclic set of target tgds.
Theorem [Fagin, Kolaitis, Miller, Popa – ICDT 2003]: For standard schema mappings, the chase runs in polynomial time (data complexity)

Query Answering in Data Exchange
[Diagram: Schema S and Schema T related by Σ; source instance I, target instance J; query q posed over T]
Question: What is the semantics of target query answering?
Definition: The certain answers of a query q over T on I:
certain(q,I) = ∩ { q(J) : J is a solution for I }
Note: It is the standard semantics in data integration

Computing the Certain Answers
Theorem [Fagin, Kolaitis, Miller, Popa – ICDT 2003]: Assume a standard schema mapping. Let q be a union of conjunctive queries over the target.
• If I is a source instance and J is a universal solution for I: certain(q,I) = the set of all “null-free” tuples in q(J).
• Hence, certain(q,I) is computable in polynomial time
  • Compute a universal solution J, using the chase, in polynomial time
  • Evaluate q(J) and remove tuples with nulls
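The last step of this recipe, evaluating q on a universal solution and discarding tuples with nulls, can be sketched as follows. The encoding is assumed for illustration: a union of conjunctive queries is a list of `(head_variables, body_atoms)` pairs, nulls are strings starting with an underscore, and the universal solution J is taken as given rather than computed.

```python
def is_null(value):
    """Assumed convention: labeled nulls are strings that start with '_'."""
    return isinstance(value, str) and value.startswith("_")


def matches(atoms, instance, binding=None):
    """Yield every extension of binding that maps each atom onto a fact of instance."""
    binding = dict(binding or {})
    if not atoms:
        yield binding
        return
    (rel, args), rest = atoms[0], atoms[1:]
    for fact_rel, values in instance:
        if fact_rel != rel or len(values) != len(args):
            continue
        new = dict(binding)
        if all(new.setdefault(a, v) == v for a, v in zip(args, values)):
            yield from matches(rest, instance, new)


def certain_answers(queries, universal_solution):
    """Evaluate a union of conjunctive queries on J and keep only the null-free tuples."""
    answers = set()
    for head_vars, body in queries:
        for b in matches(body, universal_solution):
            answers.add(tuple(b[v] for v in head_vars))
    return {t for t in answers if not any(is_null(v) for v in t)}


# J is a universal solution for a source where ann is enrolled in db.
J = {("Teaches", ("_N1", "db")), ("Grade", ("ann", "db", "_N2"))}
who_takes_what = [(("s", "c"), [("Grade", ("s", "c", "g"))])]
who_teaches = [(("t",), [("Teaches", ("t", "c"))])]
print(certain_answers(who_takes_what, J))
print(certain_answers(who_teaches, J))
```

The second query has no certain answers here: some teacher exists in every solution, but no particular name is shared by all of them.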

Composing Schema Mappings
[Diagram: Schema S1 → Schema S2 via M12, Schema S2 → Schema S3 via M23; M13 maps Schema S1 directly to Schema S3]
• Given M12 = (S1, S2, Σ12) and M23 = (S2, S3, Σ23), derive a schema mapping M13 = (S1, S3, Σ13) that is “equivalent” to the sequence M12 and M23
What does it mean for M13 to be “equivalent” to the composition of M12 and M23?

Semantics of Composition
Σ13 has to have the property that:
<I1,I3> ⊨ Σ13 if and only if there exists I2 such that <I1,I2> ⊨ Σ12 and <I2,I3> ⊨ Σ23

Result of the composition
• Question: If M12 and M23 are each specified by s-t tgds, what language is needed for specifying the composition of M12 and M23?
• Answer: [Fagin, Kolaitis, Popa, Tan – PODS 2004]: second-order tgds

Second-Order Tgds
Definition: Let S be a source schema and T a target schema. A second-order tuple-generating dependency (SO-tgd) is a formula of the form:
∃f1 … ∃fm ( ∀x1 (φ1 → ψ1) ∧ … ∧ ∀xn (φn → ψn) ), where
• each fi is a function symbol
• each φi is a conjunction of atoms over S and equalities of terms
• each ψi is a conjunction of atoms from T
Example: ∃f ( ∀e (Emp(e) → Mgr(e, f(e))) ∧ ∀e ((Emp(e) ∧ (e = f(e))) → SelfMgr(e)) )

Composition and SO-Tgds
Theorem [Fagin, Kolaitis, Popa, Tan – PODS 2004]:
• The composition of any finite sequence of schema mappings specified by s-t tgds can be specified by an SO-tgd
• Conversely, every SO-tgd specifies the composition of a finite sequence of mappings that are each specified by s-t tgds.
• Recently [Arenas, Fagin, Nash – ICDT 2010] showed that the sequence need only be of size 2

Composition with Target Constraints
• [Arenas, Fagin, Nash – ICDT 2010] defined s-t SO dependencies, which generalize SO tgds by allowing not only target atoms but also equalities in the conclusion
• Theorem [Arenas, Fagin, Nash – ICDT 2010]:
  • The composition of any finite sequence of standard schema mappings can be specified by an s-t SO dependency (along with target egds and target tgds)
  • Conversely, every s-t SO dependency specifies the composition of a finite sequence of standard schema mappings
  • In fact, again, the sequence need only be of size 2
• The chase procedure can be extended to schema mappings specified by s-t SO dependencies, so that it produces universal solutions in polynomial time (data complexity)

Conclusions
• Dependencies now play a crucial role in data integration and data exchange
• We even have second-order dependencies, which have in fact been implemented in IBM InfoSphere Data Architect.
• Dependency theory is alive and well!

Extra slides

The Smallest Universal Solution
• Fact: Universal solutions need not be unique
• Question: Is there a “best” universal solution?
• Answer: [Fagin, Kolaitis, Popa – PODS 2003] took a “small is beautiful” approach: There is a smallest universal solution (if solutions exist); hence, the most compact one to materialize
• Definition: The core of an instance J is the smallest subinstance J’ that is homomorphically equivalent to J
• Fact:
  • Every finite relational structure has a core
  • The core is unique up to isomorphism
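The definition of the core suggests a simple, if inefficient, procedure: keep dropping a fact as long as the current instance still has a homomorphism into what remains. The sketch below is a brute-force illustration with exponential worst-case cost, not one of the polynomial-time algorithms from the cited papers. The fact encoding, the underscore convention for nulls, and the function names are assumptions.

```python
def is_null(value):
    """Assumed convention: labeled nulls are strings that start with '_'."""
    return isinstance(value, str) and value.startswith("_")


def find_homomorphism(j1, j2):
    """Return a dict mapping the nulls of j1 into j2, or None if no homomorphism exists."""
    facts = list(j1)

    def extend(i, h):
        if i == len(facts):
            return h
        rel, args = facts[i]
        for rel2, vals in j2:
            if rel2 != rel or len(vals) != len(args):
                continue
            new, ok = dict(h), True
            for a, v in zip(args, vals):
                ok = (new.setdefault(a, v) == v) if is_null(a) else (a == v)
                if not ok:
                    break
            if ok:
                result = extend(i + 1, new)
                if result is not None:
                    return result
        return None

    return extend(0, {})


def core(instance):
    """Greedily remove facts while the instance still maps homomorphically into the rest."""
    current = set(instance)
    shrunk = True
    while shrunk:
        shrunk = False
        for fact in list(current):
            smaller = current - {fact}
            if find_homomorphism(current, smaller) is not None:
                current, shrunk = smaller, True
                break
    return current


# Teaches(_N1, db) is redundant: _N1 can be mapped to the constant bob.
J = {("Teaches", ("_N1", "db")), ("Teaches", ("bob", "db")), ("Grade", ("ann", "db", "_N2"))}
print(core(J))
```

Each removal keeps the instance homomorphically equivalent to the original, and the loop stops only when no proper subinstance admits a homomorphism, which is the defining property of a core.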

Core: The smallest universal solution
Theorem [Fagin, Kolaitis, Popa – PODS 2003]:
• All universal solutions have the same core
• The core of the universal solutions is the smallest universal solution
• If the target constraints are egds, then the core is polynomial-time computable (data complexity)
Theorem [Gottlob and Nash – PODS 2006]: If the target constraints are egds and a weakly acyclic set of tgds, then the core is polynomial-time computable

Old Conclusions
• Dependencies now play a crucial role in data integration and data exchange
• We even have second-order dependencies, which have in fact been implemented in practice!
• Lately, even probabilistic dependencies have been studied
  • [Dong, Halevy, Yu – VLDB 2007]
  • [Das Sarma, Dong, Halevy – SIGMOD 2008]
  • [Fagin, Kimelfeld, Kolaitis – ICDT 2010]
    • Probabilistic dependencies on probabilistic databases
• Dependency theory is alive and well!
