280 likes | 412 Vues
The paper introduces LIVE, a novel Database Management System (DBMS) that supports uncertainty and lineage through an Uncertainty-Lineage Database (ULDB) model. It discusses the Trio system's architecture, query evaluation techniques, and the implications of versioning in databases. Key topics include handling uncertain data via tuple alternatives and confidence annotations, as well as implementing data modifications (inserts, deletes, updates) while preserving lineage and versioning integrity. The paper highlights experimental outcomes and addresses the complexities of query evaluations with uncertainty.
E N D
LIVE A lineage-supported, versioned DBMS Anish Das Sarma Martin Theobald Jennifer Widom
Agenda • ULDB Data Model and the Trio System • Uncertainty & Lineage • LIVE Data Model (LDM) • Uncertainty, Lineage & Versioning • Data Modifications • Insert/Delete Tuples, Update Values, Update Confidences • Query Evaluation • Valid-At vs. Snapshot Queries, Interval Computations, Confidence Computations, Complexity • Experiments/Conclusions LIVE - A lineage-supported, versioned DBMS
ULDB Data Model • Different types of uncertainty: • 1. Tuple Alternatives • 2. ‘?’ (Maybe) Annotations • 3. Confidences • Implementation of the ULDB data model: • Trio System • TriQL query language • TrioExplorer browser frontend, trioplus client, API • Enhanced PostgreSQL backend (SPI) • Search for “Stanford Trio” LIVE - A lineage-supported, versioned DBMS
ULDBs – Alternatives • 1. Alternatives:uncertainty about attribute values • 2. ‘?’ (Maybe) Annotations • 3. Confidences Three possible worlds LIVE - A lineage-supported, versioned DBMS
ULDBs – Maybe Annotations • 1. Alternatives • 2.‘?’ (Maybe): uncertainty about tuple presence • 3. Confidences ? Six possible worlds LIVE - A lineage-supported, versioned DBMS
ULDBs – Confidences • 1. Alternatives • 2. ‘?’ (Maybe) Annotations • 3. Confidences: weighted uncertainty ? Six possible worlds, each with a probability LIVE - A lineage-supported, versioned DBMS
ULDBs – Closure Suspects= πperson(Saw ⋈ Drives) CANNOT Does not correctly capture possible worlds in the result! ? ? ? LIVE - A lineage-supported, versioned DBMS
ULDBs – Lineage Suspects= πperson(Saw ⋈ Drives) ? ? ? λ(31) = (11,2)(21,2) λ(32,1) = (11,1)(22,1) ; λ(32,2) = (11,1)(22,2) λ(33) = (11,1)23 LIVE - A lineage-supported, versioned DBMS
ULDBs – Summary Uncertainty-Lineage Databases (ULDBs) • Alternatives • ‘?’ (Maybe) Annotations • Confidences • Lineage • ULDBs are closed and complete LIVE - A lineage-supported, versioned DBMS
Lineage & Confidences • Can exclusively utilize lineage in order to compute the confidence of a result tuple. • #P-complete for general Boolean formulas • Approximation algorithms: Luby-Karp, etc. Select distinct car from Saw; 0.99 λ(21) = (11 12 13) P(21) = 1 – (1-0.8) X (1-0.9) X (1-0.5) LIVE - A lineage-supported, versioned DBMS
Versioning (LDM Data Model) • Version intervals for tuples • Contiguous version numbers0,…, • Database has current version vD • Tuples have a validity intervals[s, e] • Valid-At Queries: • Select * from Photo valid-at 2; • Snapshot Queries: • View Photo at 2; • Possible Worlds: • LDM databases encode lists of sets of possible worlds. LIVE - A lineage-supported, versioned DBMS
Data Modifications – Insert • Insert Tuple: • Insert t with version [vD+1,] • commit; Increase vD (2) (1) (2) LIVE - A lineage-supported, versioned DBMS
Data Modifications – Delete • Insert Tuple: • Insert t with version [vD+1,] • Delete Tuple: • Set end(t) to vD • commit; Increase vD (3) (2) (1) (2) LIVE - A lineage-supported, versioned DBMS
Data Modifications – Update • Insert Tuple: • Insert t with version [vD+1,] • Delete Tuple: • Set end(t) to vD • Update Value: • Set end(t) to vD • Insert t’ with version [vD+1,] • commit; Increase vD (4) (3) (2) (1) (2) (4) LIVE - A lineage-supported, versioned DBMS
Data Modifications – Update • Insert Tuple: • Insert t with version [vD+1,] • Delete Tuple: • Set end(t) to vD • Update Value: • Set end(t) to vD • Insert t’ with version [vD+1,] • Update Probability: • Set end(t) to vD • Insert t’=t with probability p’ and version [vD+1,] • commit; Increase vD (4) (3) (2) (1) (2) (4) (5) LIVE - A lineage-supported, versioned DBMS
Data Modifications – Summary • Insert Tuple: • Insert t with version [vD+1,] • Delete Tuple: • Set end(t) to vD • Update Value: • Set end(t) to vD • Insert t’ with version [vD+1,] • Update Probability: • Set end(t) to vD • Insert t’=t with probability p’ and version [vD+1,] • Possible worlds: • Updates may create duplicate worlds, which are merged (at any version v). (4) (3) (2) (1) (2) (4) (5) LIVE - A lineage-supported, versioned DBMS
Query Evaluation implementation of Q • 1) Data Computation (regular SQL, including lineage) • 2) Interval Computation (stored procedure) D D + Result operational semantics possible worlds at versions encoding of possible worlds Qon each world D1, D2, …, Dn1 Q(D1), Q(D2), …, Q(Dn) @ (0) @ (0) … D1, D2, …, Dn2 @ (1) … D1, D2, …, Dnv @ (vD) LIVE - A lineage-supported, versioned DBMS
Lineage, Confidences & Versions • Can exclusively utilize lineage in order to compute the confidence of any result tuple. • Can exclusively utilize lineage in order to compute the version interval of any result tuple. LIVE - A lineage-supported, versioned DBMS
Version Interval Computation • Positive Lineage (disjunctions & conjunctions) • In the lineage formulaλ(t) • Replace every tuple t’ by its version interval • Replace every with and every with Select distinct car from Saw; [1,]: 0.99 λ(21) = (11 12 13) P(21) = 1 – (1-0.8) X (1-0.9) X (1-0.5) LIVE - A lineage-supported, versioned DBMS
Version & Confidence Computation • Positive Lineage (disjunctions & conjunctions) • In the lineage formulaλ(t) • Replace every tuple t’ by its version interval • Replace every with and every with Select distinct car from Saw; Select distinct car from Saw valid-at 2; [1,] : 0.98 λ(21) = (11 12) P(21) = 1 – (1-0.8) X (1-0.9) LIVE - A lineage-supported, versioned DBMS
Interval Computations & Query Plans • Can decouple interval computation from data computation • Or:push interval computation into query plans only when there is no negation. Select R.A from R EXCEPT ( Select R.A from R EXCEPT Select S.A from S ); Select R.A from R,S Where R.A=S.A; t=(a)[0,10] – t=(a)[5,10] r=(a)[0,10] u=(a)[0,10] – r=(a)[0,10] s=(a)[5,15] r=(a)[0,10] s=(a)[5,15] LIVE - A lineage-supported, versioned DBMS
Complexity Results • Positive Lineage (disjunctions & conjunctions) • Version interval computation • PTIME (linear) • Confidence computation • #P-complete • Arbitrary Lineage(including negation) • Version interval computation • PTIME (linear) if all confidences are known • NP-hard if confidences are not known (need to check for idempotence of negated tuples) • Confidence computation • #P-complete LIVE - A lineage-supported, versioned DBMS
Experiments – Setup • Probabilistic & versioned TPC-H setting • Queries over Lineitem, Orders tables with varying join selectivity from 0.1% to 1% (6,000-60,000 and1,500-15,000 tuples for Lineitem & Orders) • Update 0.1% to 1% of the input data • Assign probabilities within [0,1] uniform-randomly to tuples • Additional indexes for versioning • Two B+-trees on (start, end)and end points of intervals • Rewrite valid-at & snapshot queries using WHERE (start ≤ v ≤ end)predicates LIVE - A lineage-supported, versioned DBMS
Experiments – Results (I) • Join query • Overhead of versioned system vs. non-versioned system (versions not computed) (%) • Join query • Overhead of computing versions (versioned system) LIVE - A lineage-supported, versioned DBMS
Experiments – Results (II) • Join query • Progressive data updates (overwrite multiple times) • Join query • Valid-at queries vs. • full version computation LIVE - A lineage-supported, versioned DBMS
Experiments – Results (III) • Overhead of version computation, different query types (1% data modified) LIVE - A lineage-supported, versioned DBMS
Conclusions • LDMs are closed and complete • Generalizes to full ULDB data model (including value alternatives & maybe (?) annotations) • Can employ lineage also for update propagations • Supports all of INSERT/DELETE/UPDATE with INTERSECT/UNION/EXCEPT set operations Uncertainty Versioning Lineage DBMS LIVE - A lineage-supported, versioned DBMS