
Databases With Uncertainty And Lineage

Databases With Uncertainty And Lineage. Written by: Omar Benjelloun · Anish Das Sarma · Alon Halevy · Martin Theobald · Jennifer Widom. Presented by: Alex Gorodetsky. Outline: Lineage? Uncertainty? Combined together? The logic behind the given solution


Presentation Transcript


  1. Databases With Uncertainty And Lineage • Written by: Omar Benjelloun · Anish Das Sarma · Alon Halevy · Martin Theobald · Jennifer Widom • Presented by: Alex Gorodetsky

  2. Outline • Lineage? Uncertainty? Combined together? • The logic behind the given solution • The solution – The Trio system • Conclusions and future work

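  3. Databases with lineage • Definition: An LDB D is a triple (R̄, S, λ), where • R̄ = {R1, ..., Rn} is a set of relations. • S = I(R̄) ∪ E is a set of symbols, where I(R̄) denotes the identifiers of the tuples in R̄. • E is external lineage: a set of external symbols. • λ is a lineage function from S to 2^S.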

  4. Databases with lineage – Example: “crime-solver” database • In this example, the lineage function λ for the tuples in the result is the obvious one. • Some operations have less obvious lineage functions, such as: • Negation • Duplicate elimination • Aggregation • The operations considered in this paper have simple, well-behaved lineage functions.

  5. Uncertain databases • An uncertain database represents a set of possible instances, each of which is one possible state of the database. • Definition: An x-tuple is a multiset of tuples, called alternatives. • Definition: A maybe x-tuple is an x-tuple annotated with a ‘?’. • Definition: An x-relation is a multiset of x-tuples.

  6. Uncertain databases • X-relations are a specific formalism for uncertain databases. • X-relations provide a good balance of simplicity and expressiveness, and are orthogonal to the capabilities brought in by lineage.

  7. Uncertain databases – Example • Here, Amy may have seen a Mazda, a Toyota, or no car at all, so the relation has three possible instances. • X-relations are not a complete formalism. • For example, the join result Accuses cannot be represented as an x-relation: • x-tuples are independent, so they cannot express the fact that if Amy accuses Jimmy (due to the Mazda), then she must accuse Billy as well. • We will soon see that although x-relations alone are incomplete, adding lineage makes them complete.
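The possible-instances semantics of an x-relation can be sketched in a few lines of Python. This is only an illustration under assumptions: the `(witness, car)` tuple layout and the function name `possible_instances` are hypothetical, and the data mirror the Amy example described on the slide rather than its original figure.

```python
from itertools import product

def possible_instances(x_relation):
    """Enumerate the possible instances of an x-relation.

    x_relation is a list of (alternatives, is_maybe) pairs. Each x-tuple
    contributes exactly one alternative, or nothing at all if it carries '?'.
    """
    choices = []
    for alternatives, is_maybe in x_relation:
        options = list(alternatives)
        if is_maybe:
            options.append(None)  # None stands for "x-tuple absent"
        choices.append(options)
    # x-tuples are independent, so instances are the cross product of choices.
    return [[t for t in combo if t is not None] for combo in product(*choices)]

# Assumed encoding of the slide's example: one maybe x-tuple, two alternatives.
saw = [([("Amy", "Mazda"), ("Amy", "Toyota")], True)]
instances = possible_instances(saw)
for inst in instances:
    print(inst)
```

Because the cross product treats every x-tuple independently, this representation has no way to tie a choice in one x-tuple to a choice in another, which is the incompleteness the slide points out.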

  8. Data Integration Systems • Systems that offer a uniform interface to a multitude of data sources. • We will focus on semantic integration.

  9. Data Integration - Any Problem? • “Subjectivity” effect - No standard. • As data integration applications strive to offer a single “objective” and coherent integrated view of data sources, uncertainty is bound to appear. • 3 main kinds of uncertainty in data integration applications: • Data – automatic extraction from unstructured data. • Mappings between the schemas. • Mappings between data objects in different sources.

  10. Uncertainty, Lineage and Data Integration • Consider what happens when uncertainty-generating data integration operations are performed. • Lineage keeps track of the origins of data, which helps to: • Manage uncertainty • Explain uncertainty • Potentially correct the uncertainty • So maybe we need to consider combining uncertainty with lineage in databases?

  11. Outline • Lineage? Uncertainty? Combined together? • The logic behind the given solution • The solution – The Trio system • Conclusions and future work

  12. Combining lineage and uncertainty • Definition: A ULDB D is a triple (R̄, S, λ), where • R̄ = {R1, ..., Rn}, and each Ri is an x-relation. • S is a set of symbols containing I(R̄). • λ is a lineage function from S to 2^S.

  13. Combining lineage and uncertainty Example • We combine the uncertain Saw x-relation with the earlier Drives relation to create a new version of Accuses that has both uncertainty and lineage:
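A minimal sketch of how such a join could attach lineage to its result is shown here. Several things are assumptions made for illustration: the contents of `drives`, the identifiers 31 to 33 and 41 to 43, the function name `join_with_lineage`, and the choice to mark every result x-tuple with '?' conservatively. Only x-tuple 21 of Saw, the alternative (42,1), and the names Amy, Jimmy and Billy are taken from the slides.

```python
# Each x-relation maps an x-tuple id to its alternatives and its '?' flag.
saw = {21: {"alts": [("Amy", "Mazda"), ("Amy", "Toyota")], "maybe": True}}
drives = {  # assumed contents
    31: {"alts": [("Jimmy", "Mazda")], "maybe": False},
    32: {"alts": [("Jimmy", "Toyota")], "maybe": False},
    33: {"alts": [("Billy", "Mazda")], "maybe": False},
}

def join_with_lineage(left, right, first_id=41):
    """Join Saw and Drives on car; record the source alternatives as lineage."""
    result, lineage = {}, {}
    next_id = first_id
    for lid, lx in left.items():
        for rid, rx in right.items():
            alts, sources = [], []
            for lj, (witness, car) in enumerate(lx["alts"], start=1):
                for rj, (driver, driven_car) in enumerate(rx["alts"], start=1):
                    if car == driven_car:
                        alts.append((witness, driver))
                        sources.append({(lid, lj), (rid, rj)})
            if alts:
                # Assumption: mark every result x-tuple '?' conservatively.
                result[next_id] = {"alts": alts, "maybe": True}
                for k, src in enumerate(sources, start=1):
                    lineage[(next_id, k)] = src
                next_id += 1
    return result, lineage

accuses, lineage = join_with_lineage(saw, drives)
for symbol, src in sorted(lineage.items()):
    print(symbol, "derived from", sorted(src))
```

Each result alternative is identified by an (x-tuple id, alternative index) pair, and its lineage is the set of such pairs it was derived from, matching the shape of the lineage function λ from S to sets of symbols.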

  14. Combining lineage and uncertainty • Definition: Let D = (R̄, S, λ) be a ULDB. A possible LDB of D is obtained as follows. • Pick a set of symbols A ⊆ S such that:

  15. Combining lineage and uncertainty • Definition: Let D = (R̄, S, λ) be a ULDB. A possible LDB of D is obtained as follows. • Pick a set of symbols A ⊆ S such that: • Condition 1: Alternatives of the same x-tuple are mutually exclusive.

  16. Combining lineage and uncertainty • Definition: Let D = (R̄, S, λ) be a ULDB. A possible LDB of D is obtained as follows. • Pick a set of symbols A ⊆ S such that: • Condition 2: If an alternative is present in a possible instance, so must be the alternatives it was derived from.

  17. Combining lineage and uncertainty • Definition: Let D = (R̄, S, λ) be a ULDB. A possible LDB of D is obtained as follows. • Pick a set of symbols A ⊆ S such that: • Condition 3: An x-tuple must yield a tuple in a possible instance unless: • It is a maybe x-tuple, and • None of its alternatives has a nonempty lineage that would have been consistent with condition 2.

  18. Combining lineage and uncertainty – Example • Consider the choices for x-tuple 21 of Saw: do we satisfy all the conditions? • Why did we add them? • To satisfy condition 3.

  19. Combining lineage and uncertainty – Example • Can we add (42,1)? • No. Why? • Condition 2 would be violated.

  20. Combining lineage and uncertainty – Example • All in all we have 3 possible LDBs:
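The three conditions can be checked by brute force on a small example. The sketch below is an illustration, not the paper's procedure: the symbol set, the Drives identifiers 31 to 33, and the lineage of alternatives (41,1) and (43,1) are assumed so that the data are consistent with the slides' mentions of x-tuple 21 and alternative (42,1).

```python
from itertools import combinations

# x-tuple id -> (number of alternatives, has '?'); assumed example data.
XTUPLES = {
    21: (2, True),                                   # Saw: Mazda || Toyota ?
    31: (1, False), 32: (1, False), 33: (1, False),  # Drives
    41: (1, True), 42: (1, True), 43: (1, True),     # Accuses
}
# Lineage function: symbol -> set of symbols it was derived from.
LINEAGE = {
    (41, 1): {(21, 1), (31, 1)},
    (42, 1): {(21, 2), (32, 1)},
    (43, 1): {(21, 1), (33, 1)},
}

def is_possible_ldb(chosen):
    """Check the three conditions for a candidate set of symbols."""
    for xid, (n_alts, maybe) in XTUPLES.items():
        present = [(xid, j) for j in range(1, n_alts + 1) if (xid, j) in chosen]
        if len(present) > 1:          # condition 1: mutual exclusion
            return False
        if not present:               # condition 3: absence must be justified
            if not maybe:
                return False
            for j in range(1, n_alts + 1):
                lam = LINEAGE.get((xid, j), set())
                if lam and lam <= chosen:
                    return False
    # condition 2: every chosen symbol has its whole lineage chosen
    return all(LINEAGE.get(s, set()) <= chosen for s in chosen)

SYMBOLS = [(xid, j) for xid, (n, _) in XTUPLES.items() for j in range(1, n + 1)]
possible_ldbs = [
    set(c)
    for r in range(len(SYMBOLS) + 1)
    for c in combinations(SYMBOLS, r)
    if is_possible_ldb(set(c))
]
print(len(possible_ldbs))
```

Enumerating all subsets is exponential in the number of symbols, so this only serves to make the definition concrete on toy data.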

  21. Combining lineage and uncertainty – Completeness • Definition: A formalism is complete if it is possible to represent any set of possible instances within the formalism. • Theorem (ULDB completeness): Given any set of possible LDBs P = {P1, P2, ..., Pm} over relations R̄ = {R1, R2, ..., Rn}, there exists a ULDB D = (R̄, S, λ) whose possible LDBs are P.

  22. Combining lineage and uncertainty – Well-behaved lineage • Definition: The lineage of an x-tuple is well-behaved if it is: • Acyclic • Deterministic • Uniform • Definition: A ULDB D = (R̄, S, λ) is well-behaved if all its x-tuples have well-behaved lineage. • Well-behaved lineage is well-suited for database queries.

  23. Combining lineage and uncertainty – DL-monotonic queries • Intuitively, any operation that can produce its results in a “tuple-by-tuple” fashion is DL-monotonic. • Aggregation, duplicate-elimination, and some set operators are not DL-monotonic. • From now on we assume all queries Q to be DL-monotonic.

  24. Combining lineage and uncertainty – Query evaluation (Algorithm 1) • [Diagram: query Q]

  25. Combining lineage and uncertainty – Applying a query to a ULDB • Theorem: Given a ULDB D and a query Q: • Algorithm 1 returns Q(D). • If D is a well-behaved ULDB, then so is Q(D).

  26. Combining lineage and uncertainty – ULDB minimality • ULDBs do not have a unique representation. • We can have two different x-relations that have exactly the same set of possible instances. • Two notions of ULDB minimality: • Data minimality • Lineage minimality

  27. Combining lineage and uncertainty – Data minimality • D-minimality: A ULDB D is D-minimal if it does not include any extraneous alternatives or ‘?’s.

  28. Combining lineage and uncertainty – Lazy algorithm for D-minimization • Extraneous ‘?’: (44,1) • Extraneous alternative (search recursively): (Carol, Acura, Lexus)

  29. Combining lineage and uncertainty – Lineage minimality
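To make "extraneous" concrete, the sketch below finds extraneous alternatives and '?' marks by enumerating possible LDBs directly. This is a brute-force reading of the definition, not the lazy algorithm named on the slide. It assumes an alternative is extraneous when it appears in no possible LDB, and a '?' is extraneous when its x-tuple is present in every possible LDB. The data and all names are hypothetical and unrelated to the (44,1) and Carol examples from the missing figure.

```python
from itertools import combinations

def is_possible(chosen, xtuples, lineage):
    """Check the three possible-LDB conditions for a set of symbols."""
    for xid, (n_alts, maybe) in xtuples.items():
        present = [(xid, j) for j in range(1, n_alts + 1) if (xid, j) in chosen]
        if len(present) > 1:
            return False
        if not present:
            if not maybe:
                return False
            if any(lineage.get((xid, j)) and lineage[(xid, j)] <= chosen
                   for j in range(1, n_alts + 1)):
                return False
    return all(lineage.get(s, set()) <= chosen for s in chosen)

def find_extraneous(xtuples, lineage):
    """Return (extraneous alternatives, x-tuples with an extraneous '?')."""
    symbols = [(x, j) for x, (n, _) in xtuples.items() for j in range(1, n + 1)]
    ldbs = [set(c) for r in range(len(symbols) + 1)
            for c in combinations(symbols, r)
            if is_possible(set(c), xtuples, lineage)]
    used = set().union(*ldbs)
    extra_alts = [s for s in symbols if s not in used]
    extra_maybes = [x for x, (_, maybe) in xtuples.items()
                    if maybe and all(any(s[0] == x for s in ldb) for ldb in ldbs)]
    return extra_alts, extra_maybes

# Hypothetical data: x-tuple 2 always appears, alternative (3,2) never can.
xtuples = {1: (1, False), 2: (1, True), 3: (2, False), 4: (2, False)}
lineage = {
    (2, 1): {(1, 1)},
    (3, 1): {(1, 1)},
    (3, 2): {(4, 1), (4, 2)},  # needs two mutually exclusive alternatives
}
extra_alts, extra_maybes = find_extraneous(xtuples, lineage)
print(extra_alts, extra_maybes)
```

A practical minimizer would avoid enumerating possible LDBs and instead follow lineage recursively, as the slide's "search recursively" hint suggests.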

  30. Combining lineage and uncertainty – Extraction • Extraction is important in the context of data integration. • It is a flexible way to bring into the ULDB just the data that is needed from multiple external sources. • The extracted x-relations preserve their information while discarding irrelevant data and lineage.

  31. Outline • Lineage? Uncertainty? Combined together? • The logic behind the given solution • The solution – The Trio system • Conclusions and future work

  32. The Trio system • The Trio system: • A relational DBMS that supports uncertainty and lineage. • Based on the ULDB data model, and accepts queries in the TriQL language. • TriQL: extension of SQL with uncertainty and lineage-specific features.

  33. The Trio system • [Architecture diagram: standard SQL, standard relational DBMS]

  34. The Trio system • Trio API and translator (Python) • The Python layer presents a simple Trio API that extends the standard Python DB-API 2.0 for database access (Python’s analog of JDBC). • The Trio API accepts TriQL queries in addition to regular SQL, and query results may be x-tuples as well as regular tuples. • The API also exposes lineage tracing, along with the other ULDB-specific operations.

  35. The Trio system • Using the Trio API, the authors constructed: • A generic command-line interactive client similar to that provided by most DBMSs. • A fully featured graphical user interface (GUI client) called TrioExplorer.

  36. The Trio system – Encoding ULDB data • Additional attributes: • aid: a unique alternative identifier. • xid: identifies the x-tuple that this alternative belongs to. • conf: stores the confidence of the alternative. • num: a nonnegative integer that tracks whether the alternative’s x-tuple has a “?”.

  37. The Trio system – Encoding ULDB data • The lineage information for each table T is stored in a separate table lin_T(aid, src_aid, src_table). • A tuple (a1, a2, T2) in lin_T denotes that T’s alternative a1 has alternative a2 from table T2 in its lineage.

  38. The Trio system – Encoding ULDB data example
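The encoding described on the two previous slides can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: SQLite stands in for whatever DBMS sits underneath Trio, the data rows and the helper `sources_of` are hypothetical, and the convention that a nonzero `num` marks a "?" is a guess made only for illustration. The column names `aid`, `xid`, `conf`, `num` and the table `lin_T(aid, src_aid, src_table)` come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Saw     (aid INTEGER PRIMARY KEY, xid INTEGER, conf REAL,
                      num INTEGER, witness TEXT, car TEXT);
CREATE TABLE Drives  (aid INTEGER PRIMARY KEY, xid INTEGER, conf REAL,
                      num INTEGER, driver TEXT, car TEXT);
CREATE TABLE Accuses (aid INTEGER PRIMARY KEY, xid INTEGER, conf REAL,
                      num INTEGER, witness TEXT, suspect TEXT);
CREATE TABLE lin_Accuses (aid INTEGER, src_aid INTEGER, src_table TEXT);
""")

# Two alternatives of the same x-tuple share an xid but have distinct aids.
# conf is NULL (no confidences here); num = 1 for "has '?'" is an assumption.
conn.executemany("INSERT INTO Saw VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 21, None, 1, "Amy", "Mazda"),
                  (2, 21, None, 1, "Amy", "Toyota")])
conn.execute("INSERT INTO Drives VALUES (3, 31, NULL, 0, 'Jimmy', 'Mazda')")
conn.execute("INSERT INTO Accuses VALUES (4, 41, NULL, 1, 'Amy', 'Jimmy')")
# Accuses alternative 4 was derived from Saw alternative 1 and Drives alternative 3.
conn.executemany("INSERT INTO lin_Accuses VALUES (?, ?, ?)",
                 [(4, 1, "Saw"), (4, 3, "Drives")])

def sources_of(aid):
    """Return the (src_table, src_aid) pairs in the lineage of an Accuses alternative."""
    rows = conn.execute(
        "SELECT src_table, src_aid FROM lin_Accuses "
        "WHERE aid = ? ORDER BY src_table, src_aid", (aid,))
    return rows.fetchall()

alternatives_of_21 = conn.execute(
    "SELECT car FROM Saw WHERE xid = 21 ORDER BY aid").fetchall()
print(alternatives_of_21)
print(sources_of(4))
```

Storing one row per alternative keeps the data in ordinary relational tables, so x-tuples are reassembled by grouping on `xid` and lineage is followed with plain joins against `lin_T`.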

  39. The Trio system – TriQL to SQL query • Trio sends the translated SQL query to the underlying DBMS and opens a cursor on the result. • Trio stored procedures: • Tfetch: a cursor call to the Trio API for the original TriQL query. • Sfetch: a cursor call to the underlying DBMS for the translated SQL query.

  40. The Trio system – TriQL to SQL query • Tfetch must be able to collect all SQL result tuples for a single pair. • To propagate the “?” annotations, multiply the num values of the underlying base tuples. • These values, together with the table names, comprise the lineage for the alternatives in the result x-tuple.

  41. The Trio system – Trio queries • Definition: TriQL (Trio’s query language for ULDBs): • Extension of SQL. • TriQL queries return: • Uncertain relations • Lineage that connects query result data to the queried data.

  42. The Trio system – Built-in predicates and functions • TriQL provides three built-in predicates and functions: • Conf(): filters query results based on: • The confidence of the input data: Conf(Saw) • The confidence of the result: Conf(*) • Maybe(): takes no arguments and is true if and only if the current x-tuple has a “?”. • Lineage(X,Y): is true whenever Y is reachable from X by one or more lineage steps.
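The reachability test behind Lineage(X,Y) amounts to a graph search over the lineage function. The sketch below illustrates that idea in Python. It is not Trio's implementation, and the symbol names and the `Convicts` relation are hypothetical.

```python
def lineage_reachable(lineage, x, y):
    """Return True if y is reachable from x by one or more lineage steps."""
    stack = list(lineage.get(x, ()))
    seen = set()
    while stack:
        symbol = stack.pop()
        if symbol == y:
            return True
        if symbol in seen:
            continue
        seen.add(symbol)
        # Follow the lineage of the intermediate symbol one step further.
        stack.extend(lineage.get(symbol, ()))
    return False

# Hypothetical two-level derivation: Convicts from Accuses, Accuses from base data.
lineage = {
    "Convicts:51": {"Accuses:41"},
    "Accuses:41": {"Saw:21.1", "Drives:31"},
}
print(lineage_reachable(lineage, "Convicts:51", "Saw:21.1"))
print(lineage_reachable(lineage, "Saw:21.1", "Convicts:51"))
```

The search starts from the lineage of X rather than from X itself, so a symbol is not considered reachable from itself unless the lineage contains a cycle, which well-behaved lineage rules out.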

  43. The Trio system – More possibilities • TrioExplorer: visualizes the lineage. • The Trio system demo • Coexistence checks • Extraneous data removal

  44. Outline • Lineage? Uncertainty? Combined together? • The logic behind the given solution • The solution – The Trio system • Conclusions and future work

  45. Conclusions and future work • The authors are not aware of any previously proposed formal data representation that integrates both lineage and uncertainty. • We have seen that ULDBs are a good way to address semantic problems in data integration. • A performance boost may be achievable compared with computing query operators and confidences in tandem. • ULDBs are not expressive enough to fully represent complex operations. Two possible directions: • Extend ULDBs with richer primitives to support data integration, at a cost in complexity. • Keep data integration “outside the ULDB box” and transfer the lineage and uncertainty primitives to data integration systems.

  46. QUESTIONS?

  47. Thank you!
