
On the Origin of Data

Daniel Deutch, Blavatnik School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences


Presentation Transcript


  1. On the Origin of Data • Daniel Deutch • Blavatnik School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences

  2. Data Evolvement • This is the era of data • Databases, text, blogs, social data, … • Huge volumes • Evolving through automatic tools • Sent between applications and users

  3. Provenance

  4. Data Provenance • Understanding how and why data has evolved is of fundamental importance • For authentication • Both the origin and the propagators of data should be trustworthy • For access control • Confidentiality constraints interplay with the transformation • For hypothetical reasoning • What if we change a piece of data? • How can we optimally affect data evolvement?

  5. Example • Alice posted photos with David • David is worried about Eve seeing his photos • [Figure: provenance expression combining OR, OR, AND, NOT]

  6. Tracking Provenance • The logic is already implemented (e.g. to decide what photos to show) • We develop tools to “instrument” applications with provenance tracking • Simply maintaining an “activity log” is not good enough • We also want the possible “reasons” for activities • E.g. “not blacklisted” is not an activity • Instead we create formulas in generic algebraic constructions based on semirings • We also develop tools that use the provenance information for analysis

  7. Generic Expression • [Figure: expression tree combining OR, OR, AND, NOT] • Trust: False OR ((True OR True) AND NOT False) = True • Number of paths (if Alice and Eve are not friends): 0 + ((1 + 1) × 1) = 2 • Latency: ∞ min ((0:05 min 0:08) + 0:00) = 0:05
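The three evaluations on this slide can be reproduced with one generic expression and three interpretations of OR and AND. The following is only a minimal sketch under stated assumptions: the names `Semiring`, `evaluate`, `a`, `b`, `c` and `not_d` are hypothetical, the negated literal is passed in as an already-evaluated value, latencies are given in seconds, and the missing leading operand of the latency formula is assumed to be infinity (the neutral element of min).

```python
import math
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    """Hypothetical helper: how OR and AND are interpreted."""
    plus: Callable[[Any, Any], Any]   # interprets OR
    times: Callable[[Any, Any], Any]  # interprets AND


BOOLEAN = Semiring(plus=lambda x, y: x or y, times=lambda x, y: x and y)
COUNTING = Semiring(plus=lambda x, y: x + y, times=lambda x, y: x * y)
TROPICAL = Semiring(plus=min, times=lambda x, y: x + y)


def evaluate(s: Semiring, a, b, c, not_d):
    """Evaluate a OR ((b OR c) AND not_d) in the semiring s.

    not_d is the value assigned to the negated literal (assumption:
    negation is resolved before the semiring evaluation).
    """
    return s.plus(a, s.times(s.plus(b, c), not_d))


# Trust: False OR ((True OR True) AND NOT False)
trust = evaluate(BOOLEAN, False, True, True, True)

# Number of paths: 0 + ((1 + 1) x 1)
paths = evaluate(COUNTING, 0, 1, 1, 1)

# Latency in seconds: inf min ((5 min 8) + 0)
latency = evaluate(TROPICAL, math.inf, 5, 8, 0)

print(trust, paths, latency)
```

The point of the design is that the expression is built once, and only the pair of operations changes between analyses.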

  8. Provenance for SQL Queries • Example query over tables Emps and GoodEmps: πDep(Emps ⨝ GoodEmps) • Amsterdamer, D., Tannen, Provenance for Aggregate Queries [PODS ’11] • Amsterdamer, D., Tannen, On the Limitations of Provenance for Queries with Difference [TaPP ’11] • D., Milo, Roy, Tannen, Circuits for Datalog Provenance [ICDT ’14] • Amsterdamer, D., Green, Karvounarakis, Tannen, Semiring-based Provenance for SQL Queries (In preparation) • D., Moskovitch, Provenance for Relational Updates (In preparation)
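To make the query πDep(Emps ⨝ GoodEmps) concrete, the sketch below annotates every input tuple with a token and computes a provenance polynomial per output department: a join multiplies tokens and a projection adds alternatives. This is an illustration in the general style of semiring provenance, not the implementation from the cited papers. The table contents, the tokens (`e1`, `g1`, ...), and the function names are all hypothetical.

```python
from collections import Counter, defaultdict
from functools import reduce

# Hypothetical tables: each tuple is paired with a provenance token.
EMPS = [
    ("e1", {"emp": "Ann", "dep": "Sales"}),
    ("e2", {"emp": "Bob", "dep": "Sales"}),
    ("e3", {"emp": "Cid", "dep": "IT"}),
]
GOOD_EMPS = [("g1", {"emp": "Ann"}), ("g2", {"emp": "Bob"})]


def project_dep_of_join(emps, good_emps):
    """Return {dep: polynomial} for the projection on dep of the join.

    A polynomial maps a monomial (sorted tuple of tokens) to its
    integer coefficient.
    """
    result = defaultdict(Counter)
    for e_tok, e in emps:
        for g_tok, g in good_emps:
            if e["emp"] == g["emp"]:                # join condition
                monomial = tuple(sorted((e_tok, g_tok)))
                result[e["dep"]][monomial] += 1     # projection adds up
    return dict(result)


def evaluate(poly, valuation, plus, times, zero):
    """Evaluate a polynomial under a token valuation in a semiring."""
    total = zero
    for monomial, coeff in poly.items():
        term = reduce(times, (valuation[t] for t in monomial))
        for _ in range(coeff):
            total = plus(total, term)
    return total


prov = project_dep_of_join(EMPS, GOOD_EMPS)
sales = prov["Sales"]  # e1*g1 + e2*g2

# Hypothetical reasoning: is "Sales" still in the result if e1 is deleted?
still_there = evaluate(
    sales,
    {"e1": False, "g1": True, "e2": True, "g2": True},
    plus=lambda x, y: x or y, times=lambda x, y: x and y, zero=False,
)

# Number of derivations of "Sales" when every tuple counts once.
derivations = evaluate(
    sales,
    {"e1": 1, "g1": 1, "e2": 1, "g2": 1},
    plus=lambda x, y: x + y, times=lambda x, y: x * y, zero=0,
)

print(dict(sales), still_there, derivations)
```

Because the polynomial is kept symbolic, the "what if" question is answered by re-evaluating it under a new valuation rather than by re-running the query.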

  9. Provenance for Social and Web Data • Bienvenu, D., Suchanek, Provenance for Web 2.0 Data [Secure Data Management ’12] • Abiteboul, Bienvenu, D., Deduction in the Presence of Distribution and Contradictions [WebDB ’12] • Abiteboul, D., Vianu, Deduction with Contradictions in Datalog [ICDT ’14] • Amarilli, D., Senellart, Provenance for Order-Aware Transformations (In preparation)

  10. PROPOLIS: Provenance for Process Analysis • D., Moskovitch, Tannen, PROPOLIS: Provisioned Analysis of Data-Centric Processes [VLDB ’13] • D., Moskovitch, Tannen, A Provenance Framework for Data-Dependent Process Analysis (Submitted) • D., Moskovitch, Provenance for Distributed Processes (In preparation)

  11. Thank you!
