Daniel Deutch, Blavatnik School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences
On the Origin of Data
Data Evolvement
• This is the era of Data.
• Databases, text, blogs, social data, …
• Huge volumes
• Evolving through automatic tools
• Sent between applications and users
Data Provenance
• Understanding how and why data has evolved is of fundamental importance
• For authentication
  • Both the origin and the propagators of data should be trustworthy
• For access control
  • Confidentiality constraints interplay with the transformation
• For hypothetical reasoning
  • What if we change a piece of data?
  • How can we optimally affect data evolvement?
Example
• Alice posted photos with David
• David is worried about Eve seeing his photos
[Figure: the conditions under which a photo is shown, drawn as a Boolean expression built with OR, AND, and NOT]
Tracking Provenance
• The logic is already implemented (e.g. to decide what photos to show)
• We develop tools to “instrument” applications with provenance tracking.
• Simply maintaining an “activity log” is not good enough.
• We also want the possible “reasons” for activities
• E.g. “not blacklisted” is not an activity
• Instead we create formulas in generic algebraic constructions based on semirings
• We also develop tools that use the provenance information for analysis.
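To illustrate the difference between an activity log and a formula of "reasons", here is a minimal sketch in Python. The class names (`Var`, `Not`, `BinOp`), the function `visibility_provenance`, and the condition names are all hypothetical and chosen only for illustration; the slides do not describe the actual instrumentation tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A basic condition, e.g. a friendship fact (hypothetical name)."""
    name: str

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Not:
    """A negated condition, such as 'not blacklisted'."""
    arg: object

    def __str__(self) -> str:
        return f"NOT {self.arg}"


@dataclass(frozen=True)
class BinOp:
    """A binary connective: op is 'OR' or 'AND'."""
    op: str
    left: object
    right: object

    def __str__(self) -> str:
        return f"({self.left} {self.op} {self.right})"


def visibility_provenance(direct, indirect, blacklist):
    """Build a formula recording why a photo may be shown.

    Assumed rule (illustrative only): the photo is shown if the direct
    condition holds, or if any indirect condition holds and the viewer
    is not blacklisted.
    """
    any_indirect = Var(indirect[0])
    for name in indirect[1:]:
        any_indirect = BinOp("OR", any_indirect, Var(name))
    guarded = BinOp("AND", any_indirect, Not(Var(blacklist)))
    return BinOp("OR", Var(direct), guarded)


formula = visibility_provenance(
    "direct_friend",
    ["friend_of_friend_1", "friend_of_friend_2"],
    "blacklisted",
)
print(formula)
```

Unlike a log entry, the resulting formula keeps the negative condition ("NOT blacklisted") as part of the explanation, so it can be re-evaluated later under different assumptions.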
Generic Expression
[Figure: a generic expression built with OR, AND, and NOT, interpreted in three ways]
Trust: False OR ((True OR True) AND NOT False) = True
Number of paths (if Alice and Eve are not friends): 0 + ((1 + 1) × 1) = 2
Latency: min((0:05 min 0:08) + 0:00) = 0:05
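The three evaluations on this slide follow one pattern: the same expression shape is kept, and only the meaning of OR and AND changes. The sketch below reproduces them in Python. The `Semiring` class and the `evaluate` function are hypothetical names; the negated condition is passed in as an already-valued input (as the slide does with "NOT False"); latencies are assumed to be in seconds; and using infinity for the first operand of the latency case is an assumption, since the slide does not show that operand.

```python
import math
import operator
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Semiring:
    """Interpretation of OR (plus) and AND (times)."""
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]


def evaluate(sr: Semiring, a: Any, b: Any, c: Any, n: Any) -> Any:
    """Evaluate a OR ((b OR c) AND n), where n is the value
    assigned to the negated condition."""
    return sr.plus(a, sr.times(sr.plus(b, c), n))


BOOLEAN = Semiring(operator.or_, operator.and_)   # trust
COUNTING = Semiring(operator.add, operator.mul)   # number of paths
TROPICAL = Semiring(min, operator.add)            # latency

# Trust: False OR ((True OR True) AND NOT False)
trust = evaluate(BOOLEAN, False, True, True, not False)

# Number of paths: 0 + ((1 + 1) x 1)
paths = evaluate(COUNTING, 0, 1, 1, 1)

# Latency in seconds; math.inf for the first operand is an assumption.
latency = evaluate(TROPICAL, math.inf, 5, 8, 0)

print(trust, paths, latency)  # True 2 5
```

Keeping the expression generic means a new analysis only needs a new pair of operations, not a new run of the application.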
Provenance for SQL Queries
[Figure: tables Emps and GoodEmps, with the query πDep(Emps ⨝ GoodEmps)]
• Amsterdamer, D., Tannen, Provenance for Aggregate Queries [PODS ’11]
• Amsterdamer, D., Tannen, On the Limitations of Provenance for Queries with Difference [TaPP ’11]
• D., Milo, Roy, Tannen, Circuits for Datalog Provenance [ICDT ’14]
• Amsterdamer, D., Green, Karvounarakis, Tannen, Semiring-based Provenance for SQL Queries (In preparation)
• D., Moskovitch, Provenance for Relational Updates (In preparation)
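For the query πDep(Emps ⨝ GoodEmps), semiring-based provenance multiplies the annotations of joined tuples and adds the annotations of tuples that collapse under projection. The Python sketch below shows this on made-up data: the table contents, the tokens `p1`..`p3` and `q1`..`q2`, and the function name `project_dep_of_join` are all hypothetical, since the slide's table contents are not recoverable.

```python
from collections import defaultdict

# Hypothetical annotated relations: each tuple maps to a provenance token.
emps = {
    ("Ann", "Sales"): "p1",
    ("Bob", "Sales"): "p2",
    ("Cid", "R&D"): "p3",
}
good_emps = {
    ("Ann",): "q1",
    ("Bob",): "q2",
}


def project_dep_of_join(emps, good_emps):
    """Compute a provenance polynomial (as a string) for each
    department in the projection on Dep of Emps joined with GoodEmps."""
    terms = defaultdict(list)
    for (name, dep), p in emps.items():
        for (good_name,), q in good_emps.items():
            if name == good_name:
                # Join: multiply the annotations of the two tuples.
                terms[dep].append(f"{p}*{q}")
    # Projection: add the annotations of tuples with the same Dep.
    return {dep: " + ".join(ts) for dep, ts in terms.items()}


result = project_dep_of_join(emps, good_emps)
print(result)  # {'Sales': 'p1*q1 + p2*q2'}
```

Reading the polynomial: "Sales" appears in the output because Ann is both an employee and a good employee, or because Bob is; deleting either pair still leaves one derivation.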
Provenance for Social and Web Data
• Bienvenu, D., Suchanek, Provenance for Web 2.0 Data [Secure Data Management ’12]
• Abiteboul, Bienvenu, D., Deduction in the Presence of Distribution and Contradictions [WebDB ’12]
• Abiteboul, D., Vianu, Deduction with Contradictions in Datalog [ICDT ’14]
• Amarilli, D., Senellart, Provenance for Order-Aware Transformations (In preparation)
PROPOLIS: Provenance for Process Analysis
• D., Moskovitch, Tannen, PROPOLIS: Provisioned Analysis of Data-Centric Processes [VLDB ’13]
• D., Moskovitch, Tannen, A Provenance Framework for Data-Dependent Process Analysis (Submitted)
• D., Moskovitch, Provenance for Distributed Processes (In preparation)