The Trio System aims to integrate data uncertainty and lineage as first-class concepts within database systems. It is motivated by application domains that handle uncertain data, such as sensor data management and scientific experiments, and by the lack of support for uncertainty and lineage in current database systems. Trio extends traditional database management with a new data model (ULDBs) and a query language (TriQL). Ongoing work includes efficient confidence computation, aggregation, and integration with Bayesian networks.
The Trio System for Data, Uncertainty, and Lineage: Overview and Demo. Anish Das Sarma, Stanford University
Original Motivation for the Project • New Application Domains • Many involve data that is uncertain (approximate, probabilistic, inexact, incomplete, imprecise, fuzzy, inaccurate, ...) • Many of the same ones need to track the lineage (provenance) of their data • Neither uncertainty nor lineage is supported in current database systems
Sample Applications • Data integration • Information extraction • Scientific experiments • Sensor data management • Deduplication (“data cleaning”) • Approximate query processing
Our Goal • Develop a new kind of database management system (DBMS) in which data, uncertainty, and lineage are all first-class interrelated concepts • With all the “usual” DBMS features
Another “Trio” in Trio • Data Model • Simplest extension to relational model that’s sufficiently expressive • Query Language • Simple extension to SQL with well-defined semantics and intuitive behavior • System • A complete open-source DBMS that people want to use
Another “Trio” in Trio • Data Model • Uncertainty-Lineage Databases (ULDBs) • Query Language • TriQL • System • Trio-One, built on top of a standard DBMS
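To make the idea of data, uncertainty, and lineage as interrelated concepts more concrete, here is a minimal Python sketch of a relation whose tuples carry mutually exclusive alternatives, a confidence per alternative, and a lineage pointer back to the data each result was derived from. It is only an illustration under stated assumptions: the names `Alternative`, `XTuple`, `select`, and the `sightings` data are hypothetical and are not Trio's API, TriQL syntax, or storage layout.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical names throughout; this is an illustration, not Trio's API.

# Identifies one alternative: (relation name, tuple index, alternative index).
AltId = Tuple[str, int, int]


@dataclass
class Alternative:
    values: Dict[str, object]   # one possible value assignment for the tuple
    confidence: float           # assumed probability that this alternative holds
    lineage: List[AltId] = field(default_factory=list)  # what it was derived from


@dataclass
class XTuple:
    alternatives: List[Alternative]  # mutually exclusive possibilities

    def is_maybe(self) -> bool:
        # If the confidences sum to less than 1, the tuple may be absent.
        return sum(a.confidence for a in self.alternatives) < 1.0 - 1e-9


def select(name: str,
           relation: List[XTuple],
           predicate: Callable[[Dict[str, object]], bool]) -> List[XTuple]:
    """Keep the alternatives that satisfy the predicate.

    Each kept alternative records which source alternative it came from,
    and (in this sketch) simply inherits that alternative's confidence.
    """
    result = []
    for i, xt in enumerate(relation):
        kept = [
            Alternative(dict(a.values), a.confidence, [(name, i, j)])
            for j, a in enumerate(xt.alternatives)
            if predicate(a.values)
        ]
        if kept:
            result.append(XTuple(kept))
    return result


# Made-up example data: a witness is unsure which car was seen.
sightings = [
    XTuple([Alternative({"witness": "Amy", "car": "Honda"}, 0.6),
            Alternative({"witness": "Amy", "car": "Toyota"}, 0.4)]),
    XTuple([Alternative({"witness": "Bob", "car": "Honda"}, 0.7)]),
]

hondas = select("sightings", sightings, lambda v: v["car"] == "Honda")
for xt in hondas:
    for alt in xt.alternatives:
        print(alt.values, alt.confidence, alt.lineage)
```

In this sketch every result row can be traced back to the exact source alternative that produced it, which is the kind of coupling between uncertainty and lineage the slides describe; a real system would also have to handle joins, duplicates, and confidence computation over more complex lineage.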
Ongoing and Future Work • Efficient Confidence Computation • Top-K Queries • Aggregation • External Lineage • Data Modifications and Versioning • Continuous Uncertainty • Dependency Theory for ULDBs • Marrying Trio and Bayes Nets • System Development and Applications
Trio Players, Present and Past • Current • Jennifer Widom, Jeffrey Ullman • Parag Agrawal, Anish Das Sarma, Raghotham Murthy, Martin Theobald • Alums • Omar Benjelloun, Ashok Chandra, Julien Chaumond, Alon Halevy, Chris Hayworth, Ander de Keijzer, Michi Mutsuzaki, Shubha Nabar, Tomoe Sugihara
Thank you! Search “stanford trio”