Readings in Data Management Spring 2008

Readings in Data Management, Spring 2008. Computer Science Department, Rutgers University

Seminar Information
• Web page: http://www.cs.rutgers.edu/~amelie/courses/dbseminar.html
• Meets Thursday 1-2:30pm in CoRE A

Organization
• Weekly presentation on a DB topic (30 minutes)
• We will select 2-3 topics to focus on over the course of the semester
• For each topic:
  • First week: overview paper (survey, influential work)
  • Subsequent weeks: more complex papers on the subject
• Possibly a few external presentations, such as:
  • Students preparing for DB conference talks or quals
  • Invited speakers
• Discussion on the paper

Topics
• First topic: Probabilistic Databases
• We will select the next topics from this non-exhaustive list:
  • Question Answering
  • Web Search
  • Personal Information Spaces
  • Query Optimization
  • Data Cleaning
  • Data Integration
  • Data Mining
  • Query Processing Techniques
  • Adaptive, Automatic, Autonomic Systems
  • OLAP
  • Stream Aggregation
  • Storage, Indexing, and System Architecture
  • XML Processing
  • Preference Functions
  • Spatial and High-Dimensional Data
  • Recovery
  • Privacy in DBMS
  • …

What I expect from you
• 1-2 presentations over the course of the semester
  • First-year students will be given “overview” presentation assignments at the beginning of each topic
  • More senior students will present more research-focused papers
  • The number of presentations depends on the number of students in the seminar
• Everyone should read the paper in advance and prepare 1-2 questions or discussion topics
• Participation in discussion
  • There are no “stupid” questions! If you did not understand something, chances are others did not either

Presentations
• I will select a list of papers to present for each topic
  • Start with an introductory paper
  • Then papers that go deeper into one or more aspects of the problem
• You are welcome to suggest papers on the topic, as long as they are related (so that we can have more meaningful discussions)
  • Papers that I have overlooked
  • Papers on a different aspect of the topic that you would like to focus on

First topic: Probabilistic Databases
• Uncertainty/imprecision in data
• Query semantics
• Probabilistic data representation
Next few slides from Dan Suciu’s tutorial, more at

Databases Today are Deterministic
• An item either is in the database or is not
• A tuple either is in the query answer or is not
• This applies to all varieties of data models:
  • Relational, E/R, NF2, hierarchical, XML, …

What is a Probabilistic Database?
• “An item belongs to the database” is a probabilistic event
• “A tuple is an answer to the query” is a probabilistic event
• Can be extended to all data models
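The two events on this slide can be illustrated with a small sketch. It is not taken from the tutorial: the table, the names `SIGHTINGS`, `possible_worlds`, `answer_probabilities`, and `birds_seen`, and the assumption that tuples are independent events are all hypothetical choices made for illustration. The sketch enumerates every possible world (every subset of the table), weights each world by its probability, and sums the weights of the worlds in which a tuple appears in the query answer.

```python
from itertools import product

# Hypothetical table: each tuple is paired with the probability that it
# belongs to the database. Assumption: tuples are independent events.
SIGHTINGS = [
    (("alice", "robin"), 0.8),
    (("bob", "robin"), 0.5),
    (("carol", "wren"), 0.4),
]


def possible_worlds(table):
    """Yield (world, probability) for every subset of the table."""
    for mask in product([False, True], repeat=len(table)):
        world = [row for (row, _), keep in zip(table, mask) if keep]
        prob = 1.0
        for (_, p), keep in zip(table, mask):
            prob *= p if keep else 1.0 - p
        yield world, prob


def answer_probabilities(table, query):
    """For each answer tuple, sum the probability of the worlds that return it."""
    result = {}
    for world, prob in possible_worlds(table):
        for answer in set(query(world)):
            result[answer] = result.get(answer, 0.0) + prob
    return result


def birds_seen(world):
    """Projection on the bird column: SELECT DISTINCT bird FROM sightings."""
    return [bird for _, bird in world]


probs = answer_probabilities(SIGHTINGS, birds_seen)
for bird, p in sorted(probs.items()):
    print(f"{bird}: {p:.2f}")
```

Under these assumptions, "robin" is an answer unless both robin tuples are absent, so its probability is 1 - (0.2 × 0.5) = 0.9, while "wren" keeps the probability of its single tuple, 0.4. Enumerating worlds is exponential in the number of tuples, so this is only a way to read the semantics, not an evaluation strategy.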

Two Types of Probabilistic Data
• Database is deterministic; query answers are probabilistic
• Database is probabilistic; query answers are probabilistic

Long History
Probabilistic relational databases have been studied from the late 80’s until today:
• Cavallo & Pittarelli: 1987
• Barbara, Garcia-Molina & Porter: 1992
• Lakshmanan, Leone, Ross & Subrahmanian: 1997
• Fuhr & Roellke: 1997
• Dalvi & Suciu: 2004
• Widom: 2005

So, Why Now?
Application pull:
• The need to manage imprecisions in data
Technology push:
• Advances in query processing techniques

Application Pull
Need to manage imprecisions in data
• Many types: non-matching data values, imprecise queries, inconsistent data, misaligned schemas, etc.
The quest to manage imprecisions is a major driving force in the database community
• Ultimate cause for many research areas: data mining, semistructured data, schema matching, nearest neighbor

Technology Push
Processing probabilistic data is fundamentally more complex than other data models
• Some previous approaches sidestepped complexity
There exists a rich collection of powerful, non-trivial techniques and results, some old, some very recent, that could lead to practical management techniques for probabilistic databases.

Suggested Papers to Discuss
• Nilesh Dalvi, Dan Suciu: Efficient Query Evaluation on Probabilistic Databases. VLDB 2004.
• Minos Garofalakis et al.: Probabilistic Data Management for Pervasive Computing: The Data Furnace Project. IEEE Data Eng. Bull. 29(1), 2006.
• Omar Benjelloun, Anish Das Sarma, Chris Hayworth, Jennifer Widom: An Introduction to ULDBs and the Trio System. IEEE Data Eng. Bull. 29(1), 2006.
• Prithviraj Sen, Amol Deshpande: Representing and Querying Correlated Tuples in Probabilistic Databases. ICDE 2007.
