170 likes | 308 Vues
Temporal Information and XML. Carlo Zaniolo Department of Computer Science University of California, Los Angeles. A Short History of Time in Databases. Relational model: between 33 and 48 temporal DB proposals counted:
E N D
Temporal Informationand XML Carlo Zaniolo Department of Computer Science University of California, Los Angeles
A Short History of Timein Databases • Relational model: between 33 and 48 temporal DB proposals counted: • A struggle to get around the limitations of relational (flat) tables and a rigid query language (SQL) • A key issue: Temporal interval coalescing is needed after each projection! • Clustering, indexing, query optimization for temporal information add to the complexity
Coalescing • Time stamping the individual tuples:If we want the salary history, we have to coalesce the last three tuples into one:
XML • XML: hierarchical views with temporal groups • Temporal grouped models are more natural and powerful, but they did not fit in the flat relational model • XML Query languages can easily express temporal queries on these views.
History Tables • Time-stamped tuples in relations • Temporally grouped time-stamped attribute values
Historical XML Database ArchitectureTwo Approaches • Native XML databases • Historical data are stored in native XML database • XML queries can be specified directly upon the database • Native XML databases:Tamino(Software AG), eXcelon(XIS) • XML-enabled RDBMS • Historical view decomposed into relational databases as binary tables • Historical data can then be published as XML document through SQL/XML publishing functions; or queried through a middleware as XML views
Historical XML Views: Architecture Current Database Active Rules/ update logs SQL Queries Relational Data: Current Content Temporal Queries XML VIEWS Historical Data Historical Database
Relational Storage of Temporal Relational Data • Relational schema: employee(empno, name, sal, title, deptno) • Attribute history tables:employee_sal (empno, sal, tstart, tend) employee_title(empno, title, tstart, tend) … • An internal relation for each time-varying attribute • XQuery statements on the XML views translated into SQL statements on the internal relations
Experiments • Simulated data with history of 300,024 employees • Comparing: Native XML DBs: • SoftwareAG’s Tamino (text-based storage). XPath • eXcelon’s XIS (XML Information Server) (OODBMS-based storage). XQuery • Against DB2.
Preliminary Performance Comparisons Storage Size:
Performance Comparisons (cont’d) Query Performance of DB2 and Tamino: Q2: history query Q4,Q6: snapshot queries Q3,Q5: interval queries Q1: scan of databases Q7: join
Related Problems • Query Performance: • Indexing: R* trees • Temporal clustering: tuples from the same time period should be assigned to same page • Page Usefulness method. A page with employee records for a department. After 60% quit that page is only 40% useful. • Compression should not be ruled out: • sparingly used in DBs, but important for XML • DB2 mainframes, Oracle … • Updates not a problem for histories.
Research (cont.) • XML Query languages are powerful and temporal queries can be expressed in XQuery without any extension, but not for all users • User-friendly QBE-like language for temporally grouped model • SQLXML temporal views and queries • ROLLUPS-like temporal views (and SQL:1999) • Different views—but the same RDBMS-based implementation underneath.
XML Representation of DB HistoryTable Columns as XML Elements <employees tstart="1995-01-01" tend="1996-12-31"> <employee tstart="1995-01-01" tend="1996-12-31"> <empno tstart="1995-01-01" tend="1996-12-31">10003</empno> <name tstart="1995-01-01" tend="1996-12-31">Bob</name> <salary tstart="1995-01-01" tend="1995-05-31">60000</salary> <salary tstart="1995-06-01" tend="1996-12-31">70000</salary> <title tstart="1995-01-01" tend="1995-09-30">Engineer</title> <title tstart="1995-10-01" tend="1996-01-31">Sr Engineer</title> <title tstart="1996-02-01" tend="1996-12-31">Tech Leader</title> <dept tstart="1995-01-01" tend="1995-09-30">QA</dept> <dept tstart="1995-10-01" tend="1996-12-31">RD</dept> <DOB tstart="1995-01-01" tend="1996-12-31">1945-04-09</DOB> </employee> <!-- More… --> </employees>
Thank you! http://wis.cs.ucla.edu
References • S. Sarawagi, S. Thomas,R. Agrawal: Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications,SIGMOD 1998 • Fusheng Wang, Carlo Zaniolo: Publishing and Querying the Histories of Archived Relational Databases in XML. 4thInternational Conference on Web Information Systems Engineering, December 10th - 12th, 2003 Roma, Italy. • Haixun Wang, Carlo Zaniolo, Chang Richard Luo: ATLaS: a Small but Complete SQL Extension for Data Mining and Data Streams. VLDB 2003--Demo. • Haixun Wang and Carlo Zaniolo:ATLaS: A Native Extension of SQL for Data Mining. SIAM International Conference on Data Mining 2003, San Francisco, CA, May 1-3, 2003 • Reza Sadri, Carlo Zaniolo, Amir M. Zarkesh, Jafar Adibi: A Sequential Pattern Query Language for Supporting Instant Data Minining for e-Services, VLDB 2001. • Haixun Wang, Carlo Zaniolo: Using SQL to Build New Aggregates and Extenders for Object- Relational Systems. VLDB 2000.