Managing and Querying Transaction-time Databases under Schema Evolution

1. Managing and Querying Transaction-time Databases under Schema Evolution Hyun J. Moon (UCLA) Carlo Curino (Politecnico di Milano) Alin Deutsch (UC San Diego) Chien-Yi Hou (UC San Diego) Carlo Zaniolo (UCLA)

2. PRIMA Hyun Moon 2



5. PRIMA Hyun Moon 5 Outline Background: Transaction-time DBs with Fixed Schemas Challenges with Schema Evolution How to Archive? How to Query? Optimization

6. PRIMA Hyun Moon 6 Schema Evolution Schemas change. Why? Data reorganization Better logical design, performance Changes in the modeled reality New data collection, M&A and restructuring, government regulations Is it real? Wikipedia DB: 171 schema versions over 4.5 years [ICEIS08] More case studies [Marche93] [Sjoberg93]

7. PRIMA Hyun Moon 7 Schema Modification Operators (SMOs) Language for schema change Procedural fashion Do this, do that, � Easier for regular DBAs Similar to primitives in [Bernstein06]


9. PRIMA Hyun Moon 9 Previous work: relational approaches Single-pool and Multi-pool [DeCastro97] Problem: temporally ungrouped, i.e. redundant, expensive coalescing Our approach: MV-Document Extending V-Document for evolving schemas Temporally grouped!

10. PRIMA Hyun Moon 10 MV-Document: Example

11. PRIMA Hyun Moon 11 Problem 2: How to Query w/ Schema Changes? Manual Querying Write one query per version Doesn�t scale: 100�s of versions Schema Versioning by Data Translation Translate data into the queried version. Survey [Roddick95] Inefficient! Implementation by Query Rewriting Use the above method as the semantics definition Rewrite the input query into source versions Efficient!

12. PRIMA Hyun Moon 12 Query Rewriting MARS for a query rewriting engine [VLDB03] Input: XQuery, XICs Output: XQuery Chase input query to find an equivalent query modulo XICs PRIMA translate SMO into XICs XML Integrity Constraints ICs for XML ~ First-order logic with XPath Simple case: MERGE TABLE S, T into R (from v1 to v2)

13. PRIMA Hyun Moon 13 Query Rewriting: A Simple Example engineerpersonnel(empno, name, hiredate, title, deptname) otherpersonnel(empno, name, hiredate, title, deptname) MERGE TABLE engineerpersonnel, otherpersonnel into empacct empacct(empno, name, hiredate, title, deptname)

14. PRIMA Hyun Moon 14 Rewriting Optimization Problem: Rewriting with many constraints is expensive Goal: minimize constraints used in rewriting SMO Prune use SMOs that affect the tables/columns used in the input query SMO Compress group SMOs of a same type and produce a single mapping

15. PRIMA Hyun Moon 15 Query Optimization Problem: query rewriting is for non-temporal queries Goal: exploit queries� temporal semantics to optimize Minimal Source-version Detection E.g. Find Joe�s salary at 2007-01-01 We don�t need to query other schema versions Temporal Join Detection Temporal join: a join with validity overlap We don�t need to perform temporal joins across versions

16. PRIMA Hyun Moon 16 Experiment Results (1/2) Query rewriting scales with many schema versions? Wikipedia�s 171 schema versions between Apr 2003 and Nov 2007 Top 20 queries from Wikipedia�s real query workload

17. PRIMA Hyun Moon 17 Experiment Results (2/2)

18. PRIMA Hyun Moon 18 Challenges of Schema Evolution & Panta Rhei Framework Show that we�re working and making progress on SE, in several aspects SMO is common, which unifies our approach � mention the SMOs in the previous slidesShow that we�re working and making progress on SE, in several aspects SMO is common, which unifies our approach � mention the SMOs in the previous slides

19. PRIMA Hyun Moon 19 Conclusions Transaction-time DB is a great thing But, it doesn�t support evolving schemas We propose an effective and efficient solution Temporal data model for schema evolution, Efficient query answering, by query rewriting Optimization for rewriting and queries

20. PRIMA Hyun Moon 20 Thank you!

21. Extra Slides Smile! Enjoy! Introduce myself Introduce collaborators � 2 faculties, 3 students, 3 schools � collaboration/expertise brought together Smile! Enjoy! Introduce myself Introduce collaborators � 2 faculties, 3 students, 3 schools � collaboration/expertise brought together

22. PRIMA Hyun Moon 22 Presentation in a Nutshell Problem Transaction-time DBs have strong applications Study cases reveal that schema evolution is a real problem How to support evolving schemas in transaction-time DB? Challenges and our contributions Data model: how to archive, with evolving schemas? Previous approach: relational-based approaches, which are temporally ungrouped Our contribution: XML-based temporally grouped data model Query support: how to query evolving data with evolving schemas? Previous approach: schema versioning, based on data translation Our contribution: schema versioning based on query rewriting Solution scalability: how to rewrite complex temporal queries over hundreds of schema versions Our contribution: mapping pruning, rewriting using minimal source versions and more

23. PRIMA Hyun Moon 23 PRIMA in Schema Versioning Taxonomy Schema versioning supported Schema modification � allow schema changes Schema evolution � above + data preserved after schema changes Schema versioning � above + any schema version can be used for querying/updating Single-pool supported Single-pool: one big table for all schema versions, using completed schema, single line of history Multi-pool: one table for each schema, multiple lines of history Synchronous management of versioned data and schemata Synchronous � temporal data conform to schema of same time Asynchronous � temporal data may not conform to schema of same time Partial schema versioning Partial schema versioning � supports querying, retrospectively and prospectively. Update allowed only on a designated (usually current) schema version Full schema versioning � supports both querying and update retrospectively and prospectively

24. PRIMA Hyun Moon 24

Managing and Querying Transaction-time Databases under Schema Evolution

Managing and Querying Transaction-time Databases under Schema Evolution

Presentation Transcript

Schema Evolution and Version Management

Managing and querying encrypted data

Managing Transaction Exposure

Schema-less databases

Managing Transaction Risks

Managing Transaction Exposure

Managing Transaction Exposure

Querying Infinite Databases

Relational Databases and Transaction Design

Multimedia Databases and Querying techniques

No-Schema SQL Querying Relational Databases Independent of Schema

Schema Evolution

Querying Sequence Databases

Databases : SQL-Schema Definition and View

Indexing transaction time databases

Managing Schema Objects

Managing Schema Objects

Managing Schema Objects

Managing Schema Objects

Querying PostgreSQL Databases

Managing Transaction Exposure