Propagating Updates Through XML Views Using Lineage Tracing

Leonidas Fegaras University of Texas at Arlington Propagating Updates Through XML Views Using Lineage Tracing

Updating XML Views • Problem: Given a virtual XML view over a relational database expressed in XQuery, translate an XQuery update over the XML view to SQL updates over the base tables (embedded into pure XQuery) • Assumptions: • We use the XQuery Update Facility (XUF) • Snapshot semantics: one XQuery with updates = a single transaction • Goals: • prevent most view side-effects statically • ... but detect the rest efficiently at run-time • collectively consider all compatible updates in a transaction • This is far more expressive than considering each update in isolation

Highlights • Prevent many forms of view side-effects by using syntactic restrictions and a static analysis • Allow the full extent of XQuery to define views but restrict the table connections in a view to trees • Use a conservative static analysis to detect exclusive data sources • These are the table columns that can not cause view side-effects when updated • Based on polymorphic type inference and type usage analysis • Detect the rest of the view side-effects using a general (but efficient) run-time analysis • Propagate information about the origins of the updatable data through views and XQuery updates • to be used at the update destinations • Methodology: lineage tracing • The lineage is not just a tuple-id/column-name • an XML element has a context: a chain of tuples used in reaching the element • The table updates are generated by considering all chains in a transaction

Exclusive Data Source (EDS) • An exclusive data source (EDS) is a table column whose values • do not appear in the view output at all ie, they are not updatable • or they appear in the view output only once, but they are not accessed elsewhere in the view code • Exclusive data sources cannot cause view side effects when updated • Our goal is to prevent/detect side effects on non-EDS columns as well

How to Detect Exclusive Data Sources? • Use polymorphic type inference with a simple usage analysis: • each table column = a fresh type variable • if the principal type of a view is polymorphic over a type variable => the view is oblivious to the table column values • in addition, if the type variable appears once in the view principal type => the table column is an exclusive data source • A proof based on the parametricity theorem ('theorem for free') • Problems: • Not all XQuery engines support static validation • May be too conservative • Limitation: the analysis granularity is a table column • Not good for generic mappings, eg, a view that uses a single tag table • The tag table may be used multiple times in the view (once per tag) • Cannot statically deduce disjointness • But may use extra annotations in the view to specify disjointness

Updating non-EDS Columns • Database $DB: Department ( dno, dname, dphone ) Employee ( ename, dno, salary ) • View $view: for $i in $DB/Employee let $d := $DB/Department[dno=$i/dno] return <employee>{ $i/ename, $i/salary, $d/dname, $d/dphone }</employee> • EDS columns: Employee.ename, Employee.dno, Employee.salary • Update on a non-EDS column: replace $view[ename="Smith"]/dname with "CS" View tree: Employee ename dno salary dno dname dphone Department

Example (cont.) • Update on an non-EDS column: replace $view[ename="Smith"]/dname with "CS" • If the CS department exists, can we just connect Smith to CS? • May cause a side-effect on dphone • What if the dphone were not visible in the view? • What if there is an update replace $view[ename="Smith"]/dphone with "x5678" in the same transaction? ename dno salary Smith update EE x1234 CS x5678 dno dname dphone dno dname dphone

Checking for Side-Effects at Run-Time • An XQuery replace update generates database updates of the form: • replace ( T.A, value, tids ) • to replace T.A with a new value • tids is the tuple id list of the tuple chain to reach the update destination tuple • We do not need to start from a root • anchor tuple: the lowest-level tuple in the chain whose foreign key in the chain is EDS • If T.A is EDS, then tids contains the tuple id of the destination only • At the end of each transaction • We collect all updates associated with the same anchor tuple • We construct a shadow tuple chain that contains the updated values • If there is an existing chain in the database with the same visible tuple values • ... we update the anchor foreign key to link it to the existing chain • A visible column is one that appears in the view output (from usage analysis) Otherwise, we try to insert the tuples of the shadow chain • ... which may cause key constraint violation

Example E must be an EDS column

More on Updates • This translation of a replace update will not cause side-effects • But is this what the programmer intended to do? replace $view[dname="EE"]/dname with "CS" • Too expensive to test at run-time • Deletions are handled similarly • We generate replace calls to change T.A to null • If all visible column values in the shadow chain are null, then disconnect the chain from the anchor • Needs garbage collection • Very hard to handle insertions • Generate replace-null(T.A,value,tids) to replace only if T.A is null • Generate insert(T) to insert a new tuple • Hard to generate these calls from a general XQuery insert

Using Lineage Tracing • The replace/replace-null/insert calls are generated at compile-time • but the tuple chain ids (tids) used must be propagated at run-time • At run-time, every XML atomic value (xs:anyAtomicType) is annotated with lineage attributes • tids: sequence of tuple ids • path: sequence of foreign key names T.A • Requires minor extensions to an existing XQuery engine • Constructed when an atomic value is generated from a table in a view • from the view tree • XQuery operations must propagate these values as is • Newly constructed values have empty lineage

Other Contributions • A general algorithm for translating XQuery updates to embedded SQL update calls • Generates pure XQuery with embedded replace/replace-null/insert calls • It is type-guided (based on the update destination type) • Limitation: XQuery insert translations are not complete • requires that there is at least one child in the parent node to copy parts of its lineage • Each XQuery update query is translated to pure XQuery with • embedded SQL code to retrieve data, • calls to replace/replace-null/insert to update data • A general framework for optimizing XQuery with embedded SQL statements • Source-to-source rewrite heuristics to • promote XQuery predicates to SQL code • fuse SQL statements in pairs to form SQL join queries • An implementation in Haskell:http://lambda.uta.edu/HXQ/

Conclusion • Presented a framework for updating virtual XML views • Characteristics: • Hybrid of schema-based and data-driven methods for detecting view side-effects • Allows general XQueries but restricts the table connections in the views • XQuery updates are translated statically to embedded SQL code • ... but the resulting SQL updates use information that is propagated at run-time (through lineage tracing) • Current work: • A source-to-source, type-guided approach to incremental maintenance of XML views

Related Work • L. Wang et al [vldb'06]: HUX • A hybrid of schema-based and data-driven methods • Uses data lineage • Uses the concept of clean extended source • Semi-decidable method to determine translatability of updates • deduced from relationship cardinalities • Barsalou et al [sigmod'91]: Object view updates • View trees from ownership connections • Literal translation to base updates within a dependency island • Braganholo et al [vldb'04]: Map XML views to relational views • B. Choi et al [icde'07]: data-driven with XPath views • Needs to maintain a reachability matrix • Bidirectionalization research in programming languages • Relational lenses: Bohannon et al [pods'06] • Bidirectionalization for free: Voigtlander [popl'09]

Propagating Updates Through XML Views Using Lineage Tracing