Schema mappings are logical assertions that describe the correspondence between two schemas

Route for the Accounts tuple Dependents Accounts Clients D2 C1 123 ID2 Bob ID2 Bob A2 L2 ID2 SPIDER: a Schema Mapping Debugger Bogdan Alexe Laura Chiticariu Wang-Chiew Tan University of California, Santa Cruz Debugging a Data Exchange Debugging Schema Mappings Schema Mappings and Data Exchange • Schema mappings are logical assertions that describe the correspondence between two schemas • Higher-level, declarative programming constructs • Hide implementation details, allow for optimizations • Key elements in data exchange and data integration systems • Data Exchange [FKMP03] • Translate data conforming to a source schema S into data conforming to a target schema T so that the schema mapping M is satisfied The process of exploring, understanding and refining a schema mapping through the use of (test) data at the level of schema mappings • Approach 1: At the level of the implementation (unsatisfactory) • Specific to the exchange engine • Specific to implementation language. E.g., XQuery, XSLT, etc • Commercial tools available: Altova MapForce, Stylus Studio, etc • Approach 2: At the level of the schema mapping (desirable) • Currently, NO SUPPORT!!! • Motivation for debugging at the level of the schema mappings: Uniformity in specifying and debugging • Reduce programming effort by allowing a user to specify and debug at the level of schema mappings M Source schema S Target Schema T SPIDER Source instance I Target instance J M routes Source schema S Target Schema T tuple selection tuple selection Source instance I Target instance J SPIDER is the first prototype debugger for schema mappings XQuery/SQL/Java Example Main Idea: Debugging Schema Mappings with Routes Features • Routesillustrate the relationship between source and target data with the schema mapping • Declarative semantics, based on the logical satisfaction of the dependencies • Independent of any implementation of the schema mapping • Concept applies to any mapping-based data exchange or data integration system • Computing routes for selected source or target data • Compute all routes • Compute one route • Compute alternative routes on demand • Guided exploration of all routes • Standard debugging features • Breakpoints on dependencies • Watch windows – zoom into details about each step in the routes • Schema-level exploration of routes • Facilitates the understanding of schema mappings directly at the level of source and target schemas • Implementation details • On top of the Clio data exchange system • Supports relational and XML schema mappings • Schema mapping language: XSML - XML Schema Mapping Language Source schema Target schema MANHATTAN CREDIT CardHolders: cardNo ² limit ² ssn ² name ² Dependents: accNo ² ssn ² name ² FARGO FINANCE Accounts: ² accNo ² creditLine ² accHolder Clients: ² ssn ² name D1 fk1 C1 D2 Debugging scenario 1 Debugging scenario 2 Source-to-target dependencies Accounts Accounts D1: foreach s0in MANHATTAN-CREDIT.CardHolders exists t0in FARGO-FINANCE.Accounts, t1in FARGO-FINANCE.Clients, where t0.accHolder=t1.ssn with s0.cardNo= t0.accNo and s0.ssn= t0.accHolder and s0.name= t1.name Unknown credit limit? Unknown account number? A2 L2 ID2 D2: foreach s0in MANHATTAN-CREDIT.Dependents exists t0in FARGO-FINANCE.Clients with s0.ssn= t0.ssn and s0.name= t0.name A route for the Accounts tuple Accounts CardHolders 123 L1 ID1 D1 Target dependencies 123 $15K ID1 Alice Clients C1: foreach s0in FARGO-FINANCE.Clients exists t0in FARGO-FINANCE.Accounts with s0.ssn= t0.accHolder ID1 Alice 123 is not copied to the target as Bob’s account number $15K is not copied over to the target Target instance J Source instance I D’1: foreach s0in MANHATTAN-CREDIT.CardHolders exists t0in FARGO-FINANCE.Accounts, t1in FARGO-FINANCE.Clients, where t0.accHolder=t1.ssn with s0.cardNo= t0.accNo and s0.ssn= t0.accHolder and s0.name= t1.name and s0.limit= t0.creditLine CardHolders Accounts D’2: foreach s0in MANHATTAN-CREDIT.Dependents, s1in MANHATTAN-CREDIT.CardHolders where s0.accNo=s1.cardNo exists t0in FARGO-FINANCE.Clients, t1in FARGO-FINANCE.Accounts where t1.accHolder=t0.ssn with s0.ssn= t0.ssn and s0.name= t0.name and s1.cardNo= t1.accNo and s1.limit= t1.creditLine Clients Dependents Solution for I under the schema mapping Schema-level exploration of routes Compute All Routes and Compute One Route • Illustrate the schema mapping at the level of the source and target schemas Forest of routes for the Account tuple Routes obtained from the forest Accounts MANHATTAN CREDIT CardHolders: cardNo ² limit ² ssn ² name ² Dependents: accNo ² ssn ² name ² FARGO FINANCE Accounts: ² accNo ² creditLine ² accHolder Clients: ² ssn ² name A2 L2 ID2 Clients selected schema element ID2 Bob CardHolders Accounts D1 C1 456 $7K ID2 Bob A2 L2 ID2 C1 D1 Accounts fk1 C1 456 L3 ID2 D2 Clients ID2 Bob Dependents Accounts Clients D1 D2 D2 C1 123 ID2 Bob ID2 Bob A2 L2 ID2 CardHolders Dependents 456 $7K ID2 Bob 123 ID2 Bob Towards a full-fledged debugger • Compute all routes • For each selected target tuple ts, consider every possibility for witnessing t. Do not consider the same tuple twice. • Complete, polynomial time algorithm • The route forest is a polynomial representation of all routes (possibly exponentially many) for the selected tuples • Computation can be user-guided, or stopped with breakpoints on dependencies • Compute one route • Non-exhaustive: adapted compute all routes to stop when one witness is found • Inference procedure: to deduce all consequences of a proven tuple and avoid recomputation of “branches” • Complete, polynomial time algorithm • Future work • Extension to handle nested schema mappings • Adapt the target instance with changes in the schema mapping • Acknowledgements • Daniel Pepper, UC Santa Cruz • The Clio team in IBM Almaden Research Center

Schema mappings are logical assertions that describe the correspondence between two schemas

Schema mappings are logical assertions that describe the correspondence between two schemas

Presentation Transcript

Debugging Schema Mappings with Routes

Nested Mappings: Schema Mapping Reloaded

Assertions

Efficiently Supporting Changes to Declarative Schema Mappings

Foundations and Applications of Schema Mappings

Aggregate Query Answering under Uncertain Schema Mappings

SCHEMAS Forum for metadata schema implementers

XML-to-SQL Query Mapping in the Presence of Multi-valued Schema Mappings and Recursive XML Schemas

Schema Mappings Data Exchange & Metadata Management

Data Exchange & Composition of Schema Mappings

Foundations of Schema Mappings

Update Exchange with Provenance Schemas are related by GLAV schema mappings (tgds) :

Schemas and Schema-driven Metadata Software

Debugging Schema Mappings with Routes

Inferring Likely Mappings Between APIs

Core Schema Mappings

Using Correlation to Describe Relationships between two Quantitative Variable.

What Are Mappings?

Debugging Schema Mappings with Routes

Composing Mappings between Schemas using a Reference Ontology

Schema mappings are logical assertions that describe the correspondence between two schemas