Simplifying Data Translation Using Schema Matching

Using Schema Matching to Simplify Heterogeneous Data Translation • Tova Milo, Sagit Zohar • Tel Aviv University

Introduction • There are large amounts of data available on the Web but the format of the data is not homogeneous. • Most applications can handle only one or a small number of formats. • There is a need to translate data from one format to another.

Introduction • Two approaches to translating data: • A specific program that translates from format A to format B (e.g. LaTeX to HTML). • Data translation languages.

Introduction • The solution – TranScm • A data translation system • Automatically translates a portion (often a large portion) of the desired data • Does not replace data translation languages, but reduces the amount of programming needed in them

TranScm Architecture • [Figure: architecture diagram. Components shown: Input Schema, Output Schema, Import/Export Library, GUI, Rule Base, Matching Module, Typing Module]

Data Model • Tree (Forest) Model • Similar to OEM • Allows an order on children • Can handle cyclic structures using ids as “pointers”

Data Model • [Figure: example data tree. Article has children title, authors, and sections; title holds “Conceptual Concepts”; authors has two author children, “Al Gore Ithm” and “G WWW Bush”]
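The data model described above can be sketched in a few lines of Python. This is only an illustration, not TranScm's actual representation: the `Node` class, its field names, the `index_ids` and `resolve` helpers, and the `contact` node are all hypothetical. The `contact` node is not in the slide's example; it is added only to show an id being used as a "pointer".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One node of the data tree (hypothetical representation)."""
    label: str
    value: Optional[str] = None
    children: List["Node"] = field(default_factory=list)  # list order = child order
    node_id: Optional[str] = None   # id that other nodes may point to
    ref: Optional[str] = None       # id of another node, used as a "pointer"


def index_ids(root: Node) -> Dict[str, Node]:
    """Collect every node that carries an id, keyed by that id."""
    ids: Dict[str, Node] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.node_id is not None:
            ids[node.node_id] = node
        stack.extend(node.children)
    return ids


def resolve(node: Node, ids: Dict[str, Node]) -> Node:
    """Follow a reference; the tree itself stays acyclic."""
    return ids[node.ref] if node.ref is not None else node


article = Node("Article", children=[
    Node("title", value="Conceptual Concepts"),
    Node("authors", children=[
        Node("author", value="Al Gore Ithm", node_id="a1"),
        Node("author", value="G WWW Bush", node_id="a2"),
    ]),
    Node("sections"),
    Node("contact", ref="a1"),  # assumption: not in the slide, shows a pointer
])

ids = index_ids(article)
contact = resolve(article.children[3], ids)
```

Storing the reference as an id rather than as a direct object link keeps the structure a tree while still letting it describe cyclic data.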

Schema Model • Labeled graphs • Some nodes may be ordered • Each vertex is a schema element (type) • Labels carry information about the node

Schema Model • [Figure: example schema graph. Node labels as shown: Article [3], author [1], string, sections [2], title [1], authors [0,…,->], ref string]
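A labeled schema graph of this kind might be represented as follows. This is a minimal sketch under stated assumptions: the `SchemaElement` class, the `labels` dictionary and its `kind` key, and the choice of which nodes are marked `ordered` are all hypothetical, and the bracketed labels from the figure are not interpreted here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SchemaElement:
    """One vertex of the schema graph, i.e. one type (hypothetical)."""
    name: str
    labels: Dict[str, object] = field(default_factory=dict)  # info about the node
    ordered: bool = False                                     # are children ordered?
    edges: List["SchemaElement"] = field(default_factory=list)


# A single shared vertex for the atomic type makes this a graph, not a tree.
string_t = SchemaElement("string", labels={"kind": "atomic"})

author = SchemaElement("author", edges=[string_t])
authors = SchemaElement("authors", ordered=True, edges=[author])
title = SchemaElement("title", edges=[string_t])
sections = SchemaElement("sections")
article = SchemaElement("Article", ordered=True, edges=[title, authors, sections])
```

Because `title` and `author` both point at the same `string_t` vertex, a traversal has to track visited vertices rather than assume a tree.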

Rules • Rules are the basis of the matching and translation • Rules have an associated priority

Rules • Each rule has two components: • Matching component (Match function, Decendents (sic) function) • Translation component (Translation function)

Matching • The Match function examines schema labels to determine possible matches. • The Decendents function checks the numbers and types of the children of the current node.
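The rule structure described in the slides (a match function, a descendents function, a translation function, and a priority) can be sketched as below. This is an illustration only: the `Elem` and `Rule` classes, the two sample rules, and the convention that a larger priority number is tried first are assumptions, not TranScm's actual rule base.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Elem:
    """A schema element with its children (hypothetical)."""
    name: str
    children: List["Elem"] = field(default_factory=list)


@dataclass
class Rule:
    name: str
    priority: int                               # assumption: higher is tried first
    match: Callable[[Elem, Elem], bool]         # examines the two nodes' labels
    descendents: Callable[[Elem, Elem], bool]   # checks the nodes' children
    translate: Callable[[str], str]             # converts a data value


def same_name(src: Elem, tgt: Elem) -> bool:
    return src.name.lower() == tgt.name.lower()


def same_child_count(src: Elem, tgt: Elem) -> bool:
    return len(src.children) == len(tgt.children)


def both_leaves(src: Elem, tgt: Elem) -> bool:
    return not src.children and not tgt.children


RULES = [
    Rule("leaf-to-leaf", 1, lambda s, t: True, both_leaves, str),
    Rule("same-name", 10, same_name, same_child_count, str),
]


def first_applicable(rules: List[Rule], src: Elem, tgt: Elem) -> Optional[Rule]:
    """Return the highest-priority rule whose two matching checks both pass."""
    for rule in sorted(rules, key=lambda r: r.priority, reverse=True):
        if rule.match(src, tgt) and rule.descendents(src, tgt):
            return rule
    return None


rule_a = first_applicable(RULES, Elem("authors", [Elem("author")]),
                          Elem("Authors", [Elem("writer")]))
rule_b = first_applicable(RULES, Elem("author"), Elem("writer"))
rule_c = first_applicable(RULES, Elem("authors", [Elem("author")]), Elem("writers"))
```

Here `rule_a` is the same-name rule, `rule_b` falls back to the lower-priority leaf rule, and `rule_c` is `None` because no rule applies.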

Matching • [Figure: two Article schemas with their authors and author elements]

When Matching Fails • Matching can fail for two reasons: • Something in the source can’t be matched to something in the target with the current set of rules. • Something in the source matches several items in the target equally well.
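The two failure cases can be detected mechanically once each source element has a list of candidate targets. The sketch below assumes a hypothetical input shape: a dictionary from source element name to `(target, score)` pairs, where the score stands in for how well the pair matched (for example, the priority of the rule that fired). The function name `find_failures` is likewise made up for illustration.

```python
from typing import Dict, List, Tuple

Candidates = Dict[str, List[Tuple[str, int]]]


def find_failures(candidates: Candidates):
    """Split problem cases into 'no match' and 'several equally good matches'."""
    unmatched: List[str] = []
    ambiguous: List[Tuple[str, List[str]]] = []
    for src, scored in candidates.items():
        if not scored:
            unmatched.append(src)          # nothing in the target fits
            continue
        best = max(score for _, score in scored)
        top = [tgt for tgt, score in scored if score == best]
        if len(top) > 1:
            ambiguous.append((src, top))   # a tie the user has to break
    return unmatched, ambiguous


candidates = {
    "title": [("heading", 10)],
    "author": [("writer", 5), ("editor", 5)],
    "sections": [],
}
unmatched, ambiguous = find_failures(candidates)
```

In this example `sections` is reported as unmatched and `author` as ambiguous between `writer` and `editor`; these are the cases that would be handed to the user.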

When Matching Fails • Via the GUI, the user can do the following: • Add • Disable • Modify • Override

Translation • Using the mapping generated from the Matching step and the appropriate rules, data is transformed from the input schema to the output schema. • The translation process can make use of data translation languages • The translation process can perform type checking.
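Given a finished mapping, the translation step amounts to walking the source data and applying the translation function attached to each matched element. The sketch below makes several assumptions: data is held as `(label, payload)` tuples, the mapping is a dictionary from source label to `(target label, converter)`, the target labels (`paper`, `heading`, `writers`, `writer`) are invented, and the type check is reduced to confirming that leaf values are strings.

```python
def translate(node, mapping):
    """Rebuild a (label, payload) tree under the target schema's labels."""
    label, payload = node
    target_label, convert = mapping[label]   # KeyError means an unmatched element
    if isinstance(payload, list):
        # Inner node: translate the children in order.
        return (target_label, [translate(child, mapping) for child in payload])
    result = convert(payload)
    if not isinstance(result, str):          # stand-in for real type checking
        raise TypeError(f"{target_label}: expected str, got {type(result).__name__}")
    return (target_label, result)


source = ("Article", [
    ("title", " Conceptual Concepts "),
    ("authors", [("author", "Al Gore Ithm"), ("author", "G WWW Bush")]),
])

# The converter is unused for inner nodes, so None is acceptable there.
mapping = {
    "Article": ("paper", None),
    "title": ("heading", str.strip),
    "authors": ("writers", None),
    "author": ("writer", str.strip),
}

translated = translate(source, mapping)
```

A converter could equally be a call into a data translation language for the harder cases; a plain Python function is used here only to keep the sketch self-contained.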

Conclusion • TranScm: • Provides a general mechanism for data translation • Handles the common, relatively simple translations automatically • Can use data translation languages for more difficult translations
