Using Schema Matching to Simplify Heterogeneous Data Translation

Tova Milo, Sagit Zohar (Tel Aviv University)


  1. Using Schema Matching to Simplify Heterogeneous Data Translation • Tova Milo, Sagit Zohar • Tel Aviv University

  2. Introduction • There are large amounts of data available on the Web but the format of the data is not homogeneous. • Most applications can handle only one or a small number of formats. • There is a need to translate data from one format to another.

  3. Introduction • Two approaches to translating data: • A dedicated program that translates from format A to format B (e.g., LaTeX to HTML). • Data translation languages.

  4. Introduction • The solution – TranScm • A data translation system • Automatically translates a portion (often a large portion) of the desired data • Does not replace data translation languages, but reduces the amount of programming needed in them

  5. TranScm Architecture • [Architecture diagram. Component labels: Input Schema, Output Schema, Import/Export Library, GUI, Rule Base, Matching Module, Typing Module]

  6. Data Model • Tree (Forest) Model • Similar to OEM • Allows an order on children • Can handle cyclic structures using ids as “pointers”

  7. Data Model • [Figure: example data tree. Node labels: Article, title, authors, sections, author, author. Leaf values: “Conceptual Concepts”, “Al Gore Ithm”, “G WWW Bush”]
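
The tree model described on the two Data Model slides can be sketched roughly as follows. This is only an illustration, not TranScm's actual representation: the `Node` class, its field names, and the `cites` back-pointer are hypothetical, and the way the figure's labels are arranged into a tree is an assumption based on the slide text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """One vertex of an ordered, labeled data tree (hypothetical class)."""
    label: str
    value: Optional[str] = None       # atomic value, used by leaves
    children: List["Node"] = field(default_factory=list)  # a list keeps child order
    node_id: Optional[str] = None     # optional id so other nodes can point here
    ref: Optional[str] = None         # id of another node; stands in for a cyclic edge


def index_by_id(root: Node) -> Dict[str, Node]:
    """Collect every node that carries an id, so that refs can be resolved."""
    found: Dict[str, Node] = {}
    stack = [root]
    while stack:
        node = stack.pop()
        if node.node_id is not None:
            found[node.node_id] = node
        stack.extend(node.children)
    return found


# Assumed arrangement of the labels shown in the figure.
article = Node("Article", node_id="a1", children=[
    Node("title", value="Conceptual Concepts"),
    Node("authors", children=[
        Node("author", value="Al Gore Ithm"),
        Node("author", value="G WWW Bush"),
    ]),
    # Hypothetical back-pointer: refers to the root by id instead of nesting it.
    Node("sections", children=[Node("cites", ref="a1")]),
])

ids = index_by_id(article)
back_pointer = article.children[2].children[0]
resolved = ids[back_pointer.ref]      # following the "pointer" yields the root node
```

Storing an id string instead of a nested object keeps the structure a tree while still letting it describe cycles, which is the role the slide assigns to ids.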

  8. Schema Model • Labeled graphs • Some nodes may be ordered • Each vertex is a schema element (type) • Labels carry information about the node

  9. Schema Model • [Figure: example schema graph. Vertex labels as shown: Article [3], title [1], authors [0,…,->], author [1], sections [2], string, ref, string]

  10. Rules • Rules are the basis of the matching and translation • Rules have an associated priority

  11. Rules • Each rule has two components: • Matching component • Match function • Decendents (sic) function • Translation component • Translation function

  12. Matching • The Match function examines schema labels to determine possible matches. • The Decendents function checks the numbers and types of the children of the current node.
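
A rule with its matching and translation components might look roughly like the sketch below. All names here (`SchemaNode`, `Rule`, `first_applicable`, the two sample rules) are hypothetical, and the convention that a lower priority number is tried first is an assumption; the slides only say that rules carry a priority.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class SchemaNode:
    """A schema element with a label and ordered children (hypothetical)."""
    label: str
    children: List["SchemaNode"] = field(default_factory=list)


@dataclass
class Rule:
    name: str
    priority: int  # assumption: lower number means the rule is tried earlier
    match: Callable[[SchemaNode, SchemaNode], bool]        # inspects the labels
    descendants: Callable[[SchemaNode, SchemaNode], bool]  # inspects the children
    translate: Callable[[str], str]                        # translation component


def same_children(src: SchemaNode, tgt: SchemaNode) -> bool:
    """Check the number of children and, pairwise, their labels."""
    return len(src.children) == len(tgt.children) and all(
        a.label.lower() == b.label.lower()
        for a, b in zip(src.children, tgt.children)
    )


def first_applicable(rules: List[Rule], src: SchemaNode, tgt: SchemaNode) -> Optional[Rule]:
    """Return the highest-priority rule whose two matching functions both accept."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.match(src, tgt) and rule.descendants(src, tgt):
            return rule
    return None


rules = [
    Rule("case_insensitive", 2,
         lambda s, t: s.label.lower() == t.label.lower(), same_children, str.strip),
    Rule("identical", 1,
         lambda s, t: s.label == t.label, same_children, lambda v: v),
]

exact = first_applicable(rules, SchemaNode("title"), SchemaNode("title"))
loose = first_applicable(rules, SchemaNode("Title"), SchemaNode("title"))
none_found = first_applicable(rules, SchemaNode("title"), SchemaNode("abstract"))
```

Here `exact` is the `identical` rule, `loose` falls through to `case_insensitive`, and `none_found` is `None` because no rule accepts the pair.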

  13. Matching • [Figure: matching between two Article trees. Node labels: Article, Article, authors, author, author, author, author]

  14. When Matching Fails • Matching can fail for two reasons: • Something in the source can’t be matched to something in the target with the current set of rules. • Something in the source matches several items in the target equally well.
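
The two failure cases can be told apart mechanically. The sketch below is one possible way to do it, not the system's actual procedure: the `classify_matches` and `label_score` functions and the sample labels are all hypothetical, and a simple numeric score stands in for whatever the rule base would decide.

```python
from typing import Callable, Dict, List


def label_score(src: str, tgt: str) -> int:
    """Toy scoring: 2 for equal labels, 1 if one label is a prefix of the other."""
    if src == tgt:
        return 2
    if src.startswith(tgt) or tgt.startswith(src):
        return 1
    return 0


def classify_matches(
    sources: List[str],
    targets: List[str],
    score: Callable[[str, str], int],
) -> Dict[str, List[str]]:
    """Sort source elements into matched, unmatched, and ambiguous."""
    result: Dict[str, List[str]] = {"matched": [], "unmatched": [], "ambiguous": []}
    for src in sources:
        scored = [(score(src, tgt), tgt) for tgt in targets]
        best = max((s for s, _ in scored), default=0)
        winners = [tgt for s, tgt in scored if s == best and s > 0]
        if not winners:
            result["unmatched"].append(src)   # failure 1: nothing fits
        elif len(winners) == 1:
            result["matched"].append(src)
        else:
            result["ambiguous"].append(src)   # failure 2: several fit equally well
    return result


report = classify_matches(
    ["title", "author", "abstract"],
    ["title", "author_name", "author_id"],
    label_score,
)
```

With these sample labels, `title` is matched, `author` is ambiguous (it scores the same against `author_name` and `author_id`), and `abstract` is unmatched. The last two are the cases that would be handed to the user.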

  15. When Matching Fails • Via the GUI, the user can do the following: • Add • Disable • Modify • Override

  16. Translation • Using the mapping generated from the Matching step and the appropriate rules, data is transformed from the input schema to the output schema. • The translation process can make use of data translation languages • The translation process can perform type checking.
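
As a rough illustration of the translation step, the sketch below applies a mapping to a flat record and checks the type of each translated value. It is a simplification under stated assumptions: real data would be a tree rather than a flat dictionary, and `translate_record`, the `year` and `published` fields, and the shape of the mapping are all hypothetical.

```python
from typing import Any, Callable, Dict, Tuple

# Mapping produced by the matching step (assumed shape):
# source label -> (target label, translation function)
Mapping = Dict[str, Tuple[str, Callable[[Any], Any]]]


def translate_record(
    record: Dict[str, Any],
    mapping: Mapping,
    target_types: Dict[str, type],
) -> Dict[str, Any]:
    """Translate matched fields and type-check each result against the target."""
    out: Dict[str, Any] = {}
    for src_label, value in record.items():
        if src_label not in mapping:
            continue  # unmatched fields are left for manual handling
        tgt_label, translate = mapping[src_label]
        new_value = translate(value)
        expected = target_types[tgt_label]
        if not isinstance(new_value, expected):
            raise TypeError(
                f"{tgt_label}: expected {expected.__name__}, "
                f"got {type(new_value).__name__}"
            )
        out[tgt_label] = new_value
    return out


mapping: Mapping = {
    "title": ("heading", str.strip),
    "year": ("published", int),
}
target_types = {"heading": str, "published": int}

translated = translate_record(
    {"title": " Conceptual Concepts ", "year": "1998", "notes": "draft"},
    mapping,
    target_types,
)
```

A translation function here is just a Python callable; in the setting the slides describe, the harder cases would instead be delegated to a data translation language.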

  17. Conclusion • TranScm • Provides a general mechanism for data translation • Handles common, relatively simple translations automatically • Can use data translation languages for more difficult translations
