From Conceptual to Instance Matching

George A. Vouros, AI Lab, Department of Information and Communication Systems Engineering, University of the Aegean, 83200 Karlovassi, Samos, Greece. georgev@aegean.gr


Presentation Transcript


  1. From Conceptual to Instance Matching
     George A. Vouros, AI Lab, Department of Information and Communication Systems Engineering, University of the Aegean, 83200 Karlovassi, Samos, Greece. georgev@aegean.gr

  2. Ontology Matching at the Conceptual Level
     Given two ontologies (S1, A1, I1) and (S2, A2, I2), each consisting of a signature S, axioms A and instances I, find a mapping (i.e. equivalences) between their signatures such that the translation of A1 with respect to this mapping is satisfied by A2.

  3. Instance Matching
     Given two ontologies (S1, A1, I1) and (S2, A2, I2), find a mapping between their
     • signatures (i.e. equivalences) and
     • instances (i.e. “same as” assertions)
     such that the assertions in I2, together with the “translated” assertions in I1, are consistent with A2 and the translated axioms in A1.
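The consistency condition above can be illustrated with a deliberately tiny sketch. It assumes a toy fragment that is not in the slides: assertions are (individual, class) pairs and the only axioms are class-disjointness pairs. All names (`translate_assertions`, `translate_axioms`, `is_consistent`, and the example classes) are hypothetical.

```python
def translate_assertions(assertions, class_map, same_as):
    """Rewrite I1 assertions using the signature and instance mappings."""
    return {(same_as.get(ind, ind), class_map.get(cls, cls))
            for ind, cls in assertions}


def translate_axioms(disjoint_pairs, class_map):
    """Rewrite A1 disjointness axioms using the signature mapping."""
    return {(class_map.get(c1, c1), class_map.get(c2, c2))
            for c1, c2 in disjoint_pairs}


def is_consistent(assertions, disjoint_pairs):
    """True if no individual is asserted to belong to two disjoint classes."""
    classes_of = {}
    for ind, cls in assertions:
        classes_of.setdefault(ind, set()).add(cls)
    for classes in classes_of.values():
        for c1, c2 in disjoint_pairs:
            if c1 != c2 and c1 in classes and c2 in classes:
                return False
    return True


# Toy ontologies (assumed data, for illustration only).
I1 = {("a1", "Writer")}
A1 = {("Writer", "Document")}
I2 = {("b1", "Author")}
A2 = {("Author", "Paper")}
same_as = {"a1": "b1"}


def check(class_map):
    merged = I2 | translate_assertions(I1, class_map, same_as)
    axioms = A2 | translate_axioms(A1, class_map)
    return is_consistent(merged, axioms)


good_mapping = check({"Writer": "Author", "Document": "Paper"})
bad_mapping = check({"Writer": "Paper", "Document": "Author"})
```

Under the first mapping the merged assertions say only that `b1` is an `Author`; under the second, `b1` becomes both an `Author` and a `Paper`, which the disjointness axiom forbids. A real matcher would need a reasoner for richer axioms.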

  4. OAEI 09 Instance Matching Track
     The Instance Matching contest consisted of two tracks:
     • The ISLab Instance Matching Benchmark (IIMB), a benchmark generated automatically from a single data source that is modified automatically according to various criteria.
     • The AKT-Rexa-DBLP test case, which tests the capability of the tools to match individuals. All three datasets were structured using the same schema. The challenges for the matchers included ambiguous labels (person names and paper titles) and noisy data (some sources contained incorrect information).

  5. Issues
     • Scalability
     • Different methods exploit different information concerning instances, or different facets of the same type of information
     • Assumptions concerning the structure of the “search space”

  6. Scalability
     Our first approach:
     • Compute clusters of “same as” instances, where each cluster is represented by a “model” of the cluster.
     • Clusters and models are stored in files on disk.
     • Each new instance is compared with every cluster by exploiting the “models”.
     • The highest similarity, provided it exceeds a specific threshold, indicates the cluster of the new instance.
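The assignment step above can be sketched as follows. This is a minimal in-memory illustration, not the authors' implementation: the slides do not say what a "model" contains or which similarity is used, so the sketch assumes the model is simply the first label of the cluster and swaps in `difflib.SequenceMatcher` as a stand-in similarity. The names `Cluster`, `similarity`, `assign` and the threshold value 0.8 are hypothetical, and disk storage is omitted.

```python
from difflib import SequenceMatcher


class Cluster:
    """A cluster of 'same as' instances, summarised by a model."""

    def __init__(self, first_label):
        self.members = [first_label]
        # Assumption: the model is just the first label placed in the cluster.
        self.model = first_label


def similarity(label, model):
    """Stand-in string similarity in [0, 1]; not the measure used in the talk."""
    return SequenceMatcher(None, label.lower(), model.lower()).ratio()


def assign(label, clusters, threshold=0.8):
    """Put label in the most similar cluster, or open a new cluster."""
    best, best_score = None, 0.0
    for cluster in clusters:
        score = similarity(label, cluster.model)
        if score > best_score:
            best, best_score = cluster, score
    if best is not None and best_score >= threshold:
        best.members.append(label)
        return best
    new_cluster = Cluster(label)
    clusters.append(new_cluster)
    return new_cluster


clusters = []
for label in ["George Vouros", "George A. Vouros", "Samos", "george vouros"]:
    assign(label, clusters)
```

Because each new instance is compared only against one model per cluster, the cost of an assignment grows with the number of clusters rather than with the number of instances seen so far.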

  7. Methods
     • COCLU: Discovers typographic similarities between sequences of characters over an alphabet (ASCII or UTF character set), in order to reveal the similarity of class instances’ lexicalizations during ontology population. It is a partition-based clustering algorithm that divides the data into clusters and searches the space of possible clusters using a greedy heuristic.

  8. Methods
     • Vector Space Model (VSM) based method: Computes the matching of two pseudo-documents. In our case each such document corresponds to an instance and is produced from the words in the vicinity of that instance. The “vicinity” includes all words occurring (i) in the local name, label and comments of the instance, (ii) in any of its properties (exploiting the properties’ local names, labels and comments), and (iii) in any of its related concepts or instances. Each document is represented by a vector of n weighted index words, where the weight of a word is the frequency of its appearance in the document. The similarity between two vectors is computed by means of the cosine similarity measure.
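The VSM computation described above can be sketched with term-frequency vectors and cosine similarity. The tokenisation rule and the function names (`pseudo_document`, `cosine_similarity`) are assumptions made for illustration, and the example instance texts are invented.

```python
import math
import re
from collections import Counter


def pseudo_document(*text_fragments):
    """Build a term-frequency vector from the texts in an instance's vicinity.

    Fragments may be local names, labels, comments, or texts of related
    concepts and instances. Assumed tokenisation: lowercase alphanumeric runs.
    """
    words = []
    for fragment in text_fragments:
        words.extend(re.findall(r"[a-z0-9]+", fragment.lower()))
    return Counter(words)


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between two term-frequency vectors."""
    shared = doc_a.keys() & doc_b.keys()
    dot = sum(doc_a[w] * doc_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in doc_a.values()))
    norm_b = math.sqrt(sum(c * c for c in doc_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented example instances.
doc_1 = pseudo_document("vouros_g", "George Vouros", "author of ontology matching papers")
doc_2 = pseudo_document("GVouros", "G. Vouros", "ontology matching researcher")
score = cosine_similarity(doc_1, doc_2)
```

Here `score` lies strictly between 0 and 1 because the two documents share some words ("vouros", "g", "ontology", "matching") but not all.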

  9. Synthesis of different methods
     • Simple (e.g. the union/intersection of clusters with at least one common member)
     • Biased: The clusters of one method may be used as input by another.
     • Model based: Set the constraints that must be satisfied according to the axioms of the schemas and run a generic method (e.g. the max-sum algorithm or a DCOP method) that reconciles the preferences and conflicts among the individual methods.
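The "simple" synthesis above can be sketched as set operations over the clusters produced by two methods. This is one plausible reading of the bullet, not a confirmed design: the function names `merge_overlapping` and `intersect_overlapping` and the instance identifiers are hypothetical.

```python
def merge_overlapping(clusters_a, clusters_b):
    """Union variant: merge any clusters that share at least one member."""
    merged = []  # invariant: the sets in merged are pairwise disjoint
    for cluster in list(clusters_a) + list(clusters_b):
        current = set(cluster)
        remaining = []
        for existing in merged:
            if existing & current:
                current |= existing  # absorb every group that overlaps
            else:
                remaining.append(existing)
        remaining.append(current)
        merged = remaining
    return merged


def intersect_overlapping(clusters_a, clusters_b):
    """Intersection variant: keep only members both methods agree on."""
    result = []
    for a in clusters_a:
        for b in clusters_b:
            common = set(a) & set(b)
            if common:
                result.append(common)
    return result


# Invented cluster outputs of two methods.
method_1 = [{"i1", "i2"}, {"i3"}]
method_2 = [{"i2", "i4"}, {"i5"}]

union_result = merge_overlapping(method_1, method_2)
intersection_result = intersect_overlapping(method_1, method_2)
```

The union variant favours recall, since one method's evidence is enough to link two instances; the intersection variant favours precision, since both methods must agree.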

  10. To be done
     • Get results from the individual methods…
     • Implement the synthesis of different methods
     • Investigate the interaction between conceptual mapping and instance mapping for a sophisticated but scalable synthesis method.
