From Conceptual to Instance Matching

George A. Vouros
AI Lab, Department of Information and Communication Systems Engineering
University of the Aegean
83200 Karlovassi, Samos, Greece
georgev@aegean.gr

Ontology Matching at the Conceptual Level
Given two ontologies (S1, A1, I1) and (S2, A2, I2), where Si is the signature, Ai the set of axioms and Ii the set of instance assertions of ontology i, find a mapping (i.e. equivalences) between their signatures such that the translation of A1 with respect to this mapping is satisfied by A2.

Instance Matching
Given two ontologies (S1, A1, I1) and (S2, A2, I2), find a mapping between their
• signatures (i.e. equivalences) and
• instances (i.e. “same as” assertions)
such that the assertions in I2, together with the “translated” assertions in I1, are consistent with A2 and the translated axioms in A1.

OAEI 2009 Instance Matching Track
The instance matching contest was composed of two tracks:
• The ISLab Instance Matching Benchmark (IIMB) is generated automatically, starting from one data source that is automatically modified according to various criteria.
• The AKT-Rexa-DBLP test case aims at testing the capability of the tools to match individuals. All three datasets were structured using the same schema. The challenges for the matchers included ambiguous labels (person names and paper titles) and noisy data (some sources contained incorrect information).

Issues
• Scalability
• Different methods exploit different information concerning instances, or different facets of the same type of information
• Assumptions concerning the structure of the “search space”

Scalability
Our first approach:
• Compute clusters of “same as” instances, where each cluster is represented by a “model” of the cluster.
• Clusters and models are stored in files on disk.
• Each new instance is compared with every cluster by means of the cluster “models”.
• The highest similarity above a specific threshold indicates the cluster of the new instance.
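The assignment step described above can be sketched as follows. This is only an illustration under stated assumptions: the cluster “model” is taken to be a bag of character bigrams, similarity is the cosine of bigram counts, the threshold value 0.6 is arbitrary, and clusters are kept in memory rather than on disk. None of these choices come from the slides, and the helper names (`bigrams`, `cosine`, `assign`) are hypothetical.

```python
from collections import Counter
from math import sqrt


def bigrams(text):
    """Character-bigram counts of a lowercased string (assumed model features)."""
    s = text.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def cosine(a, b):
    """Cosine similarity between two Counters of feature counts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def assign(clusters, instance, threshold=0.6):
    """Put `instance` in the most similar cluster whose score reaches `threshold`.

    If no cluster qualifies, a new cluster is opened. Returns the cluster index.
    """
    features = bigrams(instance)
    best_index, best_score = None, threshold
    for index, cluster in enumerate(clusters):
        score = cosine(features, cluster["model"])
        if score >= best_score:
            best_index, best_score = index, score
    if best_index is None:
        clusters.append({"model": features, "members": [instance]})
        return len(clusters) - 1
    # Update the model so that it summarises every member of the cluster.
    clusters[best_index]["model"] += features
    clusters[best_index]["members"].append(instance)
    return best_index


clusters = []
first = assign(clusters, "George Vouros")
second = assign(clusters, "George A. Vouros")
third = assign(clusters, "Samos Island")
```

Because a new instance is compared against one model per cluster rather than against every stored instance, the cost of an assignment grows with the number of clusters, not with the number of instances seen so far.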

Methods
• COCLU: Discovers typographic similarities between sequences of characters over an alphabet (ASCII or UTF character set), in order to reveal the similarity of the lexicalizations of class instances during ontology population. It is a partition-based clustering algorithm that divides data into clusters and searches the space of possible clusters using a greedy heuristic.

Methods
• Vector Space Model (VSM) based method: It computes the matching of two pseudo documents. In our case each such document corresponds to an instance and is produced from the words in the vicinity of that instance. The “vicinity” includes all words occurring (i) in the local name, label and comments of the instance, (ii) in any of its properties (exploiting the properties’ local names, labels and comments), as well as (iii) in any of its related concepts or instances. Each document is represented by a vector of n weighted index words, where the weight of a word is the frequency of its appearance in the document. The similarity between two vectors is computed by means of the cosine similarity measure.
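A minimal sketch of the VSM comparison described above, assuming the text fragments in the vicinity of each instance (names, labels, comments) have already been collected into a list of strings. The tokenisation rule and the function names `pseudo_document` and `cosine_similarity` are illustrative assumptions, not part of the original method description.

```python
import re
from collections import Counter
from math import sqrt


def pseudo_document(text_fragments):
    """Build a term-frequency vector from the text fragments around an instance."""
    words = []
    for fragment in text_fragments:
        # Assumed tokenisation: lowercase runs of letters and digits.
        words.extend(re.findall(r"[a-z0-9]+", fragment.lower()))
    return Counter(words)


def cosine_similarity(doc_a, doc_b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(doc_a[w] * doc_b[w] for w in doc_a.keys() & doc_b.keys())
    norm_a = sqrt(sum(c * c for c in doc_a.values()))
    norm_b = sqrt(sum(c * c for c in doc_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc_a = pseudo_document(["Ontology matching", "author George Vouros"])
doc_b = pseudo_document(["ontology matching survey"])
similarity = cosine_similarity(doc_a, doc_b)
```

Here the two documents share two of their index words ("ontology" and "matching"), so the similarity lies strictly between 0 (no shared words) and 1 (identical word distributions).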

Synthesis of different methods
• Simple (e.g. the union/intersection of clusters with at least one common member)
• Biased: The clusters of one method may be used as input by another.
• Model based: Set the constraints that must be satisfied according to the axioms of the schemas and run a generic method (e.g. the max-sum algorithm or a DCOP method) that reconciles the preferences and conflicts among the individual methods.
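The “simple” synthesis option can be illustrated for the union case: clusters produced by different methods are merged whenever they share at least one member. The sketch below uses a union-find structure for this. The function name `merge_overlapping` and the sample clusterings are hypothetical, and the biased and model-based options are not covered.

```python
def merge_overlapping(*clusterings):
    """Merge clusters from several methods that share at least one member.

    Each clustering is an iterable of sets of instance identifiers.
    Returns the merged clusters as a sorted list of sorted lists.
    """
    parent = {}

    def find(x):
        # Register unseen items as their own root, then walk up with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for clustering in clusterings:
        for cluster in clustering:
            members = list(cluster)
            if not members:
                continue
            find(members[0])  # make sure singleton clusters are registered
            for member in members[1:]:
                union(members[0], member)

    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), set()).add(item)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical outputs of two individual methods over instances a..e.
method_one = [{"a", "b"}, {"c"}]
method_two = [{"b", "d"}, {"c", "e"}]
merged = merge_overlapping(method_one, method_two)
```

With these inputs the clusters {a, b} and {b, d} are merged through their common member b, and {c} is absorbed into {c, e}. Union-based merging is transitive, so a single wrong “same as” link from one method can chain otherwise unrelated clusters together, which is one motivation for the constraint-based option listed above.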

To be done
• Get results from the individual methods…
• Implement the synthesis of different methods
• Investigate the interaction between conceptual mapping and instance mapping for a sophisticated but scalable synthesis method.
