Ontology Alignment/Matching Prafulla Palwe
Agenda • Introduction • Being serious about the semantic web • Living with heterogeneity • Heterogeneity problem • I have a plan for you • Matching Problem • Matching Operation • Motivation • Schema Matching Vs Ontology Matching • Correspondence • Alignment • Matching Process • Sequential composition • Parallel composition • Application Domains • Traditional • Emergent • Classification • Matching Dimensions • Basic Techniques • Element Level • Structure Level • Summary and Challenges
Introduction • Being serious about the semantic web - • It is not one guy's ontology • It is not several guys' common ontology • It is many guys and girls' many ontologies • So it is a mess, but a meaningful mess
Introduction • Living with heterogeneity - • The semantic web will be: • Huge • Dynamic • Heterogeneous • These are not bugs, they are features. • We must learn to live with them.
Introduction • Heterogeneity problem – • Resources being expressed in different ways must be reconciled before being used. • Mismatch between formalized knowledge can occur when: • different languages are used; • different terminologies are used; • different modeling is used.
Introduction • I have a plan for you – Reconciliation
Matching Problem • Matching Operation • Definition – The matching operation takes as input ontologies, each consisting of a set of discrete entities (e.g., tables, XML elements, classes, properties), and determines as output the relationships (e.g., equivalence, subsumption) holding between these entities.
Matching Problem • Motivation – • 2 XML Schemas • 2 Ontologies
Matching Problem • Schema Matching vs. Ontology Matching • Differences - • Schemas often do not provide explicit semantics for their data • Relational schemas provide no generalization • Ontologies are logical systems that constrain the meaning • An ontology is defined as a set of logical axioms • Commonalities - • Schemas and ontologies provide a vocabulary of terms that describes the domain of interest • Schemas and ontologies constrain the meaning of terms used in the vocabulary.
Matching Problem • Correspondence • Definition – • Given 2 ontologies O and O’, a correspondence between O and O’ is a 5-tuple <id, e, e’, R, n> such that: • id is a unique identifier of the correspondence. • e and e’ are entities of O and O’ respectively (e.g. XML elements, classes) • R is a relation (e.g. equivalence (=), disjointness (⊥)) • n is a confidence measure in some mathematical structure (typically in the [0,1] range)
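The 5-tuple above maps naturally onto a small record type. The following is a minimal sketch, not a reference implementation: the class name, the field names, the entity labels, and the choice to enforce the [0,1] range are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correspondence:
    """One correspondence <id, e, e', R, n> between entities of O and O'."""
    id: str            # unique identifier of the correspondence
    e: str             # entity of O
    e_prime: str       # entity of O'
    relation: str      # R, e.g. "=" for equivalence
    confidence: float  # n, assumed here to lie in [0, 1]

    def __post_init__(self):
        # The slide says "typically" [0, 1]; this sketch enforces that range.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


# Hypothetical entity names, for illustration only.
c = Correspondence("c1", "O:Book", "O2:Volume", "=", 0.9)
```

Making the record immutable (`frozen=True`) lets correspondences be stored in sets, which fits the definition of an alignment as a set of correspondences.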
Matching Problem • Alignment • Definition – • Given 2 ontologies O and O’, an alignment A between O and O’: • Is a set of correspondences on O and O’ • Has some cardinality: 1-1, 1-* etc. • Carries some additional metadata (method, date, properties etc.)
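As a rough illustration of the alignment definition, the sketch below stores correspondences as plain tuples next to a metadata dictionary and derives the cardinality from them. The dictionary layout, the entity names, the metadata values, and the `cardinality` helper are hypothetical.

```python
from collections import Counter

# Each correspondence is a tuple (id, e, e_prime, relation, confidence).
alignment = {
    "correspondences": [
        ("c1", "Book", "Volume", "=", 0.9),
        ("c2", "Author", "Writer", "=", 0.8),
        ("c3", "Author", "Creator", "=", 0.6),
    ],
    # Placeholder metadata values, for illustration only.
    "metadata": {"method": "hypothetical-matcher", "date": "unknown"},
}


def cardinality(correspondences):
    """Return '1-1', '1-*', '*-1' or '*-*' for a list of correspondences."""
    left = Counter(c[1] for c in correspondences)
    right = Counter(c[2] for c in correspondences)
    left_multi = any(n > 1 for n in left.values())    # some e maps to several e'
    right_multi = any(n > 1 for n in right.values())  # some e' is reached from several e
    return ("*" if right_multi else "1") + "-" + ("*" if left_multi else "1")


kind = cardinality(alignment["correspondences"])
```

Here `Author` is matched to both `Writer` and `Creator`, so the alignment is one-to-many.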
Matching Process • General Basic Matching Process
Matching Process • Sequential Composition
Matching Process • Parallel composition
Matching Process • Similarity Filter, alignment extractor and alignment filter –
Matching Process • Aggregation Operations – • There are many different ways to aggregate matcher results, usually depending on confidence/similarity: • Triangular norms (min, weighted products) useful for selecting only the best results • Multidimensional distances (Euclidean distance, weighted sum) useful for taking into account all dimensions • Fuzzy aggregation (min, weighted average) useful for aggregating competing algorithms and averaging their results • Other specific measures (e.g., ordered weighted average)
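A few of the aggregation operators listed above can be sketched as follows. The function names, the sample scores, and the weights are assumptions; each function combines the values that several matchers produced for the same pair of entities.

```python
import math


def aggregate_min(scores):
    """Triangular norm: the pair is only as good as its weakest score."""
    return min(scores)


def aggregate_weighted_sum(values, weights):
    """Weighted sum over all dimensions (one dimension per matcher)."""
    return sum(w * v for w, v in zip(weights, values))


def aggregate_weighted_average(scores, weights):
    """Fuzzy aggregation: weighted sum normalized by the total weight."""
    return aggregate_weighted_sum(scores, weights) / sum(weights)


def euclidean_distance(distances):
    """Combine per-matcher distances into one multidimensional distance."""
    return math.sqrt(sum(d * d for d in distances))


# Hypothetical similarity scores from three matchers for one entity pair.
scores = [0.9, 0.6, 0.3]
weights = [2, 1, 1]

lowest = aggregate_min(scores)
average = aggregate_weighted_average(scores, weights)
distance = euclidean_distance([1 - s for s in scores])
```

The min operator is strict (one poor matcher vetoes the pair), while the weighted average lets a trusted matcher outvote the others.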
Application Domains • Traditional - • Ontology evolution • Schema integration • Catalog integration • Data integration
Application Domains • Ontology Evolution
Application Domains • Catalog Integration
Application Domains • Emergent • P2P information sharing • Agent communication • Web service composition • Query answering on the web
Application Domains • P2P information sharing
Application Domains • Web Service Composition
Application Domains • Agent communication
Classifications • Matching Dimensions • Input Dimensions • Underlying models (e.g. XML, OWL) • Schema Level Vs Instance Level • Process Dimensions • Approximate Vs Exact • Interpretation of the input • Output Dimensions • Cardinality • Equivalence Vs Diverse relations • Graded Vs Absolute Confidence
Classifications • Three Layers • Upper Layer • Granularity of match • Interpretation of the input information • Middle Layer • Represents classes of elementary (basic) matching techniques • Lower Layer • Based on the kind of input which is used by elementary matching techniques
Classifications • Classification of schema based techniques
Basic Techniques • Element Level Techniques • String based – • Prefix - • Takes as input 2 strings and checks whether the first string starts with the second • e.g. net = network but also hot = hotel • Suffix – • Takes as input 2 strings and checks whether the first string ends with the second • e.g. ID = PID but also word = sword • Edit Distance – • Takes as input 2 strings and calculates the number of edit operations (insertion, deletion, substitution) of characters required to transform one string into the other, normalized by the length of the longer string. • editDistance(NKN, Nikon) = 0.4
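The three string-based techniques can be sketched in a few lines. This is an illustrative sketch under two assumptions that the slide does not state: comparison is case-insensitive (which is what makes the NKN/Nikon example come out at 0.4), and the prefix and suffix tests are applied in either direction, since the slide's own examples list the shorter string first.

```python
def prefix_match(a, b):
    """True if one string starts with the other (case-insensitive)."""
    a, b = a.lower(), b.lower()
    return a.startswith(b) or b.startswith(a)


def suffix_match(a, b):
    """True if one string ends with the other (case-insensitive)."""
    a, b = a.lower(), b.lower()
    return a.endswith(b) or b.endswith(a)


def levenshtein(a, b):
    """Number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_distance(a, b):
    """Levenshtein distance normalized by the length of the longer string."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


d = edit_distance("NKN", "Nikon")  # 2 insertions ("i", "o") over length 5
```

The "hot = hotel" and "word = sword" cases show why these tests are only weak evidence: they fire on unrelated terms as readily as on true abbreviations.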
Basic Techniques • Language based – • Tokenization – • Parses names into tokens by recognizing punctuation, cases • Hands-Free_Kits → <hands, free, kits> • Lemmatization – • Morphologically analyses tokens in order to find all their possible basic forms • Kits → Kit • Elimination – • Discards empty tokens that are articles, prepositions, conjunctions • a, the, by, type of, their, from
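The three language-based steps chain together as a small normalization pipeline. The sketch below is deliberately naive: the stop-word set covers only the single-word examples from the slide (the multi-word "type of" is not handled), and `lemmatize` merely strips a plural "s" where a real system would use a morphological analyser. All function names are hypothetical.

```python
import re

# Single-word stop words taken from the slide's examples.
STOP_WORDS = {"a", "the", "by", "of", "their", "from"}


def tokenize(name):
    """Split a name on case changes and punctuation; lower-case the tokens."""
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)  # camelCase boundary
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def lemmatize(token):
    """Very rough stand-in for lemmatization: drop a trailing plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def eliminate(tokens):
    """Discard tokens that carry no meaning of their own."""
    return [t for t in tokens if t not in STOP_WORDS]


tokens = tokenize("Hands-Free_Kits")
lemmas = [lemmatize(t) for t in tokens]
content = eliminate(tokenize("theBook_of_Authors"))
```

Running string-based matchers on the normalized tokens rather than on raw names avoids penalizing differences that are purely typographic.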
Basic Techniques • Structure Level Techniques • Ontologies are viewed as graph-like structures containing terms and their inter-relationships. • Taxonomy based • Bounded path matching • These take 2 paths with links between classes defined by the hierarchical relations, compare terms and their positions along these paths, and identify similar terms. • Super(sub)-concept rules • If the super-concepts are the same, the actual concepts are similar to each other
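The super-concept rule can be read as a way of proposing new candidate pairs from pairs that are already matched. The sketch below assumes each taxonomy is given as a child-to-parent dictionary and that "the same" super-concept means a pair already present in a set of known matches; the concept names and the function name are made up for illustration.

```python
def super_concept_candidates(parents_o1, parents_o2, matched):
    """Propose (c1, c2) pairs whose direct super-concepts are already matched."""
    candidates = []
    for c1, p1 in parents_o1.items():
        for c2, p2 in parents_o2.items():
            if (p1, p2) in matched and (c1, c2) not in matched:
                candidates.append((c1, c2))
    return sorted(candidates)


# Hypothetical taxonomies: child -> direct super-concept.
parents_o1 = {"Novel": "Book", "Laptop": "Computer"}
parents_o2 = {"Fiction": "Volume", "Notebook": "PC"}
matched = {("Book", "Volume")}

proposals = super_concept_candidates(parents_o1, parents_o2, matched)
```

Because `Book` and `Volume` are already matched, `Novel` and `Fiction` are proposed as similar; the rule yields candidates to be confirmed by other matchers, not final correspondences.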
Basic Techniques • Tree based • Children • 2 non leaf schema elements are structurally similar if their immediate children sets are highly similar • Leaves • 2 non leaf schema elements are structurally similar if their leaf sets are highly similar, even if their immediate children are not.
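The difference between the children rule and the leaves rule shows up when two schemas nest the same fields differently. In the sketch below, trees are dictionaries from a node to its list of children, and "highly similar" is approximated by the Jaccard coefficient on element names; both choices, and the sample schemas, are assumptions.

```python
def children(tree, node):
    """Immediate children of a node."""
    return set(tree.get(node, []))


def leaves(tree, node):
    """All leaf elements reachable from a node."""
    kids = tree.get(node, [])
    if not kids:
        return {node}
    found = set()
    for kid in kids:
        found |= leaves(tree, kid)
    return found


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical schemas holding the same fields at different depths.
tree1 = {"Order": ["Customer", "Items"],
         "Customer": ["name", "address"],
         "Items": ["sku", "qty"]}
tree2 = {"Purchase": ["Details"],
         "Details": ["name", "address", "sku", "qty"]}

child_sim = jaccard(children(tree1, "Order"), children(tree2, "Purchase"))
leaf_sim = jaccard(leaves(tree1, "Order"), leaves(tree2, "Purchase"))
```

`Order` and `Purchase` share no immediate children, so the children rule scores them 0, while the leaves rule scores them 1 because their leaf sets coincide.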
Summary and Challenges • Summary • Ontology matching and alignment is the process of finding the common, or most nearly common, structural and semantic terms across 2 or more different ontologies, structures, or schemas. • Efficient and more complex algorithms for matching and alignment generation can be built from the basic matching techniques. • Challenges • Developing generic and highly efficient matching and alignment generation algorithms.