
Multi-Concept Alignment and Evaluation



Presentation Transcript


  1. Multi-Concept Alignment and Evaluation Shenghui Wang, Antoine Isaac, Lourens van der Meij, Stefan Schlobach Ontology Matching Workshop Oct. 11th, 2007

  2. Multi-Concept Alignment and Evaluation Introduction: Multi-Concept Alignment • Mappings involving combinations of concepts • o1:FruitsAndVegetables → (o2:Fruits OR o2:Vegetables) • Also referred to as multiple or complex alignment • Problem: only a few matching tools consider it • Cf. [Euzenat & Shvaiko]
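
Concretely, such a mapping can be represented as a rule from a set of source concepts to a set of target concepts. A minimal Python sketch, with hypothetical concept identifiers:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MultiConceptRule:
    """A mapping from a combination of source concepts to a combination
    of target concepts."""
    source: frozenset  # concept identifiers in ontology o1
    target: frozenset  # concept identifiers in ontology o2

# The slide's example: o1:FruitsAndVegetables -> (o2:Fruits OR o2:Vegetables)
rule = MultiConceptRule(
    source=frozenset({"o1:FruitsAndVegetables"}),
    target=frozenset({"o2:Fruits", "o2:Vegetables"}),
)
```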

  3. Multi-Concept Alignment and Evaluation Why is MCA a Difficult Problem? • Much larger search space: |O1| × |O2| grows to 2^|O1| × 2^|O2| (e.g. two vocabularies of 1,000 concepts each yield 10^6 candidate concept pairs, but 2^1000 × 2^1000 candidate pairs of concept sets) • How to measure similarity between sets of concepts? • Based on which information and strategies? “Fruits and vegetables” vs. “Fruits” and “Vegetables” together • Formal frameworks for MCA? • Representation primitives • owl:intersectionOf? skosm:AND? • Semantics: A skos:broader (skosm:AND B C) ⇔ A skos:broader B & A skos:broader C?

  4. Multi-Concept Alignment and Evaluation Agenda • The multi-concept alignment problem • The Library case and the need for MCA • Generating MCAs for the Library case • Evaluating MCAs in the Library case • Conclusion

  5. Multi-Concept Alignment and Evaluation Yet MCA is needed in real-life problems • KB (Koninklijke Bibliotheek) collections (cf. OAEI slides) • Scenario: re-annotation of GTT-indexed books with Brinkman concepts

  6. Multi-Concept Alignment and Evaluation Yet MCA is needed in real-life problems • Books can be indexed by several concepts • With post-coordination, co-occurrence matters: {G1=“History”, G2=“the Netherlands”} in GTT → a book about Dutch history • The granularity of the two vocabularies differs → {B1=“Netherlands; History”} • The alignment should associate combinations of concepts

  7. Multi-Concept Alignment and Evaluation Agenda • The multi-concept alignment problem • The Library case and the need for MCA • Generating MCAs for the Library case • Evaluating MCAs in the Library case • Conclusion

  8. Multi-Concept Alignment and Evaluation MCA for Annotation Translation: Approach • Produce similarity measures between individual concepts • Sim(A,B) = X • Group concepts based on their similarity • {G1,B1,G2,G3,B2} • Create conversion rules • {G1,G2,G3} → {B1,B2} • Extract a deployable alignment
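
A minimal sketch of the grouping and rule-creation steps, assuming a precomputed symmetric similarity table `sim` (dict-of-dicts) and hypothetical `gtt:`/`brinkman:` prefixes; the greedy grouping is only a stand-in for the ranking and clustering methods described on the later slides:

```python
def group_by_similarity(concepts, sim, threshold=0.5):
    """Greedily put a concept into the first group containing a
    sufficiently similar member; otherwise start a new group."""
    groups = []
    for c in concepts:
        for g in groups:
            if any(sim[c][d] >= threshold for d in g):
                g.add(c)
                break
        else:
            groups.append({c})
    return groups

def derive_rules(concepts, sim, threshold=0.5):
    """Split each mixed group, e.g. {G1,B1,G2,G3,B2}, into a GTT
    antecedent and a Brinkman consequent: {G1,G2,G3} -> {B1,B2}."""
    rules = []
    for group in group_by_similarity(concepts, sim, threshold):
        gtt = frozenset(c for c in group if c.startswith("gtt:"))
        brinkman = frozenset(c for c in group if c.startswith("brinkman:"))
        if gtt and brinkman:
            rules.append((gtt, brinkman))
    return rules
```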

  9. Multi-Concept Alignment and Evaluation MCA Creation: Similarity Measures • The KB scenario has dually indexed books • Brinkman and GTT concepts co-occur • Instance-based alignment techniques can be used • Between concepts from the same vocabulary, similarity mirrors possible combinations!

  10. Multi-Concept Alignment and Evaluation MCA Creation: 2 Similarity Measures • Jaccard overlap measure applied to concept extensions • Latent Semantic Analysis • Computation of a similarity matrix • Filters noise due to insufficient data • Similarity between concepts across vocabularies and inside vocabularies
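
A sketch of the two measures, assuming a 0/1 concept-by-book occurrence matrix; the latent rank k and the cosine normalisation are illustrative assumptions, not the authors' settings:

```python
import numpy as np

def jaccard(ext_a, ext_b):
    """Jaccard overlap of two concept extensions (sets of book ids)."""
    a, b = set(ext_a), set(ext_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def lsa_similarity_matrix(occurrence, k=50):
    """Cosine similarities between concepts in a rank-k latent space.
    `occurrence` is a (num_concepts x num_books) 0/1 numpy array."""
    u, s, _ = np.linalg.svd(occurrence, full_matrices=False)
    latent = u[:, :k] * s[:k]                      # concept vectors
    norms = np.linalg.norm(latent, axis=1, keepdims=True)
    latent /= np.clip(norms, 1e-12, None)          # avoid divide-by-zero
    return latent @ latent.T                       # concept-concept matrix
```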

  11. Multi-Concept Alignment and Evaluation MCA Creation: 2 Concept Aggregation Methods • Simple ranking • For a concept, take the top k most similar concepts • Gather the GTT concepts and the Brinkman ones • Clustering • Partition concepts into similarity-based clusters • Gather concepts • Global approach: the most relevant combinations should be selected
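
The ranking method reduces to a top-k selection per concept; a sketch, again over a dict-of-dicts similarity table `sim`:

```python
def top_k_partners(sim, concept, k=3):
    """Simple ranking: the k concepts most similar to `concept`."""
    candidates = [(other, s) for other, s in sim[concept].items()
                  if other != concept]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in candidates[:k]]
```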

  12. Multi-Concept Alignment and Evaluation Generated Rules • Clustering generated far fewer rules • But with more concepts per rule

  13. Multi-Concept Alignment and Evaluation Agenda • The multi-concept alignment problem • The Library case and the need for MCA • Generating MCAs for the Library case • Evaluating MCAs in the Library case • Conclusion

  14. Multi-Concept Alignment and Evaluation Evaluation Method: Data Sets • Training and evaluation sets built from dually-indexed books • 2/3 training, 1/3 testing • Two training sets (samples) • Random • Rich: books that have at least 8 annotations (over both thesauri)
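
A sketch of how such samples could be drawn, with the 2/3–1/3 split and the 8-annotation threshold from the slide; the function and parameter names are hypothetical:

```python
import random

def make_samples(books, annotation_count, rich_threshold=8, seed=0):
    """2/3 training, 1/3 testing; the 'rich' sample keeps only training
    books with at least `rich_threshold` annotations over both thesauri."""
    books = list(books)
    random.Random(seed).shuffle(books)
    cut = (2 * len(books)) // 3
    train, test = books[:cut], books[cut:]
    rich = [b for b in train if annotation_count[b] >= rich_threshold]
    return train, rich, test
```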

  15. Multi-Concept Alignment and Evaluation Evaluation Method: Applying Rules • Rules Gr1→Br1, Gr2→Br2, Gr3→Br3 are matched against a book’s GTT annotation Gt • Several configurations for firing rules: • 1. Gt = Gr • 2. Gt ⊇ Gr • 3. Gt ⊆ Gr • 4. ALL
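
A sketch of the four firing configurations; reading configurations 2 and 3 as superset and subset tests, and ALL as any overlap, is an assumption based on the slide's notation:

```python
def rule_fires(gt, gr, strategy):
    """Decide whether rule antecedent `gr` fires on book annotation `gt`."""
    gt, gr = set(gt), set(gr)
    if strategy == 1:        # exact match: Gt = Gr
        return gt == gr
    if strategy == 2:        # assumed: annotation contains antecedent (Gt ⊇ Gr)
        return gr <= gt
    if strategy == 3:        # assumed: annotation contained in antecedent (Gt ⊆ Gr)
        return gt <= gr
    if strategy == 4:        # ALL: assumed any overlap between Gt and Gr
        return bool(gt & gr)
    raise ValueError(f"unknown strategy: {strategy}")
```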

  16. Multi-Concept Alignment and Evaluation Evaluation Measures • Precision and recall for matched books • Books that were given at least one good Brinkman annotation • Pb, Rb • Precision and recall for annotation translation • Averaged over books
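
A sketch of the measures; the exact definitions of Pb and Rb are inferred from the bullets, so treat this as illustrative:

```python
def annotation_scores(translated, gold):
    """Per-book precision/recall of translated Brinkman annotations."""
    tp = len(set(translated) & set(gold))
    precision = tp / len(translated) if translated else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

def evaluate(books):
    """`books` is a list of (translated, gold) annotation-set pairs.
    Returns the matched-book ratio (at least one good Brinkman concept)
    and the annotation precision/recall averaged over books."""
    matched, p_sum, r_sum = 0, 0.0, 0.0
    for translated, gold in books:
        p, r = annotation_scores(translated, gold)
        matched += bool(set(translated) & set(gold))
        p_sum += p
        r_sum += r
    n = len(books) or 1
    return matched / n, p_sum / n, r_sum / n
```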

  17. Multi-Concept Alignment and Evaluation Results: for ALL Strategy

  18. Multi-Concept Alignment and Evaluation Results: Rich vs. Random Training Set • Rich does not improve the results much • Bias towards richly annotated books • Jaccard performance goes down • LSA does better • Statistical corrections allow simple grouping techniques to cope with data complexity

  19. Multi-Concept Alignment and Evaluation Results: for Clustering

  20. Multi-Concept Alignment and Evaluation Results: Jaccard vs. LSA • For strategies 3 and ALL, LSA outperforms Jaccard • For strategies 1 and 2, Jaccard outperforms LSA • Simple similarity is better at finding explicit similarities • Those really occurring in books • LSA is better at finding potential similarities

  21. Multi-Concept Alignment and Evaluation Results: using LSA

  22. Multi-Concept Alignment and Evaluation Results: Clustering vs. Ranking • Clustering performs better on strategies 1 and 2 • It matches existing annotations better • It has better precision • Ranking has higher recall but lower precision • A classical tradeoff (ranking keeps noise)

  23. Multi-Concept Alignment and Evaluation Agenda • The multi-concept alignment problem • The Library case and the need for MCA • Generating MCAs for the Library case • Evaluating MCAs in the Library case • Conclusion

  24. Multi-Concept Alignment and Evaluation Conclusions • There is an important problem: multi-concept alignment • Not extensively dealt with in the current literature • Needed by applications • We have first approaches to create such alignments • And to deploy them! • We hope that further research will improve the situation (with our ‘deployer’ hat on) • Better alignments • More precise frameworks (methodology research)

  25. Multi-Concept Alignment and Evaluation Conclusions: Performances • Evaluation shows mixed results • Performances are generally very low • These techniques cannot be used alone • Notice: dependence on requirements • Settings where a manual indexer chooses among several candidates allow for lower precision • Notice: indexing variability • OAEI has demonstrated that manual evaluation somewhat compensates for the bias of the automatic one

  26. Multi-Concept Alignment and Evaluation Thanks!
