Large Scale Integration of Senses for the Semantic Web. Jorge Gracia, Mathieu d’Aquin, Eduardo Mena. Computer Science and Systems Engineering Department (DIIS), University of Zaragoza, Spain. Knowledge Media Institute (KMi), Open University, United Kingdom.
Large Scale Integration of Senses for the Semantic Web
Jorge Gracia, Mathieu d’Aquin, Eduardo Mena
Computer Science and Systems Engineering Department (DIIS), University of Zaragoza, Spain
Knowledge Media Institute (KMi), Open University, United Kingdom
18th International World Wide Web Conference, Madrid, Spain, 20th-24th April 2009
Outline
• Introduction
• Method
• Optimization study
• Experiments
• Conclusions
WWW 2009
Introduction
• Current Semantic Web
  • Favoured by the growing number of online ontologies already available on the Web
  • Hampered by the high heterogeneity that this growing semantic content introduces
• The redundancy problem
  • An excess of different semantic descriptions, coming from different sources, that describe the same intended meaning
• Our proposal
  • A method to cluster the ontology terms found on the Semantic Web according to the meaning they are intended to represent
Introduction
• Redundancy problem: many representations of the same meanings
[Figure: the keyword "apple" looked up through Watson on the Semantic Web]
Introduction
• Proposed solution: pool of cross-ontology integrated senses
[Figure: "apple" terms from Watson clustered into three senses: The Tree, The Fruit, The Company]
Introduction
[Figure: systems and tasks around Watson and the Semantic Web. Labels: Question Answering, Scarlet, Ontology Matching, Folksonomy Enrichment, Semantic Browsing, QueryGen, Semantic Query Generation, Multiontology Semantic Disambiguator, Ontology Evolution]
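Method
• Off-line extraction: ontology terms are retrieved from Watson and grouped by synonym expansion (keyword maps, then synonym maps)
• Off-line sense clustering, for each synonym map: similarity between ontology terms is computed with CIDER; if similarity > threshold, the terms are integrated; this repeats while more ontology terms remain, producing senses
• Run-time: if the integration degree must be modified, a rise of the threshold triggers disintegration of senses; otherwise further integration is applied
[Flowchart of the method not reproduced]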
Method
• Keyword maps: ontology terms with identical labels
[Figure: "apple" terms retrieved from Watson grouped into one keyword map]
Method
• Synonym maps: ontology terms with synonym labels
[Figure: terms labelled "apple", "manzana", "Apple Inc." and "apple tree", retrieved from Watson, grouped into one synonym map]
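The two grouping steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the term URIs, the synonym list and the helper names (`build_keyword_maps`, `build_synonym_map`) are hypothetical, labels are matched case-insensitively (the slides only say "identical label"), and no real Watson API is called.

```python
from collections import defaultdict

# Hypothetical ontology terms as (term URI, label) pairs. Not real Watson output.
TERMS = [
    ("ont1#apple", "apple"),
    ("ont2#Apple", "Apple"),
    ("ont3#manzana", "manzana"),
    ("ont4#AppleInc", "Apple Inc."),
    ("ont5#apple_tree", "apple tree"),
    ("ont6#pear", "pear"),
]

# Hypothetical synonym list for the keyword "apple" (assumed to come from
# some lexical resource; the slides do not name one).
SYNONYMS = {"apple": ["manzana", "apple inc.", "apple tree"]}


def normalise(label):
    """Assumption: labels are compared after trimming and lower-casing."""
    return label.strip().lower()


def build_keyword_maps(terms):
    """Keyword maps: group term URIs that share the same normalised label."""
    maps = defaultdict(list)
    for uri, label in terms:
        maps[normalise(label)].append(uri)
    return dict(maps)


def build_synonym_map(keyword, keyword_maps, synonyms):
    """Synonym map: union of the keyword maps of a keyword and its synonyms."""
    labels = [normalise(keyword)]
    labels += [normalise(s) for s in synonyms.get(keyword, [])]
    merged = []
    for label in labels:
        merged.extend(keyword_maps.get(label, []))
    return merged


keyword_maps = build_keyword_maps(TERMS)
apple_synonym_map = build_synonym_map("apple", keyword_maps, SYNONYMS)
print(apple_synonym_map)
```

The resulting synonym map is the input that the sense clustering step would then split into senses.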
Method
• Agglomerative clustering
[Figure: terms a, b, c, d, e are compared using CIDER; a is merged stepwise into a’ and a’’]
Method
• Sense maps: semantically equivalent terms grouped
[Figure: CIDER groups the terms of the "apple" synonym map into three senses: The Tree, The Fruit, The Company]
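A minimal sketch of threshold-based agglomerative clustering follows, to make the idea concrete. It is not the authors' implementation: CIDER is replaced by a plain Jaccard similarity over hypothetical descriptive words, the merge order (most similar pair first) is an assumption, and the sample terms are invented.

```python
from itertools import combinations


def jaccard(a, b):
    """Stand-in similarity in [0, 1]. The slides use CIDER instead."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_senses(terms, threshold, similarity=jaccard):
    """Merge the most similar pair of senses until no pair exceeds the threshold.

    `terms` maps a term id to a set of descriptive words (hypothetical data).
    Returns a sorted list of senses, each a sorted list of term ids.
    """
    # Every sense starts as a single term: (member ids, merged description).
    senses = [([tid], set(desc)) for tid, desc in terms.items()]
    while len(senses) > 1:
        best_pair, best_sim = None, threshold
        for i, j in combinations(range(len(senses)), 2):
            sim = similarity(senses[i][1], senses[j][1])
            if sim > best_sim:
                best_pair, best_sim = (i, j), sim
        if best_pair is None:
            break  # no pair is similar enough: clustering is finished
        i, j = best_pair
        merged = (senses[i][0] + senses[j][0], senses[i][1] | senses[j][1])
        senses = [s for k, s in enumerate(senses) if k not in (i, j)]
        senses.append(merged)
    return sorted(sorted(members) for members, _ in senses)


# Invented descriptions for five "apple" terms.
TERMS = {
    "ont1#apple": {"fruit", "food", "sweet"},
    "ont2#apple": {"fruit", "food", "edible"},
    "ont3#apple_tree": {"tree", "plant", "orchard"},
    "ont4#apple": {"tree", "plant", "wood"},
    "ont5#AppleInc": {"company", "computer"},
}

print(cluster_senses(TERMS, threshold=0.3))
print(cluster_senses(TERMS, threshold=0.6))
```

With this toy data the lower threshold yields three senses and the higher one leaves all five terms separate, which mirrors the integration and disintegration behaviour described in the slides. Note that the sketch simply reclusters from scratch for each threshold, whereas the method modifies an existing integration at run-time.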
Method
[Figure: falling threshold (integration), rising threshold (disintegration), optimal threshold]
Optimization study
• Integration level varies with similarity threshold
Integration Level = 1 - (# finalSenses / # initialOntologyTerms)
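As a worked example of the formula, the short sketch below applies it to the figures reported for the "turkey" keyword in the Experiments slides (58 ontology terms, 9 integrated senses). The function name `integration_level` is only illustrative.

```python
def integration_level(initial_terms, final_senses):
    """Integration Level = 1 - (# finalSenses / # initialOntologyTerms)."""
    if initial_terms <= 0:
        raise ValueError("initial_terms must be positive")
    return 1 - final_senses / initial_terms


# "turkey" example from the Experiments slides: 58 terms, 9 senses.
print(round(integration_level(58, 9), 3))  # 0.845
```

A value of 0 means no terms were merged (as many senses as terms), and values close to 1 mean almost all terms collapsed into very few senses.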
Optimization study
• Which similarity threshold is the best one?
• Three ways of exploring this:
  • Experimenting with ontology matching benchmarks
    • Obtained a lower bound of 0.13 for the optimal threshold
  • Contrasting with human opinion
    • Range of good values between 0.2 and 0.3
  • Optimizing response time, because:
    • It reduces the response time of the overall system
    • It is compatible with the other two approaches
    • It is not always feasible to have enough people to ask, or enough reference alignments
Optimization study
• Response time varies with threshold
• Optimal value around 0.22
Experiments
• Scalability study
  • 9156 keywords, 73169 different ontology terms to be clustered
  • Processing time is linear in the number of ontology terms
Experiments
• Scalability study
  • Processing time is independent of ontology size
Experiments
• Illustrative example
  • Keyword = turkey
  • Synonym map = turkey, Türkei, Türkiye
  • No. of ontology terms = 58
  • No. of integrated senses = 9 (threshold = 0.27)
Experiments
• More examples (threshold = 0.19)
Experiments
• Positive findings
  • Terms from different versions of the same ontology are easily detected
  • Very different meanings are not wrongly integrated (e.g., “plant” as “living organism” with “plant” as “industrial buildings”)
• Negative findings
  • Hard to fully integrate all terms that share a meaning, because their semantic descriptions can be very different
Conclusions
• Conclusions
  • Redundancy of semantic descriptions on the Web can be significantly reduced
  • Our integration technique scales when used on a large body of knowledge
  • The proposed method is flexible enough to adapt the integration level to the needs of client applications
• Future work
  • More advanced prototype
  • More extensive human-based evaluation
  • Study and evaluation of the impact on other systems
End of presentation. Thank you!