This presentation explores advancements in federated ontology search, focusing on merging, scoring, and performance optimization. With a plethora of open-domain ontologies such as Cyc and SUMO, the need for effective strategies to merge and select relevant ontologies becomes critical due to the challenges posed by autonomy and language mismatches. Our approach involves using middleware to query multiple ontologies while applying machine learning algorithms to improve the accuracy and completeness of results. We analyze outcomes derived from real-world ontologies, presenting methods to score and evaluate the quality of merged information.
Federated Ontology Search • Vasco Calais Pedro, Eric Nyberg and Jaime Carbonell • Presenter: Pushkar Acharya
Overview • Introduction • Ontological Search • Ontology description and selection • Merging • Scoring • Results • Related Work • Challenges and Future Work • Conclusion
Introduction • Large number of open-domain ontologies available • Cyc, SUMO, Omega, Thought Treasure, Swoogle, etc. • Offer easily accessible and open domain information • CHALLENGES ?? • Information merging and reuse • Different frameworks and languages
Introduction • SOLUTIONS?? • At the Ontology provider side – Absorb all knowledge into a single ontology beforehand • Establish Full mapping between concepts and relations • Absorb other ontologies
Introduction • At the Ontology provider side – Absorb all knowledge into a single ontology beforehand • Drawbacks – • Non-scalability • Losing autonomy of ontological knowledge • Language level mismatches • Ontology level mismatches • Updating mappings as Ontologies are updated
Introduction • At the Ontology provider side – Absorb all knowledge into a single ontology beforehand • At application developer side – Querying each ontology individually
Introduction • At the Ontology provider side – Absorb all knowledge into a single ontology beforehand • At application developer side – Querying each ontology individually • Middleware • Query multiple ontologies and merge results • Form ontological chains and inferences
Introduction • At the Ontology provider side – Absorb all knowledge into a single ontology beforehand • At application developer side – Querying each ontology individually • Middleware • Only for small fragments of ontologies • On demand basis • Take advantage of redundant and complementary knowledge to improve performance • Parallelize query execution
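The middleware idea above (query several ontologies in parallel, then merge what comes back) can be sketched roughly as follows. This is a minimal illustration, not the authors' system: `federated_query`, `ToyOntology`, and the in-memory data are all hypothetical names introduced here, and merging is reduced to a plain set union.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_query(ontologies, operator, *args):
    """Run one operator against every ontology in parallel.

    `ontologies` maps a name to an object that exposes the operator as a
    method (an assumed interface, not the paper's API).
    Returns a dict of name -> result.
    """
    def run(item):
        name, onto = item
        return name, getattr(onto, operator)(*args)

    # One worker per ontology; max_workers must be at least 1.
    with ThreadPoolExecutor(max_workers=len(ontologies) or 1) as pool:
        return dict(pool.map(run, ontologies.items()))


class ToyOntology:
    """Hypothetical stand-in for a wrapped ontology."""

    def __init__(self, parents):
        self._parents = parents

    def parents(self, concept):
        return set(self._parents.get(concept, ()))


sources = {
    "A": ToyOntology({"dog": ["canine"]}),
    "B": ToyOntology({"dog": ["pet", "canine"]}),
}
results = federated_query(sources, "parents", "dog")

# Redundant answers ("canine") and complementary ones ("pet") end up together.
merged = set().union(*results.values())
```

Threads are used here on the assumption that each query is dominated by waiting on a remote or slow resource.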
Ontological Search • This approach succeeds only if the “search” is separated from both the information need and the ontology. • Abstracts away the formal query representation required by each ontology • Describes 3 operators: • Rel(a, b, rels) • Parents(a) • Children(a) • By defining operators, we delegate their execution to the ontologies. • Ontologies remain free to use their extended features
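One way to picture the three operators is as an abstract interface that each ontology wrapper implements in its own way. The sketch below is an assumption about their shape: the slide gives only the names, so the semantics chosen here (Rel returns the matching relations between a and b, Parents and Children follow a hypothetical "is-a" relation) and the class names are illustrative.

```python
from abc import ABC, abstractmethod


class OntologyWrapper(ABC):
    """Hypothetical common interface; each ontology executes operators itself."""

    @abstractmethod
    def rel(self, a, b, rels):
        """Relations from `rels` that link concept a to concept b."""

    @abstractmethod
    def parents(self, a):
        """Direct parents of concept a."""

    @abstractmethod
    def children(self, a):
        """Direct children of concept a."""


class DictOntology(OntologyWrapper):
    """Toy ontology stored as (source, relation, target) triples."""

    def __init__(self, triples):
        self.triples = set(triples)

    def rel(self, a, b, rels):
        return {(s, r, t) for (s, r, t) in self.triples
                if s == a and t == b and r in rels}

    def parents(self, a):
        return {t for (s, r, t) in self.triples if s == a and r == "is-a"}

    def children(self, a):
        return {s for (s, r, t) in self.triples if t == a and r == "is-a"}


onto = DictOntology([
    ("dog", "is-a", "canine"),
    ("canine", "is-a", "mammal"),
    ("dog", "part-of", "pack"),
])
```

A caller only ever sees `rel`, `parents`, and `children`; how a given ontology answers them stays private to its wrapper.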
Ontological Search • Constraint – The output of the query execution should be in form of a Rooted Directed Acyclic Graph
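The rooted-DAG constraint can be checked mechanically. The sketch below assumes edges point away from the root and that a result is given as a node list plus an edge list; the function name and representation are illustrative choices, not taken from the paper.

```python
def is_rooted_dag(nodes, edges, root):
    """True if the graph is acyclic and every node is reachable from `root`.

    Assumes each edge (u, v) points away from the root.
    """
    adj = {n: [] for n in nodes}
    indeg = {n: 0 for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    if root not in adj:
        return False

    # Kahn's algorithm: the graph is acyclic iff every node can be removed.
    queue = [n for n in adj if indeg[n] == 0]
    removed = 0
    while queue:
        u = queue.pop()
        removed += 1
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if removed != len(adj):
        return False

    # Depth-first reachability from the root.
    seen = {root}
    stack = [root]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(adj)
```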
Ontological Search • Two sub-problems – • Ontology Description and Selection • Merging and Scoring
Ontology Description and Selection • Goal : Selection of subset of relevant ontologies • Can be modeled as P(O,q) • Difficult with constant updates to ontologies • Use of inference engines and logic mechanisms • Evaluate relative utility of different ontologies by comparing results generated for given input query. • Comparison against gold set of queries. • Time consuming process • Paper uses a parameter to model general accuracy for a given resource. • Use of machine learning algorithms like expectation maximization
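As a rough illustration of a per-resource accuracy parameter, the sketch below estimates accuracy as the fraction of gold-set queries an ontology answers correctly, then keeps the ontologies above a cutoff. This is a simple frequency estimate standing in for the learning step; it is not expectation maximization, and the function names, data layout, and threshold are assumptions.

```python
def estimate_accuracy(answers, gold):
    """Fraction of gold queries whose expected answer the ontology returned.

    `answers` and `gold` both map a query to an answer; unanswered
    queries count as wrong.
    """
    if not gold:
        return 0.0
    correct = sum(1 for q, expected in gold.items()
                  if answers.get(q) == expected)
    return correct / len(gold)


def select_ontologies(all_answers, gold, threshold=0.5):
    """Return (names above threshold, best first; all accuracy scores)."""
    scores = {name: estimate_accuracy(ans, gold)
              for name, ans in all_answers.items()}
    selected = sorted((n for n, s in scores.items() if s >= threshold),
                      key=lambda n: -scores[n])
    return selected, scores


# Hypothetical gold set and answers from two ontologies.
gold = {"q1": True, "q2": False, "q3": True, "q4": True}
all_answers = {
    "A": {"q1": True, "q2": False, "q3": False, "q4": True},
    "B": {"q1": False, "q2": True},
}
selected, scores = select_ontologies(all_answers, gold)
```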
Merging • Reduces problem of merging ambiguous concepts • Primary goal is to find complementary information in the results • Makes the result more complete • Involves inexact graph matching and maximum common sub-graph problems • When dealing with non-isomorphic graphs • NP-Complete problem
Merging • Isomorphic graphs
Merging • Graph similarity • Cost Based Distance • Use of edit operations • Feature Based Distance • Use a set of invariants established from the graph structural description • Maximum Common Subgraph • Maximum clique detection
Merging • Localized Confidence Boosting • Confidence is indicative of the reliability of the association. • Graphs are broken into tuples (cx, cy, r) and merged if the tuples are similar. Confidence is boosted when merging using Soft Or –
Merging • Tuple Similarity • Based on the linear combination of edge similarity and concept similarity • Uses Q-Gram distance for comparing concepts or relations
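A minimal sketch of tuple similarity as a linear combination follows. The q-gram distance is taken here as the sum of absolute differences between q-gram counts and is normalized into a similarity in [0, 1]. The normalization, the bigram size, and the weights 0.7 and 0.3 are assumptions made for illustration; the slide does not give them.

```python
from collections import Counter


def qgrams(s, q=2):
    """Multiset of overlapping length-q substrings of s (lower-cased)."""
    s = s.lower()
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_similarity(a, b, q=2):
    """1 minus the q-gram distance normalized by the total q-gram count."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    total = sum(ga.values()) + sum(gb.values())
    if total == 0:
        # Both strings are shorter than q: fall back to exact comparison.
        return 1.0 if a.lower() == b.lower() else 0.0
    dist = sum(abs(ga[g] - gb[g]) for g in set(ga) | set(gb))
    return 1.0 - dist / total


def tuple_similarity(t1, t2, w_concept=0.7, w_edge=0.3):
    """Linear combination of concept similarity and edge similarity."""
    (x1, y1, r1), (x2, y2, r2) = t1, t2
    concept = (qgram_similarity(x1, x2) + qgram_similarity(y1, y2)) / 2
    edge = qgram_similarity(r1, r2)
    return w_concept * concept + w_edge * edge
```

For example, `qgram_similarity("dogs", "dog")` compares the bigrams {do, og, gs} with {do, og}: one bigram differs out of five in total.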
Scoring • Score the outcome of each operator before computing the final score • Each operator focuses on either precision or recall • Precision operator: relation • Recall operator: similarity • Precision: fraction of retrieved results that are relevant • Recall: fraction of relevant instances that are retrieved
Scoring • Precision = (relevant instances in outcome) / (total outcome) • Recall = (relevant instances in outcome) / (relevant results)
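The two definitions above translate directly into a few lines of code. This sketch uses the standard set-based definitions; the function name and the choice to return 0.0 for an empty denominator are assumptions.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# 2 of 4 retrieved items are relevant; 2 of 3 relevant items were retrieved.
p, r = precision_recall({"a", "b", "c", "d"}, {"a", "b", "e"})
```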
Scoring • Precision scoring metric • Recall scoring metric
Results • Experimental Setup: Type Checking • Ontologies used: WordNet and ThoughtTreasure • 9558 pairs from Javelin question answering system in TREC QA • Gold standard, for a subset of full set of pairs, was created in order to test the accuracy
Results • [Figures: improved confidence after merging; recall; precision and recall]
Related Work • Different way to approach same problem • FCA-merge algorithm • IF-Map method • PROMPT system • SWOOGLE • DRAGO
Challenges and Future Work • The current approach is not robust when relations in different ontologies differ significantly • Possible remedy: compare the structures in which the two concepts occur to determine similarity • Describing constantly changing ontologies is difficult • Future Work: Model an ontology by using random queries to determine its domain
Conclusions • Approach discussed here presents several benefits over full merge • Helps mitigate the issue of dynamic ontologies • Establishes a parallel to federated search