Question Answering over Linked Data London Text Analytics Meetup, 5 August 2011 Danica Damljanović contact: danica.damljanovic@gmail.com
The goal Automatically answer natural language questions by machine, as if answered by a human.
MARY <is a> PERSON
UNIVERSITY OF SHEFFIELD <is an> ORGANISATION
MARY <works for> UNIVERSITY OF SHEFFIELD
SHEFFIELD <is a> CITY
UNIVERSITY OF SHEFFIELD <is located in> SHEFFIELD
UNITED KINGDOM <is a> COUNTRY
SHEFFIELD <is located in> UNITED KINGDOM
MARY <lives in> SHEFFIELD

SELECT ?country WHERE {
  ?person <lives in> ?city .
  ?city <located in> ?country .
  FILTER (?person = MARY)
}

Mary works for the University of Sheffield, which is located in Sheffield. Sheffield is located in the United Kingdom. Mary lives in Sheffield.
Motivation In which country does Mary live?
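To make the example concrete, here is a minimal runnable sketch (not code from the talk) that stores the triples shown above in an RDF graph with rdflib and answers "In which country does Mary live?" with a SPARQL query. The ex: namespace and the CamelCase resource names are made up for illustration.

from rdflib import Graph, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/")
g = Graph()
g.bind("ex", EX)

# Facts from the slide: Mary works for the University of Sheffield, which is
# located in Sheffield; Sheffield is in the UK; Mary lives in Sheffield.
g.add((EX.Mary, RDF.type, EX.Person))
g.add((EX.UniversityOfSheffield, RDF.type, EX.Organisation))
g.add((EX.Mary, EX.worksFor, EX.UniversityOfSheffield))
g.add((EX.Sheffield, RDF.type, EX.City))
g.add((EX.UniversityOfSheffield, EX.locatedIn, EX.Sheffield))
g.add((EX.UnitedKingdom, RDF.type, EX.Country))
g.add((EX.Sheffield, EX.locatedIn, EX.UnitedKingdom))
g.add((EX.Mary, EX.livesIn, EX.Sheffield))

# "In which country does Mary live?" expressed as SPARQL.
query = """
PREFIX ex: <http://example.org/>
SELECT ?country WHERE {
    ex:Mary ex:livesIn ?city .
    ?city ex:locatedIn ?country .
    ?country a ex:Country .
}
"""
for row in g.query(query):
    print(row.country)  # http://example.org/UnitedKingdom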
Building Question-Answering systems: Challenges Application developers: customise the system when porting it to work with different kinds of data, e.g. Where >> Location/City; Who >> Person, Organisation
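As an illustration of that porting cost, a hypothetical version of such a customisation might be nothing more than a small lookup table maintained by the application developer. The dictionary and helper function below are a sketch, not part of the talk; only the Where/Who mappings come from the slide.

# Hypothetical question-word -> candidate answer-type mapping, the kind of
# customisation done when porting the system to a new dataset.
QUESTION_WORD_TO_CLASSES = {
    "where": ["Location", "City"],
    "who": ["Person", "Organisation"],
}

def candidate_answer_types(question):
    """Return the ontology classes suggested by the question's first word."""
    first_word = question.strip().lower().split()[0]
    return QUESTION_WORD_TO_CLASSES.get(first_word, [])

print(candidate_answer_types("Where does Mary live?"))  # ['Location', 'City']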
Requirements • Flexibility of the supported language • No strict adherence to syntax • Habitable system: the user can easily formulate queries and avoid unsupported constructions without difficulty • Portable with minimal customisation
FREyA - Feedback, Refinement, Extended Vocabulary Aggregator • Feedback: showing the user the system’s interpretation of the query • Refinement: • resolving ambiguity: generating dialog whenever one term refers to more than one concept in the ontology (precision) • Extended Vocabulary: • expressiveness: generating dialog whenever an “unknown” term appears in the question (recall)
Feedback in FREyA • http://gate.ac.uk/freya ESWC 2010
Clarification dialogs • Generated by combining syntactic parsing with ontology-based lookup • the system learns from the user’s selections • “No ranking will be perfect” • Customisation through dialog with the user • application developers vs. end-users
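A minimal sketch of this idea, assuming a much-simplified model of the behaviour described above (this is not FREyA's actual code): an ambiguous term triggers a dialog, and the user's selection is remembered so that it ranks first the next time. The geo: property names in the example are hypothetical.

from collections import defaultdict

# Learned preferences: how often the user picked a concept for a given term.
selection_counts = defaultdict(int)

def rank_candidates(term, candidates):
    """Order candidate concepts by how often the user has chosen them before."""
    return sorted(candidates, key=lambda c: selection_counts[(term, c)], reverse=True)

def clarify(term, candidates, ask_user):
    """Resolve an ambiguous term: ask the user, record the choice, return it."""
    ranked = rank_candidates(term, candidates)
    if len(ranked) == 1:
        return ranked[0]
    choice = ask_user(term, ranked)          # in practice, a dialog in the UI
    selection_counts[(term, choice)] += 1    # learn from the user's selection
    return choice

# Example with hypothetical ontology properties for the ambiguous term "population".
pick_first = lambda term, cands: cands[0]
print(clarify("population", ["geo:statePopulation", "geo:cityPopulation"], pick_first))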
Learning [diagram: IF ... THEN ... rule learned from the user’s selections]
The User Controls the Output [diagram: Potential Ontology Concepts (POCs) from the question mapped to ontology concepts such as geo:loElevation, geo:isLowestPointOf, geo:LoPoint, geo:stateArea and geo:State]
Evaluation • Question-Answering over Linked Data Challenge • Two datasets of different kinds: MusicBrainz and DBpedia • 50 training questions per dataset, along with the correct answers • 50 test questions per dataset, without the answers
FREyA: results • Question-Answering over Linked Data challenge (ESWC’11, Crete) • FREyA was the only system that participated with both provided datasets, demonstrating portability • DBpedia: F-measure 0.58 (PowerAqua 0.5) • MusicBrainz: F-measure 0.71 (SWIP 0.66) • 94.4% precision and recall on the Mooney GeoQuery dataset (PANTO recall 88.05%, Querix precision 87.11%)
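For reference, the F-measure reported above is the harmonic mean of precision and recall. A one-line check (illustrative, using the reported 94.4% figure, where equal precision and recall give the same F-measure):

def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(0.944, 0.944), 3))  # 0.944: when precision equals recall, F equals both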
Conclusion • Combining syntactic parsing with ontology-based lookup in an interactive process of feedback and query refinement can increase the precision and recall of Question-Answering over Linked Data, • while reducing the time for customisation by shifting some tasks from application developers to end users.
Future Challenges • Output: • Correct answer OR • Identifying the flaws in the data? • Ranking/disambiguation algorithms to improve MRR
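MRR (Mean Reciprocal Rank), mentioned above, averages 1/rank of the first correct answer over all questions. A small illustrative computation with made-up candidate lists:

def mean_reciprocal_rank(ranked_answer_lists, correct_answers):
    """Average 1/rank of the first correct answer (0 when it is missing)."""
    total = 0.0
    for candidates, correct in zip(ranked_answer_lists, correct_answers):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate == correct:
                total += 1.0 / rank
                break
    return total / len(correct_answers)

# Correct answer ranked 1st, 2nd, and not returned at all: MRR = (1 + 0.5 + 0) / 3
print(mean_reciprocal_rank(
    [["United Kingdom", "France"], ["Paris", "London"], ["Berlin"]],
    ["United Kingdom", "London", "Madrid"],
))  # 0.5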
Thank you for your attention! Questions? Contact: danica.damljanovic@gmail.com
Demos • http://gate.ac.uk/sale/dd/ • United States geography • MusicBrainz • DBpedia