
Integration of Information Extraction with an Ontology



Presentation Transcript


  1. KMi: Integration of Information Extraction with an Ontology. Knowledge Media Institute. M. Vargas-Vera, J. Domingue, Y. Kalfoglou, E. Motta and S. Buckingham Shum

  2. Introduction • Ontology -> Information Extractor • English text (NLP) • Their IE system combines a group of tools: • the KMi ontology • from UMass: Marmot, Crystal and Badger • an OCML preprocessor

  3. Presentation Layout • Background on tool origins and area of work • Description of tool integration • Coping with ambiguity • Description of output • Population of Ontology • Future Work

  4. UMass (University of Massachusetts Amherst) • Marmot, Crystal, Badger • Classify text by recognizing extraction patterns and the semantic features associated with slots in predefined frames.

  5. Testing Area: KMi Planet • Web-based news server • Story Library • Collections of news stories and postings • Ontology Library • Ontologies stored for use in extracting information from the story library • Uses OCML • myPlanet • myPlanet uses cue phrases, defined as “research areas”, to query KMi Planet through the ontology library and the information extraction tools we’re about to talk about.
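The slide only says that myPlanet turns a reader's research areas into cue phrases and uses them to pick stories. A minimal sketch of that idea, assuming plain case-insensitive substring matching; the function `stories_for_interests` and the sample areas and stories are hypothetical, and the real system routes queries through the ontology library rather than raw text search:

```python
from typing import Dict, List


def stories_for_interests(
    stories: Dict[str, str],
    research_areas: Dict[str, List[str]],
    interests: List[str],
) -> List[str]:
    """Return ids of stories mentioning any cue phrase of the reader's research areas.

    Hypothetical helper: plain case-insensitive substring matching stands in
    for myPlanet's ontology-based querying.
    """
    cues = [cue.lower() for area in interests for cue in research_areas.get(area, [])]
    return [sid for sid, text in stories.items()
            if any(cue in text.lower() for cue in cues)]


# Hypothetical research areas and stories, for illustration only.
research_areas = {
    "knowledge-modelling": ["ontology", "ocml"],
    "information-extraction": ["information extraction", "crystal"],
}
stories = {
    "story-1": "A new OCML release is now available.",
    "story-2": "Crystal learns extraction rules from annotated news.",
    "story-3": "The KMi football team won on Friday.",
}
print(stories_for_interests(stories, research_areas, ["knowledge-modelling"]))  # ['story-1']
```

Matching on cue phrases keeps the reader profile small (a few areas) while still reaching stories that never name the area itself.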

  6. The Ontology Library • The ontology library can describe 40 different types of events or activities. • Example, event type 3: demonstration-of-technology (slot (type): example value)
      technology-being-demonstrated (technology): Info Extraction
      has-duration (duration): 30 min
      start-time (time-point): 3:30pm
      end-time (time-point): 4pm
      has-location (a place): room 120 TMCB BYU campus
      other-agents-involved (list of person(s)): Dr. Embley
      main-agent (list of person(s)): Brian Goodrich
      location-at-start (a place): room 120 TMCB BYU campus
      location-at-end (a place): room 120 TMCB BYU campus
      medium-used (equipment): multimedia projector, ppt
      subject-of-the-demo (title): Integration of Information Extraction with an Ontology
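To make the frame idea concrete, here is a rough Python mirror of the demonstration-of-technology event type, filled with the example values above. It is only a sketch: the class name, the string-only slot types and the `unfilled_slots` helper are assumptions for illustration, not the ontology library's actual OCML definition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DemonstrationOfTechnology:
    """Hypothetical Python mirror of the demonstration-of-technology event frame.

    Slot names follow the slide with hyphens turned into underscores; every
    slot type is simplified to a string or a list of strings.
    """
    technology_being_demonstrated: str
    has_duration: str
    start_time: str
    end_time: str
    has_location: str
    main_agent: List[str]
    other_agents_involved: List[str] = field(default_factory=list)
    location_at_start: str = ""
    location_at_end: str = ""
    medium_used: List[str] = field(default_factory=list)
    subject_of_the_demo: str = ""

    def unfilled_slots(self) -> List[str]:
        """Names of slots that extraction left empty, in declaration order."""
        return [name for name, value in vars(self).items() if value in ("", [])]


# Filling the frame from the slide's example, leaving some slots empty.
demo = DemonstrationOfTechnology(
    technology_being_demonstrated="Info Extraction",
    has_duration="30 min",
    start_time="3:30pm",
    end_time="4pm",
    has_location="room 120 TMCB BYU campus",
    main_agent=["Brian Goodrich"],
    other_agents_involved=["Dr. Embley"],
)
print(demo.unfilled_slots())
# ['location_at_start', 'location_at_end', 'medium_used', 'subject_of_the_demo']
```

Listing the empty slots is one simple way to see how much of a frame a single story actually filled.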

  7. Marmot • Natural language processor • Identifies noun, verb and prepositional phrases. Example input: “John Domingue Wed, 15 Oct 1997. David Brown, University for Industry visits the OU.” Output:
      <ex> 2 1
        SUBJ(1): DAVID BROWN %COMMA% UNIVERSITY
        PP (2): FOR INDUSTRY
        VB (3): VISITS
        OBJ1(4): THE OU
        PUNC(5): %PERIOD%
      </ex>
      <ex> 1 1
        SUBJ(1): JOHN DOMINGUE
        ADVP(2): @WED_%COMMA%_15_OCT_1997@
        PUNC(3): %PERIOD%
      </ex>
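Marmot's output is line-oriented, so a later stage can read it back with a small parser. The sketch below assumes the exact layout shown above (one `TAG(n): TEXT` line per phrase between `<ex>` and `</ex>`) and assumes that tokens such as `%COMMA%` and `%PERIOD%` stand for the punctuation they name; `parse_marmot_clause` is a hypothetical helper, not part of the UMass tools.

```python
import re
from typing import List, Tuple

# One phrase line of a Marmot clause, e.g. "SUBJ(1): DAVID BROWN %COMMA% UNIVERSITY".
PHRASE_LINE = re.compile(r"^\s*([A-Z]+\d*)\s*\((\d+)\):\s*(.+?)\s*$")

# Assumed meaning of Marmot's spelled-out punctuation tokens.
PUNCTUATION = {"%COMMA%": ",", "%PERIOD%": "."}


def parse_marmot_clause(block: str) -> List[Tuple[str, int, str]]:
    """Turn one <ex> ... </ex> clause into (tag, position, text) triples."""
    phrases = []
    for line in block.splitlines():
        match = PHRASE_LINE.match(line)
        if not match:
            continue  # skips the <ex> header and </ex> footer lines
        tag, position, text = match.groups()
        for token, char in PUNCTUATION.items():
            text = text.replace(token, char)
        phrases.append((tag, int(position), text))
    return phrases


CLAUSE = """<ex> 2 1
SUBJ(1): DAVID BROWN %COMMA% UNIVERSITY
PP (2): FOR INDUSTRY
VB (3): VISITS
OBJ1(4): THE OU
PUNC(5): %PERIOD%
</ex>"""

for phrase in parse_marmot_clause(CLAUSE):
    print(phrase)
```

The first triple printed is `('SUBJ', 1, 'DAVID BROWN , UNIVERSITY')`; keeping the phrase tags lets later stages match on syntactic roles (subject, verb, object) rather than raw words.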

  8. Crystal • Dictionary induction tool • Uses keywords to annotate text with semantic tags, e.g. • Visitor (<VI> David Brown <VI>) • Place (<PL> the OU <PL>) • Specific-to-general, data-driven search • Relaxes the constraints of its initial definitions until it finds the most specific definition that covers all instances of the word in the text • Retains results for future use • Tested on over 300 stories, with 100% precision and recall
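The specific-to-general search can be illustrated with a generic bottom-up covering procedure. This is a simplified stand-in, not Crystal's actual algorithm: it merges definitions pairwise in arbitrary order instead of picking the most similar one, and it tolerates no errors on negative examples. The feature strings (`verb=visit`, `subj=person`, ...) are hypothetical.

```python
from typing import FrozenSet, List, Set

Definition = FrozenSet[str]


def covers(definition: Definition, instance: Set[str]) -> bool:
    """A definition covers an instance when every one of its constraints holds."""
    return definition <= instance


def induce(positives: List[Set[str]], negatives: List[Set[str]]) -> List[Definition]:
    """Specific-to-general induction: start with one definition per positive
    example, then repeatedly relax pairs of definitions to their shared
    constraints, as long as the relaxed definition covers no negative example."""
    definitions = [frozenset(p) for p in positives]
    merged_any = True
    while merged_any:
        merged_any = False
        for i in range(len(definitions)):
            for j in range(i + 1, len(definitions)):
                relaxed = definitions[i] & definitions[j]
                if relaxed and not any(covers(relaxed, n) for n in negatives):
                    rest = [d for k, d in enumerate(definitions) if k not in (i, j)]
                    definitions = rest + [relaxed]
                    merged_any = True
                    break
            if merged_any:
                break
    return definitions


positives = [
    {"verb=visit", "subj=person", "obj=organization", "tense=present"},
    {"verb=visit", "subj=person", "obj=organization", "tense=past"},
]
negatives = [{"verb=visit", "subj=organization", "obj=organization", "tense=past"}]
print(sorted(induce(positives, negatives)[0]))
# ['obj=organization', 'subj=person', 'verb=visit']
```

Dropping the tense constraint is safe here because no negative example shares the remaining three constraints; add one that does and the two specific definitions are kept apart.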

  9. Badger (I am fairly certain whoever wrote this section did not speak English as a first language) • Matches sentences from the text against the concept nodes passed from Crystal • Selects the best match by the maximum number of features matching the concept node • Can remove irrelevant sentences from the problem set • http://rockape.qgl.org/crap/badger.swf
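Taking the slide's description at face value (score each concept node by how many features it shares with a sentence, keep the best, drop sentences that match nothing), the matching step can be sketched as follows. The feature strings and concept-node names are hypothetical, and the tie-breaking rule (first listed wins) is an assumption the slide does not address.

```python
from typing import Dict, Optional, Set


def best_concept_node(features: Set[str], concept_nodes: Dict[str, Set[str]]) -> Optional[str]:
    """Name of the concept node sharing the most features with a sentence.

    Returns None when no concept node shares any feature. Ties go to the
    concept node listed first.
    """
    best_name, best_score = None, 0
    for name, cn_features in concept_nodes.items():
        score = len(features & cn_features)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


def relevant_sentences(sentences: Dict[str, Set[str]],
                       concept_nodes: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map each sentence to its best concept node, dropping sentences with no match."""
    matches = {}
    for sentence, features in sentences.items():
        cn = best_concept_node(features, concept_nodes)
        if cn is not None:
            matches[sentence] = cn
    return matches


concept_nodes = {  # hypothetical concept nodes, as Crystal might pass them on
    "visiting-a-place": {"verb=visit", "subj=person", "obj=place"},
    "demonstration-of-technology": {"verb=demonstrate", "obj=technology"},
}
sentences = {
    "David Brown visits the OU.": {"verb=visit", "subj=person", "obj=place"},
    "The weather was fine.": {"verb=be", "subj=weather"},
}
print(relevant_sentences(sentences, concept_nodes))
# {'David Brown visits the OU.': 'visiting-a-place'}
```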

  10. Coping with Ambiguity • Query the list of institutions: no match in the returned list of institutions • Query the list of projects: match in the returned list of projects • There is no discussion of whether this was done automatically by the extractor or manually by the users.
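Read as an automatic procedure, the disambiguation step is an ordered lookup: try the institutions list first, then the projects list, and take the class of the first list that contains the name. A minimal sketch under that assumption (the slides do not say whether it was automated); `resolve_class` and the sample lists are hypothetical:

```python
from typing import Iterable, Optional, Sequence, Tuple


def resolve_class(name: str, lookups: Sequence[Tuple[str, Iterable[str]]]) -> Optional[str]:
    """Try each ontology list in order and return the class of the first that contains the name."""
    key = name.strip().lower()
    for class_name, instances in lookups:
        if key in {instance.lower() for instance in instances}:
            return class_name
    return None  # still ambiguous: needs a human, or a new template


# Hypothetical lists; the real ones would come from the KMi ontology library.
lookups = [
    ("institution", ["The Open University", "University for Industry"]),
    ("project", ["Example Project A"]),
]
print(resolve_class("Example Project A", lookups))  # project
```

The order of `lookups` encodes which interpretation is preferred when a name could belong to more than one list.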

  11. OCML Code Translator (OCML: Operational Conceptual Modeling Language) • Tokenises Badger output, finds the corresponding concept node (CN) definitions and extracts all the objects found in the story
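The translator's job, turning tokenised extraction output into ontology instances, can be sketched as below. Both ends are assumptions: Badger's real output format is not shown in the slides, so a hypothetical `slot: value` layout with a `CN:` line naming the concept is used, and the emitted form only approximates OCML's `def-instance` syntax.

```python
import re
from typing import Dict, List


def tokenise(badger_output: str) -> Dict[str, List[str]]:
    """Split hypothetical 'slot: value' lines into a slot -> values map."""
    slots: Dict[str, List[str]] = {}
    for line in badger_output.splitlines():
        if ":" not in line:
            continue
        slot, value = (part.strip() for part in line.split(":", 1))
        slots.setdefault(slot.lower(), []).append(value)
    return slots


def to_symbol(text: str) -> str:
    """Lower-case a phrase and hyphenate it so it can be used as an OCML symbol."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def to_ocml(instance_name: str, slots: Dict[str, List[str]]) -> str:
    """Render one extracted object as an OCML-style def-instance form.

    The 'cn' entry names the class; every other entry becomes a slot.
    """
    class_name = slots["cn"][0]
    pairs = " ".join(
        f"({slot} {to_symbol(value)})"
        for slot, values in slots.items() if slot != "cn"
        for value in values
    )
    return f"(def-instance {to_symbol(instance_name)} {class_name} ({pairs}))"


BADGER_OUTPUT = """CN: visiting-a-place
visitor: David Brown
place: the OU"""

print(to_ocml("visit 1", tokenise(BADGER_OUTPUT)))
# (def-instance visit-1 visiting-a-place ((visitor david-brown) (place the-ou)))
```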

  12. Ontology Maintenance • Uses Badger (lexicon) and Crystal (concept) output to update the ontology library automatically whenever a new story is added to the story library • Some updates cannot be made automatically, because: • there is not enough information in the story, or • there is no current template that matches the sentence concepts.
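The two failure cases on this slide suggest a simple decision per new story: populate the ontology when a template exists and its required slots are filled, otherwise queue the story with a reason. A sketch under those assumptions; `OntologyLibrary`, its fields and the notion of "required slots" are hypothetical simplifications of the KMi library:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class OntologyLibrary:
    """Hypothetical in-memory stand-in for the ontology library."""
    templates: Dict[str, Set[str]]  # concept -> slots that must be filled
    instances: List[Tuple[str, Dict[str, str]]] = field(default_factory=list)
    pending: List[Tuple[str, str]] = field(default_factory=list)  # (story id, reason)

    def add_story(self, story_id: str, concept: str, slots: Dict[str, str]) -> bool:
        """Try to populate the ontology from one new story's extraction results."""
        required = self.templates.get(concept)
        if required is None:
            self.pending.append((story_id, "no matching template"))
            return False
        missing = required - {name for name, value in slots.items() if value}
        if missing:
            self.pending.append((story_id, "missing: " + ", ".join(sorted(missing))))
            return False
        self.instances.append((concept, slots))
        return True


lib = OntologyLibrary(templates={"visiting-a-place": {"visitor", "place"}})
print(lib.add_story("story-1", "visiting-a-place", {"visitor": "David Brown", "place": "the OU"}))  # True
print(lib.add_story("story-2", "visiting-a-place", {"visitor": "David Brown"}))  # False
print(lib.add_story("story-3", "award", {"winner": "KMi"}))  # False
print(lib.pending)  # [('story-2', 'missing: place'), ('story-3', 'no matching template')]
```

Recording the reason alongside each skipped story separates the two cases: a missing template points to the future-work item on deriving new classes, while missing slots point to thin source text.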

  13. Conclusion • An IE system was created using Marmot, Crystal, Badger and the OCML translator. • It obtained good results on KMi stories. Assessment: • Sporadic periods of quality technical writing, interspersed with nearly impenetrable English • A borrowing of tools, translated to OCML and ported for KMi

  14. Future Work • Deriving the type of an object when it does not match a predefined template. • Automatic creation of new classes and subclasses. • Using this IE tool in other domains (need new training data?) • Trying out a new machine learning algorithm in Crystal and comparing performance. • Using the IE tool on hypertext. • Saving Badger’s output in XML. • Creating a more visual GUI for the ontologies.
