Overview of Research - Computational Terminology - Knowledge extraction from Text


Presentation Transcript


  1. Overview of Research
     - Computational Terminology
     - Knowledge extraction from Text
     - Study of causal relation
     - Corpus building
     - Uncertainty
     - Computer Assisted Language Learning (CALL)
     - Interdisciplinary project on French Second Language
     - Text understanding
     - From speech to sentence
     CLiNG - May 24 2002

  2. SeRT - a tool for knowledge extraction from text
     Caroline Barrière
     School of Information Technology and Engineering
     University of Ottawa
     Ottawa, Ontario, Canada
     barriere@site.uottawa.ca

  3. A few questions...
     - Why knowledge extraction from text? For building a Knowledge Base...
     - What’s a Knowledge Base? It depends on who defines it...
     - From a terminological standpoint: a static repository of domain-specific knowledge, giving the important concepts and their relations.
     - What kind of relations? Hyperonymy (is-a), meronymy (part-of), synonymy, function, definition, causality
     - Why start from text? What are the alternatives?

  4. Semantic Relations in Text (SeRT)
     - Goal: starting from a corpus of texts on a specific domain, capture and store the important concepts (terms) of that domain, as well as their relations.
     - Hypotheses
       - definitions can be derived from text analysis
       - text is used as language and meta-language
       - paradigmatic relations can be found in texts by pattern search
       - present knowledge representation formalisms allow the representation of this information

  5. Example of a pattern search for hyperonymy (Corpus on Composting)

  6. SeRT - Features
     - parallel search of terms and relations
       - term extraction
       - search for surface patterns leading to semantic relations
     - focus on user interaction (nothing fully automatic)
       - term selection and validation
       - user definition of surface patterns corresponding to semantic relations
       - user selection of concepts involved (tuple) in the semantic relation
     - raw text used (no preprocessing necessary)
     - easy access to KB: save and retrieval
     - to be used in “bootstrapping” mode

  7. Term extraction
     - Usage of a stop list: a, able, about, above, according, accordingly, across, actually …
     - appropriate method for English (but maybe not for French)
       - satellite link - liaison par satellite
       - laser printer - imprimante au laser
       - communication network - réseau de communication
     - no syntactic analysis
     - different from:
       - Daille 1994: linguistic patterns (French)
       - Bourigault 1994: morpho-syntactic markers (French)
     - lemmatization: 'moving quickly' → 'mov[ing] quick[ly]' → 'mov* quick*'
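
The stop-list approach on this slide can be sketched roughly as follows. This is a minimal illustration, not SeRT's actual code: the function names `candidate_terms` and `truncate`, the `max_len` limit, the extra stop words beyond the eight listed on the slide, and the suffix list are all assumptions made for the example.

```python
import re
from collections import Counter

# First eight entries come from the slide; the rest are assumed here so the
# example works. The slide's full list is elided ("…") and is not reproduced.
STOP_WORDS = {"a", "able", "about", "above", "according", "accordingly",
              "across", "actually", "the", "of", "is", "to", "and", "in"}

# Hypothetical suffix list for the crude truncation shown on the slide.
SUFFIXES = ("ing", "ly", "es", "s")


def candidate_terms(text, stop_words=STOP_WORDS, max_len=3):
    """Count word sequences that contain no stop word and cross no punctuation."""
    counts = Counter()
    # Split on punctuation first so a term never spans a clause boundary.
    for chunk in re.split(r"[^\w\s'-]+", text):
        run = []
        for word in chunk.split() + [None]:  # None flushes the final run
            if word is not None and word.lower() not in stop_words:
                run.append(word)
                continue
            # Emit every contiguous n-gram (n <= max_len) inside the run.
            for i in range(len(run)):
                for j in range(i + 1, min(i + max_len, len(run)) + 1):
                    counts[" ".join(run[i:j])] += 1
            run = []
    return counts


def truncate(word, suffixes=SUFFIXES, min_stem=3):
    """Strip one known suffix and mark the cut with '*', e.g. 'moving' -> 'mov*'."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)] + "*"
    return word


if __name__ == "__main__":
    sample = "The compost pile is a pile of leaves. Turn the compost pile."
    for term, freq in candidate_terms(sample).most_common():
        print(freq, term)
    print(" ".join(truncate(w) for w in "moving quickly".split()))
```

No syntactic analysis is involved: a candidate term is simply any run of non-stop words, which suits English noun compounds such as "laser printer" but would split French equivalents like "imprimante au laser" at the preposition.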

  8. Results - Corpus on “composting” - Terms
     - 503 compost, 373 pile, 258 composting, 202 soil, 170 materials, 155 material, 142 nitrogen, 110 compost pile, 103 water, 102 bin, 100 time, 92 leaves, 83 bacteria
     - 402 compost, 369 pile, 199 soil, 187 composting, 149 material, 146 materials, 133 nitrogen, 105 compost pile, 102 bin, 96 time, 95 water, 94 Compost, 85 leaves
     - 402 compost, 369 pile, 295 materi*, 260 compost*, 199 soil, 133 nitrogen, 105 compost pile, 105 temperatur*, 102 bin, 96 time, 95 leav*, 95 water, 94 Compost

  10. Search for patterns indicating semantic relations
      - pre-encoded patterns (earlier work - Barrière 1997)
      - find list from all other authors
      - pattern search has multiple possibilities:
        - string matching
        - lemmatized token matching
        - part of speech matching
        - inclusion of a dictionary look-up (derived from Collins + morphological rules added)
      - possibility of searching for a pattern around 1 term
        - usually what Computational Terminologists want to do
      - display limited or enlarged context

  11. Example of search patterns
      Hyperonymy
      - such as (string matching)
      - and other *|n (string + POS)
      - includ* *|n (lemmatized string + POS)
      - *|n is a *|a of [~part] (negative filter)
      - *|y organic materi* [mostly, especially, specifically] (positive filter) + (search with specific term)
      Synonymy
      - known as (string matching)
      - also called (string matching)
      Meronymy
      - contains *|n (string + POS)
      - is a *|a part of (string + POS)
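
A rough sketch of how the string and truncated-string patterns above could be searched in raw text. It covers only the simplest pattern types: the part-of-speech slots (`*|n`, `*|a`) and the positive and negative filters would need a tagger or dictionary look-up and are left out. The names `PATTERNS`, `compile_pattern`, and `search`, and the `window` parameter, are assumptions made for this example, not part of SeRT.

```python
import re

# Surface patterns taken from the slide, with the POS slots dropped.
PATTERNS = {
    "hyperonymy": ["such as", "and other", "includ*"],
    "synonymy": ["known as", "also called"],
    "meronymy": ["contains"],
}


def compile_pattern(pattern):
    """Turn a surface pattern into a regex; '*' stands for the rest of a word."""
    parts = [re.escape(tok).replace(r"\*", r"\w*") for tok in pattern.split()]
    return re.compile(r"\b" + r"\s+".join(parts) + r"\b", re.IGNORECASE)


def search(text, patterns=PATTERNS, window=30):
    """Return (relation, pattern, left context, match, right context) tuples."""
    hits = []
    for relation, surface_forms in patterns.items():
        for surface in surface_forms:
            for m in compile_pattern(surface).finditer(text):
                left = text[max(0, m.start() - window): m.start()].strip()
                right = text[m.end(): m.end() + window].strip()
                hits.append((relation, surface, left, m.group(0), right))
    return hits


if __name__ == "__main__":
    sample = ("Organic materials such as leaves and grass decompose quickly. "
              "Compost, also known as humus, includes bacteria.")
    for hit in search(sample):
        print(hit)
```

Raising or lowering `window` corresponds to the limited or enlarged context display; the user would still pick the terms of the tuple from the left and right contexts by hand.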

  16. Information storage in the TKB
      - transfer of info found at previous step
      - user selects the terms (concepts) around the pattern
      - semantic relation / pattern / tuple are stored in the TKB
      - an uncertainty factor can also be added to the tuple
        - research on the causal relation has shown the necessity of this information
        - applies to different relations
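
As an illustration of what a stored entry might look like, here is a minimal sketch that keeps relation / pattern / tuple rows, plus an optional uncertainty factor, in an SQLite table. The schema, the function names (`open_tkb`, `store`, `lookup`), and the choice to record the uncertainty factor as free text are all assumptions; the slides only indicate that the TKB is a relational database.

```python
import sqlite3


def open_tkb(path=":memory:"):
    """Open (or create) a hypothetical TKB with a single relation table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS relation (
               relation    TEXT NOT NULL,  -- e.g. hyperonymy, meronymy
               pattern     TEXT NOT NULL,  -- surface pattern that was matched
               term1       TEXT NOT NULL,  -- first concept of the tuple
               term2       TEXT NOT NULL,  -- second concept of the tuple
               uncertainty TEXT            -- optional uncertainty factor
           )"""
    )
    return conn


def store(conn, relation, pattern, terms, uncertainty=None):
    """Save one user-validated tuple together with the pattern that found it."""
    conn.execute(
        "INSERT INTO relation VALUES (?, ?, ?, ?, ?)",
        (relation, pattern, terms[0], terms[1], uncertainty),
    )
    conn.commit()


def lookup(conn, term):
    """Retrieve every stored relation in which the term takes part."""
    cur = conn.execute(
        "SELECT relation, pattern, term1, term2, uncertainty "
        "FROM relation WHERE term1 = ? OR term2 = ?",
        (term, term),
    )
    return cur.fetchall()


if __name__ == "__main__":
    tkb = open_tkb()
    store(tkb, "hyperonymy", "such as", ("organic material", "leaves"), "mostly")
    print(lookup(tkb, "leaves"))
```

Keeping the pattern next to the tuple makes it possible to trace each relation back to the evidence that produced it.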

  17. Semantic relation extraction

  18. Results - semantic relations
      - Exploration of a few patterns
        - contain? (meronymy)
        - such as & and other (hyperonymy)

  21. Could we infer is-a relations and extend the type hierarchy?

  22. SeRT use
      - Parallel mode
        - searching on patterns can suggest terms to be explored
        - searching on terms can suggest patterns around them
      - Bootstrapping mode for relations
        - start with one pattern: enhance
        - a tuple that has been found (e.g. compost/soil) is used to find other patterns
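
The bootstrapping step can be pictured with a short sketch: given a tuple already validated (compost/soil), collect whatever words appear between the two terms in the same sentence and count them as candidate patterns. The function name `candidate_patterns`, the `max_gap` limit, and the naive sentence splitting are assumptions for illustration only.

```python
import re
from collections import Counter


def candidate_patterns(text, term1, term2, max_gap=5):
    """Count the word sequences found between term1 and term2 in a sentence."""
    # Lazily capture 1..max_gap whitespace-separated tokens between the terms.
    between = r"((?:\S+\s+){1,%d}?)" % max_gap
    gap = re.compile(
        r"\b" + re.escape(term1) + r"\b\s+" + between + re.escape(term2) + r"\b",
        re.IGNORECASE,
    )
    counts = Counter()
    # Naive sentence split on end punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for m in gap.finditer(sentence):
            counts[m.group(1).strip().lower()] += 1
    return counts


if __name__ == "__main__":
    sample = ("Compost improves soil. Adding compost is good for soil structure. "
              "Compost improves soil texture.")
    for pattern, freq in candidate_patterns(sample, "compost", "soil").most_common():
        print(freq, pattern)
```

Each recurring candidate can then be proposed to the user as a new surface pattern, searched over the corpus, and used to find further tuples, which closes the loop.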

  25. Future work - Short term (tool itself)
      - Add list of predefined relations & patterns
      - Add flexibility in pattern search
        - toward a mix of semantic and syntactic search
      - Construction of a graphical representation of the semantic network built

  26. Future work - Long term (tool + theoretical background)
      - Work on compound nouns
        - much implicit information that could be put explicitly in the KB
      - Work on representational scheme
        - the relational database is too limiting
        - causal relation requires a different type of representation
        - contexts for expressing the relation (possibly nested)
        - uncertainty factors
        - inferencing
      - Explore pattern search in French
      - Batch mode extraction (no user)
        - automatic selection of terms around patterns
        - after certain terms and patterns have been identified
        - need an integration of confidence levels on patterns
