
MEANING: a Roadmap to Knowledge Technologies

Presentation Transcript


  1. MEANING: a Roadmap to Knowledge Technologies
     German Rigau. TALP Research Center. UPC. Barcelona. rigau@lsi.upc.es
     Bernardo Magnini. ITC-IRST. Povo-Trento. magnini@itc.it
     Eneko Agirre. IXA group. EHU. Donostia. eneko@si.ehu.es
     Piek Vossen. Irion Technologies. Delft. Piek.Vossen@irion.nl
     John Carroll. COGS. U. Sussex. Brighton. johnca@cogs.susx.ac.uk
     http://www.lsi.upc.es/~nlp/meaning/meaning.html

  2. Introduction
     - Knowledge technologies (semantic web): make sense of petabytes of information
     - Range of techniques to automate the knowledge lifecycle
       - Lexical KB (ontologies)
       - Text understanding (IE or other)
         - extract high-level meaning
         - represent and manage it in a KB
     - HLT to enable knowledge technologies

  3. Introduction
     - Building large and rich KBs by hand:
       - Expensive, e.g. CYC, WordNet (EuroWordNet)
       - Introspection fails to reflect reality in texts and domains
         - Is a “saint” an animate being? Not always: it can be an image.
       - Contradictions
     - Hamper applications of HLT and KT
       - Richer KBs (ontologies)
       - Domain knowledge
       - Contradictory subsets
     - Semi-automatic means

  4. Introduction
     - Crucial intermediate tasks
       - Word Sense Disambiguation: from words to concepts (word sense ≈ concept in KB)
       - Large-scale enrichment of the (multilingual) Lexical KB: enable semantic processing
     - Goal
       - Large-scale extraction of shallow meaning: relations among concepts

  5. Shallow semantics
     [Figure: the act "Invite" (s456) with three roles (source, object, destination) filled by the concepts s378, s412 and s933]
     Spanish: (Chirac) (invita) (al Dalai_Lama) (a un almuerzo oficial)
     English: (Chirac) (invites) (the Dalai_Lama) (to an official lunch)
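
The frame on slide 5 can be pictured as a small data structure. The following is only a sketch: it assumes that the role fillers pair up with the roles in the order in which the slide lists them (source = s378, object = s412, destination = s933), and the class name `Act` and its fields are hypothetical, not part of MEANING.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Act:
    """A shallow-semantics frame: one act concept plus role -> concept links."""
    label: str
    concept: str
    roles: Dict[str, str] = field(default_factory=dict)

    def relations(self) -> List[Tuple[str, str, str]]:
        """Return (act concept, role, filler concept) triples."""
        return [(self.concept, role, filler) for role, filler in self.roles.items()]


# Assumption: fillers are paired with roles in the order shown on the slide.
invite = Act(
    label="Invite",
    concept="s456",
    roles={"source": "s378", "object": "s412", "destination": "s933"},
)

for triple in invite.relations():
    print(triple)
```

Storing the relations as triples over concept identifiers, rather than over words, is what lets the Spanish and English sentences share a single representation.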

  6. Introduction
     - Crucial intermediate tasks
       - Word Sense Disambiguation
       - Large-scale enrichment of the (multilingual) Lexical KB
     - Problems (research goals):
       - Enriching LKBs, acquisition of linguistic knowledge: corpora need to be accurately tagged with concepts
       - Accurate WSD needs: hand-tagged data OR a richer LKB
       - Multilinguality: words in several languages linked to common concepts

  7. Outline
     - Major research goals
       - Knowledge acquisition into LKBs
       - WSD into LKB concepts
       - Multilingualism
     - Meaning roadmap
     - Overview of the project

  8. Knowledge acquisition into LKBs
     - Semi-automatic acquisition of linguistic knowledge from corpora is working
       - Subcategorization information
       - Selectional preferences
       - Thematic role assignments
       - Diathesis alternations
       - Domain information
       - Topic signatures
       - Rich lexico-semantic relations between words (dictionaries)
       - …
     - Large bodies of text with (fast) shallow processors

  9. Knowledge acquisition into LKBs
     - Knowledge for words is not enough:
       - Verb senses have different selectional preferences for e.g. the subject: "The car ate all the petrol" (WN)
       - Verb senses may have different subcat. frames
       - …
     - Better to key into word senses: source corpora should be tagged
       - Better reflect linguistic phenomena
       - Detect new senses
       - Clustering senses
       - Integrate easily into the multilingual LKB

  10. WSD into LKB concepts
      - Senseval-2 uses word senses (concepts) from WN 1.7
      - No large-scale broad-coverage WSD system is available
      - Accuracy around 60%-70% (V/A/N) when hand-tagged data is available
        - Use hand-tagged data to train ML systems
        - Ng's estimate: 16 person-years (short)
      - Promising research lines
        - Automatically create a training corpus using semantic relations in the LKB (WN)
        - Use untagged data to improve performance
        - Higher precision if more knowledgeable features are used (subcat, sel. preferences, domains)
        - Coarse grained: domain tagging / clusters of senses

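  11. Exploiting EWN Semantic Relations: WSD
      [Figure: two EuroWordNet hypernym chains for the Spanish word "partido"]
      partido_1 (match): <evento> > <evento social> > <competición, concurso> > <partido_1>; hyponyms: <semifinal>, <cuartos_de_final>
      partido_2 (political party): <agrupación grupo colectivo> > <grupo_social> > <organización> > <partido_2, partido_político>; hyponym: <partido_laborista>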

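  12. Exploiting EWN Semantic Relations
      partido 1 (match):
      - Pero España puso al partido intensidad, ritmo y coraje. (But Spain brought intensity, rhythm and courage to the match.)
      - El seleccionador cree que el partido de hoy contra Italia dará la medida de España. (The coach believes that today's match against Italy will show what Spain is capable of.)
      - El Racing no gana en su campo desde hace seis partidos. (Racing has not won at home for six matches.)
      partido 2 (political party):
      - Todos los partidos piden reformas legales para TV3. (All the parties are calling for legal reforms for TV3.)
      - La derecha planea agruparse en un partido. (The right plans to unite in a single party.)
      - El diputado reiteró que ni él ni UDC, “como partido”, han recibido dinero de Pellerols. (The deputy reiterated that neither he nor UDC, “as a party”, has received money from Pellerols.)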

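  13. Exploiting EWN Semantic Relations
      partido 1 (match):
      - Rivera pide el soporte de la afición para encarrilar las semifinales. (Rivera asks for the fans' support to get the semifinals on track.)
      - Sólo el equipo de Valero Ribera puede sentenciar una semifinal como lo hizo ayer en un Palau Blaugrana completamente entregado. (Only Valero Ribera's team can settle a semifinal the way it did yesterday in a fully committed Palau Blaugrana.)
      - El Racing ganó los cuartos de final en su campo. (Racing won the quarter-finals at home.)
      partido 2 (political party):
      - No negociaremos nunca con un partido político que sea partidario de la independencia de Taiwan. (We will never negotiate with a political party that supports the independence of Taiwan.)
      - Una vez más es noticia la desviación de fondos destinados a la formación ocupacional hacia la financiación de un partido político. (Once again, the diversion of funds earmarked for occupational training towards the financing of a political party is in the news.)
      - Estas leyes fueron votadas gracias a un consenso general de los partidos políticos. (These laws were passed thanks to a general consensus among the political parties.)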
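
The idea behind slides 11 to 13 is that sentences mentioning an unambiguous relative of a sense (for example a hyponym such as "semifinal") can serve as training examples for the ambiguous word itself. A minimal sketch of that harvesting step, assuming a hand-written relatives table copied from the hierarchy on slide 11 and a short sentence list standing in for a web corpus; `RELATIVES` and `harvest_examples` are hypothetical names, and the plain substring match is a simplification (it would miss the plural "partidos políticos", for instance).

```python
# Unambiguous relatives of each sense, taken from the hierarchy on slide 11.
RELATIVES = {
    "partido_1": ["semifinal", "cuartos de final"],
    "partido_2": ["partido político", "partido laborista"],
}


def harvest_examples(sentences, relatives=RELATIVES):
    """Label each sentence with the sense whose relative it mentions.

    Sentences that mention relatives of several senses, or of none,
    are skipped because they give no reliable evidence.
    """
    tagged = {sense: [] for sense in relatives}
    for sentence in sentences:
        text = sentence.lower()
        hits = [
            sense
            for sense, words in relatives.items()
            if any(word in text for word in words)
        ]
        if len(hits) == 1:
            tagged[hits[0]].append(sentence)
    return tagged


corpus = [
    "El Racing ganó los cuartos de final en su campo.",
    "Rivera pide el soporte de la afición para encarrilar las semifinales.",
    "No negociaremos nunca con un partido político que sea partidario "
    "de la independencia de Taiwan.",
    "La derecha planea agruparse en un partido.",
]

examples = harvest_examples(corpus)
for sense, found in examples.items():
    print(sense, len(found))
```

The last sentence contains only the ambiguous word "partido", so it is left untagged; it is exactly the kind of sentence a WSD system trained on the harvested examples would later have to resolve.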

  14. Multilingualism
      - Language diversity is a barrier
      - Language diversity is helpful: languages realize meaning in different ways
      - Use the EuroWN multilingual architecture: the Interlingual Index (ILI) links translation equivalents via interlingual concepts:
        head ---------- s984574 --------- cabeza -------- s984557 --------- jefe
      - Research on how linguistic knowledge behaves when ported to another language (e.g. subcat information)
      - Very important for resource-poor languages
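
The ILI example on slide 14 amounts to a two-way index between words and interlingual concepts. A minimal sketch, assuming only the four links that can be read off the chain on the slide (head and cabeza share s984574; cabeza and jefe share s984557); the names `LINKS`, `build_index` and `equivalents` are hypothetical, and the real EuroWordNet database is of course much richer.

```python
from collections import defaultdict

# (language, word, ILI concept) links read off the chain on the slide.
LINKS = [
    ("en", "head", "s984574"),
    ("es", "cabeza", "s984574"),
    ("es", "cabeza", "s984557"),
    ("es", "jefe", "s984557"),
]


def build_index(links):
    """Build word -> concepts and concept -> words lookup tables."""
    word_to_ili = defaultdict(set)
    ili_to_word = defaultdict(set)
    for lang, word, ili in links:
        word_to_ili[(lang, word)].add(ili)
        ili_to_word[ili].add((lang, word))
    return word_to_ili, ili_to_word


def equivalents(lang, word, target_lang, word_to_ili, ili_to_word):
    """Words in target_lang sharing at least one ILI concept with the given word."""
    found = set()
    for ili in word_to_ili.get((lang, word), ()):
        for other_lang, other_word in ili_to_word[ili]:
            if other_lang == target_lang and (other_lang, other_word) != (lang, word):
                found.add(other_word)
    return found


word_to_ili, ili_to_word = build_index(LINKS)
print(equivalents("en", "head", "es", word_to_ili, ili_to_word))
print(equivalents("es", "cabeza", "es", word_to_ili, ili_to_word))
```

Because the links go through concepts rather than directly between words, adding a new language only requires linking its words to the existing ILI concepts.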

  15. Multilingualism
      Selectional preference for the object of the first sense of know:
      sense 1: know, cognize -- (be cognizant or aware of a fact or a specific piece of information; possess knowledge or information about; …)
        0.1128  <communication>
        0.0615  <measure quantity amount quantum>
        0.0535  <attribute>
        0.0389  <object physical_object>
        0.0307  <cognition knowledge>
      In EuroWordNet (http://ixa.si.ehu.es):
      - antzeman_1, jakin_2 and ezagutu_1 in Basque
      - conocer_1 and saber_1 in Spanish
      - conèixer_1 and saber_1 in Catalan
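
Slide 15 suggests that preferences learned for an English sense can be attached to the senses linked to it in other languages. A minimal sketch of such porting, using the weights and sense names from the slide; copying the weights unchanged is an assumption made here for illustration (how knowledge behaves when ported is listed as a research question on slide 14), and `port_preferences` is a hypothetical helper.

```python
# Object preferences of know_1, as listed on the slide.
KNOW_1_OBJECT_PREFS = {
    "communication": 0.1128,
    "measure quantity amount quantum": 0.0615,
    "attribute": 0.0535,
    "object physical_object": 0.0389,
    "cognition knowledge": 0.0307,
}

# Senses linked to know_1 in EuroWordNet, per language, as listed on the slide.
EQUIVALENTS = {
    "eu": ["antzeman_1", "jakin_2", "ezagutu_1"],
    "es": ["conocer_1", "saber_1"],
    "ca": ["conèixer_1", "saber_1"],
}


def port_preferences(prefs, equivalents):
    """Copy one sense's preferences to every linked sense in each language.

    Assumption: weights are copied as they are, with no language-specific
    re-estimation.
    """
    return {
        (lang, sense): dict(prefs)
        for lang, senses in equivalents.items()
        for sense in senses
    }


ported = port_preferences(KNOW_1_OBJECT_PREFS, EQUIVALENTS)
saber_prefs = ported[("es", "saber_1")]
strongest = max(saber_prefs, key=saber_prefs.get)
print(len(ported), strongest)
```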

  16. MEANING roadmap
      - Solutions have been tried with relative success in isolation
      - Combination for significant advances (which?)
      - Web as corpus: BNC (100 Mw) is small for many phenomena
      - Incremental design:
        - WSD using whatever knowledge is available at the time, for bootstrapping
        - Acquisition of linguistic knowledge using the WSD available at the time (may discard low-accuracy examples)
        - Integrating acquired knowledge in the Multilingual Central Repository and porting knowledge from one language to the other
      - Series of cycles: WSD0, WSD1, WSD2, ACQ0, ACQ1, ACQ2, PORT0, PORT1, PORT2
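
The incremental design on slide 16 can be read as a loop in which each cycle disambiguates with the knowledge available so far, acquires new knowledge from the confident examples only, and merges it into a shared repository. A toy sketch of that control flow: the interleaved order WSDn, ACQn, PORTn, the confidence threshold, and all function names are assumptions, and the three `toy_*` functions are stand-ins that only track which words are "known".

```python
def run_cycles(corpora, repository, wsd, acquire, port, cycles=3, min_confidence=0.8):
    """Run WSDn -> ACQn -> PORTn for n = 0 .. cycles - 1."""
    log = []
    for n in range(cycles):
        # WSD with whatever knowledge the repository holds at this point.
        tagged = {lang: wsd(texts, repository) for lang, texts in corpora.items()}
        log.append(f"WSD{n}")
        # Acquisition from confident examples only (low-accuracy ones are discarded).
        acquired = {
            lang: acquire([ex for ex in examples if ex["confidence"] >= min_confidence])
            for lang, examples in tagged.items()
        }
        log.append(f"ACQ{n}")
        # Merge what was acquired into the shared repository.
        repository = port(repository, acquired)
        log.append(f"PORT{n}")
    return repository, log


def toy_wsd(texts, repository):
    # Toy rule: an example is confident only if it contains a known word.
    return [
        {"text": t, "confidence": 0.9 if set(t.split()) & repository else 0.5}
        for t in texts
    ]


def toy_acquire(examples):
    return {word for ex in examples for word in ex["text"].split()}


def toy_port(repository, acquired):
    merged = set(repository)
    for words in acquired.values():
        merged |= words
    return merged


corpora = {"es": ["partido semifinal", "semifinal campo", "campo afición"]}
final_repository, log = run_cycles(corpora, {"partido"}, toy_wsd, toy_acquire, toy_port)
print(log)
print(sorted(final_repository))
```

In this toy run the repository grows by one word per cycle, which is the bootstrapping effect the roadmap relies on: each round of acquisition makes more examples usable in the next round of WSD.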

  17. Meaning Architecture
      [Figure: a central Multilingual Central Repository surrounded by a web corpus and an EWN for each language (English, Italian, Spanish, Basque, Catalan); the arrows between them are labelled WSD, ACQ, UPLOAD and PORT]

  18. Project overview
      - 3-year research project (started March 2002)
      - 1.610 M Euro
      - 2 contracted people per site
      - Consortium
        - TALP, UPC (German Rigau)
        - ITC-IRST (Bernardo Magnini)
        - IXA, UPV/EHU (Eneko Agirre)
        - University of Sussex (John Carroll)
        - Irion Technologies (Piek Vossen)

  19. Project results
      - A tool set that, using the semantic knowledge of EWN, will automatically obtain from the web large collections of examples for each particular word sense.
      - A tool set for enriching EWN using the knowledge acquired automatically from the web.
      - A tool set for accurately selecting the senses of the open-class words for the languages involved in the project.
      - A Multilingual Central Repository to maintain compatibility between WordNets of different languages and versions, past and new.
      - A semantically annotated corpus for each WordNet word sense, that is, a multilingual web corpus with semantically annotated corpora.
      - Demonstration: CLIR, Q/A system.
      - The results of MEANING will be public and free for research.

  20. Why now?
      - Huge amounts of data: throw out the non-reliable
      - Syntactic dependencies with high enough accuracy
      - Supervised WSD with high enough accuracy
        - Coarser grains, sense domain tagging
        - Bootstrapping
      - Success coping with multilingualism:
        - Porting linguistic knowledge from one language to another using MT / comparable corpora
        - CLIR as good as monolingual IR
