
Information

Diana Santos, Luís Costa, Luís Miguel Cabral. Oslo (1998), Odense (2000), Braga (2000), Lisboa XLDB (2004), Coimbra (2005), Porto (2002), Lisboa COMPARA (2002), Lisboa LabEL (2001). Linguateca, a distributed resource center for language technology for Portuguese.


Presentation Transcript


  1. Diana Santos, Luís Costa, Luís Miguel Cabral

     Oslo (1998), Odense (2000), Braga (2000), Lisboa XLDB (2004), Coimbra (2005), Porto (2002), Lisboa COMPARA (2002), Lisboa LabEL (2001)

     Linguateca, a distributed resource center for language technology for Portuguese
     www.linguateca.pt
     SINTEF ICT, Cooperative and Trusted Systems (until March)

     IRE model: Information, Resources and Evaluation

     Resources
     • Corpora (large bodies of text):
       • AC/DC: allows one to query syntactically annotated texts (up to 250 million words) online
       • COMPARA: the largest post-edited parallel corpus in the world, with Portuguese and English source texts and their translations
       • Floresta sintá(c)tica: treebank
       • CETEMPúblico, CETENFolha
     • IR collections
       • WPT03: the whole Portuguese Web
       • CHAVE: newspaper documents and topics
     • Tools
       • Question answering (Esfinge)
       • Named entity recognition (SIEMÊS)
       • Tokenizers, sentence separators
       • Morphological analysers (AnELL)
       • Spellcheckers (Jspell)
       • Word aligners (NATools)
     • Other resources
       • Corpógrafo (a full-fledged system for terminology and knowledge management)
       • GKB (Geographic Knowledge Base) and Geo-Net-PT01
       • REPENTINO: a NER gazetteer
       • BACO: database of collocations
     • Research tools or resources
       • Example-based machine translation
       • Ontology extraction from text
       • Ontology building from dictionaries
       • SUPeRB: extraction and quality checking of publication citations

     Evaluation
     • Organization of evaluation contests
       • Compare several systems around a shared task
       • Create evaluation resources
       • Create evaluation programs
       • Organize a workshop to discuss the results and the evaluation
     • Evaluation contests
       • Morfolimpíadas (morphological analysis out of context): 2003
       • CLEF for Portuguese (cross-language information retrieval, QA, geographic IR, WebIR, ImageIR): 2004, 2005, 2006
       • HAREM (named entity recognition): 2005, 2006
     • Other evaluation activities
       • MT from English into Portuguese: evaluating the performance of actual Web translation engines
       • Unobtrusive user evaluation of Web services
       • Component evaluation of Esfinge

     Information
     • We maintain a large web portal on the computational processing of the Portuguese language, with more than 2,000,000 visits so far.
     • We list resources, tools and services, as well as actors and publications, and we offer a repository in the area.
     • We also answer questions and help users with any related subject.
     • We make existing resources available and develop new ones, together with their full documentation.

     Figures: The architecture of Esfinge; The architecture of SUPeRB

     Diana.Santos@sintef.no, Luis.Costa@sintef.no, Luis.M.Cabral@sintef.no
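
The evaluation-contest idea listed in the transcript (several systems compared on a shared task, using common evaluation resources and an evaluation program) can be illustrated with a minimal Python sketch. This is not the scoring used by HAREM, CLEF or Morfolimpíadas: the exact-match precision/recall/F1 measure, the annotation format, the system names and all data below are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    precision: float
    recall: float
    f1: float


def score(gold: set, predicted: set) -> Score:
    """Exact-match precision, recall and F1 of predicted items against gold."""
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if correct else 0.0
    return Score(precision, recall, f1)


def rank_systems(gold: set, runs: dict) -> list:
    """Score every participating run against gold and sort by F1, best first."""
    results = {name: score(gold, predicted) for name, predicted in runs.items()}
    return sorted(results.items(), key=lambda item: item[1].f1, reverse=True)


# Hypothetical gold annotations: (start offset, end offset, category).
GOLD = {(0, 12, "PERSON"), (20, 24, "PLACE"), (30, 37, "ORGANIZATION")}

# Hypothetical outputs of two made-up participating systems.
RUNS = {
    "system_a": {(0, 12, "PERSON"), (20, 24, "PLACE")},
    "system_b": {
        (0, 12, "PERSON"),
        (20, 24, "ORGANIZATION"),  # right span, wrong category: counted as an error
        (30, 37, "ORGANIZATION"),
        (40, 45, "PLACE"),  # spurious entity
    },
}

if __name__ == "__main__":
    for name, s in rank_systems(GOLD, RUNS):
        print(f"{name}: P={s.precision:.2f} R={s.recall:.2f} F1={s.f1:.2f}")
```

In this sketch the gold set plays the role of the shared evaluation resource and `rank_systems` the role of the evaluation program; a real contest would typically use a more lenient or more detailed measure than exact match.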
