The Open Lexicon Interchange Format (OLIF) is a simple, user-friendly format for exchanging lexical and terminological data between natural language processing (NLP) tools, particularly machine translation (MT) systems. Version 2 is being developed by the OLIF2 Consortium, which was initiated by SAP in March 2000 and includes members such as IBM and Xerox. OLIF2 revises the original format for XML compliance, improved language coverage, and more comprehensive linguistic analysis, and its lexical entries support semantic readings, transfer links, and cross-reference links.
Developing OLIF, Version 2 Susan M. McCormick Christian Lieske OLIF2 Consortium SAP/Walldorf, Germany
The Original OLIF The Open Lexicon Interchange Format • Developed as part of OTELO and Aventinus projects: • attempt to define common formats and interfaces for different NLP tools, especially MT systems • Aim of OLIF format: • simple, user-friendly vehicle for interfacing with multiple electronic lexical and terminological resources
OLIF Lexicon/Terminology Handling • Grammatical description: • relatively complex to meet needs of MT systems • linguistic analysis must represent common base • Terminology coverage: • adequate to handle basic term exchange • no duplication of well-established term exchange formats, e.g., MARTIF • Purpose: generate a basic, usable NLP-system entry from an OLIF record
OLIF2 Consortium www.olif.net Initiated by SAP in March 2000 • Members: Xerox, Lotus, SAP, Microsoft, Trados, IBM, Logos, Sail Labs, the EC, L10NBRIDGE • Build and improve on OLIF by revising for • XML-compliance • improved language coverage • more comprehensive linguistic analysis
Concertation with SALT • Integrate exchange standards generated by OLIF2 and SALT initiatives (Diagram labels: XLT; Terminology: SALT; Lexicon: OLIF2)
Structure and Content of OLIF2 • Maintains straightforward structure of OLIF: • minimal nesting • features informally grouped based on character of information being represented, e.g., semantic, syntactic, administrative • Supports representation of vital system data, rather than an exhaustive store of features • implies implementation of defaulting strategies on part of vendors using OLIF2
Body of the OLIF2 File • Monolingual entries identified uniquely by: • language • part of speech • canonical form • subject field • semantic reading • Entries may include: • unidirectional, bilingual transfer links • monolingual cross-reference links
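The five identifying features listed above behave like a composite key. A minimal Python sketch of that idea follows; the class name `EntryKey`, its field names, and the `lexicon` dictionary are hypothetical and not part of OLIF2, and treating the semantic reading as optional is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EntryKey:
    """Hypothetical composite key built from the five identifying features."""
    language: str
    part_of_speech: str
    canonical_form: str
    subject_field: str
    semantic_reading: Optional[str] = None  # assumption: may be absent


# A frozen dataclass is hashable, so it can index a lexicon directly.
lexicon = {}
key = EntryKey("DE", "noun", "Briefkurs", "gac-fi", "meas")
lexicon[key] = {"transfers": [], "cross_refs": []}

# Two keys with the same five values identify the same monolingual entry.
same = EntryKey("DE", "noun", "Briefkurs", "gac-fi", "meas")
print(same in lexicon)
```

Changing any one of the five values, for example the subject field, yields a different key and therefore a distinct entry.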
Sample OLIF2 Entry
<body>
  <entry>
    <mono id="1" lang="DE" ptOfSpeech="noun">
      <canForm>Briefkurs</canForm>
      <subjField>gac-fi</subjField>
      <semReading>meas</semReading>
    </mono>
    <transfer target="2" equival="full"> </transfer>
  </entry>
  <entry>
    <mono id="2" lang="EN" ptOfSpeech="noun">
      <canForm>bank selling rate</canForm>
      <subjField>gac-fi</subjField>
    </mono>
  </entry>
</body>
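As an illustration of how a consumer might read such a body, the following Python sketch parses the sample entry with the standard library and follows the transfer link from the German entry to its English equivalent. The helper names `read_entries` and `resolve_transfer` are hypothetical, and the sketch assumes that `target` on `<transfer>` refers to the `id` of another `<mono>` element, as the sample suggests.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<body>
  <entry>
    <mono id="1" lang="DE" ptOfSpeech="noun">
      <canForm>Briefkurs</canForm>
      <subjField>gac-fi</subjField>
      <semReading>meas</semReading>
    </mono>
    <transfer target="2" equival="full"> </transfer>
  </entry>
  <entry>
    <mono id="2" lang="EN" ptOfSpeech="noun">
      <canForm>bank selling rate</canForm>
      <subjField>gac-fi</subjField>
    </mono>
  </entry>
</body>"""


def read_entries(xml_text):
    """Index monolingual entries by their id attribute (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    entries = {}
    for entry in root.findall("entry"):
        mono = entry.find("mono")
        transfer = entry.find("transfer")
        entries[mono.get("id")] = {
            "lang": mono.get("lang"),
            "ptOfSpeech": mono.get("ptOfSpeech"),
            "canForm": mono.findtext("canForm"),
            "subjField": mono.findtext("subjField"),
            "semReading": mono.findtext("semReading"),  # None when absent
            "transferTarget": transfer.get("target") if transfer is not None else None,
        }
    return entries


def resolve_transfer(entries, source_id):
    """Return the canonical form of the transfer target, or None if unlinked."""
    target_id = entries[source_id]["transferTarget"]
    return entries[target_id]["canForm"] if target_id in entries else None


entries = read_entries(SAMPLE)
print(resolve_transfer(entries, "1"))
```

The link is followed in one direction only, matching the unidirectional transfer links described for the OLIF2 body.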
Improvements • Inflection class patterns for all languages • Expanded syntactic frame analysis • More detailed semantic type hierarchy • Cross-reference options augmented by ISO 12620 categories and EuroWordNet (July, 2000). • Improved syntax for transfer conditions and actions • User guidelines for formulating canonical forms
Transfer Conditions • Specifies context in source language for which transfer is valid
<transCond>
  <context>head</context>
  <transTest>
    <featTest type="case">d</featTest>
  </transTest>
</transCond>
Transfer is valid if the source word is in the dative case
Transfer Actions • Action performed in the target language based on the context specified for the source
<transCond>
  <context>head</context>
  <transTest>
    <featTest type="case">d</featTest>
  </transTest>
  <transAct>
    <addToHead type="prep">for</addToHead>
  </transAct>
</transCond>
If the source word is dative, the corresponding target word is the object of the preposition ‘for’
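To make the condition/action pairing concrete, here is a rough Python sketch of how a consuming system might evaluate it. The evaluation semantics are assumptions, not taken from the OLIF2 specification: a `featTest` is taken to pass when the source head carries that feature value, and `addToHead` with `type="prep"` is taken to prepend the preposition to the target phrase. The function names and the feature dictionary are hypothetical.

```python
import xml.etree.ElementTree as ET

TRANS_COND = """<transCond>
  <context>head</context>
  <transTest>
    <featTest type="case">d</featTest>
  </transTest>
  <transAct>
    <addToHead type="prep">for</addToHead>
  </transAct>
</transCond>"""


def condition_holds(cond, source_features):
    """Assumed semantics: every featTest must match the source head's features."""
    return all(
        source_features.get(test.get("type")) == (test.text or "").strip()
        for test in cond.iter("featTest")
    )


def apply_transfer(cond_xml, source_features, target_phrase):
    """Apply the transfer actions if the condition holds; otherwise leave the target alone."""
    cond = ET.fromstring(cond_xml)
    if not condition_holds(cond, source_features):
        return target_phrase
    result = target_phrase
    for action in cond.iter("addToHead"):
        if action.get("type") == "prep":
            # Assumed semantics: the target becomes the object of this preposition.
            result = f"{(action.text or '').strip()} {result}"
    return result


print(apply_transfer(TRANS_COND, {"case": "d"}, "the customer"))
print(apply_transfer(TRANS_COND, {"case": "a"}, "the customer"))
```

With a dative source head the preposition is added; with any other case the target phrase is returned unchanged.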
Plans for Completion of OLIF2 • Final specifications February 2001 • DTD February 2001 • Testing April 2001 • Harmonization with SALT April 2001 • Implementation = Import, Export facilities for vendors within consortium 2001