MLIF: A Metamodel to Represent and Exchange Multilingual Textual Information

LREC 2010, 19 May 2010. MLIF: A Metamodel to Represent and Exchange Multilingual Textual Information. ISO TC37 SC4 WG3 24616. Samuel Cruz-Lara, Gil Francopoulo, Laurent Romary, Nasredine Semmar. Outline: Scope of MLIF; Purpose and Justification of MLIF; Description of MLIF; Use Cases


Presentation Transcript


  1. LREC 2010, 19 May 2010 MLIF: A Metamodel to Represent and Exchange Multilingual Textual Information ISO TC37 SC4 WG3 24616 Samuel Cruz-Lara, Gil Francopoulo, Laurent Romary, Nasredine Semmar

  2. Outline • Scope of MLIF • Purpose and Justification of MLIF • Description of MLIF • Use Cases • Current Status

  3. Scope of MLIF

  4. Scope • MLIF aims to provide a specification platform for representing multilingual data across a wide variety of applications, such as translation memories, localization, computer-aided translation, multimedia, and electronic document management • MLIF introduces a metamodel in combination with chosen data categories in order to allow the description of any specific domain • MLIF provides a way to validate any instance of this metamodel, as well as interoperability principles with numerous translation and localization standards

  5. Outline • Scope of MLIF • Purpose and Justification of MLIF • Description of MLIF • Use Cases • Current Status

  6. Purpose and Justification • The evolution of information and communication technologies, and of natural language processing in particular, makes the question of standardization acute • The issues related to standardization are industrial, economic, and cultural in nature • Controlling interoperability between the existing industrial standards for localization (XLIFF), translation memory (TMX), … is a major objective for coherent, global management of multilingual data • MLIF could be associated with multimedia standards such as MPEG-4 [ISO/IEC 14496], MPEG-7 [ISO/IEC 15938], and W3C SMIL in order to handle multilingual data within multimedia applications such as interactive TV, video conferencing, subtitling, etc. • All these formats work well in the specific field they were designed for, but they lack the synergy that would make them interoperable when one type of information is used in a slightly different context • MLIF should be considered a unified conceptual representation of multilingual and multimedia content

  7. Outline • Scope of MLIF • Purpose and Justification of MLIF • Description of MLIF • Use Cases • Current Status

  8. Description of MLIF • As the Terminological Markup Framework (TMF) [ISO 16642] does for terminology, MLIF introduces a metamodel in combination with chosen data categories [ISO 12620] • These data categories will be derived as a subset of a Data Category Registry (DCR) in order to ensure interoperability between multilingual applications and corpora • A Data Category Specification (DCS) will define, in combination with the metamodel, the various constraints that apply to a given domain-specific information structure or interchange format • MLIF describes elementary linguistic segments (e.g. sentence, syntactic component, word, …)

  9. MLIF Metamodel • MLDC (Multi Lingual Data Collection) • GI (Global Information) • HistoC (History Component) • GroupC (Grouping Component) • MultiC (Multilingual Component) • MonoC (MonoLingual Component) • SegC (Segmentation Component)

  10. MLIF Metamodel • Multi Lingual Data Collection (MLDC) • Represents a collection of data containing global information and several multilingual units • Global Information (GI) • Represents technical and administrative information applying to the entire data collection. Example: title of the data collection, revision history, …

  11. MLIF Metamodel • History Component (HistoC) • This generic component makes it possible to trace modifications to the component it is anchored to (i.e. versioning) • Grouping Component (GroupC) • Represents a sub-collection of multilingual data having a common origin or purpose within a given project

  12. MLIF Metamodel • Multi Lingual Component (MultiC) • This component represents a unique multilingual entry • Mono Lingual Component (MonoC) • Part of a multilingual component containing information related to one language • Segmentation Component (SegC) • A recursive component allowing any level of segmentation for textual information • In order to provide a richer description of the linguistic content, the MLIF metamodel allows anchoring of other metamodels, such as MAF (morphological description), SynAF (syntactic annotation), TMF (terminological description), or any other metamodel based on ISO 12620
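
As a rough illustration of the components described on slides 10 to 12, the metamodel can be sketched as nested Python dataclasses. This is only a sketch under assumptions: the class and field names, the containment order (MLDC, then GroupC, then MultiC, then MonoC, then SegC) and the use of a plain dictionary for GI are illustrative choices, not the normative MLIF serialization, and HistoC is left out for brevity. The sample sentences are taken from the interactive TV slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SegC:
    """Segmentation Component: recursive, so a segment may hold sub-segments."""
    text: str
    children: List["SegC"] = field(default_factory=list)

    def depth(self) -> int:
        # A leaf segment has depth 1; each nested layer adds one level.
        return 1 + max((c.depth() for c in self.children), default=0)


@dataclass
class MonoC:
    """Monolingual Component: the part of an entry related to one language."""
    lang: str
    segments: List[SegC] = field(default_factory=list)


@dataclass
class MultiC:
    """Multilingual Component: one multilingual entry."""
    monos: List[MonoC] = field(default_factory=list)

    def languages(self) -> List[str]:
        return [m.lang for m in self.monos]


@dataclass
class GroupC:
    """Grouping Component: entries with a common origin or purpose."""
    name: str
    entries: List[MultiC] = field(default_factory=list)


@dataclass
class MLDC:
    """Multi Lingual Data Collection: global information plus groups."""
    global_info: Dict[str, str] = field(default_factory=dict)  # stands in for GI
    groups: List[GroupC] = field(default_factory=list)


# Hypothetical segmentation of the French sentence into three sub-segments.
sentence_fr = SegC(
    "l’histoire du courage d’une femme",
    children=[SegC("l’histoire"), SegC("du courage"), SegC("d’une femme")],
)
entry = MultiC(monos=[
    MonoC("fr", [sentence_fr]),
    MonoC("es", [SegC("la historia de la valentía de una mujer")]),
])
collection = MLDC(global_info={"title": "demo"},
                  groups=[GroupC("subtitles", [entry])])

print(entry.languages(), sentence_fr.depth())
```

The recursive `children` field is what lets one structure carry sentence-level, phrase-level and word-level segmentation at once.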

  13. MLIF Metamodel: Data Categories • Domain • Project • Source • sourceType • sourceLanguage • class • duration • begin • next • xml:id • xml:lang • xlink • …
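
One way to picture how a Data Category Specification constrains the metamodel is as a table of the data categories each component is allowed to carry. The sketch below is hypothetical: the data category names come from the slide above, but which component each one attaches to, and the `violations` helper itself, are assumptions made for illustration only.

```python
from typing import Dict, List, Set

# Hypothetical DCS: which data categories each component may carry.
# The component-to-category assignment is assumed, not taken from the standard.
DCS: Dict[str, Set[str]] = {
    "GI": {"Domain", "Project", "Source"},
    "MonoC": {"xml:lang", "sourceLanguage"},
    "SegC": {"xml:id", "class", "begin", "duration", "next"},
}


def violations(component: str, annotations: Dict[str, str]) -> List[str]:
    """Return the annotation names the DCS does not allow on this component."""
    allowed = DCS.get(component, set())
    return sorted(name for name in annotations if name not in allowed)


print(violations("SegC", {"xml:id": "s1", "begin": "0s"}))        # []
print(violations("MonoC", {"xml:lang": "fr", "duration": "4s"}))  # ['duration']
```

An empty result means the instance respects the selected subset of data categories; anything listed would have to be added to the DCS or removed from the instance.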

  14. MLIF: a simple example

  15. Outline • Scope of MLIF • Purpose and Justification of MLIF • Description of MLIF • Use Cases • Current Status

  16. Use Cases • Interoperability • Linguistic Properties • Related Standards • Multimedia • Interactive TV

  17. Interoperability • TMX: “the sentence contains different formatting information”

  18. Interoperability • TMX file • MLIF file

  19. Linguistic Properties • Sentences, words, … • Time related issues

  20. Linguistic Properties • TMX file produced by TRADOS • MLIF file produced by CEA LIST Sentence Aligner

  21. Related Standards • TEI (Text Encoding Initiative) • All XML elements have been described using RelaxNG [ISO 19757-2] with the help of ODD • W3C ITS (International Tag Set) • ITS is a set of rules, expressed in elements, that provide information on how parts of a given DTD or XML Schema are related to specific internationalization and localization properties • W3C SMIL • SMILtext • MLIF may be used to include pre-existing non-MLIF data, such as data produced by NLP tools

  22. Multimedia

  23. W3C SMIL Standardization - Development of Interactive TV Profile - Integration of Annotation Support - Definition of Temporal Text Processing • ISO MLIF Standardization - Development of MLIF format - Development of a multilingual processing pipeline - Interaction with SMIL and MPEG standards • Figure (Interactive TV: Timed, Multilingual Textual Descriptions): a multilingual DB holds a multilingual component made of two monolingual components, each containing a linguistic segment: “l’histoire du courage d’une femme pour démasquer un mystère” (French) and “la historia de la valentía de una mujer para desenmascarar un misterio” (Spanish)
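
The idea of timed, multilingual textual descriptions can be sketched as a lookup that returns the text to display for a given language at a given playback time. This is a minimal sketch under assumptions: `TimedSegment` and `active_text` are hypothetical names, `begin` and `duration` borrow the data category names listed on slide 13, and the timing values are made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimedSegment:
    lang: str
    begin: float     # seconds from the start of the programme (made-up values)
    duration: float  # seconds the text stays on screen (made-up values)
    text: str


def active_text(segments: List[TimedSegment], lang: str, t: float) -> Optional[str]:
    """Return the text shown at time t for the given language, if any."""
    for seg in segments:
        if seg.lang == lang and seg.begin <= t < seg.begin + seg.duration:
            return seg.text
    return None


segments = [
    TimedSegment("fr", 0.0, 4.0,
                 "l’histoire du courage d’une femme pour démasquer un mystère"),
    TimedSegment("es", 0.0, 4.0,
                 "la historia de la valentía de una mujer para desenmascarar un misterio"),
]

print(active_text(segments, "es", 2.0))
print(active_text(segments, "fr", 10.0))  # None: no segment is active at 10 s
```

Keeping both language versions under the same timing is what would let a player switch subtitle language without re-synchronizing the text.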

  24. Outline • Scope of MLIF • Purpose and Justification of MLIF • Description of MLIF • Use Cases • Current Status

  25. Current Status • AWI (August 2006) • CD (May 2009) • DIS (February 2010) • Ongoing ballot process

  26. Thank you! • Thank you for your attention • Any questions? • Mailing list • mlif@loria.fr • Web site • http://mlif.loria.fr
