MDF and its Applications

MDF and its Applications. Sebastian Drude & Irina Nevskaya, Goethe-Universität Frankfurt. RELISH / Lexicon Meeting, Nijmegen, July 2010. Outline: MDF: what is it? Organization of the MDF-format. Advantages, problems with MDF. Applications and conversions.

Presentation Transcript


  1. MDF and its Applications Sebastian Drude & Irina Nevskaya Goethe-Universität Frankfurt RELISH / Lexicon Meeting Nijmegen July 2010

  2. MDF and its Applications • MDF: what is it? • Organization of the MDF-format • Advantages, problems with MDF • Applications and conversions • MDF in the RELISH project: Udi

  3. MDF and its Applications • MDF: what is it? • Organization of the MDF-format • Advantages, problems with MDF • Applications and conversions • MDF in the RELISH project: Udi

  4. 1. MDF: what is it? • Originally, the Multi-Dictionary Formatter was an independent computer program • It converted certain files in Standard Format into RTF (to be further processed and printed with office software) • Today it is part of the Toolbox (formerly Shoebox) program, in the form of Consistent Changes tables (*.cct, complex scripts for search-and-replace routines) and MS Word template files (*.dot)

  5. 1. MDF: what is it? Standard Format (SF) is a very old text format developed by SIL with minimal mark-up: • The content is organized in “fields” • Each field consists of a “marker” (a newline followed by a backslash and a sequence of letters, hyphens, digits etc.) and the “field content” (free text), separated from the marker by a space character • This is a simple feature–value structure
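
The marker/content structure described on this slide can be read with a few lines of code. The following is a minimal sketch in Python, not part of Toolbox or MDF: the function name `parse_sf`, the sample words, and the choice of `\lx` as the record marker are assumptions made for illustration. Lines that do not start with a marker are treated as a continuation of the previous field.

```python
import re

# A field starts with a backslash at the beginning of a line, followed by the
# marker name and, optionally, a space and the field content.
FIELD_RE = re.compile(r"^\\(\S+)(?: (.*))?$")


def parse_sf(text, record_marker="lx"):
    """Split Standard Format text into records: lists of (marker, content)."""
    records = []
    current = None
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            marker, content = match.group(1), match.group(2) or ""
            # The record marker (assumed: \lx) opens a new record.
            if marker == record_marker or current is None:
                current = []
                records.append(current)
            current.append((marker, content))
        elif current and line.strip():
            # No marker: the line continues the free text of the last field.
            marker, content = current[-1]
            current[-1] = (marker, (content + " " + line.strip()).strip())
    return records


# Invented sample data, for illustration only.
SAMPLE = r"""\lx alpha
\ps n
\de a made-up definition
that wraps onto a second line

\lx beta
\ps v
"""

records = parse_sf(SAMPLE)
for record in records:
    print(record)
```

Each record comes back as an ordered list rather than a dictionary, because fields may repeat and their order carries meaning.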

  6. “Standard Format” data file [Screenshot with callouts: entry, field, field marker, field content]

  7. 1. MDF: what is it? • The MDF program uses a certain set of markers, representing typical data categories used in traditional lexicography • Properties of the fields (language etc.) and a minimal hierarchical structure, expressed through an “is-below” relation, are kept in a separate “.typ” (type) file, which is also in SF • In this sense, a file in MDF format is an SF text file which uses the MDF set of markers (in the MDF hierarchical organization)

  8. MDF.typ (config file) [Screenshot with callouts: marker definition, description, language, position in hierarchy]

  9. MDF and its Applications • MDF: what is it? • Organization of the MDF-format • Advantages, problems with MDF • Applications and conversions • MDF in the RELISH project: Udi

  10. 2. Organization of the MDF-format • There are currently about 100 markers directly supported by MDF (“MDF-fields”) • The basic hierarchy is: \lx (lexeme) └˃ \se (sub-entry) └˃ \ps (part of speech) └˃ \sn (sense number) • Other hierarchies may be, or used to be, supported: (\lx > \se > \sn > \ps or \lx > \sn > \ps > \se)
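
Because the data file itself is flat, the basic hierarchy has to be reconstructed from the order of the markers. The sketch below shows one way to do this in Python, assuming the hierarchy \lx > \se > \ps > \sn from this slide and assuming that intermediate levels may be skipped (for example a \ps directly under \lx). The function name `nest`, the node layout, and the sample entry are invented for illustration.

```python
# Depth of each structural marker in the basic MDF hierarchy.
LEVELS = {"lx": 0, "se": 1, "ps": 2, "sn": 3}


def nest(fields):
    """Turn a flat list of (marker, value) pairs into a tree of nodes."""
    root = {"marker": "entry", "value": "", "fields": [], "children": []}
    stack = [(-1, root)]  # (level, node); the root is never popped
    for marker, value in fields:
        level = LEVELS.get(marker)
        if level is None:
            # Ordinary field: attach it to the innermost open node.
            stack[-1][1]["fields"].append((marker, value))
            continue
        # Close every open node that is at the same level or deeper.
        while stack[-1][0] >= level:
            stack.pop()
        node = {"marker": marker, "value": value, "fields": [], "children": []}
        stack[-1][1]["children"].append(node)
        stack.append((level, node))
    return root


# Invented sample entry, for illustration only.
FIELDS = [
    ("lx", "run"), ("ps", "v"),
    ("sn", "1"), ("ge", "move fast"),
    ("sn", "2"), ("ge", "operate"),
    ("se", "run out"), ("ps", "v"), ("ge", "be used up"),
]

tree = nest(FIELDS)
lexeme = tree["children"][0]
print([child["marker"] for child in lexeme["children"]])
```

For one of the alternative hierarchies mentioned on the slide, only the `LEVELS` table would need to change.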

  11. 1. MDF: what is it? MDF is documented by the book: Coward, David F. & Grimes, Charles E. (2000). Making Dictionaries: A guide to lexicography and the Multi-Dictionary Formatter. Waxhaw, North Carolina: SIL International (1st ed. 1995) URLs: http://www.sil.org/computing/shoebox/MDF_2000.pdf and http://www.sil.org/computing/shoebox/MDF_Updates.html

  12. 2. Organization of the MDF-format Several fields can be repeated for up to four different languages, where “..” stands for one of: v = vernacular, e = English, n = national, r = regional • \ps, \pn – part of speech for main entry word (English, national) • \g.. – gloss for main entry word • \d.. – definition for main entry word • \re, \rn, \rr – reverse (for indexes) • \we, \wn, \wr – word-level gloss • \x.. – example (sentence and translations) • \e.. – encyclopedic information • \u.. – usage information • \o.. – only (restriction) information • (\va), \ve, \vn, \vr – variant form comment • (\cf), \ce, \cn, \cr – cross reference gloss • (\lf), \le, \ln, \lr – “lexical function” (gloss for related word) • \pd.. – “paradigm” (gloss for irregular form)
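
The “..” notation can be expanded mechanically. The following Python sketch does so under stated assumptions: the function names `expand` and `build_lookup` are invented, and expanding every template with all four codes over-generates, since not every combination is an actual MDF field (the slide lists only \re, \rn, \rr for the reverse field, for example).

```python
LANGUAGES = {"v": "vernacular", "e": "English", "n": "national", "r": "regional"}


def expand(template, codes="venr"):
    """Expand a template such as 'g..' into concrete markers with languages."""
    if not template.endswith(".."):
        raise ValueError("template must end in '..': " + template)
    base = template[:-2]
    return {base + code: LANGUAGES[code] for code in codes}


def build_lookup(templates):
    """Map every concrete marker of the given templates to its language."""
    lookup = {}
    for template in templates:
        lookup.update(expand(template))
    return lookup


# Templates written with '..' on the slide; the expansion is an assumption.
TEMPLATES = ["g..", "d..", "x..", "e..", "u..", "o..", "pd.."]

lookup = build_lookup(TEMPLATES)
print(expand("x.."))
print(lookup["pde"])
```

An explicit table of this kind is safer than guessing the language from the last letter of a marker, because markers such as \se (sub-entry) end in a language letter without being language-specific.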

  13. 2. Organization of the MDF-format Some 20 fields are discouraged: • \an (antonym), \sy (synonym) are to be substituted by the \lf (lexical function), \lfv (lexical function vernacular), \lf.. (lexical function gloss) fields • \sg (singular), \pl (plural), \1s (first person singular) etc. are to be substituted by the \pdl (paradigm form label), \pdv (paradigm form vernacular), \pd.. (paradigm form gloss) fields (not yet in the documentation). Two fields (\dt, \st) are administrative fields. So there are only about 50 genuinely different MDF fields.
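
The substitution of discouraged fields can be sketched as a simple rewrite over the field list. This Python example is illustrative only: the function name `modernize` is invented, the labels `Syn` and `Ant` are assumed values for the \lf field, and the marker \lfv is taken from this slide as written.

```python
# Discouraged marker -> lexical function label (labels are assumptions).
REPLACEMENTS = {"sy": "Syn", "an": "Ant"}


def modernize(fields):
    """Rewrite \\sy and \\an fields as \\lf (label) plus \\lfv (related form)."""
    result = []
    for marker, value in fields:
        label = REPLACEMENTS.get(marker)
        if label is None:
            result.append((marker, value))  # keep all other fields unchanged
        else:
            result.extend([("lf", label), ("lfv", value)])
    return result


# Invented sample entry, for illustration only.
old_entry = [("lx", "big"), ("sy", "large"), ("an", "small")]
new_entry = modernize(old_entry)
print(new_entry)
```

The same pattern would apply to \sg, \pl and similar fields, with \pdl and \pdv as targets.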

  14. 2. Organization of the MDF-format • Some of the fields form blocks/groups via the hierarchy, for instance: • \lf (lexical function, relations to other entries) └˃ \lfv related form, \lf.. gloss of rel. form (Engl., nat., reg.) • \pd (paradigm information & irregular forms) └˃ \pdl pdg. label, \pdv pdg. form, \pd.. pdg. gl. (Engl., nat., reg.) • \rf (reference to an example) └˃ \xv example form in the vernacular └˃ \x.. translation of rel. form (Engl., nat., reg.) • \cf (cross-reference form) └˃ \c.. cross-reference gloss (Engl., nat., reg.) • \va (variant form) └˃ \v.. comment on variant form (Engl., nat., reg.)

  15. MDF and its Applications • MDF: what is it? • Organization of the MDF-format • Advantages, problems with MDF • Applications and conversions • MDF in the RELISH project: Udi

  16. 3. Advantages, problems with MDF Advantages: • Very flexible SF database format (optional fields, repeated fields etc.) • Quite exhaustive for standard lexicography in field research on minority languages • Is a de facto standard, although Toolbox is no longer officially supported by SIL (now replaced by FieldWorks / FLEX)

  17. 3. Advantages, problems with MDF General problems: • Flexibility of SF allows for inconsistencies • The order of sister fields is only recommended, not enforced • Almost always extended and adjusted arbitrarily by individual users (MDF-derived / MDF-based formats) • Changes in the hierarchy in the configuration are not reflected in the data file and vice versa • Missing closing tags in SF impair conversions
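
Two of these problems, user-added markers and varying sister-field order, can be detected with a simple audit before any conversion. The Python sketch below assumes records are lists of (marker, content) pairs; the function names, the tiny `KNOWN` set, and the user-defined marker \dial are hypothetical.

```python
from collections import Counter


def unknown_markers(records, known):
    """Count markers that are not in the expected marker set."""
    return Counter(m for record in records for m, _ in record if m not in known)


def order_variants(records, markers):
    """Count the relative orders in which the given sister markers occur."""
    variants = Counter()
    for record in records:
        variants[tuple(m for m, _ in record if m in markers)] += 1
    return variants


# Invented sample data; 'dial' stands for a hypothetical user-added field.
KNOWN = {"lx", "ps", "ge", "de"}
RECORDS = [
    [("lx", "a"), ("ps", "n"), ("ge", "x"), ("de", "y")],
    [("lx", "b"), ("ps", "n"), ("de", "y"), ("ge", "x"), ("dial", "north")],
]

extra = unknown_markers(RECORDS, KNOWN)
orders = order_variants(RECORDS, {"ge", "de"})
print(extra)
print(orders)
```

More than one key in the result of `order_variants` means that the same sister fields appear in different orders across entries.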

  18. 3. Advantages, problems with MDF Specific problems in the RELISH project: • \ph (phonetic form) is too generic; it would be needed in several different contexts (\cf, \va, \pdv, \lfv…) • \lt (literal meaning) exists only for the head word; it would be needed for borrowed words etc. • Even the three gloss languages (English, national, regional) are not sufficient • It should be possible to set a “language” property for arbitrary fields

  19. 3. Advantages, problems with MDF Specific problems in the RELISH project: • No clear solution for covering several dialects • In particular if no dialect is “standard” • Different solutions: • \ue (usage information) • \oe (only / restriction) • \ns (notes on sociolinguistics, varieties) • \lf SynD = … (lexical function “Dialectal Synonym”) • \va & \ve (variant form and English comment) • Most of these solutions only hold for the head word; we would need dialect marking for \lx, \xv, \va, …

  20. 3. Advantages, problems with MDF Comment on the dialect problem in the MDF book: “We intend future enhancements of MDF to have fields dedicated to dialectal information, but at present the programming limitations do not allow us any more field bundles. For the present, use \va and \lf SynD =.” (footnote, p. 23)

  21. MDF and its Applications • MDF: what is it? • Organization of the MDF-format • Advantages, problems with MDF • Applications and conversions • MDF in the RELISH project: Udi

  22. 4. Applications and conversions “Applications” (of the format) may have different meanings: • For different languages / dictionary projects • For transformations / conversions: • print-dictionaries (via Toolbox, MDF, Word / RTF) • HTML (Lexique Pro) • XML (Toolbox export) • LMF – XML (Lexus import) • FLEX database

  23. 4. Applications and conversions Problems with all conversions: • What happens with inconsistencies? • What happens with different orders of same-level-fields? • What happens with additional (non-MDF) fields? • What happens with sub-entries?
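
One possible answer to the question about additional (non-MDF) fields is to carry them through the conversion and flag them rather than drop them. The Python sketch below illustrates this with a generic XML layout. The element and attribute names (`entry`, `field`, `marker`, `standard`) are invented for this example and do not correspond to the Toolbox XML export or to LMF.

```python
import xml.etree.ElementTree as ET


def record_to_xml(fields, known):
    """Convert one record to an <entry> element, flagging non-standard fields."""
    entry = ET.Element("entry")
    for marker, value in fields:
        field = ET.SubElement(entry, "field", marker=marker)
        field.text = value
        if marker not in known:
            # Keep the field, but mark it so that later steps can decide.
            field.set("standard", "no")
    return entry


# Invented sample record; 'dial' stands for a hypothetical user-added field.
entry = record_to_xml([("lx", "a"), ("ge", "x"), ("dial", "north")], {"lx", "ge"})
print(ET.tostring(entry, encoding="unicode"))
```

Field order is preserved in the output, so a later step can still detect differing orders of same-level fields.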

  24. 4. Applications and conversions

  25. 4. Applications and conversions

  26. 4. Applications and conversions

  27. MDF and its Applications • MDF: what is it? • Organization of the MDF-format • Advantages, problems with MDF • Applications and conversions • MDF in the RELISH project: Udi

  28. 5. MDF in the RELISH project: Udi

  29. 5. MDF in the RELISH project: Udi

  30. 5. MDF in the RELISH project: Udi • Digital representation of a print dictionary, with additions • Main problem: several languages: • Udi (v) • Azerbaijani (Cyrillic) (n1) • Azerbaijani (Latin) (n1lat) (addition) • Georgian (n2) • Russian (r) • English (e) (addition)

  31. 5. MDF in the RELISH project: Udi • The Udi Toolbox database uses 53 fields • of these, 14 are standard MDF fields • 11 are MDF fields which have a slightly different position in the hierarchy • 28 fields are additional fields • most (19) of these are for adjusting the additional “languages” (and scripts) • 5 are for additional phonetic representations

  32. MDF-LEXUS conversion • From a printed dictionary to a markup text file • From a markup text file to the MDF structure in the Toolbox environment • From the MDF structure to the LEXUS structure

  33. Step 1. From a printed dictionary to a markup text file - 1

  34. Step 1. From a printed dictionary to a markup text file - 2

  35. Step 2. From a markup text file to the MDF structure in the Toolbox environment - 1 • Establishing correlations of different sign combinations and their linguistic counterparts • Establishing the MDF markers’ structure and their hierarchies • Consistency checks: • Cross-reference failures: • - absence of the head word • - absence of the variant • Numerous spelling mistakes • Numerous mistakes in the Russian and English translations • Inconsistencies in contrasting subentries and examples
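
The cross-reference check listed above can be sketched as follows. This Python example is an illustration under assumptions, not the procedure used in the project: it treats the values of \lx, \se and \va as valid targets and reports every \cf value that matches none of them. The function name and the sample words are invented.

```python
def dangling_cross_references(records, target_markers=("lx", "se", "va"),
                              ref_marker="cf"):
    """Return (head word, reference) pairs whose reference has no target."""
    # Collect every form that a cross-reference may legitimately point to.
    targets = {v for record in records for m, v in record if m in target_markers}
    missing = []
    for record in records:
        head = next((v for m, v in record if m == "lx"), "")
        for marker, value in record:
            if marker == ref_marker and value not in targets:
                missing.append((head, value))
    return missing


# Invented sample data, for illustration only.
RECORDS = [
    [("lx", "alpha"), ("cf", "beta"), ("cf", "beeta")],
    [("lx", "beta"), ("va", "beeta"), ("cf", "gamma")],
]

failures = dangling_cross_references(RECORDS)
print(failures)
```

Exact string comparison is used here, so spelling mistakes in either the reference or the head word show up as failures as well.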

  36. Step 2. From a markup text file to the MDF structure in the Toolbox environment - 2

  37. Step 3. From the MDF structure to the LEXUS structure - 1

  38. Step 3. From the MDF structure to the LEXUS structure - 2

  39. Step 3. From the MDF structure to the LEXUS structure - 3

  40. Step 3. From the MDF structure to the LEXUS structure - 4

  41. 5. MDF in RELISH: Udi into Lexique Pro

  42. 5. MDF in RELISH: Udi into Lexique Pro

  43. From the MDF to the FLEX structure • Defining writing systems • Problems with introducing digraphs and the corresponding sort orders • Defining import properties • Problems with matching markers, due to different markers and their hierarchies • Import failures • 2 attempts: • project Udi1 • project Udi 2
