1 / 37

LIFT ing LEGO with RELISH : L exicon I nterchange F orma T in Use

LIFT ing LEGO with RELISH : L exicon I nterchange F orma T in Use. Helen Aristar-Dry Institute for Language Information and Technology Eastern Michigan U. Outline. Background: The RELISH project The LEGO project Project interface Project workflow LIFT: Lexicon Interchange Format

daw
Télécharger la présentation

LIFT ing LEGO with RELISH : L exicon I nterchange F orma T in Use

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. LIFTingLEGO with RELISH:Lexicon Interchange FormaT in Use Helen Aristar-Dry Institute for Language Information and Technology Eastern Michigan U

  2. Outline • Background: The RELISH project • The LEGO project • Project interface • Project workflow • LIFT: Lexicon Interchange Format • What it is • Use in the LEGO project (“LL-LIFT”) • Sample entry: comparison with LMF-compliant XML output by LEXUS

  3. The RELISH Project • RELISH: Rendering Endangered Lexicons Interoperable through Standards Harmonization • Joint project: U of Frankfurt, MPI-Nijmegen, LINGUIST List • Goal: markup harmonization at two levels • Semantic • Structural • Test cases: 6 lexicons fr. LEGO & Lexus

  4. RELISH: MPI, ILIT and Lexicon Standards

  5. LEGO & RELISH LEGO Test Lexicons LEXUS Test Lexicons RELISH - LL-LIFT - GOLD - LMF-compliant XML (various) - ISOCats Interchange format: TEI? LIFT?

  6. The LEGO Project • 3-year project sponsored by the NSF • Participants: ILIT (Linguist List) & U at Buffalo • Goal: a “datanet” of interoperable lexicons. • Interoperability based on: • grammatical information mapped to GOLD • structure mapped to a common schema (LL-LIFT) • output in RDF

  7. The LEGO Project • Initial set of lexical resources: • 17 EL lexicons prepared by LINGUIST List: • Shoshone, Archi, Kayardild, Fulfulde, Mocovi, Biao Mien, Potawatomi, Saliba, W. Pantar, W. Sissala, Wichi, … • 3000+ wordlists prepared by U at Buffalo: • Usher-Whitehouse lists, Loanword Typology Lists, Intercontinental Dictionary Series lists. • Extensible: • In practice: to Lexicons in LIFT • Possibly: RDF, TEI (or another official serialization of LMF)

  8. The LEGO Project • Purposes • Not intended to develop a lexicon creation or lexicon display tool • Intended to • support multi-lexicon search and comparison • demonstrate the value of digital standards in linguistic research

  9. LEGO Workflow Descriptive XML LL-LIFT (XML) LEGO db mapping to GOLD LL Lexicon - MS Word Excel Access Toolbox Filemaker LEGO Interoperable Lexicon

  10. LEGO: lego.linguistlist.org

  11. LEGO Browse Lexicons Page

  12. LEGO: Lexicon ‘homepage’

  13. LEGO: Browse Lexicon Entries

  14. Demonstration of Interoperability: LEGO Search

  15. LEGO: Search multiple lexicons by LIFT field (Definition, Example, Variant, etc.

  16. LEGO: Search multiple lexicons by grammatical information (GOLD concept).

  17. Extending LEGO: Upload Lexicon or Wordlist

  18. Map to GOLD: Choose Lexicon to Edit

  19. Mapping 1: View lexicon’s labels for grammatical concepts

  20. Mapping 2: Click to view GOLD concepts, Click ‘Add’ to Map

  21. Mapping 4: Lexicon label mapped to GOLD concept

  22. Mapping 3: Access GOLD interface (if needed)

  23. LIFT • LIFT = Lexicon Interchange Format • XML format for storing and exchange of lexical information • Developed by SIL International • Designed to be easy to convert into and out of MDF and Fieldworks formats • Current version: http://code.google.com/p/lift-standard/downloads/detail?name=lift_13.pdf

  24. Programs that support LIFT • WeSay uses LIFT as its primary format. • FieldWorks Language Explorer (FLEx) can import and export LIFT files. • Lexique Pro can open LIFT documents for viewing, printing, and making web pages. It can also save to LIFT format. (fr. http://code.google.com/p/lift-standard/)

  25. Utilities for LIFT • Solid can convert basic SFM (standard format markers, e.g. Toolbox format) to LIFT (see: http://lingtransoft.info/apps/solid • LiftTweaker Can selectively modify a LIFT file for targeting different audiences

  26. Lexicons in LIFT • LIFT chosen as upload format for LEGO because of the large number of lexicons potentially available in LIFT • About 50 published lexicons in Lexique Pro • 180+ lexicons in Fieldworks Language Explorer (FLEx) ? • 300+ lexicons in Shoebox/Toolbox ? • With the owner’s permission, these could easily be integrated into the LEGO system

  27. LIFT UML Diagram for Entry

  28. Notes • Grammatical Info is attached to Sense, not to Entry or Form (differs from LMF) • Variant is attached to Entry, not ‘sense’ – can’t add a ‘sense’ to a variant • Multiple senses and variants allowed • Highly customizable: Field, Type, and Range can be added to virtually any element (can be defined in the document header)

  29. LL-LIFT • Lack of constraint on use of Field and Trait constituted a problem for LEGO • Developed ‘LL-LIFT’ • a constrained form of LIFT • which still validates against the LIFT schema

  30. LL-LIFT • Major constraints • No Header • Grammatical Information confined to a single element • Delimited within db field • Parsed out during GOLD mapping • Minor entries, comparison forms, etc • separate entries • unified via ‘relation’ element

  31. <entry id="d1e56244"> <trait name="original-id" value="2c9090a22632946601267b22f98e5098"/> <lexical-unit> <form lang="syv"> <text>дадагалзаар</text> </form> </lexical-unit> <variant> <form lang="syv"> <text>dadagalzaar</text> </form> </variant> <sense> <grammatical-info value="n."/> <definition> <form lang="eng"> <text>doubt</text> </form> </definition> </sense> </entry> Tuva entry in LIFT

  32. <LexicalEntry id="2c9090a22632946601267b22f98e5098"><DataCategory type="rank">5155</DataCategory><DataCategory type="lexeme">дадагалзаар</DataCategory><DataCategory type="part of speech">n.</DataCategory><Form id="2c9090a22632946601267b22f9fd50a0" type="form"> <DataCategory type="image"/> <DataCategory type="Audio">tvn_5155.mp3</DataCategory></Form><Sense id="2c9090a22632946601267b22f9fd50a6" type="sense"> <DataCategory type="gloss">doubt</DataCategory> <DataCategory type="transcription">dadagalzaar</DataCategory></Sense><ListOfComponents/> </LexicalEntry> Tuva entry exported from LEXUS (LMF-compliant)

  33. <LexicalEntry id="2…"><DatCat type="rank">5155</DatCat><DatCat type="lexeme"> дадагалзаар</DatCat><DatCattype="part of speech">n. • </ DatCat ><Form id="2…" type="form"> < DatCattype="image"/> < DatCattype="Audio"> tvn_5155.mp3</DatCat></Form><Sense id="2…" type="sense"> < DatCattype="gloss">doubt</DatCat >< DatCattype="transcription"> dadagalzaar</DatCat></Sense><ListOfComponents/> </LexicalEntry> <entry id="d1e56244"> <trait name="original-id” value=“2…"/> <lexical-unit> <form lang="syv"> <text>дадагалзаар</text> </form> </lexical-unit> <variant> <form lang="syv"> <text>dadagalzaar</text> </form> </variant> <sense> <grammatical-info value="n."/> <definition> <form lang="eng"> <text>doubt</text> </form> </definition> </sense> </entry> LMF LL-LIFT

  34. Summary • LL-LIFT = current XML schema used in LEGO • Easy to transform (at least one) LMF-compliant XML output of LEXUS to LL-LIFT • Transformation could be extended to TEI serialization of LMF. May require constraint

More Related