Overview & Update

Overview & Update. Helen Aristar Dry The LINGUIST List & Eastern Michigan University EMELD Workshop on The Digitization of Lexical Data Aug. 2-5, 2002. What Is E-Meld?. “Electronic Metastructure for Endangered Languages Data”. 5 year collaborative project, begun Sept. 2001 Participants:

Presentation Transcript


  1. Overview & Update
     Helen Aristar Dry
     The LINGUIST List & Eastern Michigan University
     EMELD Workshop on The Digitization of Lexical Data
     Aug. 2-5, 2002
     EMELD Workshop 2002

  2. What Is E-Meld?
     “Electronic Metastructure for Endangered Languages Data”
     • 5-year collaborative project, begun Sept. 2001
     • Participants:
       • The LINGUIST List (Eastern Michigan U., Wayne State U., U. of Arizona)
       • The Linguistic Data Consortium (University of Pennsylvania)
       • The Endangered Languages Fund (Yale University, Haskins Laboratories)
     • Funded by NSF

  3. The LINGUIST List
     • 16,500 subscribers
     • 106 different countries
     • 4 European mirror sites: Tübingen, Stockholm, Edinburgh, Moscow

  4. Objectives
     To aid in …
     • … the preservation of Endangered Languages data and documentation
     • … the development of infrastructure for linguistic archives

  5. Components
     • Metadata server facilitating access to language resources
     • Promulgation of best practice in:
       • Language identification
       • Resource description
       • Markup or annotation
     • Involvement of the linguistic community in deciding best practice
     • Query Room, where questions can be addressed to native speakers
     • Demonstration project: texts and lexicons from 10 endangered languages (ELs), marked up according to best practice

  6. Languages

  7. Outreach
     • Workshops
       • 2001: Santa Barbara, CA
         • focus: metadata, markup, language codes
       • 2002: Ann Arbor/Ypsilanti, MI
         • focus: lexicon markup & metadata
       • 2003, 2004: workshops
       • 2005, 2006: “digital institutes”

  8. Project Emphasis: Breadth
     • Widest access to information
     • Web-based tools
     • Open standards
     • Simple interfaces

  9. 2001-2 Progress
     • Metadata Collection:
       • Search facility: OLAC Service Provider
       • Metadata editor: ORE
     • Language Identification: Ethnologue + LL Codes, used throughout LL site
     • Query Room (ELF & Rosetta)
     • Markup: Ontology (U. of Arizona)

  10. Markup
      • Focus: morphosyntactic markup
      • Objective: a system which allows:
        • Field workers to submit data in different markups
        • Searchers to retrieve all relevant data despite varying markups
      • No “gold standard” in linguistic markup
      • Instead: ontology to serve as “interlanguage” for translation among markups

  11. Markup
      • Tool to translate common markup formats (RDF, Shoebox, Word) into XML
      • Tool to help the linguist identify aspects of markup with concepts in the ontology
      • More on this today from Langendoen, Lewis, and Farrar
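The "interlanguage" idea from the two Markup slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tag inventories, the concept names (`PastTense`, `PluralNumber`), and the sample word forms are all hypothetical and are not taken from the E-MELD ontology. Each markup scheme maps its own labels to shared concepts, so translation and search go through the concept rather than through pairwise tag tables.

```python
# Hypothetical tag inventories: each markup scheme maps its own labels
# to shared ontology concepts. None of these names come from the slides.
SCHEME_A = {"PST": "PastTense", "PL": "PluralNumber"}
SCHEME_B = {"past": "PastTense", "plur": "PluralNumber"}


def translate(tag, source, target):
    """Translate a tag from one scheme to another via the shared concept."""
    concept = source.get(tag)
    if concept is None:
        return None  # the source scheme does not define this tag
    for target_tag, target_concept in target.items():
        if target_concept == concept:
            return target_tag
    return None  # the target scheme has no label for this concept


def search(concept, corpora):
    """Collect every form annotated with a concept, whatever the markup."""
    hits = []
    for scheme, tokens in corpora:
        for form, tag in tokens:
            if scheme.get(tag) == concept:
                hits.append(form)
    return hits


# Two small data sets submitted in different markups (illustrative forms).
corpora = [
    (SCHEME_A, [("walked", "PST"), ("dogs", "PL")]),
    (SCHEME_B, [("ran", "past"), ("cats", "plur")]),
]

print(translate("PST", SCHEME_A, SCHEME_B))  # past
print(search("PastTense", corpora))          # ['walked', 'ran']
```

With a shared set of concepts, adding a new markup scheme needs one mapping to the ontology instead of a separate translation table for every existing scheme.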

  12. Data Input Tool
      • Web-based
      • Potentially portable
      • Creates database input, to be output as XML
      • Can be customized to fit an individual language
      • More on this tomorrow from Martha Ratliff & Zhenwei Chen

  13. Affiliation w/OLAC
      • Resource identification: OLAC Service Provider
      • OLAC = Open Language Archives Community
      • Part of the Open Archives Initiative
      • Multi-disciplinary initiative to promote multi-archive searching via HTTP protocols

  14. OLAC Metadata Set
      • Based on the Dublin Core set of 15 elements:
        • Contributor
        • Coverage
        • Creator
        • Date
        • Description
        • Format
        • Identifier
        • Language
        • Publisher
        • Relation
        • Rights
        • Source
        • Subject
        • Title
        • Type
      • With 2 refinements:
        • Subject.language
        • Type.linguistic
      • Draft of controlled vocabulary: Type.linguistic
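As a rough illustration of what a record built on these 15 elements might look like, the sketch below serializes a few fields as Dublin Core XML with Python's standard library. It is an assumption-laden example, not the OLAC schema: the `record` wrapper element and the sample field values are hypothetical, and the two OLAC refinements (Subject.language, Type.linguistic) are deliberately left out because the slide does not show how they are encoded.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 Dublin Core elements listed on the slide, in lowercase.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def build_record(fields):
    """Serialize (element, value) pairs as a Dublin Core XML string.

    The <record> wrapper is a hypothetical container for this sketch.
    """
    record = ET.Element("record")
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(record, encoding="unicode")


# Sample values are invented for illustration only.
xml_text = build_record([
    ("title", "Sample lexicon"),
    ("creator", "A. Fieldworker"),
    ("date", "2002-08-02"),
    ("format", "text/xml"),
])
print(xml_text)
```

Rejecting unknown element names up front is one simple way to keep records from different contributors comparable.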

  15. OLAC Service Provider
      [Diagram: the OLAC Service Provider (LINGUIST List) obtains metadata via HTTP GET or POST from Data Providers, which may be archives or individuals.]
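The exchange in the diagram can be sketched as the construction of a harvesting request. This assumes the service provider speaks the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH), which the slides imply but do not name; the base URL is hypothetical, and the `olac` metadata prefix is an assumption. The code only builds the request and does not contact any server.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def build_request(base_url, verb, **arguments):
    """Return (get_url, post_body) for one harvesting request.

    The same key=value encoding serves both HTTP methods: appended to
    the URL for GET, or sent as a form-encoded body for POST.
    """
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}", query


# Hypothetical data provider address; "olac" is an assumed prefix.
get_url, post_body = build_request(
    "http://archive.example.org/oai",
    "ListRecords",
    metadataPrefix="olac",
)
print(get_url)
print(post_body)
```

A service provider would issue a request like this to each registered data provider and index the returned records for cross-archive search.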

  16. On LINGUIST
      • OLAC Search: http://linguistlist.org/olac/
        • 18 archives, 30,000+ records
      • Metadata Editor (ORE): http://linguistlist.org/olac/ore/
        • Form-based editor
        • Creates OLAC metadata in XML
        • Makes it available to the OLAC search engine
      • Language Lookup: http://linguistlist.org/languages
