Explore the EMELD Workshop 2002, a project focused on preserving endangered languages data and developing linguistic archives infrastructure. Components include a metadata server, best practices promotion, query room, and outreach workshops. The emphasis is on wide information access using web-based tools and open standards. Progress includes metadata collection, language identification, and morphosyntactic markup creation. Affiliated with OLAC, the initiative aims to promote multi-archive searching. Explore resources on LINGUIST List and participate in language data preservation.
 
                
Overview & Update
Helen Aristar Dry
The LINGUIST List & Eastern Michigan University
EMELD Workshop on The Digitization of Lexical Data
Aug. 2-5, 2002
EMELD Workshop 2002
What Is E-Meld?
“Electronic Metastructure for Endangered Languages Data”
• 5-year collaborative project, begun Sept. 2001
• Participants:
  • The LINGUIST List (Eastern Michigan U., Wayne State U., U. of Arizona)
  • The Linguistic Data Consortium (University of Pennsylvania)
  • The Endangered Languages Fund (Yale University, Haskins Laboratories)
• Funded by NSF
The LINGUIST List
• 16,500 subscribers
• 106 different countries
• 4 European mirror sites:
  • Tübingen | Stockholm
  • Edinburgh | Moscow
Objectives
To aid in …
• … the preservation of Endangered Languages data and documentation
• … the development of infrastructure for linguistic archives
Components
• Metadata server facilitating access to language resources
• Promulgation of best practice in:
  • Language identification
  • Resource description
  • Markup or annotation
• Involvement of linguistic community in deciding best practice
• Query Room, where questions can be addressed to native speakers
• Demonstration project: texts and lexicons from 10 endangered languages (ELs) marked up according to best practice
Languages
Outreach
• Workshops
  • 2001: Santa Barbara, CA
    • focus: metadata, markup, language codes
  • 2002: Ann Arbor/Ypsilanti, MI
    • focus: lexicon markup & metadata
  • 2003, 2004: workshops
  • 2005, 2006: “digital institutes”
Project Emphasis: Breadth
• Widest access to information
• Web-based tools
• Open standards
• Simple interfaces
2001-2 Progress
• Metadata Collection:
  • Search facility: OLAC Service Provider
  • Metadata editor: ORE
• Language Identification: Ethnologue + LL codes, used throughout LL site (ELF & Rosetta)
• Query Room
• Markup: Ontology (U. of Arizona)
Markup
• Focus: morphosyntactic markup
• Objective: a system which allows:
  • Field workers to submit data in different markups
  • Searchers to retrieve all relevant data despite varying markups
• No “gold standard” in linguistic markup
• Instead: ontology to serve as “interlanguage” for translation among markups
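The "interlanguage" idea on this slide can be illustrated with a minimal sketch. It is not the E-MELD ontology or tooling: the scheme names, tags, and concept labels below are all hypothetical, chosen only to show how two different markups can be searched together once each is mapped to shared concepts.

```python
# Hypothetical tag inventories for two field-work markups. The values on
# the right stand in for ontology concepts; all names are assumptions.
TO_ONTOLOGY = {
    "scheme_a": {"PST": "PastTense", "PL": "PluralNumber", "N": "Noun"},
    "scheme_b": {"past": "PastTense", "plur": "PluralNumber", "noun": "Noun"},
}


def to_concepts(scheme, tags):
    """Translate one markup's tags into the shared ontology concepts."""
    mapping = TO_ONTOLOGY[scheme]
    return {mapping[tag] for tag in tags if tag in mapping}


def search(records, concept):
    """Return ids of records annotated with `concept`, whatever their markup."""
    return [
        rec["id"]
        for rec in records
        if concept in to_concepts(rec["scheme"], rec["tags"])
    ]


# Invented records, each annotated in its own scheme.
records = [
    {"id": "r1", "scheme": "scheme_a", "tags": ["N", "PL"]},
    {"id": "r2", "scheme": "scheme_b", "tags": ["noun", "past"]},
    {"id": "r3", "scheme": "scheme_b", "tags": ["plur"]},
]

plural_hits = search(records, "PluralNumber")
```

The searcher queries by concept rather than by tag, so adding a third markup only requires one more mapping table, not pairwise converters between every two schemes.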
Markup
• Tool to translate common markup formats (RDF, Shoebox, Word) into XML
• Tool to help linguists identify aspects of markup with concepts in the ontology
• More on this today from Langendoen, Lewis, and Farrar
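As a rough illustration of what translating Shoebox-style data into XML might involve, here is a minimal sketch. It is not the E-MELD tool: it assumes backslash-coded field markers (`\lx` for the headword, `\ps` for part of speech, `\ge` for an English gloss), an invented two-entry sample, and XML element names that simply reuse the markers.

```python
import xml.etree.ElementTree as ET

# Invented sample data; the words are made up for illustration only.
SAMPLE = r"""
\lx tana
\ps n
\ge house

\lx miko
\ps v
\ge run
"""


def shoebox_to_xml(text):
    """Convert backslash-marked lines into <entry> elements under <lexicon>."""
    root = ET.Element("lexicon")
    entry = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # skip blank lines and anything without a marker
        marker, _, value = line[1:].partition(" ")
        # Assumption: each \lx marker opens a new lexical entry.
        if marker == "lx" or entry is None:
            entry = ET.SubElement(root, "entry")
        ET.SubElement(entry, marker).text = value.strip()
    return root


lexicon = shoebox_to_xml(SAMPLE)
xml_text = ET.tostring(lexicon, encoding="unicode")
```

A real converter would also need to handle multi-line fields, repeated markers, and project-specific marker sets, which this sketch ignores.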
Data Input Tool
• Web-based
• Potentially portable
• Creates database input, to be output as XML
• Can be customized to fit an individual language
• More on this tomorrow from Martha Ratliff & Zhenwei Chen
Affiliation w/OLAC
• Resource identification: OLAC Service Provider
• OLAC = Open Language Archives Community
• Part of Open Archives Initiative
• Multi-disciplinary initiative to promote multi-archive searching via HTTP protocols
OLAC Metadata Set
• Based on the Dublin Core set of 15 elements:
  • Contributor
  • Coverage
  • Creator
  • Date
  • Description
  • Format
  • Identifier
  • Language
  • Publisher
  • Relation
  • Rights
  • Source
  • Subject
  • Title
  • Type
• With 2 refinements:
  • Subject.language
  • Type.linguistic (draft of controlled vocabulary)
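A record built from these elements could look roughly like the output of the sketch below. It uses only the 15 Dublin Core element names listed above and the standard Dublin Core namespace; the `record` wrapper element and the sample values are assumptions, and the two OLAC refinements are left out because their exact XML form is not given in the slides.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def build_record(fields):
    """Serialize a dict of Dublin Core fields as a small XML record."""
    unknown = set(fields) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields.items():
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


# Invented sample values, for illustration only.
record = build_record({
    "title": "Sample lexicon",
    "creator": "A. Fieldworker",
    "date": "2002",
})
xml_text = ET.tostring(record, encoding="unicode")
```

Rejecting unknown element names up front keeps a form-based editor from emitting records that a harvester would not recognize.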
OLAC Service Provider
[Diagram: the OLAC Service Provider (LINGUIST List) exchanges metadata with data providers over http: GET or POST]
• Data Provider (Archive)
• Data Provider 2: Individual
• Data Provider 3 (Archive)
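The GET side of this exchange can be sketched as follows, assuming the data providers speak the Open Archives Initiative harvesting protocol (OAI-PMH), where a request is a base URL plus a `verb` and its arguments. The endpoint below is hypothetical, and `oai_dc` is the generic Dublin Core format of OAI-PMH rather than anything OLAC-specific. The sketch only builds request URLs and makes no network call.

```python
from urllib.parse import urlencode

# Hypothetical data provider endpoint, not a real archive.
BASE_URL = "http://archive.example.org/oai"


def request_url(base_url, verb, **arguments):
    """Build the GET URL a service provider would send to a data provider."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Ask the provider to describe itself, then to list its metadata records.
identify_url = request_url(BASE_URL, "Identify")
list_url = request_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
```

The same key-value pairs could instead be sent as the body of a POST request, which is the other option named on the slide.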
On LINGUIST
• OLAC Search: http://linguistlist.org/olac/
  • 18 archives, 30,000+ records
• Metadata Editor (ORE): http://linguistlist.org/olac/ore/
  • Form-based editor
  • Creates OLAC metadata in XML
  • Makes it available to OLAC search engine
• Language Lookup: http://linguistlist.org/languages