
Advancing Machine Translation for Scarce Resource Languages: The AVENUE Project

The AVENUE Project addresses machine translation for languages with scarce resources, such as Mapudungun in Chile. Because these languages have little electronic text and few linguists who can write computational rules, our approach learns translation rules directly from bilingual informants. Using machine learning, we aim for rapid, low-cost development of translation systems. Our partnerships with indigenous communities in Latin America and Alaska help bring their languages into technology and let native speakers contribute to the digitization and preservation of their languages.


Presentation Transcript


  1. Machine Translation with Scarce Resources: The AVENUE Project

  2. Scarce Resources
     • Not much text in electronic form.
     • Very few linguists who can write computational rules.
     • No standard orthography:
       • Kudaw, kusaw (work) (Mapudungun, Chile)
     • Not even sure of pronunciation:
       • EH-nvelope, AH-nvelope (envelope) (English, US, not a language with scarce resources)

  3. Our Approach
     • Learn rules from a controlled corpus.
     • Corpus is elicited from bilingual speakers.
     • The informant only needs to translate and align words.
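The informant's task described above (translate a sentence, then align its words) can be pictured as one small record per sentence. What follows is a minimal sketch under stated assumptions, not the AVENUE data format: the class name `ElicitedExample`, its fields, and the `aligned_pairs` helper are hypothetical, and the alignment links for "The rock fell" / "Trani chi kura" are an illustrative guess rather than something given in the slides.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ElicitedExample:
    """One elicited sentence pair plus the word alignment (hypothetical layout)."""
    source: List[str]                 # words of the elicitation-language sentence
    target: List[str]                 # words of the informant's translation
    alignment: List[Tuple[int, int]]  # (source index, target index) links

    def aligned_pairs(self) -> List[Tuple[str, str]]:
        # Resolve index links into readable word pairs.
        return [(self.source[i], self.target[j]) for i, j in self.alignment]


# Sentence pair taken from the elicitation corpus slides;
# the alignment links are assumed for illustration only.
example = ElicitedExample(
    source="The rock fell".split(),
    target="Trani chi kura".split(),
    alignment=[(0, 1), (1, 2), (2, 0)],
)

for src_word, tgt_word in example.aligned_pairs():
    print(src_word, "<->", tgt_word)
```

Storing links as index pairs rather than word pairs keeps the record unambiguous when a word occurs twice in a sentence.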

  4. AVENUE Project
     New Ideas
     • Use machine learning to learn translation rules from native speakers who are not trained in linguistics or computer science.
     • Multi-Engine translation architecture can flexibly take advantage of whatever resources are available.
     • Research partnerships with indigenous communities in Latin America and Alaska: Mapudungun (Chile), Siona (Colombia), Inupiaq (Alaska).
     [Figure: Interface for data elicitation]
     Impact
     • Rapid and low-cost development of machine translation for languages with scarce resources.
     • Policy makers can get input from indigenous people.
     • Indigenous people can participate in government and internet.
     Schedule
     • Year 1: Seeded Version Space learning, first version.
     • Year 2: Example-Based Machine Translation of Mapudungun (Chile).
     • Year 3: Multi-Engine Mapudungun system (EBMT and partially learned transfer rules).
     Carnegie Mellon University, Language Technologies Institute: L. Levin, J. Carbonell, A. Lavie, R. Brown

  5. Elicitation Interface

  6. Elicitation Corpus: example
     • English: I fell.
       Spanish: Caí
       Mapudungun: Tranün
     • English: I am falling.
       Spanish: Estoy cayendo
       Mapudungun: Tranmeken

  7. Elicitation Corpus: example
     • English: You (John) fell.
       Spanish: Tú (Juan) caíste
       Mapudungun: Eymi tranimi (Kuan)
     • English: You (Mary) fell.
       Spanish: Tú (María) caíste
       Mapudungun: Eymi tranimi (Maria)
     • English: The rock fell.
       Spanish: La piedra cayó
       Mapudungun: Trani chi kura
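The examples on these two slides differ from one another in only one feature at a time (tense, person, subject), which is what makes a controlled corpus informative. As a rough illustration only, and not the project's Seeded Version Space learning, the sketch below takes the four Mapudungun verb forms from the slides, finds the material they share, and lists what is left over in each form. The function name `split_on_shared_stem` is hypothetical, and the leftover endings are only candidates for further analysis, not confirmed morphemes.

```python
from os.path import commonprefix

# Verb forms copied from the elicitation examples (lowercased for comparison).
forms = {
    "I fell": "tranün",
    "I am falling": "tranmeken",
    "You fell": "tranimi",
    "The rock fell": "trani",
}


def split_on_shared_stem(forms_by_gloss):
    """Return the longest shared prefix and each form's remaining ending."""
    # commonprefix compares strings character by character.
    stem = commonprefix(list(forms_by_gloss.values()))
    endings = {gloss: form[len(stem):] for gloss, form in forms_by_gloss.items()}
    return stem, endings


stem, endings = split_on_shared_stem(forms)
print("shared stem:", stem)
for gloss, ending in endings.items():
    print(f"{gloss}: {stem}-{ending}")
```

A prefix comparison like this only works because the elicited sentences were chosen to vary minimally; on uncontrolled text the shared material would be much harder to isolate.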
