A machine transliteration model based on correspondence between graphemes and phonemes

Presenter: Chien-Hsing Chen. Authors: Jong-Hoon Oh, Key-Sun Choi, Hitoshi Isahara. 2007, TALIP (ACM Transactions on Asian Language Information Processing).

Presentation Transcript


  1. A machine transliteration model based on correspondence between graphemes and phonemes • Presenter: Chien-Hsing Chen • Authors: Jong-Hoon Oh, Key-Sun Choi, Hitoshi Isahara • 2007, TALIP (ACM Transactions on Asian Language Information Processing)

  2. Outline • Motivation • Objective • Previous work • Method • Experiments • Conclusions • Opinion

  3. Motivation • Machine transliteration (MT): automatically converting words in one language into phonetically equivalent ones in another language • e.g., from English to Korean, Japanese, or Chinese: data (English) → deiteo (Korean); deta (Japanese) • Relevant to CLIR: useful for query translation … • Grapheme-based (G-based): source graphemes → target graphemes • Phoneme-based (P-based): source graphemes → source phonemes → target graphemes • Hybrid: linear interpolation of the G-based and P-based models • Need: dynamically handle source graphemes and phonemes • (Figure: data (English) → deiteo (Korean); deta (Japanese), once through the G-based path and once through the P-based path via [`det#])

  4. Objective • Correspondence-based (C-based) model: exploits the correspondence between source graphemes and source phonemes • dynamically handles source graphemes and phonemes based on the context • Example: neomycin (G + P) • (Figure: data (English): graphemes (G) with pronunciation [`det#] (P) → deiteo (Korean); deta (Japanese))

  5. Previous work: Grapheme-Based 1/4 • G-based transliteration models are classified into statistical translation, decision trees, transliteration networks, and joint source-channel models • Statistical translation [1998, 1999]: board (/B AO R D/) is split into the pronunciation units (PUs) b, oa, r, d (segmented according to syllables) • E_i = epu_i1, …, epu_in • K_i = kpu_i1, …, kpu_in • Candidate segmentations: E = b:oar:d, b:oa:r:d, b:o:a:r:d; K = b:o:deu, b:o:reu:deu, b:o:a:reu:deu • Problems: incorrect PU segmentation leads to errors; generating the PUs for each word is time-consuming
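The candidate segmentations of board listed above can be enumerated mechanically once a PU inventory is fixed. Below is a minimal Python sketch, assuming a hypothetical inventory {b, o, a, oa, oar, r, d} rather than the one learned in [1998, 1999]; it only illustrates that every segmentation of a word has to be considered, which is where the per-word time cost comes from.

```python
from functools import lru_cache

def segmentations(word, inventory):
    """Return every way to split `word` into units drawn from `inventory`."""
    @lru_cache(maxsize=None)
    def split_from(i):
        # All segmentations of word[i:], as lists of units.
        if i == len(word):
            return [[]]
        results = []
        for j in range(i + 1, len(word) + 1):
            unit = word[i:j]
            if unit in inventory:
                for rest in split_from(j):
                    results.append([unit] + rest)
        return results
    return [":".join(units) for units in split_from(0)]

# Hypothetical PU inventory, chosen only to reproduce the slide's example.
INVENTORY = {"b", "o", "a", "oa", "oar", "r", "d"}
print(segmentations("board", INVENTORY))  # ['b:o:a:r:d', 'b:oa:r:d', 'b:oar:d']
```

With a realistic inventory the number of segmentations can grow quickly with word length, so each one also needs a score to pick the best.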

  6. Previous work: Grapheme-Based 2/4 • Decision trees [2000; 2001] • English grapheme to Korean grapheme conversion • Does not consider the phonetic aspect of transliteration

  7. Previous work: Grapheme-Based 3/4 • Transliteration network [2000] • Each node consists of one or more English graphemes and the corresponding Korean graphemes • Each arc represents a possible link between nodes • The optimal path is the one with the highest total weight, found with the Viterbi and tree-trellis algorithms • (Figure: example nodes ca → ka and ca → ki)
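The best-path search over a transliteration network can be sketched as a Viterbi pass over a left-to-right lattice. This is a minimal Python sketch, not the cited system: the Node fields, the additive log-weight scoring, the default arc penalty, and the toy lattice for the input cat are all assumptions made for illustration, and the tree-trellis step that produces N-best paths is omitted.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    start: int     # index of the first English grapheme the node covers
    end: int       # index one past the last grapheme it covers
    target: str    # Korean output for this chunk (romanized here)
    weight: float  # node score, e.g. a log-probability (higher is better)

def viterbi(word, nodes, arc_weight):
    """Return (best_score, best_path) through a left-to-right lattice of nodes."""
    best = {}  # node -> (best score of a path ending here, predecessor node)
    # Sorting by start guarantees every predecessor is scored before its successors.
    for node in sorted(nodes, key=lambda n: (n.start, n.end)):
        if node.start == 0:
            best[node] = (node.weight, None)
            continue
        candidates = [
            (score + arc_weight(prev, node) + node.weight, prev)
            for prev, (score, _) in best.items()
            if prev.end == node.start
        ]
        if candidates:
            best[node] = max(candidates, key=lambda c: c[0])
    finals = [n for n in best if n.end == len(word)]
    if not finals:
        return float("-inf"), []
    node = max(finals, key=lambda n: best[n][0])
    score = best[node][0]
    path = []
    while node is not None:  # follow back-pointers to recover the path
        path.append(node)
        node = best[node][1]
    return score, path[::-1]

# Toy lattice for the input "cat"; chunks and weights are invented.
NODES = [
    Node(0, 2, "ka", -0.5),
    Node(0, 2, "ki", -1.5),
    Node(0, 1, "k", -1.0),
    Node(1, 2, "a", -0.7),
    Node(2, 3, "teu", -0.4),
]
ARCS = {("ka", "teu"): -0.1, ("ki", "teu"): -0.2}

def arc_weight(prev, node):
    return ARCS.get((prev.target, node.target), -1.0)  # default penalty for unseen links

score, path = viterbi("cat", NODES, arc_weight)
print(round(score, 2), [n.target for n in path])  # -1.0 ['ka', 'teu']
```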

  8. Previous work: Grapheme-Based 4/4 • Network [2003] • Example: actinium (English) → a ku chi ni u mu (Japanese) • Chunking model: e.g., z4 = um (graphemes e7-e8: u, m) • Translation model = P(ku3 | ku21, e41), with c = 2, b = 1

  9. Previous work: Phoneme-Based 1/3 • Source-language word → pronunciation → target-language word • Cascade of weighted finite-state transducers (WFSTs): • word sequence • word → English sound • English sound → Japanese sound • Japanese sound → katakana • katakana → OCR • A basic framework for P-based transliteration
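The effect of chaining the stages of such a cascade can be sketched with finite probability tables. This is a deliberately simplified Python stand-in, not a WFST library: each stage is a hypothetical table of whole-string conditional probabilities P(output | input), and chaining two stages sums over the intermediate representation, which is what composing weighted transducers computes in the probability semiring. The phoneme strings and probabilities below are invented.

```python
from collections import defaultdict

def compose(first, second):
    """Chain P(b|a) and P(c|b) into P(c|a) = sum over b of P(b|a) * P(c|b)."""
    chained = {}
    for a, dist_b in first.items():
        acc = defaultdict(float)
        for b, p_b in dist_b.items():
            # Intermediate strings with no entry in the second stage contribute nothing.
            for c, p_c in second.get(b, {}).items():
                acc[c] += p_b * p_c
        chained[a] = dict(acc)
    return chained

# Hypothetical toy stages; the probabilities are made up for illustration.
word_to_en_sound = {"data": {"D EY T AH": 0.7, "D AE T AH": 0.3}}
en_sound_to_jp_sound = {
    "D EY T AH": {"d e e t a": 0.9, "d e t a": 0.1},
    "D AE T AH": {"d e e t a": 0.2, "d a t a": 0.8},
}

word_to_jp_sound = compose(word_to_en_sound, en_sound_to_jp_sound)
for jp, p in sorted(word_to_jp_sound["data"].items(), key=lambda kv: -kv[1]):
    print(jp, round(p, 2))
```

Further stages (Japanese sound → katakana, katakana → OCR) would be chained with more calls to `compose`; a real WFST toolkit works symbol by symbol and is not limited to finite lists of strings.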

  10. Previous work: Phoneme-Based 2/3 • Two-step procedure • Step 1: English PUs → English phonemes [statistical translation model] • Step 2: English phonemes → Korean PUs [EKSCRs: English-to-Korean standard conversion rules] • Two problems: • error propagation: errors made in the English PU → English phoneme step are passed on to step 2 • limitations of the EKSCRs

  11. Previous work: Phoneme-Based 3/3 • Decision trees • P-based English-to-Korean transliteration • Depends on a pronunciation dictionary

  12. Previous work: Hybrid Transliteration 1/1 • G-based and P-based models are combined through linear interpolation, e.g., 0.4 × G-based + 0.6 × P-based • Does not consider the dependence between source graphemes and phonemes in the combining process
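The fixed-weight combination on this slide can be sketched as follows. This Python sketch assumes each model already outputs a probability per candidate transliteration; the candidates and probabilities are invented for illustration, and alpha = 0.4 mirrors the 0.4/0.6 split above.

```python
def interpolate(p_g, p_p, alpha=0.4):
    """Rank candidates by alpha * P_G(c) + (1 - alpha) * P_P(c)."""
    candidates = set(p_g) | set(p_p)
    scores = {
        # A candidate missing from one model gets probability 0 from that model.
        c: alpha * p_g.get(c, 0.0) + (1 - alpha) * p_p.get(c, 0.0)
        for c in candidates
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical candidate probabilities for "data" from each model.
g_based = {"deita": 0.5, "deiteo": 0.4}
p_based = {"deiteo": 0.7, "deta": 0.2}
ranking = interpolate(g_based, p_based)
print(ranking[0][0])  # deiteo
```

Because alpha is the same for every input word, the combination cannot lean toward graphemes or phonemes depending on the context, which is the limitation noted on this slide.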

  13. Summary • G-based: source graphemes → target graphemes • P-based: source graphemes → source phonemes → target graphemes • Correspondence-based (C-based): reduces errors caused by error propagation by using the source grapheme that corresponds to each source phoneme • uses source graphemes and source phonemes dynamically, depending on the context, to produce target graphemes effectively • (Figure: data (English): graphemes (G) with pronunciation [`det#] (P) → deiteo (Korean); deta (Japanese))

  14. C-based • (Figure: "find relevant" steps, e.g., for the grapheme d)

  15. C-based

  16. C-based • Producing Pronunciation • The most relevant source phoneme for the grapheme b, /B/, is produced from the context features fS, fStype, and fP at positions L1-L3 (left), C0 (current), and R1-R3 (right).
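A minimal Python sketch of collecting these window features for one grapheme. Assumptions not taken from the slides: fS is the grapheme itself, fStype is a crude vowel/consonant label, fP is filled in only for left positions whose phonemes were already predicted, and "#" pads positions outside the word.

```python
VOWELS = set("aeiou")

def grapheme_type(ch):
    """Hypothetical fStype: 'V' for vowel letters, 'C' for other letters, '#' for padding."""
    if ch == "#":
        return "#"
    return "V" if ch in VOWELS else "C"

def context_features(word, i, prev_phonemes, window=3):
    """Collect fS, fStype (and fP on the left side) at L3..L1, C0, R1..R3 around word[i]."""
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if offset == 0:
            pos = "C0"
        elif offset < 0:
            pos = f"L{-offset}"
        else:
            pos = f"R{offset}"
        ch = word[j] if 0 <= j < len(word) else "#"  # '#' pads beyond the word edges
        feats[f"fS_{pos}"] = ch
        feats[f"fStype_{pos}"] = grapheme_type(ch)
        # Assumption: only phonemes already predicted (left context) are available as fP.
        if offset < 0:
            feats[f"fP_{pos}"] = prev_phonemes[j] if 0 <= j < len(prev_phonemes) else "#"
    return feats

# Features for predicting the phoneme of "b" in "board" (nothing predicted yet).
f = context_features("board", 0, [])
print(f["fS_C0"], f["fS_R1"], f["fS_R3"], f["fS_L1"])  # b o r #
```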

  17. C-based • Producing Target Graphemes

  18. C-based • Maximum Entropy Model 1/2 (method 1/3)

  19. C-based • Maximum Entropy Model 2/2 (method 1/3)
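A maximum entropy model of this kind is a conditional log-linear model, P(y | x) = exp(sum of weights of the active features paired with y) / Z(x). A minimal Python sketch of how such a model turns active context features into a distribution over source phonemes; the feature strings, the outcomes B and P, and the weights are hypothetical, and weight training (e.g., iterative scaling or gradient-based optimization) is not shown.

```python
import math

def maxent_probs(active_features, outcomes, weights):
    """P(y|x) = exp(sum_i w[(f_i, y)]) / Z(x) over the features active in context x."""
    scores = {
        y: sum(weights.get((feat, y), 0.0) for feat in active_features)
        for y in outcomes
    }
    top = max(scores.values())  # subtract the max for numerical stability
    exp_scores = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exp_scores.values())  # normalizer Z(x)
    return {y: v / z for y, v in exp_scores.items()}

# Hypothetical weights; in practice they are estimated from aligned training data.
weights = {
    ("fS_C0=b", "B"): 2.0,
    ("fS_R1=o", "B"): 0.5,
    ("fS_C0=b", "P"): -1.0,
}
probs = maxent_probs(["fS_C0=b", "fS_R1=o"], ["B", "P"], weights)
print(max(probs, key=probs.get))  # B
```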

  20. C-based • Decision Tree (method 2/3)

  21. C-based • Memory-Based Learning (method 3/3) • k-nearest neighbor algorithm
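Memory-based learning with k-nearest neighbors keeps every training instance and labels a new context by its closest stored instances. A minimal Python sketch using the plain overlap distance and majority voting; the instances and feature names are hypothetical, and the feature weighting and value of k used in the paper are not given on this slide (memory-based learners often weight features, e.g., by information gain).

```python
from collections import Counter

def overlap_distance(a, b):
    """Count the feature names whose values differ (the overlap metric)."""
    return sum(1 for key in set(a) | set(b) if a.get(key) != b.get(key))

def knn_classify(query, memory, k=1):
    """Label `query` by majority vote among its k nearest stored instances."""
    nearest = sorted(memory, key=lambda item: overlap_distance(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical stored instances: the phoneme of "c" given its right neighbour.
memory = [
    ({"fS_C0": "c", "fS_R1": "a"}, "K"),
    ({"fS_C0": "c", "fS_R1": "o"}, "K"),
    ({"fS_C0": "c", "fS_R1": "e"}, "S"),
    ({"fS_C0": "c", "fS_R1": "i"}, "S"),
]
print(knn_classify({"fS_C0": "c", "fS_R1": "i"}, memory))  # S
```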

  22. Experiments 1/2 (comparison of P-based, G-based, and C-based models)

  23. Experiments 2/2

  24. Discussion

  25. Conclusion • The authors plan to apply the transliteration model to English-to-Chinese transliteration.

  27. Opinion • Advantage: combines graphemes and phonemes • Drawback: lacks dynamic alignment • Applications: machine translation, CLIR, IR, and other NLP applications
