A Hybrid Machine Translation System from Turkish to English

Presentation Transcript


  1. A Hybrid Machine Translation System from Turkish to English Ferhan Türe MSc Thesis, Sabancı University Advisor: Kemal Oflazer

  2. Introduction • Goal: Create a machine translation system that translates Turkish text into English text • Turkish has an agglutinative morphology • ev+im+de+ki+ne → to the one at my home • Turkish has free word order • Ben eve gittim, Eve gittim ben, Gittim ben eve, ... → I went to the house • Idea: Write rules to translate the analyzed Turkish sentence into English

  3. Outline • Machine Translation (MT) • Motivation • Challenges in MT • History of MT • Classical Approaches to MT • The Hybrid Approach • Challenges • Translation Steps • Analysis and Preprocessing • Transfer and Generation • Decoding • Evaluation • Methods • Experimental Results • Examples • Conclusions

  4. Machine Translation Translation • Given: Input text s in source language S • Find: A well-formed text in target language T that is equivalent to s Machine Translation (MT) • Any system using an electronic computer to perform translation

  5. Motivation • Satisfy increasing demand for translation • 100 languages with 5 million or more native speakers • Reduce the cost and effort of human translation • 13% of EU budget • weeks vs. minutes • Make information available to more people in less time • automatic translation of web sites • Explore the limits of computers’ abilities and linguistic challenges

  6. Challenges in MT • Morphological issues • Each language has a different morphology • Syntactic issues • Word order in sentences and noun phrases • Language-specific features (narrative past tense in Turkish, distinguishing feminine and masculine nouns) • Semantic issues • Word sense ambiguities • bank → geographical term OR financial institution? • Idiomatic phrases • kafa çekmek → pull head OR drink alcohol?

  7. History of MT • Idea by Warren Weaver in 1945 • 1950s: Russian-English MT research during the Cold War between the US and the USSR • 1960s: Funding for research stopped due to failure • Mid-1970s • MÉTÉO: English-French MT in Canada • Systran and Eurotra: Multilingual MT in Europe • TITRAN and MU Project at Kyoto University, Japan • After the 90s • Statistical MT: Use statistics and large amounts of data

  8. MT between English and Turkish • Morphological analyzer • Oflazer, 1993. • Morphological disambiguator • Oflazer & Kuruöz, 1994. • Hakkani-Tür et al., 2000. • Yuret & Türe, 2006. • English-to-Turkish MT • Sagay, 1981. • Hakkani et al., 1998. • Keyder Turhan, 1997. • No Turkish-to-English system

  9. Classical Approaches to MT

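  10. Vauquois Triangle (figure: triangle with Analysis ascending one side and Generation descending the other, Transfer across the middle, and Interlingua at the apex; levels from bottom to top: Lexical level, Syntactic level, Semantic level)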

  11. Word-by-word Translation • Source sentence → Bilingual Dictionary → Target sentence • Source sentence: Ali evdeki kediyi çok sevmez • Translation: Ali home cat very like • Reference: Ali does not like the cat at home very much
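
A minimal sketch of the word-by-word scheme on this slide, assuming a toy dictionary keyed on surface forms. The dictionary entries and the function name are illustrative assumptions, not the lexicon or code used in the thesis.

```python
# Toy bilingual dictionary keyed on Turkish surface forms.
# The entries are illustrative assumptions chosen to match the slide.
TOY_DICTIONARY = {
    "Ali": "Ali",
    "evdeki": "home",
    "kediyi": "cat",
    "çok": "very",
    "sevmez": "like",
}


def translate_word_by_word(sentence, dictionary):
    """Replace each token with its dictionary gloss; unknown tokens pass through."""
    return " ".join(dictionary.get(token, token) for token in sentence.split())


translation = translate_word_by_word("Ali evdeki kediyi çok sevmez", TOY_DICTIONARY)
print(translation)  # Ali home cat very like
```

The output keeps Turkish word order and loses case, negation, and tense, which is the gap the reference translation on the slide points out.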

  12. Direct Translation • Source sentence → Morphological Analyzer → Lexical Transfer → Local Reordering → Target sentence • Source: Ali evde -ki kediyi çok sevmez • Analysis: Ali ev+Loc Rel+Adj kedi+Acc çok+Adv sev+Neg+Present • Lexical: Ali home+Loc at+Adj cat+Acc very much+Adv like+Neg+Present • Reorder: Ali at+Adj home+Loc cat+Acc like+Neg+Present very much+Adv • Generate: Ali at home cat not like very much
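
The four stages on this slide can be sketched as a small pipeline. This is only an illustration: the analysis is hard-coded from the slide instead of coming from a morphological analyzer, and the lexicon and the two reordering rules are hypothetical, written just to reproduce the slide's intermediate lines.

```python
# Analysis of "Ali evde -ki kediyi çok sevmez" as (root, tags) pairs, copied from the slide.
# A real system would obtain this from a morphological analyzer.
ANALYSIS = [
    ("Ali", []), ("ev", ["Loc"]), ("Rel", ["Adj"]), ("kedi", ["Acc"]),
    ("çok", ["Adv"]), ("sev", ["Neg", "Present"]),
]

# Toy root lexicon (illustrative assumption).
LEXICON = {"Ali": "Ali", "ev": "home", "Rel": "at", "kedi": "cat",
           "çok": "very much", "sev": "like"}


def lexical_transfer(units):
    """Swap each Turkish root for its English gloss, keeping the tags."""
    return [(LEXICON[root], tags) for root, tags in units]


def local_reorder(units):
    """Apply two hypothetical adjacent-swap rules in one left-to-right pass:
    Loc + Adj -> Adj + Loc, and Adv + Present-tense verb -> verb + Adv."""
    units = list(units)
    i = 0
    while i < len(units) - 1:
        left_tags, right_tags = units[i][1], units[i + 1][1]
        if ("Loc" in left_tags and "Adj" in right_tags) or \
           ("Adv" in left_tags and "Present" in right_tags):
            units[i], units[i + 1] = units[i + 1], units[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return units


def generate(units):
    """Emit surface words; the only tag realized here is Neg, as a preceding 'not'."""
    return " ".join(("not " + word) if "Neg" in tags else word for word, tags in units)


generated = generate(local_reorder(lexical_transfer(ANALYSIS)))
print(generated)  # Ali at home cat not like very much
```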

  13. Transfer-based Translation • Source sentence → SL Representation → TL Representation → Target sentence • Resources: SL Grammar, Transfer rules / Dictionary, TL Grammar

  14. Transfer-based Translation • Source sentence → SL Representation → TL Representation → Target sentence • Resources: SL Grammar, Transfer rules / Dictionary, TL Grammar • Example: mavi evin duvarı → the wall of the blue house • (figure: parallel NP parse trees; Turkish leaves: A mavi, N ev+in, N duvar+ı; English leaves: Det the, N wall, Prep of, Det the, A blue, N house)

  15. Interlingual Translation • Source sentence → Analysis → Interlingua → Generation → Target sentence • Source: Ali evdeki kediyi çok sevmez • Interlingua: ¬holds(in_general, like(subj: Ali, obj: cat(at: home), degree: very much)) • Translation: Ali does not like the cat at home very much

  16. Statistical MT Given a Turkish sentence t, find the English sentence e that is the “most likely” translation of t

  17. Statistical MT • Language Model P(e): whether an English text e is well-formed English or not (estimated from English text) • Translation Model P(t|e): whether an English text e is a good translation of a Turkish text t (estimated from Turkish-English aligned text) • Decoding: ê = argmax_e P(e) · P(t|e)
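
As a rough illustration of the decoding step, the sketch below scores a handful of candidate English sentences by log P(e) + log P(t|e) and keeps the best one. The candidate list and every probability are invented for the example; a real decoder searches a far larger space and estimates both models from data.

```python
import math

# Toy language model, log P(e). All numbers are invented for illustration.
LANGUAGE_MODEL = {
    "Ali was so hungry": math.log(0.020),
    "Ali so hungry was": math.log(0.001),
    "Ali opened a lot": math.log(0.010),
}

# Toy translation model, log P(t | e) for the fixed Turkish input t = "Ali çok açtı".
TRANSLATION_MODEL = {
    "Ali was so hungry": math.log(0.30),
    "Ali so hungry was": math.log(0.30),
    "Ali opened a lot": math.log(0.20),
}


def decode(candidates, language_model, translation_model):
    """Return the candidate e that maximizes P(e) * P(t|e), computed in log space."""
    return max(candidates, key=lambda e: language_model[e] + translation_model[e])


best = decode(list(LANGUAGE_MODEL), LANGUAGE_MODEL, TRANSLATION_MODEL)
print(best)
```

Working in log space turns the product into a sum and avoids underflow when many small probabilities are multiplied.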

  18. Statistical MT • Ali çok açtı → Ali was so hungry

  19. Outline • Machine Translation (MT) • Motivation • Challenges in MT • History of MT • Classical Approaches to MT • The Hybrid Approach • Challenges • Translation Steps • Analysis and Preprocessing • Transfer and Generation • Decoding • Evaluation • Methods • Experimental Results • Examples • Conclusions

  20. The Hybrid Approach

  21. Why Hybrid? • Classical transfer-based approaches are good at representing the structural differences between the source and target languages • Statistical methods are good at extracting knowledge from large amounts of data about how well-formed a sentence is or how “meaningful” a translation is

  22. Challenges: Morphological differences • Avrupalılaştıramadıklarımızdanmışsınız → You were among the ones who we were not able to cause to become European • Extreme case of a word in an agglutinative language • Each Turkish morpheme corresponds to one or more words in English

  23. Challenges: Morphological differences • arkadaşımdakiler → the ones at my friend

  24. Challenges: Structural differences • dinle+miş+sin → (someone told me that) you listened • dinle+di+n → you listened • dinle+t+ti+n → you made (someone) listen • dinle+t+tir+di+n → you had (someone) make (someone) listen • dinle+r+im → I listen • dinle+r+di+m → I used to listen • dinle+t+ebil+ir+miş+im → ???

  25. Challenges: Structural differences • Adam evde kitap okuyordu → The man was reading a book at home (SUBJ ADJCT OBJ V → SUBJ V OBJ ADJCT) • mavi kitap → blue book (AP NP → AP NP) • evdeki kitap → the book at home (AP NP → NP AP) • kitabımın kapağı → my book’s cover (NP1 NP2 → NP1 NP2) • arkadaşımın yüzünden → because of my friend (NP1 NP2 → NP2 NP1)

  26. Challenges Ambiguities • koyun • sheep (or bosom) • your bay • your dark (one) • of the bay • put!

  27. Challenges: Ambiguities • silahını evine koy • put your gun to your home • put your gun to his home • put his gun to your home • put his gun to his home • put your gun to her home • put her gun to your home • put her gun to her home • ...
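
The list on this slide grows as a cross product of the possessor readings of the two nouns. A minimal sketch, assuming only the three possessors that appear on the slide and reusing the slide's own English wording:

```python
from itertools import product

# Possessor readings that appear on the slide; further readings are left out.
POSSESSORS = ["your", "his", "her"]


def possessor_readings(possessors):
    """Combine every possessor of 'gun' with every possessor of 'home'."""
    return [f"put {gun} gun to {home} home"
            for gun, home in product(possessors, repeat=2)]


readings = possessor_readings(POSSESSORS)
print(len(readings))  # 9
for reading in readings:
    print(reading)
```

With three readings per noun the sentence already has nine candidate translations, which is why a later disambiguation step is needed.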

  28. Challenges Ambiguities • kitabın kapağı • the book’s cover • book’s cover • the cover of the book

  29. Challenges: Ambiguities • ev+Dative (gitti) → (went) to the house • masa+Dative (çıktı) → (jumped) on the table • adam+Dative (baktı) → (looked) at the man

  30. Challenges • Morphological differences: Use morphological analysis on the Turkish side and generation on the English side • Structural differences: Transfer rules can represent such transformations • Ambiguities: An English language model can determine the most probable translation statistically

  31. The Avenue Transfer System • Avenue Project initiated by CMU LTI Group • Grammar formalism, which allows one to manually create a parallel grammar between two languages and • Transfer engine, which transfers the source sentence into possible target sentence(s) using this parallel grammar

  32. Overview of Our Approach • Turkish sentence → Morphological Analyzer → Analysis → Preprocessor → Lattice → Avenue Transfer Engine (with Transfer rules) → English translations ... → English Language Model → Most probable English translation

  33. I. Analysis and Preprocessing Morphological analyses of each word: A set of features, describing the structural properties of the word adam evde oğlunu yendi

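  34. I. Analysis and Preprocessing • Lattice representation of the sentence (figure: word lattice with nodes 0 to 6; the arcs carry the alternative analyses of each word) • adam: ada+N+P1Sg | adam+N+PNon • evde: ev+N+Loc • oğlunu: oğul+N+P2Sg | oğul+N+P3Sg • yendi: ye+V +Pass+V+Past | yen+N Zero+V+Past | yen+V+Past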
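
A lattice like this one packs all analysis combinations into one graph, and each path from the first to the last node is one reading of the sentence. The sketch below enumerates those paths. The node numbering and arc layout are an assumption reconstructed from the labels on the slide, not a verified copy of the original figure.

```python
# Assumed lattice topology: node -> list of (arc label, next node).
# Two-part analyses of "yendi" pass through intermediate nodes 4 and 5.
LATTICE = {
    0: [("ada+N+P1Sg", 1), ("adam+N+PNon", 1)],
    1: [("ev+N+Loc", 2)],
    2: [("oğul+N+P2Sg", 3), ("oğul+N+P3Sg", 3)],
    3: [("ye+V", 4), ("yen+N", 5), ("yen+V+Past", 6)],
    4: [("+Pass+V+Past", 6)],
    5: [("Zero+V+Past", 6)],
}


def enumerate_paths(lattice, start, goal):
    """Return every sequence of arc labels leading from start to goal."""
    if start == goal:
        return [[]]
    paths = []
    for label, next_node in lattice.get(start, []):
        for rest in enumerate_paths(lattice, next_node, goal):
            paths.append([label] + rest)
    return paths


paths = enumerate_paths(LATTICE, 0, 6)
print(len(paths))  # 12
print(" ".join(paths[0]))
```

Under this assumed layout the four words yield 2 × 1 × 2 × 3 = 12 readings, all of which are handed to the transfer step.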

  35. I. Analysis and Preprocessing Representation of IGs

  36-49. II. Transfer and Generation (figure sequence: parallel Turkish and English parse trees for "adam evde oğlunu yendi", built bottom-up one step per slide) • Word level: N N N V over adam, evde, oğlunu, yendi; English leaves: man, won, son, house • Phrase level: NP nodes are added, with "the" and "his" inserted on the English side • Function level: SUBJ, then Adjct (with "at" inserted on the English side), then OBJ • Verb level: Vc, then Vfin • Sentence level: S • Slide 49 shows only the top of both trees: S, Vfin, and the SUBJ, OBJ, and Adjct constituents

  50. II. Transfer and Generation • (figure: Adjunct over NP on the Turkish side; Adjunct over "at" NP on the English side) • Transfer rule:
      {Adjunct,3}
      Adjunct::Adjunct : [NP] -> ["at" NP]
      (
        (x1::y2)
        (x0 = x1)
        ((x1 CASE) =c Loc)
        ((x1 poss) =c yes)
        (y0 = x0)
      )
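
To make the rule easier to read, here is a loose Python paraphrase of what it appears to do: check two feature constraints on the source NP, then emit "at" followed by the transferred NP. This is not the Avenue engine or its API. The dictionary layout and function name are hypothetical, and reading `=c` as a constraint that must already hold is an assumption.

```python
def apply_adjunct_rule(np):
    """Paraphrase of {Adjunct,3}: [NP] -> ["at" NP].

    `np` is a hypothetical structure: {"words": [...], "feats": {...}},
    where "words" holds the already transferred English words of the NP.
    Returns None when a constraint fails.
    """
    feats = np["feats"]
    # ((x1 CASE) =c Loc) and ((x1 poss) =c yes): both values must already be present.
    if feats.get("CASE") != "Loc" or feats.get("poss") != "yes":
        return None
    # (x1::y2): the source NP becomes the second element on the target side.
    # (x0 = x1) and (y0 = x0): the NP's features are passed up to the Adjunct.
    return {"cat": "Adjunct", "words": ["at"] + np["words"], "feats": dict(feats)}


result = apply_adjunct_rule({"words": ["my", "home"],
                             "feats": {"CASE": "Loc", "poss": "yes"}})
print(" ".join(result["words"]))  # at my home
```

An NP that fails either constraint is left for other rules, which is how several Adjunct rules can coexist in one grammar.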