
Semitic Linguistic Phenomena and Variations

MT Summit IX Workshop: Machine Translation for Semitic Languages. Semitic Linguistic Phenomena and Variations. Nizar Habash, University of Maryland Institute for Advanced Computer Studies.

Presentation Transcript


  1. MT Summit IX Workshop: Machine Translation for Semitic Languages • Semitic Linguistic Phenomena and Variations • Nizar Habash, University of Maryland Institute for Advanced Computer Studies

  2. Road Map • Introduction • Orthography • Morphology • Syntax • Translation Divergences • Conclusion

  3. Introduction • What this talk is about • Similarities that define “the Semitic family” • Variations differentiating members within the family • Similarities do not go beyond morphology and syntax • Relevance to NLP and MT • Most researchers focus on one Semitic language • Modern Standard Arabic (henceforth, A) • Modern Hebrew (henceforth, H) • Arabic Dialect: Palestinian Arabic (henceforth, P)

  4. Road Map • Introduction • Orthography • Phonology • Scripts • Spelling • Ambiguity • Morphology • Syntax • Translation Divergences • Conclusion

  5. Orthography: Phonology

  6. Orthography: Script • Alphabets • Graphemic variants (positional letter forms) • ك ككك (27 out of 36 letters), כ ך (5 out of 22 letters) • Encoding issues • Optional diacritics • Some vowels: שַ שֵ سَ سُ • Lack of vowel: שְ سْ • Consonantal doubling: שּ سّ

  7. Orthography: Spelling • Mostly consonantal spelling • سلام = slam = salām, שלום = ʃlvm = ʃalom • Dual use of ا و ي and א ו י (a, w/v, j) as consonants and vowels • Diacritics as semantic markers • זכָר (zaxar, male) vs. זכַר (zaxar, to remember) • كتب (kataba, to write) vs. كُتب (kutiba, to be written)

  8. Orthography: Spelling • Hebrew • Full spelling vs. “defective” spelling (כתיב מלא, כתיב חסר) • kotel (wall): כותל (full), כתל (defective) • Arabic • Morphophonemic spelling • Feminine marker ة (ta marbuta) • كبير (kabīr, big ♂), كبيرة (kabīra, big ♀) • Derivation marker • hawa: هوى (to love), هوا (air) • Hamza variants (6 characters for one phoneme) • ء أ آ إ ؤ ئ • بهاء بهاؤه بهائه (bahāʔ, bahāʔuhu, bahāʔihi: one hamza written on different seats)
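The spelling variation described above (hamza seats, ta marbuta, final alif maqsura, optional diacritics) is often neutralized before words are matched or counted. The Python sketch below assumes one widespread normalization convention in Arabic NLP. The exact rule set is an assumption, not something the talk prescribes, and every rule gives up a distinction in exchange for looser matching.

```python
# Assumed normalization rules (a common Arabic NLP convention, not from the talk).
_RULES = {
    "أ": "ا",  # alif with hamza above -> bare alif
    "إ": "ا",  # alif with hamza below -> bare alif
    "آ": "ا",  # alif with madda -> bare alif
    "ة": "ه",  # ta marbuta -> ha
    "ى": "ي",  # alif maqsura -> ya
}
# Drop the optional diacritics U+064B..U+0652 (tanwin, short vowels, shadda, sukun).
_RULES.update({chr(cp): None for cp in range(0x064B, 0x0653)})
NORMALIZE = str.maketrans(_RULES)


def normalize_arabic(text: str) -> str:
    """Collapse orthographic variants so differently spelled forms compare equal.

    Hamza on waw/ya seats (ؤ, ئ) and the bare hamza (ء) are left untouched.
    """
    return text.translate(NORMALIZE)


if __name__ == "__main__":
    print(normalize_arabic("كُتب") == normalize_arabic("كتب"))     # True: diacritic removed
    print(normalize_arabic("كتابة") == normalize_arabic("كتابه"))  # True: distinction lost
```

The second comparison shows what normalization costs. كتابة (kitāba, writing) and كتابه (kitābuhu, his book) become indistinguishable, so the step itself adds ambiguity.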

  9. Orthography: Ambiguity (A and H) • [Figure: the 36 Arabic script characters aligned with Modern Standard Arabic phonemes (A), and the 22 Hebrew letters aligned with Modern Hebrew phonemes (H)]

  10. Orthography: Ambiguity (P and H) • [Figure: the same alignment with Palestinian Arabic phonemes (P) in place of Modern Standard Arabic; the P inventory adds ẓ, ē, and ō]
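The many-to-many mapping behind these charts can be checked in both directions. One letter may stand for several sounds, which is a reading problem. One sound may be spelled with several letters, which is a spelling problem. The sketch below uses a small, partial letter-to-phoneme table for Modern Hebrew, written by hand with the slides' phoneme symbols. The table is an illustrative assumption, not the full chart.

```python
from collections import defaultdict

# Partial, hand-written letter-to-phoneme table for unvocalized Modern Hebrew
# (illustrative subset only; symbols follow the slide's phoneme inventory).
LETTER_TO_PHONEMES = {
    "ב": {"b", "v"},
    "ו": {"v", "o", "u"},
    "י": {"j", "i", "e"},
    "כ": {"k", "x"},
    "ק": {"k"},
    "ח": {"x"},
    "פ": {"p", "f"},
    "ש": {"ʃ", "s"},
    "ס": {"s"},
    "ת": {"t"},
    "ט": {"t"},
}


def invert(table: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each phoneme to the set of letters that can spell it."""
    inverse: dict[str, set[str]] = defaultdict(set)
    for letter, phonemes in table.items():
        for phoneme in phonemes:
            inverse[phoneme].add(letter)
    return dict(inverse)


def ambiguous(table: dict[str, set[str]]) -> dict[str, set[str]]:
    """Keep only the entries that have more than one candidate."""
    return {key: values for key, values in table.items() if len(values) > 1}


if __name__ == "__main__":
    # Reading ambiguity: one letter, several sounds.
    for letter, sounds in ambiguous(LETTER_TO_PHONEMES).items():
        print(letter, "->", sorted(sounds))
    # Spelling ambiguity: one sound, several letters.
    for sound, letters in ambiguous(invert(LETTER_TO_PHONEMES)).items():
        print(sound, "<-", sorted(letters))
```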

  11. Road Map • Introduction • Orthography • Morphology • Derivational • Inflectional • Noun Inflections • Verb Inflections • Syntax • Translation Divergences • Conclusion

  12. Morphology: Derivational • Roots and patterns • Meaning = (Root.Meaning + Pattern.Meaning) × Idiosyncrasy.Random • Root K-T-B: ك ت ب, כ ת ב • Pattern (? = root-consonant slot): مَ??و?, ??ו? • Result: مكتوب, כתוב (“written”)
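The form half of this formula (not the idiosyncratic meaning) is mechanical: a pattern is a template whose slots the root consonants fill in order. The sketch below assumes a hypothetical notation in which the digits 1, 2, 3 mark the slots of a triliteral root and every other character is copied from the pattern.

```python
def interdigitate(root: str, pattern: str) -> str:
    """Insert the consonants of a triliteral root into the numbered slots of a pattern.

    Assumed notation: digit n in the pattern is replaced by the n-th root
    consonant; all other characters are copied unchanged. Hebrew final
    letter forms (ך ם ן ף ץ) are not handled.
    """
    return "".join(
        root[int(ch) - 1] if ch in "123" else ch
        for ch in pattern
    )


if __name__ == "__main__":
    print(interdigitate("كتب", "م12و3"))   # مكتوب (maktūb, written)
    print(interdigitate("כתב", "12ו3"))    # כתוב (katuv, written)
    print(interdigitate("ktb", "1aa2i3"))  # kaatib (kātib, writer), ASCII long vowel
```

The sketch predicts nothing about meaning. That unpredictability is the Idiosyncrasy term, which the LHM slides illustrate.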

  13. Morphology: Root Meaning • KTB: writing “stuff” • write: كتب, כתב • book: كتاب • spelling: כתיב • library: مكتبة • letter: مكتوب, מכתב • address: כתובת • office: مكتب • writer: كاتب, כתב

  14. Morphology: Root Meaning • LHM-1 • لحم (laHm, “meat”) • לחם (lexem, “bread”)

  15. Morphology: Root Meaning • LHM-2 (battle sense) • ملحمة • Fierce battle, massacre, epic • מלחמה לוחמה לחם לוחם לחימה • War, battle, quarrel, conflict, combat, warfare, belligerence, fighting, quarreling, fighter, militarism, militancy, bellicosity

  16. Morphology: Root Meaning • LHM-3 (solder sense) • لحم تلاحم التحم ملتحم لُحمة • Weld, solder, get stuck, cling together, merged, fused, kinship • לחם הלחים מולחם מלחם • Solder, soldered, soldering iron

  17. Morphology: Root Meaning • LHM-4 (conjunctiva sense) • لحمية • conjunctiva • לחמית • conjunctiva

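  18. Morphology: Noun Inflections • Slots: conj, prep, article, noun, plural, poss • وكبيوتنا = نا + بيوت + ك + و • And-like-houses-our • And like our houses • שבבית = ש + ב + ה + בית • That-in-the-house • Which is in the house • Arabic broken plurals (بيت, house → بيوت, houses) • Hebrew ambiguous definiteness (after ב, the article ה is not written, so בבית can mean “in a house” or “in the house”)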
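Because these clitics are written attached to the noun, a tokenizer must decide where the noun starts and ends, and a bare string usually allows several splits. The sketch below enumerates candidate splits of a word in Buckwalter transliteration, chosen here to keep right-to-left text out of the code. The clitic lists are a small hypothetical subset, and a real analyzer would filter the candidates against a lexicon.

```python
from itertools import product

# Hypothetical clitic inventories in Buckwalter transliteration:
# w = و, f = ف, k = ك, b = ب, l = ل, Al = ال, nA = نا, hA = ها, h = ه
CONJ = ["", "w", "f"]
PREP = ["", "k", "b", "l"]
ARTICLE = ["", "Al"]
POSSESSIVE = ["", "nA", "hA", "h"]


def segmentations(word: str, min_stem: int = 2) -> list[tuple[str, str, str, str, str]]:
    """Return every (conj, prep, article, stem, possessive) split of `word`."""
    results = []
    for conj, prep, art, poss in product(CONJ, PREP, ARTICLE, POSSESSIVE):
        if art and poss:
            continue  # the article and a possessive suffix do not co-occur
        prefix = conj + prep + art
        if not (word.startswith(prefix) and word.endswith(poss)):
            continue
        stem = word[len(prefix):len(word) - len(poss)]
        if len(stem) >= min_stem:
            results.append((conj, prep, art, stem, poss))
    return results


if __name__ == "__main__":
    # وكبيوتنا: "and like our houses"
    for analysis in segmentations("wkbywtnA"):
        print(analysis)
```

For وكبيوتنا the sketch returns six candidates. Only one of them, w + k + bywt + nA (and + like + houses + our), is the intended analysis. Lexical lookup is what rules out the other five.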

  19. Morphology: Verb Inflections • Slots: conj, verb, tense, neg, subj, IOBJ, object • A: وسنكتبها • And-will-we-write-it • And we will write it • H: ואהבתיה • And-loved-I-her • And I loved her • P: وماحتستعمليلوش • And-not-will-use-you-for-it-not • And you will not use for it

  20. Morphology: Verb Inflections • Perfect Verb Derivation (Suffixes only) • Imperfect Verb Derivation (Prefix+Suffix)
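The perfect/imperfect contrast can be shown with the Arabic verb كتب (to write). The sketch below uses a simplified ASCII transliteration in which doubled vowels are long and the initial glottal stop of the first person singular is omitted. It covers only a few cells of the indicative. The affix tables are written out by hand for illustration and are not exhaustive.

```python
# Hand-written affix tables for كتب (to write): an illustrative subset.
# Perfect stem: katab; imperfect stem: ktub. Doubled vowels are long.
PERFECT_SUFFIX = {
    "1sg": "tu", "2sg.m": "ta", "2sg.f": "ti",
    "3sg.m": "a", "3sg.f": "at", "1pl": "naa", "3pl.m": "uu",
}
IMPERFECT_AFFIXES = {  # (prefix, suffix), indicative mood
    "1sg": ("a", "u"), "2sg.m": ("ta", "u"), "2sg.f": ("ta", "iina"),
    "3sg.m": ("ya", "u"), "3sg.f": ("ta", "u"), "1pl": ("na", "u"),
    "3pl.m": ("ya", "uuna"),
}


def perfect(stem: str, cell: str) -> str:
    """Perfect: person, number, and gender are marked by a suffix only."""
    return stem + PERFECT_SUFFIX[cell]


def imperfect(stem: str, cell: str) -> str:
    """Imperfect: the marking is split between a prefix and a suffix."""
    prefix, suffix = IMPERFECT_AFFIXES[cell]
    return prefix + stem + suffix


if __name__ == "__main__":
    for cell in PERFECT_SUFFIX:
        print(f"{cell:6} {perfect('katab', cell):10} {imperfect('ktub', cell)}")
```

The 2sg.m and 3sg.f imperfect forms both come out as taktubu, which is one more source of ambiguity.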

  21. Morphology: Semantics of Verb Inflections

  22. Road Map • Introduction • Orthography • Morphology • Syntax • Sentence Structure • Noun Phrase Structure • Translation Divergences • Conclusion

  23. Sentence Structure • Sentence structure • Copular sentences • Verbal sentences • Copular sentences • Topic + Complement • Definite + Indefinite • الكلب كبير, הכלב גדול • The-dog big • The dog is big • *كلب كبير • dog big (ungrammatical as a sentence: the topic is not definite)

  24. Sentence Structure • Verbal sentences • The children wrote the poems • A: Verb Subject Object • كتب الاولاد الاشعار • Wrote the-children the-poems • H, P: Subject Verb Object • הילדים כתבו את השירים • The-children wrote obj the-poems • الاولاد كتبو الاشعار • The-children wrote the-poems

  25. Noun Phrase • Noun + Adjective • Noun-adjective agreement in number, gender, and definiteness
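Attributive agreement amounts to a three-way feature match. The sketch below encodes it with a small hypothetical feature record. It deliberately ignores known exceptions, such as Arabic non-human plurals taking feminine singular adjectives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Features:
    number: str     # "sg", "du", or "pl"
    gender: str     # "m" or "f"
    definite: bool


def agrees(noun: Features, adjective: Features) -> bool:
    """True when an attributive adjective matches its noun in number, gender, and definiteness."""
    return (
        noun.number == adjective.number
        and noun.gender == adjective.gender
        and noun.definite == adjective.definite
    )


if __name__ == "__main__":
    dog = Features("sg", "m", definite=True)          # الكلب / הכלב
    big_def = Features("sg", "m", definite=True)      # الكبير / הגדול
    big_indef = Features("sg", "m", definite=False)   # كبير / גדול
    print(agrees(dog, big_def))    # True: "the big dog" is a noun phrase
    print(agrees(dog, big_indef))  # False: not an attributive pair
```

The failing case is the copular sentence الكلب كبير / הכלב גדול from slide 23. The definiteness mismatch is what separates that sentence from a noun phrase.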

  26. Noun Phrase • اضافة / סמיכות (idafa/smixut) • “Noun1 of Noun2” encoded structurally • Noun1 (indefinite) + Noun2 (definite) • ملك الاردن, מלך ירדן • king Jordan = the king of Jordan / Jordan’s king • Noun1 form change • Feminine (H and P) • מלכה (queen) → מלכת ירדן (queen of Jordan) • Plural (A and H) • מלכים (kings) → מלכי ירדן (kings of Jordan) • Alternatives (only H and P) • Noun1 <particle> Noun2 • الملك تبع الاردن • the-king belonging-to Jordan • המלך של ירדן • the-king that-for Jordan
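In unvocalized Hebrew, the Noun1 form changes shown above reduce to two spelling rules. The sketch below is a simplification under stated assumptions. It knows only that a feminine singular final ה becomes ת and that a masculine plural ים becomes י. Nouns that break the pattern, such as masculine nouns ending in ה (for example מורה), are not handled, and neither are vowel changes.

```python
def construct_form(noun: str) -> str:
    """Simplified Hebrew construct (smixut) form of an unvocalized noun."""
    if noun.endswith("ים"):    # masculine plural: מלכים -> מלכי
        return noun[:-2] + "י"
    if noun.endswith("ה"):     # assumed feminine singular: מלכה -> מלכת
        return noun[:-1] + "ת"
    return noun                # e.g. masculine singular: מלך stays מלך


def smixut(noun1: str, noun2: str) -> str:
    """'Noun1 of Noun2': Noun1 in construct form, followed by Noun2."""
    return f"{construct_form(noun1)} {noun2}"


if __name__ == "__main__":
    for head in ("מלך", "מלכה", "מלכים"):
        print(smixut(head, "ירדן"))
```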

  27. Road Map • Introduction • Orthography • Morphology • Syntax • Translation Divergences • Conclusion

  28. Translation Divergences • Variations beyond syntax • How languages map semantics to syntax • Within the family, divergences are as complex and diverse as between any other languages • Divergence dimensions • Categorial variation (develop → development) • Conflation (become frozen → freeze) • Inflation (freeze → become frozen) • Structural (enter the room → enter into the room) • Head swap (swim across → cross swimming) • Thematic (John likes Mary → Mary pleases John)

  29. Translation Divergences: Conflation • I have a dog • عندي كلب • at-me dog (“have” expressed by عند, “at”) • יש לי כלב • there-is for-me dog (“have” expressed by יש ... ל, “there is ... for”)

  30. Translation Divergences: Conflation • I am not here • لست هنا • I-am-not here (“be” and “not” conflated in ليس) • אני לא פה • I not here

  31. Translation Divergences: Thematic • I am cold • انا بردان • I cold • קר לי • cold for-me

  32. Translation Divergences: Structural • I found the man • A: عثرت على الرجل • found-I upon the-man • H: מצאתי את האיש • found-I obj the-man

  33. Translation Divergences: Structural • I found the man • A: عثرت على الرجل • found-I upon the-man • P: لقيت الرجال • found-I the-man

  34. Translation Divergences: Head Swap and Categorial • I swam across the river quickly • اسرعت عبور النهر سباحة • I-sped crossing the-river swimming

  35. Translation Divergences: Head Swap and Categorial • I swam across the river quickly • חציתי את הנהר בשחיה במהירות • I-crossed obj the-river in-swim speedily

  36. Translation Divergences: Head Swap and Categorial • swim: English verb, Arabic سباحة (noun), Hebrew שחיה (noun) • across: English preposition, Arabic عبور (noun), Hebrew חצה (verb) • quickly: English adverb, Arabic اسرع (verb), Hebrew מהירות (noun)
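Stated as data, this alignment makes both divergence types checkable. The sketch below aligns the words across languages with hypothetical concept labels (SWIM, CROSS, FAST). It also assumes that the single verb in each analysis is its head. Both are simplifications for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    form: str
    category: str   # "verb", "noun", "prep", or "adverb"
    concept: str    # hypothetical language-neutral alignment label


ENGLISH = [Word("swim", "verb", "SWIM"), Word("across", "prep", "CROSS"),
           Word("quickly", "adverb", "FAST")]
ARABIC = [Word("سباحة", "noun", "SWIM"), Word("عبور", "noun", "CROSS"),
          Word("اسرع", "verb", "FAST")]
HEBREW = [Word("שחיה", "noun", "SWIM"), Word("חצה", "verb", "CROSS"),
          Word("מהירות", "noun", "FAST")]


def head(words: list[Word]) -> str:
    """Assumption: the (single) verb heads the clause."""
    return next(w.concept for w in words if w.category == "verb")


def divergences(source: list[Word], target: list[Word]) -> dict:
    """Report concepts whose category changes, and whether the head concept differs."""
    target_category = {w.concept: w.category for w in target}
    changed = sorted(
        w.concept for w in source
        if w.concept in target_category and target_category[w.concept] != w.category
    )
    return {"categorial": changed, "head_swap": head(source) != head(target)}


if __name__ == "__main__":
    print(divergences(ENGLISH, ARABIC))  # every concept changes category; head swap
    print(divergences(ARABIC, HEBREW))   # CROSS and FAST change category; head swap
```

The second call makes the point of these slides. Even between two Semitic languages, the same sentence diverges in both head and category.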

  37. Conclusion • Many defining features of the Semitic family • Orthographic conventions, morphological derivation and inflection, phrase structure, etc. • Many variations that create different kinds of ambiguities and problems • Phonology of orthography, Semantics of derivation and inflection • Do similarities extend beyond morphology and syntax? • Translation divergences within Semitic family • Ambiguity preservation between Semitic languages
