Explores how identifying cognates affects French-English word alignments, and uses phrases to capture conceptual mappings. Develops metrics for cognate matching and uses them to improve alignments.
Bilingual Alignment Models: Cognates and Phrases
Andrea Burbank, Dinkar Gupta
Spring 2006
French/English word alignments
• Cognates
  • French and English share many words with common roots
  • Can identifying cognate pairs improve alignments?
  • What about a distribution based on word lengths?
• Phrases capture conceptual mappings
  • Overcome language-specific syntax and constructs
  • Examples: “pommes frites” → “French fries”, “à demain” → “see you tomorrow”, “ne veux jamais” → “never wants”
  • Aligned phrases need not be long: 3 or 4 words suffice
  • Concepts are arranged the same way in French and English
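The slides do not say how phrase pairs were obtained. A common technique, assumed here rather than taken from the deck, is to extract every phrase pair that is consistent with a word alignment: no word inside one span may be linked to a word outside the other span. The sketch below caps phrases at 4 words, following the slide, and runs on the deck's own “ne veux jamais” example with a hand-made alignment. The link set and the name `extract_phrases` are illustrative, and unaligned boundary words are not added to phrases (a simplification).

```python
def extract_phrases(f_words, e_words, alignment, max_len=4):
    """Return (french_phrase, english_phrase) pairs consistent with the alignment.

    alignment is a set of (f_index, e_index) links. A pair is consistent when
    no word inside either span is linked to a word outside the other span.
    """
    pairs = set()
    for e_start in range(len(e_words)):
        for e_end in range(e_start, min(e_start + max_len, len(e_words))):
            # French positions linked to any English word in [e_start, e_end].
            linked_f = [i for i, j in alignment if e_start <= j <= e_end]
            if not linked_f:
                continue
            f_start, f_end = min(linked_f), max(linked_f)
            if f_end - f_start + 1 > max_len:
                continue
            # Reject if a French word in the span links outside the English span.
            if any(f_start <= i <= f_end and not e_start <= j <= e_end
                   for i, j in alignment):
                continue
            pairs.add((" ".join(f_words[f_start:f_end + 1]),
                       " ".join(e_words[e_start:e_end + 1])))
    return pairs


# "ne" and "jamais" both link to "never"; "veux" links to "wants".
f_words = ["ne", "veux", "jamais"]
e_words = ["never", "wants"]
links = {(0, 0), (1, 1), (2, 0)}
print(sorted(extract_phrases(f_words, e_words, links)))
# [('ne veux jamais', 'never wants'), ('veux', 'wants')]
```

Because “ne” and “jamais” both link to “never”, the discontinuous negation can only surface as the whole phrase “ne veux jamais” → “never wants”, which is the kind of language-specific construct the slide says phrases overcome.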
Cognate Identification
• Identifying clear cognate matches can help create benchmarks for alignment
• Different cognate matching metrics:
  • Match the first four letters (e.g. suggère, suggests)
  • Count shared bigrams (Dice coefficient)
    • e.g. unité, unity: shared bigrams un, ni, it; $2 \cdot 3 / (4 + 4) = 3/4$
  • Longest common subsequence ratio (LCSR)
    • e.g. couleur, color: LCS c-o-l-r; $4 / \max(l_e, l_f) = 4/7$
  • Count shared letters, normalize by each word's length, and penalize the length difference
    • e.g. chat, cat: 3 shared letters (c, a, t); $\tfrac{1}{2}\left(\tfrac{3}{4} + \tfrac{3}{3}\right) e^{-(4-3)} \approx 0.32$
• Incorporating cognates: add cognate pairs to the training set
• Word-length distributions: EM algorithm
  • $P(l_f \mid l_e)$, the probability of French word length $l_f$ given English word length $l_e$, is estimated iteratively during Model 1 training
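The four matching metrics can be sketched in Python as below, and the worked examples on the slide serve as checks. Assumptions not stated in the slides: accents are folded before comparison (so “é” matches “e”), the prefix test requires both words to have at least four letters, shared letters and bigrams are counted as multisets, and the length penalty in the last metric is taken as e raised to minus the absolute length difference. All function names are hypothetical.

```python
import math
import unicodedata
from collections import Counter


def fold(word: str) -> str:
    """Lowercase and strip diacritics so that 'é' compares equal to 'e' (assumption)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def prefix_match(f: str, e: str, n: int = 4) -> bool:
    """Cognate if both words have at least n letters and share the first n."""
    f, e = fold(f), fold(e)
    return len(f) >= n and len(e) >= n and f[:n] == e[:n]


def dice(f: str, e: str) -> float:
    """Dice coefficient: 2 * shared bigrams / (bigrams in f + bigrams in e)."""
    f, e = fold(f), fold(e)
    bf = [f[i:i + 2] for i in range(len(f) - 1)]
    be = [e[i:i + 2] for i in range(len(e) - 1)]
    if not bf and not be:
        return 0.0
    shared = sum((Counter(bf) & Counter(be)).values())  # multiset overlap
    return 2 * shared / (len(bf) + len(be))


def lcs_length(a: str, b: str) -> int:
    """Longest common subsequence length by dynamic programming, one row at a time."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcsr(f: str, e: str) -> float:
    """LCS length divided by the length of the longer word."""
    f, e = fold(f), fold(e)
    return lcs_length(f, e) / max(len(f), len(e), 1)


def shared_letters(f: str, e: str) -> float:
    """Mean of shared/len(f) and shared/len(e), times exp(-|len(f) - len(e)|)."""
    f, e = fold(f), fold(e)
    if not f or not e:
        return 0.0
    shared = sum((Counter(f) & Counter(e)).values())
    mean_ratio = (shared / len(f) + shared / len(e)) / 2
    return mean_ratio * math.exp(-abs(len(f) - len(e)))


print(prefix_match("suggère", "suggests"))        # True
print(dice("unité", "unity"))                     # 0.75
print(lcsr("couleur", "color"))                   # 4/7, about 0.571
print(round(shared_letters("chat", "cat"), 3))    # 0.322
```

A threshold on any of these scores (not given in the slides) would then decide which pairs count as “clear” cognate matches.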
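The last two bullets can be combined into one EM loop. This is a minimal IBM Model 1 sketch, not the authors' code: cognate pairs are appended to the corpus as one-word sentence pairs, and, when `use_length=True`, each translation probability t(f | e) is multiplied by a word-length factor P(l_f | l_e) that is re-estimated from the same expected counts. The slides do not specify how the length distribution is combined with Model 1, so that coupling, the `<NULL>` token, the toy corpus, and the names `train_model1` and `align` are all assumptions.

```python
from collections import defaultdict

NULL = "<NULL>"  # empty English word that can generate unaligned French words


def word_len(word):
    # NULL gets length 0 so it has its own column in P(lf | le).
    return 0 if word == NULL else len(word)


def train_model1(corpus, iterations=10, use_length=False):
    """EM for IBM Model 1 over (french_tokens, english_tokens) pairs."""
    french_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(french_vocab))  # t[(f, e)] = P(f | e), uniform start
    p_len = defaultdict(lambda: 1.0)                  # P(lf | le), flat start
    for _ in range(iterations):
        count, total = defaultdict(float), defaultdict(float)
        len_count, len_total = defaultdict(float), defaultdict(float)
        for fs, es in corpus:
            es = [NULL] + list(es)
            for f in fs:
                # E-step: posterior over which English word generated f.
                scores = [
                    t[(f, e)] * (p_len[(word_len(f), word_len(e))] if use_length else 1.0)
                    for e in es
                ]
                z = sum(scores)
                for e, s in zip(es, scores):
                    frac = s / z
                    count[(f, e)] += frac
                    total[e] += frac
                    len_count[(word_len(f), word_len(e))] += frac
                    len_total[word_len(e)] += frac
        # M-step: renormalize expected counts into conditional probabilities.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
        p_len = defaultdict(float, {k: c / len_total[k[1]] for k, c in len_count.items()})
    return t, p_len


def align(fs, es, t):
    """Best alignment under Model 1: each French word picks its most likely English word."""
    candidates = [NULL] + list(es)
    return [(f, max(candidates, key=lambda e: t[(f, e)])) for f in fs]


corpus = [
    (["la", "maison"], ["the", "house"]),
    (["la", "fleur"], ["the", "flower"]),
    (["une", "maison"], ["a", "house"]),
]
cognates = [("fleur", "flower")]  # hypothetical output of a cognate matcher
t, p_len = train_model1(corpus + [([f], [e]) for f, e in cognates], use_length=True)
print(align(["la", "maison"], ["the", "house"], t))
```

Keeping the length model as a separate multiplicative factor means it can be switched off, which makes it easy to compare Model 1 with and without it, in the same way the results slide compares Model 1 with and without cognates.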
Phrase mappings
• Good Mappings:
  • “la Bourse the Toronto” → “The Toronto Stock Exchange”
  • “les actes criminels” → “crimes of violence”
  • “excusez-nous si” → “excuse us if we”
  • “serait que” → “that it would”
  • “profiter de le occasion” → “take this opportunity to”
• Extraneous:
  • “la vision que” → “vision that”
• Bad Mappings:
  • “les pays” → “the country to”
  • “le gouffre financière” → “cheered on by”
• Good: “pouvons travailler” → “can work”, “can work together”, “can work together within”, “can all work”, “can all work together”, “we can all work”
• Bad: “les administrations” → “of the GDP over”, “percent of the GDP over”, “4.22 percent of”, “to 4.22 percent of”, “of the GDP”, “4.22 percent of the”
Results: significant improvements!
[Charts: Model 1 trained on the test set; Model 1 with and without cognates; words only vs. with phrases]