
Fall 2005 Lecture Notes #9

EECS 595 / LING 541 / SI 661. Natural Language Processing. Fall 2005 Lecture Notes #9. Machine Translation.



Presentation Transcript


  1. EECS 595 / LING 541 / SI 661 Natural Language Processing Fall 2005 Lecture Notes #9

  2. Machine Translation

  3. Example (from the Hansards corpus) • English • <s id=960001> I would like the government and the Postmaster General to agree that we place the union and the Postmaster General under trusteeship so that we can look at his books and records, including those of his management people and all the memos he has received from them, some of which must have shocked him rigid. • <s id=960002> If the minister would like to propose that, I for one would be prepared to support him. • French • <s id=960001> Je voudrais que le gouvernement et le ministre des Postes conviennent de placer le syndicat et le ministre des Postes sous tutelle afin que nous puissions examiner ses livres et ses dossiers, y compris ceux de ses collaborateurs, et tous les mémoires qu'il a reçus d'eux, dont certains l'ont sidéré. • <s id=960002> Si le ministre voulait proposer cela, je serais pour ma part disposé à l'appuyer.

  4. Example • These lies are like their father that begets them; gross as a mountain, open, palpable (Henry IV, Part 1, act 2, scene 4)

  5. Language similarities and differences • Word order (SVO: English, Mandarin; VSO: Irish, Classical Arabic; SOV: Hindi, Japanese) • Prepositions vs. postpositions (Jap.): to Mariko = Mariko-ni • Lexical distinctions (Sp.): • the bottle floated out • la botella salió flotando • Brother (Jap.) = otooto (younger), oniisan (older) • They (Fr.) = elles (feminine), ils (masculine)

  6. Why is Machine Translation Hard? • Analysis • Transfer/interlingua • Generation [Figure: diagram with labels INPUT, OUTPUT1, OUTPUT2, OUTPUT3]

  7. Basic Strategies of MT • Direct Approach • 1950s, 1960s • naïve • Indirect: Interlingua • No looking back • Language-neutral • No influence on the target language • Indirect: Transfer • Preferred [Figure: diagram with nodes I, E, F]

  8. Levels of Linguistic Processing • Phonology • Orthography • Morphology (inflectional, derivational) • Syntax (e.g., agreement) • Semantics (e.g., concrete vs. abstract terms) • Discourse (e.g., use of pronouns) • Pragmatics (world knowledge)

  9. Category Ambiguity • Morphological ambiguity (“Wachtraum”) • Part-of-speech (category) ambiguity (e.g. “round”) • Some help comes from morphology (“rounding”) • Using syntax, some ambiguities disappear (context dictates category)

  10. Homography and Polysemy • Homographs: (“light”, “club”, “bank”) • Polysemous words: (“channel”, “crane”) • for different categories - syntax • for same category - semantics

  11. Structural Ambiguity • Humans can have multiple interpretations (parses) for the same sentence • Example: prepositional phrase attachment • Use context to disambiguate • For machine translation, context can be hard to define

  12. Use of Linguistic Knowledge • Subcategorization frames • Semantic features (is an object “readable”?)

  13. Contextual Knowledge • In practice, very few sentences are truly ambiguous • Context makes sense for humans (“telescope” example), not for machines • no clear definition of context

  14. Other Strategies • Pick most natural interpretation • Ask the author • Make a guess • Hope for a free ride • Direct transfer

  15. Anaphora Resolution • Use of pronouns (“it”, “him”, “himself”, “her”) • Definite anaphora (“the young man”) • Antecedents • Same problems as for ambiguity resolution • Similar solutions (e.g., subcategorization)

  16. The Noisy Channel Model • Source-channel model of communication • Parametric probabilistic models of language and translation • Training such models

  17. Statistics • Given f, guess e • [Figure: e → encoder (E → F) → f → decoder (F → E) → e'] • Decision rule: $e' = \arg\max_e P(e \mid f) = \arg\max_e P(f \mid e)\, P(e)$ • $P(f \mid e)$ is the translation model; $P(e)$ is the language model
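
The noisy-channel decision rule, e' = argmax over e of P(f|e) P(e), can be sketched as a search over a small candidate list. This is only an illustration: the candidate translations and all probabilities below are made-up toy values, and `decode` is a hypothetical helper name. A real decoder searches a far larger space.

```python
import math

# Toy, made-up probabilities (assumptions for illustration only).
# The French input f is fixed to "une fleur rouge".
LANGUAGE_MODEL = {        # P(e): how fluent is the English string?
    "a red flower": 0.6,
    "a flower red": 0.1,
    "red a flower": 0.3,
}
TRANSLATION_MODEL = {     # P(f | e): how well does e explain f?
    "a red flower": 0.3,
    "a flower red": 0.5,
    "red a flower": 0.2,
}

def decode(candidates, lm, tm):
    """Return the candidate e maximizing P(f|e) * P(e), scored in log space."""
    def score(e):
        return math.log(tm[e]) + math.log(lm[e])
    return max(candidates, key=score)

best = decode(list(LANGUAGE_MODEL), LANGUAGE_MODEL, TRANSLATION_MODEL)
print(best)
```

Note that the translation model alone would prefer "a flower red", which mirrors the French word order. The language model is what pushes the decision toward fluent English.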

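  18. Parametric probabilistic models • Language model (LM): $P(e) = P(e_1, e_2, \ldots, e_L) = P(e_1)\, P(e_2 \mid e_1) \cdots P(e_L \mid e_1 \ldots e_{L-1})$ • Trigram approximation: $P(e_L \mid e_1 \ldots e_{L-1}) \approx P(e_L \mid e_{L-2}, e_{L-1})$ • Deleted interpolation • Translation model (TM) • Alignment: $P(f, a \mid e)$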
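
A minimal sketch of an interpolated trigram language model, mixing trigram, bigram and unigram relative frequencies. Deleted interpolation would estimate the mixing weights on held-out data. Here the weights are fixed by hand, which is an assumption made to keep the example short. The function names and the two-sentence corpus are hypothetical.

```python
from collections import Counter

def train(sentences):
    """Collect n-gram counts and history counts from whitespace-tokenized text."""
    c = {"uni": Counter(), "bi": Counter(), "tri": Counter(),
         "h1": Counter(), "h2": Counter(), "total": 0}
    for s in sentences:
        toks = ["<s>", "<s>"] + s.split() + ["</s>"]
        for i in range(2, len(toks)):
            u, v, w = toks[i - 2], toks[i - 1], toks[i]
            c["uni"][w] += 1
            c["bi"][(v, w)] += 1
            c["tri"][(u, v, w)] += 1
            c["h1"][v] += 1          # how often v served as a one-word history
            c["h2"][(u, v)] += 1     # how often (u, v) served as a two-word history
            c["total"] += 1
    return c

def ratio(num, den):
    return num / den if den else 0.0

def p_interp(w, u, v, c, lambdas=(0.6, 0.3, 0.1)):
    """P(w | u, v) as a weighted mix of trigram, bigram and unigram estimates.

    The lambdas are hand-picked (assumption); they must sum to 1.
    """
    l3, l2, l1 = lambdas
    p3 = ratio(c["tri"][(u, v, w)], c["h2"][(u, v)])
    p2 = ratio(c["bi"][(v, w)], c["h1"][v])
    p1 = ratio(c["uni"][w], c["total"])
    return l3 * p3 + l2 * p2 + l1 * p1

counts = train(["the blue house", "the red house"])
print(p_interp("house", "the", "blue", counts))
```

Because the unigram term is never zero for a seen word, a trigram that did not occur in training still receives some probability mass.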

  19. Statistical MT: English and Cebuano • In the beginning God created the heaven and the earth. • Sa sinugdan gibuhat sa Dios ang mga langit ug ang yuta. • And God called the firmament Heaven. • Ug gihinganlan sa Dios ang hawan nga Langit. • And God called the dry land Earth. • Ug ang mamala nga dapit gihinganlan sa Dios nga Yuta. • Use: co-occurrence, word order, cognates • Corpora are needed • Sentence alignment needs to be done first

  20. Statistical MT Translate from French: “une fleur rouge”?

  21. Issues to deal with • word order: • I like to drink coffee • watashi wa kohii o nomu no ga suki desu • I-subj coffee-obj drink-dat-rheme like • vocabulary: • wall • pared, muro • phrases: • play • pièce de théâtre

  22. MT/noisy channel models • Text-to-text (summarization), also text-to-signal, speech recognition, OCR, spelling correction • P(text|pixels) ∝ P(text) P(pixels|text)

  23. IBM’s EM trained models (1-5) • Word translation • Local alignment • Fertilities • Class-based alignment • Non-deficient algorithm (avoid overlaps, overflow)

  24. Steps • Tokenization • Sentence alignment (1-1, 2-2, 2-1 mappings) • Gale and Church (based on sentence length) • Church (sequences of 4-grams) – based on cognates • Melamed (longest common subsequence of words) – also cognates
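
Length-based sentence alignment can be sketched as a dynamic program over "beads" (1-1, 2-1, 1-2, 2-2, plus insertions and deletions). This is a simplified stand-in, not the published method: the cost here is just the absolute difference in character length plus a fixed per-bead penalty, where the original work uses a probabilistic cost. The penalty values and the function name `align_by_length` are assumptions.

```python
def align_by_length(src, tgt, penalties=None):
    """Align two sentence lists using only character lengths.

    Returns a list of (src_indices, tgt_indices) beads in document order.
    """
    if penalties is None:
        # Hypothetical fixed penalties per bead type (n_src, n_tgt).
        penalties = {(1, 1): 0, (1, 0): 50, (0, 1): 50,
                     (2, 1): 10, (1, 2): 10, (2, 2): 15}
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), pen in penalties.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                len_s = sum(len(s) for s in src[i:ni])
                len_t = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + abs(len_s - len_t) + pen
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Follow back-pointers from the end to recover the beads.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]

# Toy input: the second source sentence corresponds to two target sentences.
source = ["aaaa", "bbbbbbbbbb", "cc"]
target = ["AAAA", "BBBBB", "BBBBB", "CC"]
beads = align_by_length(source, target)
print(beads)
```

The sketch relies on the same intuition as the length-based approach on the slide: long sentences tend to translate into long sentences, so length alone is often enough to find the correspondence.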

  25. Model 1 • Alignments • La maison bleue • The blue house • Alignments: {1,2,3}, {1,3,2}, {1,3,3}, {1,1,1} • All are equally likely • Conditional probabilities • P(f|A,e) = ?

  26. Model 1 (cont’d) • Algorithm • Pick length of translation • Choose an alignment • Pick the French words • That gives you P(f,A|e) • We need P(f|A,e) • Use EM (expectation-maximisation) to find the hidden variables • (see Kevin Knight’s tutorial)

  27. Model 1 • We need p(f|e) but we don’t know the word alignments (which are assumed to be equally likely)
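
A minimal sketch of EM training for Model 1 word-translation probabilities t(f|e). It starts from uniform values and treats every alignment as equally likely a priori. It omits the NULL word and the sentence-length probability, which is a simplification. The three-sentence corpus and the name `train_model1` are assumptions made for illustration.

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """Estimate t(f|e) by EM; pairs is a list of (english_tokens, french_tokens)."""
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))    # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)   # expected number of times f links to e
        total = defaultdict(float)   # expected number of links leaving e
        # E-step: distribute each French word over the English words
        # of its sentence in proportion to the current t(f|e).
        for es, fs in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    p = t[(f, e)] / z
                    count[(f, e)] += p
                    total[e] += p
        # M-step: renormalize expected counts into probabilities.
        t = defaultdict(float,
                        {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

CORPUS = [
    ("the house".split(), "la maison".split()),
    ("the blue house".split(), "la maison bleue".split()),
    ("the flower".split(), "la fleur".split()),
]
t = train_model1(CORPUS)
for f in ["la", "maison", "bleue"]:
    print(f, "| house:", round(t[(f, "house")], 3))
```

Although "house" co-occurs equally often with "la" and "maison", EM gradually attributes "la" to "the" (which appears in every sentence), leaving "maison" as the preferred translation of "house".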

  28. Model 2 • Distortion parameters D(i|j,l,m) • i and j are word positions in the two sentences • l and m are the lengths of these sentences.

  29. Model 3 • Fertility • P(i|e) • Examples • (a) play = pièce de théâtre • (to) place = mettre en place • p1 is an extra parameter that defines φ0 (the fertility of the NULL word)

  30. Current work • Handling phrases • Using syntax • In the model • In discriminative reranking • Low density languages

  31. Evaluation • Human judgements: adequacy, grammaticality • Automatic methods • BLEU • ROUGE
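
To make the automatic-evaluation idea concrete, here is a simplified sketch of a BLEU-style score: clipped n-gram precision combined by a geometric mean, times a brevity penalty. It assumes a single reference, sentence-level scoring and no smoothing, so it is not a replacement for a standard implementation. The function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, reference, max_n=4):
    """Single-reference, sentence-level BLEU-style score without smoothing."""
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precision += math.log(overlap / total) / max_n
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_precision)

ref = "the blue house is small".split()
print(bleu(ref, ref))
print(bleu("the the the".split(), "the blue house".split(), max_n=1))
```

Clipping is what stops a candidate such as "the the the" from earning full unigram precision. The brevity penalty, in turn, stops very short candidates from scoring well on precision alone.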

  32. When does MT work? • Machine-Aided Translation (MAT) • Restricted Domains (e.g., technical manuals) • Restricted Languages (sublanguages) • To give the reader an idea of what the text is about

  33. Dialogue and conversational agents REMEMBER TO READ THE NEW VERSION OF THIS CHAPTER ON THE WEB!

  34. Abbott • You know, strange as it may seem, they give ball players nowadays very peculiar names...Now, on the Cooperstown team we have Who's on first, What's on second, I Don't Know is on third- • Costello • That's what I want to find out. I want you to tell me the names of the fellows on the Cooperstown team. • Abbott • I'm telling you. Who's on first, What's on second, I Don't Know is on third. • Costello • You know the fellows' names? • Abbott • Yes. • Costello • Well, then, who's playin' first? • Abbott • Yes. • Costello • I mean the fellow's name on first base. • Abbott • Who. • Costello • The fellow's name on first base for Cooperstown. • Abbott • Who. • Costello • The guy on first base. • Abbott • Who is on first base. • Costello • Well, what are you asking me for? • Abbott • I'm not asking you--I'm telling you. Who is on first. • Costello • I'm asking you--who's on first? • Abbott • That's the man's name.

  35. Costello • That's who's name? • Abbott • Yes. • Costello • Well, go ahead, tell me! • Abbott • Who. • Costello • The guy on first. • Abbott • Who. • Costello • The first baseman. • Abbott • Who is on first. • Costello • Have you got a first baseman on first? • Abbott • Certainly. • Costello • Well, all I'm trying to find out is what's the guy's name on first base. • Abbott • Oh, no, no. What is on second base. • Costello • I'm not asking you who's on second.

  36. What makes dialogue different • Turns and utterances (turn-taking) • Turn-taking rules • At each TRP (transition-relevance place): • designated speaker, any speaker, current speaker • Barge-in possible • Significant silence • A: Is there something bothering you or not? (1.0 s) • A: Yes or no? (1.5 s) • A: Eh? • B: No.

  37. Grounding • Common ground between speaker and hearer. • A: … returning on flight 1118 • C: mm hmmm (backchannel, acknowledgment token) • Other continuers: • Continued attention • Relevant next contribution • Acknowledgement (e.g. “sure”) • Demonstration (paraphrasing, reformulating) • Display (repeat verbatim) • Example: • C: I will take the 5 pm flight on the 11th. • A: On the 11th?

  38. Conversational Implicature • Example: • When do you want to travel? • I have a meeting there early in the morning on the 13th. • Implicature: licensed inferences reasonable hearers can make. • Quantity: • Agent: “there are three non-stop flights daily”

  39. Grice’s maxims • Maxim of quantity • make your contribution informative • but not more than needed • Maxim of quality • do not say what you believe is false • do not say that for which you lack evidence • Maxim of relevance • Maxim of manner • avoid ambiguity • avoid obscurity • be brief • be orderly

  40. Dialogue acts • Performative sentences: • I name this ship the Titanic • I second that motion • I bet you five dollars that it will snow tomorrow • Speech acts: • locutionary acts: uttering a sentence with a particular meaning • illocutionary acts: asking, promising, answering… • perlocutionary acts: producing effects upon the feelings, thoughts, or actions of the addressee

  41. Speech acts (cont’d) • Assertives: suggesting, putting forward, swearing, boasting, concluding • Directives: asking, ordering, requesting, inviting, advising, begging • Commissives: promising, planning, vowing, betting, opposing • Expressives: thanking, apologizing, welcoming, deploring • Declarations: I resign, you’re fired.

  42. Automatic interpretation of dialogue acts • DAMSL - Dialogue Act Markup in Several Layers • Agreement (Accept, Maybe, Reject-Part, Hold) • Answer • Understanding (Signal-not-understood, Signal-understood, ack, repeat-rephrase, completion)

  43. Techniques for DA recognition • Plan theoretic (agents, assumptions, goals) • Cue-based (“please”, “are you?”, rising pitch, stress - agreement vs. backchannel) • Statistical approaches
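
The cue-based approach can be illustrated with a toy rule-based tagger that maps lexical cues such as "please" or "are you" to dialogue-act labels. The patterns, the label names and the function `classify` are hypothetical choices for this sketch. Prosodic cues such as rising pitch and stress are not modeled, and a practical system would learn its cues from annotated data.

```python
import re

# Hypothetical cue patterns and labels, checked in order.
CUES = [
    (re.compile(r"\bplease\b", re.I), "REQUEST"),
    (re.compile(r"^(are|do|does|is|can|could|will|would) you\b", re.I),
     "YES-NO-QUESTION"),
    (re.compile(r"^(mm+ ?hm+|uh-huh|yeah|sure|okay)\W*$", re.I), "BACKCHANNEL"),
]

def classify(utterance, default="STATEMENT"):
    """Return the label of the first matching cue, or the default label."""
    text = utterance.strip()
    for pattern, label in CUES:
        if pattern.search(text):
            return label
    return default

for u in ["Are you flying on the 11th?", "Please book the 5 pm flight.",
          "mm hmmm", "I will take the 5 pm flight."]:
    print(u, "->", classify(u))
```

Because the rules are tried in order, an utterance containing several cues receives the label of the first one that matches. A statistical model would instead weigh the cues against each other.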
