
Dragomir Radev Wrocław, Poland July 29, 2009



Presentation Transcript


  1. Computational Linguistics Dragomir Radev Wrocław, Poland July 29, 2009

  2. Example (from a famous movie) Dave Bowman: Open the pod bay doors, HAL. HAL: I’m sorry Dave. I’m afraid I can’t do that.

  3. Instructor • Dragomir Radev, Professor, Computer Science and Information, Linguistics, University of Michigan • radev@umich.edu

  4. Natural Language Understanding • … about teaching computers to make sense of naturally occurring text. • … involves programming, linguistics, artificial intelligence, etc. • … includes machine translation, question answering, dialogue systems, database access, information extraction, game playing, etc.

  5. Example: I saw her fall • How many different interpretations does this sentence have? How many of them are reasonable/grammatical?

  6. Silly sentences • Children make delicious snacks • Stolen painting found by tree • I saw the Grand Canyon flying to New York • Court to try shooting defendant • Ban on nude dancing on Governor’s desk • Red tape holds up new bridges • Iraqi head seeks arms • Blair wins on budget, more lies ahead • Local high school dropouts cut in half • Hospitals are sued by seven foot doctors • In America a woman has a baby every 15 minutes. How does she do that?

  7. Types of ambiguity • Morphological: Joe is quite impossible. Joe is quite important. • Phonetic: Joe’s finger got number. • Part of speech: Joe won the first round. • Syntactic: Call Joe a taxi. • PP attachment: Joe ate pizza with a fork. Joe ate pizza with meatballs. Joe ate pizza with Mike. Joe ate pizza with pleasure. • Sense: Joe took the bar exam. • Modality: Joe may win the lottery. • Subjectivity: Joe believes that stocks will rise. • Scoping: Joe likes ripe apples and pears. • Negation: Joe likes his pizza with no cheese and tomatoes. • Referential: Joe yelled at Mike. He had broken the bike. Joe yelled at Mike. He was angry at him. • Reflexive: John bought him a present. John bought himself a present. • Ellipsis and parallelism: Joe gave Mike a beer and Jeremy a glass of wine. • Metonymy: Boston called and left a message for Joe.

  8. NLP • Information extraction • Named entity recognition • Trend analysis • Subjectivity analysis • Text classification • Anaphora resolution, alias resolution • Cross-document crossreference • Parsing • Semantic analysis • Word sense disambiguation • Word clustering • Question answering • Summarization • Document retrieval (filtering, routing) • Structured text (relational tables) • Paraphrasing and paraphrasing/entailment ID • Text generation • Machine translation

  9. Syntactic categories • Substitution test: Nathalie likes { black | Persian | tabby | small | easy to raise } cats. • Open (lexical) and closed (functional) categories: closed, e.g. the, in; open, e.g. No-fly-zone, yadda yadda yadda

  10. Jabberwocky (Lewis Carroll) 'Twas brillig, and the slithy toves / Did gyre and gimble in the wabe: / All mimsy were the borogoves, / And the mome raths outgrabe. / "Beware the Jabberwock, my son! / The jaws that bite, the claws that catch! / Beware the Jubjub bird, and shun / The frumious Bandersnatch!"

  11. Phrase structure [S [NP That man] [VP [VBD caught] [NP the butterfly] [PP [IN with] [NP a net]]]]

  12. Sample phrase-structure grammar • S → NP VP • NP → AT NNS • NP → AT NN • NP → NP PP • VP → VP PP • VP → VBD • VP → VBD NP • PP → IN NP • AT → the • NNS → children • NNS → students • NNS → mountains • VBD → slept • VBD → ate • VBD → saw • IN → in • IN → of • NN → cake
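A grammar like the one on this slide can be used to generate sentences as well as to parse them. The sketch below encodes the rules as a Python dictionary and expands S at random. It reads the preposition rule as PP → IN NP, which is an assumption about the slide's intent, and the depth cap (`max_depth`) is a hypothetical device added only to stop the recursive rules NP → NP PP and VP → VP PP from expanding forever.

```python
import random

# Rules from the slide. PP -> IN NP is assumed to be the intended preposition rule.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["AT", "NNS"], ["AT", "NN"], ["NP", "PP"]],
    "VP": [["VP", "PP"], ["VBD"], ["VBD", "NP"]],
    "PP": [["IN", "NP"]],
    "AT": [["the"]],
    "NNS": [["children"], ["students"], ["mountains"]],
    "VBD": [["slept"], ["ate"], ["saw"]],
    "IN": [["in"], ["of"]],
    "NN": [["cake"]],
}


def generate(symbol, rng, depth=0, max_depth=4):
    """Expand `symbol` into a list of words by choosing rules at random."""
    if symbol not in GRAMMAR:
        return [symbol]  # a terminal: an actual word
    options = GRAMMAR[symbol]
    if depth >= max_depth:
        # Hypothetical safeguard: past the cap, drop directly recursive rules.
        options = [rhs for rhs in options if symbol not in rhs]
    rhs = rng.choice(options)
    words = []
    for child in rhs:
        words.extend(generate(child, rng, depth + 1, max_depth))
    return words


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(" ".join(generate("S", rng)))
```

Every generated string is grammatical with respect to these rules, though many are semantically odd, which is the usual behaviour of a purely syntactic grammar.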

  13. Phrase structure grammars • Local dependencies • Non-local dependencies • Subject-verb agreement The women who found the wallet were given a reward. • wh-extraction Should Peter buy a book? Which book should Peter buy? • Empty nodes

  14. Subcategorization • Subject: The children eat candy. • Object: The children eat candy. • Prepositional phrase: She put the book on the table. • Predicative adjective: We made the man angry. • Bare infinitive: She helped me walk. • To-infinitive: She likes to walk. • Participial phrase: She stopped singing that tune at the end. • That-clause: She thinks that it will rain tomorrow. • Question-form clauses: She asked me what book I was reading.

  15. Phrase structure ambiguity • Grammars are used for generating and parsing sentences • Parses • Syntactic ambiguity • Attachment ambiguity: Our company is training workers. • The children ate the cake with a spoon. • High vs. low attachment • Garden path sentences: The horse raced past the barn fell. Is the book on the table red?

  16. Sentence-level constructions • Declarative vs. imperative sentences • Imperative sentences: S → VP • Yes-no questions: S → Aux NP VP • Wh-type questions: S → Wh-NP VP • Fronting (less frequent): On Tuesday, I would like to fly to San Diego

  17. Semantics and pragmatics • Lexical semantics and compositional semantics • Hypernyms, hyponyms, antonyms, meronyms and holonyms (part-whole relationship, tire is a meronym of car), synonyms, homonyms • Senses of words, polysemous words • Homophony (bass). • Collocations: white hair, white wine • Idioms: to kick the bucket

  18. Discourse analysis • Anaphoric relations: 1. Mary helped Peter get out of the car. He thanked her. 2. Mary helped the other passenger out of the car. The man had asked her for help because of his foot injury. • Information extraction problems (entity crossreferencing): Hurricane Hugo destroyed 20,000 Florida homes. At an estimated cost of one billion dollars, the disaster has been the most costly in the state’s history.

  19. Pragmatics • The study of how knowledge about the world and language conventions interact with literal meaning. • Speech acts • Research issues: resolution of anaphoric relations, modeling of speech acts in dialogues

  20. Coordination • Coordinate noun phrases: • NP → NP and NP • S → S and S • Similar for VP, etc.

  21. Agreement • Examples: • Do any flights stop in Chicago? • Do I get dinner on this flight? • Does Delta fly from Atlanta to Boston? • What flights leave in the morning? • * What flight leave in the morning? • Rules: • S → Aux NP VP • S → 3sgAux 3sgNP VP • S → Non3sgAux Non3sgNP VP • 3sgAux → does | has | can … • Non3sgAux → do | have | can …

  22. Agreement • We now need similar rules for pronouns, also for number agreement, etc. • 3SgNP → (Det) (Card) (Ord) (Quant) (AP) SgNominal • Non3SgNP → (Det) (Card) (Ord) (Quant) (AP) PlNominal • SgNominal → SgNoun | SgNoun SgNoun • etc.

  23. Combinatorial explosion • What other phenomena will cause the grammar to expand? • Solution: parameterization with feature structures (see Chapter 11)

  24. Parsing as search

  25. Parsing as search • Example: Book that flight. • Two types of constraints on the parses: a) some that come from the input string, b) others that come from the grammar • Parse tree: [S [VP [Verb Book] [NP [Det that] [Nom [Noun flight]]]]]

  26. Top-down parsing [Figure: search space grown downward from S. First level: S → NP VP, S → Aux NP VP, S → VP. Next level: NP expanded as Det Nom or PropN, VP expanded as V or V NP]
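Top-down search of this kind can be sketched as a backtracking recognizer that starts from S and tries each rule in turn against the input. The grammar and lexicon below are assumptions pieced together from the category names on the neighbouring slides (S, NP, VP, Nom, Det, Noun, Verb, Aux, PropN); the function names are hypothetical. The grammar has no left-recursive rules, so the search terminates.

```python
# Assumed toy grammar, loosely following the categories on the slides.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nom"], ["PropN"]],
    "Nom": [["Noun", "Nom"], ["Noun"]],
    "VP": [["Verb", "NP"], ["Verb"]],
}
# Assumed lexicon: each word maps to its possible parts of speech.
LEXICON = {
    "book": {"Noun", "Verb"},
    "that": {"Det"},
    "flight": {"Noun"},
    "does": {"Aux"},
}


def expand(symbol, words, pos):
    """Yield every input position reachable after deriving `symbol` from `pos`."""
    if symbol in GRAMMAR:
        for rhs in GRAMMAR[symbol]:  # try each rule for this nonterminal
            yield from expand_sequence(rhs, words, pos)
    elif pos < len(words) and symbol in LEXICON.get(words[pos], set()):
        yield pos + 1  # a part-of-speech tag consumes exactly one word


def expand_sequence(symbols, words, pos):
    """Derive a sequence of symbols left to right, backtracking on failure."""
    if not symbols:
        yield pos
        return
    for mid in expand(symbols[0], words, pos):
        yield from expand_sequence(symbols[1:], words, mid)


def recognize(sentence):
    """Return True if some derivation of S covers the whole sentence."""
    words = sentence.lower().rstrip(".").split()
    return any(end == len(words) for end in expand("S", words, 0))


if __name__ == "__main__":
    print(recognize("Book that flight."))
```

The recognizer explores S → NP VP and S → Aux NP VP before S → VP, so it spends effort on hypotheses that the first word already rules out; that wasted work is the characteristic cost of pure top-down search.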

  27. Bottom-up parsing [Figure: search space for "Book that flight" grown upward from the words: first the part-of-speech tags (Noun or Verb for Book, Det, Noun), then NOM, NP, and VP nodes built over them]
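Bottom-up search can be sketched in the opposite direction: start from the part-of-speech tags of the words and repeatedly replace a rule's right-hand side with its left-hand side until only S remains. This is a naive exhaustive reduction search, not an efficient chart parser, and the rules and lexicon are assumptions based on the categories shown on the slides.

```python
from itertools import product

# Assumed toy grammar as (left-hand side, right-hand side) pairs.
RULES = [
    ("S", ("NP", "VP")),
    ("S", ("Aux", "NP", "VP")),
    ("S", ("VP",)),
    ("NP", ("Det", "Nom")),
    ("Nom", ("Noun",)),
    ("Nom", ("Noun", "Nom")),
    ("VP", ("Verb",)),
    ("VP", ("Verb", "NP")),
]
# Assumed lexicon: "book" is ambiguous between noun and verb.
LEXICON = {"book": ["Noun", "Verb"], "that": ["Det"], "flight": ["Noun"]}


def reductions(seq):
    """Yield each sequence obtained by reducing one right-hand side to its left-hand side."""
    for lhs, rhs in RULES:
        n = len(rhs)
        for i in range(len(seq) - n + 1):
            if seq[i:i + n] == rhs:
                yield seq[:i] + (lhs,) + seq[i + n:]


def recognize(sentence):
    """Return True if the tag sequence of the sentence can be reduced to S."""
    words = sentence.lower().rstrip(".").split()
    # One starting state per combination of part-of-speech tags.
    agenda = [tuple(tags) for tags in product(*(LEXICON.get(w, []) for w in words))]
    seen = set(agenda)
    while agenda:
        seq = agenda.pop()
        if seq == ("S",):
            return True
        for nxt in reductions(seq):
            if nxt not in seen:  # avoid re-exploring the same state
                seen.add(nxt)
                agenda.append(nxt)
    return False


if __name__ == "__main__":
    print(recognize("Book that flight."))
```

Here the search also builds structures that can never lead to S, such as treating Book as a Noun; bottom-up search is always consistent with the input but not necessarily with the goal.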

  28. Grammatical Relations and Free Ordering of Subject and Object • OSV: Кого же Вася увидел? Машу Вася увидел. ((Actually,) who did Vasya see? Vasya saw Masha.) • SVO: Кого увидел Вася? Вася увидел Машу. (Who did Vasya see? Vasya saw Masha.) • VSO: Увидел Вася кого? Увидел Вася Машу. (Who did Vasya see? Vasya saw Masha.) • VOS: Увидел Машу кто? Увидел Машу Вася. (Who saw Masha, at the end? It was Vasya who saw Masha.) • OVS: Да кого увидел Вася? Машу увидел Вася. (Well, whom did Vasya see? It was Masha whom Vasya saw.) • SOV: Кого же Вася увидел? Вася Машу увидел. (Who did Vasya see? Vasya saw Masha.) • Slide from Lori Levin, originally by Leonid Iomdin

  29. Features and unification • Grammatical categories have properties • Constraint-based formalisms • Example: this flights: agreement is difficult to handle at the level of grammatical categories • Example: many water: count/mass nouns • Sample rule that takes into account features: S → NP VP (but only if the number of the NP is equal to the number of the VP)

  30. Feature structures • [CAT NP, NUMBER SINGULAR, PERSON 3] • [CAT NP, AGREEMENT [NUMBER SG, PERSON 3]] • Feature paths: {x agreement number}

  31. Unification • [NUMBER SG] ⊔ [NUMBER SG]: succeeds (+) • [NUMBER SG] ⊔ [NUMBER PL]: fails (−) • [NUMBER SG] ⊔ [NUMBER []] = [NUMBER SG] • [NUMBER SG] ⊔ [PERSON 3] = ?
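The cases on this slide can be tried out with a small unifier over nested dictionaries. This is a minimal sketch under stated assumptions: feature structures are plain Python dicts, the unspecified value [] is represented by an empty dict, failure is signalled by `None`, and shared (reentrant) substructures are not modelled. The name `unify` is a hypothetical helper, not taken from any library.

```python
def unify(a, b):
    """Unify two feature structures; return the merged result, or None on a clash."""
    # An empty structure is unspecified and unifies with anything.
    if a == {}:
        return b
    if b == {}:
        return a
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, b_val in b.items():
            if key in result:
                merged = unify(result[key], b_val)
                if merged is None:
                    return None  # incompatible values under the same feature
                result[key] = merged
            else:
                result[key] = b_val  # feature present only in b
        return result
    # Atomic values unify only if they are identical.
    return a if a == b else None


if __name__ == "__main__":
    print(unify({"NUMBER": "SG"}, {"NUMBER": "SG"}))
    print(unify({"NUMBER": "SG"}, {"NUMBER": "PL"}))
    print(unify({"NUMBER": "SG"}, {"NUMBER": {}}))
    print(unify({"NUMBER": "SG"}, {"PERSON": 3}))
```

Because the two structures in the last call mention different features, nothing can clash and the result simply carries both features.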

  32. Agreement • S → NP VP, with {NP AGREEMENT} = {VP AGREEMENT} • Does this flight serve breakfast? • Do these flights serve breakfast? • S → Aux NP VP, with {Aux AGREEMENT} = {NP AGREEMENT}

  33. Agreement • These flights • This flight • NP → Det Nominal, with {Det AGREEMENT} = {Nominal AGREEMENT} • Verb → serve, with {Verb AGREEMENT NUMBER} = PL • Verb → serves, with {Verb AGREEMENT NUMBER} = SG

  34. Subcategorization • VP → Verb, with {VP HEAD} = {Verb HEAD} and {VP HEAD SUBCAT} = INTRANS • VP → Verb NP, with {VP HEAD} = {Verb HEAD} and {VP HEAD SUBCAT} = TRANS • VP → Verb NP NP, with {VP HEAD} = {Verb HEAD} and {VP HEAD SUBCAT} = DITRANS

  35. Eliza [Weizenbaum, 1966] User: Men are all alike ELIZA: IN WHAT WAY User: They’re always bugging us about something or other ELIZA: CAN YOU THINK OF A SPECIFIC EXAMPLE? User: Well, my boyfriend made me come here ELIZA: YOUR BOYFRIEND MADE YOU COME HERE User: He says I’m depressed much of the time ELIZA: I AM SORRY TO HEAR THAT YOU ARE DEPRESSED

  36. Eliza-style regular expressions • s/.* YOU ARE (depressed|sad) .*/I AM SORRY TO HEAR YOU ARE \1/ • s/.* YOU ARE (depressed|sad) .*/WHY DO YOU THINK YOU ARE \1/ • s/.* all .*/IN WHAT WAY/ • s/.* always .*/CAN YOU THINK OF A SPECIFIC EXAMPLE/ • Step 1: replace first person references with second person references • Step 2: use additional regular expressions to generate replies • Step 3: use scores to rank possible transformations
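Steps 1 and 2 can be sketched with Python's `re` module. The pronoun table, the word-boundary versions of the patterns, and the fallback reply `PLEASE GO ON` are assumptions made for this illustration. Step 3 (scoring) is not implemented; the first matching rule wins instead.

```python
import re

# Step 1 (assumed, minimal table): first person -> second person.
PRONOUN_SWAPS = [
    (r"\bI['’]m\b", "YOU ARE"),
    (r"\bI am\b", "YOU ARE"),
    (r"\bmy\b", "YOUR"),
    (r"\bme\b", "YOU"),
]

# Step 2: response rules adapted from the slide, tried in order.
RESPONSES = [
    (r".*\bYOU ARE (depressed|sad)\b.*", r"I AM SORRY TO HEAR YOU ARE \1"),
    (r".*\ball\b.*", "IN WHAT WAY"),
    (r".*\balways\b.*", "CAN YOU THINK OF A SPECIFIC EXAMPLE"),
]


def reply(utterance):
    """Return an ELIZA-style reply for one user utterance."""
    text = utterance
    for pattern, replacement in PRONOUN_SWAPS:
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    for pattern, template in RESPONSES:
        match = re.fullmatch(pattern, text, flags=re.IGNORECASE)
        if match:
            # expand() fills in \1 from the captured group, if the template has one.
            return match.expand(template).upper()
    return "PLEASE GO ON"  # hypothetical fallback when no rule matches


if __name__ == "__main__":
    for line in ["Men are all alike", "He says I'm depressed much of the time"]:
        print(reply(line))
```

Replacing the first-match policy with a scored list of candidate transformations would correspond to Step 3 on the slide.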

  37. Finite-state automata • Finite-state automata (FSA) • Regular languages • Regular expressions

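  38. Finite-state automata (machines) • Strings: baa! baaa! baaaa! baaaaa! ... • Regular expression: baa+! • [Figure: automaton with states q0, q1, q2, q3, q4, where q4 is the final state; transitions: q0 to q1 on b, q1 to q2 on a, q2 to q3 on a, q3 to q3 on a, q3 to q4 on !]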

  39. Input tape [Figure: tape containing the symbols a b a ! b, with the automaton in state q0]

  40. Finite-state automata • Q: a finite set of N states q0, q1, … qN−1 • Σ: a finite input alphabet of symbols • q0: the start state • F: the set of final states • δ(q,i): transition function
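The five components of this definition map directly onto a few lines of code. The sketch below is a deterministic recognizer for the language baa+! from the earlier slide; the state numbering (0 to 4) and the dictionary encoding of the transition function are assumptions made for the example.

```python
# Transition function: (state, input symbol) -> next state.
# Missing entries mean there is no legal move, so the input is rejected.
TRANSITIONS = {
    (0, "b"): 1,
    (1, "a"): 2,
    (2, "a"): 3,
    (3, "a"): 3,  # loop: any number of additional a's
    (3, "!"): 4,
}
START = 0    # the start state
FINAL = {4}  # the set of final states


def accepts(tape):
    """Return True if the automaton ends in a final state after reading `tape`."""
    state = START
    for symbol in tape:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            return False  # no transition defined for this symbol
    return state in FINAL


if __name__ == "__main__":
    for tape in ["baa!", "baaaa!", "ba!", "aba!b"]:
        print(tape, accepts(tape))
```

The same dictionary is also a compact form of a state-transition table: each key is a row and column, and each value is the cell.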

  41. State-transition tables

  42. Morphemes • Stems, affixes • Affixes: prefixes, suffixes, infixes (hingi (borrow) – humingi (agent) in Tagalog), circumfixes (sagen – gesagt in German) • Concatenative morphology • Templatic morphology (Semitic languages): lmd (learn), lamad (he studied), limed (he taught), lumad (he was taught)

  43. Morphological analysis • rewrites • unbelievably

  44. Inflectional morphology • Tense, number, person, mood, aspect • Five verb forms in English • 40+ forms in French • Six cases in Russian, seven in Polish • Up to 40,000 forms in Turkish (you will cause X to cause Y to … do Z)

  45. Derivational morphology • Nominalization: computerization, appointee, killer, fuzziness • Formation of adjectives: computational, embraceable, clueless

  46. Finite-state morphological parsing • Cats: cat +N +PL • Cat: cat +N +SG • Cities: city +N +PL • Geese: goose +N +PL • Ducks: (duck +N +PL) or (duck +V +3SG) • Merging: merge +V +PRES-PART • Caught: (catch +V +PAST-PART) or (catch +V +PAST)
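The input/output behaviour listed on this slide can be imitated with a stem lexicon, a table of irregular forms, and a few suffix-stripping rules. This is only a sketch of the mapping from surface form to stem plus features: it uses table lookup rather than a real finite-state transducer, and the tiny lexicon and spelling rules are assumptions covering just the slide's examples.

```python
# Assumed miniature lexicon, just large enough for the slide's examples.
NOUN_STEMS = {"cat", "city", "goose", "duck"}
VERB_STEMS = {"duck", "merge", "catch"}
IRREGULAR = {
    "geese": ["goose +N +PL"],
    "caught": ["catch +V +PAST-PART", "catch +V +PAST"],
}


def analyze(word):
    """Return every analysis of `word` as 'stem +features' strings."""
    w = word.lower()
    analyses = list(IRREGULAR.get(w, []))
    if w in NOUN_STEMS:
        analyses.append(f"{w} +N +SG")
    # Spelling rule: -ies on the surface corresponds to stem-final y.
    if w.endswith("ies") and w[:-3] + "y" in NOUN_STEMS:
        analyses.append(f"{w[:-3]}y +N +PL")
    if w.endswith("s"):
        stem = w[:-1]
        if stem in NOUN_STEMS:
            analyses.append(f"{stem} +N +PL")
        if stem in VERB_STEMS:
            analyses.append(f"{stem} +V +3SG")
    if w.endswith("ing"):
        # Try the bare stem and the stem with a restored silent e (merg -> merge).
        for stem in (w[:-3], w[:-3] + "e"):
            if stem in VERB_STEMS:
                analyses.append(f"{stem} +V +PRES-PART")
    return analyses


if __name__ == "__main__":
    for word in ["Cats", "Cities", "Geese", "Ducks", "Merging", "Caught"]:
        print(word, analyze(word))
```

An ambiguous form such as Ducks yields more than one analysis, which is why morphological parsing is usually followed by a disambiguation step such as part-of-speech tagging.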

  47. Phonetic symbols • IPA • Arpabet • Examples

  48. Using WFST for language modeling • Phonetic representation • Part-of-speech tagging
