
Logics for NLP


Presentation Transcript


  1. Logics for NLP. Logic for Artificial Intelligence. Yi Zhou

  2. Content • Natural language understanding • Production rules • Semantic parsing • Natural logic • Conclusion

  3. Content • Natural language understanding • Production rules • Semantic parsing • Natural logic • Conclusion

  4. What can We See WWW2002 The eleventh international world wide web conference Sheraton waikiki hotel, Honolulu, hawaii, USA 7-11 may 2002, 1 location 5 days learn interact Registered participants coming from australia, canada, chile denmark, france, germany, ghana, hong kong, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire Register now On the 7th May Honolulu will provide the backdrop of the eleventh international world wide web conference. This prestigious event.. Speakers confirmed Tim berners-lee Tim is the well known inventor of the Web, … Ian Foster Ian is the pioneer of the Grid, the next generation internet …

  5. What can a Machine See WWW2002 The eleventh international world wide web conference Sheraton waikiki hotel Honolulu, hawaii, USA 7-11 may 2002 1 location 5 days learn interact Registered participants coming from australia, canada, chile denmark, france, germany, ghana, hong kong, india, ireland, italy, japan, malta, new zealand, the netherlands, norway, singapore, switzerland, the united kingdom, the united states, vietnam, zaire

  6. Natural Language Processing/Understanding • Fundamental goal: deep understanding of broad language • Not just string processing or keyword matching! • End systems that we want to build: • Ambitious: speech recognition, machine translation, question answering… • Modest: spelling correction, text categorization…

  7. NLP Applications • Text Categorization • Spelling & Grammar Corrections • Information Extraction • Speech Recognition • Information Retrieval • Summarization • Machine Translation • Question Answering • Dialog Systems • Chatterbot • Intelligent personal assistant • Automated customer service

  8. NLP/NLU is Hard • NLP/NLU is AI-complete • Language is ambiguous • Language is flexible • Language is subtle • Language is complex • Meaning goes beyond syntax • Explicit sentences vs. implicit knowledge • Massive implicit, uncertain knowledge

  9. Content • Natural language understanding • Production rules • Semantic parsing • Natural logic • Conclusion

  10. Production rules If antecedent then consequent • Wrote symbolic grammar and lexicon • S → NP VP, NN → interest • NP → (DT) NN, NNS → rates • NP → NN NNS, NNS → raises • NP → NNP, VBP → interest • VP → V NP, VBZ → rates • … • Used proof systems to prove parses from words

  11. Context-free grammars • G = (T, N, S, R) • T is a set of terminals • N is a set of nonterminals • For NLP, we usually distinguish a set P ⊆ N of preterminals, which always rewrite as terminals • S is the start symbol (one of the nonterminals) • R is a set of rules/productions of the form X → γ, where X is a nonterminal and γ is a sequence of terminals and nonterminals (possibly the empty sequence) • A grammar G generates a language L.

  12. Probabilistic context-free grammars • G = (T, N, S, R, P) • T is a set of terminals • N is a set of nonterminals • For NLP, we usually distinguish a set P ⊆ N of preterminals, which always rewrite as terminals • S is the start symbol (one of the nonterminals) • R is a set of rules/productions of the form X → γ, where X is a nonterminal and γ is a sequence of terminals and nonterminals (possibly the empty sequence) • P(R) gives the probability of each rule • A grammar G generates a language model L.
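
To make the PCFG definition concrete, here is a minimal Python sketch of sampling a sentence from a toy grammar; the rules, words, and probabilities are invented for illustration and are not from the slides.

    import random

    # Toy PCFG: each nonterminal maps to a list of (expansion, probability) pairs.
    # The grammar and probabilities below are illustrative only.
    pcfg = {
        "S":   [(["NP", "VP"], 1.0)],
        "NP":  [(["DT", "NN"], 0.6), (["NN", "NNS"], 0.4)],
        "VP":  [(["VBZ", "NP"], 1.0)],
        "DT":  [(["the"], 1.0)],
        "NN":  [(["interest"], 0.5), (["bank"], 0.5)],
        "NNS": [(["rates"], 1.0)],
        "VBZ": [(["raises"], 0.5), (["rates"], 0.5)],
    }

    def sample(symbol):
        """Expand a symbol top-down, choosing rules according to P(R)."""
        if symbol not in pcfg:               # terminal: emit the word itself
            return [symbol]
        expansions, probs = zip(*pcfg[symbol])
        rhs = random.choices(expansions, weights=probs)[0]
        words = []
        for s in rhs:
            words.extend(sample(s))
        return words

    print(" ".join(sample("S")))   # e.g. "the interest raises interest rates"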

  13. Categorial Grammar • Categories • Primitive Categories : N, NP, S etc. • Man – N • The old man - NP • Functions : Combination of primitive categories, more specifically a function from one category to another. • S/NP • NP/N • (NP\S)/NP

  14. Function Types A simple categorial grammar may have just two function types - • B/A - type of a phrase that results in a phrase of type B when followed (on the right) by a phrase of type A. • A\B - type of a phrase that results in a phrase of type B when preceded (on the left) by a phrase of type A.

  15. Categorial Grammar • An English grammar might have three basic types (N, NP and S). Other types can be derived: • Adjective – N/N • Determiner – NP/N • Intransitive verbs – NP\S • Transitive verbs – (NP\S)/NP • Example: The bad boy made that mess, with The := NP/N, bad := N/N, boy := N, made := (NP\S)/NP, that := NP/N, mess := N

  16. Combinatory Categorial Grammar • Combinatory categorial grammar (CCG) is an efficiently parseable, yet linguistically expressive grammar formalism • CCG is mildly context-sensitive • Basic categorial grammar uses just the forward and backward application combinators • CCG also includes functional composition and type-raising combinators • CCG supports incremental (left-to-right) derivations

  17. DEFINITION OF CCG A CCG G = (VT , VN, f, S,R) is defined as follows: – VT defines the finite set of all terminals. – VN defines the finite set of all nonterminals. These nonterminals are also called “atomic categories” which can be combined into more complex functional categories by using the backward operator \ or the forward operator /. – The function f maps terminals to sets of categories and corresponds to the first step in bottom-up parsing. – The unique starting symbol is denoted by S – R describes a finite set of combinatory rules

  18. Functional Application • The two basic rules used in pure categorial grammar (AB calculus) • Forward Application: (>) X/Y Y => X • Backward Application: (<) Y X\Y => X • (Note: from here on, X\Y is a category that takes a Y to its left and returns an X; slides 14 and 15 wrote the same category as Y\X.)

  19. Functional Application (Example) • Brazil defeated Germany: Brazil := np, defeated := (s\np)/np, Germany := np; defeated Germany => s\np (>); Brazil s\np => s (<) • The dog bit John: The := np/n, dog := n, bit := (s\np)/np, John := np; The dog => np (>); bit John => s\np (>); np s\np => s (<)
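
A minimal Python sketch of the two application rules, run on the "Brazil defeated Germany" lexicon from this slide; the string-based category handling (split_top, apply_pair) is a simplification invented for illustration.

    def split_top(cat):
        """Split a category at its top-level / or \\ operator; return (result, op, argument)."""
        depth = 0
        for i in range(len(cat) - 1, -1, -1):      # scan right to left for the operator applied last
            c = cat[i]
            if c == ")":
                depth += 1
            elif c == "(":
                depth -= 1
            elif c in "/\\" and depth == 0:
                return cat[:i].strip("()"), c, cat[i + 1:].strip("()")
        return None

    def apply_pair(left, right):
        """Try forward application X/Y Y => X, then backward application Y X\\Y => X."""
        s = split_top(left)
        if s and s[1] == "/" and s[2] == right.strip("()"):
            return s[0]                            # forward application (>)
        s = split_top(right)
        if s and s[1] == "\\" and s[2] == left.strip("()"):
            return s[0]                            # backward application (<)
        return None

    # Lexicon from the slide: Brazil defeated Germany
    brazil, defeated, germany = "np", "(s\\np)/np", "np"
    vp = apply_pair(defeated, germany)             # (s\np)/np  np  =>  s\np
    print(vp, apply_pair(brazil, vp))              # =>  s\np  s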

  20. Functional Composition • Two functional types can compose if the domain of one type corresponds to the range of the other. • Forward Composition: (>B) X/Y Y/Z =>B X/Z • Backward Composition: (<B) Y\Z X\Y =>B X\Z

  21. Functional Composition (Example) • Ram likes football: Ram := s/(s\np) (type-raised), likes := (s\np)/np, football := np; Ram likes => s/np (>B); s/np football => s (>)

  22. Type Raising • Type raising combinators take elementary syntactic types (primitive types) to functor types. • Forward Type-Raising: (>T) X =>T T/(T\X) • Backward Type-Raising: (<T) X =>T T\(T/X)

  23. Type Raising (Example) • Ram likes football: Ram := np, likes := (s\np)/np, football := np; Ram => s/(s\np) (>T); Ram likes => s/np (>B); s/np football => s (>)
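
A small Python sketch of forward type-raising and forward composition, replaying the "Ram likes football" derivation; the string manipulation is naive and only meant to mirror the rule schemas above.

    def type_raise_forward(x, t="s"):
        """Forward type-raising (>T):  X  =>  T/(T\\X)."""
        return f"{t}/({t}\\{x})"

    def compose_forward(left, right):
        """Forward composition (>B):  X/Y  Y/Z  =>  X/Z  (naive string handling)."""
        x, y1 = left.rsplit("/", 1)
        y2, z = right.split("/", 1)
        assert y1.strip("()") == y2.strip("()"), "inner categories must match"
        return f"{x}/{z}"

    # Ram likes football, with the lexicon from the slide
    ram       = type_raise_forward("np")        # np  =>T  s/(s\np)
    likes     = "(s\\np)/np"
    football  = "np"
    s_over_np = compose_forward(ram, likes)     # s/(s\np)  (s\np)/np  =>B  s/np
    print(s_over_np)                            # -> s/np; applying it to 'football' then yields s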

  24. NLP Approaches (1) • Rule-based: relies on hand-constructed rules acquired from language specialists; requires only a small amount of training data; development can be very time-consuming • Statistical-based: developers do not need language specialists' expertise; largely automated; requires a large amount of annotated training data (very large corpora)

  25. NLP Approaches (2) • Rule-based: high quality, based on solid linguistic knowledge; can be used with both well-formed and ill-formed input; useful for limited domains; some changes may be hard to accommodate; not easy to obtain high coverage of the linguistic knowledge • Statistical-based: lower quality, does not explicitly deal with syntax; not easy to work with ill-formed input, as both well-formed and ill-formed input are still probable; coverage depends on the training data; some changes may require re-annotation of the entire training corpus

  26. Content • Natural language understanding • Production rules • Semantic parsing • Natural logic • Conclusion

  27. Semantic Parsing: natural language to logic • Semantic Parsing: transforming natural language (NL) sentences into computer-executable, complete meaning representations (MRs) for domain-specific applications • Realistic semantic parsing currently entails domain dependence • Example application domains: • ATIS: Air Travel Information Service • CLang: RoboCup Coach Language • Geoquery: a database query application

  28. Earlier Hand-Built Systems Which countries bordering the Mediterranean border Asian countries? Logical form: answer(C) <= country(C) & borders(C, mediterranean) & exists(C1, country(C1) & asian(C1) & borders(C, C1)) After query planning: answer(C) <= borders(C, mediterranean) & {country(C)} & {borders(C, C1) & {asian(C1) & {country(C1)}}} [Reads: Generate C bordering the mediterranean, then check that C is a country, and then check that it is possible to generate C1 bordering C, and then check that …]

  29. Learning Semantic Parsers • Training: sentences paired with meaning representations are fed to a semantic parser learner, which outputs a semantic parser • Use: the learned semantic parser maps new sentences to meaning representations

  30. Statistical Parsing • Find the most likely meaning M0 given words W and history H: M0 = argmax_M P(M | W, H) • The model is decomposed via a pre-discourse meaning M' and a parse tree T • Three successive stages: parsing, semantic interpretation, and discourse • Parsing model similar to Seneff (1992) • Requires annotated parse trees for training

  31. Statistical Parsing (Miller et al., 1996) • [Slide shows a semantically annotated parse tree for "When do the flights that leave from Boston arrive in Atlanta", with nodes labeled by semantic/syntactic tag pairs such as /wh-question, flight/np, flight-constraints/rel-clause, departure/vp, location/pp, time/wh-head, departure/vp-head, city/npr, arrival/vp-head]

  32. Machine Translation • Translation from a natural-language source sentence to a formal-language target sentence • Papineni et al. (1997), Macherey et al. (2001) • [Slide shows the German utterance "guten tag ich bräuchte eine Verbindung von $CITY nach $CITY ja" ("good day, I need a connection from $CITY to $CITY, yes") aligned with the formal concept sequence @hello @want_question @train_determination @origin @destination @yes]

  33. Semantic Parsing using CCG • Extend categories with semantic types • Functional application with semantics: Texas := NP : texas borders := (S \ NP) / NP : λx.λy.borders(y, x)

  34. Sample CCG Derivation • Texas borders New Mexico • Texas := NP : texas; borders := (S \ NP) / NP : λx.λy.borders(y, x); New Mexico := NP : new_mexico • borders New Mexico => S \ NP : λy.borders(y, new_mexico) (>) • Texas + S \ NP => S : borders(texas, new_mexico) (<)
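
The semantic half of this derivation can be mimicked with Python closures standing in for the lambda terms; this is only an illustrative sketch, with the logical form built as a string.

    # Lexicon semantics from the slide, with Python closures standing in for lambda terms
    texas      = "texas"                                       # NP : texas
    new_mexico = "new_mexico"                                  # NP : new_mexico
    borders    = lambda x: (lambda y: f"borders({y}, {x})")    # (S\NP)/NP : λx.λy.borders(y, x)

    vp = borders(new_mexico)   # forward application (>):  S\NP : λy.borders(y, new_mexico)
    s  = vp(texas)             # backward application (<): S : borders(texas, new_mexico)
    print(s)                   # -> borders(texas, new_mexico)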

  35. Another Sample CCG Derivation • Texas borders New Mexico • Texas := NP : texas; borders := (S \ NP) / NP : λx.λy.borders(y, x); Mexico := NP : mexico • borders Mexico => S \ NP : λy.borders(y, mexico) (>) • Texas + S \ NP => S : borders(texas, mexico) (<)

  36. Probabilistic CCG for Semantic Parsing • L (lexicon) = { Texas := NP : texas; borders := (S \ NP) / NP : λx.λy.borders(y, x); Mexico := NP : mexico; New Mexico := NP : new_mexico } • w (feature weights) • Features fi(x, d): number of times lexical item i is used in derivation d • Log-linear model: Pw(d | x) ∝ exp(w · f(x, d)) • Best derivation: d* = argmaxd w · f(x, d) • Consider all possible derivations d for the sentence x given the lexicon L
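
A toy sketch of the log-linear model Pw(d | x) ∝ exp(w · f(x, d)); the feature vectors and weights below are made-up counts of lexical-item uses for two competing derivations, not values from the slides.

    import math

    def log_linear(feature_vectors, w):
        """P_w(d | x) ∝ exp(w · f(x, d)): normalize over all derivations d of sentence x."""
        scores = [sum(wi * fi for wi, fi in zip(w, f)) for f in feature_vectors]
        z = sum(math.exp(s) for s in scores)
        probs = [math.exp(s) / z for s in scores]
        best = max(range(len(scores)), key=lambda i: scores[i])   # d* = argmax_d w · f(x, d)
        return probs, best

    # Toy feature vectors: counts of lexical-item uses in two competing derivations,
    # e.g. one using 'New Mexico := NP : new_mexico' and one using 'Mexico := NP : mexico'.
    f_d1 = [1, 1, 1, 0]
    f_d2 = [1, 1, 0, 1]
    w    = [0.5, 0.5, 1.2, -0.3]
    probs, best = log_linear([f_d1, f_d2], w)
    print(probs, "best derivation:", best)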

  37. Learning Probabilistic CCG • Training: sentences paired with logical forms go through lexical generation to produce a lexicon L, then parameter estimation to produce feature weights w • Use: the CCG parser, equipped with L and w, maps new sentences to logical forms

  38. Lexical Generation • Input: Texas borders New Mexico with logical form borders(texas, new_mexico) • Output lexicon: Texas := NP : texas; borders := (S \ NP) / NP : λx.λy.borders(y, x); New Mexico := NP : new_mexico

  39. Lexical Generation • Input sentence: Texas borders New Mexico • Output substrings: Texas; borders; New; Mexico; Texas borders; borders New; New Mexico; Texas borders New; … • Input logical form: borders(texas, new_mexico) • Output categories: NP : texas; NP : new_mexico; (S \ NP) / NP : λx.λy.borders(y, x); (S \ NP) / NP : λx.λy.borders(x, y); …
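
A sketch of the enumeration step of lexical generation: all contiguous substrings of the sentence crossed with candidate categories. The category list here is written by hand rather than derived from the category rules of the next slide.

    def substrings(words):
        """All contiguous substrings of the sentence, as listed on the slide."""
        return [" ".join(words[i:j])
                for i in range(len(words))
                for j in range(i + 1, len(words) + 1)]

    sentence = "Texas borders New Mexico".split()
    print(substrings(sentence))   # ['Texas', 'Texas borders', ..., 'New Mexico', 'Mexico']

    # Candidate categories for the logical form borders(texas, new_mexico)
    # (hand-written here; a real learner derives them from category rules)
    categories = [
        "NP : texas",
        "NP : new_mexico",
        "(S\\NP)/NP : λx.λy.borders(y, x)",
        "(S\\NP)/NP : λx.λy.borders(x, y)",
    ]
    # The initial lexicon is the cross product of substrings and categories
    lexicon = [(s, c) for s in substrings(sentence) for c in categories]
    print(len(lexicon), "candidate lexical items")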

  40. Category Rules

  41. Parameter Estimation • Maximum conditional likelihood • Derivations d are not annotated, treated as hidden variables • Stochastic gradient ascent (LeCun et al., 1998) • Keep only those lexical items that occur in the highest scoring derivations of training set
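
A minimal sketch of the parameter update with derivations as hidden variables: the gradient of the conditional log-likelihood is the difference between the expected features under derivations yielding the annotated logical form and the expected features under all derivations. The two-feature example is invented.

    import math

    def expected_features(derivations, w):
        """E[f] under the log-linear distribution over the given derivations' feature vectors."""
        weights = [math.exp(sum(wi * fi for wi, fi in zip(w, f))) for f in derivations]
        z = sum(weights)
        return [sum(p * f[k] for p, f in zip(weights, derivations)) / z
                for k in range(len(w))]

    def sgd_step(w, all_derivs, correct_derivs, lr=0.1):
        """One gradient ascent step on log P(correct logical form | sentence):
        gradient = E[f | derivations yielding the annotated form] - E[f | all derivations]."""
        good  = expected_features(correct_derivs, w)
        every = expected_features(all_derivs, w)
        return [wi + lr * (g - e) for wi, g, e in zip(w, good, every)]

    # Toy example: two derivations, only the first yields the annotated logical form
    all_derivs     = [[1, 0], [0, 1]]
    correct_derivs = [[1, 0]]
    w = [0.0, 0.0]
    for _ in range(5):
        w = sgd_step(w, all_derivs, correct_derivs)
    print(w)   # the weight on the feature used by the correct derivation increases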

  42. Content • Natural language understanding • Production rules • Semantic parsing • Natural logic • Conclusion

  43. Natural logic: natural language as logic • P: Every firm polled saw costs grow more than expected, even after adjusting for inflation. • H: Every big company in the poll reported cost increases. • Answer: yes (P entails H)

  44. Natural logic outline: Introduction • A Theory of Natural Logic • The NatLog System • Experiments with FraCaS • Experiments with RTE • Conclusion • Seven basic entailment relations • Relations are defined for all semantic types: tiny ⊏ small, hover ⊏ fly, kick ⊏ strike, this morning ⊏ today, in Beijing ⊏ in China, everyone ⊏ someone, all ⊏ most ⊏ some

  45. Entailment & semantic composition • Ordinarily, semantic composition preserves entailment relations: eat pork ⊏ eat meat, big bird | big fish • But many semantic functions behave differently: tango ⊏ dance, yet refuse to tango ⊐ refuse to dance; French | German, yet not French ⌣ not German • We categorize functions by how they project entailment • A generalization of monotonicity classes and implication signatures • e.g., not has projectivity {=:=, ⊏:⊐, ⊐:⊏, ^:^, |:⌣, ⌣:|, #:#} • e.g., refuse has projectivity {=:=, ⊏:⊐, ⊐:⊏, ^:|, |:#, ⌣:#, #:#}
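
The projectivity signatures above can be written directly as lookup tables; a small sketch, with ASCII stand-ins for the relation symbols (< for ⊏, > for ⊐, u for the cover relation ⌣).

    # Projectivity signatures from the slide, as {input relation: projected relation} maps.
    # '=' equivalence, '<' forward entailment, '>' reverse entailment,
    # '^' negation, '|' alternation, 'u' cover, '#' independence.
    PROJECTIVITY = {
        "not":    {"=": "=", "<": ">", ">": "<", "^": "^", "|": "u", "u": "|", "#": "#"},
        "refuse": {"=": "=", "<": ">", ">": "<", "^": "|", "|": "#", "u": "#", "#": "#"},
    }

    def project(function_word, relation):
        """Project a lexical entailment relation through one semantic function."""
        return PROJECTIVITY[function_word][relation]

    # tango < dance, so: refuse to tango > refuse to dance
    print(project("refuse", "<"))   # -> '>'
    # French | German, so: not French  (cover)  not German
    print(project("not", "|"))      # -> 'u'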

  46. Projecting entailment relations upward • If two compound expressions differ by a single atom, their entailment relation can be determined compositionally • Assume idealized semantic composition trees • Propagate the entailment relation between the atoms upward, according to the projectivity class of each node on the path to the root • [Slide shows composition trees for "nobody can enter without a shirt" / "nobody can enter without clothes": the lexical relation a shirt ⊏ clothes is flipped to ⊐ by without, carried up through enter and can, and flipped back to ⊏ by nobody at the root]

  47. A (weak) inference procedure • Find a sequence of edits connecting P and H • Insertions, deletions, substitutions, … • Determine the lexical entailment relation for each edit • Substitutions: depends on the meaning of the substituends: cat | dog • Deletions: ⊏ by default: red socks ⊏ socks • But some deletions are special: not ill ^ ill, refuse to go | go • Insertions are symmetric to deletions: ⊐ by default • Project upward to find the entailment relation across each edit • Compose entailment relations across the sequence of edits, à la Tarski's relation algebra
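
A sketch of the final composition step: per-edit entailment relations, already projected, are folded together with a join table. Only a few entries of the full 7x7 relation algebra are listed here, and unknown combinations fall back to independence, which is a simplification.

    # Partial join table for composing entailment relations across successive edits
    # ('<' forward entailment, '>' reverse entailment, '^' negation, '|' alternation, '#' independence)
    JOIN = {
        ("=", "="): "=",
        ("=", "<"): "<",
        ("<", "="): "<",
        ("<", "<"): "<",
        ("<", "|"): "|",
        ("|", ">"): "|",
        ("^", ">"): "|",
    }

    def compose(relations):
        """Fold a sequence of per-edit (already projected) relations into one relation."""
        out = "="                          # identity: no edits yet
        for r in relations:
            out = JOIN.get((out, r), "#")  # unknown combinations degrade to independence
        return out

    # e.g. two deletions projecting to forward entailment (<) compose to <
    print(compose(["<", "<"]))   # -> '<'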

  48. The NatLog system • Pipeline for an NLI problem: (1) linguistic analysis, (2) alignment, (3) lexical entailment classification, (4) entailment projection, (5) entailment composition, then a prediction

  49. Content • Natural language understanding • Production rules • Semantic parsing • Natural logic • Conclusion

  50. Concluding Remarks • NLP/NLU is difficult and a hot topic • Rule-based NLP: advantages and disadvantages • Semantic parsing: from natural language to logic • Natural logic: natural language as logic • NLP/NLU vs ontology engineering • Symbolic + statistical
