This document provides a comprehensive overview of Combinatory Categorial Grammar (CCG) as a formalism for natural language processing. It explores the limitations of context-free grammar and introduces CCG as a "mildly context-sensitive" framework better equipped to handle linguistic dependencies. Key concepts discussed include the principle of compositionality, the various types of categories, functional application, functional composition, and type raising within CCG. By highlighting its advantages for syntactic and semantic analysis, the presentation argues for the utility of CCG in advanced language-parsing applications.
CS 460: Natural Language Processing. Combinatory Categorial Grammar. Hemant Noval (08005017), Saurabh Goyal (08005016), Mudit Malpani (08005020), Palak Dalal (08005034). Guided by: Prof. Pushpak Bhattacharya
Contents • Introduction • Motivation • Categorial Grammar • Combinatory Categorial Grammar • The three parts of the CCG formalism • Subtypes • Semantics • CCG and Parsing Algorithms • Conclusion
Introduction • Limitations of context-free grammar • "Peter is from England and Paul from Sweden" • Knowledge about context is needed because the second verb ("is") is elided. • Crossing dependencies cannot be resolved. • Discontinuous word order, as in "John kicks skillfully the ball", where the adverb separates the verb from its object.
Motivation (1/2) • Models based on lexical dependencies • The dependencies are typically derived from a context-free phrase structure tree • This does not work well for long-range dependencies: "Ram ka yeh baar baar Shyaam ke ghar jaana mujhe pasand nahi" (roughly, "I do not like this repeated going of Ram to Shyaam's house") • CCG • A "mildly context-sensitive" formalism • Arguably provides the most linguistically satisfactory account of such dependencies • Facilitates recovery of unbounded dependencies Ref - nlp.korea.ac.kr/~hjchung/sprg/summary/021104.ppt
Motivation (2/2) • Principle of compositionality: the meaning of a complex expression is determined by the meanings of its constituent expressions and the rules used to combine them • CCG has a close relation to (compositional) semantics: syntactic constituents combine as functions, according to a function-argument relationship • Cross-linguistic generalizations can be made easily since the same set of rules always applies • Arguably psychologically plausible, since processing can proceed in a left-to-right fashion
Categorial Grammar (1/2) • Categorial Grammar (CG) involves syntactic objects with well-defined syntactic types, or categories, and rules for combining them. • The rules of the grammar are entirely conditioned on lexical categories. • There are many categories but only a small set of applicable rules.
Categorial Grammar (2/2) • Categories • Primitive categories: N, NP, S, etc. • man - N • the old man - NP • Functor categories: combinations of primitive categories; more specifically, functions from one category to another. • S/NP • NP/N • (NP\S)/NP
Function Types A simple categorial grammar may have just two function types - • B/A - type of a phrase that results in a phrase of type B when followed (on the right) by a phrase of type A. • A\B - type of a phrase that results in a phrase of type B when preceded (on the left) by a phrase of type A.
Categorial Grammar • English grammar might have three basic types (N, NP and S). Other types can be derived - • Adjective – N/N • Determiner – NP/N • Intransitive verbs – NP\S • Transitive verbs - (NP\S)/NP The bad boy made that mess NP/N N/N N (NP\S)/NP NP/N N
Combinatory Categorial Grammar • Combinatory categorial grammar (CCG) is an efficiently parseable, yet linguistically expressive grammar formalism. • CCG is mildly context-sensitive. • Basic categorial grammar uses just the forward and backward application combinators. • CCG also includes functional composition and type-raising combinators. • CCG allows incremental (left-to-right) derivations of the language.
Definition of CCG • A CCG G = (VT, VN, f, S, R) is defined as follows: • VT is the finite set of all terminals. • VN is the finite set of all nonterminals. These nonterminals are also called "atomic categories", which can be combined into more complex functional categories by using the backward operator \ or the forward operator /. • The function f maps terminals to sets of categories and corresponds to the first step in bottom-up parsing. • The unique starting symbol is denoted by S. • R is a finite set of combinatory rules.
Functional Application • The two basic rules of pure categorial grammar (the AB calculus) • Forward Application: (>) X/Y Y => X • Backward Application: (<) Y X\Y => X • Note: from here on the result category is written leftmost, so s\np denotes a phrase that combines with an np on its left to yield s.
Functional Application (Example) • Brazil defeated Germany np (s\np)/np np ------------------------------ > s\np ----------------------------------------------- < s • The dog bit John np/n n (s\np)/np np ------------------ > ---------------------- > np s\np ------------------------------------- < s
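The two application rules above can be sketched in a few lines of Python. The `Cat` class and helper names below are illustrative assumptions for this deck, not part of any CCG library; string equality stands in for category unification.

```python
# Minimal sketch of CCG categories and the two application rules.
# This toy representation is an assumption, not a standard library API.

class Cat:
    """A CCG category: either atomic (e.g. np) or a functor X/Y or X\\Y."""
    def __init__(self, result=None, slash=None, arg=None, atom=None):
        self.atom, self.result, self.slash, self.arg = atom, result, slash, arg

    def __eq__(self, other):
        return str(self) == str(other)   # compare by printed form

    def __repr__(self):
        if self.atom:
            return self.atom
        return f"({self.result}{self.slash}{self.arg})"

def atom(name):
    return Cat(atom=name)

def fwd(result, arg):   # X/Y: wants its argument on the right
    return Cat(result=result, slash="/", arg=arg)

def bwd(result, arg):   # X\Y: wants its argument on the left
    return Cat(result=result, slash="\\", arg=arg)

def forward_apply(left, right):
    """X/Y  Y  =>  X"""
    if left.slash == "/" and left.arg == right:
        return left.result
    return None

def backward_apply(left, right):
    """Y  X\\Y  =>  X"""
    if right.slash == "\\" and right.arg == left:
        return right.result
    return None

# "Brazil defeated Germany":  np   (s\np)/np   np
np, s = atom("np"), atom("s")
tv = fwd(bwd(s, np), np)            # (s\np)/np
vp = forward_apply(tv, np)          # s\np
sentence = backward_apply(np, vp)   # s
print(sentence)                     # s
```

The derivation mirrors the slide: the transitive verb first consumes the object to its right, then the resulting s\np consumes the subject to its left.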
Functional Composition • Two functional types can compose if the domain of one type corresponds to the range of the other. • Forward Composition: (>B) X/Y Y/Z =>B X/Z • Backward Composition: (<B) Y\Z X\Y =>B X\Z
Functional Composition(Example) • Ram likes football s/(s\np) (s\np)/np np ----------------------------- >B s/np ------------------------------- > s
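Forward composition can be sketched on an even simpler encoding: a functor is a tuple (slash, result, argument) and atomic categories are plain strings. This encoding is an illustrative assumption.

```python
# Sketch of forward composition, X/Y Y/Z =>B X/Z, on a tuple encoding
# ("/" or "\\", result, argument); atomic categories are plain strings.

def forward_compose(left, right):
    """X/Y  Y/Z  =>B  X/Z  (both functors look rightward)"""
    if isinstance(left, tuple) and isinstance(right, tuple) \
            and left[0] == "/" and right[0] == "/" and left[2] == right[1]:
        return ("/", left[1], right[2])
    return None

def forward_apply(left, right):
    """X/Y  Y  =>  X"""
    if isinstance(left, tuple) and left[0] == "/" and left[2] == right:
        return left[1]
    return None

# "Ram likes football":  s/(s\np)   (s\np)/np   np
s_np = ("\\", "s", "np")          # s\np
ram = ("/", "s", s_np)            # s/(s\np), the type-raised subject
likes = ("/", s_np, "np")         # (s\np)/np

s_slash_np = forward_compose(ram, likes)   # s/np
print(s_slash_np)                          # ('/', 's', 'np')
print(forward_apply(s_slash_np, "np"))     # s
```

Note how the intermediate s/np category lets the derivation proceed strictly left to right, before the object is even seen.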
Type Raising • Type-raising combinators take elementary syntactic types (primitive types) to functor types. • Forward Type-Raising: (>T) X =>T T/(T\X) • Backward Type-Raising: (<T) X =>T T\(T/X)
Type Raising (Example) • Ram likes football np (s\np)/np np --------- >T s/(s\np) ----------------------------- >B s/np ------------------------------- > s
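Forward type raising is just category surgery: X becomes T/(T\X). A sketch on the same illustrative tuple encoding (an assumption, not a library API):

```python
# Sketch of forward type raising, X =>T T/(T\X), on the tuple encoding
# ("/" or "\\", result, argument); atomic categories are plain strings.

def type_raise_forward(x, t):
    """X  =>T  T/(T\\X)"""
    return ("/", t, ("\\", t, x))

def forward_compose(left, right):
    """X/Y  Y/Z  =>B  X/Z"""
    if left[0] == "/" and right[0] == "/" and left[2] == right[1]:
        return ("/", left[1], right[2])
    return None

# "Ram likes football": raise the subject np, then compose with the verb.
ram = type_raise_forward("np", "s")        # s/(s\np)
likes = ("/", ("\\", "s", "np"), "np")     # (s\np)/np
print(forward_compose(ram, likes))         # ('/', 's', 'np'), i.e. s/np
```

This reproduces the >T followed by >B steps on the slide: raising turns the subject into a function over verb phrases, which can then compose with the verb.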
Modifications • The rules above are order preserving. • In some languages certain words can be permuted without changing the meaning of the sentence. • E.g. "Kahn blocked skillfully a powerful shot by Rivaldo" instead of "skillfully blocked". • Extra rules are needed to parse such sentences.
Crossed Composition • Forward Crossed Composition: (>Bx) X/Y Y\Z =>B X\Z • Forward crossed composition is generally considered to be inactive in the grammar of English because it can license some highly ungrammatical scrambled orders. • Backward Crossed Composition: (<Bx) Y/Z X\Y =>B X/Z
Substitution • Allows a single resource to be used by two different functors. • Forward Substitution: (>S) (X/Y)/Z Y/Z =>S X/Z • Backward Substitution: (<S) Y\Z (X\Y)\Z =>S X\Z
Example • team that I persuaded everyone to support n (n\n)/(s/np) np ((s\np)/(s\np))/np np/np (s\np)/np ------ >T ------------------------------------------- >B s/(s\np) ((s\np)/(s\np))/np ------------------------------------------------------------ >S (s\np)/np ----------------------------------------------------------- >B s/np --------------------------------------------- > n\n ------------------------------ < n
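The >S step in the derivation above can be sketched on the same illustrative tuple encoding (an assumption): the shared Z argument is consumed once but used by both functors.

```python
# Sketch of forward substitution, (X/Y)/Z  Y/Z  =>S  X/Z, on the tuple
# encoding ("/" or "\\", result, argument); atomic categories are strings.

def forward_substitute(left, right):
    """(X/Y)/Z  Y/Z  =>S  X/Z  -- Z is shared by both functors."""
    ok = (isinstance(left, tuple) and isinstance(right, tuple)
          and left[0] == "/" and right[0] == "/"
          and isinstance(left[1], tuple) and left[1][0] == "/"
          and left[2] == right[2]            # the shared Z
          and left[1][2] == right[1])        # Y matches
    if ok:
        return ("/", left[1][1], left[2])
    return None

# ((s\np)/(s\np))/np combined with (s\np)/np gives (s\np)/np,
# matching the >S step in the "persuaded everyone to support" derivation.
s_np = ("\\", "s", "np")
persuaded = ("/", ("/", s_np, s_np), "np")   # ((s\np)/(s\np))/np
to_support = ("/", s_np, "np")               # (s\np)/np
print(forward_substitute(persuaded, to_support))   # (s\np)/np as a tuple
```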
Subtypes (1/2) • With the simple types described so far, ungrammatical phrases such as "John run" or "many coffee" would be accepted. • Introduce type hierarchies. • For example, John is not only an NP, it is also singular. • So we introduce NPsg as a subtype of NP. • Anything that requires an NP will accept an NPsg as well. • But a rule can specifically require NPsg, where a plain NP will not fit. http://www.wellnowwhat.net/blog/?p=294
Subtypes (2/2) • John run: NPsg S\NPpl - cannot apply (number mismatch) • John runs: NPsg S\NPsg => S (<) • many coffee: NPpl/Npl Nmass - cannot apply (mass/plural mismatch) • much coffee: NPmass/Nmass Nmass => NPmass (>) http://www.wellnowwhat.net/blog/?p=294
Semantics (1/2) • The most common way to represent semantics is through predicate calculus and lambda terms. • Each word has a semantic content. • The proper noun John has the content John' (the prime distinguishing the semantic individual from the word with the same orthography). • The verb run has the content λx.run'(x). • The application rules pair syntax with function application: • Forward: X/Y : λv.p(v) Y : y => X : p(y) • Backward: Y : y X\Y : λv.p(v) => X : p(y) http://www.wellnowwhat.net/blog/?p=294
Semantics (2/2) • John runs: NP : John' S\NP : λx.run'(x) => S : run'(John') • John saw Frank: NP : John' (S\NP)/NP : λy.λx.see'(x, y) NP : Frank' • Forward application gives S\NP : λx.see'(x, Frank'); backward application then gives S : see'(John', Frank'). http://www.wellnowwhat.net/blog/?p=294
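The semantic side of these rules can be sketched directly with Python lambdas, since syntactic application is just function application on the meanings. The tuple encoding of see'(x, y) is an illustrative assumption.

```python
# Sketch of compositional semantics: each word contributes a meaning,
# and the application rules combine meanings by function application.
# Tuples like ("see'", x, y) stand in for the primed predicate terms.

def forward_apply(fn_sem, arg_sem):
    """X/Y : f   Y : a   =>   X : f(a)"""
    return fn_sem(arg_sem)

def backward_apply(arg_sem, fn_sem):
    """Y : a   X\\Y : f   =>   X : f(a)"""
    return fn_sem(arg_sem)

# John saw Frank
john, frank = "John'", "Frank'"
saw = lambda y: (lambda x: ("see'", x, y))   # (S\NP)/NP : λy.λx.see'(x, y)

vp = forward_apply(saw, frank)     # S\NP : λx.see'(x, Frank')
result = backward_apply(john, vp)  # S : the tuple for see'(John', Frank')
print(result)
```

The two steps match the slide exactly: the transitive verb is a curried two-place function, saturated first by the object and then by the subject.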
CCG and parsing algorithms • The standard CYK algorithm (discussed in class) is exhaustive: • It explores all possible analyses of all possible spans, irrespective of whether such analyses are likely to be part of the highest-probability derivation. • Two methods to avoid this: adaptive supertagging and A* parsing.
Adaptive supertagging • Treats the assignment of lexical categories (supertags) as a sequence-tagging problem. • Lexical categories are pruned to those with high posterior probability. • This extensive pruning of lexical categories leads to substantially faster parsing times. • Whenever the parser fails to find an analysis, the pruning threshold for lexical categories is relaxed and parsing is retried. • The process either succeeds and returns a parse after some iteration or gives up after a predefined number of iterations.
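The prune-then-relax loop can be sketched as follows. The beta values, the toy posteriors, and the stub `parse` function are all illustrative assumptions, not the actual tagger or parser.

```python
# Sketch of the adaptive-supertagging loop: keep only lexical categories
# within a factor beta of the best posterior per word, and relax beta
# whenever parsing fails. All numbers and stubs here are assumptions.

def prune(tag_posteriors, beta):
    """Keep categories within factor beta of the best posterior per word."""
    return [
        [cat for cat, p in scores.items() if p >= beta * max(scores.values())]
        for scores in tag_posteriors
    ]

def adaptive_supertag_parse(tag_posteriors, parse, betas=(0.5, 0.1, 0.01)):
    """Try increasingly permissive prunings; give up after the last beta."""
    for beta in betas:
        result = parse(prune(tag_posteriors, beta))
        if result is not None:
            return result
    return None

# Toy posteriors for a two-word sentence.
posteriors = [{"np": 0.9, "n": 0.1}, {"(s\\np)/np": 0.6, "s\\np": 0.4}]

# Stub parser: "succeeds" once every word keeps at least two categories,
# so the first (tightest) beta fails and the loop relaxes to the second.
parse = lambda cats: "parsed" if all(len(c) >= 2 for c in cats) else None
print(adaptive_supertag_parse(posteriors, parse))   # parsed
```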
A* parsing • A* search is an agenda-based, best-first graph-search algorithm. • It finds the lowest-cost parse exactly, without necessarily traversing the entire search space. • Items are processed from a priority queue that orders them by the product of their inside probability and a heuristic estimate of their outside probability. • If the heuristic is admissible, the solution is guaranteed to be exact. Klein and Manning, 2003
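The agenda mechanism behind A* parsing can be sketched with `heapq`. In a real parser the cost would be a negative log probability (inside score plus an admissible outside estimate); the toy graph and zero heuristic below are assumptions, not a parser.

```python
# Sketch of agenda-based best-first (A*) search: items leave a priority
# queue ordered by cost-so-far plus an admissible estimate of the
# remaining cost. With an admissible h, the first pop of the goal is optimal.
import heapq

def a_star(start, goal, edges, h):
    """edges: node -> [(neighbor, cost)]; h: admissible estimate to goal."""
    agenda = [(h(start), 0, start)]            # (priority, cost so far, node)
    best = {start: 0}
    while agenda:
        _, g, node = heapq.heappop(agenda)
        if node == goal:
            return g                           # first pop of goal is optimal
        for nxt, cost in edges.get(node, []):
            g2 = g + cost
            if g2 < best.get(nxt, float("inf")):
                best[nxt] = g2
                heapq.heappush(agenda, (g2 + h(nxt), g2, nxt))
    return None

# Toy graph: the direct a->c edge (cost 4) loses to a->b->c (cost 2).
edges = {"a": [("b", 1), ("c", 4)], "b": [("c", 1)], "c": []}
print(a_star("a", "c", edges, h=lambda n: 0))   # 2
```

An A* parser applies the same agenda discipline to chart items (spans plus categories) instead of graph nodes.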
Conclusion • Accurate, efficient wide-coverage parsing is possible with CCG. • CCG is mildly context-sensitive. • It uses functor categories and combinatory rules to parse sentences; the semantics is analyzed using the lambda calculus / combinatory logic.
References (1/2) • A Brief History of Grammar - Categorial Grammar (CG) and Combinatory Categorial Grammar (CCG), July 24th, 2009 (http://www.wellnowwhat.net/blog/?p=294) • Wikipedia: Combinatory categorial grammar (http://en.wikipedia.org/wiki/Combinatory_categorial_grammar) • Michael Auli and Adam Lopez. Efficient CCG Parsing: A* versus Adaptive Supertagging. ACL 2011. • Joon-Ho Lim. Generative Models for Statistical Parsing with Combinatory Categorial Grammar. NLP Lab., Korea University, 2002-10-23.
References (2/2) • Daniel Gildea and Julia Hockenmaier. Identifying Semantic Roles Using Combinatory Categorial Grammar. University of Pennsylvania. • Stephen Clark. Building Deep Dependency Structures with a Wide-Coverage CCG Parser. ACL 2002. • Jason Baldridge and Geert-Jan M. Kruijff. Multi-Modal Combinatory Categorial Grammar.