1 / 46

CSA3202: HLT

CSA3202: HLT. Sentence Grammar. Introduction. This lecture has two aims: Crash course in sentence-level grammar Show how different linguistic phenomena can be captured by grammar rules. See also Jurafsky and Martin Chapter 12

rashford
Télécharger la présentation

CSA3202: HLT

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CSA3202: HLT Sentence Grammar Sentence Grammar

  2. Introduction • This lecture has two aims: • Crash course in sentence-level grammar • Show how different linguistic phenomena can be captured by grammar rules. • See also • Jurafsky and Martin Chapter 12 • Internet Grammar of Englishhttp://www.ucl.ac.uk/internet-grammar/ Sentence Grammar

  3. Part 1 English Grammar Sentence Grammar

  4. Different Kinds of Linguistic Rules • Morphological rules.. govern how words may be composed: re+invest+ing = reinvesting. • Grammatical (syntax) rules .. govern how words and constituents combine to form grammatical sentences. • Semantic rules .. govern how meanings may be combined. Sentence Grammar

  5. Syntax: What? • Part of speech info • Syntactic structure • Grammatical relations, e.g. • X is the subject • Y is the object • Z is the main verb of sentence S Sentence Grammar

  6. Syntax: Why? • The grammatical relations provide strong clues to the computation of meaning. • For example, in a simple active sentence • The verb corresponds to a logical predicate • The subject to the first argument • The object to the second argument

  7. Grammatical Relationsand Logic Representation object Johnkicked Fido main verb subject kicked(John, Fido)

  8. Ideal Language - Logic Mapping sentence subject verb comp1 comp2 ,..., compN logic predicate(arg0, arg1,arg2,..., argN)

  9. Syntax: Why? • You need knowledge of syntax in many applications: • Full versus superficial analysis? Sentence Grammar

  10. Syntax: Why? • Knowledge of syntax important e.g. for • Spellchecking: POS helps detect errorcome over hear • Computing meaning for QAhow many pretty girls and boys in the room? • Information extractiona record breaking win for Arsenal in 2009 • TranslationI miss you/tu me manques (you miss me) • What depth of syntactic analysis is needed? Sentence Grammar

  11. Depth of Analysis • Full • Syntax tree of entire sentence • Typically deep trees • More appropriate for fine-grained QA • Superficial • Good for spotting NP chunks or Verb clusters • Relations e.g. coreference (Lawrence Gonzi, Larry) • Shallow trees • More appropriate for text classification Sentence Grammar

  12. Levels of Grammar Organisation • There are roughly three levels • Word Classes: different parts of speech (POS). • Phrase Classes: sequences of words inheriting the characteristics of certain word classes. • Clause Classes: sequences of phrases containing at least one verb phrase. • On the basis of these one may define: • Grammatical Relations: role played by constitutents e.g. subject; verb; object • Syntax-Semantics interface Sentence Grammar

  13. Word Classes • Closed classes. • determiners : the, a, an, four. • pronouns : it, he etc. • prepositions : by, on, with . • conjunctions : and, or, but. • Open classes. • nouns refer to objects or concepts: cat , beauty, Coke. • adjectives describe or qualify nouns: fried chickens. • verbs describe what the noun does: John jumps. • adverbs describe how it is done: John runs quickly. Sentence Grammar

  14. Word Class Characteristics • Different word classes have characteristic subclasses and properties Sentence Grammar

  15. Phrases • Longer phrases may be used rather than a single word, but fulfilling the same role in a sentence. • Noun phrases refer to objects: four fried chickens. • Verb phrases state what the noun phrase does: kicks the dog. • Adjective phrases describe/qualify an object: sickly sweet. • Adverbial phrases describe how it is done:very carefully. • Prepositional phrases: add information to a verb phrase: on the table Sentence Grammar

  16. Phrases can become Complexe.g. Noun Phrases • Proper Name or Pronoun: Monday; it • Specifiers, noun: the day • Specifier, premodifier, noun:the first wet day • Specifiers, qualifiers, noun, postmodifier:The first wet day that I enjoyed in June Sentence Grammar

  17. Monday It The day The first wet day The first wet day that I enjoyed in June was sunny. But they all fit the same context Sentence Grammar

  18. Clauses • A clause is a combination of noun phrases and verb phrases • Clauses can exist at the top level (main clause) or can be embedded (subordinate clause) • Top level clause is a sentence. E.g.The cat ate the mouse. • Embedded clause is subordinate e.g.John said that Sandy is sick. • Unlike phrases, whole sentences can be used to say something complete, e.g. to state a fact or ask a question. Sentence Grammar

  19. Different Kinds of Sentence • Assertion: John ate the cat. • Yes/No question: Did John eat the cat? • Wh- question: What did John eat? • Command: Eat the cat John! Sentence Grammar

  20. Part II Context Free Grammar Rules Sentence Grammar

  21. Formal Grammar • A formal grammar consists of • Terminal Symbols (T) • Non Terminal Symbols (NT, disjoint from TS) • Start Symbol (a distinguished NT) • Rewrite rules of the form , where  and  are strings of symbols • Different classes of grammar result from various restrictions on the form of rules Sentence Grammar

  22. Classes of Grammar of grammar type increasing strength Sentence Grammar

  23. English is not a Regular Language • Central embedding: • The bird sang • The bird the cat ate sang • the bird the cat the dog chased ate sang • the bird the cat the dog the man walked chased ate sang. • These sentences are of the form anbn • This language cannot be regular

  24. Restrictions on Rules For all rules  • Type 0 (unrestricted): no restrictions • Type 1 (context sensitive): |||| • Type 2 (context free): •  is a single NT symbol • Type 3 (regular) • Every rule is of the form A  aB or A  a where A,B NT and aT Sentence Grammar

  25. Example Grammar Cabinet discusses police chief’s case French gunman kills four s  np vp np  n np  adjn np  nnp vp  vnp Sentence Grammar

  26. Types of Grammar Symbol • NT – symbols appearing on the left • S  NT– symbol appearing only on the left from which every other symbol can be derived. • T – symbols appearing only on the right • To include words we also need a lexicon - special rule that rewrite T symbols e,g.n  [police]n  [gunman] Sentence Grammar

  27. Grammar inducesPhrase Structure s vp np np adj n v n French gunman kills four Sentence Grammar

  28. Phrase Structure • PS includes information about • Hierarchy (vertical) • Ordering (horizontal) • PS constitutes a trace of the rule applications used to derive a sentence but • NB. PS does not tell you the order in which the rules were actually used Sentence Grammar

  29. Handling Sentence Types with Rules • DeclarativesJohn left.S → NP VP • ImperativesLeave!S →VP • Yes-No QuestionsDid John leave?S →Aux NP VP • WH QuestionsWhen did John leave?S →Wh-word Aux NP VP Sentence Grammar

  30. Handling Recursive Structures • Flights to Miami • Flights to Miami from Boston • Flights to Miami from Boston in April • Flights to Miami from Boston in April on Friday • Flights to Miami from Boston in April on Friday under $300. • Flights to Miami from Boston in April on Friday under $300 with lunch. Sentence Grammar

  31. Recursive Rules NP → NP PP PP → Preposition NP NP → n Sentence Grammar

  32. Subcategorisation: verb types • Intransitive verb: no objectJohn disappearedJohn disappeared the cat* • Transitive verb: one objectJohn opened the windowJohn opened* • Ditransitive verb: two objectsJohn gave Mary the bookJohn promised Mary that he would comeJohn gave Mary* Sentence Grammar

  33. Subcategorisation Rules Intransitive verb: no objectVP → V Transitive verb: one objectVP → V NP Ditransitive verb: two objectsVP → V NP NP If you take account of the category of items following the verb, you end up with about 40 VP rules. November 2011 Sentence Grammar 33

  34. Subcategorisation Rules • Intransitive verb: no objectVP → V • Transitive verb: one objectVP → V X • Ditransitive verb: two objectsVP → V X Y • X,Y  {NP, PP, S} Sentence Grammar

  35. Which Language Class for NLP? • Type 3 (Regular). Good for morphology. Cannot handle central embedding of sentences.The man that John saw eating died. • Type 2(Context Free). OK but problems handling certain phenomena e.g. agreement, subcategorisation. • Type 1 (Context Sensitive). Computational properties not well understood. Too powerful. • Type 0 (Turing). Too powerful. Sentence Grammar

  36. Handling Agreement • NP → Determiner N • Include these days, this day • Exclude this days, these dayNP → NPSingNP → NPPlurNPPlur → DetSing NSingNPPlur → DetPlur NPlur • Agreement also includes number, gender, case. • Problem: proliferation of categories/rules. Sentence Grammar

  37. Generative Power of a Grammar L G G L overgeneration all but not only undergeneration only but not all L G all and only CLINT - Lecture 1

  38. Robustess versus Overgeneration In the design of a language processor it isnecessary to keep in mind two incompatible features maintain robustness avoidovergeneration Robustness is crucial for analysis andunderstanding: the system must sometimes be lenient wrt possible input. For analysis, robustness can be achieved by having an overgenerating (i.e. underspecified) grammar. But this comes at the price of more ambiguity For generation, overgeneration means many spurious sentences will be generated November 2011 Sentence Grammar 38

  39. Undergeneration • A grammar should generate all sentences in the language. There should not be sentences in the language that are not generated by the grammar. s  n vp vp  v n  [John] n  [gold] v  [found] Sentence Grammar

  40. Precision versus Undergeneration Analysis It is possible to achieve high precision by limiting the class of acceptable input to manageable cases. But there is a risk that some input will be ruled out Generation November 2011 Sentence Grammar 40

  41. s vp n v d a n John ate a juicy hamburger Appropriate Stuctures • A grammar should assign linguistically plausible structures. s vs. np vp np n v d a n John ate a juicy hamburger Sentence Grammar

  42. Ambiguity np  np pp pp  prep np (the man) (on the hill with a telescope by the sea) (the man on the hill) (with a telescope by the sea) (the man on the hill with a telescope)( by the sea) etc. Sentence Grammar

  43. Criteria for Evaluating Grammars • Does it undergenerate? Does this matter? • Does it overgenerate? Does this matter? • Does it assign appropriate/useful structures to sentences it generates? • Is it simple to understand? How many rules are there? • Does it contain generalisations or is it just a collection of special cases? • How ambiguous is it? How is ambiguity dealt with? Sentence Grammar

  44. Summary Regular grammars insufficien for sentence structure. CFG is good for handling constituent structure and embedding but there are shortcomings in handling certain phenomena: agreement subcategorisation coordination Criteria for evaluating grammars are complex November 2011 Sentence Grammar 44

  45. Summary • Context free grammar is good for modelling constituent structure for • Different sentence types (declarative, imperative, question etc) • Different phrase types • Verb subcategorisation • But agreement (subject/verb) presents certain problems which can only be resolved within CFG by inventing more rules. Sentence Grammar

More Related