
Human Language Technology


vbonilla




Presentation Transcript


  1. Sentence Grammar Human Language Technology HLT - Sentence Grammar

  2. Introduction
     • This lecture has several themes:
        • Crash course in sentence-level grammar
           • Jurafsky and Martin 2nd ed. Chapter 12
           • Internet Grammar of English http://www.ucl.ac.uk/internet-grammar/
        • Show how different linguistic phenomena can be captured by grammar rules.
        • Dependency Parsing
        • Tagsets and Treebanks

  3. Grammar of English (Part I)

  4. Different Kinds of Rule
     • Morphological rules govern how words may be composed: re+invest+ing = reinvesting.
     • Syntactic rules govern how words and constituents combine to form grammatical sentences.
     • Semantic rules govern how meanings may be combined.

  5. Syntax: Why?
     • You need knowledge of syntax in many applications:
        • Parsing
        • Grammar checkers
        • Question answering/database access
        • Information extraction
        • Generation
        • Translation
     • Full versus superficial analysis?

  6. Levels of Grammar Organisation
     • Word Classes: different parts of speech (POS).
     • Phrase Classes: sequences of words inheriting the characteristics of certain word classes.
     • Clause Classes: sequences of phrases containing at least one verb phrase.
     • On the basis of these one may define:
        • Grammatical Relations: the role played by constituents, e.g. subject, predicate, object.
        • Syntax-Semantics interface: the mapping between syntactic structures and meaning.

  7. Word Classes
     • Closed classes:
        • determiners: the, a, an, four
        • pronouns: it, he, etc.
        • prepositions: by, on, with
        • conjunctions: and, or, but
     • Open classes:
        • nouns refer to objects or concepts: cat, beauty, Coke
        • adjectives describe or qualify nouns: fried chickens
        • verbs describe what the noun does: John jumps
        • adverbs describe how it is done: John runs quickly

  8. Word Class Characteristics
     • Different word classes have characteristic subclasses and properties.

  9. Phrases
     • A longer phrase may be used instead of a single word, fulfilling the same role in a sentence.
     • Noun phrases refer to objects: four fried chickens.
     • Verb phrases state what the noun phrase does: kicks the dog.
     • Adjective phrases describe/qualify an object: sickly sweet.
     • Adverbial phrases describe how actions are done: very carefully.
     • Prepositional phrases add information to a verb phrase: on the table.

  10. Phrases can be Complex, e.g. Noun Phrases
     • Proper Name or Pronoun: Monday; it
     • Specifier, noun: the day
     • Specifiers, premodifier, noun: the first wet day
     • Specifiers, premodifier, noun, postmodifier: the first wet day that I enjoyed in June

  11. But they all fit the same context: ___ was sunny.
     • Monday
     • It
     • The day
     • The first wet day
     • The first wet day that I enjoyed in June

  12. Clauses
     • A clause is a combination of noun phrases and verb phrases.
     • Clauses can exist at the top level (main clause) or can be embedded (subordinate clause).
        • A top-level clause is a sentence, e.g. The cat ate the mouse.
        • An embedded clause is subordinate, e.g. John said that Sandy is sick.
     • Unlike phrases, whole sentences can be used to say something complete, e.g. to state a fact or ask a question.

  13. Different Kinds of Sentences
     • Assertion: John ate the cat.
     • Yes/No question: Did John eat the cat?
     • Wh- question: What did John eat?
     • Command: Eat the cat, John!
     • NB. All these forms share the same underlying semantic proposition.

  14. Context-Free Grammar Rules (Part II)

  15. Formal Grammar
     • A formal grammar consists of
        • Terminal Symbols (T)
        • Non-Terminal Symbols (NT, disjoint from T)
        • Start Symbol (a distinguished NT)
        • Rewrite rules of the form $\alpha \rightarrow \beta$, where $\alpha$ and $\beta$ are strings of symbols
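
To make the four components concrete, here is a minimal Python sketch. The names `Grammar` and `rewrite` are hypothetical and do not come from the lecture. The sketch stores a grammar as a 4-tuple, checks the two side conditions (T and NT disjoint, start symbol in NT), and applies one rewrite rule α → β at a time. The rules come from the example grammar on slide 21. The lexical rules adj → French and v → kills are assumptions added so that the derivation reaches a complete sentence.

```python
from dataclasses import dataclass

Symbols = tuple[str, ...]  # a string of grammar symbols, e.g. ("np", "vp")


@dataclass(frozen=True)
class Grammar:
    """A formal grammar: terminals T, non-terminals NT, a start symbol, rewrite rules."""
    terminals: frozenset[str]
    nonterminals: frozenset[str]
    start: str
    rules: frozenset[tuple[Symbols, Symbols]]  # (alpha, beta) pairs meaning alpha -> beta

    def __post_init__(self) -> None:
        if self.terminals & self.nonterminals:
            raise ValueError("T and NT must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be a non-terminal")


def rewrite(form: Symbols, alpha: Symbols, beta: Symbols) -> Symbols:
    """One derivation step: replace the leftmost occurrence of alpha in form by beta."""
    n = len(alpha)
    for i in range(len(form) - n + 1):
        if form[i:i + n] == alpha:
            return form[:i] + beta + form[i + n:]
    raise ValueError(f"{alpha} does not occur in {form}")


# Rules applied in a hand-picked order (adj -> French, v -> kills are assumed).
steps: list[tuple[Symbols, Symbols]] = [
    (("s",), ("np", "vp")),
    (("np",), ("adj", "n")),
    (("adj",), ("French",)),
    (("n",), ("gunman",)),
    (("vp",), ("v", "np")),
    (("v",), ("kills",)),
    (("np",), ("n",)),
    (("n",), ("four",)),
]
g = Grammar(
    terminals=frozenset({"French", "gunman", "kills", "four"}),
    nonterminals=frozenset({"s", "np", "vp", "n", "adj", "v"}),
    start="s",
    rules=frozenset(steps),
)

form: Symbols = (g.start,)
for alpha, beta in steps:
    form = rewrite(form, alpha, beta)
    print(" ".join(form))  # each intermediate sentential form
```

The grammar object only says which rewrites are allowed. The order of the steps here was chosen by hand. That separation is the declarative/procedural distinction drawn on slide 25.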

  16. Classes of Grammar

  17. Classes of Grammar
     • Learnability
     • Different classes of grammar result from various restrictions on the form of rules.

  18. Restrictions on Rules
     • For all rules $\alpha \rightarrow \beta$:
        • Type 0 (unrestricted): no restrictions
        • Type 1 (context sensitive): $|\alpha| \le |\beta|$
        • Type 2 (context free): $\alpha$ is a single NT symbol
        • Type 3 (regular): every rule is of the form $A \rightarrow aB$ or $A \rightarrow a$, where $A, B \in NT$ and $a \in T$
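
The restrictions can be checked mechanically, one rule at a time. A grammar belongs to a class only if every one of its rules satisfies that class's restriction. The sketch below is a minimal illustration; `rule_type` and `grammar_type` are hypothetical names. It assumes the caller supplies the set of non-terminals. It also simplifies in two ways: it does not treat empty right-hand sides specially (a rule A → ε would be reported as type 2), and it does not check that every symbol is declared.

```python
Symbols = tuple[str, ...]


def rule_type(alpha: Symbols, beta: Symbols, nts: set[str]) -> int:
    """Most restrictive type (3, 2, 1 or 0) whose restriction the rule alpha -> beta meets."""
    if len(alpha) == 1 and alpha[0] in nts:                            # alpha is a single NT
        if len(beta) == 1 and beta[0] not in nts:                      # A -> a
            return 3
        if len(beta) == 2 and beta[0] not in nts and beta[1] in nts:   # A -> aB
            return 3
        return 2
    if len(alpha) <= len(beta):                                        # |alpha| <= |beta|
        return 1
    return 0


def grammar_type(rules: list[tuple[Symbols, Symbols]], nts: set[str]) -> int:
    """The grammar's class is limited by its least restricted rule."""
    return min(rule_type(alpha, beta, nts) for alpha, beta in rules)


NTS = {"s", "np", "vp", "n", "adj", "v"}
EXAMPLE = [                          # a few rules in the style of slide 21
    (("s",), ("np", "vp")),
    (("np",), ("adj", "n")),
    (("vp",), ("v", "np")),
    (("n",), ("police",)),
]
REGULAR = [(("A",), ("a", "B")), (("B",), ("b",))]

print(grammar_type(EXAMPLE, NTS))          # → 2
print(grammar_type(REGULAR, {"A", "B"}))   # → 3
```

In the example grammar, the lexical rule n → police alone would be regular. Rules such as s → np vp are context free but not regular, so they make the grammar as a whole type 2.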

  19. Which Class for NLP?
     • Type 3 (Regular). Good for morphology. Cannot handle centre embedding of sentences, e.g. The man that John saw eating died.
     • Type 2 (Context Free). OK, but problems handling certain phenomena, e.g. agreement.
     • Type 1 (Context Sensitive). Computational properties not well understood. Too powerful.
     • Type 0 (Turing). Too powerful.

  20. Weak versus Strong
     • A grammar class that is too restrictive
        • cannot exactly characterise/discriminate natural-language sentence structures.
     • A grammar class that is too general
        • has the power to characterise/discriminate structures that don't exist in human languages.
     • The more general the class, the higher the complexity and the less efficient the computations.

  21. Example Grammar
     • Cabinet discusses police chief’s case
     • French gunman kills four
     • s → np vp
     • np → n
     • np → adj n
     • np → n np
     • vp → v np

  22. Classifying the Symbols
     • NT – symbols appearing on the left
     • Start – the symbol appearing only on the left, from which every other symbol can be derived
     • T – symbols appearing only on the right
     • To include words we also need special rules such as n → [police], n → [gunman], n → [four]
     • The latter rules define the lexicon or “dictionary interface”.
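
These definitions can be applied directly to a rule list. The sketch below uses the hypothetical helpers `classify_symbols` and `lexicon`. It computes NT, T and the start candidates from the left- and right-hand sides, then picks out the lexical rules. The rules are those of slide 21 plus the three lexical rules above. The entries adj → French and v → kills are assumptions added so that the second headline is covered. The start symbol is returned as a set because "appears only on the left" does not by itself guarantee uniqueness. The condition "from which every other symbol can be derived" is not checked here.

```python
Rule = tuple[str, tuple[str, ...]]  # (left-hand non-terminal, right-hand symbols)

RULES: list[Rule] = [
    ("s", ("np", "vp")),
    ("np", ("n",)),
    ("np", ("adj", "n")),
    ("np", ("n", "np")),
    ("vp", ("v", "np")),
    ("n", ("police",)),
    ("n", ("gunman",)),
    ("n", ("four",)),
    ("adj", ("French",)),   # assumed entry
    ("v", ("kills",)),      # assumed entry
]


def classify_symbols(rules: list[Rule]) -> tuple[set[str], set[str], set[str]]:
    """Return (NT, T, start candidates) using the definitions on this slide."""
    left = {lhs for lhs, _ in rules}
    right = {sym for _, rhs in rules for sym in rhs}
    nonterminals = left            # appear on the left of some rule
    terminals = right - left       # appear only on the right
    start = left - right           # appear only on the left
    return nonterminals, terminals, start


def lexicon(rules: list[Rule], terminals: set[str]) -> list[Rule]:
    """The 'dictionary interface': rules that rewrite a category as a single word."""
    return [(lhs, rhs) for lhs, rhs in rules if len(rhs) == 1 and rhs[0] in terminals]


nt, t, start = classify_symbols(RULES)
print(sorted(nt), sorted(t), start)
print(lexicon(RULES, t))
```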

  23. Grammar Induces Phrase Structure
     (s (np (adj French) (n gunman))
        (vp (v kills) (np (n four))))

  24. Phrase Structure
     • PS includes information about
        • precedence between constituents
        • dominance between constituents
     • PS constitutes a trace of the rule applications used to derive a sentence.
     • PS does not tell you the order in which the rules were used.

  25. Procedural versus Declarative
     • A grammar induces a structure but does not tell you how to discover that structure.
     • A grammar is declarative.
     • A parser is a procedure that, given a suitable representation of a grammar and a sentence, actually discovers the structure(s).
     • A parser is procedural.
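
The split can be seen in a small program. The grammar of slides 21-22 is kept as plain data (declarative). A separate function searches for the structures it licenses (procedural). The search here is a naive top-down, recursive-descent parser with backtracking. It is chosen only for brevity and is not the parsing method this lecture prescribes. Lexical entries other than n → police, gunman, four are assumptions added so that both example headlines parse, and words are lower-cased before lookup.

```python
from typing import Iterator

# Declarative part: slide 21's grammar as data (non-terminal -> right-hand sides).
RULES: dict[str, list[list[str]]] = {
    "s": [["np", "vp"]],
    "np": [["n"], ["adj", "n"], ["n", "np"]],
    "vp": [["v", "np"]],
}
# Lexical rules (pre-terminal -> words); entries beyond police/gunman/four are assumed.
LEXICON: dict[str, set[str]] = {
    "n": {"police", "gunman", "four", "cabinet", "chief's", "case"},
    "adj": {"french"},
    "v": {"kills", "discusses"},
}

Tree = tuple  # (label, child, ...) for phrases, (label, word) for pre-terminals


def parse(symbol: str, words: list[str], i: int) -> Iterator[tuple[Tree, int]]:
    """Procedural part: yield (tree, end) for every way `symbol` can span words[i:end]."""
    if symbol in LEXICON:
        if i < len(words) and words[i] in LEXICON[symbol]:
            yield (symbol, words[i]), i + 1
        return
    for rhs in RULES[symbol]:
        for children, end in parse_sequence(rhs, words, i):
            yield (symbol, *children), end


def parse_sequence(symbols: list[str], words: list[str], i: int) -> Iterator[tuple[tuple, int]]:
    """Yield (subtrees, end) for every way the symbol sequence can span words[i:end]."""
    if not symbols:
        yield (), i
        return
    for first, mid in parse(symbols[0], words, i):
        for rest, end in parse_sequence(symbols[1:], words, mid):
            yield (first, *rest), end


def parses(sentence: str) -> list[Tree]:
    """All complete parses of the sentence from the start symbol s."""
    words = sentence.lower().split()
    return [tree for tree, end in parse("s", words, 0) if end == len(words)]


for tree in parses("French gunman kills four"):
    print(tree)
```

Because `parse` expands rules top-down, it would loop forever on a left-recursive rule such as NP → NP PP (slide 29). Chart-based parsers such as CKY or Earley do not have this problem.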

  26. Handling Linguistic Phenomena
     • Different sentence types
     • Nested structures
     • Agreement
     • Multiwords
     • Subcategories of verb

  27. Different Sentence Types, Different Grammar Rules
     • Declaratives       John left.              S → NP VP
     • Imperatives        Leave!                  S → VP
     • Yes-No Questions   Did John leave?         S → Aux NP VP
     • WH Questions       When did John leave?    S → Wh-word Aux NP VP

  28. Recursively Nested Structures handled by ...
     • Flights to Miami
     • Flights to Miami from Boston
     • Flights to Miami from Boston in April
     • Flights to Miami from Boston in April on Friday
     • Flights to Miami from Boston in April on Friday under $300
     • Flights to Miami from Boston in April on Friday under $300 with lunch

  29. Recursive Rules
     • NP → N
     • NP → NP PP
     • PP → Preposition NP
     • Flights from Miami to Boston

  30. Ambiguity
     • np → np pp
     • pp → prep np
     • (the man) (on the hill with a telescope by the sea)
     • (the man on the hill) (with a telescope by the sea)
     • (the man on the hill with a telescope) (by the sea)
     • etc.
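
Each additional PP multiplies the possible attachments. The short sketch below uses a hypothetical function `np_bracketings`. It treats base noun phrases such as "the man" as unanalysed units, which is an assumption, and enumerates every structure that np → np pp and pp → prep np allow. The three top-level splits listed on the slide expand to five complete bracketings. In general the counts follow the Catalan numbers 1, 1, 2, 5, 14, ...

```python
from functools import lru_cache


def np_bracketings(items: tuple[str, ...]) -> list[str]:
    """All bracketings licensed by np -> np pp and pp -> prep np over
    items = (base NP, prep, base NP, prep, base NP, ...)."""

    @lru_cache(maxsize=None)
    def np(i: int, j: int) -> tuple[str, ...]:
        # Analyses of items[i..j] (inclusive) as an np; i and j index base NPs.
        if i == j:
            return (f"({items[i]})",)
        results = []
        # np -> np pp: the inner np ends at base NP m; the pp is items[m+1] plus an np.
        for m in range(i, j, 2):
            for left in np(i, m):
                for right in np(m + 2, j):
                    results.append(f"[{left} [{items[m + 1]} {right}]]")
        return tuple(results)

    return list(np(0, len(items) - 1))


phrase = ("the man", "on", "the hill", "with", "a telescope", "by", "the sea")
for bracketing in np_bracketings(phrase):
    print(bracketing)
print(len(np_bracketings(phrase)))  # → 5
```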

  31. Handling Agreement
     • NP → Determiner N
        • Include: these days, this day
        • Exclude: this days, these day
     • NP → NPSing
     • NP → NPPlur
     • NPSing → DetSing NSing
     • NPPlur → DetPlur NPlur
     • Agreement features include number, gender and case.
     • Danger: proliferation of categories/rules.
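
The category-splitting rules can be generated mechanically, which also shows why they proliferate. The sketch below uses the hypothetical names `split_np_rules` and `is_np` and a four-word toy lexicon. It expands NP → Determiner N once per combination of agreement values. With number alone it reproduces the four NP rules above. Adding three genders and four cases (illustrative values) already yields 48 rules for this one construction.

```python
from itertools import product


def split_np_rules(features: dict[str, list[str]]) -> list[tuple[str, list[str]]]:
    """Expand NP -> Det N into split categories, one pair of rules per
    combination of agreement values."""
    rules = []
    for values in product(*features.values()):
        tag = "".join(values)                      # e.g. "Sing" or "SingMascNom"
        rules.append(("NP", [f"NP{tag}"]))
        rules.append((f"NP{tag}", [f"Det{tag}", f"N{tag}"]))
    return rules


# Toy lexicon (assumed): each word is listed under its number-split category.
LEXICON = {"this": "DetSing", "these": "DetPlur", "day": "NSing", "days": "NPlur"}


def is_np(words: list[str], rules: list[tuple[str, list[str]]]) -> bool:
    """True if some split rule NPx -> Detx Nx matches the words' categories exactly."""
    cats = [LEXICON.get(w) for w in words]
    return any(lhs != "NP" and rhs == cats for lhs, rhs in rules)


number_only = split_np_rules({"number": ["Sing", "Plur"]})
for lhs, rhs in number_only:
    print(lhs, "→", " ".join(rhs))

with_gender_case = split_np_rules({
    "number": ["Sing", "Plur"],
    "gender": ["Masc", "Fem", "Neut"],       # illustrative values
    "case": ["Nom", "Acc", "Gen", "Dat"],    # illustrative values
})
print(len(with_gender_case))  # → 48
```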

  32. Handling Multiwords
     • John ran up the stairs
     • John rang up the doctor
     • John ran the stairs up*
     • John rang the doctor up
     • John rang the doctor who lives in Paris up

  33. Ordinary CF Rules Don’t Work
     • John rang up the doctor
        • VP → V NP
        • here V is a multiword
     • John rang the doctor up
        • VP → V NP particle_from_V
        • here the multiword has split into two parts
        • the challenge is to express the relation between the parts

  34. Subcategorisation
     • Intransitive verb: no object
        • John disappeared
        • John disappeared the cat*
     • Transitive verb: one object
        • John opened the window
        • John opened*
     • Ditransitive verb: two objects
        • John gave Mary the book
        • John gave Mary*

  35. Subcategorisation Rules
     • Intransitive verb (no object): VP → V
     • Transitive verb (one object): VP → V NP
     • Ditransitive verb (two objects): VP → V NP NP
     • If you take account of the category of items following the verb, there are about 40 different patterns like this in English.
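
One way to encode these frames is to store, for each verb, the complement sequences it allows and check a VP against them. The minimal sketch below assumes a hypothetical `SUBCAT` lexicon keyed on inflected verb forms. It also assumes the complements have already been reduced to category labels. It is not a full parser.

```python
# Hypothetical subcategorisation lexicon: verb -> allowed complement frames.
SUBCAT: dict[str, list[tuple[str, ...]]] = {
    "disappeared": [()],            # intransitive:  VP -> V
    "opened": [("NP",)],            # transitive:    VP -> V NP
    "gave": [("NP", "NP")],         # ditransitive:  VP -> V NP NP
}


def vp_ok(verb: str, complements: tuple[str, ...]) -> bool:
    """Licence a VP only if one of the verb's frames matches its complements exactly."""
    return complements in SUBCAT.get(verb, [])


print(vp_ok("gave", ("NP", "NP")))      # John gave Mary the book
print(vp_ok("disappeared", ("NP",)))    # *John disappeared the cat
```

A verb that takes several patterns would simply list several frames. This keeps a single VP rule schema instead of a separate VP category for each of the roughly 40 patterns.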

  36. Overgeneration
     • A grammar should generate only sentences in the language.
     • It should exclude sentences not in the language.
     • s → n vp
     • vp → v
     • n → [John]
     • v → [snore]
     • v → [snores]

  37. Undergeneration
     • A grammar should generate all sentences in the language.
     • There should not be sentences in the language that are not generated by the grammar.
     • s → n vp
     • vp → v
     • n → [John]
     • n → [gold]
     • v → [found]
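
Enumerating everything a small grammar generates is a quick way to spot both problems. The minimal sketch below uses a hypothetical function `generate`. It is valid only for non-recursive grammars such as the two toy grammars on slides 36 and 37, written here as Python dictionaries.

```python
from itertools import product

Grammar = dict[str, list[list[str]]]


def generate(rules: Grammar, symbol: str = "s") -> list[list[str]]:
    """Every word string derivable from `symbol` (non-recursive grammars only)."""
    if symbol not in rules:                      # not rewritable, so it is a word
        return [[symbol]]
    strings = []
    for rhs in rules[symbol]:
        # combine every expansion of each right-hand-side symbol, left to right
        for parts in product(*(generate(rules, s) for s in rhs)):
            strings.append([w for part in parts for w in part])
    return strings


OVER: Grammar = {"s": [["n", "vp"]], "vp": [["v"]],
                 "n": [["John"]], "v": [["snore"], ["snores"]]}
UNDER: Grammar = {"s": [["n", "vp"]], "vp": [["v"]],
                  "n": [["John"], ["gold"]], "v": [["found"]]}

print(sorted(" ".join(ws) for ws in generate(OVER)))   # → ['John snore', 'John snores']
print(sorted(" ".join(ws) for ws in generate(UNDER)))  # → ['John found', 'gold found']
```

The first grammar produces "John snore", which is not English (overgeneration). The second never produces a sentence such as "John found gold", because it has no rule vp → v n (undergeneration).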

  38. Appropriate Structures
     • A grammar should assign linguistically plausible structures.
       (s (n John) (vp (v ate) (d a) (a juicy) (n hamburger)))
       vs.
       (s (np (n John)) (vp (v ate) (np (d a) (a juicy) (n hamburger))))

  39. Criteria for Evaluating Grammars
     • Does it undergenerate?
     • Does it overgenerate?
     • Does it assign appropriate structures to sentences it generates?
     • Is it simple to understand? How many rules are there?
     • Does it contain generalisations or is it just a collection of special cases?
     • How ambiguous is it?

  40. Tagsets
     • The main parts of speech reflect distributional patterns in naturally occurring data.
     • Practical applications often make use of special tags which include additional information such as number and case.
     • One of the most commonly used tagsets is the 45-tag Penn Treebank tagset, which has been used to tag the Brown corpus, among others.

  41. Penn Treebank Tagset

  42. POS Tagging
     • Input: The grand jury commented on a number of other topics .
     • POS tagger output: The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN of/IN other/JJ topics/NNS ./.
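
The word/TAG format on this slide is easy to read back into (word, tag) pairs. The minimal sketch below uses a hypothetical function `read_tagged`. It splits each token at its last slash, so a word that itself contains "/" stays intact, and then counts the tags.

```python
from collections import Counter

tagged = ("The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN "
          "of/IN other/JJ topics/NNS ./.")


def read_tagged(text: str) -> list[tuple[str, str]]:
    """Split whitespace-separated word/TAG tokens at the last '/'."""
    pairs = []
    for token in text.split():
        word, tag = token.rsplit("/", 1)
        pairs.append((word, tag))
    return pairs


pairs = read_tagged(tagged)
print(pairs[:3])
print(Counter(tag for _, tag in pairs))
```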

  43. Treebanks
     • Treebanks are corpora in which each sentence has been paired with a parse tree (presumably the right one).
     • These are generally created
        • by first parsing the collection with an automatic parser
        • and then having human annotators correct each parse as necessary.
     • This requires detailed annotation guidelines that provide
        • a POS tagset,
        • a grammar, and
        • instructions for how to deal with particular grammatical constructions.

  44. Penn Treebank
     • The Penn Treebank is a widely used treebank maintained by the Linguistic Data Consortium.
     • The Penn Treebank Project annotates naturally occurring text for linguistic structure.
     • It contains skeletal parses showing rough syntactic and semantic information: a bank of linguistic trees.
     • The best-known part is the Wall Street Journal section, containing 1M words from the 1987-1989 Wall Street Journal.

  45. Penn Treebank Example

  46. Treebank Grammars
     • Treebanks implicitly define a grammar for the language covered in the treebank.
     • Simply take the local rules that make up the sub-trees in all the trees in the collection and you have a grammar.
     • It is not complete, but with a decent-sized corpus you will have a grammar with decent coverage.
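
Reading a grammar off a treebank amounts to collecting the local rule at every node, and counting them also gives their frequencies. The minimal sketch below uses the hypothetical functions `read_tree` and `count_rules`. It assumes Penn-Treebank-style bracketed strings in which every bracket carries a label. The real files also wrap each tree in an extra unlabelled pair of brackets, which this reader does not handle. The two trees are toy examples written for illustration, not treebank data.

```python
from collections import Counter

Tree = tuple  # (label, [children]); a child is a Tree or a word string


def read_tree(text: str) -> Tree:
    """Read one labelled bracketed tree, e.g. '(NP (DT the) (NN cat))'."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node() -> Tree:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()


def count_rules(tree: Tree, counts: Counter) -> None:
    """Add the local rule at every node (label -> child labels or words) to counts."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if isinstance(child, tuple):
            count_rules(child, counts)


treebank = [
    "(S (NP (DT the) (NN cat)) (VP (VBD ate) (NP (DT the) (NN mouse))))",
    "(S (NP (NNP John)) (VP (VBD left)))",
]
grammar: Counter = Counter()
for text in treebank:
    count_rules(read_tree(text), grammar)
for (lhs, rhs), n in grammar.most_common():
    print(n, lhs, "→", " ".join(rhs))
```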

  47. Treebank Grammars
     • Such grammars tend to be very flat because they avoid recursion.
     • For example, the Penn Treebank has 4500 different rules for VPs. Among them...

  48. Heads in Trees
     • Finding heads in treebank trees is a task that arises frequently in many applications.
        • Particularly important in statistical parsing
     • We can visualize this task by annotating the nodes of a parse tree with the heads of each corresponding node.

  49. Lexically Decorated Tree

  50. Head Finding
     • The standard way to do head finding is to use a simple set of tree traversal rules specific to each non-terminal in the grammar.
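
Here is a sketch of such traversal rules. Each non-terminal gets a search direction and a priority list of child labels. The head word is then percolated up from the pre-terminals, producing a lexically decorated tree of the kind shown on slide 49. The `HEAD_RULES` table is a small, hypothetical stand-in for illustration. Published head-rule tables are much longer and include special cases. Applying the priority list before the scan direction is one common way to use such tables.

```python
# Hypothetical, simplified head rules: label -> (search direction, priority list).
HEAD_RULES: dict[str, tuple[str, list[str]]] = {
    "S": ("left", ["VP", "S"]),
    "VP": ("left", ["VBD", "VBZ", "VB", "VP"]),
    "NP": ("right", ["NN", "NNS", "NNP", "NP"]),
    "PP": ("left", ["IN", "TO"]),
}

Tree = tuple  # (label, [children]); pre-terminals are (tag, [word])


def head_child(label: str, child_labels: list[str]) -> int:
    """For each priority label in turn, scan the children in the rule's
    direction; if nothing matches, fall back to the first child scanned."""
    direction, priorities = HEAD_RULES.get(label, ("left", []))
    order = list(range(len(child_labels)))
    if direction == "right":
        order.reverse()
    for wanted in priorities:
        for i in order:
            if child_labels[i] == wanted:
                return i
    return order[0]


def decorate(tree: Tree) -> tuple[Tree, str]:
    """Return (tree with every label rewritten as LABEL(headword), head word)."""
    label, children = tree
    if isinstance(children[0], str):              # pre-terminal: its word is the head
        return (f"{label}({children[0]})", children), children[0]
    results = [decorate(child) for child in children]
    head = results[head_child(label, [child[0] for child in children])][1]
    return (f"{label}({head})", [r[0] for r in results]), head


TREE: Tree = ("S", [("NP", [("DT", ["the"]), ("NN", ["cat"])]),
                    ("VP", [("VBD", ["ate"]),
                            ("NP", [("DT", ["the"]), ("NN", ["mouse"])])])])
decorated, head = decorate(TREE)
print(head)       # → ate
print(decorated)
```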
