
Syntax

Syntax. Sudeshna Sarkar 25 Aug 2008. Some Fundamental Questions. What is Language? How to define a Language? What makes a language different from another? Is there anything common to all languages? Syntax. Syntax: from Greek syntaxis, “setting out together, arrangement”


Presentation Transcript


  1. Syntax Sudeshna Sarkar 25 Aug 2008

  2. Some Fundamental Questions • What is Language? • How to define a Language? • What makes a language different from another? • Is there anything common to all languages?

  3. Syntax • Syntax: from Greek syntaxis, “setting out together, arrangement” • Refers to the way words are arranged together, and the relationships between them. • Distinction: • Prescriptive grammar: how people ought to talk • Descriptive grammar: how they do talk • The goal of syntax is to model the knowledge that people unconsciously have about the grammar of their native language

  4. The Two Schools • Rationalists • It’s all hardcoded in our brains • Principles and Parameters Theory • Poverty of the Stimulus • Recursion • Empiricists • Just a special kind of pattern recognition • No different from other cognitive abilities like vision • Language is a stochastic phenomenon

  5. The Generative Grammar • “The grammatical principles underlying languages are innate and fixed, and the differences among the world's languages can be characterized in terms of parameter settings in the brain …” (www.wikipedia.org) • Noam Chomsky [1928-] (image courtesy www.chomsky.info)

  6. I & E Languages • I – Language: Mentally represented system of rules (I – internal) • E – Language: Observable external products of I-language (written text, utterances) • Language: Collective E-language of a very large group of speakers • Syntax: Study of the I-language from E-language

  7. The Chomsky Hierarchy (grammar: languages; automaton; production rules) • Type-0: recursively enumerable; Turing machine; no restrictions • Type-1: context-sensitive; linear-bounded non-deterministic Turing machine; αAβ → αγβ • Type-2: context-free; non-deterministic pushdown automaton; A → γ • Type-3: regular; finite state automaton; A → aB, A → a

  8. From Formal to Natural Languages

  9. Some Observations on NLs • Constituency: A group of words acts as a single unit – phrases, clauses etc. • Grammatical Relations: Different words/phrases are related to the main verb of the sentence – object, subject, instrument • Subcategorization and Dependency Relations: Not all verbs can take all types of arguments – transitive, intransitive etc.

  10. Syntax • Why should you care? • Grammar checkers • Question answering • Information extraction • Machine translation

  11. Why NLP is difficult: Newspaper headlines • Iraqi Head Seeks Arms • Juvenile Court to Try Shooting Defendant • Teacher Strikes Idle Kids • Stolen Painting Found by Tree • Local High School Dropouts Cut in Half • Red Tape Holds Up New Bridges • Clinton Wins on Budget, but More Lies Ahead • Hospitals Are Sued by 7 Foot Doctors • Kids Make Nutritious Snacks

  12. Why is NLU difficult? The hidden structure of language is hugely ambiguous • Tree for: Fed raises interest rates 0.5% in effort to control inflation (NYT headline 5/17/00)

  13. Where are the ambiguities?

  14. The bad effects of V/N ambiguities

  15. Context-Free Grammars • Capture constituency and ordering • Ordering is easy: what are the rules that govern the ordering of words and bigger units in the language? • What’s constituency? How words group into units and how the various kinds of units behave with respect to one another

  16. Constituency We have NLP classes from 5:30 to 6:30 pm on Tuesday. On Tuesday we have NLP classes from 5:30 – 6:30 pm. From 5:30 to 6:30 pm on Tuesday we have NLP classes. We have NLP on Tuesday from 5:30 to 6:30 pm classes. On we have NLP classes from Tuesday 5:30 to 6:30 pm. From 5:30 we have to 6:30 pm on Tuesday NLP classes.

  17. Constituency We have NLP classes from 5:30 to 6:30 pm on Tuesday. On Tuesday we have NLP classes from 5:30 – 6:30 pm. From 5:30 to 6:30 pm on Tuesday we have NLP classes. We have NLP on Tuesday from 5:30 to 6:30 pm classes. On we have NLP classes from Tuesday 5:30 to 6:30 pm. From 5:30 we have to 6:30 pm on Tuesday NLP classes.

  18. Phrases • Phrase: Group of words that act as a unit • Noun Phrase NP • A midsummer night’s dream, My experiments with truth, The man who knew infinity • Verb Phrase VP • Gone with the wind, Saving private Ryan • Prepositional Phrases PP • Of sons and lovers, to sir with love, Beyond the blue mountains, Into the heart of the mind

  19. Modelling the Syntax of English • Let us try CFGs • S → NP VP: I love India. • S → VP: Love your country. • S → Aux NP VP: Do you love your country? • S → Wh-NP VP: Who loves his country? • S → Wh-NP Aux NP VP: Which country do you live in?

  20. Phrase Structure Grammar • Context Free Grammars are also called phrase structure grammars • Phrases are the building blocks of any PSG (i.e. CFG) • Phrases in turn are defined by CFG (PSG)

  21. Is CFG Necessary? • Can we model the syntax of English using a Regular Grammar (RG)? • NO! We cannot model general recursion (self-embedding of a phrase inside a phrase of the same kind) in an RG • S → NP VP • VP → Verb S • I think that Einstein thought that Newton said …

  22. CFG Examples • S -> NP VP • NP -> Det NOMINAL • NOMINAL -> Noun • VP -> Verb • Det -> a • Noun -> flight • Verb -> left

  23. CFGs • S -> NP VP • This says that there are units called S, NP, and VP in this language • That an S consists of an NP followed immediately by a VP • Doesn’t say that that’s the only kind of S • Nor does it say that this is the only place that NPs and VPs occur

  24. Context Free Grammars • A CFG consists of a tuple (N, T, S, P) • N is a finite set of non-terminal symbols • T is a finite set of terminal symbols • S is the start symbol • P is a finite set of rules of the form X → γ, where X ∈ N and γ ∈ (N ∪ T)*
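
The tuple definition above can be written down directly as a small data structure. The following is a minimal sketch, not a standard library API: the class name `CFG`, its field names, and the check that N and T are disjoint are assumptions made for illustration. The example grammar is the "a flight left" grammar from slide 22.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """A context-free grammar as the tuple (N, T, S, P). Hypothetical helper class."""
    nonterminals: frozenset  # N
    terminals: frozenset     # T
    start: str               # S
    rules: tuple             # P: pairs (X, gamma), gamma a tuple of symbols

    def __post_init__(self):
        # Assumption (not stated on the slide): N and T do not overlap.
        if self.nonterminals & self.terminals:
            raise ValueError("N and T must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be a non-terminal")
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.rules:
            # X must be a single non-terminal; gamma is any string over N and T.
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not in N")
            for sym in rhs:
                if sym not in symbols:
                    raise ValueError(f"unknown symbol {sym!r} in a rule for {lhs!r}")


# The grammar from slide 22.
flight_grammar = CFG(
    nonterminals=frozenset({"S", "NP", "NOMINAL", "VP", "Det", "Noun", "Verb"}),
    terminals=frozenset({"a", "flight", "left"}),
    start="S",
    rules=(
        ("S", ("NP", "VP")),
        ("NP", ("Det", "NOMINAL")),
        ("NOMINAL", ("Noun",)),
        ("VP", ("Verb",)),
        ("Det", ("a",)),
        ("Noun", ("flight",)),
        ("Verb", ("left",)),
    ),
)
```

Constructing a `CFG` whose start symbol is not in N, or whose rules mention an undeclared symbol, raises `ValueError`, which mirrors the constraints X ∈ N and γ ∈ (N ∪ T)*.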

  25. Phrase Structure Parsing • Phrase structure organizes words into phrases, often called constituents • This organization is hierarchical • For a given string there is often ambiguity about the correct phrase structure • This ambiguity often corresponds to semantic ambiguity
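
To make the point about ambiguity concrete, the sketch below counts how many phrase structure trees a grammar assigns to one string, using a CKY-style chart. The toy grammar, the lexicon, and the example sentence are all hypothetical and are not taken from the slides; the grammar is written in binary (Chomsky normal form) rules so that the chart computation stays short.

```python
from collections import defaultdict

# Hypothetical toy grammar: binary rules (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the PP attaches to the verb phrase
    ("NP", "NP", "PP"),   # the PP attaches to the noun phrase
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]

# Hypothetical lexicon: word -> categories it can have.
LEXICON = {
    "I": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}


def count_parses(sentence, root="S"):
    """Return the number of distinct parse trees for the sentence."""
    words = sentence.split()
    n = len(words)
    # chart[i, j, X] = number of trees with root X covering words[i:j]
    chart = defaultdict(int)
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            chart[i, i + 1, category] += 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY_RULES:
                    chart[i, j, parent] += chart[i, k, left] * chart[k, j, right]
    return chart[0, n, root]


print(count_parses("I saw the man with the telescope"))
```

Under this grammar the sentence gets two trees, one where "with the telescope" modifies "saw" and one where it modifies "the man", which is the kind of structural ambiguity that lines up with a semantic ambiguity.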

  26. Simple examples of a CFG • Take the non-terminals = {S, NP, VP, V} • And the terminals = {boys, study, play, books, cricket} • Let the start symbol be S • Let the rule set be • S → NP VP • VP → V • VP → V NP • NP → boys • NP → books • NP → cricket • V → study • V → play • This CFG licenses a finite number of trees (sentences)
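
Because this grammar has no recursive rule, its language can be listed exhaustively. The sketch below enumerates every sentence the rule set licenses; the dictionary encoding and the function name `expand` are illustrative choices, and the approach only terminates for grammars without recursion.

```python
from itertools import product

# The rule set from slide 26: non-terminal -> list of right-hand sides.
RULES = {
    "S": [["NP", "VP"]],
    "VP": [["V"], ["V", "NP"]],
    "NP": [["boys"], ["books"], ["cricket"]],
    "V": [["study"], ["play"]],
}


def expand(symbol):
    """Yield every word sequence (as a tuple) derivable from the symbol."""
    if symbol not in RULES:  # terminal: it derives only itself
        yield (symbol,)
        return
    for rhs in RULES[symbol]:
        # Combine every expansion of each right-hand-side symbol, in order.
        for parts in product(*(list(expand(s)) for s in rhs)):
            yield tuple(word for part in parts for word in part)


sentences = sorted(" ".join(words) for words in expand("S"))
print(len(sentences))
for sentence in sentences:
    print(sentence)
```

The list contains 24 sentences: 3 × 2 of the form NP V and 3 × 2 × 3 of the form NP V NP. It includes strings such as "cricket study books", a reminder that the grammar constrains structure only and says nothing about meaning.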

  27. Generativity • As with FSAs and FSTs you can view these rules as either analysis or synthesis machines • Generate strings in the language • Reject strings not in the language • Impose structures (trees) on strings in the language

  28. Derivations • A derivation is a sequence of rules applied to a string that accounts for that string • Covers all the elements in the string • Covers only the elements in the string
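
A derivation can be traced step by step. The sketch below uses the grammar from slide 22, in which every non-terminal has exactly one rule, and always rewrites the leftmost non-terminal; the function name `leftmost_derivation` and the dictionary encoding are illustrative assumptions.

```python
# The grammar from slide 22; each non-terminal has a single rule here,
# so the derivation is fully determined.
RULES = {
    "S": ["NP", "VP"],
    "NP": ["Det", "NOMINAL"],
    "NOMINAL": ["Noun"],
    "VP": ["Verb"],
    "Det": ["a"],
    "Noun": ["flight"],
    "Verb": ["left"],
}


def leftmost_derivation(start="S"):
    """Return the list of sentential forms from the start symbol to the string."""
    form = [start]
    steps = [list(form)]
    while True:
        # Find the position of the leftmost non-terminal, if any remains.
        index = next((i for i, sym in enumerate(form) if sym in RULES), None)
        if index is None:
            return steps
        form = form[:index] + RULES[form[index]] + form[index + 1:]
        steps.append(list(form))


for step in leftmost_derivation():
    print(" ".join(step))
```

The final line printed is "a flight left": the derivation covers all the words in the string and nothing else.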

  29. Derivations as Trees

  30. Two views of linguistic structure: 1. Constituency (phrase structure) • Phrase structure organizes words into nested constituents. • How do we know what is a constituent? (Not that linguists don't argue about some cases.) • Distribution: a constituent behaves as a unit that can appear in different places: • John talked [to the children] [about drugs]. • John talked [about drugs] [to the children]. • *John talked drugs to the children about • Substitution/expansion/pro-forms: • I sat [on the box/right on top of the box/there]. • Coordination, regular internal structure, no intrusion, fragments, semantics, …

  31. Two views of linguistic structure: 2. Dependency structure • Dependency structure shows which words depend on (modify or are arguments of) which other words. • Example: The boy put the tortoise on the rug [Figure: dependency tree for this sentence]

  32. Parsing • Parsing is the process of taking a string and a grammar and returning a (many?) parse tree(s) for that string • It is completely analogous to running a finite-state transducer with a tape • It’s just more powerful • Remember this means that there are languages we can capture with CFGs that we can’t capture with finite-state methods
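
As a rough illustration of "a string and a grammar in, parse trees out", the sketch below is a small top-down backtracking parser over the grammar from slide 26. It is one simple strategy among many, not the method the slides prescribe, and it would loop forever on left-recursive rules. The function names are hypothetical.

```python
# The rule set from slide 26: non-terminal -> list of right-hand sides.
RULES = {
    "S": [["NP", "VP"]],
    "VP": [["V"], ["V", "NP"]],
    "NP": [["boys"], ["books"], ["cricket"]],
    "V": [["study"], ["play"]],
}


def parse_symbol(symbol, words, i):
    """Yield (tree, j) for every way the symbol can cover words[i:j]."""
    if symbol not in RULES:  # terminal: must match the next word exactly
        if i < len(words) and words[i] == symbol:
            yield symbol, i + 1
        return
    for rhs in RULES[symbol]:
        for children, j in parse_sequence(rhs, words, i):
            yield (symbol, *children), j


def parse_sequence(symbols, words, i):
    """Yield (trees, j) for every way the symbol sequence can cover words[i:j]."""
    if not symbols:
        yield (), i
        return
    for tree, j in parse_symbol(symbols[0], words, i):
        for rest, k in parse_sequence(symbols[1:], words, j):
            yield (tree, *rest), k


def parse(sentence, start="S"):
    """Return all parse trees that cover the whole sentence."""
    words = sentence.split()
    return [tree for tree, j in parse_symbol(start, words, 0) if j == len(words)]


print(parse("boys play cricket"))
print(parse("play boys"))
```

Trees are nested tuples of the form (label, child, ...). A string outside the language, such as "play boys", comes back with an empty list, which is the "reject" half of the analysis view.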

  33. Other Options • Regular languages (expressions) • Too weak • Context-sensitive or Turing equiv • Too powerful (maybe)

  34. Context? • The notion of context in CFGs has nothing to do with the ordinary meaning of the word context in language. • All it really means is that the non-terminal on the left-hand side of a rule is out there all by itself (free of context) • A -> B C means that • I can rewrite an A as a B followed by a C regardless of the context in which A is found • Or when I see a B followed by a C I can infer an A regardless of the surrounding context

  35. Key Constituents (English) • Sentences • Noun phrases • Verb phrases • Prepositional phrases
