
Introduction to Cognitive Science

Introduction to Cognitive Science. Topic: Formal Grammars: Generating and Parsing. Lecturer: Dr Bodomo. Linguistics Component.

jcaulfield

Presentation Transcript


  1. Introduction to Cognitive Science • Topic: Formal Grammars: Generating and Parsing • Lecturer: Dr Bodomo • Linguistics Component

  2. Introduction • In my previous lectures, we discussed how tacit linguistic knowledge can be represented at the levels of phonology, morphology, syntax, semantics, and pragmatics, and at their interfaces, including morphophonology, morphosyntax, and the syntax-semantics interface. • In this lecture, we shall look closely at how these representations of linguistic knowledge can be formalised into an algorithm, i.e. a computational procedure for processing this linguistic knowledge.

  3. Keywords • Constituent structure rules • initial symbol • terminal symbol • non-terminal symbol • generative grammar • formal grammar

  4. Formal devices and notation • The symbol ‘→’ indicates that a node is ‘rewritten as…’, ‘consists of…’, or ‘has the constituents…’ • It is used in rewrite rules of the type: S → NP + VP • This reads: a sentence, S, has the constituents noun phrase (NP) and verb phrase (VP) • A choice between alternatives is expressed as {X, Y}: apply either X or Y, but not both

  5. Formal devices and notation (cont.) • The symbol # is used to indicate a constituent boundary, e.g. #_ is word-initial while _# is word-final • The notation X (Y) implies that X is obligatory and may be followed by Y • Initial symbol: the symbol from which a rewrite rule begins (e.g. S) • Terminal symbol: an end symbol from which no constituent structure can be further developed (e.g. N, V, Art) • All others are non-terminal symbols (e.g. NP, VP)
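As a rough illustration of the notation above, the following sketch expands a right-hand side containing X (Y) and {X, Y} into the plain symbol sequences it licenses. The tuple encoding (`"opt"`, `"alt"`), the function names, and the example rule VP → V (NP) are assumptions made for this sketch; they are not part of the lecture.

```python
from itertools import product

def options(item):
    """Return the alternative symbol sequences that one right-hand-side item allows."""
    if isinstance(item, str):      # plain symbol: obligatory
        return [[item]]
    kind, *symbols = item
    if kind == "opt":              # X (Y): the bracketed symbol may be absent or present
        return [[], [symbols[0]]]
    if kind == "alt":              # {X, Y}: exactly one of the alternatives
        return [[s] for s in symbols]
    raise ValueError(f"unknown notation: {kind}")

def expansions(rhs):
    """Expand a right-hand side into every plain sequence of symbols it stands for."""
    return [sum(combo, []) for combo in product(*(options(i) for i in rhs))]

# Hypothetical rule VP -> V (NP): V is obligatory, NP is optional.
print(expansions(["V", ("opt", "NP")]))
# Hypothetical choice {X, Y}: either X or Y, never both.
print(expansions([("alt", "X", "Y")]))
```

Each abbreviated rule is therefore shorthand for a small set of ordinary rewrite rules.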

  6. Two main aspects of grammatical information processing: generating and parsing sentences • Before we begin, let us illustrate with a simple grammar and lexicon, using the following sentence: • The students greeted the teacher.

  7. Grammar: S → NP + VP; VP → V + NP; NP → Art + N • Lexicon 1: greeted: V, _ NP; students: N; the: Art; teacher: N • Example sentence: The students greeted the teacher. • This grammar can also generate (i.e. produce) the following sentences: The teacher greeted the students. The teacher scared the students. The child ate an apple. • But you have to augment (i.e. enlarge) the lexicon as follows: • Lexicon 2: an: Art; the: Art; greeted, scared, ate: V, _ NP; apple: N; child: N; students: N; teacher: N
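To make the idea of a grammar "generating" sentences concrete, here is a minimal sketch that stores the three rules and Lexicon 2 as Python dictionaries and enumerates every word string derivable from S. The dictionary layout and the name `expand` are assumptions of this sketch, and subcategorization frames are left out for brevity.

```python
from itertools import product

# Phrase structure rules from the slide: mother symbol -> ordered daughters.
GRAMMAR = {
    "S": ["NP", "VP"],
    "VP": ["V", "NP"],
    "NP": ["Art", "N"],
}

# Lexicon 2 from the slide, grouped by word class (frames omitted in this sketch).
LEXICON = {
    "Art": ["the", "an"],
    "N": ["apple", "students", "child", "teacher"],
    "V": ["greeted", "scared", "ate"],
}

def expand(symbol):
    """Return every word sequence that `symbol` can be rewritten as."""
    if symbol in LEXICON:  # terminal symbol: one sequence per matching word
        return [[word] for word in LEXICON[symbol]]
    daughters = [expand(d) for d in GRAMMAR[symbol]]
    return [sum(combo, []) for combo in product(*daughters)]

sentences = {" ".join(words) for words in expand("S")}
print(len(sentences))                          # 8 NPs x 3 verbs x 8 NPs = 192
print("the child ate an apple" in sentences)   # True
```

The enumeration also contains strings such as "an students ate the teacher": a grammar this small records nothing about agreement or meaning, so it produces every combination the rules allow.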

  8. Sentence generation: the algorithm • To produce a sentence we need three things: • A set of phrase structure rules (as illustrated above) • A lexicon (as illustrated above), and • A lexical insertion rule (as explained below) • A lexical insertion rule is an instruction to select the right word from a lexicon • The following is an example of a lexical insertion rule:

  9. Lexical insertion rule • For each terminal symbol of a phrase structure rule, select a word from the lexicon that satisfies the following conditions: • It is a member of the class named by the terminal symbol (e.g. N, V) • Its subcategorization frame matches that of the terminal symbol (e.g. V, _ NP) • Attach this word as the daughter of this terminal symbol. • The phrase structure rules, the lexicon, and the lexical insertion rule together constitute what is known as a sentence generator.
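The two conditions of the lexical insertion rule can be sketched as a simple filter over the lexicon. The representation below (a word mapped to a class and a frame string, with `None` for words that carry no frame) and the names `candidates` and `insert` are assumptions made for illustration; the lecture does not say how the "right" word is chosen among several candidates, so this sketch simply takes the first one.

```python
# Lexicon 1: word -> (word class, subcategorization frame or None).
LEXICON = {
    "greeted": ("V", "_ NP"),
    "students": ("N", None),
    "the": ("Art", None),
    "teacher": ("N", None),
}

def candidates(terminal, frame=None, lexicon=LEXICON):
    """Words whose class is `terminal` and whose frame equals `frame`."""
    return [word for word, (cls, fr) in lexicon.items()
            if cls == terminal and fr == frame]

def insert(terminal, frame=None, choose=lambda words: words[0]):
    """Attach a chosen word as the daughter of the terminal symbol."""
    words = candidates(terminal, frame)
    if not words:
        raise LookupError(f"no lexical item for {terminal} with frame {frame}")
    return (terminal, choose(words))   # (mother, daughter)

print(candidates("N"))        # ['students', 'teacher']
print(insert("V", "_ NP"))    # ('V', 'greeted')
print(candidates("V"))        # [] because 'greeted' requires a following NP
```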

  10. The whole procedure of beginning with an initial symbol, working through the phrase structure rules, and then adding the lexical items via lexical insertion rules is driven by an algorithm, i.e. a set of instructions. • Let us set out an algorithm for the generation (production) of the sentence The students greeted the teacher, given a grammar and a lexicon as follows:

  11. The students greeted the teacher • Lexicon 1: greeted: V, _ NP; students: N; the: Art; teacher: N • Grammar: PS Rule 1: S → NP + VP; PS Rule 2: VP → V + NP; PS Rule 3: NP → Art + N • i. Start with the initial symbol, S. • ii. For every non-terminal symbol, X, find a phrase structure rule with X as the left-hand symbol and others as the right-hand symbol(s), and develop a partial tree with X as the mother and the right-hand symbols as ordered daughters. • iii. Apply rule ii until all branches end in terminal symbols. • iv. Apply the lexical insertion rule iteratively until every terminal symbol has a lexical item as its daughter.

  12. Illustrating the algorithm • Applying rule i, we get: S • Applying rules ii and iii, and then rule iv, we get the following tree, shown level by level:
      S
      NP VP (PS Rule 1)
      Art N V NP (PS Rules 3 and 2)
      Art N (PS Rule 3, expanding the NP inside the VP)
      The students greeted the teacher (applying rule iv)
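The four steps of the generation algorithm can be sketched in a few lines. Trees are represented here as nested lists of the form `[mother, daughter, ...]`, and the words to insert are supplied by the caller in left-to-right order; both choices, along with the function names, are assumptions of this sketch rather than part of the lecture.

```python
GRAMMAR = {"S": ["NP", "VP"], "VP": ["V", "NP"], "NP": ["Art", "N"]}
LEXICON = {"the": "Art", "students": "N", "teacher": "N", "greeted": "V"}

def build_tree(symbol):
    """Rules i-iii: rewrite non-terminals until every branch ends in a terminal symbol."""
    if symbol not in GRAMMAR:
        return [symbol]                        # terminal symbol, no daughter yet
    return [symbol] + [build_tree(d) for d in GRAMMAR[symbol]]

def insert_words(tree, words):
    """Rule iv: give each terminal symbol, left to right, a word of the matching class."""
    if len(tree) == 1:
        word = next(words)
        if LEXICON.get(word) != tree[0]:
            raise ValueError(f"{word!r} is not listed as {tree[0]}")
        return [tree[0], word]
    return [tree[0]] + [insert_words(sub, words) for sub in tree[1:]]

def leaves(tree):
    """Read the generated sentence off the tree."""
    if len(tree) == 2 and isinstance(tree[1], str):
        return [tree[1]]
    return [word for sub in tree[1:] for word in leaves(sub)]

skeleton = build_tree("S")                     # start from the initial symbol
tree = insert_words(skeleton, iter("the students greeted the teacher".split()))
print(tree)
print(" ".join(leaves(tree)))
```

Passing a word of the wrong class, for example a noun where the tree expects an article, raises a `ValueError`, which mirrors the first condition of the lexical insertion rule.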

  13. From the above we can see that we started from the initial symbol and ended with terminal symbols that have lexical items as their daughters. A sentence has thus been generated (produced), and the derivation tells us how this sentence is built up. • Now, let us see how we can begin with an existing sentence and then break it down into its component parts by applying rules.

  14. Sentence parsing: the algorithm • To parse a sentence means to analyse it into its constituent parts by the systematic application of lexical insertion rules and some phrase structure rules. It is like the reverse process of generation.

  15. Some sentence parsing rules which constitute a PARSER • For a sentence, S: • i. Determine from the lexicon the word class of every item and develop a partial tree for each word, where the word class label dominates the word. • ii. Find a PS rule of the type X → Y, Z where the right-hand symbols match some sequence of categories in the structure so far, and develop a partial tree with X as the mother and the right-hand symbols as ordered daughters. • iii. Continue rule ii until the root, S, is reached and there are no unattached strings.

  16. The man drank the tea. • Lexicon 1: drank: V, _ NP; man: N; the: Art; tea: N • Grammar: PS Rule 1: S → NP + VP; PS Rule 2: VP → V + NP; PS Rule 3: NP → Art + N • Applying Rule i, we get:
      [Art The] [N man] [V drank] [Art the] [N tea]
  • Applying Rule ii (using PS Rule 3 twice), we get:
      [NP [Art The] [N man]] [V drank] [NP [Art the] [N tea]]

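  17. Applying Rule ii again (using PS Rule 2), we get:
      [NP [Art The] [N man]] [VP [V drank] [NP [Art the] [N tea]]]
  • Applying Rule iii, i.e. continuing until the root S is reached (using PS Rule 1), we get:
      [S [NP [Art The] [N man]] [VP [V drank] [NP [Art the] [N tea]]]]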
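The three parser rules can be sketched as a small bottom-up procedure. This version always takes the first matching rule at the leftmost position and never backtracks, which is enough for this three-rule grammar but would not be for grammars in general. The tuple representation of trees and the names `reduce_once` and `parse` are assumptions made for illustration.

```python
GRAMMAR = [("S", ["NP", "VP"]), ("VP", ["V", "NP"]), ("NP", ["Art", "N"])]
LEXICON = {"the": "Art", "man": "N", "tea": "N", "drank": "V"}

def reduce_once(forest):
    """Rule ii: replace one sequence of partial trees that matches a PS rule's right-hand side."""
    for mother, daughters in GRAMMAR:
        n = len(daughters)
        for i in range(len(forest) - n + 1):
            if [tree[0] for tree in forest[i:i + n]] == daughters:
                return forest[:i] + [(mother, *forest[i:i + n])] + forest[i + n:]
    return None                                # no rule applies

def parse(sentence):
    """Return a tree rooted in S, or None if the words cannot all be attached."""
    words = sentence.split()
    if any(w.lower() not in LEXICON for w in words):
        return None                            # unknown word: rule i cannot apply
    # Rule i: one partial tree per word, with the word class dominating the word.
    forest = [(LEXICON[w.lower()], w) for w in words]
    # Rule iii: keep applying rule ii until nothing more can be combined.
    while (reduced := reduce_once(forest)) is not None:
        forest = reduced
    if len(forest) == 1 and forest[0][0] == "S":
        return forest[0]
    return None                                # unattached material remains

print(parse("The man drank the tea"))
print(parse("drank the man"))                  # None: only a VP can be built
```

The intermediate values of `forest` correspond to the partial trees shown on the slides: five word-class trees, then two NPs around the verb, then NP plus VP, and finally a single tree rooted in S.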

  18. Conclusion • Parsing and generation of natural language data is a very important area of linguistics, especially in computer applications of natural language, which have become an important part of the computer and information processing industry. • In the next lecture, we shall look at the last topic of the linguistics segment, i.e. how linguistic knowledge is acquired/learnt by speakers of a language, from the point of view of spoken language and from the point of view of literacy (reading and writing).
